Detect who is talking in the room


#1

Suppose there are multiple speakers in the room. Is there a way to detect the active speaker(s), so that we can, for example, show who is talking and bring their video to the front?


#2

Technically this should be possible, since RTP packets can carry a header extension that indicates the audio level of each audio packet (the client-to-mixer audio level indication, RFC 6464). However, the server-side implementation is completely missing. It would be nice to include this feature in the roadmap.
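To illustrate what a server-side implementation would have to read, here is a minimal sketch that extracts the RFC 6464 audio level from a raw RTP packet using the RFC 8285 one-byte header-extension format (profile `0xBEDE`). This is not Licode/Erizo code: the function name `parseAudioLevel` is hypothetical, and the extension ID is passed in because it is negotiated per session in the SDP (`a=extmap` line for `urn:ietf:params:rtp-hdrext:ssrc-audio-level`).

```javascript
// Hypothetical helper: read the RFC 6464 audio level from an RTP packet
// (Uint8Array) that uses the RFC 8285 one-byte header-extension format.
// extId is an assumption: it comes from the session's SDP a=extmap line.
// Returns { voice, level } where level is 0..127 meaning 0..-127 dBov
// (127 = silence), or null if the extension is absent.
function parseAudioLevel(packet, extId) {
  if (packet.length < 12) return null;            // shorter than fixed RTP header
  const hasExtension = (packet[0] & 0x10) !== 0;  // X bit
  const csrcCount = packet[0] & 0x0f;             // CC field
  if (!hasExtension) return null;

  let offset = 12 + 4 * csrcCount;                // skip fixed header and CSRCs
  if (packet.length < offset + 4) return null;
  const profile = (packet[offset] << 8) | packet[offset + 1];
  const words = (packet[offset + 2] << 8) | packet[offset + 3];
  if (profile !== 0xbede) return null;            // not the one-byte format
  offset += 4;
  const end = offset + 4 * words;                 // length is in 32-bit words
  if (packet.length < end) return null;

  while (offset < end) {
    const id = packet[offset] >> 4;
    const len = (packet[offset] & 0x0f) + 1;      // field stores length - 1
    if (id === 0) {                               // padding byte
      offset += 1;
      continue;
    }
    if (id === 15) break;                         // reserved: stop parsing
    if (id === extId && offset + 1 < end) {
      const b = packet[offset + 1];
      return { voice: (b & 0x80) !== 0, level: b & 0x7f };
    }
    offset += 1 + len;                            // skip this element
  }
  return null;
}
```

A server could compare the `level` values per SSRC over a short window to decide who is speaking; lower values mean louder audio.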


#3

Hey,
I solved this by using the uncompressed erizo.js and adding analyser code to the init function. I use the JavaScript Web Audio API to get the current audio level. I also added a gain node there, which makes muting clients very easy and smooth.

The code:

that.init = (gainNodeCallback) => { // Added callback that receives the gain node (e.g. to mute yourself)

[...]

  that.Connection.GetUserMedia(opt, (stream) => {
    // navigator.webkitGetUserMedia("audio, video", (stream) => {

    // INJECTED CODE < START
    if (spec.audio) {
      // Route the microphone through a gain node (for muting) and an analyser
      // (for level metering), then publish the processed stream.
      const AudioContextCtor = window.AudioContext || window.webkitAudioContext;
      const context = new AudioContextCtor();
      const microphone = context.createMediaStreamSource(stream);
      const dest = context.createMediaStreamDestination();

      const gainNode = context.createGain();

      const analyser = context.createAnalyser();
      analyser.fftSize = 2048;
      const dataArray = new Uint8Array(analyser.frequencyBinCount);

      let oldAudioVolume = 0;
      function calcVolume() {
        // Re-run on every animation frame (browsers pause this in background tabs)
        requestAnimationFrame(calcVolume);
        analyser.getByteTimeDomainData(dataArray);

        // Mean absolute deviation from the zero line (128 in byte time-domain data)
        let mean = 0;
        for (let i = 0; i < dataArray.length; i++) {
          mean += Math.abs(dataArray[i] - 128);
        }
        mean = Math.round(mean / dataArray.length);

        // Bucket into 0 = silence, 1 = medium, 2 = loud
        let audioVolume;
        if (mean < 2) {
          audioVolume = 0;
        } else if (mean < 5) {
          audioVolume = 1;
        } else {
          audioVolume = 2;
        }

        // Report only changes, not every frame
        if (audioVolume !== oldAudioVolume) {
          sendAudioVolume(audioVolume); // Call the global function with the current audio level
          oldAudioVolume = audioVolume;
        }
      }
      calcVolume();

      // microphone -> gain -> analyser -> destination, so a muted gain also reads as silence
      microphone.connect(gainNode);
      gainNode.connect(analyser);
      analyser.connect(dest);

      // dest.stream holds only the processed audio track, so carry over any
      // video tracks from the original stream (otherwise video would be dropped)
      stream.getVideoTracks().forEach((track) => dest.stream.addTrack(track));
      that.stream = dest.stream;
      if (gainNodeCallback) {
        gainNodeCallback(gainNode);
      }
    } else {
      that.stream = stream;
    }
    // INJECTED CODE < END

    __WEBPACK_IMPORTED_MODULE_4__utils_Logger__["a" /* default */].info('User has granted access to local media.');
    // that.stream = stream; // original assignment, now handled by the injected code
[...]
}

This calls the global function “sendAudioVolume” with 0 for silence, 1 for a medium audio level, and 2 for a high audio level, and only when the level changes.
Not an easy solution, but it works for me :wink:
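The metering logic in that snippet can be pulled out into plain functions, which makes the thresholds testable without a microphone or an `AudioContext`. This is a hypothetical refactoring, not part of erizo.js: the names `classifyVolume` and `createVolumeReporter` are made up, while the thresholds (2 and 5) and the report-on-change behaviour are taken from the post.

```javascript
// Hypothetical helper: classify one frame of AnalyserNode byte time-domain
// data (Uint8Array, 128 = silence) into 0 (silence), 1 (medium) or 2 (loud),
// using the thresholds from the snippet above.
function classifyVolume(samples) {
  if (samples.length === 0) return 0;
  let sum = 0;
  for (let i = 0; i < samples.length; i++) {
    sum += Math.abs(samples[i] - 128); // deviation from the zero line
  }
  const mean = Math.round(sum / samples.length);
  if (mean < 2) return 0;
  if (mean < 5) return 1;
  return 2;
}

// Hypothetical wrapper: call `send` only when the level changes, like the
// oldAudioVolume check in the snippet. Starts at 0, so initial silence is not sent.
function createVolumeReporter(send) {
  let last = 0;
  return (samples) => {
    const level = classifyVolume(samples);
    if (level !== last) {
      send(level);
      last = level;
    }
    return level;
  };
}
```

In the browser, `calcVolume` would then reduce to `report(dataArray)` after `analyser.getByteTimeDomainData(dataArray)`, with `report = createVolumeReporter(sendAudioVolume)`.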


#4

Interesting! I’ll give it a try. Thanks for sharing. :pray:


#5

We also implemented the “same” approach on the client side: we detect speaking and send speaking-activity events to the server.
This is not an optimal solution, since the calculation is done on the client, but it is enough until we finish the server-side implementation I described in my earlier post.
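Once clients report speaking activity, something still has to decide whose video to bring up. Below is a minimal sketch of one way to do that from the 0/1/2 levels produced by the snippet above; it is an assumption, not the implementation described in this thread. The class name `ActiveSpeakerTracker` and the `holdMs` parameter are hypothetical, and times are passed in explicitly so the logic is deterministic.

```javascript
// Hypothetical active-speaker selector fed by client-reported levels
// (0 = silence, 1 = medium, 2 = loud). The current speaker is kept until
// they have been silent for holdMs, so the highlighted video does not flicker.
class ActiveSpeakerTracker {
  constructor(holdMs = 1000) {
    this.holdMs = holdMs;
    this.levels = new Map();    // participantId -> latest reported level
    this.lastHeard = new Map(); // participantId -> last time known to be speaking
    this.current = null;
  }

  // Record a level change for a participant at time nowMs (milliseconds).
  report(participantId, level, nowMs) {
    const prev = this.levels.get(participantId) || 0;
    // Levels are sent only on change, so a drop to 0 means "speaking until now"
    if (level > 0 || prev > 0) this.lastHeard.set(participantId, nowMs);
    this.levels.set(participantId, level);
    return this.activeSpeaker(nowMs);
  }

  // Forget a participant who left the room.
  remove(participantId) {
    this.levels.delete(participantId);
    this.lastHeard.delete(participantId);
    if (this.current === participantId) this.current = null;
  }

  activeSpeaker(nowMs) {
    const cur = this.current;
    if (cur !== null) {
      const heard = this.lastHeard.get(cur);
      const stillActive =
        this.levels.get(cur) > 0 ||
        (heard !== undefined && nowMs - heard < this.holdMs);
      if (stillActive) return cur;
    }
    // Otherwise switch to the loudest participant currently speaking.
    let best = null;
    let bestLevel = 0;
    for (const [id, level] of this.levels) {
      if (level > bestLevel) {
        best = id;
        bestLevel = level;
      }
    }
    // If nobody is speaking, keep showing the last speaker.
    if (best !== null) this.current = best;
    return this.current;
  }
}
```

Keeping the last speaker on screen during silence mirrors what most conferencing UIs do; a server-side version based on RTP audio levels could feed the same tracker.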


#6

Of course, a server-side implementation is preferable. Looking forward to such a great feature :thumbsup: