Audio Isolation for Testing Audio

Part of testing is putting elements in isolation, and this approach builds on the previous posts I made about testing WebRTC & RTMFP.  For example, if you want to know how much audio quality degrades between one end point and another machine’s browser across the internet, you could play the audio in isolation, have it echo back, and record the audio coming back in isolation.  Then, using some utilities, compare the recorded audio against the original audio file.

By setting up two virtual lines, we can isolate audio going in to the test as well as audio coming back.  This allows us to use one machine for both the sending and receiving of audio.

Applications of the Framework

  • Audio Quality Scoring
  • Audio Delay Analysis

Example of How It Works

  • Your test sends audio to a server of some sort, through the first virtual line
  • The server echoes back your audio, which goes to the other virtual line
  • You record the audio coming back via the 2nd virtual line
  • Using PESQ, you compare the two audio files (the one you sent and the one you got back) for quality difference

Set Up of Audio Isolation

In my situation I was writing code that used Linux-style command-line tools (SoX), and I built this test framework to run on OS X.  For the virtual lines I used an open source OS X tool called Soundflower.  If you are not using OS X, you’ll need to find a tool that creates at least 2 virtual sound lines.  If you are using OS X, you can follow the Soundflower instructions below…

Soundflower is a musician’s tool that lets audio be piped through different virtual lines.  By default you have two virtual lines:

Soundflower (2 Channel) & Soundflower (64 Channel)

  • Download Soundflower
  • Install Soundflower
  • Restart your machine
  • After the restart you will see a Soundflower icon in the top bar, which you can click (or, if the icon doesn’t load, you can open Applications / Utilities / Audio MIDI Setup)
  • In this device window, you’ll see the Soundflower channels listed.  What I prefer to do is create handles for these channels.  That way I can reference them by a unique name rather than “Soundflower (2ch)”
    • To do this, click the + sign at the bottom left of the Audio MIDI window.
    • Then click Create Aggregate Device.  A new Aggregate Device will be listed.
    • Rename it (by clicking on the title) to Presenter.  This will be the audio channel we will use to send audio in
    • Click the + sign again and create another Aggregate Device
    • Name the second one Listener.  This will be where the audio coming back is isolated
  • Now that we have the two Aggregate Devices made, click on Presenter.
  • On the right side of the screen are the audio devices available to your computer. Check Soundflower (64ch)
  • Click on Listener and check Soundflower (2ch)
  • Finally, Right-Click on Listener and choose “Use this device for Sound Output.”

If you use this computer for other tasks, you can re-enable audio through your speakers via the System Preferences / Sound / Output tab.

Just remember that before you run any tests, you’ll want to change the Sound Output to the Listener channel.

Browser Audio Testing Setup

I have numerous posts on how to test audio and get a quality score.  I won’t repeat what’s there, but will point out that

  • with this new audio isolation you will have your audio recording device set to listen to the Listener channel
  • your audio you will play through the browser will be playing to the Presenter channel
  • your browser needs to be pre-configured to use the right virtual mic (i.e. Presenter).

To pre-configure your browser, the easiest way I know is to do the following:

For RTMFP, go to your Flash Phoner interface.  Either Right-Click and choose Audio Settings, or wait for Flash to prompt you for the audio security settings (this prompt may not occur till you place a call through the browser.)

The Flash Security prompt will come in one of two varieties.  In the first it will show a popup with some tabs. One tab is a microphone.  Click the microphone and then change the mic, to be “Presenter.”  Sometimes Flash first prompts you to “allow” audio to be heard. In that case, click “allow” and then change the virtual mic.

If you will always use this browser for this type of test, and do not need to change the virtual mic often, then check the box to save these settings as default.  If you prefer to set these at test time, you’ll need a tool like Sikuli that can click through the Flash prompts, or a developer to add a function call that sets the mic.

For WebRTC

For WebRTC, you would do the same thing: open Chrome and go to the page that makes the call. Start the call, and notice that on the right side of the URL bar there is an icon that looks like a video camera.  Click that.  You will get an interface that asks which mic you want to use; choose Presenter.

Again, if you are using this browser the same way each time, you can simply choose Presenter and Chrome should save this as a default.  If you want to make this choice at test time, you’ll need a tool like Sikuli to select that and make the choice for you.

Playing the Audio in Isolation

You can play the audio through to your server once the call has connected in the browser.  I use SoX to do this, and in code execute a statement like:

sox [your original.wav file] -t coreaudio Presenter

For example, to execute that in Groovy, it’s: def playAudio = "sox my_orig.wav -t coreaudio Presenter".execute()

coreaudio is the SoX device type for Core Audio, the Mac-specific audio system.  If you are not using a Mac, you’ll need to find out what audio device types are available to you.

This will play the audio through the Presenter line.  If you want to check this, you can use an audio recording tool (like Piezo) that can listen to and record from all of your available audio channels.  Point it to the Presenter channel and run that command. You should see the audio needle start moving. You can record the audio and listen to it later to verify you get the right audio.
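If your test harness is in Python rather than Groovy, the same call could be sketched roughly like this.  The function names are my own for illustration, and the sketch assumes sox is on your PATH and that the Presenter device exists as set up above.

```python
import subprocess


def play_command(wav_path, device="Presenter", driver="coreaudio"):
    """Build the sox command that plays wav_path into a virtual line."""
    # Same shape as: sox my_orig.wav -t coreaudio Presenter
    return ["sox", wav_path, "-t", driver, device]


def play(wav_path, device="Presenter"):
    """Play the file and block until sox has finished (hypothetical helper)."""
    # check=True raises CalledProcessError if sox exits with a non-zero code.
    return subprocess.run(play_command(wav_path, device), check=True)
```

Keeping the command builder separate from the call that runs it makes it easy to log or inspect the exact command before any audio is played.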

Recording the Audio Coming Back

Since the goal here is to see how badly the audio degrades coming back through the system, we want to capture the audio coming back in isolation.  So we will use a recording device set to the audio channel: Listener.  I start the record process before I run the play method.  This ensures I get all of the audio. It also means I will have dead air at the front of the recording, which can be cleaned up with some post processing.

To clean up the dead air, you would run a command like this:

sox webrtc_temp.wav webrtc_01.wav silence 1 0.1 0.1% reverse silence 1 0.1 0.1% reverse

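That tells sox to write a trimmed copy of the wav file to *_01, with the silence removed from the front and back.  The first silence effect trims the front; reversing the audio, trimming again, and reversing back trims the end.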
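To make the trimming idea concrete, here is a simplified sketch in Python that works on a plain list of sample values.  It is not how sox is implemented: it ignores the 0.1 second duration argument and just drops samples at or below a threshold from both ends.  The function name and the sample values are made up for illustration.

```python
def trim_silence(samples, threshold):
    """Drop leading and trailing samples whose magnitude is <= threshold."""
    start = 0
    # Walk forward past the dead air at the front.
    while start < len(samples) and abs(samples[start]) <= threshold:
        start += 1
    end = len(samples)
    # Walk backward past the dead air at the end.
    while end > start and abs(samples[end - 1]) <= threshold:
        end -= 1
    return samples[start:end]


# Quiet samples in the middle of the audio are kept; only the ends are trimmed.
trimmed = trim_silence([0.0, 0.0, 0.5, 0.0, 0.3, 0.0, 0.0], threshold=0.001)
```

Note that silence inside the audio survives, which is what you want here: PESQ should see any gaps the network introduced.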

Analysis of the Audio

As mentioned in other posts, you can use PESQ to get a score. To install PESQ, you can follow the guidelines in my earlier post (just go to the PESQ section).

You will need to grab the PESQ code and compile it… then call it with

PESQ +[your sample rate, either 8000 or 16000] [original file played] [file you recorded]

PESQ requires both the original and recorded audio to have the same sample rate… and PESQ only accepts sample rates of 8000 or 16000 Hz.
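Since a sample rate mismatch is an easy mistake to make, it can help to check both files before calling PESQ.  Here is a minimal sketch in Python, assuming both files are plain PCM wav files (which is what the standard wave module can read) and that your compiled binary is named PESQ.  The function names are hypothetical.

```python
import wave

SUPPORTED_RATES = (8000, 16000)


def sample_rate(path):
    """Read the sample rate (frames per second) from a PCM wav header."""
    with wave.open(path, "rb") as wav:
        return wav.getframerate()


def pesq_command(original, recorded, binary="PESQ"):
    """Build the PESQ command line, refusing mismatched or unsupported rates."""
    rate = sample_rate(original)
    if rate != sample_rate(recorded):
        raise ValueError("original and recorded files have different sample rates")
    if rate not in SUPPORTED_RATES:
        raise ValueError(f"PESQ needs 8000 or 16000 Hz audio, got {rate}")
    # The original (reference) file goes first, then the recording.
    return [binary, f"+{rate}", original, recorded]
```

If the rates differ, you would resample one of the files first (sox can do that) rather than passing them to PESQ as they are.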

What does a PESQ score mean?

I get asked this question from time to time.  It’s a measure of the difference between the two audio files.  It outputs a score on a roughly 0 to 5 scale, although in practice the highest I could ever get was 4.5 (comparing an audio file to itself), which is the top of the raw PESQ scale.

What I tell people is that PESQ scores should be thought of as relative and not absolute.  So I recommend that you keep a copy of each recording you make for future reference.  In the filename of the recording, put the PESQ score and relevant info about it… like rtmfp_2.563_12022014.wav.  Listen to the audio and compare it.  Build your own SLA based on what you determine to be the quality fail points.  You might hear audio drop off in spots at 2.5, or hear garbled audio at 2.3… it may sound “ok” at 3.0 but really good at 3.8 and above.  It requires some team effort to agree upon what values you wish to fail a test for.
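A small helper can keep that naming consistent.  This is just a sketch in Python: the function names are hypothetical, I am assuming the date part of the filename is month, day, year, and the fail threshold is whatever your team agrees on rather than anything PESQ defines.

```python
from datetime import date


def recording_name(protocol, score, when):
    """Build a name like rtmfp_2.563_12022014.wav (date assumed to be MMDDYYYY)."""
    return f"{protocol}_{score:.3f}_{when.strftime('%m%d%Y')}.wav"


def passes_sla(score, fail_below):
    """True when the score meets the team-agreed threshold (not a PESQ constant)."""
    return score >= fail_below


name = recording_name("rtmfp", 2.563, date(2014, 12, 2))
```

With the score in the filename, you can sort a folder of recordings and listen your way down the list to find where the audio stops being acceptable.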

Test Methodology

I’ve created multiple frameworks to test this flow.  I have a slim test and a fat test.  The fat test makes use of two browsers and uses your front end. It requires some switching back and forth between browsers.  The slim test uses one browser, which sends audio to a server that echoes the audio back.  The slim test isn’t real world in the sense of making use of two browsers or the front end, but it’s useful for getting faster scores and more tests in a given time period.  I’ve included a high level walk through of each of these below:

Slim Methodology (One Browser)

  • Your automation kicks off one browser
  • This browser is pre-configured to use the Presenter line as its audio mic
  • The browser goes to your web interface to make the RTMFP or WebRTC call
  • We start the recording on the Listener line, which should be dead air, until audio comes back from the server
  • When the call is connected, you play audio through the browser, by making the sox call to play audio via that audio channel (Presenter) that the browser is using for a mic
  • The audio is going to a server of yours.  You need your server/technology to echo back the audio coming in.
  • When the echoed audio comes back, since your sound output (which is where it goes to) is set to the Listener channel, our recorder will pick it up and save it out.
  • After the recording is made, we run PESQ to compare the Original Audio vs. the Audio Recorded

What we are validating is the audio quality after going from machine 1, through the internet to server A, back to machine 1.
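The order of the external commands in the slim test can be sketched like this in Python.  This only builds the command lines in sequence, it does not drive the browser or run anything.  The function name, the step labels, the PESQ binary name and the default sample rate are all assumptions for illustration.

```python
def slim_test_plan(original, raw_recording, trimmed_recording, sample_rate=16000):
    """Return (label, command) pairs in the order the slim test runs them."""
    return [
        # 1. Start recording the Listener line (run this one in the background).
        ("record", ["sox", "-t", "coreaudio", "Listener", raw_recording]),
        # 2. Once the call is connected, play the original into the Presenter line.
        ("play", ["sox", original, "-t", "coreaudio", "Presenter"]),
        # 3. After the recorder is stopped, trim the dead air off both ends.
        ("trim", ["sox", raw_recording, trimmed_recording,
                  "silence", "1", "0.1", "0.1%", "reverse",
                  "silence", "1", "0.1", "0.1%", "reverse"]),
        # 4. Score the trimmed recording against the original.
        ("score", ["PESQ", f"+{sample_rate}", original, trimmed_recording]),
    ]
```

The important part is the ordering: the recorder has to be running before the play step starts, otherwise you clip the front of the echoed audio.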

Fat Methodology (Two Browsers)

  • You need to have a handle on how to spawn multiple browsers and make use of them
  • One browser needs to be pre-configured as using the Presenter Line and the other as using the Listener Line
  • The browser for the presenter goes to the UI to make the call to the Listener browser
  • The Listener browser will accept the call
  • We start the recording
  • We play the audio through the Presenter line, which is the mic of the browser acting as Presenter
  • The audio reaches your other browser and is recorded on the Listener Line
  • Same as before, we compare the Original Audio vs the Recorded Audio.

In this scenario we are using two different browsers. The first has audio played through on the call, which probably goes to a server.  The server passes the audio to the recipient (Listener) browser and it’s recorded on that end.

Variations to Testing

You will want the ability to select different servers (environments) as well as different codecs.  Your tests may produce great quality with a server on the LAN but be terrible with a server in your colo, so being able to choose the environment is extremely useful.  Being able to choose the codec is also useful for testing: a) different codecs and b) transcoding.

Transcoding is when audio comes in with one codec and is converted to another codec going out, like Speex to G.711.  This conversion is extra work on the server and may impact quality.

Codec examples:

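  • PCMU is the μ-law variant of G.711.  It is often called lossless, but strictly it compands each sample down to 8 bits, so it is only lightly lossy.  You will get your highest scores on this one.
  • Speex is a common audio codec
  • G.711 is another common codec (PCMU and PCMA are its two variants)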

What developers have given me to help me test these variations is a UI with selectors (dropdowns, etc.) that let me choose the environment as well as the codec.


I like to go all out and produce a UI pulling in data from a database.  But you don’t have to.  You could simply write to a CSV file after each test is concluded.

You can capture data like environment, codec, browser, rtmfp/webrtc, PESQ score, and date/time of test.

You might have data like:

3.576,firefox,myEnv,rtmfp,speex ,Tue May 13 17:57:37 PDT 2014
3.598,firefox,myEnv,rtmfp,speex ,Wed May 14 12:48:11 PDT 2014
3.524,firefox,myEnv,rtmfp,speex ,Wed May 14 12:50:10 PDT 2014
4.150,firefox,myEnv,rtmfp,pcmu ,Wed May 14 13:01:19 PDT 2014
4.085,firefox,myEnv,rtmfp,pcmu ,Wed May 14 13:03:17 PDT 2014
4.124,firefox,myEnv,rtmfp,pcmu ,Wed May 14 13:05:17 PDT 2014
4.041,firefox,myEnv,rtmfp,pcmu ,Wed May 14 13:07:16 PDT 2014

The above is real data from testing I’ve done.  PCMU will always score higher here because it compresses the audio far less than Speex does.

You can also save to a database.  In Grails you can use GORM to save out your results, which is pretty easy, but I won’t go into it here… I have other posts on that.  Then your UI could simply make calls to a controller that exposes the result data.  When presenting data to the UI, I find it useful to include a running average over the past hour, as well as the actual results.
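The running average is simple to compute once each result has a timestamp.  A minimal sketch in Python, assuming the results are available as (timestamp, score) pairs; the function name and the one hour default are just for illustration.

```python
from datetime import datetime, timedelta


def running_average(results, now, window=timedelta(hours=1)):
    """Average the scores whose timestamp falls within the window ending at now."""
    recent = [score for when, score in results if now - window <= when <= now]
    if not recent:
        return None  # no tests ran inside the window
    return sum(recent) / len(recent)
```

Returning None for an empty window lets the UI show "no data" instead of a misleading zero.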

Audio Delay

What if you want to know how long it takes an audio stream to reach its destination?

Audio delay can be a big problem.  I found a YouTube video on how to set sox up to listen interactively for incoming audio.  The sox command is:

sox -t coreaudio Listener myfile_01.wav silence 1 0.1 5% 1 1.0 5%

That tells sox to listen to the coreaudio device “Listener”, record to a file called myfile_01.wav, and wait until it hears audio above a 5% threshold before it starts recording.  The second group of values tells sox to stop once the audio drops back below 5% for one second.  So really, once it hears audio, it records it, saves the file and exits.

In my test, I use Grails and Grails has the ability to run asynchronous actions… called tasks.  So I have some code like this:

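// task { } comes from Grails' async support (grails.async.Promises)
def audioDelay = task {
        // sox blocks here in the background until it has heard audio on Listener
        def audioListen = "sox -t coreaudio Listener myfile_01.wav silence 1 0.1 5% 1 1.0 5%".execute()
        audioListen.waitFor()
        if (audioListen.exitValue() == 0) {
              // timeNow is assumed to be set to the current time elsewhere in the test code
              "touch web-app/audiodelay/audio_heard_at_${timeNow}".execute()
        }
}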

When the method holding that task above is called, it sends that task to a background job… it just waits for audio. If Sox gets audio, it will exit with code 0. If the exit code is good, we touch a file on the file system with the current time.  Now we know when the audio was heard.

When we then call the play audio method, I touch a file as well, right before I play the file… like:

"touch web-app/audiodelay/audio_played_at_${timeNow}".execute()

Now we get two files on the file system with two different time stamps.  The difference between them is the time it took for the audio to be played, echoed back from the server, and picked up again.
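Working out the delay from the two marker files is then just a subtraction.  Here is a small sketch in Python that assumes timeNow was written as a whole number of milliseconds (for example epoch milliseconds); if you write the time in another format you would need to parse it differently.  The function names are hypothetical.

```python
def marker_time_ms(filename, prefix):
    """Extract the numeric timestamp that follows the prefix in a marker filename."""
    if not filename.startswith(prefix):
        raise ValueError(f"{filename!r} does not start with {prefix!r}")
    return int(filename[len(prefix):])


def audio_delay_ms(played_name, heard_name):
    """Milliseconds between the 'played' marker and the 'heard' marker."""
    played = marker_time_ms(played_name, "audio_played_at_")
    heard = marker_time_ms(heard_name, "audio_heard_at_")
    return heard - played


delay = audio_delay_ms("audio_played_at_1400000000000", "audio_heard_at_1400000000850")
```

Keep in mind the heard marker is only written after sox exits, so the number includes however long sox took to decide the audio had finished.  Like the PESQ scores, it is most useful as a relative measure between runs.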

