I was following the Google testing team's blog, specifically this post:


I had a need to automate WebRTC testing where I work, along with RTMFP (an Adobe Flash audio protocol). What was desired was an automation framework that would create a baseline score against which we could continuously validate audio quality. Anything below that score fails the test; anything above it passes. As work is done on soft PBXs (like Asterisk/FreeSWITCH) or on the web application serving WebRTC, these automated tests can be rerun to ensure audio quality standards are not dropping.

The post from the Google team is awesome because they show their workflow for testing audio quality. They get a metric score and can use that metric as an SLA. I followed their methodology, but with my own implementation.

My Implementation

In the linked article above, the Google team used C, Windows and PESQ. They made use of the Web Audio API to send the audio stream (thereby bypassing the onboard mic and speakers).

I went a different route. Since I don't have a background in C, I used Groovy, Sikuli, Piezo, SoX and PESQ.

I didn't want to develop on Windows, so I used Piezo to handle the virtual recording (the Google team used the Windows audio recorder, set to a virtual mic).

Here’s a video demonstrating the automation flow and the final PESQ score:


While Selenium/WebDriver is great at driving the web UI, it can't help me with launching applications. It's also nearly impossible to test Flash Phoner (RTMFP) with WebDriver, as you have to access Flash security warnings/buttons. Since my tests would also include RTMFP, I needed a solution that was all-encompassing. Sikuli would be my choice – but run from within a Groovy test harness.

Sikuli

Sikuli can be called from JVM-based frameworks – Groovy/Grails/Scala/Java. I imported the Sikuli JAR and created Groovy scripts that call Sikuli to automate applications. I chose this approach because I know Groovy well, and could later grow it into a full Grails app, giving a nice UI-driven test harness.

Groovy

Groovy was the core of the test – it handled all the logic: what to click, when to record, how to wait for a trigger, and so on. This Groovy script could later move into a Grails application, offering a web app / UI to trigger tests from.

Piezo

This OS X app is designed to record audio streams from specific applications. In my case, I set it to record from the Chrome browser. This means the onboard mic is bypassed and the recorded audio comes straight from the browser.

Originally I attempted to run these tests with the onboard speakers and mic, but the results varied wildly – a person sneezing in the hallway or a plane flying overhead would change the score. Without a soundproof room, I couldn't really run tests like that.

The Google team's blog used Web Audio to stream directly into the browser, and the virtual mic of the Windows sound recorder to bypass the onboard mic. I used the same methodology here, but on the Mac, with Piezo acting as the virtual mic.

SoX

Piezo outputs MP3s, so I had to convert them to a format the PESQ utility can read. PESQ expects WAV files sampled at 8000 or 16000 Hz. Via Groovy, I execute a simple SoX command to convert the MP3 to the appropriate WAV. If you want to remove the dead air at the beginning or end of the audio, you can also use a SoX command like:

def trimSilence = "sox webrtc_temp.wav webrtc_01.wav silence 1 0.1 0.1% reverse silence 1 0.1 0.1% reverse".execute()

PESQ

PESQ was used to make the comparison between the source WAV and the recorded WAV. PESQ is a C application that requires some compiling. Once compiled and placed in the test harness, it's just a matter of Groovy executing it like:

def pesq = "Software/source/pesq +8000 [source wav] [recorded wav]".execute()

For information on compiling PESQ, check out:

https://sdet.us/rtmfp-audio-quality-automation/  (Under the PESQ section)

FreeSWITCH

Instead of Web Audio, I opted to make the WebRTC (or RTMFP) calls to a FreeSWITCH, which was set to play back a specific audio file. The FreeSWITCH engineer set it up to play an audio file when a specific number was dialed. His dialplan also waited 2 seconds, to make sure the audio stream wouldn't get cut off. Once he had the FreeSWITCH configured, he gave me the information on how to dial it.
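For readers curious what that server-side piece might look like, here is a rough sketch of a FreeSWITCH dialplan entry along those lines. The extension number, file path and names are my own assumptions for illustration – not the actual configuration described above:

```xml
<!-- Hypothetical dialplan entry: answer, pause 2 seconds, then play the source file. -->
<extension name="pesq-playback">
  <condition field="destination_number" expression="^9664$">
    <action application="answer"/>
    <action application="sleep" data="2000"/>
    <action application="playback" data="/sounds/pesq_source.wav"/>
    <action application="hangup"/>
  </condition>
</extension>
```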

The Test

I built the test harness in Groovy, around helper classes.

This is what the WebRTC test looks like:

def ch = new CallHelper()
def aa = new AudioAnalysis()

ch.piezoRecordCall("[the URL for testing goes here]")

CallHelper is where the Sikuli API is invoked. It drives opening applications and clicking buttons that WebDriver otherwise can't access. These are all clicks based on screen elements.

AudioAnalysis is where the methods related to PESQ score logic are located.
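Sketched out, the two helper classes take roughly this shape. This is a skeleton for orientation only – method bodies are elided, and only piezoRecordCall() and pesq() are covered in this post:

```groovy
// Rough skeleton of the harness's helper classes (a sketch, not the real code).
class CallHelper {
    // Drives Sikuli: launches Piezo/Chrome, clicks screen elements, records the call.
    def piezoRecordCall(url) { /* Sikuli Screen interactions */ }
}

class AudioAnalysis {
    // Shells out to the compiled PESQ binary and parses the MOS score from its output.
    def pesq(sourceFile, degradedFile) { /* execute PESQ, regex out the score */ }
}

def ch = new CallHelper()
def aa = new AudioAnalysis()
```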

Call Helper

While I could use WebDriver to open/close/operate the browser, I'm already required to use Sikuli to select elements and applications that WebDriver can't access… so WebDriver is left out of this test harness.

The core of the WebRTC test is the piezoRecordCall() method, which looks like this:

def piezoRecordCall(url){
    def rmPreviousMp3 = "rm source/webrtc_01.mp3".execute()
    Screen piezo = new Screen()
    piezo.with {
        type("${url}" + Key.ENTER)
        // Now we click Call on WebRTC
        wait("images/webrtc_connected.png", 10) // wait for the visual icon showing the user has logged in successfully
        click("images/chrome_allow.png")        // click the security allowance to access the mic
        sleep(6000)                             // let 6 seconds of audio play through the browser
    }
    def mp3Convert = "sox -t mp3 source/webrtc_01.mp3 -t wav -r 8000 -c 1 source/webrtc_temp.wav".execute()
    mp3Convert.waitFor()
    def trimSilence = "sox source/webrtc_temp.wav source/webrtc_01.wav silence 1 0.1 0.1% reverse silence 1 0.1 0.1% reverse".execute()
    trimSilence.waitFor()
    println "Recording has finished; the file has been converted to a wav sampled at 8000 Hz with 1 channel."
}


Basically the method removes previous test data, then clicks the Piezo app icon on the desktop. Sikuli is smart enough to find it anywhere on the desktop, as long as it isn't obstructed by a window.

The script continues clicking and finding the elements necessary to get Piezo to record through Chrome (bypassing the onboard mic and recording via a virtual mic for that application/browser).

Piezo launches Chrome when the audio record button is clicked. Then I send Chrome to the URL passed into the method. We record dead air until the Chrome WebRTC audio is streamed in.

Now we are at a page with an interface to call a server. We make the call by entering a phone number and filling out a form. This connects to the FreeSWITCH, which plays its audio file back over the WebRTC protocol.

The script clicks to hang up, stop the recording, and close Piezo and Chrome.

The audio recording is converted from MP3 (Piezo only records AAC or MP3) via a SoX command:

def mp3Convert = "sox -t mp3 source/webrtc_01.mp3 -t wav -r 8000 -c 1 source/webrtc_01.wav".execute()
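One gotcha worth noting: in Groovy, String.execute() returns immediately, so it pays to waitFor() the process and check its exit value before touching the output file. A minimal, runnable sketch – echo stands in for sox here so it runs anywhere:

```groovy
// Kick off an external command and wait for it to finish before reading output.
// 'echo' stands in for the sox call so this sketch is runnable anywhere.
def proc = "echo converted".execute()
proc.waitFor()                          // block until the process exits
println "exit code: ${proc.exitValue()}" // non-zero would mean the conversion failed
println proc.text.trim()                 // prints: converted
```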

Audio Analysis

The test passes two values into the AudioAnalysis method: a source file name and a recorded file name. So the test call looks like:

aa.pesq("source/original_file.wav", "source/webrtc_01.wav")
The method looks like this:

def pesq(sourceFile, degradedFile){
    def runPesq = "source/pesq +8000 ${sourceFile} ${degradedFile}".execute()
    def scorePattern = /[0-9]\.[0-9][0-9][0-9]/  // matches an x.xxx score like 3.724
    def result = (runPesq.text =~ scorePattern).collect { it }
    println "PESQ_MOS Score: ${result[0]}"
}

The method runs PESQ, then uses a Groovy regex to find the PESQ score in the output of the execution. It prints the score to the console… and that's it.
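The score extraction is easy to try in isolation. The sample output line below is fabricated for illustration – real PESQ output varies by build – but it shows the regex pulling out the first x.xxx-style number:

```groovy
// A made-up line in the style of PESQ's output; the regex grabs x.xxx numbers.
def output = "P.862 Prediction (Raw MOS, MOS-LQO):  = 3.724  4.001"
def scorePattern = /[0-9]\.[0-9][0-9][0-9]/
def result = (output =~ scorePattern).collect { it }
println "PESQ_MOS Score: ${result[0]}"   // prints: PESQ_MOS Score: 3.724
```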

Going Further

We can go further with this test harness. I added code to simulate network conditions: packet loss and latency. This way, if you have a profile (e.g. traffic from India to our website has 200ms latency with 0.2% packet loss), you can apply it automatically in the test.

Basically it’s making use of this latency project I worked on last year:


The article linked above goes into the details needed to build your own latency generator. I used a Linux VM, netem and Squid: Squid opened up a port, and netem generated the latency and packet loss.

Back in our test, we make a call to turn on a latency profile on the VM, then run a method that points Chrome at the proxy – thereby degrading the audio with packet loss and latency.
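How that call is made depends on your setup; one hypothetical approach is to ssh to the VM and apply a netem rule. Everything here – the host name, interface and the helper itself – is illustrative, not part of the original harness:

```groovy
// Hypothetical: build the ssh command that applies a netem profile on the proxy VM.
// Host name, interface and parameters are illustrative assumptions.
def netemCommand = { String host, int delayMs, double lossPct ->
    "ssh ${host} sudo tc qdisc add dev eth0 root netem delay ${delayMs}ms loss ${lossPct}%"
}
println netemCommand("latency-vm", 200, 0.2)
```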

Alternatively, you could set up remote servers throughout the world, to stream audio to your WebRTC interface.  Without servers around the world, the latency generator is a good solution.
