
RTMFP Audio Quality Automation

The goal was to test WebRTC and Flash’s RTMFP technology and get an indicator of audio quality on each run. The test suite had to incorporate latency: the latency is set up before each test runs and torn down when the test is over.

Core Concepts

I followed the core concepts laid out by Google’s WebRTC testing team over at:

The test cycle flows like this:

  • A browser opens some UI
  • A click-to-call button or other UI is used to make a phone call (peer to peer or via Flash Phoner)
  • The call is answered
  • Media is played in the call
  • The call media is recorded
  • The recording is analyzed using PESQ

I extended their approach with a latency engine I had already set up for web testing at work: a VM with netem installed. I remotely start different Netem latency and bandwidth-throttling profiles and use the VM as the browser’s proxy, so all browser calls are affected by the latency model.

My Flow

Unlike the Google testers, I don’t have a solid background in C, so I opted for Groovy. The Groovy test harness flows like this:

  • Groovy sets up the latency
  • Groovy makes a Sikuli call to launch a browser (you could use Selenium WebDriver here)
  • Groovy automates the Flash RTMFP component with Sikuli
  • Audio is prepared beforehand for the call (this is the source audio we compare against)
  • The call itself is recorded by sox, via Groovy (i.e., Groovy executing a sox command)
  • Finally, PESQ is invoked to compare the recorded call with the source audio
  • A PESQ score is generated and used as the criterion for quality
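The flow above can be sketched as a small orchestration method. This is a hypothetical skeleton, not my actual harness: each stage is passed in as a closure (so the real Sikuli, sox, PESQ and Jenkins calls can be plugged in), and the point it makes is ordering, with the latency teardown in a finally block so Netem is stopped even when a step fails. The method name runCallTest is made up for illustration.

```groovy
/**
 * Hypothetical harness skeleton: every stage is a closure, so the real Sikuli,
 * sox, PESQ and Jenkins calls can be plugged in.
 * Returns whatever the scoring step returns, e.g. the PESQ MOS.
 */
def runCallTest(Closure startLatency, Closure stopLatency,
                List<Closure> callSteps, Closure score) {
    startLatency()                 // e.g. trigger the Netem Jenkins job
    try {
        callSteps.each { it() }    // launch browser, allow Flash mic, place call, record
        return score()             // compare the recording with the source audio
    } finally {
        stopLatency()              // always tear Netem down, even if a step throws
    }
}
```

The try/finally is the design choice that matters here: a failed Sikuli match or a crashed sox process should never leave the proxy VM throttling traffic for the next run.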

Why Sikuli: since Flash is required for RTMFP, I chose not to use Selenium WebDriver for browser automation. In many cases Flex elements are stripped from RTMFP forms (to reduce overhead), so there is no API to hook into to place a call through the interface. A developer could change that, but I instead chose to simply wrap Sikuli in Groovy.

Sikuli selects UI elements by image (pixel) recognition, but its programmable aspects are somewhat lacking. Since it’s a Java app, it works well with Groovy: the tests can carry their own logic in Groovy and call Sikuli when needed. If you want the browser driven by Selenium, Sikuli can integrate with that too and take over when the browser API isn’t accessible (as with Flash elements).

For more info on Sikuli, feel free to check out my other site:

Sikuli Setup With Groovy/Grails/Java

To use Sikuli from another language (like Groovy or JRuby), you need to install its Java jar for that purpose. Download the Sikuli installer (itself a jar file) and run it. It asks which of 6 or 7 installation options you want, which is a tad confusing.

If you pick options #3 and #4, it creates a lib folder and a Java jar. Here’s where it gets tricky: DO NOT MOVE THE JAR FILE OR LIB FOLDER. I thought I could move them into my repo so everything would be committed to GitHub together… it won’t work.

I don’t know whether the installer records absolute paths at installation time or what, but you can’t move them. If you want this to be part of the project you’re working on, put the Sikuli installer in your project root and run it there. It will create the appropriate files and folders.

If you already installed it elsewhere, just make sure your IDE references it there. DON’T MOVE THE FILES.

Basically, download the jar installer into your project and install it there.

Project Flow

With Groovy set up with Sikuli, let’s begin. I used the Sikuli IDE to capture the images for each click; you may want a separate Sikuli install for this purpose. I found that screen captures made with the Mac’s own tools didn’t work right, but element captures made within Sikuli work fine.

When you use the Sikuli IDE to save images, they are saved in the Sikuli project folder, not in your test harness. Example: if your main test harness is /home/bwarner/Sites/groovy_webrtc and you have a Sikuli test project at /home/bwarner/Documents/test.sikuli, your images are actually stored inside test.sikuli. It’s a directory, but you can’t get into it via the UI; you have to use the command line (i.e. cd test.sikuli) to see all the images you captured.

So, in the Sikuli test project, click the “double click” action in the IDE. It will ask you to select a rectangular area around the icon you want to double-click; here, that’s the Firefox icon. Repeat this to map out the flow needed to reach your destination URL.

If you find what I just asked you to do offensive, no problem: use something like Geb to make a Selenium WebDriver call to your destination URL. But you will still need Sikuli to navigate the Flash element; there’s not much of a workaround for that.

In the Sikuli IDE, as you snap these images, you can click each one and rename it to something understandable in the window that opens. When you’re done mapping the Sikuli flow, cp your images out of test.sikuli and into your project from the command line. I recommend making a subfolder in your project called images and storing all the Sikuli image data there.
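If you’d rather script that copy than run cp by hand, here is a minimal Groovy sketch. It assumes the .sikuli bundle is a plain directory (as described above) and copies only .png files; the method name copySikuliImages is hypothetical.

```groovy
import java.nio.file.Files
import java.nio.file.Path
import java.nio.file.StandardCopyOption

/**
 * Hypothetical helper: copies every .png captured by the Sikuli IDE out of a
 * .sikuli bundle (a directory) into the project's images folder.
 * Returns the copied file names, sorted.
 */
List<String> copySikuliImages(Path sikuliDir, Path imagesDir) {
    Files.createDirectories(imagesDir)
    List<String> copied = []
    sikuliDir.toFile().eachFileMatch(~/.*\.png/) { File png ->
        Files.copy(png.toPath(), imagesDir.resolve(png.name), StandardCopyOption.REPLACE_EXISTING)
        copied << png.name
    }
    copied.sort()
}
```

For the example paths above, you would pass Paths.get('/home/bwarner/Documents/test.sikuli') and the images folder of /home/bwarner/Sites/groovy_webrtc. Re-running it after new captures simply overwrites the older copies.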


In your Groovy WebRTC test project, create a new class; we’ll call it ScreenHelper. It holds the methods used to find things on the screen. Let’s make the first couple of Groovy methods that will drive Sikuli:

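import org.sikuli.script.Key
import org.sikuli.script.Screen
// Image names are placeholders; use the names you gave your own captures
def launchFirefox() {
    Screen s = new Screen()
    s.doubleClick("images/firefox.png")      // double-click the Firefox icon
}
def firefoxURL(u) {
    Screen url = new Screen()
    url.click("images/url_field.png")        // or send CMD + L instead
    url.type(u.toString() + Key.ENTER)       // type the URL and press Enter
}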

Those are my examples; you will have named your images differently, of course.

Let’s create a Groovy file in the project root and call it RTMFP_call.groovy:

def sh = new ScreenHelper()
sh.launchFirefox()
sh.firefoxURL("http://[your site with the flash phone]")

If you ran this, Sikuli would double-click your browser icon, click on the URL field (or you could send CMD + L instead), and enter the URL of your Flash phone HTML page.

Flash Phone Script

Since Flash is difficult to automate with WebDriver, we’re using Sikuli to click based on image recognition of the page. You should already have captured the flow of your Flash phone, including the appropriate dropdowns you may have (codec, server, etc.).

In the ScreenHelper class, let’s make a new method called flashPhone(). Inside the method body, we can add this code:

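def flashPhone() {
    Screen fp = new Screen()
    fp.wait("images/flash.png", 10)                  // wait up to 10 s for the Flash mic popup
    fp.click("images/allow.png")                     // "Allow" button; image name is a placeholder
    def flashPop = fp.exists("images/flash.png", 0)  // null once the popup is gone
    // Sometimes the click isn't caught in Flash. We double check if Flash
    // is still expecting a click; if so, we click again
    if (flashPop != null) {
        println "flowed into if clause"
        fp.click("images/allow.png")
    }
}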

That’s my particular script; yours will vary based on your needs. But one thing we have in common is that we are both using a Flash phone, and the part I find most useful is the double check for the Flash popup. To use a Flash phone, you have to click through an Adobe popup, and it’s a real pain: even a real click with your mouse isn’t always registered. Usually a second click works. This is just something we have to live with, so my workaround was some Groovy conditional logic on the Flash mic request:

I wait for the Flash microphone alert. If it comes up, I find the Allow image and click it. However, if flashPop (the Flash security popup) is still visible (!= null), I tell it to retry by finding and clicking the Allow button again.
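The double check above retries exactly once. If one retry isn’t always enough, the same idea can be generalized into a small loop. This is a hypothetical generalization, not part of my script: the method name clickUntilGone and the default of 3 attempts are assumptions, and the two closures stand in for the Sikuli exists() and click() calls.

```groovy
/**
 * Hypothetical generalization of the double check: keep clicking while the
 * popup is still detected, up to maxAttempts clicks. In ScreenHelper the
 * closures would wrap Sikuli, e.g. { fp.exists("images/flash.png", 0) != null }
 * and { fp.click("images/allow.png") } (image names assumed).
 * Returns true once the popup is gone.
 */
boolean clickUntilGone(Closure popupVisible, Closure click, int maxAttempts = 3) {
    for (int attempt = 0; attempt < maxAttempts; attempt++) {
        if (!popupVisible()) {
            return true          // popup already dismissed
        }
        click()                  // Flash sometimes ignores a click, so try again
    }
    !popupVisible()              // final check after the last click
}
```

Returning a boolean rather than throwing lets the test decide whether a popup that never went away should fail the run or just be logged.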


Audio Recording

I created a separate Groovy class for the audio recording. I called it CallHelper.

In that class I defined a method to use sox to record a call from the command line:

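def recordCall(fileName) {
    // -d records from the default audio input; Groovy returns the last expression,
    // so the caller gets the running sox Process back
    "sox -V6 -d ${fileName}".execute()
}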

Back in the Groovy test, I add:

def ch = new CallHelper()

and below that, I call the two new methods we made, flashPhone() and recordCall().
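One thing recordCall() leaves open is when the recording stops: sox -d keeps recording until the process is killed. A hypothetical variant is to let sox stop itself with its trim effect and to write 8 kHz, mono, 16-bit audio, matching the +8000 rate passed to PESQ later. The method name and the 30-second length in the usage line are assumptions. Note that -d still means the system’s default input device, so this has the same room-noise problem described in the update below unless that default is a virtual device.

```groovy
/**
 * Hypothetical variant of recordCall(): builds a sox command that records from
 * the default input device (-d) as 8 kHz, mono, 16-bit audio and stops by
 * itself after `seconds` via the trim effect.
 */
List<String> soxRecordCommand(String fileName, int seconds) {
    ['sox', '-d',                          // input: default audio device
     '-r', '8000', '-c', '1', '-b', '16',  // format options for the output file that follows
     fileName,
     'trim', '0', seconds.toString()]      // keep audio from 0 s for `seconds` seconds
}
```

Usage: soxRecordCommand('call.wav', 30).execute().waitFor() blocks until the 30-second recording is written, so the PESQ step can run right after it.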

**UPDATE on Recording 3/21/2014**

Above, I used SoX to record from the command line. The problem is that it used the onboard speakers and mic. This seemed like a good idea at the time, but it quickly proved to be a mistake. Subtle sounds (the air conditioner turning on, the guy in the office next to me sneezing, someone talking in the hallway), all of them seemingly inaudible, were picked up and altered the test score. My scores ranged from 0.9 to 2.8, so the test wasn’t very useful at all. I would have needed a production sound room to run a test like this.
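A spread like 0.9 to 2.8 is easy to miss when you only look at one run at a time. A minimal sketch for summarizing repeated runs of the same test makes that kind of noise visible; the method name is hypothetical, and it uses the population standard deviation.

```groovy
/**
 * Summarizes PESQ scores from repeated runs of the same test: min, max,
 * mean and population standard deviation. Expects at least one score.
 * A large spread points at something other than the call (e.g. room noise)
 * affecting the recording.
 */
Map summarizeScores(List<Double> scores) {
    double mean = scores.sum() / scores.size()
    double variance = scores.collect { (it - mean) * (it - mean) }.sum() / scores.size()
    [min: scores.min(), max: scores.max(), mean: mean, stdDev: Math.sqrt(variance)]
}
```

Running the same call several times and checking stdDev before trusting any single score is a cheap way to tell whether the recording setup itself is stable.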


The Google team used a virtual mic solution: in Windows you can configure the audio recorder to use a virtual mic instead of the onboard one.

Since I’m developing on OS X, my choice was an OS X app that does the same thing: Piezo. I like it because it lets me choose which application the virtual mic records from, so I can tell it to record audio from Firefox or Chrome. That way it won’t pick up any noise in the room during the test.


PESQ is downloaded from

After you download it, you need to compile it. Move the download into your project, then in its source folder run:

gcc -o PESQ *.c

On Linux you may need to append -lm (gcc -o PESQ *.c -lm) to link the math library. Instructions are at:

There’s your executable: PESQ.

We can call this from Groovy by making a method in a class (I use a new class called AudioAnalysis):

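def pesq(sourceFile, degradedFile) {
    // +8000 = 8 kHz sample rate; first file is the reference, second the recorded call
    def runPesq = "Software/source/PESQ +8000 /[Your path to your Source Audio File]/${sourceFile} /[Your Path to your Degraded or Recorded Audio from the Phone call]/${degradedFile}".execute()
    // A score looks like 2.980: one digit, a dot, three digits
    def pesq = /[0-9]\.[0-9][0-9][0-9]/
    def result = (runPesq.text =~ pesq).collect { it }
    println "PESQ_MOS Score: ${result[0]}"
    result[0]   // return the score so the caller can store or email it
}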

This little method executes the PESQ binary at Software/source/PESQ (update the path if yours differs) with an 8 kHz sample rate (+8000) and passes it two files in your project: first the source file, then the audio recorded on the call.

It then defines a regex pattern for the PESQ score (a digit, a dot, and three digits), collects every match from PESQ’s output with .collect {it}, and prints the first one.

So if PESQ reports a score of 2.980, that is captured; you could then save it to a MySQL database, print it to the screen, email it, whatever you want.
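The regex in pesq() takes the first x.xxx number anywhere in PESQ’s output. A slightly more defensive sketch prefers the number that follows the PESQ_MOS label and adds a pass/fail gate. The label wording is an assumption based on the PESQ_MOS name used above (check what your PESQ build actually prints), the method names are hypothetical, and the 3.0 threshold is arbitrary.

```groovy
import java.util.regex.Matcher

/**
 * Extracts the MOS from PESQ's text output. Prefers the number after the
 * "PESQ_MOS" label (assumed wording; check your build's output) and otherwise
 * falls back to the first x.xxx value, like pesq() does. Returns null if none.
 */
BigDecimal parsePesqScore(String output) {
    Matcher labelled = output =~ /PESQ_MOS\D*([0-9]\.[0-9]{3})/
    if (labelled.find()) {
        return new BigDecimal(labelled.group(1))
    }
    Matcher fallback = output =~ /[0-9]\.[0-9]{3}/
    fallback.find() ? new BigDecimal(fallback.group()) : null
}

/** Hypothetical pass/fail gate; 3.0 is an arbitrary threshold, pick your own. */
boolean callQualityOk(BigDecimal score, BigDecimal threshold = 3.0) {
    score != null && score >= threshold
}
```

Returning a BigDecimal instead of the raw match makes the score easy to compare, store in MySQL, or put in an email.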


Since testing a voice call via RTMFP or WebRTC on a LAN is unrealistic, we need a latency generator. I did a write-up on building one over at:

Follow that to set up a VM with Squid, iptables, and Netem. I used Jenkins jobs so the Netem configs can be triggered remotely.
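Netem is configured through tc, so whatever each Jenkins job runs on the VM comes down to a tc qdisc command. Here is a hypothetical Groovy helper that builds those commands from a profile. It models a range like 10ms-80ms as a mean delay with jitter (45ms 35ms); that mapping is my assumption about how to express a range, and the interface name is a placeholder.

```groovy
/**
 * Hypothetical helpers that build the tc/netem commands for a profile.
 * A range such as 10ms-80ms is expressed as mean delay +/- jitter (45ms 35ms);
 * that mapping, and the interface name passed in, are assumptions.
 */
String netemStart(String iface, int minMs, int maxMs,
                  BigDecimal lossPct = null, Integer rateKbit = null) {
    int mean = (minMs + maxMs).intdiv(2)
    int jitter = maxMs - mean
    def cmd = new StringBuilder("tc qdisc replace dev ${iface} root netem delay ${mean}ms ${jitter}ms")
    if (lossPct != null) {
        cmd << " loss ${lossPct}%"          // e.g. 0.3% packet loss
    }
    if (rateKbit != null) {
        cmd << " rate ${rateKbit}kbit"      // e.g. 768 kbit/s bandwidth cap
    }
    cmd.toString()
}

// Deleting the root qdisc returns the interface to its default behaviour
String netemStop(String iface) {
    "tc qdisc del dev ${iface} root"
}
```

Using replace rather than add means starting a new profile works whether or not a previous one is still active.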

Now set your browser to use that VM as its proxy in the browser settings (or, if you used Selenium WebDriver to create the browser instance, define the proxy at startup).
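If you launch Firefox through Sikuli rather than WebDriver, one way to point it at the proxy is a user.js file in the test profile, which Firefox reads at startup. A minimal sketch, assuming the VM’s address is passed in (the one used in any example is hypothetical) and Squid listens on its default port, 3128; the method name is also hypothetical.

```groovy
/**
 * Builds user.js lines pointing a Firefox profile at the Squid proxy VM.
 * The host is whatever your VM's address is; 3128 is Squid's default port.
 */
String firefoxProxyPrefs(String host, int port = 3128) {
    [
        'user_pref("network.proxy.type", 1);',            // 1 = manual proxy configuration
        "user_pref(\"network.proxy.http\", \"${host}\");",
        "user_pref(\"network.proxy.http_port\", ${port});",
        "user_pref(\"network.proxy.ssl\", \"${host}\");",
        "user_pref(\"network.proxy.ssl_port\", ${port});"
    ].join('\n')
}
```

Write the result to user.js in the profile directory used for testing. With WebDriver, the same preference names can be set on the Firefox profile you hand to the driver instead.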

Back in your Groovy test, create a Latency class and define methods like:

class Latency {
    // TENTOEIGHTY and SEVENTYTOONESIXTY are from my setup; the last two names are placeholders
    enum LatencyAmount { TENTOEIGHTY, SEVENTYTOONESIXTY, SEVENTYTOONESIXTY_LOSS, BANDWIDTH_768K }

    // Each profile triggers its own Jenkins job, which applies the Netem config on the proxy VM
    def createLatency(LatencyAmount latencyAmount) {
        switch (latencyAmount) {
            case LatencyAmount.TENTOEIGHTY:
                println "setting latency 10ms-80ms"
                new URL("http://[Your URL to your Jenkins job to do this Netem bandwidth]/build?delay=0sec").getText()
                break
            case LatencyAmount.SEVENTYTOONESIXTY:
                println "setting latency 70ms-160ms"
                new URL("http://[Your URL to your Jenkins job to do this Netem bandwidth]/build?delay=0sec").getText()
                break
            case LatencyAmount.SEVENTYTOONESIXTY_LOSS:
                println "setting latency 70ms-160ms 0.3% packet loss"
                new URL("http://[Your URL to your Jenkins job to do this Netem bandwidth]/build?delay=0sec").getText()
                break
            case LatencyAmount.BANDWIDTH_768K:
                println "setting bandwidth to 768k"
                new URL("http://[Your URL to your Jenkins job to do this Netem bandwidth]/build?delay=0sec").getText()
                break
        }
    }

    def stopLatency() {
        println "stopping latency"
        new URL("http://[Your URL to your Jenkins job to do this Netem bandwidth]/build?delay=0sec").getText()
    }
}

Back in your test, create an instance like so:

def l = new Latency()

Below the test methods, let’s clean up the latency and stop Netem from running with:

l.stopLatency()


Ok Great, but What’s the Best Way to Send Audio in the Call?

**UPDATE 3/21/14**

The Google team used the Web Audio API to stream audio into the WebRTC call they were testing. In my case, I’m testing against a soft PBX, FreeSWITCH, so it made sense to reuse it: I call a FreeSWITCH instance that has a dial plan for a test number, and when that number is dialed, a specific audio file is played back.
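FreeSWITCH dial plans are written in XML. A hypothetical sketch that generates such an extension with Groovy’s MarkupBuilder: it answers the test number, plays the source audio, and hangs up. The extension name, number and file path are placeholders, not my actual dial plan.

```groovy
import groovy.xml.MarkupBuilder

/**
 * Hypothetical FreeSWITCH dialplan extension: answer the test number,
 * play the source audio file, then hang up.
 */
String testNumberExtension(String number, String audioFile) {
    def writer = new StringWriter()
    def xml = new MarkupBuilder(writer)
    xml.doubleQuotes = true                  // match the usual style of FreeSWITCH XML files
    xml.extension(name: 'pesq_test') {
        condition(field: 'destination_number', expression: "^${number}\$") {
            action(application: 'answer')
            action(application: 'playback', data: audioFile)
            action(application: 'hangup')
        }
    }
    writer.toString()
}
```

Playing the same file PESQ later uses as its reference keeps the source and degraded audio directly comparable.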




Written by Admin

I work for a Telecom company writing and testing software. My passion for writing code is expressed through this blog. It's my hope that it gives hope to any and all who are self-taught.

