Automated Audio Detection

There aren’t many libraries for validating audio playback. I saw this question on Stack Overflow, and my solution was a script that uses sox for the audio detection.

Test Browser Based Phone Audio Presence

With web phone calls, there are two major protocols: WebRTC and RTMFP. The latter is Flash-based. WebRTC is usually the preferred choice, but if the browser doesn’t support it, or if bandwidth is an issue, RTMFP is picked. That becomes a problem for testing, because it’s pretty challenging to make a Flash-based test work entirely with Selenium WebDriver.

Below is a sample script written in Groovy. It tests a web form that has a drop-down for picking between codecs. I keep the codec names in a simple list and iterate over its contents. The list has values like “g711” and “speex”, which are the options in the drop-down, so the test runs through each available audio codec.

The goal is to make sure the audio played.

import org.openqa.selenium.By
import org.openqa.selenium.WebElement
import org.openqa.selenium.firefox.FirefoxDriver
import org.openqa.selenium.support.ui.Select
import org.sikuli.script.Screen

// Script-level variable (no "def"), so the methods below can see it.
browser = new FirefoxDriver()

def testCall(url, protocol) {
    println "MAKE SURE Soundflower 2 IS YOUR AUDIO OUT..."
    browser.get("${url}")
    placeCall(protocol)
}

def placeCall(protocol) {
    def codecTypes = ["g711", "speex"]
    codecTypes.each { codec ->
        if (protocol == "flash") {
            // Selenium code to fill out the web form.
            Select selectProtocol = new Select(browser.findElement(By.id("protocol")))
            selectProtocol.selectByValue("flash")
            Select selectCodec = new Select(browser.findElement(By.id("codec")))
            selectCodec.selectByValue("${codec}")
            WebElement callButton = browser.findElement(By.id("launchPhone"))
            callButton.click()

            // Sikuli clicks through the Flash security prompt.
            Screen s = new Screen()
            s.with {
                find("sikuli_imgs/allow_flash.png")
                click("sikuli_imgs/allow_flash.png")
            }

            // Sox audio detection: record from the Soundflower device until sound, then silence.
            def audioCheck = "sox -t coreaudio Soundflower /Users/me/myproject/record.wav silence 1 0.1 1% 1 .1 1%".execute()
            println "Waiting 5 seconds"
            sleep(5000)
            // Allow up to 5 more seconds, then kill sox if it is still recording.
            audioCheck.waitForOrKill(5000)
            // exitValue() throws if the process has not fully exited yet, so wait for it.
            audioCheck.waitFor()
            println "EXIT VALUE FOR SOX IS: ${audioCheck.exitValue()} using $codec"
            if (audioCheck.exitValue() == 0) {
                println "Heard audio with codec: $codec"
                // we can write out to a csv file, or update a db with a pass at this point
            } else {
                println "No audio heard with codec: $codec"
                // we could update a csv or db with a test failure
            }
        }
    }
}

The test can be kicked off like so:
testCall("http://myform….", "flash")
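The comments in the script mention writing each result out to a CSV file. Here is a minimal sketch of what that could look like. The helper name recordResult, the file name, and the column layout are all my own assumptions for illustration, not something the script above already has.

```groovy
// Hypothetical helper: appends one result row to a CSV file.
// The column layout (timestamp, codec, exit value, result) is an assumption.
String recordResult(File csv, String codec, int exitValue) {
    // Write the header only when the file is first created.
    if (!csv.exists()) {
        csv.text = "timestamp,codec,exit_value,result\n"
    }
    String result = (exitValue == 0) ? "PASS" : "FAIL"
    String row = [java.time.Instant.now().toString(), codec, exitValue, result].join(",")
    csv.append(row + "\n")
    return result
}

// Example usage with made-up values: one pass, one fail.
def resultsFile = new File(System.getProperty("java.io.tmpdir"), "audio_results_" + System.nanoTime() + ".csv")
recordResult(resultsFile, "g711", 0)
recordResult(resultsFile, "speex", 2)
println resultsFile.text
```

Inside placeCall, the two comment lines in the if/else could call this helper with the codec and audioCheck.exitValue().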

Mix of Selenium and Sikuli

Selenium is much faster than Sikuli, and it doesn’t need a bunch of images kept around for all the click actions. But Selenium can’t do everything. I see people get stuck trying to make Selenium do something it wasn’t meant to do. If you need to OCR elements on the screen, validate images in video, or work around Flash (where no Flex API exists), another library or third-party tool is needed. In this case, I use the Sikuli library. It’s a simple Java JAR: add it to your project as an external library and you have all you need to make the Sikuli calls in code.

I don’t think it’s bad to mix the two like this… I just limit Sikuli to the elements that are not easily automated in some other way.

Sox command

The sox command uses the -t parameter to specify the audio driver type. On Linux systems, that would be alsa. On OS X it’s coreaudio. The “Soundflower” argument is the name of my virtual audio channel, which sox uses as the input device.
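If the same test has to run on both OS X and Linux, the driver choice can be pulled out of the command string. This is only a sketch: the helper names soxDriverFor and soxRecordCommand are made up for illustration, and only the two drivers mentioned above are handled.

```groovy
// Hypothetical helper: picks the sox driver type from the OS name.
// Only "coreaudio" (OS X) and "alsa" (Linux) are covered; anything else is rejected.
String soxDriverFor(String osName) {
    String os = osName.toLowerCase()
    if (os.contains("mac")) return "coreaudio"
    if (os.contains("linux")) return "alsa"
    throw new IllegalArgumentException("No audio driver mapping for: " + osName)
}

// Builds the recording part of the command as an argument list.
List<String> soxRecordCommand(String osName, String device, String outFile) {
    return ["sox", "-t", soxDriverFor(osName), device, outFile]
}

// "Mac OS X" is the value of System.getProperty("os.name") on a Mac.
println soxRecordCommand("Mac OS X", "Soundflower", "record.wav").join(" ")
```

Groovy can call execute() on a list as well as on a string, so the list form also avoids quoting problems if the device name contains spaces.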

The rest of the command configures the silence effect, which decides when the recording starts and stops. After “silence”, the first 1 says we’re looking for one period of audio at the start, the 0.1 means the sound has to last 1/10th of a second to count, and the 1% is the threshold the level has to rise above. The next three numbers do the same for the end of the recording: one period of silence, lasting 1/10th of a second, below the 1% threshold. These values trigger a quick response: sox starts recording once audio is heard and stops almost immediately (once there is the briefest of pauses). When the recording is finished, sox exits with code 0.
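Six bare numbers are easy to mix up, so here is a sketch that gives each one a name. The helper silenceArgs and its option names are my own labels, not sox terminology; the defaults reproduce the values used in the script.

```groovy
// Hypothetical helper: builds the arguments for the sox "silence" effect.
// Option names are illustrative labels; the defaults match the script's values.
List<String> silenceArgs(Map opts = [:]) {
    Map o = [startPeriods: 1, startDuration: "0.1", startThreshold: "1%",
             stopPeriods : 1, stopDuration : "0.1", stopThreshold : "1%"] + opts
    List parts = ["silence",
                  o.startPeriods, o.startDuration, o.startThreshold,
                  o.stopPeriods, o.stopDuration, o.stopThreshold]
    return parts.collect { it.toString() }
}

// Default values, then a variant that tolerates a half-second pause before stopping.
println silenceArgs().join(" ")
println silenceArgs(stopDuration: "0.5").join(" ")
```

Raising stopDuration would keep the recording going through short gaps in speech, at the cost of a slower test.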

Exit code 0 is used to determine pass or fail (whether audio was heard or not). If no audio is heard by the time the wait runs out (sleep(5000) followed by waitForOrKill(5000)), the sox process is killed, which gives exit code 2: a fail.
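The pass/fail decision can also be made without depending on what exit code a killed process reports. The sketch below is an alternative to the sleep plus waitForOrKill approach, not what the script above does; runWithTimeout and verdict are hypothetical names, and it relies on Java 8's Process.waitFor with a timeout.

```groovy
import java.util.concurrent.TimeUnit

// Hypothetical helper: runs a command, waits up to timeoutMs, and kills it if still running.
// Returns a map with "finished" (boolean) and "exitValue" (null when the process was killed).
Map runWithTimeout(List<String> command, long timeoutMs) {
    Process proc = command.execute()
    proc.consumeProcessOutput()   // discard output so the process cannot block on a full pipe
    boolean finished = proc.waitFor(timeoutMs, TimeUnit.MILLISECONDS)
    if (!finished) {
        proc.destroy()
        proc.waitFor()            // wait until the process is really gone
    }
    return [finished: finished, exitValue: finished ? proc.exitValue() : null]
}

// PASS only when the command ended on its own with exit code 0.
String verdict(Map outcome) {
    return (outcome.finished && outcome.exitValue == 0) ? "PASS" : "FAIL"
}

// With sox installed, the call would look something like:
//   verdict(runWithTimeout(["sox", "-t", "coreaudio", "Soundflower", "record.wav",
//                           "silence", "1", "0.1", "1%", "1", "0.1", "1%"], 5000))
println verdict([finished: true, exitValue: 0])
println verdict([finished: false, exitValue: null])
```

Treating a timeout as a fail on its own means the test doesn’t care which exit code the platform hands back for a killed process.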

Virtual Audio Channel

I use virtual audio channels to isolate the audio. If the regular default channel is used, it will pick up the mic. On OS X there’s an open source tool called Soundflower that installs two virtual audio channels. Then, in the system sound settings, you change the sound output to go through the Soundflower channel.

That sends all outgoing audio through the virtual channel rather than the external speakers. Then we pick it up with sox pointing specifically at that same virtual channel.

Audio Quality

This simple script only covers detecting audio. Audio quality is more involved, and it’s covered in other articles on this site.


About Admin 329 Articles
I work for a Telecom company writing and testing software. My passion for writing code is expressed through this blog. It's my hope that it gives hope to any and all who are self-taught.