While working on screen automation, I realized that I could OCR math captchas, use simple logic to pick a regex pattern, and compute the solution to post into a form. I added a captcha plugin to this website, then wrote some Groovy code that uses the Sikuli API to turn on-screen (image-based) data into strings via OCR.

The OCR output is stored in a variable and parsed with a regex to find the equation. The digits in the math test are extracted and simple arithmetic is done on them. The answer is then "typed" into the captcha field by the automation, and the form is submitted.

Setting the region:

def ocr = (new Region(29,334,431,486)).text()

That defines a region whose top-left corner is at x = 29, y = 334 and which extends 431 pixels wide and 486 pixels tall; text() then runs OCR on whatever is displayed inside it. This is where I expect the browser to show the captcha. The region could cover most or all of the screen, but I kept it small for testing. It worked and returned this OCR text for that region:

Output from OCR:

the text on screen is: Website Captcha +5=6 Comment } ~ nth /Libr ‘=L._:_I£..\.2:..’.’.’_:,_‘_____. A__\_ \_ |-\ l\\*—‘’. …’,,._j..__ ..-..- ~ .—-~–v~ —
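As for the region arguments: Sikuli's Region(x, y, w, h) takes a top-left corner plus a width and a height, the same convention as java.awt.Rectangle. The sketch below uses Rectangle as a stand-in (an assumption made only so it runs without Sikuli or a display) to show which pixels Region(29,334,431,486) covers.

```groovy
import java.awt.Rectangle

// Stand-in for Sikuli's Region(x, y, w, h): same top-left + width/height convention
int x = 29
int y = 334
int w = 431
int h = 486
def captchaArea = new Rectangle(x, y, w, h)

int right = x + w    // 460: first pixel column just outside the region
int bottom = y + h   // 820: first pixel row just outside the region
println "region spans x=${x}..${right - 1}, y=${y}..${bottom - 1}"

// contains() includes the top-left edge and excludes the right and bottom edges
assert captchaArea.contains(x, y)
assert captchaArea.contains(right - 1, bottom - 1)
assert !captchaArea.contains(right, y)    // one pixel too far right
assert !captchaArea.contains(x, bottom)   // one pixel too far down
```

So the last pixel column inside the region is 459 and the last row is 819; anything the browser draws outside that box never reaches the OCR.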

All I want is the equation fragment, +5=6, so to isolate it I wrote this regex:

def myregex = /\+([0-9])=([0-9])/

Then I applied a matcher like so:

def matcher = (ocr =~ myregex)

Indexing that matcher gives back a list like this: [+5=6, 5, 6]. I need the values 5 and 6.

Those are matcher[0][1] and matcher[0][2].

The reason is that the result is indexed twice: matcher[0] selects the first match, which is itself a list holding the full match followed by each capture group.

From that list I want the second value, [1], and the third value, [2]. I assigned a variable to each and converted them to Integers by wrapping them in a string with "${}" and calling .toInteger(). (The capture groups are already Strings, so calling .toInteger() on them directly works as well.)

def firstNum = matcher[0][1]
def secondNum = matcher[0][2]
def firstNumAsInt = "${firstNum}".toInteger()
def secondNumAsInt = "${secondNum}".toInteger()
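To see the shape of the match without a screen, this sketch runs the same regex on a short string trimmed from the OCR output above. It is only an illustration: the sample text is an assumption, and the direct .toInteger() calls are an alternative to the "${}" wrapping rather than what the script itself does.

```groovy
// Sample text trimmed from the OCR output shown earlier
def ocr = 'Website Captcha +5=6 Comment'
def myregex = /\+([0-9])=([0-9])/
def matcher = (ocr =~ myregex)

// matcher[0] is the first match: the full match followed by each capture group
def firstMatch = matcher[0]
assert firstMatch == ['+5=6', '5', '6']

// Capture groups come back as Strings, so toInteger() can be called on them directly
assert matcher[0][1] instanceof String
def firstNumAsInt = matcher[0][1].toInteger()
def secondNumAsInt = matcher[0][2].toInteger()
assert firstNumAsInt == 5 && secondNumAsInt == 6
```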

Finally, because the captcha has the form [ ] + 5 = 6, the missing number is simply the second value minus the first:

def result = secondNumAsInt - firstNumAsInt
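The parsing and arithmetic steps can be wrapped in one function that is easy to test without Sikuli. This is a minimal sketch; solveMissingAddend is a hypothetical name, and unlike the script it calls find() first, because indexing matcher[0] when the OCR misses the equation throws an exception instead of returning nothing.

```groovy
/**
 * Hypothetical helper: finds "+a=b" in raw OCR text and returns the
 * missing left operand (b - a), or null when no equation is found.
 */
Integer solveMissingAddend(String ocr) {
    def matcher = (ocr =~ /\+([0-9])=([0-9])/)
    if (!matcher.find()) {
        return null   // no equation recognised, so nothing should be typed
    }
    int firstNum = matcher.group(1).toInteger()
    int secondNum = matcher.group(2).toInteger()
    return secondNum - firstNum
}

assert solveMissingAddend('Website Captcha +5=6 Comment') == 1
assert solveMissingAddend('no captcha here') == null
```

Returning null gives the automation a clean signal to stop (or rescan) instead of typing a wrong answer.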

From there, the result can be typed into the field however you like; I just used some basic tabbing and text entry. To make this more robust, I would scan the region first and decide which pattern applies: number + number = [ ], number + [ ] = number, or [ ] + number = number. I could also add logic to handle subtraction, multiplication, division, numbers written as words, and so on.
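The pattern-selection idea above could look something like the following sketch. solveCaptcha is a hypothetical helper, not part of the original script: it tolerates whitespace and multi-digit numbers, treats whichever of a, b, or c is missing as the blank box (OCR usually drops the empty input entirely), and supports +, - and *. Division and word-based captchas are left out.

```groovy
/**
 * Hypothetical solver for "a OP b = c" captchas where exactly one of
 * a, b, c is blank. Supports +, - and *; returns null if unsolvable.
 */
Integer solveCaptcha(String ocr) {
    def m = (ocr =~ /(\d+)?\s*([-+*])\s*(\d+)?\s*=\s*(\d+)?/)
    if (!m.find()) return null

    Integer a = m.group(1)?.toInteger()
    String op = m.group(2)
    Integer b = m.group(3)?.toInteger()
    Integer c = m.group(4)?.toInteger()
    // Exactly one unknown: the blank input box
    if ([a, b, c].count { it == null } != 1) return null

    switch (op) {
        case '+':
            if (c == null) return a + b
            return a == null ? c - b : c - a
        case '-':
            if (c == null) return a - b
            return a == null ? c + b : a - c
        case '*':
            if (c == null) return a * b
            int known = (a == null) ? b : a
            // Only accept whole-number answers
            return (known != 0 && c % known == 0) ? c.intdiv(known) : null
    }
    return null
}

assert solveCaptcha('Website Captcha +5=6 Comment') == 1   // [ ] + 5 = 6
assert solveCaptcha('8 + 3 =') == 11                        // 8 + 3 = [ ]
assert solveCaptcha('4 + = 10') == 6                        // 4 + [ ] = 10
assert solveCaptcha('no math here') == null
```

Requiring exactly one missing value is a simple guard against OCR noise: if all three numbers or fewer than two are read, the function declines rather than guessing.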

The Code

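import org.sikuli.script.*
// Depending on your SikuliX version, Settings may live in org.sikuli.basics instead:
// import org.sikuli.basics.Settings

Screen s = new Screen()

s.with {
    App.focus("Firefox")
    type(Key.DOWN, KeyModifier.CMD)   // Cmd+Down: scroll to the bottom of the page

    Settings.OcrTextSearch = true

    // OCR the area where the captcha should appear: Region(x, y, width, height)
    def ocr = (new Region(29, 334, 431, 486)).text()
    println "the text on screen is: ${ocr}"

    // Captures the two single digits in a "+5=6" style equation
    def myregex = /\+([0-9])=([0-9])/ //|| /Captcha:* + ([0-9]) = ([0-9])/
    def matcher = (ocr =~ myregex)

    // matcher[0] is the first match: [full match, group 1, group 2]
    def firstNum = matcher[0][1]
    def secondNum = matcher[0][2]
    def firstNumAsInt = "${firstNum}".toInteger()
    def secondNumAsInt = "${secondNum}".toInteger()

    // e.g. "[ ] + 5 = 6": the missing number is 6 - 5
    def result = secondNumAsInt - firstNumAsInt
    println "result is ${result}"

    // Type the answer, tab to the next field and enter a comment, then tab on and press Enter to submit
    type("${result}")
    type(Key.TAB)
    type("My spam")
    type(Key.TAB)
    type(Key.ENTER)
}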

Video Demonstration

(Embedded YouTube video, ID it7ym5nujRw)
