
Ruby Image Scraper

A guy I know was looking for a script that could pull down images off his site.

I already had a Ruby script that would scrape URLs and load each link in a browser to check for patterns, errors, etc.

Using that same script, I added some Nokogiri scripting I found here:

My result was this little script… using Anemone for the URL scraping and Nokogiri for the image scraping on each page:

require 'anemone'
require 'nokogiri'
require 'open-uri'

base_url = "http://www.add site"

puts 'Crawling site'
Anemone.crawl(base_url) do |a|
  a.on_every_page do |page|
    next unless page.html?
    url = page.url.to_s
    puts url
    # Grab the src attribute of every <img> on the current page
    Nokogiri::HTML(open(url)).xpath("//img/@src").each do |src|
      uri = URI.join(url, src).to_s # make the image URI absolute
      # Save the image under its own filename in the current directory
      File.open(File.basename(uri), 'wb') { |f| f.write(open(uri).read) }
    end
  end
end
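One detail worth calling out is the URI.join call: image src attributes are often relative, so they have to be resolved against the page URL before they can be downloaded. URI.join comes from Ruby's standard library and follows normal URL-resolution rules; the URLs below are made up for illustration:

```ruby
require 'uri'

page_url = "http://example.com/gallery/index.html"

# A relative src resolves against the page's directory
puts URI.join(page_url, "images/cat.png").to_s
# => "http://example.com/gallery/images/cat.png"

# A root-relative src resolves against the host
puts URI.join(page_url, "/logo.png").to_s
# => "http://example.com/logo.png"
```
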

The script is available in this GitHub repo:

What do you think?


Written by Admin

I work for a telecom company writing and testing software. My passion for writing code is expressed through this blog. It's my hope that it encourages any and all who are self-taught.



