Ruby Image Scraper

A guy I know was looking for a script that could pull down all the images from his site.

I already had a Ruby script that scraped URLs and loaded each link in a browser to check for patterns, errors, etc.

Starting from that script, I added some Nokogiri code I found in this Stack Overflow question:

http://stackoverflow.com/questions/7926675/save-all-image-files-from-a-website

The result is this little script, which uses Anemone to crawl the site's URLs and Nokogiri to pull the images from each page:

require 'anemone'
require 'nokogiri'
require 'open-uri'
 
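base_url = "http://www.add-site-here.com" # placeholder: put the site to crawl here
puts 'Crawling site'
Anemone.crawl(base_url) do |a|
  a.on_every_page do |p|
    next unless p.html? # skip anything that is not an HTML page
    u = p.url.to_s
    puts u
    # Grab every <img src> on this page only, so earlier pages are not re-scanned
    Nokogiri::HTML(URI.open(u)).xpath("//img/@src").each do |src|
      uri = URI.join(u, src.value).to_s # make absolute uri
      # URI.open is required on Ruby 3.0+, where plain open no longer fetches URLs
      File.open(File.basename(uri), 'wb') { |f| f.write(URI.open(uri).read) }
    end
  end
end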
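
One thing to watch for: File.basename(uri) works on the whole URL string, so a query string (logo.png?v=2) ends up in the file name. Here is a minimal sketch of a helper that resolves the src against the page URL and builds the file name from the URL path only. The name image_target and the 'index' fallback are my own assumptions for illustration; they are not part of the script in the repo.

```ruby
require 'uri'

# Hypothetical helper (not in the original script): resolve an <img> src
# against the page URL and derive a local file name from the URL path,
# so any query string is left out of the name.
def image_target(page_url, src)
  uri  = URI.join(page_url, src)      # make absolute uri
  name = File.basename(uri.path)      # last path segment, without the query
  name = 'index' if name.empty? || name == '/' # assumed fallback for bare paths
  [uri.to_s, name]
end

p image_target('http://www.example.com/blog/post.html', '../img/logo.png?v=2')
# prints ["http://www.example.com/img/logo.png?v=2", "logo.png"]
```

Inside the crawl loop you would download the first value and write it to the second. Two images with the same name in different directories will still overwrite each other, so add the directory to the name if that matters for your site.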

The script is available in this GitHub repo:
https://github.com/wbwarnerb/ruby_scrapers
