JOEZACK.COM Code Musings and Such

1 Mar 2009

Finding and Fixing Broken Images with Ruby

A family member was having a problem with some mixed-up image names on a static HTML site. I could have fixed it manually in a few shakes, but that's no fun. Instead I used hpricot to scrape the page, open-uri to test for broken-ness, Find to search the media folder, and some good old-fashioned regex to correct the paths.

This was my first time messing around with hpricot, and I found it to be powerful and easy to use. Two thumbs up. I foresee some scraping and spidering posts in the near future.

On to the code:

My final script was a bit hairy so I broke out the bit I used to find the broken images.

If you run the script, it'll print the offending paths to the screen:

ruby image_scanner.rb http://site.com/busted.html

Or you can call the get_broken_images method to get an array back:

require 'image_scanner'
scanner = Image_Scanner.new
broken_images = scanner.get_broken_images "http://site.com/busted.html"
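For readers who want a feel for the approach without downloading anything, here is a minimal sketch of the same idea. It is not the script from this post: it swaps hpricot for a naive regex and open-uri for a Net::HTTP HEAD request so that it runs on the standard library alone, and the class name BrokenImageFinder and its checker parameter are made up for illustration.

```ruby
require 'net/http'
require 'uri'

# Hypothetical stand-in for the post's scanner; not the author's code.
class BrokenImageFinder
  # Naive pattern for <img src="...">; a real HTML parser is more robust.
  IMG_SRC = /<img\b[^>]*?\bsrc\s*=\s*["']([^"']+)["']/i

  # checker: any callable that takes a URL string and returns true
  # when the image loads. Defaults to a real HTTP check.
  def initialize(checker = nil)
    @checker = checker || method(:reachable?)
  end

  # Returns the absolute URLs of the images in html that fail the check.
  # Relative src values are resolved against page_url.
  def broken_images(html, page_url)
    html.scan(IMG_SRC).flatten.uniq
        .map { |src| URI.join(page_url, src).to_s }
        .reject { |url| @checker.call(url) }
  end

  private

  # Default check: send a HEAD request and treat any 2xx response as OK.
  # Network errors count as broken.
  def reachable?(url)
    uri = URI.parse(url)
    Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
      http.head(uri.request_uri).is_a?(Net::HTTPSuccess)
    end
  rescue StandardError
    false
  end
end
```

Passing the checker in as a callable keeps the network out of the scanning logic, which makes it easy to try the class against a saved copy of a page. Note that URI.join raises on src values it cannot parse (unescaped spaces, for example), so a sturdier version would rescue that and report the raw path.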

In case you're interested, I've also uploaded the full code that I used to search for and correct the images, although it's implementation-specific, riddled with laziness, and poorly tested. Read the disclaimer!

Just run it and be amazed!

ruby image_scanner.rb http://site.com/busted.html /media_folder /busted.html /fixed.html
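The search-and-correct step can be sketched roughly as follows. This is not the full script linked below, and it assumes the mix-up is only a matter of letter case (photo.jpg in the page versus Photo.JPG on disk); the helper names index_media and fix_image_names are hypothetical.

```ruby
require 'find'

# Walks media_dir and maps each lowercased file name to its real name on disk.
# If two files differ only by case, the last one visited wins.
def index_media(media_dir)
  index = {}
  Find.find(media_dir) do |path|
    next unless File.file?(path)
    name = File.basename(path)
    index[name.downcase] = name
  end
  index
end

# Rewrites the file name in each <img src="..."> so it matches the real
# file name from the index, ignoring case. Unknown names are left alone.
def fix_image_names(html, index)
  html.gsub(/(<img\b[^>]*?\bsrc\s*=\s*["'])([^"']+)(["'])/i) do
    prefix, src, quote = $1, $2, $3
    name = File.basename(src)
    actual = index[name.downcase]
    # Keep the directory part of src and swap in the corrected file name.
    fixed = actual ? src[0, src.length - name.length] + actual : src
    "#{prefix}#{fixed}#{quote}"
  end
end
```

A typical run would build the index once from the media folder, read the busted page with File.read, and write the result of fix_image_names to the fixed page. Other kinds of mix-ups (wrong extension, renamed files) would need a different lookup key than the lowercased name.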

Download only the broken image scanner
Download the full script

Comments (3) Trackbacks (0)
  1. How do I get to the script you created? I don’t see a link on the page to the code.

  2. Oops, Thanks, Fixed!

  3. Thanks for this! Simple, but saves me having to write my own script.

