Monday, October 08, 2007

Check for broken links in Ruby, Bash script and Java

I found myself writing three different versions of a function to check for broken links: one in Ruby, one in Bash and one in Java. I just want to document them here in case anyone is interested. Note that these functions use the HTTP HEAD method (as opposed to GET/POST), so they won't download a big file just to see if a link is live.

Ruby
require 'net/http'
require 'uri'

def isLive?(url)
  uri = URI.parse(url)
  response = nil
  Net::HTTP.start(uri.host, uri.port) { |http|
    response = http.head(uri.path.size > 0 ? uri.path : "/")
  }
  return response.code == "200"
end

puts isLive?("http://google.com")
puts isLive?("http://asdfasdf.com")
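
A slightly more defensive variant might look like the sketch below. It is not the function above, just one way it could be hardened: the name link_live?, the 5-second default timeout, the HTTPS handling and the choice to treat every error as a broken link are all my assumptions. It uses uri.request_uri, so a query string is sent along with the path, and an empty path becomes "/".

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper (not from the original post): returns true only when
# a HEAD request to the URL answers with status 200.
def link_live?(url, timeout: 5)
  uri = URI.parse(url)
  # Only http:// and https:// URLs with a host can be checked this way.
  return false unless uri.is_a?(URI::HTTP) && uri.host

  response = Net::HTTP.start(uri.host, uri.port,
                             use_ssl: uri.scheme == 'https',
                             open_timeout: timeout, # assumed default: 5 seconds
                             read_timeout: timeout) do |http|
    # request_uri is the path plus any query string, or "/" for an empty path.
    http.head(uri.request_uri)
  end
  response.code == '200'
rescue StandardError
  # Assumption: an invalid URL, DNS failure, refused connection or timeout
  # all count as a broken link.
  false
end
```

Calling link_live?("http://google.com", timeout: 2) would give up after roughly two seconds per phase instead of hanging on a slow host. Like the original, it does not follow redirects, so a 301 or 302 is reported as not live.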

Bash
#!/bin/bash

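function isLive {
    # --spider checks that the URL exists without downloading it; -q silences output.
    # Quote "$1" so URLs containing & or ? are passed through intact.
    wget -q --spider "$1"
}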

isLive "http://google.com/somefakelink"

if [ $? -eq 0 ]; then
    echo "Good link"
else
    echo "Broken link"
fi

Java
Edited (December 18, 2008): I've written a better Java version that handles link forwarding and doesn't use a 3rd-party API here

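/* Needs httpunit-1.6.jar: http://httpunit.sourceforge.net */
import com.meterware.httpunit.HeadMethodWebRequest;
import com.meterware.httpunit.WebConversation;
import com.meterware.httpunit.WebRequest;
import com.meterware.httpunit.WebResponse;

// Place this method inside a class.
private boolean isLive(String link) {
    try {
        // HEAD request: fetch the headers only, not the body
        WebRequest request = new HeadMethodWebRequest(link);
        WebConversation wc = new WebConversation();
        WebResponse response = wc.getResource(request);
        return response.getResponseCode() == 200;
    } catch (Exception e) {
        // Any failure (bad URL, network error, HTTP error) counts as a broken link
        return false;
    }
}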

6 comments:

Paul said...

Thanks. There is an error in your ruby code on the following line:

response = http.head(uri.path.size > 0 ? url.path : "/")

The second "url.path" should be "uri.path" with an "i".

Unknown said...

Thanks for pointing that out. I've fixed it.

Unknown said...

Thanks for posting, Hung. Quite useful.

Kashif said...

Hey Hung,
Can you also please post the WebRequest, HeadMethodWebRequest, WebConversation and WebResponse classes which you have used in your Java code?

Thanks.

Unknown said...

You probably want to set timeouts also:

http.open_timeout = 5
http.read_timeout = 5

And if the url had a query string this code loses it, but you can easily fix it with something like this:

response = http.head(uri.path.size > 0 ? uri.request_uri : "/")

Unknown said...

@Kashif

Those classes are from httpunit library, just download their jar here http://httpunit.sourceforge.net