Thursday, December 18, 2008

Link checker with Java - better version

I posted an earlier entry about checking for broken links with Java. However, that version relies on the third-party library HttpUnit and doesn't handle redirects. If a URL redirects to another URL, the HTTP response code is 302, and the old version would treat that link as broken.

The new version follows the redirect and checks the target URL. If the target is dead, the link is reported as broken.

So here is the new code, which needs no third-party library:


import java.net.HttpURLConnection;
import java.net.URL;

public class LinkChecker {

    private static boolean isLive(String link) {
        HttpURLConnection urlConnection = null;
        try {
            URL url = new URL(link);
            urlConnection = (HttpURLConnection) url.openConnection();
            urlConnection.setRequestMethod("HEAD"); // headers only, no body
            urlConnection.setConnectTimeout(5000); // give up after 5s if we can't connect
            urlConnection.setReadTimeout(5000); // give up after 5s if the server is too slow
            urlConnection.connect();
            // A Location header means the server points elsewhere: check that target instead.
            String redirectLink = urlConnection.getHeaderField("Location");
            if (redirectLink != null && !link.equals(redirectLink)) {
                return isLive(redirectLink);
            } else {
                return urlConnection.getResponseCode() == HttpURLConnection.HTTP_OK;
            }
        } catch (Exception e) {
            // Malformed URL, unknown host, timeout, etc.: treat the link as dead.
            return false;
        } finally {
            if (urlConnection != null) {
                urlConnection.disconnect();
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(isLive("http://google.com"));
        System.out.println(isLive("http://somefakelink.com"));
    }
}
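
One caveat with the recursive approach: two URLs that redirect to each other would keep the recursion going, and a relative Location header (such as "/new-page") fails to parse as a URL, so the link is reported as dead. Here is a minimal sketch of one way to guard against both. The class name, the MAX_REDIRECTS limit of 5, and the resolveRedirect helper are my own illustrative choices, not part of the code above.

```java
import java.net.HttpURLConnection;
import java.net.MalformedURLException;
import java.net.URL;

class RedirectAwareLinkChecker {

    // Hypothetical hop limit; pick whatever suits your site.
    private static final int MAX_REDIRECTS = 5;

    /** Resolves a Location header value against the URL that returned it. */
    static String resolveRedirect(String base, String location) {
        try {
            return new URL(new URL(base), location).toString();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    static boolean isLive(String link) {
        return isLive(link, MAX_REDIRECTS);
    }

    private static boolean isLive(String link, int redirectsLeft) {
        HttpURLConnection conn = null;
        try {
            conn = (HttpURLConnection) new URL(link).openConnection();
            conn.setRequestMethod("HEAD");
            conn.setInstanceFollowRedirects(false); // handle every redirect ourselves
            conn.setConnectTimeout(5000);
            conn.setReadTimeout(5000);
            int code = conn.getResponseCode();
            String location = conn.getHeaderField("Location");
            if (code >= 300 && code < 400 && location != null) {
                if (redirectsLeft == 0) {
                    return false; // too many hops: assume a redirect loop
                }
                return isLive(resolveRedirect(link, location), redirectsLeft - 1);
            }
            return code == HttpURLConnection.HTTP_OK;
        } catch (Exception e) {
            return false; // malformed URL, unknown host, timeout, etc.
        } finally {
            if (conn != null) {
                conn.disconnect();
            }
        }
    }
}
```

Calling RedirectAwareLinkChecker.isLive("http://google.com") is used the same way as the original isLive; the only difference is that a chain longer than the hop limit is reported as dead.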

11 comments:

Ahmed Ashour said...

Did you try HtmlUnit? It is very helpful for JavaScript/Ajax processing, and it is superior to HttpUnit.

G. Kiragiannis said...

Thanks for publishing this code. My blog has had broken links in the past, and I want a script to check it regularly. Do you think this code would be useful for checking a very dynamic site like a CMS?

Anonymous said...

Hello, url.equals(redirectLink) will always be false because url is a URL and redirectLink is a String...

Hung Huynh said...

@Kiragiannis
It certainly can be used to do that.

@pgras
Nice catch! Fixed. Thanks for letting me know.

Noor said...

Hi Hung,

Your code compiles fine, but at run time I am getting a "connection timed out" problem. Could you please let me know the reason behind that?

Hung Huynh said...

Noor,

You can set a 5s timeout for the urlConnection before the connect() call:

urlConnection.setConnectTimeout(5000); // the value is in milliseconds

Any connection that takes more than 5 seconds to establish will be considered a dead link.

Superspizzard said...

Hi, may I ask how we could adapt this code to accept <a> tags as an argument and output the error code (e.g. 404) if the URL doesn't load?
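
A rough sketch of one way to do this, assuming the input is a string of HTML: pull the href value out of each <a> tag with a regular expression, then report the HTTP status code instead of a boolean. The class and method names (AnchorLinkChecker, extractLinks, statusOf) are made up for illustration, and the regex is deliberately simple; a real HTML parser would be more robust.

```java
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AnchorLinkChecker {

    // Simplistic pattern: matches href="..." or href='...' inside an <a ...> tag.
    private static final Pattern HREF = Pattern.compile(
            "<a\\b[^>]*?\\bhref\\s*=\\s*([\"'])(.*?)\\1", Pattern.CASE_INSENSITIVE);

    /** Returns the href values of all <a> tags found in the given HTML. */
    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(2));
        }
        return links;
    }

    /** Returns the HTTP status code for the link, or -1 if no response was obtained. */
    static int statusOf(String link) {
        HttpURLConnection conn = null;
        try {
            conn = (HttpURLConnection) new URL(link).openConnection();
            conn.setRequestMethod("HEAD");
            conn.setConnectTimeout(5000);
            conn.setReadTimeout(5000);
            return conn.getResponseCode();
        } catch (Exception e) {
            return -1; // malformed or relative URL, unknown host, timeout, etc.
        } finally {
            if (conn != null) {
                conn.disconnect();
            }
        }
    }
}
```

To use it, loop over extractLinks(html) and print each link together with statusOf(link). Relative hrefs such as "/about" come back as -1 here, because they would first need to be resolved against the page's own URL.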

Virag said...

Hey, your code works fine, but it hangs when you pass a broken or invalid link to your function. I guess the timeout does not seem to work. Can we insert some wait statements after urlConnection.connect()? If so, how do we do it?

Hhuynh said...

@Virag you could also set the read timeout value:

urlConnection.setReadTimeout(5000);

Richard said...

Nice. Some websites may appear to be offline (function returns false), but still they do exist when you point your browser to them. I found out they refuse the request method "HEAD". They may return a 500 status. In that case I run the function once more, but with requestMethod "GET" which behaves like a regular web browser.
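
A minimal sketch of the retry described here, assuming the simplest rule: if HEAD does not come back with 200, try once more with GET. The class and method names (FallbackLinkChecker, status) are illustrative only, and redirect handling is left out to keep it short.

```java
import java.net.HttpURLConnection;
import java.net.URL;

class FallbackLinkChecker {

    /** Returns the HTTP status code for the given method, or -1 if no response was obtained. */
    static int status(String link, String method) {
        HttpURLConnection conn = null;
        try {
            conn = (HttpURLConnection) new URL(link).openConnection();
            conn.setRequestMethod(method);
            conn.setConnectTimeout(5000);
            conn.setReadTimeout(5000);
            return conn.getResponseCode();
        } catch (Exception e) {
            return -1; // malformed URL, unknown host, timeout, etc.
        } finally {
            if (conn != null) {
                conn.disconnect();
            }
        }
    }

    /** Tries HEAD first; falls back to GET for servers that reject HEAD. */
    static boolean isLive(String link) {
        if (status(link, "HEAD") == HttpURLConnection.HTTP_OK) {
            return true;
        }
        return status(link, "GET") == HttpURLConnection.HTTP_OK;
    }
}
```

The trade-off is that every genuinely dead link now costs two requests, so a stricter variant might retry only when HEAD returns a status such as 405 or 500.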

Gowravajjala Swapna said...

I need help. Can anyone help me?