Thursday, December 18, 2008

Link checker with Java - better version

I posted an earlier blog entry about checking for broken links with Java. However, that version uses the third-party library HttpUnit and doesn't handle redirects. If a URL redirects to another URL, the HTTP response code is 302 and the old version considers that link broken.

The new version follows the redirect and checks the target URL. If the target URL is dead, that is detected.

So here is the new code, without any third-party API:


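import java.net.HttpURLConnection;
import java.net.URL;

/* LinkChecker is just a placeholder class name */
public class LinkChecker {

    /* give up after this many redirects so a redirect loop can't recurse forever */
    private static final int MAX_REDIRECTS = 5;

    private static boolean isLive(String link) {
        return isLive(link, MAX_REDIRECTS);
    }

    private static boolean isLive(String link, int redirectsLeft) {
        HttpURLConnection urlConnection = null;
        try {
            URL url = new URL(link);
            urlConnection = (HttpURLConnection) url.openConnection();
            urlConnection.setRequestMethod("HEAD");
            urlConnection.setConnectTimeout(5000); /* timeout after 5s if can't connect */
            urlConnection.setReadTimeout(5000); /* timeout after 5s if the page is too slow */
            urlConnection.connect();
            String redirectLink = urlConnection.getHeaderField("Location");
            if (redirectLink != null && !link.equals(redirectLink)) {
                if (redirectsLeft <= 0) {
                    return false; /* too many hops, treat the link as broken */
                }
                /* Location may be a relative path, so resolve it against the current url */
                String target = new URL(url, redirectLink).toString();
                return isLive(target, redirectsLeft - 1);
            } else {
                return urlConnection.getResponseCode() == HttpURLConnection.HTTP_OK;
            }
        } catch (Exception e) {
            return false; /* malformed url, unknown host, timeout, ... */
        } finally {
            if (urlConnection != null) {
                urlConnection.disconnect();
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(isLive("http://google.com"));
        System.out.println(isLive("http://somefakelink.com"));
    }
}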
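
The check above counts only a 200 response as live, and 302 is just one of several redirect codes. As a rough sketch of a more tolerant rule, the helper below treats any 2xx code as live and the common redirect codes as "follow the Location header". The names LinkStatus, Kind and classify are made up for this illustration and are not part of the code above, and whether a 2xx other than 200 should count as live is an assumption that depends on the links you are checking.

```java
class LinkStatus {

    enum Kind { LIVE, REDIRECT, BROKEN }

    // Hypothetical helper: maps an HTTP status code to a coarse link state.
    static Kind classify(int code) {
        if (code >= 200 && code < 300) {
            return Kind.LIVE; // assumption: any 2xx counts as live
        }
        switch (code) {
            case 301: // Moved Permanently
            case 302: // Found
            case 303: // See Other
            case 307: // Temporary Redirect
            case 308: // Permanent Redirect
                return Kind.REDIRECT; // caller should follow the Location header
            default:
                return Kind.BROKEN; // everything else, including 4xx and 5xx
        }
    }

    public static void main(String[] args) {
        int[] samples = { 200, 204, 302, 404, 500 };
        for (int code : samples) {
            System.out.println(code + " -> " + classify(code));
        }
    }
}
```

A checker could call classify(urlConnection.getResponseCode()) and only read the Location header when the result is REDIRECT, instead of looking at the header on every response.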

Simple demo of clustering Ehcache

Clustering a cache between 2 or more JVMs is quite simple if you use Terracotta underneath. It handles the networking and the shared memory transparently for you. There's no extra API, just a configuration file.

Terracotta also has a Maven plugin, so you can grab the demo project below and run it without downloading anything else.

The demo is a single Java source file. It is meant to be run in 2 VMs.

I used a cyclic barrier to synchronize the 2 VMs. Each "await()" call blocks until both VMs have reached it. One node creates the cache. After the next barrier, one node populates the cache with 2 elements and the other node queries it to demonstrate that it can get to the shared data. The node that creates the cache is not necessarily the one that populates it.
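
As a side illustration of the barrier behavior the demo relies on, here is a minimal single-JVM sketch that uses two threads in place of two VMs. BarrierSketch and arrivalIndexes are made-up names for this illustration. It shows that await() returns an arrival index, that index 0 goes to the last party to arrive, and that exactly one party therefore passes an "== 0" check.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class BarrierSketch {

    // Sends two threads through one barrier and returns each thread's arrival index.
    static int[] arrivalIndexes() throws Exception {
        final CyclicBarrier barrier = new CyclicBarrier(2);
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            // await() blocks until both parties have called it, then returns
            // 1 to the first party that arrived and 0 to the last one.
            Callable<Integer> party = () -> barrier.await();
            Future<Integer> a = pool.submit(party);
            Future<Integer> b = pool.submit(party);
            return new int[] { a.get(), b.get() };
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        int[] idx = arrivalIndexes();
        System.out.println("thread A arrival index: " + idx[0]);
        System.out.println("thread B arrival index: " + idx[1]);
    }
}
```

Which thread gets 0 depends on timing, so the two indexes can swap from run to run. In the clustered demo the barrier is shared across JVMs by Terracotta rather than across threads.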

package demo;

import java.util.concurrent.CyclicBarrier;

import net.sf.ehcache.Cache;
import net.sf.ehcache.CacheManager;
import net.sf.ehcache.Element;
import net.sf.ehcache.store.MemoryStoreEvictionPolicy;

public class EhcacheDemo {

    // Both fields are declared as Terracotta roots, so both VMs see the same instances.
    private CacheManager manager = CacheManager.create();
    private CyclicBarrier barrier = new CyclicBarrier(2);

    public void run() throws Exception {
        Cache testCache;

        System.out.println("Waiting for the other node...");
        // await() returns the arrival index; 0 goes to the last node to arrive,
        // so exactly one node creates the cache.
        if (barrier.await() == 0) {
            System.out.println("Creating the testCache...");
            // 100 elements in memory, LFU eviction, 60s time-to-live, 30s time-to-idle
            testCache = new Cache("test", 100, MemoryStoreEvictionPolicy.LFU, true,
                    null, false, 60, 30, false, 0, null);
            manager.addCache(testCache);
        }

        // Wait until the cache has been added before either node looks it up.
        barrier.await();
        testCache = manager.getCache("test");

        if (barrier.await() == 0) {
            System.out.println("First node: adding 2 elements...");
            testCache.put(new Element("k1", "v1"));
            testCache.put(new Element("k2", "v2"));
        } else {
            System.out.println("Second node: querying 3 elements...");
            System.out.println(testCache.get("k1"));
            System.out.println(testCache.get("k2"));
            System.out.println(testCache.get("k3"));
        }

        System.out.println("Done!");
    }

    public static void main(String[] args) throws Exception {
        new EhcacheDemo().run();
    }

}


Note that the second node also queries for a nonexistent element "k3" as a sanity check.

So how does Terracotta know what I want to share? I tell Terracotta to share the "manager" and "barrier" fields from the code above by marking them as roots in the Terracotta configuration file tc-config.xml:

<application>
  <dso>
    <instrumented-classes/>
    <roots>
      <root>
        <field-name>demo.EhcacheDemo.manager</field-name>
      </root>
      <root>
        <field-name>demo.EhcacheDemo.barrier</field-name>
      </root>
    </roots>
  </dso>
</application>



After you download the project, just run these Maven commands:

mvn install
mvn tc:run


The output would be something like this:

[INFO] [node1] Waiting for the other node...
[INFO] [node0] Waiting for the other node...
[INFO] [node0] Creating the testCache...
[INFO] [node1] First node: adding 2 elements...
[INFO] [node0] Second node: querying 3 elements...
[INFO] [node1] Done!
[INFO] [node0] [ key = k1, value=v1, version=1, hitCount=2, CreationTime = 1229592169906, LastAccessTime = 1229592170081 ]
[INFO] [node0] [ key = k2, value=v2, version=1, hitCount=2, CreationTime = 1229592170070, LastAccessTime = 1229592170145 ]
[INFO] [node0] null
[INFO] [node0] Done!