Tuesday, December 12, 2006

Using Java bytecode in clustering technique

When I first heard of clustering Java objects, I immediately thought about object serialization. For someone who had just gotten out of school, it was an educated guess :) I soon learned that it's not always the answer once speed and scalability are taken into account. Serializing a plain ol' Java object is rather slow when you have lots of transactions and lots of objects. So how else would you maintain object integrity across a cluster?

For Terracotta DSO, we chose to go one level down, to the JVM bytecode (as opposed to the application level). There's a whole new world of Java once you decide to look into it. As it turns out, every time you read or write an instance field, the opcodes involved are "getfield" and "putfield". DSO looks for these opcodes and records the mutations of an object in one JVM, then replays that "tape" in another JVM on the same clustered object. The net effect is that we can replicate object data across multiple JVMs at the field level.
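To make the record-and-replay idea concrete, here is a minimal sketch in plain Java. It is not Terracotta's code or API: the names FieldChangeTape, Change, putField and replayOn are made up for illustration, and it uses reflection where DSO instruments the bytecode. It only shows the shape of the idea: log each field write, then apply the same writes to a copy of the object.

```java
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; none of these names come from Terracotta.
class FieldChangeTape {
    // One recorded mutation: which field was written and the value it received.
    static final class Change {
        final String fieldName;
        final Object newValue;

        Change(String fieldName, Object newValue) {
            this.fieldName = fieldName;
            this.newValue = newValue;
        }
    }

    // Stand-in for a clustered object.
    static final class Person {
        String name;
        int age;
    }

    private final List<Change> tape = new ArrayList<>();

    // Write the field and record the change, roughly what could happen
    // around an instrumented "putfield".
    void putField(Object target, String fieldName, Object value) {
        setField(target, fieldName, value);
        tape.add(new Change(fieldName, value));
    }

    // Replay every recorded change, in order, on another copy of the object.
    void replayOn(Object replica) {
        for (Change c : tape) {
            setField(replica, c.fieldName, c.newValue);
        }
    }

    int size() {
        return tape.size();
    }

    private static void setField(Object target, String fieldName, Object value) {
        try {
            Field f = target.getClass().getDeclaredField(fieldName);
            f.setAccessible(true);
            f.set(target, value);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot set field " + fieldName, e);
        }
    }

    public static void main(String[] args) {
        Person original = new Person();   // lives in "JVM A"
        Person replica = new Person();    // lives in "JVM B"
        FieldChangeTape tape = new FieldChangeTape();

        tape.putField(original, "name", "Alice");
        tape.putField(original, "age", 30);

        // Only the two field changes travel, not the whole object.
        tape.replayOn(replica);
        System.out.println(replica.name + " " + replica.age);
    }
}
```

In a real cluster the tape would of course be sent over the network between JVMs; here both copies sit in the same process to keep the sketch short.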

Let's take a look at a simple class, Person.java.
To generate the bytecode listing, run these commands:

$> javac Person.java
$> javap -c Person > Person.bc

Person.java

public class Person {
    private String name;

    public Person(String aName) {
        name = aName;
    }

    public String getName() {
        return name;
    }

    public void setName(String aName) {
        name = aName;
    }
}


Person.bc

Compiled from "Person.java"
public class Person extends java.lang.Object{
public Person(java.lang.String);
  Code:
   0: aload_0
   1: invokespecial #1; //Method java/lang/Object."<init>":()V
   4: aload_0
   5: aload_1
   6: putfield #2; //Field name:Ljava/lang/String;
   9: return

public java.lang.String getName();
  Code:
   0: aload_0
   1: getfield #2; //Field name:Ljava/lang/String;
   4: areturn

public void setName(java.lang.String);
  Code:
   0: aload_0
   1: aload_1
   2: putfield #2; //Field name:Ljava/lang/String;
   5: return

}


Voila, the guts of our class are exposed. This is what DSO works on. As you can see, DSO's claim about sending only the deltas (the changes) of an object across the network makes sense. The magic is in the bytecode instrumentation, done with the ASM framework.

Similarly, locking with "synchronized" blocks shows up as the telltale "monitorenter" and "monitorexit" bytecodes. Don't take my word for it; try it out yourself. I've learned a great deal since I started studying Java bytecode.
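If you want something small to try this on, the class below is a sketch of my own (SyncCounter is a made-up name, not part of DSO). Compile it and run javap -c on it the same way as for Person; the increment() method should show a "monitorenter" where the block starts and "monitorexit" where it ends, including an extra "monitorexit" on the exception path.

```java
// Hypothetical example class for inspecting monitorenter/monitorexit with javap -c.
class SyncCounter {
    private final Object lock = new Object();
    private int count;

    void increment() {
        // A synchronized block: the compiler emits monitorenter before the body
        // and monitorexit after it.
        synchronized (lock) {
            count++;
        }
    }

    int get() {
        synchronized (lock) {
            return count;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        SyncCounter counter = new SyncCounter();
        Runnable work = () -> {
            for (int i = 0; i < 10000; i++) {
                counter.increment();
            }
        };

        // Two threads hammer the same counter; the lock keeps the total exact.
        Thread t1 = new Thread(work);
        Thread t2 = new Thread(work);
        t1.start();
        t2.start();
        t1.join();
        t2.join();

        System.out.println(counter.get()); // prints 20000
    }
}
```

One thing to watch for: a method declared "synchronized" (rather than a block inside a method) does not contain these opcodes. The JVM handles that case through a flag on the method, so use a block if you want to see them in the listing.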

One good source I found is "Java bytecode: Understanding bytecode makes you a better programmer".

Ari Zilka's talk at Google is another great source.

Have fun!