Thursday, July 26, 2007

Cluster a standalone Spring app to calculate Mandelbrot set

I'd heard about Spring for a while but had made little attempt to check out what it was. Most descriptions of Spring gave me the impression that only big enterprise applications would use it. Man, was I wrong :) As I dug into the Pro Spring book and read the Spring reference, I was able to convert one of my standalone apps to Spring and reap the benefits it has to offer. Not only that, I was able to cluster my bean and share it among JVMs with little effort, thanks to Terracotta.

My application is a simple demo that calculates the Mandelbrot set. It is multithreaded, using a task queue and workers. Here is a rough sketch (I'm UML illiterate):

-------------------------------              -----------------------------
MandelbrotModel                              CalculateNode : Runnable
  rawData:  int[][]                            model: MandelbrotModel
  workLoad: BlockingQueue
-------------------------------  1 ----->*   -----------------------------


Each CalculateNode has a reference to the model, and thus to the workLoad queue. It takes a task (a Segment) out of workLoad, processes it, and puts the findings back in "rawData". Spring comes into the picture by letting me wire the dependency between nodes and model, like so:

    <bean id="model" class="mandelbrot.MandelbrotModel" scope="singleton">
        <property name="length">
            <value>600</value>
        </property>
        <property name="numNodes">
            <value>2</value>
        </property>
        <property name="numTasks">
            <value>20</value>
        </property>
    </bean>

    <bean id="node" class="mandelbrot.CalculateNode" scope="singleton">
        <property name="model" ref="model">
        </property>
    </bean>

The above snippet describes 2 beans, a "model" and a "node". Each property maps one-to-one to a setter in your class: the "length" property calls setLength(), which sets the field "private int length" in the model to 600 pixels, and so on. The Spring "magic" is in the "node" bean: it says, point the node's "model" field at the model bean above. Hence the dependency injection.
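To make the wiring less magical, here is a rough plain-Java sketch of what the container ends up doing for those two bean definitions. The classes below are stripped-down stand-ins I made up for illustration (the real MandelbrotModel and CalculateNode do much more), and the wire() helper is hypothetical; Spring does the equivalent through reflection.

```java
public class ManualWiringSketch {
    // Hypothetical, stripped-down model: only the three properties from the XML.
    static final class MandelbrotModel {
        private int length;
        private int numNodes;
        private int numTasks;

        public void setLength(int length) { this.length = length; }
        public void setNumNodes(int numNodes) { this.numNodes = numNodes; }
        public void setNumTasks(int numTasks) { this.numTasks = numTasks; }

        public int getLength() { return length; }
        public int getNumNodes() { return numNodes; }
        public int getNumTasks() { return numTasks; }
    }

    // Hypothetical, stripped-down node: just the injected reference.
    static final class CalculateNode implements Runnable {
        private MandelbrotModel model;

        public void setModel(MandelbrotModel model) { this.model = model; }
        public MandelbrotModel getModel() { return model; }

        @Override
        public void run() { /* calculation omitted in this sketch */ }
    }

    // Roughly what the container does: create each bean with its
    // no-arg constructor, then call one setter per <property>.
    static CalculateNode wire() {
        MandelbrotModel model = new MandelbrotModel();
        model.setLength(600);
        model.setNumNodes(2);
        model.setNumTasks(20);

        CalculateNode node = new CalculateNode();
        node.setModel(model); // <property name="model" ref="model">
        return node;
    }

    public static void main(String[] args) {
        CalculateNode node = wire();
        System.out.println(node.getModel().getLength()); // prints 600
    }
}
```

The point of moving this into XML is that neither class has to know who creates the other.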

    public class CalculateNode implements Runnable {
        private MandelbrotModel model;

        public void setModel(MandelbrotModel model) {
            this.model = model;
        }
    }

Notice I have a setModel() method in CalculateNode, but I don't have to call it explicitly in my program. Spring will call it and set the model for me. My node can do all its work, assuming the model reference is already there. The main method is really simple.

    public static void main(String[] args) {
        ApplicationContext ctx = new ClassPathXmlApplicationContext(
                "mandelbrot.xml");

        CalculateNode node1 = (CalculateNode) ctx.getBean("node");
        new Thread(node1).start();

        CalculateNode node2 = (CalculateNode) ctx.getBean("node");
        new Thread(node2).start();
    }

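I just need to fetch a node twice and start it in 2 threads, with no need to worry about where my model is or how the nodes got a reference to it. (Since the "node" bean is declared as a singleton, both getBean() calls return the same instance. That works as long as the node keeps no per-thread state; declare the bean with scope="prototype" if you want a fresh node per call.)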
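For reference, here is a minimal sketch of what each node thread does once it is started: pull a segment off the queue, compute the escape-time counts for its rows, and hand the rows back to the model. This is not the actual source. The Model, Segment and Node classes, the compute() helper, the MAX_ITER cap and the mapping of pixels to the complex plane are all assumptions made for the example, and I use poll() instead of a blocking take() so the workers stop when the queue runs dry.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class WorkerSketch {
    // Hypothetical stand-in for Segment: a half-open range of rows [start, end).
    static final class Segment {
        final int start, end;
        Segment(int start, int end) { this.start = start; this.end = end; }
    }

    // Hypothetical stand-in for MandelbrotModel: owns the queue and the result grid.
    static final class Model {
        final int length;
        final int[][] rawData;
        final BlockingQueue<Segment> workLoad = new LinkedBlockingQueue<>();

        Model(int length, int numTasks) {
            this.length = length;
            this.rawData = new int[length][length];
            int rows = (length + numTasks - 1) / numTasks; // rows per segment, rounded up
            for (int start = 0; start < length; start += rows) {
                workLoad.add(new Segment(start, Math.min(start + rows, length)));
            }
        }

        // Copy a finished segment's rows into the shared grid.
        synchronized void addSegmentData(int[][] data, Segment s) {
            for (int row = s.start; row < s.end; row++) {
                System.arraycopy(data[row - s.start], 0, rawData[row], 0, length);
            }
        }
    }

    static final int MAX_ITER = 100; // assumed iteration cap

    // Escape-time count for the point c = re + im*i.
    static int iterations(double re, double im) {
        double x = 0, y = 0;
        int n = 0;
        while (n < MAX_ITER && x * x + y * y <= 4.0) {
            double t = x * x - y * y + re;
            y = 2 * x * y + im;
            x = t;
            n++;
        }
        return n;
    }

    static final class Node implements Runnable {
        private final Model model;
        Node(Model model) { this.model = model; }

        @Override
        public void run() {
            Segment s;
            // poll() returns null once the queue is empty, which ends the worker.
            while ((s = model.workLoad.poll()) != null) {
                int[][] data = new int[s.end - s.start][model.length];
                for (int row = s.start; row < s.end; row++) {
                    for (int col = 0; col < model.length; col++) {
                        double re = -2.0 + 3.0 * col / model.length; // real axis from -2 to 1
                        double im = -1.5 + 3.0 * row / model.length; // imaginary axis from -1.5 to 1.5
                        data[row - s.start][col] = iterations(re, im);
                    }
                }
                model.addSegmentData(data, s);
            }
        }
    }

    // Run numNodes workers to completion and return the filled model.
    static Model compute(int length, int numTasks, int numNodes) {
        Model model = new Model(length, numTasks);
        Thread[] threads = new Thread[numNodes];
        for (int i = 0; i < numNodes; i++) {
            threads[i] = new Thread(new Node(model));
            threads[i].start();
        }
        try {
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return model;
    }

    public static void main(String[] args) {
        Model m = compute(60, 20, 2);
        System.out.println("iterations at the centre pixel: " + m.rawData[30][30]);
    }
}
```

Because the workers only talk to each other through the queue and the synchronized addSegmentData(), adding more of them is just a matter of starting more threads.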

But wait, there's more. Spring also lets you publish application events on the cheap. Say that now I have the Mandelbrot data, I want to display it in a Swing application. I don't want to wait for all the nodes to finish calculating and display the graphic at the end. I want to paint it in real time: as soon as a node finishes a segment, I'll paint it. To pull that off, I need to know when a segment is finished, and the right source to ask is the model. It knows that when a node reports back with data. The "ghetto" way would be to implement a listener list, traverse that list and invoke a callback function on each listener. This was the old implementation that I had. It does the job but it ain't neat. In the real world this is often undesirable because it couples your components. Spring solves this problem nicely: if you want to publish events, use the application context and call publishEvent() with your message! That's all you have to do.
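For contrast, the hand-rolled listener list described above might look roughly like the sketch below. It is a reconstruction for illustration, not my old code: the SegmentListener interface, the simplified addSegmentData() signature and the demo() helper are all made up. Note how the model has to own the list and know the listener interface, which is exactly the coupling I wanted to get rid of.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public class ListenerListSketch {
    // Hypothetical callback interface that every viewer must implement.
    interface SegmentListener {
        void segmentFinished(int start, int end);
    }

    // Hypothetical model that manages its own listeners.
    static final class Model {
        // CopyOnWriteArrayList lets listeners register while worker threads are notifying.
        private final List<SegmentListener> listeners = new CopyOnWriteArrayList<>();

        void addListener(SegmentListener listener) {
            listeners.add(listener);
        }

        void addSegmentData(int start, int end) {
            // ... copying the segment's rows into rawData is omitted here ...
            for (SegmentListener listener : listeners) {
                listener.segmentFinished(start, end); // traverse the list, invoke each callback
            }
        }
    }

    static List<String> demo() {
        List<String> seen = new ArrayList<>();
        Model model = new Model();
        model.addListener((start, end) -> seen.add("painted " + start + "-" + end));
        model.addListener((start, end) -> seen.add("logged " + start + "-" + end));
        model.addSegmentData(0, 30);
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [painted 0-30, logged 0-30]
    }
}
```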

Let's examine this function in the MandelbrotModel. It's invoked by a node to report a result:

    public synchronized void addSegmentData(int[][] data, Segment segment) {
        for (int row = segment.getStart(); row < segment.getEnd(); row++) {
            System.arraycopy(data[row - segment.getStart()], 0, rawData[row], 0,
                    length);
        }

        ctx.publishEvent(new SegmentEvent(this, segment));
    }

"ctx" is a reference to the application context, handed automatically to any bean that implements the Spring interface "ApplicationContextAware".

    @Override
    public void setApplicationContext(ApplicationContext applicationcontext)
            throws BeansException {
        ctx = applicationcontext;
    }

As for anyone who wants to listen to events: implement the ApplicationListener interface and catch your events:

    @Override
    public void onApplicationEvent(ApplicationEvent event) {
        if (event instanceof SegmentEvent) {
            final SegmentEvent segEvent = (SegmentEvent) event;
            processSegment(segEvent.getSegment());
        }
    }
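If you want to see the shape of this publish/listen pattern without any Spring on the classpath, here is a toy stand-in. To be clear, Bus, Event, Listener and demo() are invented for this sketch and are not Spring classes. The only thing it tries to show is that the publisher knows nothing about its listeners, and that each listener sees every event and picks out the type it cares about, just like the instanceof check above.

```java
import java.util.ArrayList;
import java.util.List;

public class EventBusSketch {
    // Hypothetical base event, playing the role of ApplicationEvent.
    static class Event {
        final Object source;
        Event(Object source) { this.source = source; }
    }

    // Hypothetical event carrying the finished row range.
    static final class SegmentEvent extends Event {
        final int start, end;
        SegmentEvent(Object source, int start, int end) {
            super(source);
            this.start = start;
            this.end = end;
        }
    }

    // Some unrelated event that the viewer should ignore.
    static final class OtherEvent extends Event {
        OtherEvent(Object source) { super(source); }
    }

    // Plays the role of ApplicationListener.
    interface Listener {
        void onEvent(Event event);
    }

    // Plays the role of the application context: every listener gets every event.
    static final class Bus {
        private final List<Listener> listeners = new ArrayList<>();

        void register(Listener listener) { listeners.add(listener); }

        void publish(Event event) {
            for (Listener listener : listeners) {
                listener.onEvent(event);
            }
        }
    }

    static List<String> demo() {
        Bus bus = new Bus();
        List<String> painted = new ArrayList<>();
        // The viewer filters by type, ignoring anything that is not a SegmentEvent.
        bus.register(event -> {
            if (event instanceof SegmentEvent) {
                SegmentEvent seg = (SegmentEvent) event;
                painted.add(seg.start + "-" + seg.end);
            }
        });
        bus.publish(new OtherEvent("someone else"));
        bus.publish(new SegmentEvent("model", 0, 30));
        return painted;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [0-30]
    }
}
```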

I have my ViewerFrame (which extends JFrame) act as the listener. It is declared in my XML bean definition file and also has a reference to the model:

    <bean id="frame" class="mandelbrot.ViewerFrame" scope="singleton">
        <property name="model" ref="model">
        </property>
    </bean>

There is no change in the main method. As Spring initializes the "frame" bean, the frame shows up automatically. This is easy because the frame implements the InitializingBean interface, so Spring tells it when it has been created and its properties have been set:

    @Override
    public void afterPropertiesSet() throws Exception {
        int length = model.getLength();
        this.image = new BufferedImage(length, length, BufferedImage.TYPE_INT_RGB);

        this.setSize(length, length);
        SwingUtilities.invokeLater(new Runnable() {
            public void run() {
                setVisible(true);
            }
        });
    }

The fun doesn't end here though. Now I want another JVM to join in the calculation, making it go faster (a little bit of drama here, since the Mandelbrot set is pretty fast to calculate). The ideal scenario is that the second JVM has the same "model" in memory, so its minions of "nodes" can just pound on the work. This second JVM can even be on a totally separate machine. How the heck is anyone gonna pull this off without changing the code or worrying about networking, sharing heap, etc.? This is where the power of Terracotta comes in. I just mark the "model" bean as shared and the SegmentEvent as distributed. That lets any JVM in the cluster hold a reference to this model, and events published from the model become cluster-wide.

Terracotta is fully Spring aware, and the snippet below is how I wired it to my Spring app.

    <spring>
        <jee-application name="*">
            <application-contexts>
                <application-context>
                    <paths>
                        <path>mandelbrot.xml</path>
                    </paths>
                    <beans>
                        <bean name="model">
                        </bean>
                    </beans>
                    <distributed-events>
                        <distributed-event>mandelbrot.SegmentEvent</distributed-event>
                    </distributed-events>
                </application-context>
            </application-contexts>
        </jee-application>
    </spring>

When I use the Terracotta Eclipse plugin, after starting a Terracotta server (used to hold the shared objects), starting two instances of my application that share the "model" is as easy as hitting the Run button twice. Now I have 2 JVMs, each with 2 nodes working on the same model. Neato. As my bird would say "oh wow" when something excites him.

So there you have it. I'm a Spring novice now, but I'm sure there are many cool things in Spring waiting to be discovered. And with the added boost from Terracotta, the fun is multiplied.




Here is the link to the source code