Monday, September 17, 2007

Distributed EhCache as second level cache under Hibernate

EhCache is one of the great options for a Hibernate second-level cache. By making it distributed, multiple web applications can share the same cache, which improves overall performance and availability. To enable the distributed cache, Terracotta 2.4.3 has built-in support for EhCache 1.3.0 and 1.2.4. I will go through an example of how this can be done.

The stack could be visualized like this:

-----------    -----------
 Tomcat 1       Tomcat 2
-----------    -----------
 Hibernate      Hibernate
--------------------------
         EhCache
        TERRACOTTA
--------------------------


Terracotta is the driving force, though its presence is transparent to your web app thanks to bytecode instrumentation. First, let's take a look at enabling EhCache in the Hibernate configuration file:
[+]hibernate.cfg.xml

These are the same standard settings you would use in the non-distributed case: turn on the query cache, point Hibernate to ehcache.xml, and specify a cache provider. The ehcache.xml can be as simple as this:
[+]ehcache.xml
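
For readers who configure Hibernate in code, the three Hibernate settings described above can be sketched as plain properties. This is only an illustration: the class name HibernateCacheSettings is made up, and the provider class and the "/ehcache.xml" resource path are assumptions for a Hibernate 3.x setup, not values taken from the hibernate.cfg.xml of this post.

```java
import java.util.Properties;

// Hypothetical helper: collects the cache-related Hibernate settings in one place.
class HibernateCacheSettings {

    static Properties cacheProperties() {
        Properties p = new Properties();
        // Turn on the query cache.
        p.setProperty("hibernate.cache.use_query_cache", "true");
        // Point Hibernate to the EhCache configuration (path is an assumption).
        p.setProperty("hibernate.cache.provider_configuration_file_resource_path", "/ehcache.xml");
        // Specify the cache provider (assumed: the provider bundled with Hibernate 3.x).
        p.setProperty("hibernate.cache.provider_class", "org.hibernate.cache.EhCacheProvider");
        return p;
    }

    public static void main(String[] args) {
        // Print the settings so they can be compared with hibernate.cfg.xml.
        cacheProperties().forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

In a real application these properties would typically be handed to Hibernate's Configuration object before building the SessionFactory. Nothing in them is specific to Terracotta.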

One thing I'd like to point out is that Terracotta persists heap memory to disk efficiently (and faults it back in as needed), so "overflowToDisk" becomes redundant in StandardQueryCache. As a matter of fact, Terracotta doesn't honor this option. Now that the cache is set up, we need to tell Hibernate, in our entity mapping files, which entities we'd like to cache at runtime. In my example, I have an Event table (title, date) and a Person table (firstname, lastname) that have a many-to-many relationship through a join table, PERSON_EVENT. With that in mind, let's examine Event.hbm.xml and Person.hbm.xml:
[+]Mapping files

The details might be distracting but if you're familiar with Hibernate, this should be as simple as it gets :)
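
On the Java side, the entities behind these mappings could look roughly like the sketch below. The field names follow the table description above (title, date, firstname, lastname). The id fields, the accessor names and the addParticipant helper are assumptions for illustration and are not taken from the project's source.

```java
import java.util.Date;
import java.util.HashSet;
import java.util.Set;

// Hypothetical entity for the Person table (firstname, lastname).
class Person {
    private Long id;                                 // assumed surrogate key
    private String firstname;
    private String lastname;
    private Set<Event> events = new HashSet<>();     // other side of PERSON_EVENT

    Person() { }                                     // no-arg constructor for Hibernate

    Person(String firstname, String lastname) {
        this.firstname = firstname;
        this.lastname = lastname;
    }

    Long getId() { return id; }
    String getFirstname() { return firstname; }
    String getLastname() { return lastname; }
    Set<Event> getEvents() { return events; }
}

// Hypothetical entity for the Event table (title, date).
class Event {
    private Long id;                                 // assumed surrogate key
    private String title;
    private Date date;
    private Set<Person> participants = new HashSet<>();

    Event() { }                                      // no-arg constructor for Hibernate

    Event(String title, Date date) {
        this.title = title;
        this.date = date;
    }

    Long getId() { return id; }
    String getTitle() { return title; }
    Date getDate() { return date; }
    Set<Person> getParticipants() { return participants; }

    // Keeps both sides of the many-to-many association in sync.
    void addParticipant(Person person) {
        participants.add(person);
        person.getEvents().add(this);
    }
}
```

Because the classes are plain POJOs, nothing in them refers to EhCache or Terracotta. Caching is declared in the mapping files, and clustering in tc-config.xml.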

Now that we are past all the settings, the fun stuff begins with our servlets. I've created two servlets. One, CreateEvents, populates data into our tables. The other, QueryEvents, runs the queries and displays the cache hit statistics.
[+]CreateEvents.java

Some default data (3 persons and 2 events) is created during the init() phase. With (2), I added an option to add an additional Person to the database so that later we can use it to demonstrate cache invalidation. To get the cache hit and miss statistics, a query statistics object is created in (3). It gives us the hit/miss count in (4) for the query "select * from Event".
With QueryEvents.java, we ask Hibernate for the list of persons and events, which shows us whether the cache or the database is used:
[+]QueryEvents.java

I ran these two servlets in one Tomcat to make sure everything was working correctly.
Hitting http://localhost:8080/Events/create (mapped to the CreateEvents servlet) gives:

Events created: [Event 1: 2007-09-30, Event 2: 2007-12-01]
Event query cache miss: 1
Event query cache hit: 0

Since this is the first time we query for events, we have one cache miss: Hibernate went to the database to get the data.
Now we hit http://localhost:8080/Events/query (mapped to the QueryEvents servlet). The result is:

Events found: [Event 1: 2007-09-30, Event 2: 2007-12-01]
Event query cache miss: 1
Event query cache hit: 1

People found: [Ichigo Kurosaki, Abarai Renji, Ishida Uryu]
Person query cache miss: 1
Person query cache hit: 0

Participants of Even 1: [Abarai Renji, Ishida Uryu, Ichigo Kurosaki]
Participants of Even 2: [Abarai Renji, Ishida Uryu]

As expected, the Event query now hits the cache, raising the hit count to 1. And since it's the first time we query for Person, the cache miss count is 1. The participants list proves that the event POJOs coming from the cache are valid and can be re-associated with this Hibernate session. Now, if we run the CreateEvents servlet on Tomcat 1 and QueryEvents on Tomcat 2, the distributed cache should give us the same result.
This is where Terracotta comes in. No change is needed in the Hibernate mapping files or in ehcache.xml, and there is no code change either. All you need to do is run Tomcat with Terracotta enabled. This involves passing three options to the Tomcat JVM (a boot classpath entry and two system properties):

-Xbootclasspath/p:"path/to/Terracotta/bootjar"
-Dtc.install-root=/path/to/Terracotta/install
-Dtc.config=/path/to/tc-config.xml
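
A quick way to confirm that the two -D properties actually reached the Tomcat JVM is a small check like the one below. It is a hypothetical helper, not part of Terracotta or of this project. It only looks at tc.install-root and tc.config, since the -Xbootclasspath/p entry is a JVM option rather than a system property and is not checked here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical diagnostic: reports which Terracotta system properties are missing.
class TerracottaFlagCheck {
    // The two -D properties listed above.
    static final String[] REQUIRED = {"tc.install-root", "tc.config"};

    // Returns the required keys that are absent or blank in the given properties.
    static List<String> missing(Properties props) {
        List<String> result = new ArrayList<>();
        for (String key : REQUIRED) {
            String value = props.getProperty(key);
            if (value == null || value.trim().isEmpty()) {
                result.add(key);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> missing = missing(System.getProperties());
        System.out.println(missing.isEmpty()
                ? "Terracotta system properties are set"
                : "Missing system properties: " + missing);
    }
}
```

The same check could be called from a servlet's init() method to fail fast when Tomcat was started without the Terracotta options.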

Detailed instructions can be found here
Luckily, Terracotta has a nice Session configuration tool that helps you set up a cluster of 2 Tomcats (or WebLogic). All you need to do is import your WAR file. I created an Events.war file that contains both of my servlets and all the needed jars. I need to configure tc-config.xml to let Terracotta know that I'm using Hibernate and EhCache by adding those modules (1). Also, classes that will be shared need to be instrumented (2). Terracotta also supports session sharing when you declare your webapp name (3). However, I'm not clustering any sessions in this example.
[+]tc-config.xml

The session configurator will start up a Terracotta server and 2 Tomcats. We can now access the CreateEvents servlet on the first Tomcat at port 9081 by hitting http://localhost:9081/Events/create, and QueryEvents on the second Tomcat at http://localhost:9082/Events/query.

The result I got is:
 Events/create:
Events created: [Event 1: 2007-09-30, Event 2: 2007-12-01]
Event query cache miss: 1
Event query cache hit: 0

Events/query:
Events found: [Event 1: 2007-09-30, Event 2: 2007-12-01]
Event query cache miss: 0
Event query cache hit: 1

People found: [Ichigo Kurosaki, Abarai Renji, Ishida Uryu]
Person query cache miss: 1
Person query cache hit: 0

Participants of Even 1: [Abarai Renji, Ishida Uryu, Ichigo Kurosaki]
Participants of Even 2: [Abarai Renji, Ishida Uryu]

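Note that the second Tomcat reports no Event cache miss at all: Hibernate statistics are kept per SessionFactory, and this Tomcat's very first Event query was already served from the shared cache. If I reload Events/query, the statistics are as expected: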

Event query cache miss: 0
Event query cache hit: 2

Person query cache miss: 1
Person query cache hit: 1


To test that the cache is invalidated, remember that after hitting Events/query we have a list of 3 persons in the cache. If we create one new person by hitting http://localhost:9081/Events/create?fn=John&ln=Smith, the cached list becomes stale. Thanks to Terracotta, Hibernate on the second Tomcat is aware of this and invalidates the stale data, which leads to a cache miss (instead of a hit) when we reload http://localhost:9082/Events/query:

Event query cache miss: 0
Event query cache hit: 3

People found: [Ichigo Kurosaki, Abarai Renji, Ishida Uryu, John Smith]
Person query cache miss: 2
Person query cache hit: 1

As you can see, the Event query cache hit count continues to rise, while the Person query now has a cache miss because the cached data was invalidated. Hibernate had to hit the database for fresh data.
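
The hit and miss pattern above can be reproduced with a very small model of a query cache guarded by per-table update timestamps, which is roughly how Hibernate decides whether a cached query result is still usable. The sketch below is a simplified illustration with made-up names (QueryCacheModel, insert, queryAll). It is not Hibernate's or Terracotta's code, and it uses a logical counter instead of real timestamps.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified, hypothetical model of timestamp-based query cache invalidation.
class QueryCacheModel {
    private long clock = 0;                                            // logical clock
    private final Map<String, List<String>> tables = new HashMap<>();  // stand-in for the database
    private final Map<String, Long> lastUpdate = new HashMap<>();      // table -> time of last write
    private final Map<String, List<String>> cachedRows = new HashMap<>();
    private final Map<String, Long> cachedAt = new HashMap<>();        // table -> time result was cached
    private final Map<String, Integer> hits = new HashMap<>();
    private final Map<String, Integer> misses = new HashMap<>();

    // A write records a new update timestamp for the table.
    void insert(String table, String row) {
        tables.computeIfAbsent(table, k -> new ArrayList<>()).add(row);
        lastUpdate.put(table, ++clock);
    }

    // A cached result is a hit only if it was stored after the table's last write.
    List<String> queryAll(String table) {
        Long cachedTime = cachedAt.get(table);
        long updated = lastUpdate.getOrDefault(table, 0L);
        if (cachedTime != null && cachedTime > updated) {
            hits.merge(table, 1, Integer::sum);
            return cachedRows.get(table);
        }
        misses.merge(table, 1, Integer::sum);
        List<String> fresh = new ArrayList<>(tables.getOrDefault(table, new ArrayList<>()));
        cachedRows.put(table, fresh);
        cachedAt.put(table, ++clock);
        return fresh;
    }

    int hits(String table) { return hits.getOrDefault(table, 0); }
    int misses(String table) { return misses.getOrDefault(table, 0); }

    public static void main(String[] args) {
        QueryCacheModel model = new QueryCacheModel();
        model.insert("Person", "Ichigo Kurosaki");
        model.insert("Person", "Abarai Renji");
        model.insert("Person", "Ishida Uryu");
        model.queryAll("Person");               // first query: miss
        model.queryAll("Person");               // reload: hit
        model.insert("Person", "John Smith");   // write makes the cached list stale
        model.queryAll("Person");               // reload: miss, fresh data loaded
        System.out.println("Person query cache miss: " + model.misses("Person")); // 2
        System.out.println("Person query cache hit: " + model.hits("Person"));    // 1
    }
}
```

In the clustered setup the cached results and the update timestamps live in caches that both Tomcats share through Terracotta, which is why a write on the first Tomcat causes a miss on the second.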

I hope I didn't bore you with too many details, but I think it's important to show each step. Terracotta is greatly beneficial if you choose to use EhCache as a distributed cache with Hibernate.

You can download the project here and give it a try.