Monday, September 17, 2007

Distributed EhCache as second level cache under Hibernate

EhCache is one of the great options for Hibernate's second-level cache. Making it distributed lets multiple web application instances share the same cache, which improves overall performance and availability. To enable the distributed cache, Terracotta 2.4.3 has built-in support for EhCache 1.3.0 and 1.2.4. I will go through an example of how this can be done.

The stack could be visualized like this:

-----------    -----------
 Tomcat 1       Tomcat 2
-----------    -----------
 Hibernate      Hibernate
--------------------------
         EhCache
        TERRACOTTA
--------------------------


Terracotta is the driving force, though its presence is transparent to your web app thanks to bytecode instrumentation. First, let's take a look at enabling EhCache in the Hibernate configuration file:
[+]hibernate.cfg.xml
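
As a rough sketch of what the cache-related part of such a configuration carries, here are the usual Hibernate 3 property names collected in a plain `java.util.Properties` object. The class name `CacheSettingsSketch` is hypothetical, and the provider class and the `/ehcache.xml` path are assumptions, not values taken from the collapsed listing above.

```java
import java.util.Properties;

/** Hypothetical helper: cache-related Hibernate 3 settings as plain properties. */
class CacheSettingsSketch {

    static Properties cacheSettings() {
        Properties p = new Properties();
        // Enable the second-level cache and the query cache.
        p.setProperty("hibernate.cache.use_second_level_cache", "true");
        p.setProperty("hibernate.cache.use_query_cache", "true");
        // Assumed location of ehcache.xml on the classpath.
        p.setProperty("hibernate.cache.provider_configuration_file_resource_path", "/ehcache.xml");
        // Assumed provider: the EhCache provider bundled with Hibernate 3.
        p.setProperty("hibernate.cache.provider_class", "org.hibernate.cache.EhCacheProvider");
        return p;
    }

    public static void main(String[] args) {
        Properties p = cacheSettings();
        for (String name : p.stringPropertyNames()) {
            System.out.println(name + " = " + p.getProperty(name));
        }
    }
}
```

The same keys can be written as property elements in hibernate.cfg.xml; nothing here is specific to Terracotta.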

These are the same standard settings you would use in the non-distributed case: turn on the query cache, point Hibernate to ehcache.xml, and finally specify a cache provider. The ehcache.xml can be as simple as this:
[+]ehcache.xml

One thing I'd like to point out is that Terracotta persists heap memory to disk efficiently (and faults objects back in as needed), so "overflowToDisk" becomes redundant for StandardQueryCache. As a matter of fact, Terracotta doesn't honor this option. Now that the cache is set up, we need to tell Hibernate, in our entity mapping files, which entities we'd like to cache at runtime. In my example, I have an Event table (title, date) and a Person table (firstname, lastname) that have a many-to-many relationship through a join table, PERSON_EVENT. With that in mind, let's examine Event.hbm.xml and Person.hbm.xml:
[+]Mapping files

The details might be distracting but if you're familiar with Hibernate, this should be as simple as it gets :)

Now that we're past all the settings, the fun stuff begins with our servlets. I've created 2 servlets. One, called CreateEvents, populates data into our tables. The other, QueryEvents, runs queries and displays the cache hit statistics.
[+]CreateEvents.java

Some default data (3 persons and 2 events) is created during the init() phase. With (2), I added an option to add an additional Person to the database so that later we can use it to demonstrate cache invalidation. To get the cache hit and miss statistics, a query statistics object is created in (3). It gives us the hit/miss count in (4) for the query "select * from Event".
With QueryEvents.java, we ask Hibernate for the list of persons and events, which shows us whether the cache or the database is used:
[+]QueryEvents.java

I ran these two servlets in 1 Tomcat to make sure everything was working correctly.
Hitting http://localhost:8080/Events/create (mapped to the CreateEvents servlet) gives:

Events created: [Event 1: 2007-09-30, Event 2: 2007-12-01]
Event query cache miss: 1
Event query cache hit: 0

Since it's the first time we query for events, we have one cache miss: Hibernate went to the db to get the data.
Now we hit http://localhost:8080/Events/query (mapped to the QueryEvents servlet), and the result is:

Events found: [Event 1: 2007-09-30, Event 2: 2007-12-01]
Event query cache miss: 1
Event query cache hit: 1

People found: [Ichigo Kurosaki, Abarai Renji, Ishida Uryu]
Person query cache miss: 1
Person query cache hit: 0

Participants of Even 1: [Abarai Renji, Ishida Uryu, Ichigo Kurosaki]
Participants of Even 2: [Abarai Renji, Ishida Uryu]

As expected, the Event query now hits the cache, raising the hit count to 1. And since it's the first time we query for Person, the cache miss count is 1. The participants list proves that the event pojos coming from the cache are valid and can be re-associated with this Hibernate session. Now, if we run the CreateEvents servlet on Tomcat 1 and QueryEvents on Tomcat 2, the distributed cache should give us the same result.
This is where Terracotta comes in. No change is needed in any of the Hibernate mapping files or in ehcache.xml. There is no code change either. What you need to do is run Tomcat with Terracotta enabled. The process involves passing 3 options to the Tomcat JVM (a boot classpath entry and two system properties):

-Xbootclasspath/p:"path/to/Terracotta/bootjar"
-Dtc.install-root=/path/to/Terracotta/install
-Dtc.config=/path/to/tc-config.xml
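
A small sanity check can help confirm that the two `-D` options actually reached the Tomcat JVM. The sketch below is only an illustration: the class name `TerracottaOptionsCheck` is hypothetical, it is not part of Terracotta, and it does not verify the boot classpath entry, which is not exposed as an ordinary system property.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

/** Hypothetical startup check for the two Terracotta system properties named above. */
class TerracottaOptionsCheck {

    // Property names taken from the -D options in the post.
    static final String[] REQUIRED = { "tc.install-root", "tc.config" };

    /** Returns the names of required properties that are absent or blank. */
    static List<String> missing(Properties props) {
        List<String> out = new ArrayList<>();
        for (String name : REQUIRED) {
            String value = props.getProperty(name);
            if (value == null || value.trim().isEmpty()) {
                out.add(name);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> m = missing(System.getProperties());
        System.out.println(m.isEmpty()
                ? "Terracotta system properties are set"
                : "Missing: " + m);
    }
}
```

Calling `missing(System.getProperties())` from a servlet's init() would be one way to surface a misconfigured Tomcat early.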

Detailed instructions can be found here
Luckily, Terracotta has a nice Session configuration tool that will help you set up a cluster of 2 Tomcats (or Weblogic). All you need is to import your WAR file. I created an Events.war file that contains both of my servlets and all the needed jars. I need to configure tc-config.xml to let Terracotta know that I'm using Hibernate and EhCache by adding those modules (1). Also, classes that will be shared need to be instrumented (2). Terracotta also supports session sharing if you declare your webapp name (3). However, I'm not clustering any sessions in this example.
[+]tc-config.xml

The session configurator will start up the Terracotta server and 2 Tomcats. We can now access the CreateEvents servlet on the first Tomcat at port 9081 by hitting http://localhost:9081/Events/create, and hit QueryEvents on the second Tomcat at http://localhost:9082/Events/query.

The result I got is:
Events/create:
Events created: [Event 1: 2007-09-30, Event 2: 2007-12-01]
Event query cache miss: 1
Event query cache hit: 0

Events/query:
Events found: [Event 1: 2007-09-30, Event 2: 2007-12-01]
Event query cache miss: 0
Event query cache hit: 1

People found: [Ichigo Kurosaki, Abarai Renji, Ishida Uryu]
Person query cache miss: 1
Person query cache hit: 0

Participants of Even 1: [Abarai Renji, Ishida Uryu, Ichigo Kurosaki]
Participants of Even 2: [Abarai Renji, Ishida Uryu]

If I reload Events/query, the statistics are as expected:

Event query cache miss: 0
Event query cache hit: 2

Person query cache miss: 1
Person query cache hit: 1


To test that the cache is invalidated: after hitting Events/query, the cache holds a list of 3 persons. If we create a new person by hitting http://localhost:9081/Events/create?fn=John&ln=Smith, that cached list becomes stale. Thanks to Terracotta, the second Tomcat + Hibernate is aware of this, and the stale data is invalidated. This leads to a cache miss (instead of a hit) when we reload http://localhost:9082/Events/query:

Event query cache miss: 0
Event query cache hit: 3

People found: [Ichigo Kurosaki, Abarai Renji, Ishida Uryu, John Smith]
Person query cache miss: 2
Person query cache hit: 1

As you can see, the Event query cache hit count continues to rise, while we now have a cache miss for the Person query, since the cached data was invalidated. Hibernate had to hit the database for fresh data.
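
To make the invalidation behaviour concrete: Hibernate's query cache treats a cached result as usable only if it was stored after the last recorded update to the tables it reads. Below is a minimal simulation of that timestamp rule in plain Java. It uses no Hibernate or Terracotta classes, and the names (`QueryCacheSketch`, `tableUpdated`, `query`) are hypothetical. It replays the Person query sequence from this post.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Simplified model of timestamp-based query cache invalidation (not Hibernate code). */
class QueryCacheSketch {

    private static final class Entry {
        final long cachedAt;
        final List<String> rows;
        Entry(long cachedAt, List<String> rows) {
            this.cachedAt = cachedAt;
            this.rows = rows;
        }
    }

    private long clock = 0;                                        // logical clock
    private final Map<String, Long> lastUpdate = new HashMap<>();  // table -> last write time
    private final Map<String, Entry> results = new HashMap<>();    // query -> cached result
    int hits;
    int misses;

    /** Record a write (insert/update/delete) against a table. */
    void tableUpdated(String table) {
        lastUpdate.put(table, ++clock);
    }

    /** Return the cached rows if still fresh; otherwise "go to the database" and re-cache. */
    List<String> query(String query, String table, List<String> rowsFromDb) {
        Entry e = results.get(query);
        long updated = lastUpdate.getOrDefault(table, 0L);
        if (e != null && e.cachedAt > updated) {
            hits++;
            return e.rows;
        }
        misses++;
        results.put(query, new Entry(++clock, rowsFromDb));
        return rowsFromDb;
    }

    public static void main(String[] args) {
        QueryCacheSketch cache = new QueryCacheSketch();
        List<String> three = Arrays.asList("Ichigo Kurosaki", "Abarai Renji", "Ishida Uryu");
        List<String> four = Arrays.asList("Ichigo Kurosaki", "Abarai Renji", "Ishida Uryu", "John Smith");

        cache.tableUpdated("PERSON");                 // default data inserted
        cache.query("from Person", "PERSON", three);  // first query: miss
        cache.query("from Person", "PERSON", three);  // reload: hit
        cache.tableUpdated("PERSON");                 // John Smith inserted
        cache.query("from Person", "PERSON", four);   // stale entry: miss again
        System.out.println("miss: " + cache.misses + ", hit: " + cache.hits);
    }
}
```

In the clustered setup, the cached results and the update timestamps would both have to be visible to every Tomcat for this check to work across nodes, which is what sharing the cache through Terracotta provides.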

I hope I didn't bore you with too many details, but I think it's important to cover each step. Terracotta is greatly beneficial if you choose to use EhCache as a distributed cache with Hibernate.

You can download the project here and give it a try.

20 comments:

Anjan said...

cool post!

keep them coming

I missed your webinars. will you re-run them ?

thank you,

BR,
~A

Hung Huynh said...

A,

Thanks for stopping by. Our webinars can be found at http://www.terracotta.org/confluence/display/orgsite/Online+Training

Hung-

manunath said...

How does Terracotta or the distributed cache identify stale data?

Can you explain to me how this is done?

Thanks,
Manjunath

Hung Huynh said...

Manjunath,

Stale data is identified and evicted by TTL and idle time, just as it would be with regular EhCache.

Hibernate + EhCache detects when the database and the cached queries are out of sync. Terracotta doesn't interfere with this process. It only makes sure the cached queries are clustered.

Hung-

Dean said...

When my Java code accesses an object that was cached by EhCache using Terracotta, does my reference to the object point into Terracotta-backed "network attached memory"? If so, does this mean I have to configure all of the Terracotta instrumentation for that object and other objects it references? (If that was in the post, I missed it.) If not, how does that work?

Hung Huynh said...

Dean,

You are right. Your references to the pojos are backed by Terracotta and live as network-attached-memory objects. To let TC know about these POJO classes, you have to specify them in tc-config.xml. Check the example in my blog post, section "instrumented class".

Anonymous said...

Do you mean all the entity classes configured in the Hibernate 2nd level cache? Or should I include all the classes that have references to the entity classes?

Hung Huynh said...

Yes, you also have to include classes that reference shared entities. Pattern matching and wildcards are supported, so including them is pretty straightforward. If you run into any problems, please check out our forum.

Albert said...

Great article.

A question however.

Inserts are fine. What happens in case of an update? How would the other instance know that it has a stale copy?

Great work!
Albert.

Hung Huynh said...

Hi Albert,
As I stated earlier in my reply to Manjunath, Terracotta doesn't have to do extra work to detect stale data. Hibernate + Ehcache are the ones that handle that part.

Bahata said...

Hi
If I do not consider Query Cache, and only consider the second level cache for persistent objects, can I get all objects stored in cache only without going to database? i.e. Say I know, all objects I need are already there in cache and a database hit is not required. But I want to read the objects according to one of their non-unique non-id field eg name of a person, not his ID.

Hung Huynh said...

My understanding is no. Hibernate 2nd level cache does not store object instances, it maintains a map from id to a serialized state of the object. So, if you want to search based on some other field, you need to use query cache and 2nd level cache
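
A simplified model may help show why a lookup by a non-id field needs the query cache. In this sketch (plain Java with hypothetical names, not Hibernate's actual classes), the entity region maps an id to the entity's field values, while the query region maps a query key to a list of ids that are then resolved through the entity region.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Simplified model of entity cache plus query cache (not Hibernate code). */
class SecondLevelCacheSketch {

    // Entity region: id -> field values {firstname, lastname}, not live objects.
    final Map<Long, String[]> personRegion = new HashMap<>();

    // Query region: query key -> ids of matching entities.
    final Map<String, List<Long>> queryRegion = new HashMap<>();

    /** Lookup by id is served directly from the entity region. */
    String[] getById(long id) {
        return personRegion.get(id);
    }

    /** Lookup by any other field only works if that query's result ids are cached. */
    List<String[]> findCached(String queryKey) {
        List<Long> ids = queryRegion.get(queryKey);
        if (ids == null) {
            return null; // not cached: the database would have to be queried
        }
        List<String[]> out = new ArrayList<>();
        for (Long id : ids) {
            out.add(personRegion.get(id));
        }
        return out;
    }

    public static void main(String[] args) {
        SecondLevelCacheSketch cache = new SecondLevelCacheSketch();
        cache.personRegion.put(1L, new String[] { "Ichigo", "Kurosaki" });
        cache.personRegion.put(2L, new String[] { "Abarai", "Renji" });

        // The entity is cached, but there is no index by lastname.
        System.out.println(cache.findCached("lastname=Kurosaki") == null);

        // Once the query result (a list of ids) is cached, the lookup resolves.
        List<Long> ids = new ArrayList<>();
        ids.add(1L);
        cache.queryRegion.put("lastname=Kurosaki", ids);
        System.out.println(cache.findCached("lastname=Kurosaki").get(0)[0]);
    }
}
```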

Seema Richard said...

Doesn't EHCache support clustered hibernate cache by itself? Why is Terracotta needed in that case?

Hung Huynh said...

I don't know much about Ehcache's native support for distributed caching, but I do know that modifying a cached object results in the whole object being shipped to the other nodes. With Terracotta, only the change is propagated, so it's more efficient.

You can take a look at this blog
http://www.miketec.org/serendipity/index.php?/archives/8-Introduction-to-Terracotta.html , paragraph about "Caching" to get some additional info.

Williamsburg said...

check NCache, which is a distributed cache, also supports NHibernate L2 caching. And, you do not need to do any programming to use NCache with NHibernate.

http://www.alachisoft.com/ncache/nhibernate_l2cache_index.html

Adshafqat said...

Another Good tutorial about Ehcache, spring and hibernate integration

http://eiconsulting.blogspot.com/2011/10/ehcache-implementation-in-spring.html

Venkat Nitw05 said...

Hi Hung,
In my project, i have a similar scenario. 
Single webapp and a cache Cluster. 
And the cache is on a separate server on the same network. And, one more server is there for the high availability.

Can you please give me an idea how to start on this!

Thanks in advance.

hhuynh said...

Hello,

There is very detailed documentation of how to do that at Ehcache website 

http://ehcache.org/documentation/configuration/distributed-cache-configuration

Sandy said...

Hi Hung Huynh ,
1 query I have implemented Ehcache it is working fine , but in case of update what can i do ? any suggestion...i tried with Query.setCacheMode(CacheMode.GET )