Getting Started with JBoss Enterprise Data Grid

JBoss have recently granted early access to Enterprise Data Grid (EDG), the supported version of Infinispan. EDG is a high-performance key/value cache supporting local, replicated, invalidation and distributed modes. Distributed mode is the one we'll be focusing on in this blog post because it's the one gaining lots of attention at the moment due to its elastic scaling properties.

You can sign up now for a copy of EDG to download and play around with at http://www.jboss.com/edg6-early-access/. To help you get started, let's create a distributed grid and check out the Hotrod client-server protocol.
Fully replicated caches don't tend to scale well past a relatively small number of nodes, because every write has to be copied to every node and that overhead has a significantly detrimental effect on performance. Distributed caches alleviate this bottleneck by keeping only a configurable number of back-ups or copies of each entry, which still provides redundancy while allowing you to scale caching architectures linearly. When grid nodes join or leave the cluster, data is re-balanced between the nodes to maintain the configured level of redundancy (number of back-ups) and an even distribution.
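To put rough numbers on that scaling argument, here is a back-of-the-envelope sketch. The class name, the 4 GB of cache space per node and the choice of two copies per entry are illustrative assumptions rather than EDG defaults, and the sums ignore per-entry overhead and uneven distribution.

```java
public class GridCapacity {

    /** Replicated mode: every node stores every entry, so extra nodes add no capacity. */
    public static double replicatedCapacityGb(int nodes, double perNodeGb) {
        return perNodeGb;
    }

    /** Distributed mode: each entry is stored on 'copies' nodes only. */
    public static double distributedCapacityGb(int nodes, double perNodeGb, int copies) {
        // A grid cannot hold more copies of an entry than it has nodes
        int effectiveCopies = Math.min(copies, nodes);
        return nodes * perNodeGb / effectiveCopies;
    }

    public static void main(String[] args) {
        double perNodeGb = 4.0; // assumed cache space per node
        int copies = 2;         // assumed number of copies of each entry
        for (int nodes = 2; nodes <= 8; nodes *= 2) {
            System.out.printf("%d nodes: replicated %.1f GB, distributed %.1f GB%n",
                    nodes,
                    replicatedCapacityGb(nodes, perNodeGb),
                    distributedCapacityGb(nodes, perNodeGb, copies));
        }
    }
}
```

Under these assumptions the replicated figure stays flat as nodes are added, while the distributed figure grows in proportion to the number of nodes.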
EDG can be run in embedded or client-server mode, with Memcached, REST and Hotrod clients provided as part of the distribution. Hotrod is a smart Java-based client which provides built-in connection pooling, failover and smart-routing. The smart-routing feature of Hotrod optimises client calls into the EDG server grid: the client has up-to-date knowledge of the server-side topology and is able to route requests directly to the nodes owning the data.
Once you've signed up and downloaded the distribution (we are using JBoss Data Grid Server 6.0.0 Beta1), unzip it into a location of your choice.
If you take a look around you'll soon discover that the directory structure is a bit different to Infinispan. In fact it bears a striking resemblance to JBossAS7 (or JBossEAP 6 if you are paying), and that's because the EDG server runs inside JBoss's new modular application server framework.
JBossAS7 runs in two modes, standalone (a single unmanaged instance) or domain (a centrally managed set of servers). Domain mode permits a number of server instances to be set up and started quickly with only a small amount of configuration, so let's do that then...
Firstly we need to create a management user so we can access the domain console to start and stop managed server instances. There's an add-user script in the bin directory at the root of the installation, which we can run to perform this task:
Now let's configure the domain to create 4 servers. Managed servers are defined in the host.xml file located in the $EDG_HOME/domain/configuration directory. By default two are already configured, but two servers don't constitute a grid so we will have to increase this. Here is the initial configuration; I'd already changed the highlighted "port-offset" field from 150 to 100...
And this is what it looked like after I added two more servers:
Note that the port-offset increases by another 100 for each of these servers (so we don't get port binding conflicts), both are set to "auto-start" (so the domain controller will start these servers automatically when it starts) and the "topology.machine" is different for each (not important for this example, but EDG can intelligently distribute data so that back-ups or copies are held on separate machines to ensure redundancy).
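The port-offset arithmetic is worth spelling out, because the client will need the resulting ports later on. The sketch below assumes the four servers use offsets 0, 100, 200 and 300, that they all run on localhost, and that the Hotrod endpoint keeps its usual base port of 11222; check your own configuration for the real values. The class and method names are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;

public class PortOffsets {

    // Usual Hotrod base port; assumed to be unchanged in this installation
    static final int HOTROD_BASE_PORT = 11222;

    /** A managed server binds each socket at base port + port-offset. */
    public static int effectivePort(int basePort, int portOffset) {
        return basePort + portOffset;
    }

    /** Builds a "host:port;host:port" list, one entry per port-offset. */
    public static String serverList(String host, int... portOffsets) {
        List<String> entries = new ArrayList<>();
        for (int offset : portOffsets) {
            entries.add(host + ":" + effectivePort(HOTROD_BASE_PORT, offset));
        }
        return String.join(";", entries);
    }

    public static void main(String[] args) {
        // Offsets assumed for the four servers in this walkthrough
        System.out.println(serverList("localhost", 0, 100, 200, 300));
    }
}
```

With those assumed offsets the Hotrod endpoints land on ports 11222, 11322, 11422 and 11522.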
Now we have 4 servers or data nodes ready for our grid. Next we should define a distributed cache to store our key/value data in. As we will be running EDG in domain mode we can make this change globally in the domain.xml configuration file, which is also located in the $EDG_HOME/domain/configuration directory. EDG already comes with some predefined caches but we will add our own anyway, named "test-dist-cache".
Notice that the attribute start="EAGER" tells EDG to initialize this cache on start up, and virtual-nodes="10" improves the data distribution across the nodes by giving each node several positions, rather than just one, in the key hashing scheme EDG uses to assign individual keys to data nodes.
Ok, so let's start up the EDG servers using the domain controller:
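To see why virtual nodes help, here is a toy hash ring. It is not EDG's actual hashing code: the node names, the bit-mixing function and the key names are all assumptions made for illustration. Each node is placed on the ring several times, and a key belongs to the first node position at or after the key's own hash.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class VirtualNodeRing {

    private final TreeMap<Integer, String> ring = new TreeMap<>();

    /** Places every node on the ring 'virtualNodes' times. */
    public VirtualNodeRing(String[] nodes, int virtualNodes) {
        for (String node : nodes) {
            for (int v = 0; v < virtualNodes; v++) {
                ring.put(hash(node + "#" + v), node);
            }
        }
    }

    /** Scrambles String.hashCode() so that similar names spread over the ring. */
    static int hash(String s) {
        int h = s.hashCode();
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    /** The owner is the first ring position at or after the key's hash, wrapping around. */
    public String ownerOf(String key) {
        Map.Entry<Integer, String> entry = ring.ceilingEntry(hash(key));
        return (entry != null ? entry : ring.firstEntry()).getValue();
    }

    /** Counts how many of the keys "key0".."key(n-1)" each node owns. */
    public Map<String, Integer> countOwners(int keys) {
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i < keys; i++) {
            counts.merge(ownerOf("key" + i), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] nodes = {"node1", "node2", "node3", "node4"}; // hypothetical names
        System.out.println("1 virtual node:   " + new VirtualNodeRing(nodes, 1).countOwners(1000));
        System.out.println("10 virtual nodes: " + new VirtualNodeRing(nodes, 10).countOwners(1000));
    }
}
```

With a single position per node, one node can easily end up owning a large arc of the ring. More positions per node tend to even out the share each node receives, although the exact counts depend on the hash function used.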
And use the user credentials we created in the first step to access the domain console application at the default URL http://localhost:9990
You should see all 4 data grid nodes running and may also notice that you can stop and start individual nodes directly from the console. This is a great management feature and a vast improvement over previous JBoss console incarnations. As you've also seen, it's super quick to get a cluster of servers up and running.
For more detailed information on the EDG cache instances, connect jconsole to the running server instances and navigate the MBean tree to the jboss.infinispan domain. From here we can view the number of entries, hit ratios etc.
So we now have an EDG distributed cache up and running, we can control the individual nodes using the JBoss management console and we can view cache statistics using any JMX based tool.  All we need now is a client!
Here's a simple Hotrod client that will put a bunch of records into the grid and then run on-demand checks to ensure all the records can still be found.
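import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.Properties;

import org.infinispan.client.hotrod.RemoteCache;
import org.infinispan.client.hotrod.RemoteCacheManager;
import org.infinispan.client.hotrod.impl.ConfigurationProperties;

public class HotrodClient {

    public static void main(String[] args) throws IOException {
        int records = 100;
        String cache = "test-dist-cache";

        // Connection pool settings plus the list of EDG servers to connect to
        Properties properties = new Properties();
        properties.setProperty("testOnBorrow", "true");
        properties.setProperty("testWhileIdle", "true");
        // The host:port entries are missing from this listing; only the ';' separators remain
        properties.setProperty(ConfigurationProperties.SERVER_LIST, ";;;");

        System.out.println("Starting the Hotrod Client\n");

        RemoteCacheManager remoteCacheManager = new RemoteCacheManager(properties);
        RemoteCache<String, String> remoteCache = remoteCacheManager.getCache(cache);

        // Load the test records into the grid
        for (int i = 0; i < records; i++) {
            remoteCache.put("key" + i, "value" + i);
        }
        System.out.println("Loaded " + records + " records into the EDG cache\n");

        BufferedReader bs = new BufferedReader(new InputStreamReader(System.in));
        System.out.println("Press any key to check the records in the cache or 'X' to exit");

        // Re-check the cache each time a line is entered, until 'X' or end of input
        String line;
        while ((line = bs.readLine()) != null && !line.equalsIgnoreCase("X")) {
            System.out.println("Checking to see how many of the " + records + " records can be found in the cache");
            int found = 0;
            for (int i = 0; i < records; i++) {
                if (remoteCache.get("key" + i) != null) {
                    found++;
                }
            }
            System.out.println("Found " + found + " of " + records + " records.");
        }

        remoteCacheManager.stop();
    }
}
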
The dependencies for the client application can be found in the client/java folder of the distribution:
Let's take a quick look at some of the interesting parts in the code.
As I mentioned earlier, the EDG Hotrod client provides connection pooling. This is provided by the Apache commons-pool library, so we can set connection pool properties for the client using the set of available parameters for this library. See http://commons.apache.org/pool/ for a full list.
properties.setProperty("testOnBorrow", "true");
properties.setProperty("testWhileIdle", "true");
We also need to tell the client how to connect to the EDG grid. This is done using the Hotrod client property "infinispan.client.hotrod.server_list".




Don't be alarmed that all the data grid nodes are listed here. The Hotrod client only needs to be able to connect to one server in the list; once a connection is established, the entire current EDG topology view is returned. In practice you only need to specify one EDG server address, provided it is always available, to ensure the client can connect to the whole grid. A full list of client properties can be found here: http://docs.jboss.org/infinispan/5.1/apidocs/org/infinispan/client/hotrod/RemoteCacheManager.html
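The idea that one reachable server is enough to bootstrap the client can be sketched without any EDG classes. The code below is a simplified illustration, not the Hotrod client's own logic: it splits a server list of semicolon-separated host:port entries and picks the first entry that a caller-supplied probe reports as reachable. The class name, the probe and the example addresses are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

public class ServerListBootstrap {

    /** Splits "host:port;host:port" into its entries, skipping blank ones. */
    public static List<String> parse(String serverList) {
        List<String> servers = new ArrayList<>();
        for (String entry : serverList.split(";")) {
            String trimmed = entry.trim();
            if (!trimmed.isEmpty()) {
                servers.add(trimmed);
            }
        }
        return servers;
    }

    /** Returns the first server the probe can reach, if there is one. */
    public static Optional<String> firstReachable(List<String> servers, Predicate<String> canConnect) {
        return servers.stream().filter(canConnect).findFirst();
    }

    public static void main(String[] args) {
        // Hypothetical addresses; substitute the ones from your own grid
        List<String> servers = parse("localhost:11222;localhost:11322");
        // Stand-in probe that pretends only the second server is up
        Predicate<String> probe = server -> server.endsWith(":11322");
        System.out.println(firstReachable(servers, probe).orElse("no server reachable"));
    }
}
```

Listing several servers simply gives the client more chances of finding one that is up when it starts.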




If we run the client we should see output similar to the following:
You can see that the EDG cluster topology is returned immediately to the client on start up.  If we check the number of records in the cache now we'll get this response, hopefully!
Let's stop one of the EDG servers using the console:
and repeat the test:
Look at the exception: the client attempted to use a connection to the server we just stopped. There's no need to panic, because it's intelligent enough to discard the dead connection from the pool and also remove server 4 from the client's view of the topology. The data held on the EDG servers is re-balanced and all records are still found.
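The failover behaviour can be pictured with a small stand-alone sketch. This is a deliberate simplification and not how the Hotrod client is implemented (the real client receives topology updates from the servers): here the client keeps a list of servers, drops one when an operation against it fails, and retries on the next. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

public class FailoverSketch {

    private final List<String> topology = new ArrayList<>();

    public FailoverSketch(List<String> servers) {
        topology.addAll(servers);
    }

    /** Returns a copy of the client's current view of the grid. */
    public List<String> topology() {
        return new ArrayList<>(topology);
    }

    /** Adds a (re)started server back into the client's view. */
    public void serverJoined(String server) {
        if (!topology.contains(server)) {
            topology.add(server);
        }
    }

    /** Runs the operation against the first live server, dropping any that fail. */
    public <R> R execute(Function<String, R> operation) {
        while (!topology.isEmpty()) {
            String server = topology.get(0);
            try {
                return operation.apply(server);
            } catch (RuntimeException deadConnection) {
                // Discard the dead server from the view and try the next one
                topology.remove(server);
            }
        }
        throw new IllegalStateException("no servers left in the topology view");
    }
}
```

An operation that throws for a stopped server is retried transparently on another, which is why the caller still gets its answer despite the exception in the log.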
And if we start the server we just stopped using the console and repeat the test we should see something like this in the client output:
Note that the restarted server is added to the client's topology view and into the connection pool.
You should try stopping more servers, restarting them, checking the cache statistics in jconsole etc. All the original records should still be present with their redundant copies.
So that's it: a quick look at JBoss Enterprise Data Grid and the Hotrod client, a powerful client-server caching architecture.