As any developer worth their salt knows, it’s always prudent to plan for disaster – no matter how confident you are in your own abilities!
So what happens when things go wrong with an application deployed to Weblogic? Are you on your own? Well, the best way to find out is to make something go wrong – so let’s do that!
There’s no need for a complex server configuration to investigate troubleshooting, so I’ve created a domain with a single admin server where I’ll be deploying a particularly buggy application. There are four servlets, each of which contains a different sort of bad practice. Using JMeter to simulate high load, the challenge will be to try to diagnose the problem in each!
Oracle’s JRockit comes with all the usual JVM tools and also provides some of its own. My favourite tool (because everyone needs favourite software) is Mission Control (jrmc). Here’s a quick overview of what it offers us for our scenario:
Mission Control is an Eclipse-based workbench for JVM analysis which has some great tools. Unfortunately, it's not free for normal use, though there is a free developer license. Some of its functions and concepts may be familiar from other tools; however, Mission Control gets to use JRockit-specific monitoring, which provides extra-detailed information! Flight Recorder and Memory Leak Detector are the tools I’ll be focussing on, but they are by no means the only tools available!
Flight Recorder
Flight Recorder is a pretty handy function of the JRockit JVM whereby events, GC activity and a host of other information is logged. Usually, such comprehensive logging incurs a performance cost. Not so in this case; nearly all of this activity is being performed by the JVM anyway – Flight Recorder simply remembers it.
Memory Leak Detector
Mission Control's standard console isn't always enough to drill down to the crux of memory leak issues. Memory leaks are often very subtle problems, so this tool is separate from the main console and can run on live data, meaning you can watch objects grow on the heap while the server is working!
Apache JMeter is the tool we’re going to use to drive our problem servlets with requests. If you’re following this scenario yourself, or have access to JMeter, I would encourage you to take a look at its capabilities. It’s quite powerful and can be used for much more than this sort of scenario.
|The examples loaded into JMeter should look like this. Click the play button to start invoking, the stop button next to it to stop.|
The first problem servlet, immediately after being invoked, ramps up the CPU. It's clearly very busy with something, so that's our first clue!
|This is the console immediately after invoking the JMeter example and starting a Flight Recording. Note that starting and stopping a Flight Recording invokes a System.gc(), so watch out for GC blips at the start and end.|
After collecting the completed Flight Recording we can take a look around each section. Do keep your eyes peeled when looking at each section if you're unfamiliar with Mission Control - there are a lot of small tabs which give a lot of extra information. They're easily missable if you don't know they're there!
In the first section, "Memory", we get to see data on the heap and GC activity. The screenshot below should raise immediate red flags when you think that all this was taken over the course of a single minute! The red bar chart shows each GC event (the Y-axis shows how long it took in ms) and clearly shows a huge amount of work being done by garbage collection. This is a lot of unnecessary pause time and CPU usage!
Solution: There is excessive garbage collection caused by explicit calls to System.gc(). It's very bad practice to include System.gc() in your code for this very reason!
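The kind of code that produces this pattern can be sketched as follows. This is my own minimal illustration of the anti-pattern, not the actual deployed servlet, and the class and method names are invented:

```java
// Hypothetical sketch of Servlet1's anti-pattern: forcing a full,
// stop-the-world garbage collection on every single request.
public class GcHappyHandler {

    // Simulates handling one request: allocate some short-lived data,
    // then explicitly trigger a full collection.
    public static long handleRequest() {
        byte[] scratch = new byte[64 * 1024]; // per-request garbage
        scratch[0] = 1;
        long before = System.nanoTime();
        System.gc(); // the bad practice: explicit GC per request
        return System.nanoTime() - before; // pause time this request paid for
    }

    public static void main(String[] args) {
        long totalNanos = 0;
        for (int i = 0; i < 5; i++) {
            totalNanos += handleRequest();
        }
        System.out.println("Forced-GC overhead over 5 requests: "
                + totalNanos / 1_000_000 + " ms");
    }
}
```

Under JMeter load, every concurrent request repeats that explicit collection, which is exactly the wall of red GC bars the recording showed.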
The first big difference to notice between this servlet and the first is that the server doesn't seem to be doing anything! While in the first case, the CPU activity ramped up to a relatively high level, there was almost no change here. Looking past the Memory and Code sections, the CPU/Threads section suddenly becomes very interesting!
|The area in blue shows the percentage of CPU being used by applications running on the server|
Investigating through the tabs of the CPU/Threads section, there immediately seems to be something wrong. A huge proportion of threads aren't doing anything. Looking more closely, we can see 30 threads sleeping, all but one of which are because of our application! This would explain why there is comparatively low CPU usage.
|All but one of the sleeping threads come from our problem servlet|
Solution: The very low CPU usage is caused by a large number of sleeping threads – the request threads are parked rather than doing work.
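A minimal sketch of how a servlet ends up in this state, assuming the worker threads are simply calling Thread.sleep() (the class, method names and counts here are my own illustration, not the original code):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of Servlet2's anti-pattern: request threads that
// sleep instead of working, leaving the CPU idle while throughput stalls.
public class SleepyHandler {

    // Simulates one request: the worker thread simply parks itself.
    public static String handleRequest(long sleepMillis) {
        try {
            Thread.sleep(sleepMillis); // ties up a container thread doing nothing
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "done";
    }

    public static void main(String[] args) throws InterruptedException {
        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < 30; i++) { // roughly the thread count seen in the recording
            Thread t = new Thread(() -> handleRequest(200));
            t.start();
            workers.add(t);
        }
        Thread.sleep(50); // give the workers time to reach their sleep
        long sleeping = workers.stream()
                .filter(t -> t.getState() == Thread.State.TIMED_WAITING)
                .count();
        System.out.println(sleeping + " of 30 worker threads are sleeping");
        for (Thread t : workers) {
            t.join();
        }
    }
}
```

Sleeping threads show as TIMED_WAITING in a thread dump, which matches what the CPU/Threads tabs reveal: plenty of threads, almost no CPU.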
After such low CPU activity, we're now back to a servlet showing a lot of activity! For comparison, here is the same screen shown in the first screenshot of servlet 2:
|Again, the area in blue is the CPU activity from our application|
The difference is huge! While we're looking at high CPU usage, let's look at the hot threads tab:
|Clicking on any of the threads listed gives details on what spawned them|
Here there are a lot of threads, all of which are very busy! Clicking on each reveals that they are all, with no exceptions, from Servlet3! By way of comparison, my flight recording from the equally busy Servlet1 has just two entries here, both of which are controlled by the JVM.
Solution: The servlet is doing an excessive amount of processing, which would cause other applications to become unresponsive.
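For illustration, a CPU-bound request handler of this kind might look like the sketch below. This is a hypothetical stand-in for whatever Servlet3 actually computes; the names and the busy-work loop are mine:

```java
// Hypothetical sketch of Servlet3's anti-pattern: every request spins the
// CPU in a tight loop, so each container thread shows up as "hot".
public class BusyHandler {

    // Simulates one request: pure computation with no I/O and no yielding.
    public static long handleRequest(long iterations) {
        long acc = 0;
        for (long i = 0; i < iterations; i++) {
            acc += (i * 31) ^ (i >>> 3); // pointless but hard-to-eliminate work
        }
        return acc;
    }

    public static void main(String[] args) throws InterruptedException {
        Thread[] workers = new Thread[4]; // a few concurrent "requests"
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(() -> handleRequest(20_000_000L));
            workers[i].start();
        }
        for (Thread t : workers) {
            t.join();
        }
        System.out.println("all busy workers finished");
    }
}
```

With enough concurrent requests, every available core is saturated by these loops, starving co-deployed applications of CPU time.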
After running the fourth JMeter example, driving Servlet4, I noticed a steady increase in the percentage of the heap being used. Below is a screenshot of the Heap Contents tab of the Memory section. It shows the same sort of graph as the Overview in the General section, but slightly clearer:
The increase is only slight, but remember we are only sampling a minute here (the default Flight Recording length, which can be extended) so our sample size is small. To investigate memory leaks, it's best to use the Memory Leak Detector I mentioned earlier. This time, after the Flight Recording had stopped, I didn't halt the JMeter test, so Memory Leak Detector had more data to work with.
|Memory Leak Detector shows each object type's growth rate relative to the others, updating live while the server is running.|
Here, we can see that the growth rate of byte objects on the heap is particularly high. Right-click and choose to view the allocation in the context menu to investigate the byte objects further. Now that we know what is leaking, it would be useful to know where it is leaking from!
|Right-clicking on an object gives the option to see where the object is being allocated|
In this screenshot, we can see that 41.59% are coming from Servlet4.processRequest and all of those are coming from the doGet() method. Now we have a very good candidate and starting place to look for a leak!
Solution: There is a definite memory leak here where byte objects are created with strong references and so are not garbage collected.
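The classic shape of this leak is a static collection that accumulates strong references. The sketch below is a hypothetical reconstruction of the pattern (names and sizes invented), not the real Servlet4 source:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of Servlet4's leak: buffers are "cached" in a static
// collection, so the strong references keep them ineligible for collection.
public class LeakyHandler {

    private static final List<byte[]> CACHE = new ArrayList<>();

    // Simulates one request: allocates a buffer and retains it forever.
    public static int handleRequest() {
        CACHE.add(new byte[16 * 1024]); // never removed, never collectable
        return CACHE.size();            // grows by one on every request
    }

    public static void main(String[] args) {
        for (int i = 0; i < 100; i++) {
            handleRequest();
        }
        System.out.println("retained buffers: " + CACHE.size());
    }
}
```

The fix is to remove entries when they're no longer needed, bound the cache, or hold the buffers via weak or soft references so the garbage collector can reclaim them under pressure.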
In The Real World
These examples are deliberately as obvious as possible, and we’re running them individually, without other applications running at the same time. In your production environments, these sorts of performance issues are much more subtle and hard to spot. It’s a good idea to use this sort of live monitoring tool regularly, out of your own curiosity, just to get used to what “normal” looks like for your system. You may begin to see patterns in heap usage or the behaviour of the garbage collector, and so be better able to prepare for regular high volumes of requests – and make sure that batch process from another department doesn’t coincide with any significant event!