JVM Flags – Part 8 – GC Logging

The last part of this series is about garbage collection logging and associated flags. The GC log is a highly important tool for revealing potential improvements to the heap and GC configuration or the object allocation pattern of the application. For each GC happening, the GC log provides exact data about its results and duration.

-XX:+PrintGC

The flag -XX:+PrintGC (or the alias -verbose:gc) activates the “simple” GC logging mode, which prints a line for every young generation GC and every full GC. Here is an example:

[GC 246656K->243120K(376320K), 0,0929090 secs]

[Full GC 243120K->241951K(629760K), 1,5589690 secs]

A line begins with the GC type, either “GC” or “Full GC”. Then follows the occupied heap memory before and after the GC (separated by an arrow) and the current capacity of the heap (in parentheses). The line concludes with the duration of the GC (real time in seconds).

Thus, in the first line, 246656K->243120K(376320K) means that the GC reduced the occupied heap memory from 246656K to 243120K. The heap capacity at the time of GC was 376320K, and the GC took 0.0929090 seconds.

The simple GC logging format is independent of the GC algorithm used and thus does not provide any more details. In the above example, we cannot even tell from the log if the GC moved any objects from the young to the old generation. For that reason, detailed GC logging is more useful than the simple one.
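For completeness, here is a hedged example of how simple GC logging could be activated; “MyApp” is a placeholder main class, as in the other examples of this series:

$ java -XX:+PrintGC MyApp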

 

-XX:+PrintGCDetails

If we use -XX:+PrintGCDetails instead of -XX:+PrintGC, we activate the “detailed” GC logging mode, which differs depending on the GC algorithm used. We start by taking a look at the output produced by a young generation GC using the Throughput Collector. For better readability, I split the output into several lines and indented some of them. In the actual log, this is just a single line and less readable for humans.

[GC

[PSYoungGen: 142816K->10752K(142848K)] 246648K->243136K(375296K),

0,0935090 secs

]

[Times: user=0,55 sys=0,10, real=0,09 secs]

We can recognize a couple of elements from the simple GC log: We have a young generation GC which reduced the occupied heap memory from 246648K to 243136K and took 0.0935090 seconds. In addition to that, we obtain information about the young generation itself: the collector used as well as its capacity and occupancy. In our example, the “PSYoungGen” collector was able to reduce the occupied young generation heap memory from 142816K to 10752K.

Since we know the young generation capacity, we can easily tell that the GC was triggered because otherwise the young generation would not have been able to accommodate another object allocation: 142816K of the available 142848K were already used. Furthermore, we can conclude that most of the objects removed from the young generation are still alive and must have been moved to the old generation: Comparing the young generation figures with the total heap figures shows that even though the young generation was almost completely emptied, the total heap occupancy remained roughly the same.

The “Times” section of the detailed log contains information about the CPU time used by the GC, separated into user space (“user”) and kernel space (“sys”) of the operating system. Also, it shows the real time (“real”) that passed while the GC was running (which, at 0.09 seconds, is just a rounded value of the 0.0935090 seconds also shown in the log). If, as in our example, the CPU time is considerably higher than the real time passed, we can conclude that the GC was run using multiple threads. In that case, the CPU time logged is the sum of the CPU times of all GC threads. And indeed, I can reveal that the collector used 8 threads in our example.

Now consider the output of a full GC.

[Full GC

[PSYoungGen: 10752K->9707K(142848K)]

[ParOldGen: 232384K->232244K(485888K)] 243136K->241951K(628736K)

[PSPermGen: 3162K->3161K(21504K)],

1,5265450 secs

]

[Times: user=10,96 sys=0,06, real=1,53 secs]

In addition to details about the young generation, the log also provides us with details about the old and permanent generations. For all three generations, we can see the collector used, the occupancy before and after GC, and the capacity at the time of GC. Note that each number shown for the total heap is equal to the sum of the respective numbers of the young and old generations. In our example, 241951K of the total heap are occupied, 9707K of which are in the young generation and 232244K of which belong to the old generation. The full GC took 1.53 seconds, and the CPU time of 10.96 seconds in user space shows that the GC used multiple threads (like above, 8 threads).

The detailed output for the different generations enables us to reason about the GC cause. If, for any generation, the log states that its occupancy before GC was almost equal to its current capacity, it is likely that this generation triggered the GC. However, in the above example this does not hold for any of the three generations, so what caused the GC in this case? With the Throughput Collector, this can actually happen if GC ergonomics (see part 6 of this series) decides that a GC should be run before one of the generations becomes exhausted.

A full GC may also happen when it is explicitly requested, either by the application or via one of the external JVM interfaces. Such a “system GC” can be identified easily in the GC log because in that case the line starts with “Full GC (System)” instead of “Full GC”.

For the Serial Collector, the detailed GC log is very similar to that of the Throughput Collector. The only real difference is that the various sections have different names because other GC algorithms are being used (for example, the old generation section is called “Tenured” instead of “ParOldGen”). It is helpful that the exact names of the collectors are used, because it enables us to infer some of the garbage collection settings used by the JVM just from the log.

For the CMS Collector, the detailed log for young generation GCs is very similar to that of the Throughput Collector as well, but the same cannot be said for old generation GCs. With the CMS Collector, old generation GCs are run concurrently with the application in several phases. As such, the output itself is different from the output for full GCs. Additionally, the lines for the different phases are usually interleaved in the log with lines for young generation GCs that happen while the concurrent collection is running. Yet, being familiar with all the elements of GC logging that we have already seen for the other collectors, it is not difficult to understand the logs for the different phases. Only when interpreting durations should we be particularly careful and keep in mind that most of the phases run concurrently with the application. Thus, as opposed to stop-the-world collections, long durations for individual phases (or for a complete GC cycle) do not necessarily indicate a problem.

As we know from part 7 of this series, full GCs can still happen when the CMS Collector does not complete a CMS cycle in time. If that happens, the GC log additionally contains a hint as to what caused the full GC, e.g., the well-known “concurrent mode failure”.

In order to keep this article reasonably short, I will refrain from giving a detailed description of the CMS Collector GC log. Moreover, one of the actual authors of the collector has already published a great explanation, which I highly recommend reading.

 

-XX:+PrintGCTimeStamps and -XX:+PrintGCDateStamps

It is possible to add time and date information to the (simple or detailed) GC log. With -XX:+PrintGCTimeStamps a timestamp reflecting the real time passed in seconds since JVM start is added to every line. An example:

0,185: [GC 66048K->53077K(251392K), 0,0977580 secs]

0,323: [GC 119125K->114661K(317440K), 0,1448850 secs]

0,603: [GC 246757K->243133K(375296K), 0,2860800 secs]

And if we specify -XX:+PrintGCDateStamps each line starts with the absolute date and time when it was written:

2014-01-03T12:08:38.102-0100: [GC 66048K->53077K(251392K), 0,0959470 secs]

2014-01-03T12:08:38.239-0100: [GC 119125K->114661K(317440K), 0,1421720 secs]

2014-01-03T12:08:38.513-0100: [GC 246757K->243133K(375296K), 0,2761000 secs]

It is possible to combine the two flags if both outputs are desired. I would recommend always specifying both flags, because the information is highly useful for correlating GC log data with data from other sources.
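As a hedged example, a command line that combines detailed GC logging with both kinds of timestamps could look like this (MyApp again being a placeholder main class):

$ java -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps MyApp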

 

-Xloggc

By default the GC log is written to stdout. With -Xloggc:<file> we may instead specify an output file. Note that this flag implicitly sets -XX:+PrintGC and -XX:+PrintGCTimeStamps as well. Still, I would recommend setting these flags explicitly if desired, in order to safeguard against unexpected changes in new JVM versions.
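For example, the following command line redirects the detailed, date-stamped GC log to a file; the path /var/log/myapp-gc.log is just an assumed example location:

$ java -Xloggc:/var/log/myapp-gc.log -XX:+PrintGCDetails -XX:+PrintGCDateStamps MyApp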

 

“Manageable” Flags

A frequently discussed question is whether GC logging should be activated for production system JVMs. The overhead of GC logging is usually rather small, so I have a clear tendency towards “yes”. However, it is good to know that we do not have to decide in favor of (or against) GC logging when starting the JVM.

The HotSpot JVM has a special (but very small) category of flags called “manageable”. For manageable flags, it is possible to change their values at run time. All the flags that we have discussed here and that start with “PrintGC” belong to the “manageable” category. Thus, we can activate or deactivate GC logging for a running JVM whenever and as often as we want. In order to set manageable flags we can, for example, use the jinfo tool shipped with the JDK or use a JMX client and call the setVMOption operation of the HotSpotDiagnostic MXBean.
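As an illustrative sketch of the jinfo approach, the following commands would switch detailed GC logging on and later off again for a running JVM; 12345 is just a placeholder for the actual process ID:

$ jinfo -flag +PrintGCDetails 12345
$ jinfo -flag -PrintGCDetails 12345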

 

References

The entire series of JVM Flags (Part 1 through Part 8) is the creation of the maestro Patrick Peschlow. Special thanks to him for compiling these facts in such a fluent manner.

JVM Flags – Part 7 – CMS Collector

The Concurrent Mark Sweep Collector (“CMS Collector”) of the HotSpot JVM has one primary goal: low application pause times. This goal is important for most interactive applications like web applications. Before we take a look at the relevant JVM flags, let us briefly recap the operation of the CMS Collector and the main challenges that may arise when using it.

Just like the Throughput Collector (see part 6 of the series), the CMS Collector handles objects in the old generation, yet its operation is much more complex. The Throughput Collector always pauses the application threads, possibly for a considerable amount of time, which however enables its algorithms to safely ignore the application. In contrast to that, the CMS Collector is designed to run mostly concurrently with the application threads and to cause only a few (and short) pauses. The downside of running GC concurrently with the application is that various synchronization and data inconsistency issues may arise. In order to achieve safe and correct concurrent execution, a GC cycle of the CMS Collector is divided into a number of consecutive phases.

 

Phases of the CMS Collector

A GC cycle of the CMS Collector consists of six phases. Four of the phases (the names of which start with “Concurrent”) are run concurrently to the actual application while the other two phases need to stop the application threads.

  1. Initial Mark: The application threads are paused in order to collect their object references. When this is finished, the application threads are started again.
  2. Concurrent Mark: Starting from the object references collected in phase 1, all other referenced objects are traversed.
  3. Concurrent Preclean: Changes to object references made by the application threads while phase 2 was running are used to update the results from phase 2.
  4. Remark: As phase 3 is concurrent as well, further changes to object references may have happened. Therefore, the application threads are stopped once more to take any such updates into account and ensure a correct view of referenced objects before the actual cleaning takes place. This step is essential because it must be avoided to collect any objects that are still referenced.
  5. Concurrent Sweep: All objects that are not referenced anymore get removed from the heap.
  6. Concurrent Reset: The collector does some housekeeping work so that there is a clean state when the next GC cycle starts.

A common misconception is that the CMS collector runs fully concurrent to the application. We have seen that this is not the case, even if the stop-the-world phases are usually very short when compared to the concurrent phases.

It should be noted that, even though the CMS Collector offers a mostly concurrent solution for old generation GCs, young generation GCs are still handled using a stop-the-world approach. The rationale behind this is that young generation GCs are typically short enough so that the resulting pause times are satisfactory even for interactive applications.

 

Challenges

When using the CMS Collector in real-world applications, we face two major challenges that may create a need for tuning:

  • Heap fragmentation
  • High object allocation rate

Heap fragmentation is possible because, unlike the Throughput Collector, the CMS Collector does not contain any mechanism for defragmentation. As a consequence, an application may find itself in a situation where an object cannot be allocated even though the total heap space is far from exhausted – simply because there is no contiguous memory area available to fully accommodate the object. When this happens, the concurrent algorithms do not help anymore and thus, as a last resort, the JVM triggers a full GC. Recall that a full GC runs the algorithm used by the Throughput Collector and thus resolves the fragmentation issues – but it also stops the application threads. Thus, despite all the concurrency that the CMS Collector brings, there is still a risk of a long stop-the-world pause happening. This is “by design” and cannot be switched off – we can only reduce its likelihood by tuning the collector. This is problematic for interactive applications that would like a 100% guarantee of being safe from any noticeable stop-the-world pauses.

The second challenge is high object allocation rate of the application. If the rate at which objects get instantiated is higher than the rate at which the collector removes dead objects from the heap, the concurrent algorithm fails once again. At some point, the old generation will not have enough space available to accommodate an object that is to be promoted from the young generation. This situation is referred to as “concurrent mode failure”, and the JVM reacts just like in the heap fragmentation scenario: It triggers a full GC.

When one of these scenarios manifests itself in practice (which, as is so often the case, usually happens on a production system), it often turns out that there is an unnecessarily large number of objects in the old generation. One possible countermeasure is to increase the young generation size, in order to prevent premature promotion of short-lived objects into the old generation. Another approach is to use a profiler, or take heap dumps of the running system, to analyze the application for excessive object allocation, identify the objects responsible, and eventually reduce the number of objects allocated.

In the following we will take a look at the most relevant JVM flags available for tuning the CMS Collector.

 

-XX:+UseConcMarkSweepGC

This flag is needed to activate the CMS Collector in the first place. By default, HotSpot uses the Throughput Collector instead.
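For example, the minimal command line to do so is simply (MyApp being a placeholder main class):

$ java -XX:+UseConcMarkSweepGC MyApp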

 

-XX:+UseParNewGC

When the CMS collector is used, this flag activates the parallel execution of young generation GCs using multiple threads. It may seem surprising at first that we cannot simply reuse the flag -XX:+UseParallelGC known from the Throughput Collector, because conceptually the young generation GC algorithms used are the same. However, since the interplay between the young generation GC algorithm and the old generation GC algorithm is different with the CMS collector, there are two different implementations of young generation GC and thus two different flags.

Note that with recent JVM versions -XX:+UseParNewGC is enabled automatically when -XX:+UseConcMarkSweepGC is set. As a consequence, if parallel young generation GC is not desired, it needs to be disabled by setting -XX:-UseParNewGC.

 

-XX:+CMSConcurrentMTEnabled

When this flag is set, the concurrent CMS phases are run with multiple threads (and thus, multiple GC threads work in parallel with all the application threads). This flag is already activated by default. If serial execution is preferred, which may make sense depending on the hardware used, multithreaded execution can be deactivated via -XX:-CMSConcurrentMTEnabled.

 

-XX:ConcGCThreads

The flag -XX:ConcGCThreads=<value> (in earlier JVM versions also known as -XX:ParallelCMSThreads) defines the number of threads with which the concurrent CMS phases are run. For example, value=4 means that all concurrent phases of a CMS cycle are run using 4 threads. Even though a higher number of threads may well speed up the concurrent CMS phases, it also causes additional synchronization overhead. Thus, for a particular application at hand, it should be measured if increasing the number of CMS threads really brings an improvement or not.

If this flag is not explicitly set, the JVM computes a default number of parallel CMS threads which depends on the value of the flag -XX:ParallelGCThreads known from the Throughput Collector. The formula used is ConcGCThreads = (ParallelGCThreads + 3)/4. Thus, with the CMS Collector, the flag -XX:ParallelGCThreads does not only affect stop-the-world GC phases but also the concurrent phases.
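To make the formula concrete: assuming -XX:ParallelGCThreads=13, the default would be ConcGCThreads = (13 + 3)/4 = 4. An explicit setting with an example value looks like this:

$ java -XX:+UseConcMarkSweepGC -XX:ConcGCThreads=4 MyApp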

In summary, there are quite a few ways of configuring multithreaded execution of the CMS Collector. Precisely for this reason, it is recommended to first run the CMS Collector with its default settings and then measure whether there is a need for tuning at all. Only if measurements in a production system (or a production-like test system) show that the pause time goals of the application are not being met should GC tuning via these flags be considered.

 

-XX:CMSInitiatingOccupancyFraction

The Throughput Collector starts a GC cycle only when the heap is full, i.e., when there is not enough space available to store a newly allocated or promoted object. With the CMS Collector, it is not advisable to wait this long because the application keeps on running (and allocating objects) during concurrent GC. Thus, in order to finish a GC cycle before the application runs out of memory, the CMS Collector needs to start a GC cycle much earlier than the Throughput Collector.

As different applications have different object allocation patterns, the JVM collects run time statistics about the actual object allocations (and deallocations) it observes and uses them to determine when to start a CMS GC cycle. To bootstrap this process, the JVM takes a hint when to start the very first CMS run. The hint may be set via -XX:CMSInitiatingOccupancyFraction=<value> where value denotes the utilization of old generation heap space in percent. For example, value=75 means that the first CMS cycle starts when 75% of the old generation is occupied. Traditionally, the default value of CMSInitiatingOccupancyFraction is 68 (which was determined empirically quite some time ago).

 

-XX:+UseCMSInitiatingOccupancyOnly

We can use the flag -XX:+UseCMSInitiatingOccupancyOnly to instruct the JVM not to base its decision when to start a CMS cycle on run time statistics. Instead, when this flag is enabled, the JVM uses the value of CMSInitiatingOccupancyFraction for every CMS cycle, not just for the first one. However, keep in mind that in the majority of cases the JVM does a better job of making GC decisions than us humans. Therefore, we should use this flag only if we have good reason (i.e., measurements) as well as really good knowledge of the lifecycle of objects generated by the application.
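As an illustrative configuration (the value 75 is an example, not a recommendation), the following command line fixes the CMS trigger at 75 percent old generation occupancy for every cycle:

$ java -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly MyApp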

 

-XX:+CMSClassUnloadingEnabled

In contrast to the Throughput Collector, the CMS Collector does not perform GC in the permanent generation by default. If permanent generation GC is desired, it can be enabled via -XX:+CMSClassUnloadingEnabled. In earlier JVM versions, it may be required to additionally set the flag -XX:+CMSPermGenSweepingEnabled. Note that, even if this flag is not set, there will be an attempt to garbage-collect permanent generation once it runs out of space, but the collection will not be concurrent – instead, once again, a full GC will be run.

 

-XX:+CMSIncrementalMode

This flag activates the incremental mode of the CMS Collector. Incremental mode pauses the concurrent CMS phases regularly, so as to yield completely to the application threads. As a consequence, the collector will take longer to complete a whole CMS cycle. Therefore, using incremental mode only makes sense if it has been measured that the threads running a normal CMS cycle interfere too much with the application threads. This happens rather rarely on modern server hardware which usually has enough processors available to accommodate concurrent GC.

 

-XX:+ExplicitGCInvokesConcurrent and -XX:+ExplicitGCInvokesConcurrentAndUnloadsClasses

Nowadays, the widely accepted best practice is to refrain from explicitly invoking GC (a so-called “system GC”) by calling System.gc() in the application. While this advice holds regardless of the GC algorithm used, it is worth mentioning that a system GC is an especially unfortunate event when the CMS Collector is used, because it triggers a full GC by default. Luckily, there is a way to change the default. The flag -XX:+ExplicitGCInvokesConcurrent instructs the JVM to run a CMS GC instead of a full GC whenever system GC is requested. There is a second flag, -XX:+ExplicitGCInvokesConcurrentAndUnloadsClasses, which additionally ensures that the permanent generation is included into the CMS GC in case of a system GC request. Thus, by using these flags, we can safeguard ourselves against unexpected stop-the-world system GCs.
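A hedged example that combines the CMS Collector with concurrent handling of system GC requests (including the permanent generation) could look like this:

$ java -XX:+UseConcMarkSweepGC -XX:+ExplicitGCInvokesConcurrentAndUnloadsClasses MyApp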

 

-XX:+DisableExplicitGC

And while we are on the subject… this is a good opportunity to mention the flag -XX:+DisableExplicitGC which tells the JVM to completely ignore system GC requests (regardless of the type of collector used). For me, this flag belongs to a set of “default” flags that can be safely specified on every JVM run without further thinking.

JVM Flags – Part 6 – Throughput Collector

For most application areas that we find in practice, a garbage collection (GC) algorithm is being evaluated according to two criteria:

  1. The higher the achieved throughput, the better the algorithm.
  2. The smaller the resulting pause times, the better the algorithm.

Let us first clarify the terms “throughput” and “pause times” in the context of GC. The JVM always executes a GC in dedicated threads, the “GC threads”. Whenever GC threads are active, they compete against the actual “application threads” for available processors and thus CPU time. Simplified just a little bit, we refer by “throughput” to the fraction of total program execution time at which the application threads are running. For example, a throughput of 99/100 means that on average the application threads are running 99 out of 100 seconds of program execution time, while the GC threads are only running for one second during the same time span.

The term “pause time” refers to a time span in which the application threads are paused completely in favor of the GC threads. For example, a pause time of 100 milliseconds during a GC means that no application thread was active during that 100 millisecond interval. If we then refer to an “average pause time” of 100 milliseconds for a running application, we state that all pause times observed for that application had an average length of 100 milliseconds. Similarly, a “maximum pause time” of 100 milliseconds means that none of the pause times observed were larger than 100 milliseconds.

 

Throughput vs. pause times

A high throughput is desirable because only the application threads perform “productive” work in the perception of the end user of the application. Intuitively, an application runs faster if the throughput is higher. Low pause times are desirable as well, because from the end user’s perspective a hanging application is always undesirable regardless of whether the stalls are being caused by GC or other reasons. Depending on the type of application, even short pauses of 200 milliseconds can disrupt end user experience. Thus, it is important to have a low maximum pause time, in particular for an interactive application.

Unfortunately, “high throughput” and “low pause times” are competing goals. Think about it this way, again a bit simplified for the sake of clarity: A GC requires certain preconditions in order to run safely. For example, it must be guaranteed that application threads do not modify the state of objects while the GC threads try to decide which objects are still referenced and which ones are not. For this reason, the application threads must be stopped during a GC (or, depending on the algorithm used, only during certain phases of a GC). This, however, causes additional costs for thread scheduling: direct costs through context switches and indirect costs because of cache effects. Together with the costs for additional JVM-internal safety measures, this means that each GC comes along with some non-negligible overhead, which adds up with the time taken by the GC threads to perform their actual work. Therefore, we can maximize throughput by running the GC as seldom as possible, i.e., only when it is unavoidable, to save all the overhead associated with it.

However, executing the GC only infrequently means that whenever a GC is run it has much more work to do, as the number of objects that accumulated on the heap in the meantime is much higher. A single GC takes more time until completion, which in turn causes higher average and maximum pause times. Thus, with low pause times in mind, it would be desirable to run the GC more frequently so that each single run completes more quickly. This in turn adds overhead and causes throughput to decline, and we are back where we started.

In summary, when designing (or using!) a GC algorithm we have to decide what we are aiming for: A GC algorithm may target one of the two goals only (i.e., solely focus on maximum throughput or on minimal pause times) or try to find a compromise between them.

 

Garbage collection on the HotSpot JVM

We already discussed the young generation GC in part 5 of this series. For the old generation, the HotSpot JVM mainly offers two classes of GC algorithms (leaving aside the new G1 GC algorithm for now). The first class contains algorithms that try to maximize throughput while the second class tries to minimize pause times. Today we focus on the first, “throughput-oriented” class.

As we want to focus on configuration flags, I will only give a brief overview of the throughput-oriented GC algorithms offered by HotSpot. The GC algorithm is triggered when an object allocation in the old generation fails due to lack of space (usually, the “allocation” is in fact a promotion of an object from the young generation). Starting at so-called “GC roots”, the GC then searches the heap for reachable objects and marks them as alive. Afterwards, the GC moves the live objects within the old generation so that they occupy a single, non-fragmented memory block, and takes a note that the remaining memory area is free. That is, we don’t follow a copying strategy into a different heap area, like the young generation GC algorithm does. Instead, we keep all objects in the same heap area, thereby defragmenting that area. The collectors use one or more threads to perform GC. When more than one thread is used, the different steps of the algorithm are subdivided such that each GC thread mostly works in its own area without interfering with others. During a GC, all application threads are paused, getting restarted only when the GC is finished. Now let us take a look at the most important flags regarding the throughput-oriented GC algorithms.

 

-XX:+UseSerialGC

We use this flag to activate the serial, i.e., single-threaded version of the throughput-oriented garbage collector. Both the young generation and the old generation GC will be executed by a single GC thread only. This flag is to be recommended for JVMs that only have a single processor core available. In such a situation, using multiple GC threads would even be counterproductive because these threads would compete for CPU resources and cause synchronization overhead but never actually run in parallel.

 

-XX:+UseParallelGC

With this flag, we tell the JVM to execute the young generation GC in parallel using multiple GC threads. With Java 6, in my opinion, this flag should not be used because -XX:+UseParallelOldGC is clearly preferable. Note that with Java 7 the situation has changed a bit (see, e.g., this overview) so that -XX:+UseParallelGC may be used to the same effect as -XX:+UseParallelOldGC.

 

-XX:+UseParallelOldGC

The naming of this flag is a little unfortunate because “old” sounds like “deprecated”. However, “old” actually refers to the old generation, which explains why -XX:+UseParallelOldGC is preferable over -XX:+UseParallelGC: in addition to a parallel young generation GC, it also activates parallel old generation GC. I recommend using this flag whenever high throughput is desired and the JVM has two or more processor cores available.

As a sidenote, the parallel versions of the throughput-oriented HotSpot GC algorithms are often called “throughput collectors” (or also: “the Throughput Collector”) because they aim at increased throughput through parallel execution.

 

-XX:ParallelGCThreads

With -XX:ParallelGCThreads=<value> we can specify the number of GC threads to use for parallel GC. For example, with -XX:ParallelGCThreads=6 each parallel GC will be executed with six threads. If we don’t explicitly set this flag, the JVM will use a default value which is computed based on the number of available (virtual) processors. The determining factor is the value N returned by the Java method Runtime.availableProcessors(). For N <= 8 parallel GC will use just as many, i.e., N GC threads. For N > 8 available processors, the number of GC threads will be computed as 3+5N/8.
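As a quick worked example of the default: on a machine where Runtime.availableProcessors() returns 16, the JVM would use 3 + 5*16/8 = 13 GC threads. An explicit setting, here with an assumed example value, looks like this:

$ java -XX:+UseParallelOldGC -XX:ParallelGCThreads=8 MyApp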

Using the default setting makes most sense when the JVM uses the system and its processors exclusively. However, if more than one JVM (or other CPU-hungry systems) are all running on the same machine, we should use -XX:ParallelGCThreads in order to reduce the number of GC threads to an adequate value. For example, if four server JVMs are running on a machine with 16 processor cores, then -XX:ParallelGCThreads=4 is a sensible choice so that GCs of different JVMs don’t interfere with each other.

 

-XX:-UseAdaptiveSizePolicy

The throughput collectors offer an interesting (but common, at least on modern JVMs) mechanism to improve the user-friendliness of GC configuration. This mechanism is part of what is known as “ergonomics”, a concept introduced for HotSpot with Java 5. With ergonomics, the garbage collector may dynamically apply modifications to the sizes of the different heap areas as well as to the GC settings if it has evidence that these modifications would improve GC performance. The precise meaning of “improve GC performance” may be specified by the user via the flags -XX:GCTimeRatio and -XX:MaxGCPauseMillis (see below).

It is important to know that ergonomics is activated by default – which is fine, as adaptive behavior is one of the biggest strengths of the JVM. Still, sometimes we may have a pretty clear idea of what settings are best for a particular application, and in these cases we might not want the JVM to mess around with our settings. Whenever we find ourselves in such a situation, we may consider deactivating some of the ergonomics by setting -XX:-UseAdaptiveSizePolicy.

 

-XX:GCTimeRatio

With -XX:GCTimeRatio=<value> we tell the JVM a target value for the throughput to achieve. More precisely, a value of -XX:GCTimeRatio=N specifies a target fraction of N/(N+1) for the execution time of the application threads (relative to the total program execution time). For example, with -XX:GCTimeRatio=9 we demand that the application threads should be active for at least 9/10 of the total execution time (and, accordingly, the GC threads for the remaining 1/10). Based on measurements at run time, the JVM will then try to modify the heap and GC configuration such that the target throughput is reached. The default value for -XX:GCTimeRatio is 99, i.e., the application threads should run for at least 99 percent of the total execution time.

 

-XX:MaxGCPauseMillis

The flag -XX:MaxGCPauseMillis=<value> tells the JVM a target value (in milliseconds) for the maximum pause time. At run time, the throughput collector then computes statistics (a weighted average and the standard deviation) over the pause times it observes. If the statistics suggest that there is a risk of experiencing pause times exceeding the target value, the JVM modifies heap and GC settings in order to reduce them. Note that the statistics are computed separately for the young and old generation GCs. Also note that, by default, there is no target value set for the maximum pause time.

If both target values for the maximum pause time and the minimum throughput are set, achieving the maximum pause time goal has higher priority. Of course, there is no guarantee at all that the JVM will achieve either goal, even though it will try hard to do so. In the end, everything depends on the behavior of the application at hand.
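For illustration, the following command line asks the Throughput Collector for at least 95 percent throughput (GCTimeRatio=19, i.e., 19/20) and a maximum pause time of 200 milliseconds; the concrete numbers are placeholders that would have to come from measurements of the application at hand:

$ java -XX:+UseParallelOldGC -XX:GCTimeRatio=19 -XX:MaxGCPauseMillis=200 MyApp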

When setting a maximum pause time target, we should take care not to choose too small a value. As we know by now, in order to keep pause times low, the JVM has to increase the total number of GCs which may seriously impact the achievable throughput. That’s why for applications that require low pause times as their primary goal (which is the case for most web applications), I would recommend to not use the throughput collector at all, but instead switch to the CMS collector. The CMS collector will be the topic of the next part of this series.

JVM Flags – Part 5 – Young Generation Garbage Collection

In this part of our series we focus on one of the major areas of the heap, the “young generation”. First of all, we discuss why an adequate configuration of the young generation is so important for the performance of our applications. Then we move on to learn about the relevant JVM flags.

From a purely functional perspective, a JVM does not need a young generation at all – it can do with a single heap area. The sole reason for having a young generation in the first place is to optimize the performance of garbage collection (GC). More specifically, the separation of the heap into a young generation and an old generation has two benefits: It simplifies the allocation of new objects (because allocation only affects the young generation) and it allows for a more efficient cleanup of objects not needed anymore (by using different GC algorithms in the two generations).

Extensive measurements across a wide range of object-oriented programs have shown that many applications share a common characteristic: Most objects “die” young, i.e., after their creation they are not referenced for long in the program flow. Also, it has been observed that young objects are rarely referenced by older objects. Now if we combine these two observations, it becomes apparent that it is desirable for GC to have quick access to young objects – for example in a separate heap area called the “young generation”. Within this heap area, GC can then identify and collect “dead” young objects quickly without having to search for them among all the old objects that will still live on the heap for a long time.

The Sun/Oracle HotSpot JVM further divides the young generation into three sub-areas: one large area named “Eden” and two smaller “survivor spaces” named “From” and “To”. As a rule, new objects are allocated in “Eden” (with the exception that if a new object is too large to fit into “Eden” space, it will be directly allocated in the old generation). During a GC, the live objects in “Eden” first move into the survivor spaces and stay there until they have reached a certain age (in terms of numbers of GCs passed since their creation), and only then they are transferred to the old generation. Thus, the role of the survivor spaces is to keep young objects in the young generation for a little longer than just their first GC, in order to be able to still collect them quickly should they die soon afterwards.

Based on the assumption that most of the young objects may be deleted during a GC, a copying strategy (“copy collection”) is being used for young generation GC. At the beginning of a GC, the survivor space “To” is empty and objects can only exist in “Eden” or “From”. Then, during the GC, all objects in “Eden” that are still being referenced are moved into “To”. Regarding “From”, the still referenced objects in this space are handled depending on their age. If they have not reached a certain age (the “tenuring threshold”), they are also moved into “To”. Otherwise they are moved into the old generation. At the end of this copying procedure, “Eden” and “From” can be considered empty (because they only contain dead objects), and all live objects in the young generation are located in “To”. Should “To” fill up at some point during the GC, all remaining objects are moved into the old generation instead (and will never return). As a final step, “From” and “To” swap their roles (or, more precisely, their names) so that “To” is empty again for the next GC and “From” contains all remaining young objects.

In summary, an object is usually born in “Eden” and then alternates between the survivor spaces on each young generation GC. If the object survives until a certain number of young generation GCs has passed, it will finally be moved into the old generation and stay there with all other long-lived objects. When the object eventually dies in the old generation, it has to be collected with higher effort, by one of the more heavyweight GC algorithms (a plain copy collection cannot be used here – there simply is no place to copy to).

It now becomes clear why young generation sizing is so important: If the young generation is too small, short-lived objects will quickly be moved into the old generation where they are harder to collect. Conversely, if the young generation is too large, we will have lots of unnecessary copying for long-lived objects that will later be moved to the old generation anyway. Thus we need to find a compromise somewhere between small and large young generation size. Unfortunately, finding the right compromise for a particular application can often only be done by systematic measurement and tuning. And that’s where the JVM flags come into play.

 

-XX:NewSize and -XX:MaxNewSize

Similar to the total heap size (with -Xms and -Xmx) it is possible to explicitly set a lower and upper bound for the size of the young generation. However, when setting -XX:MaxNewSize we need to take into account that the young generation is only one part of the heap and that the larger we choose its size the smaller the old generation will be. For stability reasons it is not allowed to choose a young generation size larger than the old generation, because in the worst case it may become necessary for a GC to move all objects from the young generation into the old generation. Thus -Xmx/2 is an upper bound for -XX:MaxNewSize.

For performance reasons we may also specify the initial size of the young generation using the flag -XX:NewSize. This is useful if we know the rate at which young objects are being allocated (for example because we measured it!) and can save some of the costs required for slowly growing the young generation to that size over time.
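As a hedged example (the sizes are placeholders, not recommendations), the following command line fixes the young generation at 256 MB within a 1 GB heap:

$ java -Xms1g -Xmx1g -XX:NewSize=256m -XX:MaxNewSize=256m MyApp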

 

-XX:NewRatio

It is also possible to specify the young generation size in relation to the size of the old generation. The potential advantage of this approach is that the young generation will grow and shrink automatically when the JVM dynamically adjusts the total heap size at run time. The flag -XX:NewRatio allows us to specify the factor by which the old generation should be larger than the young generation. For example, with -XX:NewRatio=3 the old generation will be three times as large as the young generation. That is, the old generation will occupy 3/4 and the young generation will occupy 1/4 of the heap.

If we mix absolute and relative sizing of the young generation, the absolute values always have precedence. Consider the following example:

$ java -XX:NewSize=32m -XX:MaxNewSize=512m -XX:NewRatio=3 MyApp

With these settings, the JVM will try to size the young generation at one third of the old generation size, but it will never let young generation size fall below 32 MB or exceed 512 MB.

There is no general rule as to whether absolute or relative young generation sizing is preferable. If we know the memory usage of our application well, it can be advantageous to specify a fixed size both for the total heap and the young generation, and it can also be useful to specify a ratio. If we only know a little or maybe nothing at all about our application in this respect, the correct approach is to just let the JVM do the work and not mess around with the flags. If the application runs smoothly, we can be happy that we didn’t put in extra effort where none was needed. And should we encounter performance problems or OutOfMemoryErrors, we would still need to first perform a series of meaningful measurements to narrow down the root cause of the problem before moving on to tuning.

 

-XX:SurvivorRatio

The flag -XX:SurvivorRatio is similar to -XX:NewRatio but applies to the areas inside the young generation. The value of -XX:SurvivorRatio specifies how large “Eden” should be sized relative to one of the two survivor spaces. For example, with -XX:SurvivorRatio=10 we dimension “Eden” ten times as large as “To” (and at the same time ten times as large as “From”). As a result, “Eden” occupies 10/12 of the young generation while “To” and “From” each occupy 1/12. Note that the two survivor spaces are always equal in size.
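As a quick worked example (the sizes are assumptions for illustration only): with a young generation of 480 MB and -XX:SurvivorRatio=10, “Eden” occupies 10/12 of that space (400 MB) while “From” and “To” each occupy 1/12 (40 MB). On the command line this could be combined with explicit young generation sizing as follows:

$ java -XX:NewSize=480m -XX:MaxNewSize=480m -XX:SurvivorRatio=10 MyApp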

What effect does survivor space sizing have? Suppose that the survivor spaces are very small compared to “Eden”. Then we have lots of space in “Eden” for newly allocated objects, which is desirable. If all these objects can be collected during the next GC, “Eden” is empty again and everything is fine. However, if some of these young objects are still being referenced, we have only little space in the survivor spaces to accommodate them. As a consequence, most of these objects will be moved to the old generation right after their first GC, which is not desirable. Now let us consider the opposite situation: Suppose that the survivor spaces are relatively large in size. Then they have lots of space to fulfill their main purpose, to accommodate objects that survive one or more GCs but still die young. However, the smaller “Eden” space will be exhausted more quickly, which increases the number of young generation GCs performed. This is undesirable.

In summary, we want to minimize the number of short-lived objects that are prematurely moved into the old generation, but we also want to minimize the number and duration of young generation GCs. Once again we need to find a compromise, which in turn depends on the characteristics of the application at hand. A good starting point for finding an adequate compromise is to learn about the age distribution of the objects in the particular application.

 

-XX:+PrintTenuringDistribution

With the flag -XX:+PrintTenuringDistribution we tell the JVM to print the age distribution of all objects contained in the survivor spaces on each young generation GC. Take the following example:

 

Desired survivor size 75497472 bytes, new threshold 15 (max 15)

– age 1: 19321624 bytes, 19321624 total

– age 2: 79376 bytes, 19401000 total

– age 3: 2904256 bytes, 22305256 total

The first line tells us that the target utilization of the “To” survivor space is about 75 MB. It also shows some information about the “tenuring threshold”, which represents the number of GCs that an object may stay in the young generation before it is moved into the old generation (i.e., the maximum age of the object before it gets promoted). In this example, we see that the current tenuring threshold is 15 and that its maximum value is 15 as well.

The next lines show, for each object age lower than the tenuring threshold, the total number of bytes of all objects that currently have that age (if no objects currently exist for a certain age, that line is omitted). In the example, about 19 MB have already survived one GC, about 79 KB have survived two GCs, and about 3 MB have survived three GCs. At the end of each line, we see the accumulated byte count of all objects up to that age. Thus, the “total” value in the last line indicates that the “To” survivor space currently contains about 22 MB of object data. As the target utilization of “To” is 75 MB and the current tenuring threshold is 15, we can conclude that no objects have to be promoted to the old generation as part of the current young generation GC. Now suppose that the next GC leads to the following output:

Desired survivor size 75497472 bytes, new threshold 2 (max 15)

– age 1: 68407384 bytes, 68407384 total

– age 2: 12494576 bytes, 80901960 total

– age 3: 79376 bytes, 80981336 total

– age 4: 2904256 bytes, 83885592 total

Let us compare the output to the previous tenuring distribution. Apparently, all the objects of age 2 and 3 from the previous output are still located in “To”, because here we see exactly the same number of bytes printed for age 3 and 4. We can also conclude that some of the objects in “To” have been successfully collected by the GC, because now we only have 12 MB of objects of age 2 while in the previous output we had 19 MB listed for age 1. Finally, we see that about 68 MB of new objects, shown at age 1, have been moved from “Eden” into “To” during the last GC.

Note that the total number of bytes in “To” – in this case almost 84 MB – is now larger than the desired number of 75 MB. As a consequence, the JVM has reduced the tenuring threshold from 15 to 2, so that with the next GC some of the objects will be forced to leave “To”. These objects will then either be collected (if they have died in the meantime) or moved to the old generation (if they are still referenced).

 

-XX:InitialTenuringThreshold, -XX:MaxTenuringThreshold and -XX:TargetSurvivorRatio

The tuning knobs shown in the output of -XX:+PrintTenuringDistribution can be adjusted by various flags. With -XX:InitialTenuringThreshold and -XX:MaxTenuringThreshold we can set the initial and maximum value of the tenuring threshold, respectively. Additionally, we can use -XX:TargetSurvivorRatio to specify the target utilization (in percent) of “To” at the end of a young generation GC. For example, the combination -XX:MaxTenuringThreshold=10 -XX:TargetSurvivorRatio=90 sets an upper bound of 10 for the tenuring threshold and a target utilization of 90 percent for the “To” survivor space.
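For illustration, the following command line prints the tenuring distribution while applying exactly those example settings:

$ java -XX:+PrintTenuringDistribution -XX:MaxTenuringThreshold=10 -XX:TargetSurvivorRatio=90 MyApp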

While there are different approaches to use these flags to tune young generation behavior, no general guideline is available. We restrict ourselves to two cases that are pretty clear:

  • If the tenuring distribution shows that many objects just grow older and older before finally reaching the maximum tenuring threshold, this indicates that the value of -XX:MaxTenuringThreshold may be too large.
  • If the value of -XX:MaxTenuringThreshold is larger than 1 but most objects never reach an age larger than 1, we should take a look at the target utilization of “To”. Should the target utilization never be reached, then we know that all young objects get collected by the GC, which is exactly what we want. However, if the target utilization is frequently reached, then at least some of the objects beyond age 1 have been moved into the old generation, and maybe prematurely so. In this case, we can try to tune the survivor spaces by increasing their size or target utilization.

 

-XX:+NeverTenure and -XX:+AlwaysTenure

Finally, I would like to quickly mention two rather exotic flags which we can use to test two extremes of young generation GC behavior. If -XX:+NeverTenure is set, objects are never promoted to the old generation. This behavior makes sense when we are sure that we don’t need an old generation at all. However, as such, the flag is apparently very risky and also wastes at least half of the reserved heap memory. The inverse behavior can be triggered with -XX:+AlwaysTenure, i.e., no survivor spaces are used so that all young objects are immediately promoted to the old generation on their first GC. Again, it is difficult to find a valid use case for this flag – it can be fun to see what happens in a testing environment, but apart from that I would not recommend using either flag.

 

Conclusion

It is important to run an application with an adequate configuration for the young generation, and there are quite a few flags to tune it. However, tuning the young generation without considering the old generation as well rarely leads to success. When tuning the heap or the GC settings, we should always take the interplay between the young and old generation into account.

In the next two parts of this series we will learn about two fundamental old generation GC strategies offered by the HotSpot JVM. We will get to know the “Throughput Collector” and the “Concurrent Low Pause Collector” and take a look at their basic principles, algorithms, and tuning flags.

JVM Flags – Part 4 – Heap Tuning

Ideally, a Java application runs just fine with the default JVM settings so that there is no need to set any flags at all. However, in case of performance problems (which unfortunately arise quite often) some knowledge about relevant JVM flags is a welcome companion. In this part of our series, we will take a look at some JVM flags from the area of memory management.

All established HotSpot memory management and garbage collection algorithms are based on the same basic partitioning of the heap: The “young generation” contains newly allocated and short-lived objects while the “old generation” contains long-lived objects beyond a certain age. In addition to that, the “permanent generation” contains objects expected to live throughout the whole JVM lifecycle, e.g., the object representations of loaded classes or the String intern cache. For the following discussion, we assume that the heap is partitioned according to this classic strategy of young, old, and permanent generations. However, note that other strategies are also promising, one prominent example being the new G1 garbage collector, which blurs the distinction between the young and old generations. Also, current developments seem to indicate that some future version of the HotSpot JVM will not have the separation between the old and permanent generations anymore.

-Xms and -Xmx (or: -XX:InitialHeapSize and -XX:MaxHeapSize)

Arguably the most popular JVM flags at all are -Xms and -Xmx, which allow us to specify the initial and maximum JVM heap size, respectively. Both flags expect a value in bytes but also support a shorthand notation where “k” or “K” represent “kilo”, “m” or “M” represent “mega”, and “g” or “G” represent “giga”. For example, the following command line starts the Java class “MyApp” setting an initial heap size of 128 megabytes and a maximum heap size of 2 gigabytes:

$ java -Xms128m -Xmx2g MyApp

Note that, in practice, the initial heap size turns out to also be a lower bound for the heap size, i.e., a minimum heap size. While it is true that the JVM may dynamically resize the heap at run time, and thus in theory we might observe the heap size fall below its initial size, I never witnessed such a case in practice even with very low heap utilization. This behavior is convenient for developers and operations because, if desired, it allows them to specify a static heap size simply by setting -Xms and -Xmx to the same value.

It is useful to know that both -Xms and -Xmx are only shortcuts which are internally mapped to -XX:InitialHeapSize and -XX:MaxHeapSize. These two XX flags may also be used directly, to the same effect:

$ java -XX:InitialHeapSize=128m -XX:MaxHeapSize=2g MyApp

Note that all JVM output regarding initial and maximum heap size uses the long names exclusively. Thus, when looking for information about the heap size of a running JVM, e.g., by checking the output of -XX:+PrintCommandLineFlags or by querying the JVM via JMX, we should look for “InitialHeapSize” or “MaxHeapSize” and not for “Xms” or “Xmx”.
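For example, a quick way to see these values (among other flags differing from their defaults) without starting an actual application is:

$ java -XX:+PrintCommandLineFlags -version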

-XX:+HeapDumpOnOutOfMemoryError and -XX:HeapDumpPath

If we refrain from setting -Xmx to an adequate value, we run the risk of being hit by an OutOfMemoryError, one of the most dreadful beasts that we may face when dealing with the JVM. As detailed in our blog series on this subject, the root cause of an OutOfMemoryError needs to be diagnosed carefully. Often, a good start for a deep analysis is a heap dump – too bad if none is available, in particular if the JVM has already crashed and the error only appeared on a production system after the application had run smoothly for several hours or days.

Luckily, there is a way to tell the JVM to generate a heap dump automatically when an OutOfMemoryError occurs, by setting the flag -XX:+HeapDumpOnOutOfMemoryError. Having this flag set “just in case” can save a lot of time when facing an unexpected OutOfMemoryError. By default, the heap dump is stored in a file java_pid<pid>.hprof in the directory where the JVM was started (here, <pid> is the process ID of the JVM process). To change the default, we may specify a different location using the flag -XX:HeapDumpPath=<path>, with <path> being a relative or absolute path to the file where to store the heap dump.

While all this sounds pretty nice, there is one caveat that we need to keep in mind. A heap dump can get large, and especially so when an OutOfMemoryError arises. Thus, it is recommended to always set a custom location using -XX:HeapDumpPath, and to choose a place with enough disk space available.

-XX:OnOutOfMemoryError

We can even execute an arbitrary sequence of commands when an OutOfMemoryError happens, e.g., to send an e-mail to an admin or to perform some cleanup job. This is made possible by the flag -XX:OnOutOfMemoryError, which expects a list of commands and, if applicable, their parameters. We will not go into the details here but just show an example configuration. With the following command line, should an OutOfMemoryError occur, we will write a heap dump to the file /tmp/heapdump.hprof and execute the shell script cleanup.sh in the home directory of the user running the JVM.

$ java -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/heapdump.hprof -XX:OnOutOfMemoryError="sh ~/cleanup.sh" MyApp

 

-XX:PermSize and -XX:MaxPermSize

The permanent generation is a separate heap area which contains, among others, the object representations of all classes loaded by the JVM. To successfully run applications that load lots of classes (e.g., because they depend on lots of third-party libraries, which in turn depend on and load classes from even more libraries) it may be necessary to increase the size of the permanent generation. This can be done using the flags -XX:PermSize and -XX:MaxPermSize. Here, -XX:MaxPermSize sets the maximum size of the permanent generation while -XX:PermSize sets its initial size on JVM startup. A quick example:

$ java -XX:PermSize=128m -XX:MaxPermSize=256m MyApp

Note that the permanent generation size is not counted as part of the heap size as specified by -XX:MaxHeapSize. That is, the amount of permanent generation memory specified by -XX:MaxPermSize may be required in addition to the heap memory specified by -XX:MaxHeapSize.

 

-XX:InitialCodeCacheSize and -XX:ReservedCodeCacheSize

An interesting but often neglected memory area of the JVM is the “code cache”, which is used to store the native code generated for compiled methods. The code cache rarely causes performance problems, but once we have a code cache problem its effects may be devastating. If the code cache is fully utilized, the JVM prints a warning message and then switches to interpreted-only mode: The JIT compiler gets deactivated and no bytecode will be compiled into native code anymore. Thus, the application will continue to run, but slower by an order of magnitude, until someone notices.

Like with the other memory areas, we may specify the size of the code cache ourselves. The relevant flags are -XX:InitialCodeCacheSize and -XX:ReservedCodeCacheSize, and they expect byte values just like the flags introduced above.
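A hypothetical example, starting with a 32 MB code cache and allowing it to grow up to 128 MB:

$ java -XX:InitialCodeCacheSize=32m -XX:ReservedCodeCacheSize=128m MyApp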

 

-XX:+UseCodeCacheFlushing

If the code cache grows constantly, e.g., because of a memory leak caused by hot deployments, increasing the code cache size will only delay its inevitable overflow. To avoid overflow, we can try an interesting and relatively new option: letting the JVM dispose of some of the compiled code when the code cache fills up. This may be done by specifying the flag -XX:+UseCodeCacheFlushing. Using this flag, we can at least avoid the switch to interpreted-only mode when we face code cache problems. However, I would still recommend tackling the root cause as soon as possible once a code cache problem has manifested itself, i.e., identify the memory leak and fix it.
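A sketch that combines a larger code cache with flushing, as a stop-gap while the underlying leak is being investigated:

$ java -XX:ReservedCodeCacheSize=256m -XX:+UseCodeCacheFlushing MyApp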

JVM Flags – Part 3 – Printing all XX Flags and their Values

With Java 6 (update 20 or 21), the HotSpot JVM offers two new command line flags which print a table of all XX flags and their values to the command line right after JVM startup. As many HotSpot users had been longing for such a feature since the first versions of Java, I decided to make it the main topic of this article.

-XX:+PrintFlagsFinal and -XX:+PrintFlagsInitial

Let us take a look at the output of the new flags right away. Starting the client VM using -XX:+PrintFlagsFinal results in an alphabetically sorted table of 590 XX flags (note that the number of flags may change with each release):

$ java -client -XX:+PrintFlagsFinal Benchmark
[Global flags]
uintx AdaptivePermSizeWeight               = 20               {product}
uintx AdaptiveSizeDecrementScaleFactor     = 4                {product}
uintx AdaptiveSizeMajorGCDecayTimeScale    = 10               {product}
uintx AdaptiveSizePausePolicy              = 0                {product}
[...]
uintx YoungGenerationSizeSupplementDecay   = 8                {product}
uintx YoungPLABSize                        = 4096             {product}
 bool ZeroTLAB                             = false            {product}
 intx hashCode                             = 0                {product}

Each table row represents an XX flag and contains five columns. The first column shows the data type of the flag, the second one its name, the fourth one its value, and the fifth one its category. In the third column, “=” means that the value in column 4 is the default value for the flag, while “:=” indicates that the flag was set to that value (either by the user or by JVM ergonomics).

Note that I am only using the “Benchmark” class for this example because it was used in the previous parts of this series as well. You can get the same output even without a main class by running java with the additional parameter -version.

Now let us check how many flags the server VM offers. In order to unlock any additional hidden flags, we also specify the flags -XX:+UnlockExperimentalVMOptions and -XX:+UnlockDiagnosticVMOptions:

$ java -server -XX:+UnlockExperimentalVMOptions -XX:+UnlockDiagnosticVMOptions -XX:+PrintFlagsFinal Benchmark

The result is a whopping 724 flags. Wow! Let us take a look at the flags that have been set to values other than the default:


$ java -server -XX:+UnlockExperimentalVMOptions -XX:+UnlockDiagnosticVMOptions -XX:+PrintFlagsFinal Benchmark | grep ":"
uintx InitialHeapSize                     := 57505088         {product}
uintx MaxHeapSize                         := 920649728        {product}
uintx ParallelGCThreads                   := 4                {product}
 bool PrintFlagsFinal                     := true             {product}
 bool UseParallelGC                       := true             {product}

We have only set one of the flags, -XX:+PrintFlagsFinal, ourselves. The other flags were set by the server VM based on the underlying system, in order to run with appropriate heap size and garbage collector settings.

If we only want to see the default values of all XX flags, we can use a related flag, -XX:+PrintFlagsInitial. With -XX:+PrintFlagsInitial, the output only shows the original “=” entries in column 3 (also for those flags that were set to other values). However, note that some of the flags are missing when compared to the output of -XX:+PrintFlagsFinal, supposedly because these flags are created dynamically.
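For example, to see what JVM ergonomics did to a single flag, we can compare the two outputs (using -version instead of a main class, as mentioned above; the actual values depend on the machine):

$ java -server -XX:+PrintFlagsInitial -version | grep MaxHeapSize
$ java -server -XX:+PrintFlagsFinal -version | grep MaxHeapSize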

It is interesting to study the table contents in order to compare the behavior of the client and the server VM. Also, it is revealing to see which flags affect the values of other flags when being set on the command line. For the interested reader, such a study is described in the excellent blog post Inspecting HotSpot JVM Options. This post also briefly explains the meaning of the flag categories shown in column 5.

-XX:+PrintCommandLineFlags

To conclude the topic, let us consider one more flag which in fact has been available for quite some time already: -XX:+PrintCommandLineFlags. This flag tells the JVM to print the names and values of exactly those XX flags that have been set by the user or JVM ergonomics on startup. In other words, it lists the flags that have a “:=” in column 3 of the -XX:+PrintFlagsFinal output. Seen this way, we may consider -XX:+PrintCommandLineFlags as a shortcut to use when we are only interested in the modified flags. Let us return to the above example:

$ java -server -XX:+PrintCommandLineFlags Benchmark
-XX:InitialHeapSize=57505088 -XX:MaxHeapSize=920081408 -XX:ParallelGCThreads=4 -XX:+PrintCommandLineFlags -XX:+UseParallelGC

Now if we set -XX:+PrintCommandLineFlags every time we start a Java program and write its output to a log file, we document the effects of our JVM flag settings on the performance of the application. Thus, similar to -showversion (see part 1 of this series), I consider -XX:+PrintCommandLineFlags a flag that should always be set on JVM startup. It comes for free, and you never know when you are going to need that information.

It is curious to see that – not only in this example – the maximum heap size as listed by -XX:+PrintCommandLineFlags is a little bit smaller than the corresponding value shown by -XX:+PrintFlagsFinal.

JVM Flags – Part 2 – Flag Categories and JIT Compiler Diagnostics

In the second part of this series, I present an introduction to the different categories of flags offered by the HotSpot JVM. Also, I am going to discuss some interesting flags regarding JIT compiler diagnostics.

JVM flag categories

The HotSpot JVM offers three categories of flags. The first category contains the standard flags. As the name suggests, both the functionality and the output of a standard flag are stable and will most likely not change in future releases of the JVM. You can retrieve a list of all standard flags by running java without any parameter (or with the parameter -help, which incidentally is a standard flag itself). We have already seen some of the standard flags, e.g., -server, in the first part of this series.

The second category consists of the X flags, non-standardized flags whose behavior or availability may change with future releases. All flags of this category start with “-X” and can be listed with java -X. Note that the list is not guaranteed to be complete, one notable missing flag being -Xcomp.

The third (and by far the largest) category consists of the XX flags, which are also non-standardized and could not even be listed for a long time (recently, this has changed, and we will return to this topic in the third part of the series). For practical use, however, there is no real difference between X flags and XX flags. The behavior of the X flags is quite stable, whereas many of the XX flags are rather experimental (and are primarily used by JVM developers for debugging and tuning of the JVM implementation itself). A worthwhile read is the HotSpot JVM documentation for non-standardized flags which explicitly states that XX flags should not be used in a careless fashion. This is true, but in my view this recommendation equally holds for the X flags (and some of the standard flags as well). Regardless of the category, it pays off to try to understand the intended behavior (and possible side effects) of a flag before using it.

One word about the syntax of XX flags. All XX flags start with “-XX:”, but then the syntax differs depending on the type of flag, as the following list and the example after it show.

  • For a boolean flag, we either have a “+” or a “-” followed by the actual name of the JVM option to set. For example, -XX:+<name> activates the option <name> while -XX:-<name> deactivates that option.
  • For a flag that takes a non-boolean value like a text string or an integer number, we first have the name of the flag followed by a “=”, and then the value to set. For example, -XX:<name>=<value> sets the option <name> to the value <value>.
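For example, a boolean flag and a flag taking a value may be combined on the same command line like this (both flags also appear in the -XX:+PrintFlagsFinal output shown above):

$ java -XX:+UseParallelGC -XX:MaxHeapSize=512m MyApp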

Now let us take a look at some XX flags from the area of JIT compilation.

 

-XX:+PrintCompilation and -XX:+CITime

It is quite revealing to watch the JIT compiler work while a Java application is running. By setting the flag -XX:+PrintCompilation we can enable some simple output regarding the bytecode to native code compilation. Let us consider an example (output shortened) for a run of the server VM:

$ java -server -XX:+PrintCompilation Benchmark
  1       java.lang.String::hashCode (64 bytes)
  2       java.lang.AbstractStringBuilder::stringSizeOfInt (21 bytes)
  3       java.lang.Integer::getChars (131 bytes)
  4       java.lang.Object::<init> (1 bytes)
---   n   java.lang.System::arraycopy (static)
  5       java.util.HashMap::indexFor (6 bytes)
  6       java.lang.Math::min (11 bytes)
  7       java.lang.String::getChars (66 bytes)
  8       java.lang.AbstractStringBuilder::append (60 bytes)
  9       java.lang.String::<init> (72 bytes)
 10       java.util.Arrays::copyOfRange (63 bytes)
 11       java.lang.StringBuilder::append (8 bytes)
 12       java.lang.AbstractStringBuilder::<init> (12 bytes)
 13       java.lang.StringBuilder::toString (17 bytes)
 14       java.lang.StringBuilder::<init> (18 bytes)
 15       java.lang.StringBuilder::append (8 bytes)
[...]
 29       java.util.regex.Matcher::reset (83 bytes)

Whenever a method is compiled, a line is printed to the output of -XX:+PrintCompilation. Each line consists of a running number (a unique compiler task ID) and the name and size of the compiled method. Thus, line 1 represents the compilation of the String::hashCode method to native code. Depending on the type of method and compiler task, additional output characters are printed. For example, the creation of a native wrapper method is marked by an “n”, like with System::arraycopy in the example above. Note that such a line does not contain a running number and method size, because nothing is actually compiled to native code. It is also possible to see methods being recompiled, see the output for StringBuilder::append in lines 11 and 15. The output stops at line 29, which means that there were a total of 29 method compilations while this Java application was running.

There is no official documentation of -XX:+PrintCompilation, but this description is one of the few good resources on the output of this flag. I highly recommend it for further study.

The JIT compiler output helps us to understand some differences between the client VM and the server VM. With the server VM, our example application produces 29 lines of compilation output while the use of the client VM results in 55 lines (no figure shown). This may seem strange because the server VM is supposed to do “more” compilation than the client VM. However, given their respective default settings, the server VM watches methods much longer than the client VM before deciding whether they are hot spots and need to be compiled at all. Thus, it is not surprising that, with the server VM, some potential method compilations may only happen at a later stage.

By additionally setting the flag -XX:+CITime we request various statistics about compilations to be printed on JVM shutdown. Let us take a look at just one particular section of the statistics:

$ java -server -XX:+CITime Benchmark
[...]

Accumulated compiler times (for compiled methods only)
------------------------------------------------
  Total compilation time   :  0.178 s
    Standard compilation   :  0.129 s, Average : 0.004
    On stack replacement   :  0.049 s, Average : 0.024
[...]

A total of 0.178 seconds has been spent (on the 29 compiler tasks). Of these, 0.049 seconds belong to “on stack replacement”, i.e., the compilation of a method currently on the stack. This technique is not so simple to implement in a performant fashion, but it is actually highly important. Without “on stack replacement”, methods that have a long execution time (for example, because they contain long-running loops) could not be replaced on the fly by their compiled counterparts.
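As a minimal sketch, the following program spends all of its time in a single long-running loop inside main, which makes it a typical candidate for on stack replacement; when run with -XX:+PrintCompilation, HotSpot usually marks the corresponding OSR task with a “%” next to the task ID.

public class OsrDemo {
    public static void main(String[] args) {
        long sum = 0;
        // All the work happens in this one loop: without on stack replacement,
        // the interpreted main frame could never be swapped for compiled code while running.
        for (int i = 0; i < 100000000; i++) {
            sum += i;
        }
        System.out.println(sum);
    }
}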

Once again, the comparison between the client VM and the server VM is interesting. The corresponding statistics for the client VM (no figure shown) indicate that even though 55 methods have been compiled, only a total of 0.021 seconds was spent on these compilations. Thus, the server VM has done fewer compilations than the client VM, but it spent much more time on them. The reason for this behavior is that the server VM performs more optimizations when generating native code.

In the first part of this series, we have learned about the flags -Xint and -Xcomp. With -XX:+PrintCompilation and -XX:+CITime we can now get a better picture of how the JIT compiler behaves in both cases. With -Xint, the flag -XX:+PrintCompilation produces exactly zero lines of output (for both the client and the server VM). Also, -XX:+CITime confirms that no time at all has been spent on compilation. Now with -Xcomp things are different. The client VM produces 726 lines of output immediately after startup and then no more because every relevant method has been compiled. With the server VM we even see 993 lines of output, which tells us that more aggressive optimizations are performed. Also, the statistics printed on JVM teardown show a huge difference between the two VMs. Consider the run with the server VM:

$ java -server -Xcomp -XX:+CITime Benchmark
[...]
Accumulated compiler times (for compiled methods only)
------------------------------------------------
  Total compilation time   :  1.567 s
    Standard compilation   :  1.567 s, Average : 0.002
    On stack replacement   :  0.000 s, Average : -1.#IO
[...]

The 1.567 seconds spent on compilations using -Xcomp are about 10 times as much as with the default settings, i.e., mixed mode. Still, the application runs slower than with mixed mode. For comparison, the client VM only takes 0.208 seconds to compile its 726 methods using -Xcomp, and it is even slower than the server VM with -Xcomp.

As a side note, no “on stack replacement” takes place because every method gets compiled when it is called for the first time. The corrupted output “Average: -1.#IO” (correct would be: 0) demonstrates once more that the output of non-standardized flags is nothing to rely on too much.

 

-XX:+UnlockExperimentalVMOptions

Sometimes when setting a particular JVM flag, the JVM terminates right after startup printing the message “Unrecognized VM option”. If that happens, you should first check if you mistyped the flag. However, if the name is spelled correctly and the JVM still does not recognize the flag, you may need to unlock the flag by setting -XX:+UnlockExperimentalVMOptions. It is not quite clear to me why this security mechanism is needed, but I suspect that the flags guarded in this way have a tendency to affect the stability of the JVM if not used properly (for example, because they may write excessive debug output to some log file).
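As a sketch (with SomeExperimentalFlag as a placeholder for whatever experimental flag you actually need), the unlock flag is usually placed before the flag it guards:

$ java -XX:+UnlockExperimentalVMOptions -XX:+SomeExperimentalFlag MyApp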

Some flags are only intended to be used for JVM development but not for actual use with Java applications. If a flag cannot be enabled by -XX:+UnlockExperimentalVMOptions, but you really need to use that flag for whatever reason, then you may try your luck with a debug build of the JVM. For the Java 6 HotSpot JVM you can find debug builds here.

 

-XX:+LogCompilation and -XX:+PrintOptoAssembly

If you find yourself in a situation where the information given by -XX:+PrintCompilation is not detailed enough, you can use the flag -XX:+LogCompilation to write extended compilation output into the file “hotspot.log”. In addition to many details about the compiled methods you can also see which compiler threads were started for which tasks. Note that -XX:+LogCompilation needs to be unlocked by -XX:+UnlockExperimentalVMOptions.

The JVM even allows us to take a look at the native code resulting from bytecode compilation. With the flag -XX:+PrintOptoAssembly, the native code generated by the compiler threads gets written both to stdout and to the file “hotspot.log”. Using this flag requires you to run a debug build of the server VM. We can study the output of -XX:+PrintOptoAssembly in order to understand what kind of optimizations the JVM actually performs, for example regarding dead code elimination.

JVM Flags – Part 1 – JVM Types and Compiler Modes

Modern JVMs do an amazing job at running Java applications (and those of other compatible languages) in an efficient and stable manner. Adaptive memory management, garbage collection, just-in-time compilation, dynamic classloading, lock optimization – just to cite some of the magic that happens behind the scenes but rarely affects the average programmer directly. At run time, the JVM optimizes the way it handles the application or parts of it, based on continuous measurements and profiling.

Even with such a level of automation (or rather, because of so much automation, as one might argue), it is important that the JVM still provides adequate facilities for external monitoring and manual tuning. In case of errors or low performance, it must be possible for experts to intervene. Incidentally, aside from all the magic that happens under the hood, a wide range of manual tuning knobs is one of the strengths of modern JVMs as well. Of particular interest are the command line flags that can be passed to the JVM at startup. Some JVMs provide several hundreds of these flags, and it is easy to get lost without proper knowledge in this area. The goal of this blog series is to highlight the most relevant flags for everyday use and explain what they are good for. We will focus on the Sun/Oracle HotSpot JVM as of Java 6, though in most cases similar flags exist for the other popular JVMs.

-server and -client

There are two types of the HotSpot JVM, namely “server” and “client”. The server VM uses a larger default size for the heap, a parallel garbage collector, and optimizes code more aggressively at run time. The client VM is more conservative, resulting in shorter startup time and lower memory footprint. Thanks to a concept called “JVM ergonomics”, the type of JVM is chosen automatically at JVM startup time based on certain criteria regarding the available hardware and operating system. The exact criteria can be found here. From the criteria table, we also see that the client VM is only available on 32-bit systems.

If we are not happy with the pre-selected JVM, we can use the flags -server and -client to prescribe the usage of the server and client VM, respectively. Even though the server VM was originally targeted at long-running server processes, nowadays it often shows performance superior to the client VM in many standalone applications as well. My recommendation is to choose the server VM by setting the -server flag whenever performance in the sense of shorter execution time is important for an application. A common gotcha: On 32-bit systems, a HotSpot JDK is required to be able to run the server VM at all – the 32-bit JRE only ships with the client VM.

-version and -showversion

How do we know which Java installation and which JVM type is used when we call java? With more than one Java installation on a system there is always a slight risk of running the wrong JVM without noticing it. Especially popular in this respect are pre-installed JVMs on various Linux distributions, even though I have to admit that things have become better over the years.

Luckily, we have the -version flag available, which prints some information about the used JVM to stdout. One example:

$ java -version
java version "1.6.0_24"
Java(TM) SE Runtime Environment (build 1.6.0_24-b07)
Java HotSpot(TM) Client VM (build 19.1-b02, mixed mode, sharing)

The output shows the Java version number (1.6.0_24) and the exact build ID of the JRE used (1.6.0_24-b07). We also see the name (HotSpot), the type (Client) and the build ID (19.1-b02) of the JVM. In addition to that, we learn that the JVM runs in mixed mode. This mode of execution is the default mode of HotSpot and means that the JVM dynamically compiles byte code into native code at run time. We also learn that class data sharing is enabled. Class data sharing is an approach that stores the system classes of the JRE in a readonly cache (in a jsa file, “Java Shared Archive”) which is used as a shared resource by the classloader of all Java processes. Class data sharing may be beneficial to performance when compared to reading all the class data from jar archives over and over again.

The -version flag terminates the JVM immediately after printing out the above data. However, there is a similar flag -showversion which can be used to produce the same output but then proceed and execute a given Java application. Thus, -showversion is a useful addition to the command line of virtually every Java application. You never know when you suddenly need some information about the JVM used by a particular (crashed) Java application. By adding -showversion on startup, we are guaranteed to have this information available whenever we may need it.

 

-Xint, -Xcomp, and -Xmixed

The two flags -Xint and -Xcomp are not too relevant for our everyday work, but highly interesting if we want to learn something about the JVM. The -Xint flag forces the JVM to execute all bytecode in interpreted mode, which comes along with a considerable slowdown, usually a factor of 10 or higher. In contrast, the flag -Xcomp forces exactly the opposite behavior, that is, the JVM compiles all bytecode into native code on first use, thereby applying the maximum optimization level. This sounds nice, because it completely avoids the slow interpreter. However, many applications will also suffer at least a bit from the use of -Xcomp, even if the drop in performance is not comparable with the one resulting from -Xint. The reason is that by setting -Xcomp we prevent the JVM from making use of its JIT compiler to full effect. The JIT compiler creates method usage profiles at run time and then optimizes single methods (or parts of them) step by step, and sometimes speculatively, for the actual application behavior. Some of these optimization techniques, e.g., optimistic branch prediction, cannot be applied effectively without first profiling the application. Another aspect is that methods only get compiled at all when they prove themselves relevant, i.e., constitute some kind of hot spot in the application. Methods that are called rarely (or even only once) continue to be executed in interpreted mode, thus saving the compilation and optimization cost.

Note that mixed mode also has its own flag, -Xmixed. With recent versions of HotSpot, mixed mode is the default, so we don’t have to specify this flag anymore.

Let us consider the results of a simple example benchmark which fills a HashMap with objects and then retrieves them again. For each benchmark, the execution time shown is the average time over a large number of runs.

$ java -server -showversion Benchmark
java version "1.6.0_24"
Java(TM) SE Runtime Environment (build 1.6.0_24-b07)
Java HotSpot(TM) Server VM (build 19.1-b02, mixed mode)
Average time: 0.856449 seconds

 

$ java -server -showversion -Xcomp Benchmark
java version "1.6.0_24"
Java(TM) SE Runtime Environment (build 1.6.0_24-b07)
Java HotSpot(TM) Server VM (build 19.1-b02, compiled mode)
Average time: 0.950892 seconds

 

$ java -server -showversion -Xint Benchmark
java version "1.6.0_24"
Java(TM) SE Runtime Environment (build 1.6.0_24-b07)
Java HotSpot(TM) Server VM (build 19.1-b02, interpreted mode)
Average time: 7.622285 seconds

 

Of course there are also benchmarks that show -Xcomp to be best. Still, and especially for long running applications, I would strongly advise everyone to stick to the JVM default settings and let the JIT compiler make full use of its dynamic potential. After all, the JIT compiler is one of the most sophisticated components of the JVM – in fact, the recent advances in this area are the major reason why Java is not considered slow anymore nowadays.

Java ClassLoader – Part 5 – Complete Reference

Class loading is one of the most powerful mechanisms provided by the Java language specification. Even though the internals of class loading fall under the “advanced topics” heading, all Java programmers should know how the mechanism works and what can be done with it to suit their needs. This can save time that would otherwise have been spent debugging ClassNotFoundException, ClassCastException, etc.

This article starts from the basics, such as the difference between code and data, and how they are related to form an instance or object. Then it looks into the mechanism of loading code into the JVM with the help of class loaders, and the main types of class loaders available in Java. The article then looks into the internals of class loaders, covering the basic algorithm (or probing) that a class loader follows before it loads a class. The next section of the article uses code examples to demonstrate the necessity for developers to extend and develop their own class loaders. This is followed by an explanation of how to write your own class loaders and how to use them to make a generic task-execution engine that can be used to load the code supplied by any remote client, define it in the JVM, and instantiate and then execute it. The article concludes with references to J2EE-specific components where custom class loading schemas become the norm.

Class and Data

A class represents the code to be executed, whereas data represents the state associated with that code. State can change; code generally does not. When we associate a particular state to a class, we have an instance of that class. So different instances of the same class can have different state, but all refer to the same code. In Java, a class will usually have its code contained in a .class file, though there are exceptions. Nevertheless, in the Java runtime, each and every class will have its code also available in the form of a first-class Java object, which is an instance of java.lang.Class. Whenever we compile any Java file, the compiler will embed a public, static, final field named class, of the type java.lang.Class, in the emitted byte code. Since this field is public, we can access it using dotted notation, like this:

java.lang.Class klass = Myclass.class;

Once a class is loaded into a JVM, the same class (I repeat, the same class) will not be loaded again. This leads to the question of what is meant by “the same class.” Similar to the condition that an object has a specific state, an identity, and that an object is always associated with its code (class), a class loaded into a JVM also has a specific identity, which we’ll look at now.

In Java, a class is identified by its fully qualified class name. The fully qualified class name consists of the package name and the class name. But a class is uniquely identified in a JVM by its fully qualified class name along with the instance of the ClassLoader that loaded the class. Thus, if a class named Cl in the package Pg is loaded by an instance kl1 of the class loader KlassLoader, the class instance of Cl, i.e., Cl.class, is keyed in the JVM as (Cl, Pg, kl1). This means that the two class instances keyed as (Cl, Pg, kl1) and (Cl, Pg, kl2) are not one and the same, and classes loaded by different class loader instances are completely different and not type-compatible with each other. How many class loader instances do we have in a JVM? The next section explains this.
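Before moving on, here is a minimal sketch of the identity rule above. It assumes that a compiled class Pg.Cl exists under the hypothetical directory /tmp/classes/ and is not on the application classpath, so that two URLClassLoader instances each define it independently:

import java.net.URL;
import java.net.URLClassLoader;

public class TwoLoaders {
    public static void main(String[] args) throws Exception {
        // Hypothetical location of Pg/Cl.class, deliberately outside the classpath
        URL[] path = { new URL("file:/tmp/classes/") };

        // Two loader instances; the null parent prevents delegation to the application class loader
        ClassLoader kl1 = new URLClassLoader(path, null);
        ClassLoader kl2 = new URLClassLoader(path, null);

        Class<?> c1 = kl1.loadClass("Pg.Cl");
        Class<?> c2 = kl2.loadClass("Pg.Cl");

        // false: same fully qualified name, but different defining loaders
        System.out.println(c1 == c2);
    }
}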

Class Loaders

In a JVM, each and every class is loaded by some instance of a java.lang.ClassLoader. The ClassLoader class is located in the java.lang package and developers are free to subclass it to add their own functionality to class loading.

Whenever a new JVM is started by typing java MyMainClass, the “bootstrap class loader” is responsible for loading key Java classes like java.lang.Object and other runtime code into memory first. The runtime classes are packaged inside of the JRE\lib\rt.jar file. We cannot find the details of the bootstrap class loader in the Java documentation, since this is a native implementation. For the same reason, the behavior of the bootstrap class loader will also differ across JVMs.

On a related note, we will get null if we try to get the class loader of a core Java runtime class, like this:

log(java.lang.String.class.getClassLoader());

Next comes the Java extension class loader. We can store extension libraries, those that provide features that go beyond the core Java runtime code, in the path given by the java.ext.dirs property. The ExtClassLoader is responsible for loading all .jar files kept in the java.ext.dirs path. A developer can add his or her own application .jar files or whatever libraries he or she might need to add to the classpath to this extension directory so that they will be loaded by the extension class loader.

The third and most important class loader from the developer perspective is the AppClassLoader. The application class loader is responsible for loading all of the classes kept in the path corresponding to the java.class.path system property.

“Understanding Extension Class Loading” in Sun’s Java tutorial explains more on the above three class loader paths. Listed below are a few other class loaders in the JDK:

  • java.net.URLClassLoader
  • java.security.SecureClassLoader
  • java.rmi.server.RMIClassLoader
  • sun.applet.AppletClassLoader

The class java.lang.Thread contains the method public ClassLoader getContextClassLoader(), which returns the context class loader for a particular thread. The context class loader is provided by the creator of the thread for use by code running in this thread when loading classes and resources. If it is not set, the default is the class loader context of the parent thread. The context class loader of the primordial thread is typically set to the class loader used to load the application.
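To see these loaders on a concrete system, the following minimal sketch prints the parent chain starting from the loader of its own class, followed by the context class loader of the main thread; the exact names in the output vary with the JVM vendor and version:

public class ShowLoaders {
    public static void main(String[] args) {
        // Walk the parent chain, starting from the loader that loaded this class
        ClassLoader loader = ShowLoaders.class.getClassLoader();
        while (loader != null) {
            System.out.println(loader);
            loader = loader.getParent();
        }
        // The bootstrap class loader is represented as null
        System.out.println("null (bootstrap class loader)");

        // The context class loader of the current thread
        System.out.println(Thread.currentThread().getContextClassLoader());
    }
}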

How Class Loaders Work

 

All class loaders except the bootstrap class loader have a parent class loader. Moreover, all class loaders are of the type java.lang.ClassLoader. The above two statements are different, and very important for the correct working of any class loaders written by developers. The most important aspect is to correctly set the parent class loader. The parent class loader for any class loader is the class loader instance that loaded that class loader. (Remember, a class loader is itself a class!)

A class is requested out of a class loader using the loadClass() method. The internal working of this method can be seen from the source code for java.lang.ClassLoader, given below:


protected synchronized Class<?> loadClass(String name, boolean resolve)
        throws ClassNotFoundException {

    // First, check if the class has already been loaded
    Class c = findLoadedClass(name);

    if (c == null) {
        try {
            if (parent != null) {
                c = parent.loadClass(name, false);
            } else {
                c = findBootstrapClass0(name);
            }
        } catch (ClassNotFoundException e) {
            // If still not found, then invoke
            // findClass to find the class.
            c = findClass(name);
        }
    }

    if (resolve) {
        resolveClass(c);
    }
    return c;
}
 

To set the parent class loader, we have two ways to do so in the ClassLoader constructor:

public class MyClassLoader extends ClassLoader {
    public MyClassLoader() {
        super(MyClassLoader.class.getClassLoader());
    }
}

Or

public class MyClassLoader extends ClassLoader {
    public MyClassLoader() {
        super(getClass().getClassLoader());
    }
}

The first method is preferred because calling the method getClass() from within the constructor should be discouraged, since the object initialization will be complete only at the exit of the constructor code. Thus, if the parent class loader is correctly set, whenever a class is requested out of a ClassLoader instance, if it cannot find the class, it should ask the parent first. If the parent cannot find it (which again means that its parent also cannot find the class, and so on), and if the findBootstrapClass0() method also fails, the findClass() method is invoked. The default implementation of findClass() will throw ClassNotFoundException and developers are expected to implement this method when they subclass java.lang.ClassLoader to make custom class loaders. The default implementation of findClass() is shown below.

protected Class<?> findClass(String name) throws ClassNotFoundException {
    throw new ClassNotFoundException(name);
}

Inside of the findClass() method, the class loader needs to fetch the byte codes from some arbitrary source. The source can be the file system, a network URL, a database, another application that can spit out byte codes on the fly, or any similar source that is capable of generating byte code compliant with the Java byte code specification. You could even use BCEL (Byte Code Engineering Library), which provides convenient methods to create classes from scratch at runtime. BCEL is being used successfully in several projects such as compilers, optimizers, obfuscators, code generators, and analysis tools. Once the byte code is retrieved, the method should call the defineClass() method, and the runtime is very particular about which ClassLoader instance calls this method. Thus, if two ClassLoader instances define byte codes from the same or different sources, the defined classes are different.

The Java language specification gives a detailed explanation on the process of loading, linking, and the initialization of classes and interfaces in the Java Execution Engine.

Figure 1 shows an application with a main class called MyMainClass. As explained earlier, MyMainClass.class will be loaded by the AppClassLoader. MyMainClass creates instances of two class loaders, CustomClassLoader1 and CustomClassLoader2, which are capable of finding the byte codes of a fourth class called Target from some source (say, from a network path). This means the class definition of the Target class is not in the application class path or extension class path. In such a scenario, if MyMainClass asks the custom class loaders to load the Target class, Target will be loaded and Target.class will be defined independently by both CustomClassLoader1 and CustomClassLoader2. This has serious implications in Java. If some static initialization code is put in the Target class, and if we want this code to be executed one and only once in a JVM, in our current setup the code will be executed twice in the JVM: once each when the class is loaded separately by both CustomClassLoaders. If the Target class is instantiated in both the CustomClassLoaders to have the instances target1 and target2 as shown in Figure 1, then target1 and target2 are not type-compatible. In other words, the JVM cannot execute the code:

Target target3 = (Target) target2;

The above code will throw a ClassCastException. This is because the JVM sees these two as separate, distinct class types, since they are defined by different ClassLoader instances. The above explanation holds true even if MyMainClass doesn’t use two separate class loader classes like CustomClassLoader1 and CustomClassLoader2, and instead uses two separate instances of a single CustomClassLoader class. This is demonstrated later in the article with code examples.


Figure 1. Multiple ClassLoaders loading the same Target class in the same JVM

A more detailed explanation on the process of class loading, defining, and linking is in Andreas Schaefer’s article “Inside Class Loaders.”

Why Do We Need our Own Class Loaders?

One of the reasons for a developer to write his or her own class loader is to control the JVM’s class loading behavior. A class in Java is identified using its package name and class name. For classes that implement java.io.Serializable, the serialVersionUID plays a major role in versioning the class. This stream-unique identifier is a 64-bit hash of the class name, interface class names, methods, and fields. Other than these, there are no other straightforward mechanisms for versioning a class. Technically speaking, if the above aspects match, the classes are of the “same version.”

But let us think of a scenario where we need to develop a generic Execution Engine, capable of executing any task implementing a particular interface. When the tasks are submitted to the engine, first the engine needs to load the code for the task. Suppose different clients submit different tasks (i.e., different code) to the engine, and by chance, all of these tasks have the same class name and package name. The question is whether the engine will load the different client versions of the task differently for different client invocation contexts so that the clients get the output they expect. The phenomenon is demonstrated in the sample code download, located in the References section below. Two directories, samepath and differentversions, contain separate examples to demonstrate the concept.

Figure 2 shows how the examples are arranged in three separate subfolders, called samepath, differentversions, and differentversionspush:


Figure 2. Example folder structure arrangement

In samepath, we have version.Version classes kept in two subdirectories, v1 and v2. Both classes have the same name and same package. The only difference between the two classes is in the following lines:

    public void fx(){
        log("this = " + this + "; Version.fx(1).");
    }

Inside of v1, we have Version.fx(1) in the log statement, whereas in v2, we have Version.fx(2). Put both of these slightly different versions of the class in the same classpath, and run the Test class:

set CLASSPATH=.;%CURRENT_ROOT%\v1;%CURRENT_ROOT%\v2
%JAVA_HOME%\bin\java Test

This will give the console output shown in Figure 3. We can see that code corresponding to Version.fx(1) is loaded, since the class loader found that version of the code first in the classpath.


Figure 3. samepath test with version 1 first in the classpath

Repeat the run, with a slight change in the order of the path elements in the classpath.

set CLASSPATH=.;%CURRENT_ROOT%\v2;%CURRENT_ROOT%\v1
%JAVA_HOME%\bin\java Test

The console output is now changed to that shown in Figure 4. Here, the code corresponding to Version.fx(2) is loaded, since the class loader found that version of the code first in the classpath.


Figure 4. samepath test with version 2 first in the classpath

From the above example it is obvious that the class loader will try to load the class using the path element that is found first. Also, if we delete the version.Version classes from v1 and v2, make a .jar (myextension.jar) out of version.Version, put it in the path corresponding to java.ext.dirs, and repeat the test, we see that version.Version is no longer loaded by AppClassLoader but by the extension class loader, as shown in Figure 5.


Figure 5. AppClassLoader and ExtClassLoader

Going forward with the examples, the folder differentversions contains an RMI execution engine. Clients can supply any tasks that implement common.TaskIntf to the execution engine. The subfolders client1 and client2 contain slightly different versions of the class client.TaskImpl. The difference between the two classes is in the following lines:

static {
    log("client.TaskImpl.class.getClassLoader(v1) : " + TaskImpl.class.getClassLoader());
}

public void execute() {
    log("this = " + this + "; execute(1)");
}

Instead of the getClassLoader(v1) and execute(1) log statements in execute() inside of client1, client2 has getClassLoader(v2) and execute(2) log statements. Moreover, in the script to start the Execution Engine RMI server, we have arbitrarily put the task implementation class of client2 first in the classpath.

CLASSPATH=%CURRENT_ROOT%\common;%CURRENT_ROOT%\server;
%CURRENT_ROOT%\client2;%CURRENT_ROOT%\client1
%JAVA_HOME%\bin\java server.Server

The screenshots in Figures 6, 7, and 8 show what is happening under the hood. Here, in the client VMs, separate client.TaskImpl classes are loaded, instantiated, and sent to the Execution Engine Server VM for execution. From the server console, it is apparent that client.TaskImpl code is loaded only once in the server VM. This single “version” of the code is used to regenerate many client.TaskImpl instances in the server VM, and execute the task.


Figure 6. Execution Engine Server console

Figure 6 shows the Execution Engine Server console, which is loading and executing code on behalf of two separate client requests, as shown in Figures 7 and Figure 8. The point to note here is that the code is loaded only once (as is evident from the log statement inside of the static initialization block), but the method is executed twice for each client invocation context.


Figure 7. Execution Engine Client 1 console

In Figure 7, the code for the TaskImpl class containing the log statement client.TaskImpl.class.getClassLoader(v1) is loaded by the client VM, and supplied to the Execution Engine Server. The client VM in Figure 8 loads different code for the TaskImpl class containing the log statement client.TaskImpl.class.getClassLoader(v2), and supplies it to the Server VM.


Figure 8. Execution Engine Client 2 console

Here, in the client VMs, separate client.TaskImpl classes are loaded, instantiated, and sent to the Execution Engine Server VM for execution. A second look at the server console in Figure 6 reveals that the client.TaskImpl code is loaded only once in the server VM. This single “version” of the code is used to regenerate the client.TaskImpl instances in the server VM, and execute the task. Client 1 should be unhappy, since instead of its “version” of client.TaskImpl (v1), some other code is executed in the server for Client 1’s invocation! How do we tackle such scenarios? The answer is to implement custom class loaders.

Custom Class Loaders

The solution for fine-grained control over class loading is to implement custom class loaders. Any custom class loader should have java.lang.ClassLoader as its direct or distant super class. Moreover, in the constructor, we need to set the parent class loader, too. Then, we have to override the findClass() method. The differentversionspush folder contains a custom class loader called FileSystemClassLoader. Its structure is shown in Figure 9:


Figure 9. Custom class loader relationship

Below are the main methods implemented in common.FileSystemClassLoader:

    public byte[] findClassBytes(String className) {
        try {
            String pathName = currentRoot + File.separatorChar
                    + className.replace('.', File.separatorChar) + ".class";
            FileInputStream inFile = new FileInputStream(pathName);
            byte[] classBytes = new byte[inFile.available()];
            inFile.read(classBytes);
            return classBytes;
        } catch (java.io.IOException ioEx) {
            return null;
        }
    }

    public Class findClass(String name) throws ClassNotFoundException {
        byte[] classBytes = findClassBytes(name);
        if (classBytes == null) {
            throw new ClassNotFoundException();
        } else {
            return defineClass(name, classBytes, 0, classBytes.length);
        }
    }

    public Class findClass(String name, byte[] classBytes) throws ClassNotFoundException {
        if (classBytes == null) {
            throw new ClassNotFoundException("(classBytes==null)");
        } else {
            return defineClass(name, classBytes, 0, classBytes.length);
        }
    }

    public void execute(String codeName, byte[] code) {
        Class klass = null;
        try {
            klass = findClass(codeName, code);
            TaskIntf task = (TaskIntf) klass.newInstance();
            task.execute();
        } catch (Exception exception) {
            exception.printStackTrace();
        }
    }

This class is used by the client to convert client.TaskImpl (v1) to a byte[]. This byte[] is then sent to the RMI Execution Engine server. In the server, the same class is used to define the class again from the code received in the form of a byte[]. The client-side code is shown below:

public class Client {
    public static void main(String[] args) {
        try {
            byte[] code = getClassDefinition("client.TaskImpl");
            // serverIntf is the RMI stub of the Execution Engine server,
            // presumably obtained via an RMI lookup (not shown in this excerpt)
            serverIntf.execute("client.TaskImpl", code);
        } catch (RemoteException remoteException) {
            remoteException.printStackTrace();
        }
    }

    private static byte[] getClassDefinition(String codeName) {
        String userDir = System.getProperties().getProperty("BytePath");
        FileSystemClassLoader fscl1 = null;
        try {
            fscl1 = new FileSystemClassLoader(userDir);
        } catch (FileNotFoundException fileNotFoundException) {
            fileNotFoundException.printStackTrace();
        }
        return fscl1.findClassBytes(codeName);
    }
}

Inside of the execution engine, the code received from the client is given to the custom class loader. The custom class loader will define the class from the byte[], instantiate it, and execute it. The notable point here is that, for each client request, we use a separate instance of the FileSystemClassLoader class to define the client-supplied client.TaskImpl. Moreover, client.TaskImpl is not available in the classpath of the server. This means that when we call findClass() on the FileSystemClassLoader, the findClass() method calls defineClass() internally, and the client.TaskImpl class gets defined by that particular instance of the class loader. So when a new instance of the FileSystemClassLoader is used, the class is defined from the byte[] all over again. Thus, for each client invocation, the class client.TaskImpl is defined again and again, and we are able to execute “different versions” of the client.TaskImpl code inside of the same Execution Engine JVM.

    public void execute(String codeName, byte[] code) throws RemoteException {
        FileSystemClassLoader fileSystemClassLoader = null;
        try {
            fileSystemClassLoader = new FileSystemClassLoader();
            fileSystemClassLoader.execute(codeName, code);
        } catch (Exception exception) {
            throw new RemoteException(exception.getMessage());
        }
    }

Examples are in the differentversionspush folder. The server and client side consoles are shown in Figures 10, 11, and 12:


Figure 10. Custom class loader execution engine

Figure 10 shows the custom class loader Execution Engine VM console. We can see the client.TaskImpl code is loaded more than once. In fact, for each client execution context, the class is newly loaded and instantiated.


Figure 11. Custom class loader engine, Client 1

In Figure 11, the code for the TaskImpl class containing the log statement client.TaskImpl.class.getClassLoader(v1) is loaded by the client VM and pushed to the Execution Engine Server VM. The client VM in Figure 12 loads different code for the TaskImpl class, containing the log statement client.TaskImpl.class.getClassLoader(v2), and pushes it to the Server VM.


Figure 12. Custom class loader engine, Client 2

This code example shows how we can leverage separate instances of class loaders to have side-by-side execution of “different versions” of code in the same VM.

Class Loaders In J2EE

The class loaders in some J2EE servers tend to drop and reload classes at different intervals. This occurs in some implementations and may not in others. Similarly, a web server may decide to remove a previously loaded servlet instance, perhaps because it is explicitly asked to do so by the server administrator, or because the servlet has been idle for a long time. When a request is first made for a JSP (assuming it hasn’t been precompiled), the JSP engine will translate the JSP into its page implementation class, which takes the form of a standard Java servlet. Once the page’s implementation servlet has been created, it will be compiled into a class file by the JSP engine and will be ready for use. Each time a container receives a request, it first checks to see if the JSP file has changed since it was last translated. If it has, it is retranslated so that the response is always generated by the most up-to-date implementation of the JSP file. Enterprise application deployment units in the form of .ear, .war, .rar, etc. will also need to be loaded and reloaded at will or as per configured policies. For all of these scenarios, loading, unloading, and reloading are possible only if we have control over the application server’s JVM’s class-loading policy. This is attained by an extended class loader, which can execute the code defined in its boundary. Brett Peterson has given an explanation of class loading schemas in a J2EE application server context in his article “Understanding J2EE Application Server Class Loading Architectures” at TheServerSide.com.

Summary

The article talked about how classes loaded into a Java virtual machine are uniquely identified and what limitations exist when we try to load different byte codes for classes with the same names and packages. Since there is no explicit class versioning mechanism, if we want to load classes at our own will, we have to use custom class loaders with extended capabilities. Many J2EE application servers have a “hot deployment” capability, where we can reload an application with a new version of class definition, without bringing the server VM down. Such application servers make use of custom class loaders. Even if we don’t use an application server, we can create and use custom class loaders to finely control class loading mechanisms in our Java applications. Ted Neward’s book Server-Based Java Programming throws light onto the ins and outs of Java class loading, and it teaches those concepts of Java that underlie the J2EE APIs and the best ways to use them.


Java Message Service (JMS) using ActiveMQ

JMS lets you send messages containing, for example, a String, an array of bytes, or a serializable Java object from one program to another. It does not, however, use a direct connection from program A to program B: instead, the message is sent to a JMS provider and placed there in a Queue, where it waits until the other program receives it.

The MessageProducer is a Java program that sends a JMS message to a Queue on the JMS provider. The MessageConsumer is another program which receives that message. These two programs can run on separate machines, and all they have to know in order to communicate is the URL of the JMS provider. The provider can be, for example, a Java EE server such as JBoss or GlassFish. But don’t be afraid: you don’t need a full-blown Java EE server to send a JMS message. In this article we will use ActiveMQ, which is lightweight and easy to use.

First we need to download ActiveMQ. If you are using Linux, you can get it from this link. For Windows you can use this link. In case the links don’t work, you can find the files in the ‘Downloads’ section of ActiveMQ’s website.

After the download, extract it to any directory and run the ‘activemq’ program from the ‘{path-where-you-extracted-activemq}/bin’ directory:


user@user-laptop:~/activemq/apache-activemq-5.3.0/bin$ ./activemq

You should see a bunch of INFO messages appearing on the terminal:


...

INFO | ActiveMQ Web Demos at http://0.0.0.0:8161/demo

INFO | RESTful file access application at http://0.0.0.0:8161/fileserver

INFO | Started SelectChannelConnector@0.0.0.0:8161

Now the ActiveMQ server is up and running. You can stop it at any time by pressing Ctrl-C. ActiveMQ has a nice admin console, where you can see a lot of useful information and change the settings: http://localhost:8161/admin/.

Now that we have a JMS provider running, let’s write our message producer and consumer programs. For that, you will need to put the ActiveMQ JAR file on the classpath. The file you need is called (for version 5.3.0) ‘activemq-all-5.3.0.jar’ or something similar and is located in the extracted ActiveMQ directory. In Eclipse, you can right-click your project and choose Properties -> Java Build Path -> Libraries -> Add External JARs.
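If you prefer the command line over an IDE, a minimal sketch for compiling and running the Producer and Consumer classes shown below (assuming a Unix-like shell and that the JAR sits in the current directory; adjust the file name to your ActiveMQ version):

$ javac -cp activemq-all-5.3.0.jar Producer.java Consumer.java
$ java -cp .:activemq-all-5.3.0.jar Producer
$ java -cp .:activemq-all-5.3.0.jar Consumer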

Here is the code of the program sending (producing) the messages:


import javax.jms.*;

import org.apache.activemq.ActiveMQConnection;
import org.apache.activemq.ActiveMQConnectionFactory;

public class Producer {

    // URL of the JMS server. DEFAULT_BROKER_URL will just mean
    // that the JMS server is on localhost
    private static String url = ActiveMQConnection.DEFAULT_BROKER_URL;

    // Name of the queue we will be sending messages to
    private static String subject = "TESTQUEUE";

    public static void main(String[] args) throws JMSException {

        // Getting a JMS connection from the server and starting it
        ConnectionFactory connectionFactory = new ActiveMQConnectionFactory(url);
        Connection connection = connectionFactory.createConnection();
        connection.start();

        // JMS messages are sent and received using a Session. We will
        // create here a non-transactional session object. If you want
        // to use transactions you should set the first parameter to 'true'
        Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);

        // Destination represents here our queue 'TESTQUEUE' on the
        // JMS server. You don't have to do anything special on the
        // server to create it, it will be created automatically.
        Destination destination = session.createQueue(subject);

        // MessageProducer is used for sending messages (as opposed
        // to MessageConsumer which is used for receiving them)
        MessageProducer producer = session.createProducer(destination);

        // We will send a small text message saying 'Hello' in Japanese
        TextMessage message = session.createTextMessage("こんにちは");

        // Here we are sending the message!
        producer.send(message);

        System.out.println("Sent message '" + message.getText() + "'");

        connection.close();
    }
}

There is a lot going on here. The Connection represents our connection to the JMS provider – ActiveMQ. Be sure not to confuse it with SQL’s Connection. Destination represents the Queue on the JMS provider that we will be sending messages to. In our case, we send to the Queue called ‘TESTQUEUE’ (it will be created automatically if it does not exist yet).

What you should note is that there is no mention of who will finally read the message. Actually, the Producer does not know where or who the Consumer is! We are just sending messages into the queue ‘TESTQUEUE’, and what happens to them from there is no longer the Producer’s concern.

The most interesting part of the above code is probably the call to createTextMessage("こんにちは"), which creates the text message we are going to send (in this case, a greeting for our Japanese friend).

Now let’s see how to receive (consume) the sent message. Here is the code for the Consumer class:


import javax.jms.*;

import org.apache.activemq.ActiveMQConnection;
import org.apache.activemq.ActiveMQConnectionFactory;

public class Consumer {

    // URL of the JMS server
    private static String url = ActiveMQConnection.DEFAULT_BROKER_URL;

    // Name of the queue we will receive messages from
    private static String subject = "TESTQUEUE";

    public static void main(String[] args) throws JMSException {

        // Getting a JMS connection from the server
        ConnectionFactory connectionFactory = new ActiveMQConnectionFactory(url);
        Connection connection = connectionFactory.createConnection();
        connection.start();

        // Creating a session for receiving messages
        Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);

        // Getting the queue 'TESTQUEUE'
        Destination destination = session.createQueue(subject);

        // MessageConsumer is used for receiving (consuming) messages
        MessageConsumer consumer = session.createConsumer(destination);

        // Here we receive the message.
        // By default this call is blocking, which means it will wait
        // for a message to arrive on the queue.
        Message message = consumer.receive();

        // There are many types of Message and TextMessage
        // is just one of them. Producer sent us a TextMessage
        // so we must cast to it to get access to its .getText() method.
        if (message instanceof TextMessage) {
            TextMessage textMessage = (TextMessage) message;
            System.out.println("Received message '" + textMessage.getText() + "'");
        }

        connection.close();
    }
}

As you can see, it looks pretty similar to the Producer’s code. In fact, only the second half is substantially different: we create a MessageConsumer instead of a MessageProducer and then use its .receive() method instead of .send(). You can also see a somewhat ugly cast from Message to TextMessage, but there is nothing we can do about it: the .receive() method simply returns the Message interface (which TextMessage extends), and there is no separate method for receiving just TextMessages.
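
If you don’t want the consumer to block indefinitely in .receive(), the JMS API also offers .receive(long timeout), which returns null if nothing arrives within the given time, as well as asynchronous delivery through a MessageListener. Here is a minimal sketch of the listener approach, reusing the session and destination from the Consumer class above (this is an illustrative variation, not part of the original example):

// Hypothetical variation of the Consumer: instead of blocking in receive(),
// register a listener whose onMessage() is called for every incoming message.
MessageConsumer asyncConsumer = session.createConsumer(destination);
asyncConsumer.setMessageListener(new MessageListener() {
    @Override
    public void onMessage(Message msg) {
        try {
            if (msg instanceof TextMessage) {
                System.out.println("Received message '" + ((TextMessage) msg).getText() + "'");
            }
        } catch (JMSException e) {
            e.printStackTrace();
        }
    }
});
// Note: the connection has to stay open for callbacks to be delivered,
// so connection.close() must not be called right after registering the listener.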

Now compile both programs, remembering to add ActiveMQ’s JAR file to the classpath. Before running them, also make sure that an ActiveMQ instance is running (for example in a separate terminal). First, run the Producer program:


2014/09/25 15:56:37 org.apache.activemq.
transport.failover.FailoverTransport doReconnect
情報: Successfully connected to tcp://localhost:61616
Sent message 'こんにちは'

If you see something similar to the output above (especially the ‘Sent message’ part), the message was successfully sent and is now sitting in the TESTQUEUE queue (the 情報 prefix is simply the INFO log level printed by a JVM running with a Japanese locale). You can open the Queues section of ActiveMQ’s admin console at http://localhost:8161/admin/queues.jsp and see that there is one message waiting in TESTQUEUE.
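
Besides the admin console, you can also peek at the queue programmatically with a QueueBrowser, which lets you inspect pending messages without consuming them. Below is a minimal sketch, assuming a session created exactly as in the examples above and an additional import of java.util.Enumeration; it is not part of the original programs:

// Hypothetical helper: counting the messages currently waiting in TESTQUEUE
// without removing them from the queue.
Queue queue = session.createQueue("TESTQUEUE");
QueueBrowser browser = session.createQueueBrowser(queue);
Enumeration<?> pending = browser.getEnumeration();
int count = 0;
while (pending.hasMoreElements()) {
    pending.nextElement();
    count++;
}
System.out.println(count + " message(s) waiting in TESTQUEUE");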

In order to receive that message run now the Consumer program:

2014/09/25 15:58:03 org.apache.activemq.
transport.failover.FailoverTransport doReconnect
情報: Successfully connected to tcp://localhost:61616
Received message 'こんにちは'

If you get output like the above (or something similar), everything went OK: the message was successfully received.

You are now probably thinking: “Why would anybody want to do that?” After all, the code needed to transfer a small text message was fairly long, you needed a running ActiveMQ instance, extra dependencies on the classpath, and so on. Transferring the same text over plain TCP/IP would have taken a fraction of the effort. So what are the advantages of JMS over simple TCP/IP or Java RMI? Here they are:

  • Communication using JMS is asynchronous. The producer just sends a message and goes on with its business. If you called a method using RMI, you would have to wait until it returned, and in many cases you simply don’t want to lose that time.
  • Take a look at how we ran the example above. At the moment we ran the Producer to send a message, no instance of the Consumer was running yet; the message was delivered the moment the Consumer asked ActiveMQ for it. Now compare that to RMI: if we tried to send a request through RMI to a service that is not running, we would get a RemoteException. JMS lets you send requests to services that are currently unavailable, and the message will be delivered as soon as the service starts.
  • JMS abstracts the process of sending messages. As you saw in the examples above, we used a custom queue named ‘TESTQUEUE’: the producer just knows it has to send to that queue, and the consumer picks messages up from there. Thanks to that, the producer and the consumer are decoupled. Once a message is in the queue, we can do a whole range of things with it, such as sending it to other queues, copying it, saving it in a database, routing it based on its contents, and much more.

JMS is widely used as a system integration solution in big, distributed systems, such as those run by banks. There are many books on this huge topic, for example Enterprise Integration Patterns. If you want to learn more about JMS itself, a good starting point is the JMS Tutorial on Sun’s webpage.