Can CMS maximum free chunk size provide advance warning before full GC?

Jon Masamitsu jon.masamitsu at oracle.com
Mon Apr 29 11:53:38 PDT 2013


On 4/29/13 1:05 AM, Nadav Wiener wrote:
> For the context of a soft real time system that should not pause for 
> more than 200ms, we're looking for a way to have an advance warning 
> before a Full GC is imminent. We realize we might not be able to avoid 
> it, but we'd like to fail over to another node before the system stalls.
>
> We've been able to come up with a scheme that will provide us with an 
> advance warning, ahead of imminent full GC that may cause the system 
> to stall for several seconds (which we need to avoid).
>
> What we've been able to come up with relies on CMS free list 
> statistics: -XX:PrintFLSStatistics=1. This prints free 
> list statistics into the GC log after every GC cycle, including young 
> GC, so the information is available at short intervals, and will 
> appear even more frequently during intervals of high memory allocation 
> rate. It probably costs a little in terms of performance, but our 
> working assumption is that we can afford it.
>
> The output to the log looks like so:
>
>     Statistics for BinaryTreeDictionary:
>     ------------------------------------
>     Total Free Space: 382153298
>     Max   Chunk Size: 382064598
>     Number of Blocks: 28
>     Av.  Block  Size: 13648332
>     Tree      Height: 8
>
> In particular, the maximum free chunk size is 382064598 words. With 
> 64-bit words this should amount to just below 2915MB. This number has 
> been decreasing very slowly, at a rate of roughly 1MB per hour.
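The log parsing and word-to-megabyte conversion described above can be
sketched as follows (class and method names are hypothetical; the sketch
assumes 64-bit heap words, i.e. 8 bytes per word):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extracts the max chunk size (in heap words) from
// a -XX:PrintFLSStatistics=1 log line and converts it to megabytes,
// assuming 64-bit heap words (8 bytes each).
public class FlsMaxChunk {
    private static final Pattern MAX_CHUNK =
        Pattern.compile("Max\\s+Chunk Size:\\s+(\\d+)");

    // Returns the max chunk size in words, or -1 if the line doesn't match.
    static long parseMaxChunkWords(String logLine) {
        Matcher m = MAX_CHUNK.matcher(logLine);
        return m.find() ? Long.parseLong(m.group(1)) : -1;
    }

    static double wordsToMb(long words) {
        return words * 8.0 / (1024 * 1024); // 8 bytes per 64-bit word
    }

    public static void main(String[] args) {
        long words = parseMaxChunkWords("Max   Chunk Size: 382064598");
        System.out.printf("%d words = %.0f MB%n", words, wordsToMb(words));
        // prints: 382064598 words = 2915 MB
    }
}
```

Tailing the GC log and feeding each line through such a parser yields
the metric after every GC cycle.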
>
> It is our understanding that so long as the maximum free chunk size is 
> larger than the young generation (assuming no humongous object 
> allocation), every object promotion should succeed.

To a very large degree this is correct.  There are circumstances under
which an object promoted from the young generation into the CMS
generation will require more space in the CMS generation than it did
in the young generation.  I don't think this happens to a significant
extent.

>
> Recently, we've run a several-days-long stress test, and have been 
> seeing that CMS was able to maintain maximum chunk sizes upward of 94% 
> of total old region space. The maximum free chunk size appears to be 
> decreasing at a rate of less than 1MB/hour, which should be fine -- 
> according to this we won't be hitting full GC any time soon, and the 
> servers will likely be down for maintenance more frequently than full 
> GC can occur.
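One way to turn an observed decline rate like this into an advance
warning is to fit a slope over a sliding window of samples and project
the time until the metric reaches zero. A minimal sketch (class names,
window size, and units are illustrative, not from the original setup):

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical trend watcher: keeps a sliding window of max-chunk-size
// samples and estimates, via a least-squares slope, how long until the
// value reaches zero at the current rate of decline.
public class ChunkTrendWatcher {
    private final Deque<double[]> window = new ArrayDeque<>(); // {seconds, mb}
    private final int maxSamples;

    ChunkTrendWatcher(int maxSamples) { this.maxSamples = maxSamples; }

    void addSample(double seconds, double maxChunkMb) {
        window.addLast(new double[]{seconds, maxChunkMb});
        if (window.size() > maxSamples) window.removeFirst();
    }

    // Least-squares slope of the window, in MB per second.
    double slope() {
        int n = window.size();
        double sx = 0, sy = 0, sxx = 0, sxy = 0;
        for (double[] p : window) {
            sx += p[0]; sy += p[1]; sxx += p[0] * p[0]; sxy += p[0] * p[1];
        }
        double denom = n * sxx - sx * sx;
        return denom == 0 ? 0 : (n * sxy - sx * sy) / denom;
    }

    // Estimated seconds until the max chunk size reaches zero; infinity
    // if the metric is flat or growing.
    double secondsToExhaustion() {
        double s = slope();
        if (s >= 0 || window.isEmpty()) return Double.POSITIVE_INFINITY;
        return -window.peekLast()[1] / s;
    }
}
```

A failover decision could then fire when the projected exhaustion time
drops below some safety horizon (say, twice the observed full-GC pause
budget plus failover time).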
>
> In a previous test, at a time when the system was less memory 
> efficient, we were able to run the system for a good 10 hours. 
> During the first hour, the maximum free chunk size decreased to 
> 100MB, where it stayed for over 8 hours. During the last 40 minutes 
> of the run, the maximum free chunk size decreased at a steady rate 
> toward 0, at which point a full GC occurred -- this was very 
> encouraging, because for that workload we seemed to get a 40-minute 
> advance warning (from when the chunk size began its steady decline 
> toward 0).

Out of curiosity, when you say "less memory efficient", do you mean
that the previous system actually kept more objects live in the Java
heap?  Or did you do something to reduce the number of distinct object
sizes?  Meaning, if before you had objects of sizes N, N+1, N+2, and
N+3, you now pad them all out to size N+3.
>
> **My question to you**: assuming this all reflects a prolonged peak 
> workload (workload at any given point in time in production will only 
> be lower), does this sound like a valid approach? To what degree of 
> reliability do you reckon we should be able to count on the maximum 
> free chunk size statistic from the GC log?
>
The maximum free chunk size is exact at the time the GC prints it, but
it can be stale by the time you read it and make your decisions.
> We are definitely open to suggestions, but request that they be 
> limited to solutions available on HotSpot (No Azul for us, at least 
> for now). Also, G1 by itself is no solution unless we can come up with 
> a similar metric that will give us advance warning before Full GCs, or 
> any GCs that significantly exceed our SLA (and these can occasionally 
> occur).

I think that the use of maximum free chunk size as your metric is a good
choice.  It is very conservative (which sounds like what you want) and
not subject to odd mixtures of object sizes.

For G1 I think you could use the number of completely free regions.
I don't know if it is printed in any of the logs currently, but it is
probably a metric we maintain (or could easily).  If the number of
completely free regions decreases over time, it could signal that a
full GC is coming.
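Free-region counts aren't exposed directly to applications, but as a
rough in-process proxy one could poll the standard MemoryPoolMXBean
API for old-generation headroom. The pool-name matching below is an
assumption (pool names vary across JVM versions and collectors):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;

// Sketch: report used vs. max space for any old-generation memory pool
// (e.g. "G1 Old Gen" under G1, name assumed). Shrinking headroom here is
// a coarse analogue of the completely-free-region count shrinking.
public class OldGenHeadroom {
    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getName().contains("Old Gen")) {
                long used = pool.getUsage().getUsed();
                long max = pool.getUsage().getMax(); // -1 if undefined
                System.out.printf("%s: %d / %d bytes used%n",
                        pool.getName(), used, max);
            }
        }
    }
}
```

This is much less precise than a free-region count, since a pool can be
heavily used yet unfragmented, but it requires no log parsing.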

Jon

>
>
> _______________________________________________
> hotspot-gc-use mailing list
> hotspot-gc-use at openjdk.java.net
> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
