How to alert for heap fragmentation

Fri Oct 12 07:28:48 PDT 2012

Ramki, Todd,

There are several projects in the pipeline for cleaning up verbose logs, 
reporting more/better data and improving the JVM monitoring infrastructure in 
different ways.

Exactly what data we will add and which logging will be improved is not 
decided yet, but I wouldn't set my hopes too high that CMS will be first out. Our 
prime target for logging improvements lately has been G1 which, by the way, 
might be worthwhile checking out if you are worried about fragmentation.

We have made some initial attempts along the lines of JEP 158 [1], again 
mainly for G1, and we are currently working on GC support for the 
event-based JVM tracing described in JEP 167 [2]. For the latter JEP, the 
Parallel collectors (Parallel Scavenge and Parallel Old) will likely be first 
out with a few events. Have a look at these JEPs for more details.

[1] http://openjdk.java.net/jeps/158
[2] http://openjdk.java.net/jeps/167

Best regards,
/Jesper

On 2012-10-12 08:30, Srinivas Ramakrishna wrote:
>
> Todd, good question :-)
>
> @Jesper et al, do you know the answer to Todd's question? I agree that
> exposing all of these stats via suitable JMX/Mbean interfaces would be quite
> useful.... The other possibility would be to log in the manner of HP's gc logs
> (CSV format with suitable header), or jstat logs, so parsing cost would be
> minimal. Then higher level, general tools like Kafka could consume the
> log/event streams, apply suitable filters and inform/alert interested
> monitoring agents.
>
> @Todd & Saroj: Can you perhaps give some scenarios on how you might make use
> of information such as this (more concretely say CMS fragmentation at a
> specific JVM)? Would it be used only for "read-only" monitoring and alerting,
> or do you see this as part of an automated data-centric control system of
> sorts. The answer is kind of important, because something like the latter can
> be accomplished today via gc log parsing (however kludgey that might be) and
> something like Kafka/Zookeeper. On the other hand, I am not sure if the
> latency of that kind of thing would fit well into a more automated and
> fast-reacting data center control system or load-balancer where a more direct
> JMX/MBean like interface might work better. Or was your interest purely of the
> "development-debugging-performance-measurement" kind, rather than of
> production JVMs? Anyway, thinking out loud here...
>
> Thoughts/Comments/Suggestions?
> -- ramki
>
> On Thu, Oct 11, 2012 at 9:11 PM, Todd Lipcon <todd at cloudera.com
> <mailto:todd at cloudera.com>> wrote:
>
>     Hey Ramki,
>
>     Do you know if there's any plan to offer the FLS statistics as a metric
>     via JMX or some other interface in the future? It would be nice to be able
>     to monitor fragmentation without having to actually log and parse the gc logs.
>
>     -Todd
>
>
>     On Thu, Oct 11, 2012 at 7:50 PM, Srinivas Ramakrishna <ysr1729 at gmail.com
>     <mailto:ysr1729 at gmail.com>> wrote:
>
>         In the absence of fragmentation, one would normally expect the max
>         chunk size of the CMS generation
>         to stabilize at some reasonable value, say after some tens of CMS GC
>         cycles. If it doesn't, you should try
>         to use a larger heap, or otherwise reshape the heap to reduce
>         promotion rates. In my experience,
>         CMS seems to work best if its "duty cycle" is of the order of 1-2%,
>         i.e. there are 50 to 100 times more
>         scavenges during the interval that it's not running vs. the interval
>         during which it is running.
>
>         Have Nagios grep the GC log file (written with PrintFLSStatistics=2)
>         for the string "Max   Chunk Size:" and pick the
>         numeric component of every (4n+1)th match. The max chunk size will
>         typically cycle within a small band,
>         once it has stabilized, returning always to a high value following a
>         CMS cycle's completion. If the upper envelope
>         of this keeps steadily declining over some tens of CMS GC cycles, then
>         you are probably looking at a heap
>         that will eventually succumb to fragmentation.
>
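The grep step described above could be sketched roughly as follows. This is only an illustration in Python, not something shipped with the JVM or Nagios. It assumes the log lines look like the excerpt posted by Saroj further down, and that taking every fourth match (the "(4n+1)th" advice) always selects the same free-list space. The names max_chunk_sizes, stride and offset are made up for the example.

```python
import re

# Matches e.g. "Max   Chunk Size: 1976112973"; the amount of whitespace varies.
MAX_CHUNK_RE = re.compile(r"Max\s+Chunk\s+Size:\s*(-?\d+)")


def max_chunk_sizes(log_lines, stride=4, offset=0):
    """Return the number from every (stride*n + offset + 1)th matching line.

    stride=4, offset=0 selects matches 1, 5, 9, ... (the "(4n+1)th" matches).
    Both values are assumptions; check them against your own log layout.
    """
    values = [int(m.group(1))
              for m in map(MAX_CHUNK_RE.search, log_lines) if m]
    return values[offset::stride]


# Excerpt taken from the log posted in this thread (it has two matches only).
sample = """\
Statistics for BinaryTreeDictionary:
Total Free Space: -668151027
Max   Chunk Size: 1976112973
Number of Blocks: 175445
Statistics for BinaryTreeDictionary:
Total Free Space: 10926
Max   Chunk Size: 1660
"""

print(max_chunk_sizes(sample.splitlines()))
```

On a real log you would pass the open file object instead of sample.splitlines() and feed the resulting list to whatever check or plot you use.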
>         You can probably calibrate a threshold for the upper envelope so that
>         if it falls below that threshold you will
>         be alerted by Nagios that a closer look is in order.
>
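A threshold check of this kind might look like the sketch below. It relies on the usual Nagios plugin convention of exit codes 0 (OK), 1 (WARNING), 2 (CRITICAL) and 3 (UNKNOWN). Everything else is an assumption for illustration: the function name, the two thresholds, the window size, and the choice of "maximum of the most recent samples" as a stand-in for the upper envelope. The thresholds are in whatever units the log reports.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_max_chunk(values, warn_below, crit_below, window=10):
    """Compare the recent upper envelope of max chunk size with two thresholds.

    values     -- max chunk sizes in log order, oldest first
    warn_below -- hypothetical threshold: WARNING if the envelope is below it
    crit_below -- hypothetical threshold: CRITICAL if the envelope is below it
    window     -- how many recent samples make up the envelope (an assumption)
    """
    if not values:
        return UNKNOWN, "UNKNOWN - no Max Chunk Size samples found"
    # Use the highest recent value, since the metric dips between CMS cycles.
    envelope = max(values[-window:])
    if envelope < crit_below:
        return CRITICAL, "CRITICAL - max chunk envelope %d < %d" % (envelope, crit_below)
    if envelope < warn_below:
        return WARNING, "WARNING - max chunk envelope %d < %d" % (envelope, warn_below)
    return OK, "OK - max chunk envelope %d" % envelope


# Made-up samples and thresholds, purely for illustration.
code, message = check_max_chunk([900, 400, 850, 380, 700, 300],
                                warn_below=800, crit_below=500, window=2)
print(code, message)
```

A real plugin would print the message and pass the code to sys.exit() so that Nagios picks up the state.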
>         At least something along those lines should work. The toughest part is
>         designing your "filter" to detect the
>         fall in the upper envelope. You will probably want to plot the metric,
>         then see what kind of filter will detect
>         the condition.... Sorry this isn't much concrete help, but hopefully
>         it gives you some ideas to work in
>         the right direction...
>
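One very simple candidate for such a filter, offered only as a starting point: take the maximum of each consecutive block of samples as the upper envelope, then flag the series when the last few envelope points each fall below their predecessor. The window size, the number of points required and the tolerance are all hypothetical knobs that would need calibrating against a plot of real data, and a least-squares slope or a smoothed average may turn out to work better.

```python
def upper_envelope(values, window):
    """Max of each consecutive, non-overlapping block of `window` samples.

    `window` should span at least one full CMS cycle so that each block
    contains the high value reached after the cycle completes (assumption).
    A trailing partial block is ignored.
    """
    return [max(values[i:i + window])
            for i in range(0, len(values) - window + 1, window)]


def is_steadily_declining(envelope, min_points=3, tolerance=0.0):
    """True if each of the last `min_points` envelope values is lower than
    the one before it by more than the fraction `tolerance`."""
    if len(envelope) < min_points:
        return False
    tail = envelope[-min_points:]
    return all(b < a * (1.0 - tolerance) for a, b in zip(tail, tail[1:]))


# Made-up series: the first keeps returning to the same peak, the second
# shows peaks that shrink from cycle to cycle.
healthy = [100, 60, 100, 55, 100, 62]
fragmenting = [100, 60, 90, 50, 80, 45, 70, 40]

print(is_steadily_declining(upper_envelope(healthy, 2)))
print(is_steadily_declining(upper_envelope(fragmenting, 2)))
```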
>         -- ramki
>
>         On Thu, Oct 11, 2012 at 4:27 PM, roz dev <rozdev29 at gmail.com
>         <mailto:rozdev29 at gmail.com>> wrote:
>
>             Hi All
>
>             I am using Java 6u23 with the CMS GC. I see that the application
>             sometimes pauses for a long time because of excessive heap fragmentation.
>
>             I have enabled the PrintFLSStatistics flag, and the following is the log:
>
>
>             2012-10-09T15:38:44.724-0400: 52404.306: [GC Before GC:
>             Statistics for BinaryTreeDictionary:
>             ------------------------------------
>             Total Free Space: -668151027
>             Max   Chunk Size: 1976112973
>             Number of Blocks: 175445
>             Av.  Block  Size: 20672
>             Tree      Height: 78
>             Before GC:
>             Statistics for BinaryTreeDictionary:
>             ------------------------------------
>             Total Free Space: 10926
>             Max   Chunk Size: 1660
>             Number of Blocks: 22
>             Av.  Block  Size: 496
>             Tree      Height: 7
>
>
>             I would like to hear how people track heap fragmentation and
>             how to alert on this situation.
>
>             We use Nagios and I am wondering if there is a way to parse these
>             logs and know the max chunk size so that we can alert for it.
>
>             Any inputs are welcome.
>
>             -Saroj
>
>
>
>
>             _______________________________________________
>             hotspot-gc-use mailing list
>             hotspot-gc-use at openjdk.java.net
>             <mailto:hotspot-gc-use at openjdk.java.net>
>             http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
>
>
>
>
>
>
>
>     --
>     Todd Lipcon
>     Software Engineer, Cloudera
>
>
>
>
>