Multi-second ParNew collections but stable CMS

Jon Masamitsu jon.masamitsu at oracle.com
Thu Dec 18 20:23:56 UTC 2014


On 12/18/2014 12:00 PM, Ashwin Jayaprakash wrote:
> *@Jon*, thanks for clearing that up. Yes, that was my source of 
> confusion. I was mistaking the user time for the real time.
>
> *Jon's reply from an offline conversation:*
>
>     Are these the 7 second collections you refer to in the paragraph
>     above?
>     If yes, the "user" time is the sum of the time spent by multiple
>     GC threads.
>     The real time is the GC pause time that your application experiences.
>     In the above case the GC pauses are .65s, 1.10s and .67s.
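> (In other words, user/real is roughly the number of parallel GC worker
> threads: e.g. 7.89s of user time over a 0.65s real pause is about 12
> threads working in parallel.)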
>
>
> Something that added to my confusion was the tooling we use in-house.
> In addition to the GC logs, we have one tool that uses the
> GarbageCollectorMXBean's getCollectionTime() method. This does not
> seem to match the values I see in the GC logs
> (http://docs.oracle.com/javase/1.5.0/docs/api/java/lang/management/GarbageCollectorMXBean.html#getCollectionTime%28%29).
>
> The other tool is the ElasticSearch JVM stats logger which uses 
> GarbageCollectorMXBean's LastGCInfo 
> (https://github.com/elasticsearch/elasticsearch/blob/master/src/main/java/org/elasticsearch/monitor/jvm/JvmStats.java#L194 
> and 
> https://github.com/elasticsearch/elasticsearch/blob/master/src/main/java/org/elasticsearch/monitor/jvm/JvmMonitorService.java#L187).
>
> Do these methods expose the total time spent by all the parallel GC 
> threads for the ParNew pool or the "real" time? They do not seem to 
> match the GC log times.

I haven't found the JVM code that provides the information for
getCollectionTime(), but I would expect it to match the pause times in the
GC logs.  From your earlier mail:
> *Sample gc log:*
> 2014-12-11T23:32:16.121+0000: 710.618: [ParNew
> Desired survivor size 56688640 bytes, new threshold 6 (max 6)
> - age   1:    2956312 bytes,    2956312 total
> - age   2:     591800 bytes,    3548112 total
> - age   3:      66216 bytes,    3614328 total
> - age   4:     270752 bytes,    3885080 total
> - age   5:     615472 bytes,    4500552 total
> - age   6:     358440 bytes,    4858992 total
> : 900635K->8173K(996800K), 0.0317340 secs] 
> 1352217K->463460K(31633280K)After GC:

The time 0.0317340 is what the GC measures and is what I would expect to be
available through the MXBeans.
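
For what it's worth, here's a rough, untested sketch of how both values can
be read (the class name and structure are just for illustration).
getCollectionTime() comes from the standard java.lang.management interface,
and the last collection's duration comes from the com.sun.management
extension that the ElasticSearch stats logger uses:

import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcTimes {
    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc :
                ManagementFactory.getGarbageCollectorMXBeans()) {
            // Approximate accumulated collection elapsed time, in milliseconds.
            System.out.println(gc.getName() + ": count=" + gc.getCollectionCount()
                + " totalTimeMs=" + gc.getCollectionTime());

            // HotSpot also exposes the last collection's elapsed time through
            // the com.sun.management extension (the LastGcInfo that the
            // ElasticSearch JVM stats logger reads).
            if (gc instanceof com.sun.management.GarbageCollectorMXBean) {
                com.sun.management.GcInfo info =
                    ((com.sun.management.GarbageCollectorMXBean) gc).getLastGcInfo();
                if (info != null) {
                    System.out.println("  lastGcDurationMs=" + info.getDuration());
                }
            }
        }
    }
}

If the deltas in totalTimeMs between consecutive collections track the
"real" times in the log, then the MXBean is reporting pause time rather
than summed GC-thread time.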

You're saying that does not match?

Jon

> *@Gustav* We do not have any swapping on the machines. It could be the 
> disk IO experienced by the GC log writer itself, as you've suspected. 
> The machine has 128G of RAM.
>
> *"top" sample from a similar machine:*
>     PID  USER  PR  NI   VIRT   RES   SHR  S  %CPU  %MEM     TIME+  COMMAND
>  106856  xxx   20   0  68.9g   25g  9.9g  S  72.4  21.1   2408:05  java
>
> *"free -g":
> *
> total       used       free     shared    buffers cached
> Mem:           120        119          0 0          0         95
> -/+ buffers/cache:         23         96
> Swap:            0          0          0
>
> *@Charlie* Hugepages has already been disabled
>
> *sudo sysctl -a | grep hugepage*
> vm.nr_hugepages = 0
> vm.nr_hugepages_mempolicy = 0
> vm.hugepages_treat_as_movable = 0
> vm.nr_overcommit_hugepages = 0
>
> *cat /sys/kernel/mm/transparent_hugepage/enabled*
> [always] madvise never
>
>
> Thanks all!
>
>
>
> On Wed, Dec 17, 2014 at 5:00 PM, Jon Masamitsu 
> <jon.masamitsu at oracle.com <mailto:jon.masamitsu at oracle.com>> wrote:
>
>     Ashwin,
>
>     On 12/16/2014 8:47 PM, Ashwin Jayaprakash wrote:
>>     Hi, we have a cluster of ElasticSearch servers running with 31G
>>     heap and OpenJDK 64-Bit Server VM (build 24.65-b04, mixed mode).
>>
>>     While our old gen seems to be very stable with about 40% usage
>>     and no Full GCs so far, our young gen keeps growing from ~50MB to
>>     850MB every few seconds. These ParNew collections are taking
>>     anywhere between 1-7 seconds and are causing some of our requests
>>     to time out. The eden space keeps filling up and is then cleared
>>     every 30-60 seconds. There is definitely work being done by our
>>     JVM in terms of caching/buffering objects for a few seconds,
>>     writing to disk and then clearing the objects (typical
>>     Lucene/ElasticSearch indexing and querying workload).
>
>     From your recent mail:
>
>     Times: user=7.89 sys=0.55, real=0.65 secs]
>     Times: user=7.71 sys=4.59, real=1.10 secs]
>     Times: user=7.46 sys=0.32, real=0.67 secs]
>
>     Are these the 7 second collections you refer to in the paragraph
>     above?
>     If yes, the "user" time is the sum of the time spent by multiple
>     GC threads.
>     The real time is the GC pause time that your application experiences.
>     In the above case the GC pauses are .65s, 1.10s and .67s.
>
>     Comment below regarding "eden space keeps filling up".
>
>>
>>     These long pauses are not good for our server throughput and I
>>     was doing some reading. I got some conflicting reports on how
>>     Cassandra is configured compared to Hadoop. There are also
>>     references
>>     <http://blog.mgm-tp.com/2013/12/benchmarking-g1-and-other-java-7-garbage-collectors/>
>>     to this old ParNew+CMS bug
>>     <http://bugs.java.com/view_bug.do?bug_id=6459113> which I thought
>>     would've been addressed in the JRE version we are using.
>>     Cassandra recommends
>>     <http://aryanet.com/blog/cassandra-garbage-collector-tuning> a
>>     larger NewSize with just 1 for max tenuring, whereas Hadoop
>>     recommends <http://wiki.apache.org/hadoop/PerformanceTuning> a
>>     small NewSize.
>>
>>     Since most of our allocations seem to be quite short lived, is
>>     there a way to avoid these "long" young gen pauses?
>>
>>     Thanks in advance. Here are some details.
>>     *Heap settings:*
>>     java -Xmx31000m -Xms31000m
>>     -Xss512k -XX:MaxPermSize=512m -XX:ReservedCodeCacheSize=64m
>>     -XX:+UseConcMarkSweepGC -XX:+UseCMSInitiatingOccupancyOnly
>>     -XX:CMSInitiatingOccupancyFraction=70
>>     -XX:+UseParNewGC -XX:+ScavengeBeforeFullGC
>>     -XX:+PrintPromotionFailure
>>     -XX:+PrintClassHistogramBeforeFullGC -XX:+PrintTenuringDistribution
>>     -XX:GCLogFileSize=512m -XX:NumberOfGCLogFiles=2
>>     -XX:+PrintGCDateStamps
>>     -XX:+UseGCLogFileRotation -XX:+DisableExplicitGC
>>     -XX:PrintFLSStatistics=1 -XX:+PrintGCDetails
>>
>>     *Last few lines of "kill -3 pid" output:*
>>     Heap
>>      par new generation   total 996800K, used 865818K
>>     [0x00007fa18e800000, 0x00007fa1d2190000, 0x00007fa1d2190000)
>>       eden space 886080K,  94% used [0x00007fa18e800000,
>>     0x00007fa1c1a659e0, 0x00007fa1c4950000)
>>       from space 110720K,  25% used [0x00007fa1cb570000,
>>     0x00007fa1cd091078, 0x00007fa1d2190000)
>>       to   space 110720K,   0% used [0x00007fa1c4950000,
>>     0x00007fa1c4950000, 0x00007fa1cb570000)
>>      concurrent mark-sweep generation total 30636480K, used 12036523K
>>     [0x00007fa1d2190000, 0x00007fa920000000, 0x00007fa920000000)
>>      concurrent-mark-sweep perm gen total 128856K, used 77779K
>>     [0x00007fa920000000, 0x00007fa927dd6000, 0x00007fa940000000)
>>     *Sample gc log:*
>>     2014-12-11T23:32:16.121+0000: 710.618: [ParNew
>>     Desired survivor size 56688640 bytes, new threshold 6 (max 6)
>>     - age   1:    2956312 bytes,    2956312 total
>>     - age   2:     591800 bytes,    3548112 total
>>     - age   3:      66216 bytes,    3614328 total
>>     - age   4:     270752 bytes,    3885080 total
>>     - age   5:     615472 bytes,    4500552 total
>>     - age   6:     358440 bytes,    4858992 total
>>     : 900635K->8173K(996800K), 0.0317340 secs]
>>     1352217K->463460K(31633280K)After GC:
>
>     In this GC eden is at 900635k before the GC and is at 8173k after.
>     That eden fills up is the expected behavior.  Is that what you were
>     asking about above?  If not, can you send me an example of the
>     "fills up" behavior.
>
>     Jon
>
>>     Statistics for BinaryTreeDictionary:
>>     ------------------------------------
>>     Total Free Space: -433641480
>>     Max   Chunk Size: -433641480
>>     Number of Blocks: 1
>>     Av.  Block  Size: -433641480
>>     Tree      Height: 1
>>     After GC:
>>     Statistics for BinaryTreeDictionary:
>>     ------------------------------------
>>     Total Free Space: 1227
>>     Max   Chunk Size: 631
>>     Number of Blocks: 3
>>     Av.  Block  Size: 409
>>     Tree      Height: 3
>>     , 0.0318920 secs] [Times: user=0.38 sys=0.01, real=0.03 secs]
>>
>>
>>     Ashwin Jayaprakash.
>>
>>
>
>
>
