Multi-second ParNew collections but stable CMS

charlie hunt charlie.hunt at oracle.com
Fri Dec 19 22:10:24 UTC 2014


Disabling transparent huge pages should help with those GC pauses where you are seeing high sys time reported, and it should also shorten their pause times.
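
In case it helps, the usual way to disable it on the fly on most Linux distributions looks like this (a sketch; the exact sysfs path can vary by kernel and distro, so verify it on your systems first):

echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag

To keep it disabled across reboots, transparent_hugepage=never can be added to the kernel boot parameters.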

Thanks for also sharing your observation of khugepaged/pages_collapsed.

charlie

> On Dec 18, 2014, at 4:41 PM, Ashwin Jayaprakash <ashwin.jayaprakash at gmail.com> wrote:
> 
> @Charlie/@Holger My apologies, THP is indeed enabled. I misread the "never" and thought it was already done. In fact, "cat /sys/kernel/mm/transparent_hugepage/khugepaged/pages_collapsed" showed 904, and an hour later it now shows 6845 on one of our machines.
> 
> @Jon I dug through some of our ElasticSearch/application logs again and tried to correlate them with the GC logs. The collection time does seem to match the GC log's "real" time. However, the collected sizes don't seem to match, which is what threw me off.
> 
> Item 1:
> 
> 2014-12-18T21:34:55.837+0000: 163793.979: [ParNew
> Desired survivor size 56688640 bytes, new threshold 6 (max 6)
> - age   1:   31568024 bytes,   31568024 total
> - age   2:    1188576 bytes,   32756600 total
> - age   3:    1830920 bytes,   34587520 total
> - age   4:     282536 bytes,   34870056 total
> - age   5:     316640 bytes,   35186696 total
> - age   6:     249856 bytes,   35436552 total
> : 931773K->49827K(996800K), 1.3622770 secs] 22132844K->21256042K(31633280K)After GC:
> Statistics for BinaryTreeDictionary:
> ------------------------------------
> Total Free Space: 1206815932
> Max   Chunk Size: 1206815932
> Number of Blocks: 1
> Av.  Block  Size: 1206815932
> Tree      Height: 1
> After GC:
> Statistics for BinaryTreeDictionary:
> ------------------------------------
> Total Free Space: 6189459
> Max   Chunk Size: 6188544
> Number of Blocks: 3
> Av.  Block  Size: 2063153
> Tree      Height: 2
> , 1.3625110 secs] [Times: user=15.92 sys=1.36, real=1.36 secs]
> 2014-12-18T21:34:57.200+0000: 163795.342: Total time for which application threads were stopped: 1.3638970 seconds
> 
> 
> [2014-12-18T21:34:57,203Z]  [WARN ]  [elasticsearch[server00001][scheduler][T#1]]  [org.elasticsearch.monitor.jvm]  [server00001] [gc][young][163563][20423] duration [1.3s], collections [1]/[2.1s], total [1.3s]/[17.9m], memory [20.7gb]->[20.2gb]/[30.1gb], all_pools {[young] [543.2mb]->[1mb]/[865.3mb]}{[survivor] [44.6mb]->[48.6mb]/[108.1mb]}{[old] [20.2gb]->[20.2gb]/[29.2gb]}
> 
> 
> Item 2:
> 
> 2014-12-18T20:53:35.011+0000: 161313.153: [ParNew
> Desired survivor size 56688640 bytes, new threshold 6 (max 6)
> - age   1:   32445776 bytes,   32445776 total
> - age   3:    6068000 bytes,   38513776 total
> - age   4:    1031528 bytes,   39545304 total
> - age   5:     255896 bytes,   39801200 total
> : 939702K->53536K(996800K), 2.9352940 secs] 21501296K->20625925K(31633280K)After GC:
> Statistics for BinaryTreeDictionary:
> ------------------------------------
> Total Free Space: 1287922158
> Max   Chunk Size: 1287922158
> Number of Blocks: 1
> Av.  Block  Size: 1287922158
> Tree      Height: 1
> After GC:
> Statistics for BinaryTreeDictionary:
> ------------------------------------
> Total Free Space: 6205476
> Max   Chunk Size: 6204928
> Number of Blocks: 2
> Av.  Block  Size: 3102738
> Tree      Height: 2
> , 2.9355580 secs] [Times: user=33.82 sys=3.11, real=2.94 secs]
> 2014-12-18T20:53:37.947+0000: 161316.089: Total time for which application threads were stopped: 2.9367640 seconds
> 
> 
> [2014-12-18T20:53:37,950Z]  [WARN ]  [elasticsearch[server00001][scheduler][T#1]]  [org.elasticsearch.monitor.jvm]  [server00001] [gc][young][161091][19838] duration [2.9s], collections [1]/[3s], total [2.9s]/[17.3m], memory [20.4gb]->[19.6gb]/[30.1gb], all_pools {[young] [801.7mb]->[2.4mb]/[865.3mb]}{[survivor] [52.3mb]->[52.2mb]/[108.1mb]}{[old] [19.6gb]->[19.6gb]/[29.2gb]}
> 
> 
> Item 3:
> 
> 2014-12-17T14:42:10.590+0000: 52628.731: [GCBefore GC:
> Statistics for BinaryTreeDictionary:
> ------------------------------------
> Total Free Space: -966776244
> Max   Chunk Size: -966776244
> Number of Blocks: 1
> Av.  Block  Size: -966776244
> Tree      Height: 1
> Before GC:
> Statistics for BinaryTreeDictionary:
> ------------------------------------
> Total Free Space: 530
> Max   Chunk Size: 268
> Number of Blocks: 2
> Av.  Block  Size: 265
> Tree      Height: 2
> 2014-12-17T14:42:10.590+0000: 52628.731: [ParNew
> Desired survivor size 56688640 bytes, new threshold 1 (max 6)
> - age   1:  113315624 bytes,  113315624 total
> : 996800K->110720K(996800K), 7.3511710 secs] 5609422K->5065102K(31633280K)After GC:
> Statistics for BinaryTreeDictionary:
> ------------------------------------
> Total Free Space: -1009955715
> Max   Chunk Size: -1009955715
> Number of Blocks: 1
> Av.  Block  Size: -1009955715
> Tree      Height: 1
> After GC:
> Statistics for BinaryTreeDictionary:
> ------------------------------------
> Total Free Space: 530
> Max   Chunk Size: 268
> Number of Blocks: 2
> Av.  Block  Size: 265
> Tree      Height: 2
> , 7.3514180 secs] [Times: user=36.50 sys=17.99, real=7.35 secs]
> 2014-12-17T14:42:17.941+0000: 52636.083: Total time for which application threads were stopped: 7.3525250 seconds
> 
> 
> [2014-12-17T14:42:17,944Z] [WARN ] [elasticsearch[prdaes04data03][scheduler][T#1]] [org.elasticsearch.monitor.jvm] [prdaes04data03] [gc][young][52582][5110] duration [7.3s], collections [1]/[7.6s], total [7.3s]/[4.2m], memory [5.1gb]->[4.8gb]/[30.1gb], all_pools {[young] [695.5mb]->[14.4mb]/[865.3mb]}{[survivor] [108.1mb]->[108.1mb]/[108.1mb]}{[old] [4.3gb]->[4.7gb]/[29.2gb]}
> 
> 
> 
> 
> On Thu, Dec 18, 2014 at 1:10 PM, charlie hunt <charlie.hunt at oracle.com> wrote:
> The output:
>> cat /sys/kernel/mm/transparent_hugepage/enabled
>> [always] madvise never
> 
> Tells me that transparent huge pages are enabled “always”.
> 
> I think I would change this to “never”, even though sysctl -a may be reporting no huge pages are currently in use. The system may be trying to coalesce pages occasionally in an attempt to make huge pages available, even though you are not currently using any.
> 
> charlie
> 
> 
>> On Dec 18, 2014, at 2:00 PM, Ashwin Jayaprakash <ashwin.jayaprakash at gmail.com> wrote:
>> 
>> @Jon, thanks for clearing that up. Yes, that was my source of confusion. I was misinterpreting the user time as the real time.
>> 
>> Jon's reply from an offline conversation:
>> Are these the 7 second collections you refer to in the paragraph above?
>> If yes, the "user" time is the sum of the time spent by multiple GC threads.
>> The real time is the GC pause time that your application experiences.
>> In the above case the GC pauses are .65s, 1.10s and .67s.
>> 
>> Something that added to my confusion was the tools we are using in-house. In addition to the GC logs, we have one tool that uses the GarbageCollectorMXBean's getCollectionTime() method. This does not seem to match the values I see in the GC logs (http://docs.oracle.com/javase/1.5.0/docs/api/java/lang/management/GarbageCollectorMXBean.html#getCollectionTime%28%29).
>> 
>> The other tool is the ElasticSearch JVM stats logger which uses GarbageCollectorMXBean's LastGCInfo (https://github.com/elasticsearch/elasticsearch/blob/master/src/main/java/org/elasticsearch/monitor/jvm/JvmStats.java#L194 and https://github.com/elasticsearch/elasticsearch/blob/master/src/main/java/org/elasticsearch/monitor/jvm/JvmMonitorService.java#L187).
>> 
>> Do these methods expose the total time spent by all the parallel GC threads for the ParNew pool or the "real" time? They do not seem to match the GC log times.
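>> 
>> To compare the two APIs side by side I put together a minimal probe (just a sketch, untested; it assumes the com.sun.management extension classes are available on our JDK). For what it's worth, the javadoc describes getCollectionTime() as the approximate accumulated collection *elapsed* time, so both values should line up with the "real" column rather than with "user":
>> 
>> import java.lang.management.GarbageCollectorMXBean;
>> import java.lang.management.ManagementFactory;
>> import com.sun.management.GcInfo;
>> 
>> public class GcTimeProbe {
>>     public static void main(String[] args) throws Exception {
>>         while (true) {
>>             for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
>>                 // Accumulated elapsed (wall-clock) collection time in ms, per the javadoc.
>>                 System.out.printf("%s: count=%d total=%dms",
>>                         gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
>>                 if (gc instanceof com.sun.management.GarbageCollectorMXBean) {
>>                     GcInfo last = ((com.sun.management.GarbageCollectorMXBean) gc).getLastGcInfo();
>>                     if (last != null) {
>>                         // Duration of the most recent GC, also wall-clock ms
>>                         // (this is what the ElasticSearch stats logger reports).
>>                         System.out.printf(" last=%dms", last.getDuration());
>>                     }
>>                 }
>>                 System.out.println();
>>             }
>>             Thread.sleep(5000); // poll every 5s and eyeball against the GC log
>>         }
>>     }
>> }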
>> 
>> @Gustav We do not have any swapping on the machines. It could be the disk IO experienced by the GC log writer itself, as you've suspected. The machine has 128G of RAM.
>> 
>> "top" sample from a similar machine:
>>    PID    USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>> 106856 xxx          20   0   68.9g  25g   9.9g  S 72.4     21.1        2408:05 java
>> 
>> "free -g":
>>              total       used       free     shared    buffers     cached
>> Mem:           120        119          0          0          0         95
>> -/+ buffers/cache:         23         96
>> Swap:            0          0          0
>> 
>> @Charlie Huge pages have already been disabled:
>> 
>> sudo sysctl -a | grep hugepage
>> vm.nr_hugepages = 0
>> vm.nr_hugepages_mempolicy = 0
>> vm.hugepages_treat_as_movable = 0
>> vm.nr_overcommit_hugepages = 0
>> 
>> cat /sys/kernel/mm/transparent_hugepage/enabled
>> [always] madvise never
>> 
>> 
>> Thanks all!
>> 
>> 
>> 
>> On Wed, Dec 17, 2014 at 5:00 PM, Jon Masamitsu <jon.masamitsu at oracle.com> wrote:
>> Ashwin,
>> 
>> On 12/16/2014 8:47 PM, Ashwin Jayaprakash wrote:
>>> Hi, we have a cluster of ElasticSearch servers running with 31G heap and OpenJDK 64-Bit Server VM (build 24.65-b04, mixed mode). 
>>> 
>>> While our old gen seems to be very stable with about 40% usage and no Full GCs so far, our young gen keeps growing from ~50MB to 850MB every few seconds. These ParNew collections are taking anywhere between 1 and 7 seconds and are causing some of our requests to time out. The eden space keeps filling up and is then cleared every 30-60 seconds. There is definitely work being done by our JVM in terms of caching/buffering objects for a few seconds, writing to disk and then clearing the objects (typical Lucene/ElasticSearch indexing and querying workload).
>> 
>> From your recent mail:
>> 
>> Times: user=7.89 sys=0.55, real=0.65 secs]
>> Times: user=7.71 sys=4.59, real=1.10 secs]
>> Times: user=7.46 sys=0.32, real=0.67 secs]
>> 
>> Are these the 7 second collections you refer to in the paragraph above?
>> If yes, the "user" time is the sum of the time spent by multiple GC threads.
>> The real time is the GC pause time that your application experiences.
>> In the above case the GC pauses are .65s, 1.10s and .67s.
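>> 
>> As a quick sanity check on that: user=7.89s divided by real=0.65s works out to roughly 12 GC worker threads running in parallel, which is plausible for a machine of that size.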
>> 
>> Comment below regarding "eden space keeps filling up".
>> 
>>> 
>>> These long pauses are not good for our server throughput and I was doing some reading. I got some conflicting reports on how Cassandra is configured compared to Hadoop. There are also references <http://blog.mgm-tp.com/2013/12/benchmarking-g1-and-other-java-7-garbage-collectors/> to this old ParNew+CMS bug <http://bugs.java.com/view_bug.do?bug_id=6459113> which I thought would've been addressed in the JRE version we are using. Cassandra recommends <http://aryanet.com/blog/cassandra-garbage-collector-tuning> a larger NewSize with just 1 for max tenuring, whereas Hadoop recommends <http://wiki.apache.org/hadoop/PerformanceTuning> a small NewSize.
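>>> 
>>> For illustration, the Cassandra-style tuning would look something like this on our command line (the sizes here are placeholders I made up, not tested values):
>>> 
>>> java -Xmn2g -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=1 -XX:+UseConcMarkSweepGC -XX:+UseParNewGC ...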
>>> 
>>> Since most of our allocations seem to be quite short lived, is there a way to avoid these "long" young gen pauses?
>>> 
>>> Thanks in advance. Here are some details.
>>> 
>>> Heap settings:
>>> java -Xmx31000m -Xms31000m 
>>> -Xss512k -XX:MaxPermSize=512m -XX:ReservedCodeCacheSize=64m 
>>> -XX:+UseConcMarkSweepGC -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=70 
>>> -XX:+UseParNewGC -XX:+ScavengeBeforeFullGC -XX:+PrintPromotionFailure 
>>> -XX:+PrintClassHistogramBeforeFullGC -XX:+PrintTenuringDistribution 
>>> -XX:GCLogFileSize=512m -XX:NumberOfGCLogFiles=2 -XX:+PrintGCDateStamps 
>>> -XX:+UseGCLogFileRotation -XX:+DisableExplicitGC -XX:PrintFLSStatistics=1 -XX:+PrintGCDetails 
>>> 
>>> 
>>> Last few lines of "kill -3 pid" output:
>>> Heap
>>>  par new generation   total 996800K, used 865818K [0x00007fa18e800000, 0x00007fa1d2190000, 0x00007fa1d2190000)
>>>   eden space 886080K,  94% used [0x00007fa18e800000, 0x00007fa1c1a659e0, 0x00007fa1c4950000)
>>>   from space 110720K,  25% used [0x00007fa1cb570000, 0x00007fa1cd091078, 0x00007fa1d2190000)
>>>   to   space 110720K,   0% used [0x00007fa1c4950000, 0x00007fa1c4950000, 0x00007fa1cb570000)
>>>  concurrent mark-sweep generation total 30636480K, used 12036523K [0x00007fa1d2190000, 0x00007fa920000000, 0x00007fa920000000)
>>>  concurrent-mark-sweep perm gen total 128856K, used 77779K [0x00007fa920000000, 0x00007fa927dd6000, 0x00007fa940000000)
>>> 
>>> Sample gc log:
>>> 2014-12-11T23:32:16.121+0000: 710.618: [ParNew
>>> Desired survivor size 56688640 bytes, new threshold 6 (max 6)
>>> - age   1:    2956312 bytes,    2956312 total
>>> - age   2:     591800 bytes,    3548112 total
>>> - age   3:      66216 bytes,    3614328 total
>>> - age   4:     270752 bytes,    3885080 total
>>> - age   5:     615472 bytes,    4500552 total
>>> - age   6:     358440 bytes,    4858992 total
>>> : 900635K->8173K(996800K), 0.0317340 secs] 1352217K->463460K(31633280K)After GC:
>> 
>> In this GC, eden is at 900635k before the GC and at 8173k after. That eden
>> fills up and is then collected is the expected behavior. Is that what you were
>> asking about above? If not, can you send me an example of the "fills up" behavior.
>> 
>> Jon
>> 
>>> Statistics for BinaryTreeDictionary:
>>> ------------------------------------
>>> Total Free Space: -433641480
>>> Max   Chunk Size: -433641480
>>> Number of Blocks: 1
>>> Av.  Block  Size: -433641480
>>> Tree      Height: 1
>>> After GC:
>>> Statistics for BinaryTreeDictionary:
>>> ------------------------------------
>>> Total Free Space: 1227
>>> Max   Chunk Size: 631
>>> Number of Blocks: 3
>>> Av.  Block  Size: 409
>>> Tree      Height: 3
>>> , 0.0318920 secs] [Times: user=0.38 sys=0.01, real=0.03 secs] 
>>> 
>>> 
>>> Ashwin Jayaprakash.
>>> 
>>> 
> 
> _______________________________________________
> hotspot-gc-use mailing list
> hotspot-gc-use at openjdk.java.net
> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
