Multi-second ParNew collections but stable CMS
Ashwin Jayaprakash
ashwin.jayaprakash at gmail.com
Mon Dec 22 17:49:50 UTC 2014
All, I'm happy to report that disabling THP made a big difference. We do
not see multi-second minor GCs in our cluster anymore.
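
For the archives, the change amounts to something like this (run as root;
exact paths can vary by distro/kernel, and the "defrag" file may not exist
on older kernels):

echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag
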
Thanks for your help.
On Fri, Dec 19, 2014 at 2:10 PM, charlie hunt <charlie.hunt at oracle.com>
wrote:
> Disabling transparent huge pages should help those GCs where you are
> seeing high sys time reported, and should shorten their pause times.
>
> Thanks for also sharing your observation of khugepaged/pages_collapsed.
>
> charlie
>
> On Dec 18, 2014, at 4:41 PM, Ashwin Jayaprakash <
> ashwin.jayaprakash at gmail.com> wrote:
>
> *@Charlie/@Holger* My apologies, THP is indeed enabled. I misread the
> "never" and thought it had already been disabled. In fact, "cat
> /sys/kernel/mm/transparent_hugepage/khugepaged/pages_collapsed" showed
> 904, and an hour later it shows 6845 on one of our machines.
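> (One way to watch that counter, for reference: "watch -n 60 cat
> /sys/kernel/mm/transparent_hugepage/khugepaged/pages_collapsed".)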
>
> *@Jon* I dug through some of our ElasticSearch/application logs again and
> tried to correlate them with the GC logs. The collection time does seem to
> match the GC log's "real" time. However, the collected sizes don't seem to
> match, which is what threw me off. (The negative "Total Free Space" values
> in Item 3 below appear to be the FLS statistics printer overflowing a
> signed 32-bit value on this large heap; read as an unsigned count of heap
> words, they line up with the mostly free old gen.)
>
> *Item 1:*
>
> 2014-12-18T21:34:55.837+0000: 163793.979: [ParNew
> Desired survivor size 56688640 bytes, new threshold 6 (max 6)
> - age 1: 31568024 bytes, 31568024 total
> - age 2: 1188576 bytes, 32756600 total
> - age 3: 1830920 bytes, 34587520 total
> - age 4: 282536 bytes, 34870056 total
> - age 5: 316640 bytes, 35186696 total
> - age 6: 249856 bytes, 35436552 total
> : 931773K->49827K(996800K), 1.3622770 secs]
> 22132844K->21256042K(31633280K)After GC:
> Statistics for BinaryTreeDictionary:
> ------------------------------------
> Total Free Space: 1206815932
> Max Chunk Size: 1206815932
> Number of Blocks: 1
> Av. Block Size: 1206815932
> Tree Height: 1
> After GC:
> Statistics for BinaryTreeDictionary:
> ------------------------------------
> Total Free Space: 6189459
> Max Chunk Size: 6188544
> Number of Blocks: 3
> Av. Block Size: 2063153
> Tree Height: 2
> , 1.3625110 secs] [Times: user=15.92 sys=1.36, real=1.36 secs]
> 2014-12-18T21:34:57.200+0000: 163795.342: Total time for which application
> threads were stopped: 1.3638970 seconds
>
>
> [2014-12-18T21:34:57,203Z] [WARN ]
> [elasticsearch[server00001][scheduler][T#1]]
> [org.elasticsearch.monitor.jvm] [server00001] [gc][young][163563][20423] duration
> [1.3s], collections [1]/[2.1s], total [1.3s]/[17.9m], memory
> [20.7gb]->[20.2gb]/[30.1gb], all_pools {[young]
> [543.2mb]->[1mb]/[865.3mb]}{[survivor] [44.6mb]->[48.6mb]/[108.1mb]}{[old]
> [20.2gb]->[20.2gb]/[29.2gb]}
>
>
> *Item 2:*
>
> 2014-12-18T20:53:35.011+0000: 161313.153: [ParNew
> Desired survivor size 56688640 bytes, new threshold 6 (max 6)
> - age 1: 32445776 bytes, 32445776 total
> - age 3: 6068000 bytes, 38513776 total
> - age 4: 1031528 bytes, 39545304 total
> - age 5: 255896 bytes, 39801200 total
> : 939702K->53536K(996800K), 2.9352940 secs]
> 21501296K->20625925K(31633280K)After GC:
> Statistics for BinaryTreeDictionary:
> ------------------------------------
> Total Free Space: 1287922158
> Max Chunk Size: 1287922158
> Number of Blocks: 1
> Av. Block Size: 1287922158
> Tree Height: 1
> After GC:
> Statistics for BinaryTreeDictionary:
> ------------------------------------
> Total Free Space: 6205476
> Max Chunk Size: 6204928
> Number of Blocks: 2
> Av. Block Size: 3102738
> Tree Height: 2
> , 2.9355580 secs] [Times: user=33.82 sys=3.11, real=2.94 secs]
> 2014-12-18T20:53:37.947+0000: 161316.089: Total time for which application
> threads were stopped: 2.9367640 seconds
>
>
> [2014-12-18T20:53:37,950Z] [WARN ]
> [elasticsearch[server00001][scheduler][T#1]]
> [org.elasticsearch.monitor.jvm] [server00001] [gc][young][161091][19838] duration
> [2.9s], collections [1]/[3s], total [2.9s]/[17.3m], memory
> [20.4gb]->[19.6gb]/[30.1gb], all_pools {[young]
> [801.7mb]->[2.4mb]/[865.3mb]}{[survivor]
> [52.3mb]->[52.2mb]/[108.1mb]}{[old] [19.6gb]->[19.6gb]/[29.2gb]}
>
>
> *Item 3:*
>
> 2014-12-17T14:42:10.590+0000: 52628.731: [GCBefore GC:
> Statistics for BinaryTreeDictionary:
> ------------------------------------
> Total Free Space: -966776244
> Max Chunk Size: -966776244
> Number of Blocks: 1
> Av. Block Size: -966776244
> Tree Height: 1
> Before GC:
> Statistics for BinaryTreeDictionary:
> ------------------------------------
> Total Free Space: 530
> Max Chunk Size: 268
> Number of Blocks: 2
> Av. Block Size: 265
> Tree Height: 2
> 2014-12-17T14:42:10.590+0000: 52628.731: [ParNew
> Desired survivor size 56688640 bytes, new threshold 1 (max 6)
> - age 1: 113315624 bytes, 113315624 total
> : 996800K->110720K(996800K), 7.3511710 secs]
> 5609422K->5065102K(31633280K)After GC:
> Statistics for BinaryTreeDictionary:
> ------------------------------------
> Total Free Space: -1009955715
> Max Chunk Size: -1009955715
> Number of Blocks: 1
> Av. Block Size: -1009955715
> Tree Height: 1
> After GC:
> Statistics for BinaryTreeDictionary:
> ------------------------------------
> Total Free Space: 530
> Max Chunk Size: 268
> Number of Blocks: 2
> Av. Block Size: 265
> Tree Height: 2
> , 7.3514180 secs] [Times: user=36.50 sys=17.99, real=7.35 secs]
> 2014-12-17T14:42:17.941+0000: 52636.083: Total time for which application
> threads were stopped: 7.3525250 seconds
>
>
> [2014-12-17T14:42:17,944Z] [WARN ]
> [elasticsearch[prdaes04data03][scheduler][T#1]]
> [org.elasticsearch.monitor.jvm] [prdaes04data03] [gc][young][52582][5110] duration
> [7.3s], collections [1]/[7.6s], total [7.3s]/[4.2m], memory
> [5.1gb]->[4.8gb]/[30.1gb], all_pools {[young]
> [695.5mb]->[14.4mb]/[865.3mb]}{[survivor]
> [108.1mb]->[108.1mb]/[108.1mb]}{[old] [4.3gb]->[4.7gb]/[29.2gb]}
>
>
>
>
> On Thu, Dec 18, 2014 at 1:10 PM, charlie hunt <charlie.hunt at oracle.com>
> wrote:
>>
>> The output:
>>
>> *cat /sys/kernel/mm/transparent_hugepage/enabled*
>> [always] madvise never
>>
>>
>> This tells me that transparent huge pages are enabled (“always”).
>>
>> I think I would change this to “never”, even though sysctl -a may report
>> that no huge pages are currently in use. The system may be trying to
>> coalesce pages occasionally in an attempt to make huge pages available,
>> even though you are not currently using any.
>>
>> charlie
>>
>>
>> On Dec 18, 2014, at 2:00 PM, Ashwin Jayaprakash <
>> ashwin.jayaprakash at gmail.com> wrote:
>>
>> *@Jon*, thanks for clearing that up. Yes, that was my source of
>> confusion. I was mixing up the user time and the real time.
>>
>> *Jon's reply from an offline conversation:*
>>
>>> Are these the 7 second collections you refer to in the paragraph above?
>>> If yes, the "user" time is the sum of the time spent by multiple GC
>>> threads.
>>> The real time is the GC pause time that your application experiences.
>>> In the above case the GC pauses are .65s, 1.10s and .67s.
>>>
>>
>> Something that added to my confusion was the tools we use in-house. In
>> addition to the GC logs, we have one tool that uses the
>> GarbageCollectorMXBean's getCollectionTime() method, whose values do not
>> seem to match what I see in the GC logs (
>> http://docs.oracle.com/javase/1.5.0/docs/api/java/lang/management/GarbageCollectorMXBean.html#getCollectionTime%28%29
>> ).
>>
>> The other tool is the ElasticSearch JVM stats logger, which uses the
>> GarbageCollectorMXBean's LastGcInfo (
>> https://github.com/elasticsearch/elasticsearch/blob/master/src/main/java/org/elasticsearch/monitor/jvm/JvmStats.java#L194
>> and
>> https://github.com/elasticsearch/elasticsearch/blob/master/src/main/java/org/elasticsearch/monitor/jvm/JvmMonitorService.java#L187
>> ).
>>
>> Do these methods expose the total time spent by all the parallel GC
>> threads for the ParNew pool or the "real" time? They do not seem to match
>> the GC log times.
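>>
>> (For reference, a minimal sketch of reading both values follows; per the
>> Javadocs, getCollectionTime() and GcInfo.getDuration() report elapsed
>> wall-clock milliseconds rather than summed per-thread user time. The
>> class name GcTimes is just illustrative.)
>>
>> import java.lang.management.ManagementFactory;
>> import com.sun.management.GcInfo;
>>
>> public class GcTimes {
>>     public static void main(String[] args) {
>>         for (java.lang.management.GarbageCollectorMXBean gc :
>>                 ManagementFactory.getGarbageCollectorMXBeans()) {
>>             // Cumulative elapsed collection time in ms; this should
>>             // track the GC log's "real" column, not the "user" column.
>>             System.out.println(gc.getName()
>>                     + " count=" + gc.getCollectionCount()
>>                     + " totalMs=" + gc.getCollectionTime());
>>             // The com.sun.management extension is what backs LastGcInfo.
>>             if (gc instanceof com.sun.management.GarbageCollectorMXBean) {
>>                 GcInfo last = ((com.sun.management.GarbageCollectorMXBean) gc)
>>                         .getLastGcInfo();
>>                 if (last != null) {
>>                     // Duration of the most recent collection, also in
>>                     // elapsed milliseconds.
>>                     System.out.println("  lastGcMs=" + last.getDuration());
>>                 }
>>             }
>>         }
>>     }
>> }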
>>
>> *@Gustav* We do not have any swapping on the machines. It could be the
>> disk IO experienced by the GC log writer itself, as you suspected. The
>> machines have 128G of RAM.
>>
>> *"top" sample from a similar machine:*
>>    PID USER   PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+ COMMAND
>> 106856 xxx    20   0 68.9g  25g 9.9g S 72.4 21.1  2408:05 java
>>
>>
>> *"free -g":*
>>                    total   used   free  shared  buffers  cached
>> Mem:                 120    119      0       0        0      95
>> -/+ buffers/cache:           23     96
>> Swap:                  0      0      0
>>
>> *@Charlie* Huge pages have already been disabled:
>>
>> *sudo sysctl -a | grep hugepage*
>> vm.nr_hugepages = 0
>> vm.nr_hugepages_mempolicy = 0
>> vm.hugepages_treat_as_movable = 0
>> vm.nr_overcommit_hugepages = 0
>>
>> *cat /sys/kernel/mm/transparent_hugepage/enabled*
>> [always] madvise never
>>
>>
>> Thanks all!
>>
>>
>>
>> On Wed, Dec 17, 2014 at 5:00 PM, Jon Masamitsu <jon.masamitsu at oracle.com>
>> wrote:
>>>
>>> Ashwin,
>>>
>>> On 12/16/2014 8:47 PM, Ashwin Jayaprakash wrote:
>>>
>>> Hi, we have a cluster of ElasticSearch servers running with 31G heap
>>> and OpenJDK 64-Bit Server VM (build 24.65-b04, mixed mode).
>>>
>>> While our old gen seems to be very stable with about 40% usage and no
>>> Full GCs so far, our young gen keeps growing from ~50MB to 850MB every few
>>> seconds. These ParNew collections take anywhere between 1 and 7 seconds
>>> and are causing some of our requests to time out. The eden space keeps
>>> filling up and is then cleared every 30-60 seconds. There is definitely
>>> work being done by our JVM in terms of caching/buffering objects for a few
>>> seconds, writing to disk and then clearing the objects (a typical
>>> Lucene/ElasticSearch indexing and querying workload).
>>>
>>>
>>> From your recent mail:
>>>
>>> Times: user=7.89 sys=0.55, real=0.65 secs]
>>> Times: user=7.71 sys=4.59, real=1.10 secs]
>>> Times: user=7.46 sys=0.32, real=0.67 secs]
>>>
>>> Are these the 7 second collections you refer to in the paragraph above?
>>> If yes, the "user" time is the sum of the time spent by multiple GC
>>> threads.
>>> The real time is the GC pause time that your application experiences.
>>> In the above case the GC pauses are .65s, 1.10s and .67s.
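>>> (For example, in the first line user 7.89s over real 0.65s is about 12,
>>> consistent with roughly a dozen GC threads running in parallel.)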
>>>
>>> Comment below regarding "eden space keeps filling up".
>>>
>>>
>>> These long pauses are not good for our server throughput, so I did some
>>> reading and found conflicting reports on how Cassandra is configured
>>> compared to Hadoop. There are also references
>>> <http://blog.mgm-tp.com/2013/12/benchmarking-g1-and-other-java-7-garbage-collectors/>
>>> to this old ParNew+CMS bug
>>> <http://bugs.java.com/view_bug.do?bug_id=6459113>, which I thought
>>> would have been addressed in the JRE version we are using. Cassandra
>>> recommends <http://aryanet.com/blog/cassandra-garbage-collector-tuning>
>>> a larger NewSize with just 1 for max tenuring, whereas Hadoop recommends
>>> <http://wiki.apache.org/hadoop/PerformanceTuning> a small NewSize.
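>>>
>>> (For concreteness, the Cassandra-style settings amount to something like
>>> "-Xmn1200m -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=1"; the -Xmn
>>> value here is only illustrative.)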
>>>
>>> Since most of our allocations seem to be quite short-lived, is there a
>>> way to avoid these "long" young gen pauses?
>>>
>>> Thanks in advance. Here are some details.
>>>
>>> *Heap settings:*
>>> java -Xmx31000m -Xms31000m
>>> -Xss512k -XX:MaxPermSize=512m -XX:ReservedCodeCacheSize=64m
>>> -XX:+UseConcMarkSweepGC -XX:+UseCMSInitiatingOccupancyOnly
>>> -XX:CMSInitiatingOccupancyFraction=70
>>> -XX:+UseParNewGC -XX:+ScavengeBeforeFullGC -XX:+PrintPromotionFailure
>>> -XX:+PrintClassHistogramBeforeFullGC -XX:+PrintTenuringDistribution
>>> -XX:GCLogFileSize=512m -XX:NumberOfGCLogFiles=2 -XX:+PrintGCDateStamps
>>> -XX:+UseGCLogFileRotation -XX:+DisableExplicitGC
>>> -XX:PrintFLSStatistics=1 -XX:+PrintGCDetails
>>>
>>>
>>> *Last few lines of "kill -3 pid" output:*
>>> Heap
>>> par new generation total 996800K, used 865818K [0x00007fa18e800000,
>>> 0x00007fa1d2190000, 0x00007fa1d2190000)
>>> eden space 886080K, 94% used [0x00007fa18e800000, 0x00007fa1c1a659e0,
>>> 0x00007fa1c4950000)
>>> from space 110720K, 25% used [0x00007fa1cb570000, 0x00007fa1cd091078,
>>> 0x00007fa1d2190000)
>>> to space 110720K, 0% used [0x00007fa1c4950000, 0x00007fa1c4950000,
>>> 0x00007fa1cb570000)
>>> concurrent mark-sweep generation total 30636480K, used 12036523K
>>> [0x00007fa1d2190000, 0x00007fa920000000, 0x00007fa920000000)
>>> concurrent-mark-sweep perm gen total 128856K, used 77779K
>>> [0x00007fa920000000, 0x00007fa927dd6000, 0x00007fa940000000)
>>>
>>> *Sample gc log:*
>>> 2014-12-11T23:32:16.121+0000: 710.618: [ParNew
>>> Desired survivor size 56688640 bytes, new threshold 6 (max 6)
>>> - age 1: 2956312 bytes, 2956312 total
>>> - age 2: 591800 bytes, 3548112 total
>>> - age 3: 66216 bytes, 3614328 total
>>> - age 4: 270752 bytes, 3885080 total
>>> - age 5: 615472 bytes, 4500552 total
>>> - age 6: 358440 bytes, 4858992 total
>>> : 900635K->8173K(996800K), 0.0317340 secs]
>>> 1352217K->463460K(31633280K)After GC:
>>>
>>>
>>> In this GC, eden is at 900635K before the collection and at 8173K after.
>>> That eden fills up is the expected behavior. Is that what you were asking
>>> about above? If not, can you send me an example of the "fills up"
>>> behavior?
>>>
>>> Jon
>>>
>>> Statistics for BinaryTreeDictionary:
>>> ------------------------------------
>>> Total Free Space: -433641480
>>> Max Chunk Size: -433641480
>>> Number of Blocks: 1
>>> Av. Block Size: -433641480
>>> Tree Height: 1
>>> After GC:
>>> Statistics for BinaryTreeDictionary:
>>> ------------------------------------
>>> Total Free Space: 1227
>>> Max Chunk Size: 631
>>> Number of Blocks: 3
>>> Av. Block Size: 409
>>> Tree Height: 3
>>> , 0.0318920 secs] [Times: user=0.38 sys=0.01, real=0.03 secs]
>>>
>>>
>>> Ashwin Jayaprakash.
>>>
>>>