Multi-second ParNew collections but stable CMS
Ashwin Jayaprakash
ashwin.jayaprakash at gmail.com
Mon Dec 22 17:49:50 UTC 2014
All, I'm happy to report that disabling THP made a big difference. We do
not see multi-second minor GCs in our cluster anymore.
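
For the archives, the change amounts to something like this (run as root;
exact paths can vary by distro/kernel, and the "defrag" file may not exist
on older kernels):

echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag
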
Thanks for your help.
On Fri, Dec 19, 2014 at 2:10 PM, charlie hunt <charlie.hunt at oracle.com>
wrote:
> Disabling transparent huge pages should help those GCs where you are
> seeing high sys time reported, and should shorten their pause times.
>
> Thanks for also sharing your observation of khugepaged/pages_collapsed.
>
> charlie
>
> On Dec 18, 2014, at 4:41 PM, Ashwin Jayaprakash <
> ashwin.jayaprakash at gmail.com> wrote:
>
> *@Charlie/@Holger* My apologies, THP is indeed enabled. I misread the
> "never" and thought it had already been disabled. In fact, "cat
> /sys/kernel/mm/transparent_hugepage/khugepaged/pages_collapsed" showed
> 904, and an hour later it shows 6845 on one of our machines.
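> (One way to watch that counter, for reference: "watch -n 60 cat
> /sys/kernel/mm/transparent_hugepage/khugepaged/pages_collapsed".)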
>
> *@Jon* I dug through some of our ElasticSearch/application logs again and
> tried to correlate them with the GC logs. The collection time does seem to
> match the GC log's "real" time. However, the collected sizes don't seem to
> match, which is what threw me off. (The negative "Total Free Space" values
> in Item 3 below appear to be the FLS statistics printer overflowing a
> signed 32-bit value on this large heap; read as an unsigned count of heap
> words, they line up with the mostly free old gen.)
>
> *Item 1:*
>
> 2014-12-18T21:34:55.837+0000: 163793.979: [ParNew
> Desired survivor size 56688640 bytes, new threshold 6 (max 6)
> - age 1: 31568024 bytes, 31568024 total
> - age 2: 1188576 bytes, 32756600 total
> - age 3: 1830920 bytes, 34587520 total
> - age 4: 282536 bytes, 34870056 total
> - age 5: 316640 bytes, 35186696 total
> - age 6: 249856 bytes, 35436552 total
> : 931773K->49827K(996800K), 1.3622770 secs]
> 22132844K->21256042K(31633280K)After GC:
> Statistics for BinaryTreeDictionary:
> ------------------------------------
> Total Free Space: 1206815932
> Max Chunk Size: 1206815932
> Number of Blocks: 1
> Av. Block Size: 1206815932
> Tree Height: 1
> After GC:
> Statistics for BinaryTreeDictionary:
> ------------------------------------
> Total Free Space: 6189459
> Max Chunk Size: 6188544
> Number of Blocks: 3
> Av. Block Size: 2063153
> Tree Height: 2
> , 1.3625110 secs] [Times: user=15.92 sys=1.36, real=1.36 secs]
> 2014-12-18T21:34:57.200+0000: 163795.342: Total time for which application
> threads were stopped: 1.3638970 seconds
>
>
> [2014-12-18T21:34:57,203Z] [WARN ]
> [elasticsearch[server00001][scheduler][T#1]]
> [org.elasticsearch.monitor.jvm] [server00001] [gc][young][163563][20423] duration
> [1.3s], collections [1]/[2.1s], total [1.3s]/[17.9m], memory
> [20.7gb]->[20.2gb]/[30.1gb], all_pools {[young]
> [543.2mb]->[1mb]/[865.3mb]}{[survivor] [44.6mb]->[48.6mb]/[108.1mb]}{[old]
> [20.2gb]->[20.2gb]/[29.2gb]}
>
>
> *Item 2:*
>
> 2014-12-18T20:53:35.011+0000: 161313.153: [ParNew
> Desired survivor size 56688640 bytes, new threshold 6 (max 6)
> - age 1: 32445776 bytes, 32445776 total
> - age 3: 6068000 bytes, 38513776 total
> - age 4: 1031528 bytes, 39545304 total
> - age 5: 255896 bytes, 39801200 total
> : 939702K->53536K(996800K), 2.9352940 secs]
> 21501296K->20625925K(31633280K)After GC:
> Statistics for BinaryTreeDictionary:
> ------------------------------------
> Total Free Space: 1287922158
> Max Chunk Size: 1287922158
> Number of Blocks: 1
> Av. Block Size: 1287922158
> Tree Height: 1
> After GC:
> Statistics for BinaryTreeDictionary:
> ------------------------------------
> Total Free Space: 6205476
> Max Chunk Size: 6204928
> Number of Blocks: 2
> Av. Block Size: 3102738
> Tree Height: 2
> , 2.9355580 secs] [Times: user=33.82 sys=3.11, real=2.94 secs]
> 2014-12-18T20:53:37.947+0000: 161316.089: Total time for which application
> threads were stopped: 2.9367640 seconds
>
>
> [2014-12-18T20:53:37,950Z] [WARN ]
> [elasticsearch[server00001][scheduler][T#1]]
> [org.elasticsearch.monitor.jvm] [server00001] [gc][young][161091][19838] duration
> [2.9s], collections [1]/[3s], total [2.9s]/[17.3m], memory
> [20.4gb]->[19.6gb]/[30.1gb], all_pools {[young]
> [801.7mb]->[2.4mb]/[865.3mb]}{[survivor]
> [52.3mb]->[52.2mb]/[108.1mb]}{[old] [19.6gb]->[19.6gb]/[29.2gb]}
>
>
> *Item 3:*
>
> 2014-12-17T14:42:10.590+0000: 52628.731: [GCBefore GC:
> Statistics for BinaryTreeDictionary:
> ------------------------------------
> Total Free Space: -966776244
> Max Chunk Size: -966776244
> Number of Blocks: 1
> Av. Block Size: -966776244
> Tree Height: 1
> Before GC:
> Statistics for BinaryTreeDictionary:
> ------------------------------------
> Total Free Space: 530
> Max Chunk Size: 268
> Number of Blocks: 2
> Av. Block Size: 265
> Tree Height: 2
> 2014-12-17T14:42:10.590+0000: 52628.731: [ParNew
> Desired survivor size 56688640 bytes, new threshold 1 (max 6)
> - age 1: 113315624 bytes, 113315624 total
> : 996800K->110720K(996800K), 7.3511710 secs]
> 5609422K->5065102K(31633280K)After GC:
> Statistics for BinaryTreeDictionary:
> ------------------------------------
> Total Free Space: -1009955715
> Max Chunk Size: -1009955715
> Number of Blocks: 1
> Av. Block Size: -1009955715
> Tree Height: 1
> After GC:
> Statistics for BinaryTreeDictionary:
> ------------------------------------
> Total Free Space: 530
> Max Chunk Size: 268
> Number of Blocks: 2
> Av. Block Size: 265
> Tree Height: 2
> , 7.3514180 secs] [Times: user=36.50 sys=17.99, real=7.35 secs]
> 2014-12-17T14:42:17.941+0000: 52636.083: Total time for which application
> threads were stopped: 7.3525250 seconds
>
>
> [2014-12-17T14:42:17,944Z] [WARN ]
> [elasticsearch[prdaes04data03][scheduler][T#1]]
> [org.elasticsearch.monitor.jvm] [prdaes04data03] [gc][young][52582][5110] duration
> [7.3s], collections [1]/[7.6s], total [7.3s]/[4.2m], memory
> [5.1gb]->[4.8gb]/[30.1gb], all_pools {[young]
> [695.5mb]->[14.4mb]/[865.3mb]}{[survivor]
> [108.1mb]->[108.1mb]/[108.1mb]}{[old] [4.3gb]->[4.7gb]/[29.2gb]}
>
>
>
>
> On Thu, Dec 18, 2014 at 1:10 PM, charlie hunt <charlie.hunt at oracle.com>
> wrote:
>>
>> The output:
>>
>> *cat /sys/kernel/mm/transparent_hugepage/enabled*
>> [always] madvise never
>>
>>
>> This tells me that transparent huge pages are enabled (“always”).
>>
>> I think I would change this to “never”, even though sysctl -a may report
>> that no huge pages are currently in use. The system may be trying to
>> coalesce pages occasionally in an attempt to make huge pages available,
>> even though you are not currently using any.
>>
>> charlie
>>
>>
>> On Dec 18, 2014, at 2:00 PM, Ashwin Jayaprakash <
>> ashwin.jayaprakash at gmail.com> wrote:
>>
>> *@Jon*, thanks for clearing that up. Yes, that was my source of
>> confusion. I was mixing up the user time and the real time.
>>
>> *Jon's reply from an offline conversation:*
>>
>>> Are these the 7 second collections you refer to in the paragraph above?
>>> If yes, the "user" time is the sum of the time spent by multiple GC
>>> threads.
>>> The real time is the GC pause time that your application experiences.
>>> In the above case the GC pauses are .65s, 1.10s and .67s.
>>>
>>
>> Something that added to my confusion was the tools we use in-house. In
>> addition to the GC logs, we have one tool that uses the
>> GarbageCollectorMXBean's getCollectionTime() method, whose values do not
>> seem to match what I see in the GC logs (
>> http://docs.oracle.com/javase/1.5.0/docs/api/java/lang/management/GarbageCollectorMXBean.html#getCollectionTime%28%29
>> ).
>>
>> The other tool is the ElasticSearch JVM stats logger, which uses the
>> GarbageCollectorMXBean's LastGcInfo (
>> https://github.com/elasticsearch/elasticsearch/blob/master/src/main/java/org/elasticsearch/monitor/jvm/JvmStats.java#L194
>> and
>> https://github.com/elasticsearch/elasticsearch/blob/master/src/main/java/org/elasticsearch/monitor/jvm/JvmMonitorService.java#L187
>> ).
>>
>> Do these methods expose the total time spent by all the parallel GC
>> threads for the ParNew pool or the "real" time? They do not seem to match
>> the GC log times.
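>>
>> (For reference, a minimal sketch of reading both values follows; per the
>> Javadocs, getCollectionTime() and GcInfo.getDuration() report elapsed
>> wall-clock milliseconds rather than summed per-thread user time. The
>> class name GcTimes is just illustrative.)
>>
>> import java.lang.management.ManagementFactory;
>> import com.sun.management.GcInfo;
>>
>> public class GcTimes {
>>     public static void main(String[] args) {
>>         for (java.lang.management.GarbageCollectorMXBean gc :
>>                 ManagementFactory.getGarbageCollectorMXBeans()) {
>>             // Cumulative elapsed collection time in ms; this should
>>             // track the GC log's "real" column, not the "user" column.
>>             System.out.println(gc.getName()
>>                     + " count=" + gc.getCollectionCount()
>>                     + " totalMs=" + gc.getCollectionTime());
>>             // The com.sun.management extension is what backs LastGcInfo.
>>             if (gc instanceof com.sun.management.GarbageCollectorMXBean) {
>>                 GcInfo last = ((com.sun.management.GarbageCollectorMXBean) gc)
>>                         .getLastGcInfo();
>>                 if (last != null) {
>>                     // Duration of the most recent collection, also in
>>                     // elapsed milliseconds.
>>                     System.out.println("  lastGcMs=" + last.getDuration());
>>                 }
>>             }
>>         }
>>     }
>> }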
>>
>> *@Gustav* We do not have any swapping on the machines. It could be the
>> disk IO experienced by the GC log writer itself, as you suspected. The
>> machines have 128G of RAM.
>>
>> *"top" sample from a similar machine:*
>>    PID USER   PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+ COMMAND
>> 106856 xxx    20   0 68.9g  25g 9.9g S 72.4 21.1  2408:05 java
>>
>>
>> *"free -g":*
>>                    total   used   free  shared  buffers  cached
>> Mem:                 120    119      0       0        0      95
>> -/+ buffers/cache:           23     96
>> Swap:                  0      0      0
>>
>> *@Charlie* Huge pages have already been disabled:
>>
>> *sudo sysctl -a | grep hugepage*
>> vm.nr_hugepages = 0
>> vm.nr_hugepages_mempolicy = 0
>> vm.hugepages_treat_as_movable = 0
>> vm.nr_overcommit_hugepages = 0
>>
>> *cat /sys/kernel/mm/transparent_hugepage/enabled*
>> [always] madvise never
>>
>>
>> Thanks all!
>>
>>
>>
>> On Wed, Dec 17, 2014 at 5:00 PM, Jon Masamitsu <jon.masamitsu at oracle.com>
>> wrote:
>>>
>>> Ashwin,
>>>
>>> On 12/16/2014 8:47 PM, Ashwin Jayaprakash wrote:
>>>
>>> Hi, we have a cluster of ElasticSearch servers running with 31G heap
>>> and OpenJDK 64-Bit Server VM (build 24.65-b04, mixed mode).
>>>
>>> While our old gen seems to be very stable with about 40% usage and no
>>> Full GCs so far, our young gen keeps growing from ~50MB to 850MB every few
>>> seconds. These ParNew collections take anywhere between 1 and 7 seconds
>>> and are causing some of our requests to time out. The eden space keeps
>>> filling up and is then cleared every 30-60 seconds. There is definitely
>>> work being done by our JVM in terms of caching/buffering objects for a few
>>> seconds, writing to disk and then clearing the objects (a typical
>>> Lucene/ElasticSearch indexing and querying workload).
>>>
>>>
>>> From your recent mail:
>>>
>>> Times: user=7.89 sys=0.55, real=0.65 secs]
>>> Times: user=7.71 sys=4.59, real=1.10 secs]
>>> Times: user=7.46 sys=0.32, real=0.67 secs]
>>>
>>> Are these the 7 second collections you refer to in the paragraph above?
>>> If yes, the "user" time is the sum of the time spent by multiple GC
>>> threads.
>>> The real time is the GC pause time that your application experiences.
>>> In the above case the GC pauses are .65s, 1.10s and .67s.
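>>> (For example, in the first line user 7.89s over real 0.65s is about 12,
>>> consistent with roughly a dozen GC threads running in parallel.)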
>>>
>>> Comment below regarding "eden space keeps filling up".
>>>
>>>
>>> These long pauses are not good for our server throughput, so I did some
>>> reading and found conflicting reports on how Cassandra is configured
>>> compared to Hadoop. There are also references
>>> <http://blog.mgm-tp.com/2013/12/benchmarking-g1-and-other-java-7-garbage-collectors/>
>>> to this old ParNew+CMS bug
>>> <http://bugs.java.com/view_bug.do?bug_id=6459113>, which I thought
>>> would have been addressed in the JRE version we are using. Cassandra
>>> recommends <http://aryanet.com/blog/cassandra-garbage-collector-tuning>
>>> a larger NewSize with just 1 for max tenuring, whereas Hadoop recommends
>>> <http://wiki.apache.org/hadoop/PerformanceTuning> a small NewSize.
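>>>
>>> (For concreteness, the Cassandra-style settings amount to something like
>>> "-Xmn1200m -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=1"; the -Xmn
>>> value here is only illustrative.)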
>>>
>>> Since most of our allocations seem to be quite short-lived, is there a
>>> way to avoid these "long" young gen pauses?
>>>
>>> Thanks in advance. Here are some details.
>>>
>>> *Heap settings:*
>>> java -Xmx31000m -Xms31000m
>>> -Xss512k -XX:MaxPermSize=512m -XX:ReservedCodeCacheSize=64m
>>> -XX:+UseConcMarkSweepGC -XX:+UseCMSInitiatingOccupancyOnly
>>> -XX:CMSInitiatingOccupancyFraction=70
>>> -XX:+UseParNewGC -XX:+ScavengeBeforeFullGC -XX:+PrintPromotionFailure
>>> -XX:+PrintClassHistogramBeforeFullGC -XX:+PrintTenuringDistribution
>>> -XX:GCLogFileSize=512m -XX:NumberOfGCLogFiles=2 -XX:+PrintGCDateStamps
>>> -XX:+UseGCLogFileRotation -XX:+DisableExplicitGC
>>> -XX:PrintFLSStatistics=1 -XX:+PrintGCDetails
>>>
>>>
>>> *Last few lines of "kill -3 pid" output:*
>>> Heap
>>> par new generation total 996800K, used 865818K [0x00007fa18e800000,
>>> 0x00007fa1d2190000, 0x00007fa1d2190000)
>>> eden space 886080K, 94% used [0x00007fa18e800000, 0x00007fa1c1a659e0,
>>> 0x00007fa1c4950000)
>>> from space 110720K, 25% used [0x00007fa1cb570000, 0x00007fa1cd091078,
>>> 0x00007fa1d2190000)
>>> to space 110720K, 0% used [0x00007fa1c4950000, 0x00007fa1c4950000,
>>> 0x00007fa1cb570000)
>>> concurrent mark-sweep generation total 30636480K, used 12036523K
>>> [0x00007fa1d2190000, 0x00007fa920000000, 0x00007fa920000000)
>>> concurrent-mark-sweep perm gen total 128856K, used 77779K
>>> [0x00007fa920000000, 0x00007fa927dd6000, 0x00007fa940000000)
>>>
>>> *Sample gc log:*
>>> 2014-12-11T23:32:16.121+0000: 710.618: [ParNew
>>> Desired survivor size 56688640 bytes, new threshold 6 (max 6)
>>> - age 1: 2956312 bytes, 2956312 total
>>> - age 2: 591800 bytes, 3548112 total
>>> - age 3: 66216 bytes, 3614328 total
>>> - age 4: 270752 bytes, 3885080 total
>>> - age 5: 615472 bytes, 4500552 total
>>> - age 6: 358440 bytes, 4858992 total
>>> : 900635K->8173K(996800K), 0.0317340 secs]
>>> 1352217K->463460K(31633280K)After GC:
>>>
>>>
>>> In this GC, eden is at 900635K before the collection and at 8173K after.
>>> That eden fills up is the expected behavior. Is that what you were asking
>>> about above? If not, can you send me an example of the "fills up"
>>> behavior?
>>>
>>> Jon
>>>
>>> Statistics for BinaryTreeDictionary:
>>> ------------------------------------
>>> Total Free Space: -433641480
>>> Max Chunk Size: -433641480
>>> Number of Blocks: 1
>>> Av. Block Size: -433641480
>>> Tree Height: 1
>>> After GC:
>>> Statistics for BinaryTreeDictionary:
>>> ------------------------------------
>>> Total Free Space: 1227
>>> Max Chunk Size: 631
>>> Number of Blocks: 3
>>> Av. Block Size: 409
>>> Tree Height: 3
>>> , 0.0318920 secs] [Times: user=0.38 sys=0.01, real=0.03 secs]
>>>
>>>
>>> Ashwin Jayaprakash.
>>>
>>>