Multi-second ParNew collections but stable CMS

Ashwin Jayaprakash ashwin.jayaprakash at gmail.com
Mon Dec 22 21:32:06 UTC 2014


Just to summarize: we did disable THP and saw minor GC times go down
considerably. What still puzzles me is that the OS still reports some huge
pages in use, though only a small amount - some food for thought:


[user1 at server0001 ~]# cat /sys/kernel/mm/transparent_hugepage/enabled
always madvise [never]
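
In case it helps anyone else, the runtime toggle is just a couple of sysfs
writes. A rough sketch (needs root, and it does not persist across reboots
unless repeated at boot, e.g. from rc.local or with a
transparent_hugepage=never kernel boot parameter):

echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag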


[user1 at server0001 ~]# cat
/sys/kernel/mm/transparent_hugepage/khugepaged/pages_collapsed
0


[user1 at server0001 ~]# grep AnonHugePages /proc/meminfo
AnonHugePages:    331776 kB


[user1 at server0001 ~]# egrep 'trans|thp' /proc/vmstat
nr_anon_transparent_hugepages 162
thp_fault_alloc 209
thp_fault_fallback 0
thp_collapse_alloc 0
thp_collapse_alloc_failed 0
thp_split 11


(Huge page use per process - https://access.redhat.com/solutions/46111)

[user1 at server0001 ~]# grep -e AnonHugePages  /proc/*/smaps | awk  '{
if($2>4) print $0} ' |  awk -F "/"  '{print $0; system("ps -fp " $3)} '
/proc/1330/smaps:AnonHugePages:      2048 kB
UID         PID   PPID  C STIME TTY          TIME CMD
root       1330   1310 23 Dec19 ?        16:38:37 java -Xmx31000m
-Xms31000m -Xss512k -XX:MaxPermSize=512m -XX:ReservedC
/proc/1330/smaps:AnonHugePages:    116736 kB
UID         PID   PPID  C STIME TTY          TIME CMD
root       1330   1310 23 Dec19 ?        16:38:37 java -Xmx31000m
-Xms31000m -Xss512k -XX:MaxPermSize=512m -XX:ReservedC
/proc/1330/smaps:AnonHugePages:     43008 kB
UID         PID   PPID  C STIME TTY          TIME CMD
root       1330   1310 23 Dec19 ?        16:38:37 java -Xmx31000m
-Xms31000m -Xss512k -XX:MaxPermSize=512m -XX:ReservedC
/proc/1330/smaps:AnonHugePages:      2048 kB
UID         PID   PPID  C STIME TTY          TIME CMD
root       1330   1310 23 Dec19 ?        16:38:37 java -Xmx31000m
-Xms31000m -Xss512k -XX:MaxPermSize=512m -XX:ReservedC
/proc/1346/smaps:AnonHugePages:     12288 kB
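
If anyone wants a per-process total rather than the per-mapping rows above,
a rough awk sketch over the same grep output (it just sums the kB column per
PID; field positions assume the same output format as above) would be
something like:

grep AnonHugePages /proc/[0-9]*/smaps 2>/dev/null |
  awk '{ split($1, p, "/"); kb[p[3]] += $2 }
       END { for (pid in kb) if (kb[pid] > 0) print pid, kb[pid], "kB" }'

My understanding is that the leftover AnonHugePages are simply mappings that
were faulted in as huge pages while THP was still enabled; switching to
"never" stops new ones, but the existing ones stay huge until they are split
or unmapped, which would explain the small residue above.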


Thanks,
Ashwin Jayaprakash.







On Mon, Dec 22, 2014 at 12:45 PM, Gustav Åkesson <gustav.r.akesson at gmail.com>
wrote:

> Hi,
>
> Indeed thank you for reporting back on this. TIL about THP, so to say...
>
> Best Regards,
> Gustav Åkesson
> On 22 Dec 2014 at 18:52, "charlie hunt" <charlie.hunt at oracle.com> wrote:
>
>> Thanks for reporting back, and great to hear disabling THP has solved your
>> multi-second minor GC issue. :-)
>>
>> charlie
>>
>> On Dec 22, 2014, at 11:49 AM, Ashwin Jayaprakash <
>> ashwin.jayaprakash at gmail.com> wrote:
>>
>> All, I'm happy to report that disabling THP made a big difference. We do
>> not see multi-second minor GCs in our cluster anymore.
>>
>> Thanks for your help.
>>
>> On Fri, Dec 19, 2014 at 2:10 PM, charlie hunt <charlie.hunt at oracle.com>
>> wrote:
>>
>>> Disabling transparent huge pages should help those GC pauses where you
>>> are seeing high sys time reported, which should also shorten their pause
>>> times.
>>>
>>> Thanks for also sharing your observation of khugepaged/pages_collapsed.
>>>
>>> charlie
>>>
>>> On Dec 18, 2014, at 4:41 PM, Ashwin Jayaprakash <
>>> ashwin.jayaprakash at gmail.com> wrote:
>>>
>>> *@Charlie/@Holger* My apologies, THP is indeed enabled. I misread the
>>> "never" and thought it was already done. In fact "cat
>>> /sys/kernel/mm/transparent_hugepage/khugepaged/pages_collapsed" showed
>>> 904, and an hour later it shows 6845 on one of our machines.
>>>
>>> *@Jon* I dug through some of our ElasticSearch/application logs again
>>> and tried to correlate them with the GC logs. The collection time does seem
>>> to match the GC log's "real" time. However the collected sizes don't seem
>>> to match, which is what threw me off.
>>>
>>> *Item 1:*
>>>
>>> 2014-12-18T21:34:55.837+0000: 163793.979: [ParNew
>>> Desired survivor size 56688640 bytes, new threshold 6 (max 6)
>>> - age   1:   31568024 bytes,   31568024 total
>>> - age   2:    1188576 bytes,   32756600 total
>>> - age   3:    1830920 bytes,   34587520 total
>>> - age   4:     282536 bytes,   34870056 total
>>> - age   5:     316640 bytes,   35186696 total
>>> - age   6:     249856 bytes,   35436552 total
>>> : 931773K->49827K(996800K), 1.3622770 secs]
>>> 22132844K->21256042K(31633280K)After GC:
>>> Statistics for BinaryTreeDictionary:
>>> ------------------------------------
>>> Total Free Space: 1206815932
>>> Max   Chunk Size: 1206815932
>>> Number of Blocks: 1
>>> Av.  Block  Size: 1206815932
>>> Tree      Height: 1
>>> After GC:
>>> Statistics for BinaryTreeDictionary:
>>> ------------------------------------
>>> Total Free Space: 6189459
>>> Max   Chunk Size: 6188544
>>> Number of Blocks: 3
>>> Av.  Block  Size: 2063153
>>> Tree      Height: 2
>>> , 1.3625110 secs] [Times: user=15.92 sys=1.36, real=1.36 secs]
>>> 2014-12-18T21:34:57.200+0000: 163795.342: Total time for which
>>> application threads were stopped: 1.3638970 seconds
>>>
>>>
>>> [2014-12-18T21:34:57,203Z]  [WARN ]
>>> [elasticsearch[server00001][scheduler][T#1]]
>>> [org.elasticsearch.monitor.jvm]  [server00001] [gc][young][163563][20423] duration
>>> [1.3s], collections [1]/[2.1s], total [1.3s]/[17.9m], memory
>>> [20.7gb]->[20.2gb]/[30.1gb], all_pools {[young]
>>> [543.2mb]->[1mb]/[865.3mb]}{[survivor] [44.6mb]->[48.6mb]/[108.1mb]}{[old]
>>> [20.2gb]->[20.2gb]/[29.2gb]}
>>>
>>>
>>> *Item 2:*
>>>
>>> 2014-12-18T20:53:35.011+0000: 161313.153: [ParNew
>>> Desired survivor size 56688640 bytes, new threshold 6 (max 6)
>>> - age   1:   32445776 bytes,   32445776 total
>>> - age   3:    6068000 bytes,   38513776 total
>>> - age   4:    1031528 bytes,   39545304 total
>>> - age   5:     255896 bytes,   39801200 total
>>> : 939702K->53536K(996800K), 2.9352940 secs]
>>> 21501296K->20625925K(31633280K)After GC:
>>> Statistics for BinaryTreeDictionary:
>>> ------------------------------------
>>> Total Free Space: 1287922158
>>> Max   Chunk Size: 1287922158
>>> Number of Blocks: 1
>>> Av.  Block  Size: 1287922158
>>> Tree      Height: 1
>>> After GC:
>>> Statistics for BinaryTreeDictionary:
>>> ------------------------------------
>>> Total Free Space: 6205476
>>> Max   Chunk Size: 6204928
>>> Number of Blocks: 2
>>> Av.  Block  Size: 3102738
>>> Tree      Height: 2
>>> , 2.9355580 secs] [Times: user=33.82 sys=3.11, real=2.94 secs]
>>> 2014-12-18T20:53:37.947+0000: 161316.089: Total time for which
>>> application threads were stopped: 2.9367640 seconds
>>>
>>>
>>> [2014-12-18T20:53:37,950Z]  [WARN ]
>>> [elasticsearch[server00001][scheduler][T#1]]
>>> [org.elasticsearch.monitor.jvm]  [server00001] [gc][young][161091][19838] duration
>>> [2.9s], collections [1]/[3s], total [2.9s]/[17.3m], memory
>>> [20.4gb]->[19.6gb]/[30.1gb], all_pools {[young]
>>> [801.7mb]->[2.4mb]/[865.3mb]}{[survivor]
>>> [52.3mb]->[52.2mb]/[108.1mb]}{[old] [19.6gb]->[19.6gb]/[29.2gb]}
>>>
>>>
>>> *Item 3:*
>>>
>>> 2014-12-17T14:42:10.590+0000: 52628.731: [GCBefore GC:
>>> Statistics for BinaryTreeDictionary:
>>> ------------------------------------
>>> Total Free Space: -966776244
>>> Max   Chunk Size: -966776244
>>> Number of Blocks: 1
>>> Av.  Block  Size: -966776244
>>> Tree      Height: 1
>>> Before GC:
>>> Statistics for BinaryTreeDictionary:
>>> ------------------------------------
>>> Total Free Space: 530
>>> Max   Chunk Size: 268
>>> Number of Blocks: 2
>>> Av.  Block  Size: 265
>>> Tree      Height: 2
>>> 2014-12-17T14:42:10.590+0000: 52628.731: [ParNew
>>> Desired survivor size 56688640 bytes, new threshold 1 (max 6)
>>> - age   1:  113315624 bytes,  113315624 total
>>> : 996800K->110720K(996800K), 7.3511710 secs]
>>> 5609422K->5065102K(31633280K)After GC:
>>> Statistics for BinaryTreeDictionary:
>>> ------------------------------------
>>> Total Free Space: -1009955715
>>> Max   Chunk Size: -1009955715
>>> Number of Blocks: 1
>>> Av.  Block  Size: -1009955715
>>> Tree      Height: 1
>>> After GC:
>>> Statistics for BinaryTreeDictionary:
>>> ------------------------------------
>>> Total Free Space: 530
>>> Max   Chunk Size: 268
>>> Number of Blocks: 2
>>> Av.  Block  Size: 265
>>> Tree      Height: 2
>>> , 7.3514180 secs] [Times: user=36.50 sys=17.99, real=7.35 secs]
>>> 2014-12-17T14:42:17.941+0000: 52636.083: Total time for which
>>> application threads were stopped: 7.3525250 seconds
>>>
>>>
>>> [2014-12-17T14:42:17,944Z] [WARN ]
>>> [elasticsearch[prdaes04data03][scheduler][T#1]]
>>> [org.elasticsearch.monitor.jvm] [prdaes04data03] [gc][young][52582][5110] duration
>>> [7.3s], collections [1]/[7.6s], total [7.3s]/[4.2m], memory
>>> [5.1gb]->[4.8gb]/[30.1gb], all_pools {[young]
>>> [695.5mb]->[14.4mb]/[865.3mb]}{[survivor]
>>> [108.1mb]->[108.1mb]/[108.1mb]}{[old] [4.3gb]->[4.7gb]/[29.2gb]}
>>>
>>>
>>>
>>>
>>> On Thu, Dec 18, 2014 at 1:10 PM, charlie hunt <charlie.hunt at oracle.com>
>>> wrote:
>>>>
>>>> The output:
>>>>
>>>> *cat /sys/kernel/mm/transparent_hugepage/enabled*
>>>> [always] madvise never
>>>>
>>>>
>>>> Tells me that transparent huge pages are enabled “always”.
>>>>
>>>> I think I would change this to “never”, even though sysctl -a may be
>>>> reporting no huge pages are currently in use. The system may still be trying
>>>> to coalesce pages occasionally in an attempt to make huge pages available,
>>>> even though you are not currently using any.
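>>>>
>>>> (For what it's worth: vm.nr_hugepages only covers the static hugetlbfs
>>>> pool, which is why sysctl shows zero even while THP is active. A quick
>>>> way to see both side by side would be something like:
>>>>
>>>> grep -iE 'HugePages_Total|AnonHugePages' /proc/meminfo
>>>>
>>>> where HugePages_Total is the explicit pool that sysctl reports on and
>>>> AnonHugePages is what THP is currently backing.)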
>>>>
>>>> charlie
>>>>
>>>>
>>>> On Dec 18, 2014, at 2:00 PM, Ashwin Jayaprakash <
>>>> ashwin.jayaprakash at gmail.com> wrote:
>>>>
>>>> *@Jon*, thanks for clearing that up. Yes, that was my source of
>>>> confusion. I was mistaking the user time for the real time.
>>>>
>>>> *Jon's reply from an offline conversation:*
>>>>
>>>>> Are these the 7 second collections you refer to in the paragraph above?
>>>>> If yes, the "user" time is the sum of the time spent by multiple GC
>>>>> threads.
>>>>> The real time is the GC pause time that your application experiences.
>>>>> In the above case the GC pauses are .65s, 1.10s and .67s.
>>>>>
>>>>
>>>> Something that added to my confusion was the tools we are using
>>>> in-house. In addition to the GC logs, we have one tool that uses the
>>>> GarbageCollectorMXBean's getCollectionTime() method. This does not seem to
>>>> match the values I see in the GC logs (
>>>> http://docs.oracle.com/javase/1.5.0/docs/api/java/lang/management/GarbageCollectorMXBean.html#getCollectionTime%28%29
>>>> ).
>>>>
>>>> The other tool is the ElasticSearch JVM stats logger which uses GarbageCollectorMXBean's
>>>> LastGCInfo (
>>>> https://github.com/elasticsearch/elasticsearch/blob/master/src/main/java/org/elasticsearch/monitor/jvm/JvmStats.java#L194
>>>> and
>>>> https://github.com/elasticsearch/elasticsearch/blob/master/src/main/java/org/elasticsearch/monitor/jvm/JvmMonitorService.java#L187
>>>> ).
>>>>
>>>> Do these methods expose the total time spent by all the parallel GC
>>>> threads for the ParNew pool or the "real" time? They do not seem to match
>>>> the GC log times.
>>>>
>>>> *@Gustav* We do not have any swapping on the machines. It could be the
>>>> disk IO experienced by the GC log writer itself, as you've suspected. The
>>>> machine has 128G of RAM.
>>>>
>>>> *"top" sample from a similar machine:*
>>>>    PID USER      PR  NI  VIRT   RES   SHR  S %CPU %MEM    TIME+  COMMAND
>>>> 106856 xxx       20   0  68.9g  25g   9.9g  S 72.4 21.1  2408:05 java
>>>>
>>>>
>>>> *"free -g":*
>>>>              total       used       free     shared    buffers     cached
>>>> Mem:           120        119          0          0          0         95
>>>> -/+ buffers/cache:         23         96
>>>> Swap:            0          0          0
>>>>
>>>> *@Charlie* Huge pages have already been disabled:
>>>>
>>>> *sudo sysctl -a | grep hugepage*
>>>> vm.nr_hugepages = 0
>>>> vm.nr_hugepages_mempolicy = 0
>>>> vm.hugepages_treat_as_movable = 0
>>>> vm.nr_overcommit_hugepages = 0
>>>>
>>>> *cat /sys/kernel/mm/transparent_hugepage/enabled*
>>>> [always] madvise never
>>>>
>>>>
>>>> Thanks all!
>>>>
>>>>
>>>>
>>>> On Wed, Dec 17, 2014 at 5:00 PM, Jon Masamitsu <
>>>> jon.masamitsu at oracle.com> wrote:
>>>>>
>>>>>  Ashwin,
>>>>>
>>>>> On 12/16/2014 8:47 PM, Ashwin Jayaprakash wrote:
>>>>>
>>>>>  Hi, we have a cluster of ElasticSearch servers running with 31G heap
>>>>> and OpenJDK 64-Bit Server VM (build 24.65-b04, mixed mode).
>>>>>
>>>>> While our old gen seems to be very stable with about 40% usage and no
>>>>> Full GCs so far, our young gen keeps growing from ~50MB to 850MB every few
>>>>> seconds. These ParNew collections are taking anywhere between 1-7 seconds
>>>>> and are causing some of our requests to time out. The eden space keeps
>>>>> filling up and is then cleared every 30-60 seconds. There is definitely work
>>>>> being done by our JVM in terms of caching/buffering objects for a few
>>>>> seconds, writing to disk and then clearing the objects (a typical
>>>>> Lucene/ElasticSearch indexing and querying workload).
>>>>>
>>>>>
>>>>> From your recent mail:
>>>>>
>>>>> Times: user=7.89 sys=0.55, real=0.65 secs]
>>>>> Times: user=7.71 sys=4.59, real=1.10 secs]
>>>>> Times: user=7.46 sys=0.32, real=0.67 secs]
>>>>>
>>>>> Are these the 7 second collections you refer to in the paragraph above?
>>>>> If yes, the "user" time is the sum of the time spent by multiple GC
>>>>> threads.
>>>>> The real time is the GC pause time that your application experiences.
>>>>> In the above case the GC pauses are .65s, 1.10s and .67s.
>>>>>
>>>>> Comment below  regarding  "eden space keeps filling up".
>>>>>
>>>>>
>>>>>  These long pauses are not good for our server throughput and I was
>>>>> doing some reading. I got some conflicting reports on how Cassandra is
>>>>> configured compared to Hadoop. There are also references
>>>>> <http://blog.mgm-tp.com/2013/12/benchmarking-g1-and-other-java-7-garbage-collectors/>
>>>>> to this old ParNew+CMS bug
>>>>> <http://bugs.java.com/view_bug.do?bug_id=6459113> which I thought
>>>>> would've been addressed in the JRE version we are using. Cassandra
>>>>> recommends
>>>>> <http://aryanet.com/blog/cassandra-garbage-collector-tuning> a larger
>>>>> NewSize with just 1 for max tenuring, whereas Hadoop recommends
>>>>> <http://wiki.apache.org/hadoop/PerformanceTuning> a small NewSize.
>>>>>
>>>>>  Since most of our allocations seem to be quite short lived, is there
>>>>> a way to avoid these "long" young gen pauses?
>>>>>
>>>>>  Thanks in advance. Here are some details.
>>>>>
>>>>> * Heap settings:*
>>>>> java -Xmx31000m -Xms31000m
>>>>> -Xss512k -XX:MaxPermSize=512m -XX:ReservedCodeCacheSize=64m
>>>>> -XX:+UseConcMarkSweepGC -XX:+UseCMSInitiatingOccupancyOnly
>>>>> -XX:CMSInitiatingOccupancyFraction=70
>>>>> -XX:+UseParNewGC -XX:+ScavengeBeforeFullGC -XX:+PrintPromotionFailure
>>>>> -XX:+PrintClassHistogramBeforeFullGC -XX:+PrintTenuringDistribution
>>>>> -XX:GCLogFileSize=512m -XX:NumberOfGCLogFiles=2 -XX:+PrintGCDateStamps
>>>>> -XX:+UseGCLogFileRotation -XX:+DisableExplicitGC
>>>>> -XX:PrintFLSStatistics=1 -XX:+PrintGCDetails
>>>>>
>>>>>
>>>>> * Last few lines of "kill -3 pid" output:*
>>>>> Heap
>>>>>  par new generation   total 996800K, used 865818K [0x00007fa18e800000,
>>>>> 0x00007fa1d2190000, 0x00007fa1d2190000)
>>>>>   eden space 886080K,  94% used [0x00007fa18e800000,
>>>>> 0x00007fa1c1a659e0, 0x00007fa1c4950000)
>>>>>   from space 110720K,  25% used [0x00007fa1cb570000,
>>>>> 0x00007fa1cd091078, 0x00007fa1d2190000)
>>>>>   to   space 110720K,   0% used [0x00007fa1c4950000,
>>>>> 0x00007fa1c4950000, 0x00007fa1cb570000)
>>>>>  concurrent mark-sweep generation total 30636480K, used 12036523K
>>>>> [0x00007fa1d2190000, 0x00007fa920000000, 0x00007fa920000000)
>>>>>  concurrent-mark-sweep perm gen total 128856K, used 77779K
>>>>> [0x00007fa920000000, 0x00007fa927dd6000, 0x00007fa940000000)
>>>>>
>>>>>  *Sample gc log:*
>>>>> 2014-12-11T23:32:16.121+0000: 710.618: [ParNew
>>>>> Desired survivor size 56688640 bytes, new threshold 6 (max 6)
>>>>> - age   1:    2956312 bytes,    2956312 total
>>>>> - age   2:     591800 bytes,    3548112 total
>>>>> - age   3:      66216 bytes,    3614328 total
>>>>> - age   4:     270752 bytes,    3885080 total
>>>>> - age   5:     615472 bytes,    4500552 total
>>>>> - age   6:     358440 bytes,    4858992 total
>>>>> : 900635K->8173K(996800K), 0.0317340 secs]
>>>>> 1352217K->463460K(31633280K)After GC:
>>>>>
>>>>>
>>>>> In this GC, eden is at 900635K before the GC and at 8173K after.
>>>>> That eden fills up is the expected behavior. Is that what you were
>>>>> asking about above? If not, can you send me an example of the
>>>>> "fills up" behavior.
>>>>>
>>>>> Jon
>>>>>
>>>>>   Statistics for BinaryTreeDictionary:
>>>>> ------------------------------------
>>>>> Total Free Space: -433641480
>>>>> Max   Chunk Size: -433641480
>>>>> Number of Blocks: 1
>>>>> Av.  Block  Size: -433641480
>>>>> Tree      Height: 1
>>>>> After GC:
>>>>> Statistics for BinaryTreeDictionary:
>>>>> ------------------------------------
>>>>> Total Free Space: 1227
>>>>> Max   Chunk Size: 631
>>>>> Number of Blocks: 3
>>>>> Av.  Block  Size: 409
>>>>> Tree      Height: 3
>>>>> , 0.0318920 secs] [Times: user=0.38 sys=0.01, real=0.03 secs]
>>>>>
>>>>>
>>>>>  Ashwin Jayaprakash.
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> hotspot-gc-use mailing list
>>>>> hotspot-gc-use at openjdk.java.net
>>>>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
>>>>>
>>>>>
>>>>>  _______________________________________________
>>>> hotspot-gc-use mailing list
>>>> hotspot-gc-use at openjdk.java.net
>>>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
>>>>
>>>>
>>>> _______________________________________________
>>> hotspot-gc-use mailing list
>>> hotspot-gc-use at openjdk.java.net
>>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
>>>
>>>
>>>
>>
>>
>> _______________________________________________
>> hotspot-gc-use mailing list
>> hotspot-gc-use at openjdk.java.net
>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
>>
>>