Multi-second ParNew collections but stable CMS

Gustav Åkesson gustav.r.akesson at gmail.com
Thu Dec 18 08:05:38 UTC 2014


Hi,

I see a significant increase in systime, which (in my experience) is usually
caused either by page swaps (parts of the heap have to be paged back in
during the STW phase), or by long latency when the GC writes its logs to disk
(which is a synchronous operation within the GC cycle).
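
To check for swapping and the footprint of the process on Linux, something
along these lines should do (a rough sketch; <pid> stands for the Java
process id):

vmstat 1 10              # watch the si/so columns for swap-in/swap-out activity
ps -o rss,vsz -p <pid>   # resident and virtual size of the Java process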

When these multi-second YGCs occur, have you noticed an increase in page
swapping? What is the resident size of this particular Java process? Could
you try writing the GC logs to a RAM disk and see whether the problem goes
away?
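
A minimal sketch of that on Linux (mount point and size are just examples,
and mounting requires root):

mkdir -p /mnt/gclog
mount -t tmpfs -o size=512m tmpfs /mnt/gclog

and then point the JVM at it with -Xloggc:/mnt/gclog/gc.log.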


Best Regards,
Gustav Åkesson

On Thu, Dec 18, 2014 at 12:12 AM, Ashwin Jayaprakash <
ashwin.jayaprakash at gmail.com> wrote:
>
> I've uploaded our latest GC log files to -
> https://drive.google.com/file/d/0Bw3dCdVLk-NvV3ozdkNacU5SU2M/view?usp=sharing
>
> I've also summarized the top pause times by running "grep -oE "Times:
> user=.*" gc.log.0 | sort -nr | head -25"
>
> Times: user=7.89 sys=0.55, real=0.65 secs]
> Times: user=7.71 sys=4.59, real=1.10 secs]
> Times: user=7.46 sys=0.32, real=0.67 secs]
> Times: user=6.55 sys=0.96, real=0.68 secs]
> Times: user=6.40 sys=0.27, real=0.57 secs]
> Times: user=6.27 sys=0.65, real=0.55 secs]
> Times: user=6.24 sys=0.29, real=0.52 secs]
> Times: user=5.25 sys=0.26, real=0.45 secs]
> Times: user=4.95 sys=0.49, real=0.53 secs]
> Times: user=4.90 sys=0.54, real=0.45 secs]
> Times: user=4.55 sys=1.46, real=0.61 secs]
> Times: user=4.39 sys=0.26, real=0.40 secs]
> Times: user=3.61 sys=0.39, real=0.50 secs]
> Times: user=3.59 sys=0.18, real=0.35 secs]
> Times: user=3.16 sys=0.00, real=3.17 secs]
> Times: user=3.06 sys=0.14, real=0.25 secs]
> Times: user=3.05 sys=0.24, real=0.33 secs]
> Times: user=3.03 sys=0.14, real=0.25 secs]
> Times: user=2.97 sys=0.38, real=0.33 secs]
> Times: user=2.77 sys=0.14, real=0.25 secs]
> Times: user=2.51 sys=0.08, real=0.22 secs]
> Times: user=2.49 sys=0.13, real=0.21 secs]
> Times: user=2.25 sys=0.32, real=0.26 secs]
> Times: user=2.06 sys=0.12, real=0.19 secs]
> Times: user=2.06 sys=0.11, real=0.17 secs]
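>
> The sort above keys on the user time; to rank the same log by wall-clock
> (real) time instead, a variant like this should work (assuming GNU grep
> and sort):
>
> grep -oE "real=[0-9.]+" gc.log.0 | sort -t= -k2 -nr | head -25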
>
> I wonder if we should enable "UseCondCardMark"?
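>
> (For reference, that would just mean adding -XX:+UseCondCardMark to our
> startup flags; I haven't verified whether it makes a difference for this
> workload.)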
>
> Thanks.
>
>
>
>
>
>
>
> On Tue, Dec 16, 2014 at 8:47 PM, Ashwin Jayaprakash <
> ashwin.jayaprakash at gmail.com> wrote:
>>
>> Hi, we have a cluster of ElasticSearch servers running with 31G heap and
>> OpenJDK 64-Bit Server VM (build 24.65-b04, mixed mode).
>>
>> While our old gen seems to be very stable at about 40% usage with no
>> Full GCs so far, our young gen keeps growing from ~50MB to 850MB every few
>> seconds. These ParNew collections take anywhere between 1-7 seconds
>> and are causing some of our requests to time out. The eden space keeps
>> filling up and is then cleared every 30-60 seconds. There is definitely work
>> being done by our JVM in terms of caching/buffering objects for a few
>> seconds, writing them to disk and then clearing the objects (a typical
>> Lucene/ElasticSearch indexing and querying workload).
>>
>> These long pauses are not good for our server throughput, so I did some
>> reading and found conflicting recommendations on how Cassandra is configured
>> compared to Hadoop. There are also references
>> <http://blog.mgm-tp.com/2013/12/benchmarking-g1-and-other-java-7-garbage-collectors/>
>> to this old ParNew+CMS bug
>> <http://bugs.java.com/view_bug.do?bug_id=6459113>, which I thought
>> would have been addressed in the JRE version we are using. Cassandra
>> recommends <http://aryanet.com/blog/cassandra-garbage-collector-tuning>
>> a larger NewSize with a max tenuring threshold of just 1, whereas Hadoop
>> recommends <http://wiki.apache.org/hadoop/PerformanceTuning> a small NewSize.
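>>
>> For context, the Cassandra-style advice would translate into flags roughly
>> like these (sizes purely illustrative, not a recommendation):
>>
>> -Xmn2g -XX:MaxTenuringThreshold=1
>>
>> i.e. an explicit young generation via -Xmn (shorthand for setting
>> -XX:NewSize and -XX:MaxNewSize to the same value) plus a low tenuring
>> threshold, whereas the Hadoop advice would make -Xmn small instead.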
>>
>> Since most of our allocations seem to be quite short lived, is there a
>> way to avoid these "long" young gen pauses?
>>
>> Thanks in advance. Here are some details.
>>
>> *Heap settings:*
>> java -Xmx31000m -Xms31000m
>> -Xss512k -XX:MaxPermSize=512m -XX:ReservedCodeCacheSize=64m
>> -XX:+UseConcMarkSweepGC -XX:+UseCMSInitiatingOccupancyOnly
>> -XX:CMSInitiatingOccupancyFraction=70
>> -XX:+UseParNewGC -XX:+ScavengeBeforeFullGC -XX:+PrintPromotionFailure
>> -XX:+PrintClassHistogramBeforeFullGC -XX:+PrintTenuringDistribution
>> -XX:GCLogFileSize=512m -XX:NumberOfGCLogFiles=2 -XX:+PrintGCDateStamps
>> -XX:+UseGCLogFileRotation -XX:+DisableExplicitGC -XX:PrintFLSStatistics=1
>> -XX:+PrintGCDetails
>>
>>
>> *Last few lines of "kill -3 pid" output:*
>> Heap
>>  par new generation   total 996800K, used 865818K [0x00007fa18e800000,
>> 0x00007fa1d2190000, 0x00007fa1d2190000)
>>   eden space 886080K,  94% used [0x00007fa18e800000, 0x00007fa1c1a659e0,
>> 0x00007fa1c4950000)
>>   from space 110720K,  25% used [0x00007fa1cb570000, 0x00007fa1cd091078,
>> 0x00007fa1d2190000)
>>   to   space 110720K,   0% used [0x00007fa1c4950000, 0x00007fa1c4950000,
>> 0x00007fa1cb570000)
>>  concurrent mark-sweep generation total 30636480K, used 12036523K
>> [0x00007fa1d2190000, 0x00007fa920000000, 0x00007fa920000000)
>>  concurrent-mark-sweep perm gen total 128856K, used 77779K
>> [0x00007fa920000000, 0x00007fa927dd6000, 0x00007fa940000000)
>>
>> *Sample gc log:*
>> 2014-12-11T23:32:16.121+0000: 710.618: [ParNew
>> Desired survivor size 56688640 bytes, new threshold 6 (max 6)
>> - age   1:    2956312 bytes,    2956312 total
>> - age   2:     591800 bytes,    3548112 total
>> - age   3:      66216 bytes,    3614328 total
>> - age   4:     270752 bytes,    3885080 total
>> - age   5:     615472 bytes,    4500552 total
>> - age   6:     358440 bytes,    4858992 total
>> : 900635K->8173K(996800K), 0.0317340 secs]
>> 1352217K->463460K(31633280K)After GC:
>> Statistics for BinaryTreeDictionary:
>> ------------------------------------
>> Total Free Space: -433641480
>> Max   Chunk Size: -433641480
>> Number of Blocks: 1
>> Av.  Block  Size: -433641480
>> Tree      Height: 1
>> After GC:
>> Statistics for BinaryTreeDictionary:
>> ------------------------------------
>> Total Free Space: 1227
>> Max   Chunk Size: 631
>> Number of Blocks: 3
>> Av.  Block  Size: 409
>> Tree      Height: 3
>> , 0.0318920 secs] [Times: user=0.38 sys=0.01, real=0.03 secs]
>>
>>
>> Ashwin Jayaprakash.
>>
>