Multi-second ParNew collections but stable CMS
Ashwin Jayaprakash
ashwin.jayaprakash at gmail.com
Wed Dec 17 23:12:14 UTC 2014
I've uploaded our latest GC log files to -
https://drive.google.com/file/d/0Bw3dCdVLk-NvV3ozdkNacU5SU2M/view?usp=sharing
I've also summarized the top pause times by running:

    grep -oE 'Times: user=.*' gc.log.0 | sort -nr | head -25
Times: user=7.89 sys=0.55, real=0.65 secs]
Times: user=7.71 sys=4.59, real=1.10 secs]
Times: user=7.46 sys=0.32, real=0.67 secs]
Times: user=6.55 sys=0.96, real=0.68 secs]
Times: user=6.40 sys=0.27, real=0.57 secs]
Times: user=6.27 sys=0.65, real=0.55 secs]
Times: user=6.24 sys=0.29, real=0.52 secs]
Times: user=5.25 sys=0.26, real=0.45 secs]
Times: user=4.95 sys=0.49, real=0.53 secs]
Times: user=4.90 sys=0.54, real=0.45 secs]
Times: user=4.55 sys=1.46, real=0.61 secs]
Times: user=4.39 sys=0.26, real=0.40 secs]
Times: user=3.61 sys=0.39, real=0.50 secs]
Times: user=3.59 sys=0.18, real=0.35 secs]
Times: user=3.16 sys=0.00, real=3.17 secs]
Times: user=3.06 sys=0.14, real=0.25 secs]
Times: user=3.05 sys=0.24, real=0.33 secs]
Times: user=3.03 sys=0.14, real=0.25 secs]
Times: user=2.97 sys=0.38, real=0.33 secs]
Times: user=2.77 sys=0.14, real=0.25 secs]
Times: user=2.51 sys=0.08, real=0.22 secs]
Times: user=2.49 sys=0.13, real=0.21 secs]
Times: user=2.25 sys=0.32, real=0.26 secs]
Times: user=2.06 sys=0.12, real=0.19 secs]
Times: user=2.06 sys=0.11, real=0.17 secs]
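Since user time adds up across all the parallel GC worker threads, sorting by
real (wall-clock) time may surface the worst application stalls more directly.
A variant of the same grep, assuming the identical log-line format:

```shell
# Same extraction as above, but sorted numerically on the 4th '='-separated
# field ("0.65 secs]"), i.e. the wall-clock pause the application actually saw.
grep -oE 'Times: user=.*' gc.log.0 | sort -t= -k4 -nr | head -25
```

With that ordering, the "user=3.16 sys=0.00, real=3.17 secs]" outlier in the
list above jumps straight to the top.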
I wonder if we should enable "UseCondCardMark"?
Thanks.
On Tue, Dec 16, 2014 at 8:47 PM, Ashwin Jayaprakash <
ashwin.jayaprakash at gmail.com> wrote:
>
> Hi, we have a cluster of ElasticSearch servers running with 31G heap and
> OpenJDK 64-Bit Server VM (build 24.65-b04, mixed mode).
>
> While our old gen seems to be very stable at about 40% usage with no Full
> GCs so far, our young gen keeps growing from ~50MB to 850MB every few
> seconds. These ParNew collections take anywhere between 1 and 7 seconds
> and are causing some of our requests to time out. The eden space keeps
> filling up and is then cleared every 30-60 seconds. There is definitely
> work being done by our JVM in terms of caching/buffering objects for a
> few seconds, writing to disk and then clearing the objects (a typical
> Lucene/ElasticSearch indexing and querying workload).
>
> These long pauses are not good for our server throughput and I was doing
> some reading. I got some conflicting reports on how Cassandra is configured
> compared to Hadoop. There are also references
> <http://blog.mgm-tp.com/2013/12/benchmarking-g1-and-other-java-7-garbage-collectors/>
> to this old ParNew+CMS bug
> <http://bugs.java.com/view_bug.do?bug_id=6459113> which I thought
> would've been addressed in the JRE version we are using. Cassandra
> recommends <http://aryanet.com/blog/cassandra-garbage-collector-tuning> a
> larger NewSize with a max tenuring threshold of just 1, whereas Hadoop
> recommends <http://wiki.apache.org/hadoop/PerformanceTuning> a small NewSize.
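To make the two recommendations concrete, they pull in opposite directions on
roughly these flags; the sizes below are illustrative placeholders, not tested
values:

```shell
# Cassandra-style (per the link above): larger young gen, tenure quickly.
# The -Xmn size and threshold here are hypothetical examples, not recommendations.
CASSANDRA_STYLE="-Xmn2g -XX:MaxTenuringThreshold=1"

# Hadoop-style (per the wiki above): small young gen so each ParNew scavenge
# has less live data to scan and copy, keeping individual pauses short.
HADOOP_STYLE="-Xmn256m"
```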
>
> Since most of our allocations seem to be quite short lived, is there a way
> to avoid these "long" young gen pauses?
>
> Thanks in advance. Here are some details.
>
> *Heap settings:*
> java -Xmx31000m -Xms31000m
> -Xss512k -XX:MaxPermSize=512m -XX:ReservedCodeCacheSize=64m
> -XX:+UseConcMarkSweepGC -XX:+UseCMSInitiatingOccupancyOnly
> -XX:CMSInitiatingOccupancyFraction=70
> -XX:+UseParNewGC -XX:+ScavengeBeforeFullGC -XX:+PrintPromotionFailure
> -XX:+PrintClassHistogramBeforeFullGC -XX:+PrintTenuringDistribution
> -XX:GCLogFileSize=512m -XX:NumberOfGCLogFiles=2 -XX:+PrintGCDateStamps
> -XX:+UseGCLogFileRotation -XX:+DisableExplicitGC -XX:PrintFLSStatistics=1
> -XX:+PrintGCDetails
>
>
> *Last few lines of "kill -3 pid" output:*
> Heap
> par new generation total 996800K, used 865818K [0x00007fa18e800000,
> 0x00007fa1d2190000, 0x00007fa1d2190000)
> eden space 886080K, 94% used [0x00007fa18e800000, 0x00007fa1c1a659e0,
> 0x00007fa1c4950000)
> from space 110720K, 25% used [0x00007fa1cb570000, 0x00007fa1cd091078,
> 0x00007fa1d2190000)
> to space 110720K, 0% used [0x00007fa1c4950000, 0x00007fa1c4950000,
> 0x00007fa1cb570000)
> concurrent mark-sweep generation total 30636480K, used 12036523K
> [0x00007fa1d2190000, 0x00007fa920000000, 0x00007fa920000000)
> concurrent-mark-sweep perm gen total 128856K, used 77779K
> [0x00007fa920000000, 0x00007fa927dd6000, 0x00007fa940000000)
>
> *Sample gc log:*
> 2014-12-11T23:32:16.121+0000: 710.618: [ParNew
> Desired survivor size 56688640 bytes, new threshold 6 (max 6)
> - age 1: 2956312 bytes, 2956312 total
> - age 2: 591800 bytes, 3548112 total
> - age 3: 66216 bytes, 3614328 total
> - age 4: 270752 bytes, 3885080 total
> - age 5: 615472 bytes, 4500552 total
> - age 6: 358440 bytes, 4858992 total
> : 900635K->8173K(996800K), 0.0317340 secs]
> 1352217K->463460K(31633280K)After GC:
> Statistics for BinaryTreeDictionary:
> ------------------------------------
> Total Free Space: -433641480
> Max Chunk Size: -433641480
> Number of Blocks: 1
> Av. Block Size: -433641480
> Tree Height: 1
> After GC:
> Statistics for BinaryTreeDictionary:
> ------------------------------------
> Total Free Space: 1227
> Max Chunk Size: 631
> Number of Blocks: 3
> Av. Block Size: 409
> Tree Height: 3
> , 0.0318920 secs] [Times: user=0.38 sys=0.01, real=0.03 secs]
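As a rough check on where the time goes, the sample line above also lets one
estimate promotion: the young gen dropped 900635K->8173K while the whole heap
dropped 1352217K->463460K, and the difference between the two deltas is what
was copied into the old gen. A sketch with the numbers hard-coded from that
sample:

```shell
# Values copied from the ParNew line above (all in KB).
young_freed=$((900635 - 8173))       # reclaimed from the young gen
heap_freed=$((1352217 - 463460))     # reclaimed heap-wide
echo $((young_freed - heap_freed))   # KB promoted to the old gen: 3705
```

Only ~3.6 MB promoted by this collection, which is consistent with the old gen
staying stable; the ParNew cost is presumably dominated by scanning and copying
survivors rather than by promotion.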
>
>
> Ashwin Jayaprakash.
>