Multi-second ParNew collections but stable CMS

Ashwin Jayaprakash ashwin.jayaprakash at gmail.com
Wed Dec 17 04:47:36 UTC 2014


Hi, we have a cluster of ElasticSearch servers running with 31G heap and
OpenJDK 64-Bit Server VM (build 24.65-b04, mixed mode).

While our old gen seems very stable at about 40% usage with no Full
GCs so far, our young gen keeps growing from ~50MB to 850MB every few
seconds. These ParNew collections take anywhere between 1 and 7 seconds
and are causing some of our requests to time out. The eden space keeps
filling up and is then cleared every 30-60 seconds. The JVM is definitely
doing work in terms of caching/buffering objects for a few seconds,
writing to disk, and then clearing the objects (a typical
Lucene/ElasticSearch indexing and querying workload).
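For a rough sense of the allocation pressure implied by those numbers (the 30-second fill interval below is an assumption, taken from the faster end of the observed 30-60 second range):

```python
# Back-of-envelope allocation rate from the observations above:
# eden grows from roughly 50 MB to 850 MB before being cleared.
eden_start_mb = 50
eden_end_mb = 850
fill_interval_s = 30  # assumed; observed clearing interval is 30-60 s

alloc_rate_mb_s = (eden_end_mb - eden_start_mb) / fill_interval_s
print(round(alloc_rate_mb_s, 1))  # MB/s of mostly short-lived allocation
```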

These long pauses are not good for our server throughput, so I did some
reading and found conflicting reports on how Cassandra is configured
compared to Hadoop. There are also references
<http://blog.mgm-tp.com/2013/12/benchmarking-g1-and-other-java-7-garbage-collectors/>
to this old ParNew+CMS bug <http://bugs.java.com/view_bug.do?bug_id=6459113>,
which I thought had been addressed in the JRE version we are using.
Cassandra recommends <http://aryanet.com/blog/cassandra-garbage-collector-tuning>
a larger NewSize with a max tenuring threshold of just 1, whereas Hadoop
recommends <http://wiki.apache.org/hadoop/PerformanceTuning> a small NewSize.
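For reference, the two approaches translate roughly into flags like these (the sizes here are illustrative placeholders, not values taken from either project's documentation):

```
# Cassandra-style: large young gen, promote survivors almost immediately
-Xmn2400m -XX:MaxTenuringThreshold=1 -XX:SurvivorRatio=8

# Hadoop-style: small young gen, keep each ParNew pause short
-Xmn256m
```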

Since most of our allocations seem to be quite short lived, is there a way
to avoid these "long" young gen pauses?

Thanks in advance. Here are some details.

*Heap settings:*
java -Xmx31000m -Xms31000m
-Xss512k -XX:MaxPermSize=512m -XX:ReservedCodeCacheSize=64m
-XX:+UseConcMarkSweepGC -XX:+UseCMSInitiatingOccupancyOnly
-XX:CMSInitiatingOccupancyFraction=70
-XX:+UseParNewGC -XX:+ScavengeBeforeFullGC -XX:+PrintPromotionFailure
-XX:+PrintClassHistogramBeforeFullGC -XX:+PrintTenuringDistribution
-XX:GCLogFileSize=512m -XX:NumberOfGCLogFiles=2 -XX:+PrintGCDateStamps
-XX:+UseGCLogFileRotation -XX:+DisableExplicitGC -XX:PrintFLSStatistics=1
-XX:+PrintGCDetails


*Last few lines of "kill -3 pid" output:*
Heap
 par new generation   total 996800K, used 865818K [0x00007fa18e800000,
0x00007fa1d2190000, 0x00007fa1d2190000)
  eden space 886080K,  94% used [0x00007fa18e800000, 0x00007fa1c1a659e0,
0x00007fa1c4950000)
  from space 110720K,  25% used [0x00007fa1cb570000, 0x00007fa1cd091078,
0x00007fa1d2190000)
  to   space 110720K,   0% used [0x00007fa1c4950000, 0x00007fa1c4950000,
0x00007fa1cb570000)
 concurrent mark-sweep generation total 30636480K, used 12036523K
[0x00007fa1d2190000, 0x00007fa920000000, 0x00007fa920000000)
 concurrent-mark-sweep perm gen total 128856K, used 77779K
[0x00007fa920000000, 0x00007fa927dd6000, 0x00007fa940000000)

*Sample gc log:*
2014-12-11T23:32:16.121+0000: 710.618: [ParNew
Desired survivor size 56688640 bytes, new threshold 6 (max 6)
- age   1:    2956312 bytes,    2956312 total
- age   2:     591800 bytes,    3548112 total
- age   3:      66216 bytes,    3614328 total
- age   4:     270752 bytes,    3885080 total
- age   5:     615472 bytes,    4500552 total
- age   6:     358440 bytes,    4858992 total
: 900635K->8173K(996800K), 0.0317340 secs]
1352217K->463460K(31633280K)After GC:
Statistics for BinaryTreeDictionary:
------------------------------------
Total Free Space: -433641480
Max   Chunk Size: -433641480
Number of Blocks: 1
Av.  Block  Size: -433641480
Tree      Height: 1
After GC:
Statistics for BinaryTreeDictionary:
------------------------------------
Total Free Space: 1227
Max   Chunk Size: 631
Number of Blocks: 3
Av.  Block  Size: 409
Tree      Height: 3
, 0.0318920 secs] [Times: user=0.38 sys=0.01, real=0.03 secs]
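To quantify how often the pauses actually exceed a threshold, here is a quick sketch that scans a GC log for wall-clock pause times, assuming the `[Times: user=... sys=... real=X.XX secs]` format shown above:

```python
import re

# Extract wall-clock pause times from GC log lines such as:
#   [Times: user=0.38 sys=0.01, real=0.03 secs]
PAUSE_RE = re.compile(r"real=(\d+\.\d+) secs")

def real_pauses(log_lines):
    """Yield the 'real' pause in seconds for each [Times: ...] entry."""
    for line in log_lines:
        m = PAUSE_RE.search(line)
        if m:
            yield float(m.group(1))

sample = [", 0.0318920 secs] [Times: user=0.38 sys=0.01, real=0.03 secs]"]
print([p for p in real_pauses(sample) if p >= 0.03])  # -> [0.03]
```

Filtering the full log this way (e.g. `p >= 1.0`) would show how many collections hit the multi-second range.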


Ashwin Jayaprakash.
