CMSEdenChunksRecordAlways & CMSParallelInitialMarkEnabled

graham sanderson graham at vast.com
Sun Jun 15 00:05:04 UTC 2014


Thanks for the answer Gustav,

The fact that you have been running in production for months makes me confident enough to try this on at least one of our nodes… (this is actually cassandra)

Current GC-related options are at the bottom. These nodes have 256G of RAM and they aren’t swapping. We are certainly used to a pause within the first 10 seconds or so, but since the nodes haven’t even joined the ring yet at that point, we don’t really care. And yes, Xms != Xmx is bad; we want one heap size and to stick with it.

I will gather data via -XX:+CMSEdenChunksRecordAlways; however, I’d be interested if a developer has an answer as to when we can expect potential chunk recording… otherwise I’ll have to go dig into the code a bit deeper. My assumption was that this call would not be in the inlined allocation code, but I had thought that even allocation of a new TLAB was inlined by the compilers - perhaps not.
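In the meantime, here is the sort of probe I plan to try (purely my own sketch; the class name, thread count and sizes are made up). The idea is that the small-object loop should stay on the inlined in-TLAB bump-the-pointer path (apart from TLAB refills), while the 1MB arrays should miss the TLAB and take the shared slow path, so running both modes with -XX:+CMSEdenChunksRecordAlways -XX:+CMSPrintEdenSurvivorChunks should show whether TLAB refills alone are enough to get chunks recorded:

public class EdenChunkProbe {
    static volatile Object sink; // keeps the JIT from eliminating the allocations

    public static void main(String[] args) throws InterruptedException {
        // "large" mode allocates 1MB arrays that should overflow the TLAB
        // and hit the shared slow path; the default mode allocates tiny
        // objects that should stay on the inlined TLAB fast path.
        final boolean large = args.length > 0 && args[0].equals("large");
        final int iters = large ? 100_000 : 10_000_000;
        Thread[] threads = new Thread[8];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(() -> {
                for (int j = 0; j < iters; j++) {
                    sink = large ? new byte[1 << 20] : new Object();
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join();
        }
    }
}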

Current GC-related settings - note we were running with a lower CMSInitiatingOccupancyFraction until recently; it seems to have been changed back by accident, but that is kind of tangential.

-Xms24576M
-Xmx24576M
-Xmn8192M
-XX:+HeapDumpOnOutOfMemoryError
-XX:+UseParNewGC
-XX:+UseConcMarkSweepGC
-XX:+CMSParallelRemarkEnabled
-XX:SurvivorRatio=8
-XX:MaxTenuringThreshold=1
-XX:CMSInitiatingOccupancyFraction=70
-XX:+UseCMSInitiatingOccupancyOnly
-XX:+UseTLAB
-XX:+UseCondCardMark
-XX:+PrintGCDetails
-XX:+PrintGCDateStamps
-XX:+PrintHeapAtGC
-XX:+PrintTenuringDistribution
-XX:+PrintGCApplicationStoppedTime
-XX:+PrintPromotionFailure
-XX:PrintFLSStatistics=1
-Xloggc:/var/log/cassandra/gc.log
-XX:+UseGCLogFileRotation
-XX:NumberOfGCLogFiles=30
-XX:GCLogFileSize=20M
-XX:+PrintGCApplicationConcurrentTime
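
For the experiment itself, the plan (my intent only - none of this is applied yet) is to append the options under discussion, plus the print flag, i.e.:

-XX:+CMSParallelInitialMarkEnabled
-XX:+CMSEdenChunksRecordAlways
-XX:+CMSPrintEdenSurvivorChunks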

Thanks, Graham

P.S. Note tuning here is rather interesting since we use this cassandra cluster for lots of different data with very different usage patterns - sometimes we’ll suddenly dump 50G of data in over the course of a few minutes. Also, cassandra doesn’t really mind a node being paused for a while due to GC, but things get a little more annoying if several nodes pause at the same time… and even though promotion failure can be worse for us (that is a separate issue), we’ve seen STW pauses of up to about 6-8 seconds in remark (presumably when things go horribly wrong and you only get one chunk). Basically I’m on a mission to minimize all pauses, since their effects can propagate (timeouts are very short in a lot of places).
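
To try to reproduce that burst-then-lull pattern outside of cassandra, something like the following sketch (entirely hypothetical - class name, sizes and timings are all made up) should approximate it; the thing to watch for in the gc log is abortable preclean giving up (see CMSMaxAbortablePrecleanTime) followed by a long remark:

import java.util.ArrayList;
import java.util.List;

public class BurstThenLull {
    // Retain a slice of each burst so some of it survives young GCs
    // and gets promoted, as our real data does.
    static final List<byte[]> retained = new ArrayList<>();

    public static void main(String[] args) throws InterruptedException {
        while (true) {
            // ~30 seconds of very high allocation rate
            long end = System.nanoTime() + 30_000_000_000L;
            while (System.nanoTime() < end) {
                byte[] chunk = new byte[64 * 1024];
                if (retained.size() < 10_000) { // cap at ~640MB retained
                    retained.add(chunk);
                }
            }
            retained.clear();      // drop the retained data
            Thread.sleep(120_000); // then an almost complete lull
        }
    }
}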

I will report back with my findings
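
One more note: as mentioned above we aren’t currently swapping, but for completeness, the Linux knob for Gustav’s swappiness suggestion below would presumably be set like this (my assumption; we haven’t needed it here):

vm.swappiness = 0   # in /etc/sysctl.conf, loaded with sysctl -p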

On Jun 14, 2014, at 6:29 PM, Gustav Åkesson <gustav.r.akesson at gmail.com> wrote:

> Hi,
> 
> I won't answer all your questions, but I'd like to share my experience with these settings (plus some additional thoughts), even though I haven't yet had the time to dig into the details.
> 
> We've been using these flags for several months in production (yes, on Java 7 even before the latest update release) and we've seen a lot of improvement in CMS old gen STW pauses. Previously, occasional initial marks of 1.5s could occur; with these settings combined, CMS pauses are consistently around ~100ms (on a high-end machine like yours, they are 20-30ms). We're using 1gb and 2gb heaps with roughly half/half old/new. Obviously YMMV, but this is at least the behavior of this particular application - we've had nothing but positive outcomes from these settings. Additionally, the pauses are rather deterministic.
> 
> Not sure what your heap size settings are, but what I've also observed is that setting Xms != Xmx can cause an occasional long initial mark when the heap capacity is slightly increased. I had a discussion a while back ( http://mail.openjdk.java.net/pipermail/hotspot-gc-use/2014-February/001795.html ) regarding this, and it seems to be an issue with CMS.
> 
> Also, swapping/paging is another factor which can cause nondeterministic, occasionally long STW GCs. If you're on Linux, try swappiness=0 and see if the pauses get more stable.
> 
> 
> Best Regards,
> Gustav Åkesson
> 
> 
> On Fri, Jun 13, 2014 at 6:48 AM, graham sanderson <graham at vast.com> wrote:
> I was investigating abortable preclean timeouts in our app (and the associated long remark pauses), so I had a look at the old jdk6 code I had on my box and wondered about recording eden chunks during certain eden slow allocation paths (I wasn’t sure whether TLAB allocation is just a CAS bump). Then I saw what looked perfect in the latest code, so I was excited to install 1.7.0_60-b19.
> 
> I wanted to ask what you consider the stability of these two options to be (I’m pretty sure at least the first one is new in this release)
> 
> I have just installed it locally on my mac, and am aware of http://bugs.java.com/bugdatabase/view_bug.do?bug_id=8021809 , which I could reproduce; however, I wasn’t able to reproduce it without -XX:-UseCMSCompactAtFullCollection (is that your understanding too?)
> 
> We are running our application with 8 gig young generation (6.4g eden), on boxes with 32 cores… so parallelism is good for short pauses
> 
> we already have
> 
> -XX:+UseParNewGC 
> -XX:+UseConcMarkSweepGC
> -XX:+CMSParallelRemarkEnabled
> 
> we have seen a few long(ish) initial marks, so
> 
> -XX:+CMSParallelInitialMarkEnabled sounds good
> 
> as for 
> 
> -XX:+CMSEdenChunksRecordAlways
> 
> my question is: what constitutes a slow path such that an eden chunk is potentially recorded… TLAB allocation, or more horrific things? Basically (and I’ll test our app with -XX:+CMSPrintEdenSurvivorChunks): is it likely that I’ll actually get fewer samples using -XX:+CMSEdenChunksRecordAlways in a highly multithreaded app than I would with sampling? Or, put another way, what sort of app allocation patterns, if any, might avoid the slow path altogether and leave me with just one chunk?
> 
> Thanks,
> 
> Graham
> 
> P.S. less relevant I think, but our old generation is 16g
> P.P.S. I suspect the abortable preclean timeouts mostly happen after a burst of very high allocation rate followed by an almost complete lull… this is one of the patterns that can happen in our application
> 
