CMSEdenChunksRecordAlways & CMSParallelInitialMarkEnabled
graham sanderson
graham at vast.com
Sat Jun 21 16:52:45 UTC 2014
Note that this works great for us too… since the formatting in this email is a bit flaky, I'll point you at the numbers I posted in the Cassandra issue I opened to make these flags defaults for ParNew/CMS (on the appropriate JVMs):
https://issues.apache.org/jira/browse/CASSANDRA-7432
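
For reference, the two flags in question are simply:

-XX:+CMSParallelInitialMarkEnabled
-XX:+CMSEdenChunksRecordAlways

(They only apply to ParNew/CMS, and need a JVM new enough to have them; per the thread below, 1.7.0_60 has both.)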
On Jun 14, 2014, at 7:05 PM, graham sanderson <graham at vast.com> wrote:
> Thanks for the answer Gustav,
>
> The fact that you have been running in production for months makes me confident enough to try this on at least one of our nodes… (this is actually Cassandra)
>
> Current GC-related options are at the bottom. These nodes have 256G of RAM and they aren't swapping. We are certainly used to a pause within the first 10 seconds or so, but at that point the nodes haven't even joined the ring yet, so we don't really care. Agreed that Xms != Xmx is bad; we want one heap size and to stick with it.
>
> I will gather data via -XX:+CMSEdenChunksRecordAlways; however, I'd be interested if a developer has an answer as to when we can expect a potential chunk recording… Otherwise I'll have to dig into the code a bit deeper. My assumption was that this call would not be in the inlined allocation code, but I had thought that even allocation of a new TLAB was inlined by the compilers - perhaps not.
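>
> As an aside, if I want to poke at this outside our app, the sort of thing I have in mind is a throwaway multi-threaded allocator (purely an illustrative sketch - class name and sizes invented, not our workload) run with -XX:+CMSPrintEdenSurvivorChunks, with and without -XX:+CMSEdenChunksRecordAlways, to compare how many chunks actually get recorded:
>
> // AllocBurst.java - hammer eden from many threads so TLAB refills (the slow
> // path I'm asking about) happen frequently.
> public class AllocBurst {
>     static volatile Object sink; // keep the allocations from being optimized away
>
>     public static void main(String[] args) throws InterruptedException {
>         int threads = Runtime.getRuntime().availableProcessors();
>         Thread[] workers = new Thread[threads];
>         for (int i = 0; i < threads; i++) {
>             workers[i] = new Thread(new Runnable() {
>                 public void run() {
>                     long end = System.nanoTime() + 60_000_000_000L; // run for ~60 seconds
>                     while (System.nanoTime() < end) {
>                         sink = new byte[64 * 1024]; // 64K buffers chew through TLABs quickly
>                     }
>                 }
>             });
>             workers[i].start();
>         }
>         for (Thread t : workers) {
>             t.join();
>         }
>     }
> }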
>
> Current GC-related settings are below (the additions I plan to trial are noted right after the list). Note that we were running with a lower CMSInitiatingOccupancyFraction until recently - it seems to have gotten changed back by accident, but that is kind of tangential.
>
> -Xms24576M
> -Xmx24576M
> -Xmn8192M
> -XX:+HeapDumpOnOutOfMemoryError
> -XX:+UseParNewGC
> -XX:+UseConcMarkSweepGC
> -XX:+CMSParallelRemarkEnabled
> -XX:SurvivorRatio=8
> -XX:MaxTenuringThreshold=1
> -XX:CMSInitiatingOccupancyFraction=70
> -XX:+UseCMSInitiatingOccupancyOnly
> -XX:+UseTLAB
> -XX:+UseCondCardMark
> -XX:+PrintGCDetails
> -XX:+PrintGCDateStamps
> -XX:+PrintHeapAtGC
> -XX:+PrintTenuringDistribution
> -XX:+PrintGCApplicationStoppedTime
> -XX:+PrintPromotionFailure
> -XX:PrintFLSStatistics=1
> -Xloggc:/var/log/cassandra/gc.log
> -XX:+UseGCLogFileRotation
> -XX:NumberOfGCLogFiles=30
> -XX:GCLogFileSize=20M
> -XX:+PrintGCApplicationConcurrentTime
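>
> (For the trial node, the plan is simply to append the flags discussed in this thread - assuming the 1.7.0_60 JVM - i.e.:
>
> -XX:+CMSParallelInitialMarkEnabled
> -XX:+CMSEdenChunksRecordAlways
> -XX:+CMSPrintEdenSurvivorChunks
>
> and leave everything else above unchanged.)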
>
> Thanks, Graham
>
> P.S. Note that tuning here is rather interesting since we use this Cassandra cluster for lots of different data with very different usage patterns - sometimes we'll suddenly dump 50G of data in over the course of a few minutes. Also, Cassandra doesn't really mind a node being paused for a while due to GC, but things get a little more annoying if nodes pause at the same time… even though promotion failure can be worse for us (that is a separate issue), we've seen STW pauses of up to about 6-8 seconds in remark (presumably when things go horribly wrong and you only get one chunk). Basically I'm on a mission to minimize all pauses, since their effects can propagate (timeouts are very short in a lot of places).
>
> I will report back with my findings
>
> On Jun 14, 2014, at 6:29 PM, Gustav Åkesson <gustav.r.akesson at gmail.com> wrote:
>
>> Hi,
>>
>> Even though I won't answer all your questions, I'd like to share my experience with these settings (plus some additional thoughts), even though I haven't yet had the time to dig into the details.
>>
>> We've been using these flags for several months in production (yes, on Java 7 even before the latest update release) and we've seen a lot of improvement in CMS old-gen STW pauses. During execution, occasional initial marks of 1.5s could occur, but with these settings combined, CMS pauses are consistently around ~100ms (on a high-end machine such as yours, they are 20-30ms). We're using 1GB and 2GB heaps with roughly half/half old/new. Obviously YMMV, but this is at least the behavior of this particular application - we've had nothing but positive outcomes from using these settings. Additionally, the pauses are rather deterministic.
>>
>> Not sure what your heap size settings are, but something else I've observed is that setting Xms != Xmx can also cause an occasional long initial mark when the heap capacity is slightly increased. I had a discussion a while back ( http://mail.openjdk.java.net/pipermail/hotspot-gc-use/2014-February/001795.html ) regarding this, and it seems to be an issue with CMS.
>>
>> Also, swapping/paging is another factor that can cause non-deterministic / occasionally long STW GCs. If you're on Linux, try swappiness=0 and see if the pauses get more stable.
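>>
>> Concretely, on Linux that is something like (illustrative; needs root, and how you persist it varies by distro):
>>
>> sysctl -w vm.swappiness=0    # takes effect immediately; add "vm.swappiness = 0" to /etc/sysctl.conf to keep it across reboots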
>>
>>
>> Best Regards,
>> Gustav Åkesson
>>
>>
>> On Fri, Jun 13, 2014 at 6:48 AM, graham sanderson <graham at vast.com> wrote:
>> I was investigating abortable preclean timeouts in our app (and the associated long remark pauses), so I had a look at the old JDK 6 code I had on my box, wondered about recording eden chunks during certain eden slow allocation paths (I wasn't sure whether TLAB allocation is just a CAS bump), and saw what looked perfect in the latest code, so I was excited to install 1.7.0_60-b19.
>>
>> I wanted to ask what you consider the stability of these two options to be (I’m pretty sure at least the first one is new in this release)
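>>
>> (One sanity check, in case it helps anyone else: something along the lines of
>>
>> java -XX:+PrintFlagsFinal -version | egrep 'CMSEdenChunksRecordAlways|CMSParallelInitialMarkEnabled'
>>
>> should list both flags and their defaults on a build that has them.)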
>>
>> I have just installed it locally on my Mac, and I am aware of http://bugs.java.com/bugdatabase/view_bug.do?bug_id=8021809, which I could reproduce; however, I wasn't able to reproduce it without -XX:-UseCMSCompactAtFullCollection (is this your understanding too?)
>>
>> We are running our application with an 8 gig young generation (6.4g eden) on boxes with 32 cores… so parallelism helps keep pauses short
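>>
>> (The 6.4g figure is just eden's share of the young gen given our -XX:SurvivorRatio=8: eden = 8192M * 8/10 = 6553.6M, i.e. roughly 6.4g, with the remaining two tenths split between the survivor spaces.)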
>>
>> we already have
>>
>> -XX:+UseParNewGC
>> -XX:+UseConcMarkSweepGC
>> -XX:+CMSParallelRemarkEnabled
>>
>> we have seen a few long(ish) initial marks, so
>>
>> -XX:+CMSParallelInitialMarkEnabled sounds good
>>
>> as for
>>
>> -XX:+CMSEdenChunksRecordAlways
>>
>> my question is: what constitutes a slow path such that an eden chunk is potentially recorded… TLAB allocation, or more horrific things? Basically (and I'll test our app with -XX:+CMSPrintEdenSurvivorChunks), is it likely that I'll actually get fewer samples using -XX:+CMSEdenChunksRecordAlways in a highly multithreaded app than I would with sampling? Or, put another way… what sorts of application allocation patterns, if any, might avoid the slow path altogether and leave me with just one chunk?
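>>
>> (The comparison I have in mind is just two runs of the same workload, something like:
>>
>> java <our usual CMS options> -XX:+CMSPrintEdenSurvivorChunks ...                                 # default sampling
>> java <our usual CMS options> -XX:+CMSPrintEdenSurvivorChunks -XX:+CMSEdenChunksRecordAlways ...  # record on slow-path allocation
>>
>> and then comparing the number of chunks reported.)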
>>
>> Thanks,
>>
>> Graham
>>
>> P.S. less relevant I think, but our old generation is 16g
>> P.P.S. I suspect the abortable preclean timeouts mostly happen after a burst of very high allocation rate followed by an almost complete lull… this is one of the patterns that can happen in our application
>>
>>
>