CMSEdenChunksRecordAlways & CMSParallelInitialMarkEnabled
Pas
pasthelod at gmail.com
Wed Jul 16 00:12:38 UTC 2014
Hello,
I was wondering: how come no one (else) uses ParGCCardsPerStrideChunk on
large heaps? It has been shown to decrease the (precious STW) time spent in
minor collections by ParNew. (
http://blog.ragozin.info/2012/03/secret-hotspot-option-improving-gc.html
and we also started using it on 20+ GB heaps and it was helpful, though we
changed more than one setting at a time, so I can't say that it was just
this setting.)
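
For anyone who wants to try it: as far as I know, ParGCCardsPerStrideChunk is
a diagnostic option, so it has to be unlocked first. A minimal sketch (the
4096 value is just the illustration from the blog post above, not a carefully
tuned number):

-XX:+UnlockDiagnosticVMOptions
-XX:ParGCCardsPerStrideChunk=4096
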
Regards,
Pas
On Sat, Jun 21, 2014 at 6:52 PM, graham sanderson <graham at vast.com> wrote:
> Note this works great for us too … given that formatting in this email is a
> bit flaky, I'll refer you to the numbers I posted in a Cassandra issue I
> opened to add these flags as defaults for ParNew/CMS (on the appropriate JVMs):
>
> https://issues.apache.org/jira/browse/CASSANDRA-7432
>
> On Jun 14, 2014, at 7:05 PM, graham sanderson <graham at vast.com> wrote:
>
> Thanks for the answer Gustav,
>
> The fact that you have been running in production for months makes me
> confident enough to try this on at least one of our nodes… (this is actually
> Cassandra)
>
> Current GC-related options are at the bottom - these nodes have 256G of
> RAM, they aren't swapping, and we are certainly used to a pause within
> the first 10 seconds or so, but the nodes haven't even joined the ring yet,
> so we don't really care. Yes, Xms != Xmx is bad; we want one heap size and
> to stick with it.
>
> I will gather data via -XX:+CMSEdenChunksRecordAlways; however, I'd be
> interested if a developer has an answer as to when we can expect potential
> chunk recording… Otherwise I'll have to go dig into the code a bit deeper -
> my assumption was that this call would not be in the inlined allocation
> code, but I had thought that even allocation of a new TLAB was inlined by
> the compilers - perhaps not.
>
> Current GC-related settings - note we were running with a lower
> CMSInitiatingOccupancyFraction until recently; it seems to have gotten
> changed back by accident, but that is kind of tangential.
>
> -Xms24576M
> -Xmx24576M
> -Xmn8192M
> -XX:+HeapDumpOnOutOfMemoryError
> -XX:+UseParNewGC
> -XX:+UseConcMarkSweepGC
> -XX:+CMSParallelRemarkEnabled
> -XX:SurvivorRatio=8
> -XX:MaxTenuringThreshold=1
> -XX:CMSInitiatingOccupancyFraction=70
> -XX:+UseCMSInitiatingOccupancyOnly
> -XX:+UseTLAB
> -XX:+UseCondCardMark
> -XX:+PrintGCDetails
> -XX:+PrintGCDateStamps
> -XX:+PrintHeapAtGC
> -XX:+PrintTenuringDistribution
> -XX:+PrintGCApplicationStoppedTime
> -XX:+PrintPromotionFailure
> -XX:PrintFLSStatistics=1
> -Xloggc:/var/log/cassandra/gc.log
> -XX:+UseGCLogFileRotation
> -XX:NumberOfGCLogFiles=30
> -XX:GCLogFileSize=20M
> -XX:+PrintGCApplicationConcurrentTime
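>
> For clarity, a sketch of the additions being discussed in this thread (the
> print flag is only there so I can see how many chunks actually get recorded):
>
> -XX:+CMSParallelInitialMarkEnabled
> -XX:+CMSEdenChunksRecordAlways
> -XX:+CMSPrintEdenSurvivorChunks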
>
> Thanks, Graham
>
> P.S. Note that tuning here is rather interesting since we use this Cassandra
> cluster for lots of different data with very different usage patterns -
> sometimes we'll suddenly dump in 50G of data over the course of a few
> minutes. Also, Cassandra doesn't really mind a node being paused for a while
> due to GC, but things get a little more annoying if nodes pause at the same
> time… even though promotion failure can be worse for us (that is a separate
> issue), we've seen STW pauses of up to about 6-8 seconds in remark
> (presumably when things go horribly wrong and you only get one chunk).
> Basically I'm on a mission to minimize all pauses, since their effects can
> propagate (timeouts are very short in a lot of places).
>
> I will report back with my findings
>
> On Jun 14, 2014, at 6:29 PM, Gustav Åkesson <gustav.r.akesson at gmail.com>
> wrote:
>
> Hi,
>
> Even though I won't answer all your questions, I'd like to share my
> experience with these settings (plus some additional thoughts), even though
> I haven't yet had the time to dig into the details.
>
> We've been using these flags for several months in production (yes, on Java
> 7 even before the latest update release) and we've seen a lot of improvement
> in CMS old-gen STW pauses. Previously, an occasional initial mark of 1.5s
> could occur, but with these settings combined, CMS pauses are consistently
> around ~100ms (on a high-end machine like yours, they are 20-30ms). We're
> using 1GB and 2GB heaps with roughly half/half old/new. Obviously YMMV, but
> this is at least the behavior of this particular application - we've had
> nothing but positive outcomes from using these settings. Additionally, the
> pauses are rather deterministic.
>
> Not sure what your heap size settings are, but what I've also observed is
> that setting Xms != Xmx can cause an occasional long initial mark when the
> heap capacity is slightly increased. I had a discussion a while back (
> http://mail.openjdk.java.net/pipermail/hotspot-gc-use/2014-February/001795.html
> ) regarding this, and it seems to be an issue with CMS.
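>
> In other words, pin the heap size up front so it never needs to grow - a
> minimal sketch (the sizes are placeholders, not a recommendation):
>
> -Xms2g -Xmx2g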
>
> Also, swapping/paging is another factor that can cause non-deterministic /
> occasionally long STW GCs. If you're on Linux, try swappiness=0 and see if
> pauses become more stable.
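>
> On Linux that would be something like the following (assuming root, and
> that you want the change to survive a reboot):
>
> sysctl -w vm.swappiness=0
> echo 'vm.swappiness=0' >> /etc/sysctl.conf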
>
>
> Best Regards,
> Gustav Åkesson
>
>
> On Fri, Jun 13, 2014 at 6:48 AM, graham sanderson <graham at vast.com> wrote:
>
>> I was investigating abortable preclean timeouts in our app (and the
>> associated long remark pauses), so I had a look at the old JDK 6 code I had
>> on my box, wondered about recording eden chunks during certain eden slow
>> allocation paths (I wasn't sure whether TLAB allocation is just a CAS
>> bump), and saw what looked perfect in the latest code, so I was excited to
>> install 1.7.0_60-b19
>>
>> I wanted to ask what you consider the stability of these two options to
>> be (I’m pretty sure at least the first one is new in this release)
>>
>> I have just installed it locally on my Mac, and am aware of
>> http://bugs.java.com/bugdatabase/view_bug.do?bug_id=8021809 which I
>> could reproduce; however, I wasn't able to reproduce it without -XX:-UseCMSCompactAtFullCollection
>> (is this your understanding too?)
>>
>> We are running our application with an 8GB young generation (6.4GB eden)
>> on boxes with 32 cores… so parallelism is good for short pauses.
>>
>> we already have
>>
>> -XX:+UseParNewGC
>> -XX:+UseConcMarkSweepGC
>> -XX:+CMSParallelRemarkEnabled
>>
>> we have seen a few long(ish) initial marks, so
>>
>> -XX:+CMSParallelInitialMarkEnabled sounds good
>>
>> as for
>>
>> -XX:+CMSEdenChunksRecordAlways
>>
>> my question is: what constitutes a slow path such that an eden chunk is
>> potentially recorded… TLAB allocation, or more horrific things? Basically
>> (and I'll test our app with -XX:+CMSPrintEdenSurvivorChunks), is it likely
>> that I'll actually get fewer samples using -XX:+CMSEdenChunksRecordAlways
>> in a highly multithreaded app than I would with sampling? Or, put another
>> way, what sort of application allocation patterns, if any, might avoid the
>> slow path altogether and leave me with just one chunk?
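>>
>> Concretely, my plan for measuring this is roughly the following (the
>> <existing flags> placeholder and MyApp name are mine, not a real command
>> line):
>>
>> # baseline: sampled eden chunks only
>> java <existing flags> -XX:+CMSPrintEdenSurvivorChunks MyApp
>> # comparison: also record chunks on the slow allocation path
>> java <existing flags> -XX:+CMSPrintEdenSurvivorChunks -XX:+CMSEdenChunksRecordAlways MyApp
>>
>> and then comparing the chunk information printed in the GC logs.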
>>
>> Thanks,
>>
>> Graham
>>
>> P.S. Less relevant, I think, but our old generation is 16G.
>> P.P.S. I suspect the abortable preclean timeouts mostly happen after a
>> burst of very high allocation followed by an almost complete lull… this is
>> one of the patterns that can occur in our application.
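>>
>> (Side note, purely an assumption on my part: the abortable preclean timeout
>> itself is controlled by a flag, so one experiment might be something like
>>
>> -XX:CMSMaxAbortablePrecleanTime=10000
>>
>> to give the preclean more time to catch a scavenge after such a lull - I
>> haven't tried that yet.)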
>>
>>
>
>
>
>
>