Long remark due to young generation occupancy

Gustav Åkesson gustav.r.akesson at gmail.com
Wed Jan 15 11:41:14 PST 2014


Hi Jon,

Thanks for looking into this.

A clarification regarding "Wait up to 30 seconds for the remark to be scheduled
after a young collection":
Is this really the case? Is this timeout applied after a young collection? I
was under the impression that during CMSMaxAbortablePrecleanTime the collector
precleans while waiting for a young collection, and once one occurs (or the
timeout expires) the remark is scheduled. In other words, we wait up to 30
seconds for a young collection to happen. Right..?

The work during remark is to revisit updated objects and trace from the roots
again (am I missing something? Ah, and reference processing, but that is
practically no overhead for us). What is usually the biggest cost of the
remark - scanning the dirty cards or tracing from the roots? Perhaps this
depends on the application - you say it is "not always the case" that the
remark is small right after a young collection. What are you referring to? If
we have an empty young generation, what could bring the remark phase up to
e.g. 200ms on a high-end server like ours?

For my application, it seems that tracing from the roots is the most
expensive part. In such a scenario, spreading the pause out seems as
beneficial as not running a young collection prior to the initial mark (which
is highly dependent on young occupancy). Especially since a young collection
is so fast, at least for us.

Regarding the last section, that we wait a long time before sweeping - does
this really matter? Yes, we end up with a lot of floating garbage if young
collections are infrequent and we keep on precleaning, but that also means no
promotions. The garbage is just sitting there on the heap taking up space, but
nothing claims that space until a young collection happens - and by then the
sweeping proceeds. Or am I missing something?
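
For clarity, the complete set of flags I am leaning towards would then be
roughly the original settings plus the three scheduling flags, i.e. something
like (MaxTenuringThreshold=0 still only in the test setup):

-XX:+UseConcMarkSweepGC
-XX:+UseParNewGC
-XX:CMSInitiatingOccupancyFraction=68
-XX:CMSWaitDuration=30000
-XX:CMSScheduleRemarkEdenPenetration=0
-XX:CMSScheduleRemarkEdenSizeThreshold=0
-XX:CMSMaxAbortablePrecleanTime=30000
-Xms2048M -Xmx2048M -Xmn1024M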


Best Regards,

Gustav Åkesson






On Wed, Jan 15, 2014 at 7:24 PM, Jon Masamitsu <jon.masamitsu at oracle.com> wrote:

>
>  -XX:CMSScheduleRemarkEdenPenetration=0
>
> Schedule the remark pause immediately after the
> next young collection.
>
>  -XX:CMSScheduleRemarkEdenSizeThreshold=0
>
> Any sized eden should allow scheduling of the remark
> pause.  That is, no eden is too small to schedule.
>
> -XX:CMSMaxAbortablePrecleanTime=30000
>
>
> Wait up to 30 seconds for the remark to be scheduled
> after a young collection.  Otherwise, wait only up to
> the default of 5 seconds.
>
>
>  So, to the question - are there any obvious drawbacks with the three
> settings above? Why does eden have to be 50% full (the default) in order for
> a remark to be scheduled (besides spreading the pause)? It only seems to do
> harm. Any reason?
>
>
> The default is 50% to try and place the remark pause between two young
> pauses
> (spread it out as you say).   I don't believe it is always the case that
> the remark
> pause is very small if it is scheduled immediately after a young
> collection.  In
> such cases we still want to spread out the pauses.
>
> If the remark is delayed to wait for the next young collection,
> the sweeping is also delayed.  You're not using up space in the
> CMS (tenured) generation but you're also not collecting garbage
> and not making additional space available for reuse (which the
> concurrent sweep does).
>
> Jon
>
>
> On 01/13/2014 02:50 AM, Gustav Åkesson wrote:
>
>  Hi,
>
> This is a topic which has been discussed before, but I think I have some
> new findings. We're experiencing problems with CMS pauses.
>
> Settings we are using.
>
> -XX:+UseConcMarkSweepGC
> -XX:CMSInitiatingOccupancyFraction=68
> -XX:MaxTenuringThreshold=0
> -XX:+UseParNewGC
> -XX:+ScavengeBeforeFullGC
> -XX:CMSWaitDuration=30000
> -Xmx2048M
> -Xms2048M
> -Xmn1024M
>
> Note that MaxTenuringThreshold is 0. This is only done during testing to
> provoke CMS to run more frequently (otherwise it runs once every day...).
> Due to this, promotion to the old generation is around 400K to 1M per
> second.
>
> We have an allocation rate of roughly 1G per second, meaning that YGC runs
> once every second.
>
>  We're running JDK7u17.
>
>
> This is a log entry when running with the above settings. This entry is
> typical of all the CMS collections in this test.
>
>
> 2014-01-13T09:31:52.504+0100: 661.675: [GC [1 CMS-initial-mark:
> 524986K(1048576K)] 526507K(2096192K), 0.0023550 secs] [Times: user=0.00
> sys=0.00, real=0.01 secs]
> 2014-01-13T09:31:52.506+0100: 661.677: [CMS-concurrent-mark-start]
> 2014-01-13T09:31:52.644+0100: 661.815: [CMS-concurrent-mark: 0.138/0.138
> secs] [Times: user=1.96 sys=0.11, real=0.13 secs]
> 2014-01-13T09:31:52.644+0100: 661.815: [CMS-concurrent-preclean-start]
> 2014-01-13T09:31:52.655+0100: 661.826: [CMS-concurrent-preclean:
> 0.010/0.011 secs] [Times: user=0.14 sys=0.02, real=0.02 secs]
> 2014-01-13T09:31:52.655+0100: 661.826:
> [CMS-concurrent-abortable-preclean-start]
> 2014-01-13T09:31:53.584+0100: 662.755: [GC 662.755: [ParNew
> Desired survivor size 491520 bytes, new threshold 0 (max 0)
> : 1046656K->0K(1047616K), 0.0039870 secs] 1571642K->525579K(2096192K),
> 0.0043310 secs] [Times: user=0.04 sys=0.00, real=0.01 secs]
> 2014-01-13T09:31:54.146+0100: 663.317: [CMS-concurrent-abortable-preclean:
> 0.831/1.491 secs] [Times: user=16.76 sys=1.54, real=1.49 secs]
>
> 2014-01-13T09:31:54.148+0100: 663.319: [GC[YG occupancy: 552670 K
> (1047616 K)]663.319: [Rescan (parallel) , 0.2000060 secs]663.519: [weak
> refs processing, 0.0008740 secs]663.520: [scrub string table, 0.0006940
> secs] [1 CMS-remark: 525579K(1048576K)] 1078249K(2096192K), 0.2017690 secs]
> [Times: user=3.53 sys=0.01, real=0.20 secs]
> 2014-01-13T09:31:54.350+0100: 663.521: [CMS-concurrent-sweep-start]
> 2014-01-13T09:31:54.846+0100: 664.017: [GC 664.017: [ParNew
> Desired survivor size 491520 bytes, new threshold 0 (max 0)
> : 1046656K->0K(1047616K), 0.0033500 secs] 1330075K->284041K(2096192K),
> 0.0034660 secs] [Times: user=0.04 sys=0.00, real=0.00 secs]
> 2014-01-13T09:31:55.020+0100: 664.191: [CMS-concurrent-sweep: 0.665/0.670
> secs] [Times: user=7.77 sys=0.71, real=0.67 secs]
> 2014-01-13T09:31:55.020+0100: 664.191: [CMS-concurrent-reset-start]
> 2014-01-13T09:31:55.023+0100: 664.194: [CMS-concurrent-reset: 0.003/0.003
> secs] [Times: user=0.03 sys=0.00, real=0.00 secs]
>
> The initial pause is fine. Then I investigated how to reduce the remark
> phase, and activated -XX:+CMSScavengeBeforeRemark. That flag partly solves
> this issue (it is not active in the log above), but I've seen cases where it
> does not scavenge (I suspect a JNI critical section), which is bad and once
> again produces a long remark pause. And once again the pause is correlated
> with the occupancy in young.
>
> So instead, I tried setting...
>
> -XX:CMSScheduleRemarkEdenPenetration=0
> -XX:CMSScheduleRemarkEdenSizeThreshold=0
>
> This is a log entry with the settings at the top plus the two above...
>
>
> 2014-01-13T10:18:25.757+0100: 590.198: [GC [1 CMS-initial-mark:
> 524654K(1048576K)] 526646K(2096192K), 0.0029130 secs] [Times: user=0.00
> sys=0.00, real=0.01 secs]
> 2014-01-13T10:18:25.760+0100: 590.201: [CMS-concurrent-mark-start]
> 2014-01-13T10:18:25.904+0100: 590.345: [CMS-concurrent-mark: 0.144/0.144
> secs] [Times: user=1.98 sys=0.15, real=0.14 secs]
> 2014-01-13T10:18:25.904+0100: 590.346: [CMS-concurrent-preclean-start]
> 2014-01-13T10:18:25.912+0100: 590.354: [CMS-concurrent-preclean:
> 0.008/0.008 secs] [Times: user=0.11 sys=0.00, real=0.01 secs]
> 2014-01-13T10:18:25.912+0100: 590.354:
> [CMS-concurrent-abortable-preclean-start]
> 2014-01-13T10:18:26.836+0100: 591.278: [GC 591.278: [ParNew
> Desired survivor size 491520 bytes, new threshold 0 (max 0)
> : 1046656K->0K(1047616K), 0.0048160 secs] 1571310K->525477K(2096192K),
> 0.0049240 secs] [Times: user=0.05 sys=0.00, real=0.01 secs]
> 2014-01-13T10:18:26.842+0100: 591.283: [CMS-concurrent-abortable-preclean:
> 0.608/0.929 secs] [Times: user=10.77 sys=0.97, real=0.93 secs]
>
> 2014-01-13T10:18:26.843+0100: 591.285: [GC[YG occupancy: 20938 K (1047616
> K)]591.285: [Rescan (parallel) , 0.0024770 secs]591.287: [weak refs
> processing, 0.0007760 secs]591.288: [scrub string table, 0.0006440 secs] [1
> CMS-remark: 525477K(1048576K)] 546415K(2096192K), 0.0040480 secs] [Times:
> user=0.03 sys=0.00, real=0.00 secs]
> 2014-01-13T10:18:26.848+0100: 591.289: [CMS-concurrent-sweep-start]
> 2014-01-13T10:18:27.573+0100: 592.015: [CMS-concurrent-sweep: 0.726/0.726
> secs] [Times: user=8.50 sys=0.76, real=0.73 secs]
> 2014-01-13T10:18:27.573+0100: 592.015: [CMS-concurrent-reset-start]
> 2014-01-13T10:18:27.576+0100: 592.017: [CMS-concurrent-reset: 0.003/0.003
> secs] [Times: user=0.03 sys=0.01, real=0.00 secs]
>
> This means that when I set these two, CMS STWs go from ~200ms to below
> 10ms.
>
> I'm leaning towards activating...
>
>  -XX:CMSScheduleRemarkEdenPenetration=0
> -XX:CMSScheduleRemarkEdenSizeThreshold=0
>  -XX:CMSMaxAbortablePrecleanTime=30000
>
>
> What I have seen with these flags is that as soon as a young collection
> completes during the abortable preclean, the remark is scheduled, and since
> it can start while eden is nearly empty, it is ridiculously fast. If it
> takes a long time for the preclean to catch a young collection, that is also
> fine, because no promotion is being made. We can live with the pause of a
> young collection plus a consecutive remark (for us, young is ~10ms).
>
> So, to the question - are there any obvious drawbacks with the three
> settings above? Why does eden have to be 50% full (the default) in order for
> a remark to be scheduled (besides spreading the pause)? It only seems to do
> harm. Any reason?
>
> -XX:+CMSScavengeBeforeRemark I'm thinking of avoiding, since it can't be
> completely trusted. Usually it helps, but that is not good enough, since the
> pauses become irregular when it fails. And with the settings above, it would
> only add to the CMS pause.
>
>
> Best Regards,
>
> Gustav Åkesson
>
>