Why is the abortable-preclean phase not being aborted after YG occupancy exceeds 50%?

Srinivas Ramakrishna ysr1729 at gmail.com
Wed Aug 22 01:03:25 PDT 2012


Hi Bernd --

Yes, this has been observed by Michal Frajt as well, albeit in a different
context (see his emails from a couple of weeks ago).
I am not sure why, with regular CMS, we should see this kind of
unpredictable delay between a scavenge and a CMS initial mark.
It must be OS scheduling, load, etc., which we cannot control, although a
delay of 177 s seems excessive and must mean either
that the CMS wait duration was exceeded or something like that.

In any case, as you observed, the length of the CMS initial-mark pause is
related to the occupancy of the young generation. Thus,
even if it were to occur immediately after a scavenge (when Eden is nearly
empty), large and fully used survivor spaces
can still make the pause longer.
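Bernd's log excerpts, quoted below, bear this out. Assuming the usual layout of a CMS-initial-mark line printed with -XX:+PrintGCDetails, where the first figure is the CMS (old) generation's occupancy and the second is the whole heap's, their difference approximates the young-generation occupancy when the pause began. The capacities are consistent with that reading: 25165824K old plus the 22649280K ParNew capacity equals the 47815104K heap. A minimal Python sketch of the arithmetic (the function name is hypothetical):

```python
import re

# Assumed layout of a CMS initial-mark record (first line only):
#   [GC [1 CMS-initial-mark: OLD_USED(OLD_CAP)] HEAP_USED(HEAP_CAP), PAUSE secs]
INITIAL_MARK = re.compile(
    r"CMS-initial-mark: (\d+)K\((\d+)K\)\] (\d+)K\((\d+)K\), ([\d.]+) secs"
)


def young_occupancy_at_initial_mark(line):
    """Return (young_used_kb, pause_secs) estimated from one initial-mark line.

    Young-generation occupancy is approximated as heap used minus old-gen used.
    """
    m = INITIAL_MARK.search(line)
    if m is None:
        raise ValueError("not a CMS-initial-mark line")
    old_used, _old_cap, heap_used, _heap_cap, pause = m.groups()
    return int(heap_used) - int(old_used), float(pause)


slow = ("159607.370: [GC [1 CMS-initial-mark: 1082855K(25165824K)] "
        "14734770K(47815104K), 11.1184690 secs]")
fast = ("166808.057: [GC [1 CMS-initial-mark: 1256570K(25165824K)] "
        "1629155K(47815104K), 0.3039830 secs]")

for name, line in (("slow", slow), ("fast", fast)):
    young_kb, pause = young_occupancy_at_initial_mark(line)
    print(f"{name}: young ~{young_kb} K ({young_kb / 1024:.0f} MiB), pause {pause:.2f} s")
```

On these two lines the estimate is about 13,651,915 K (roughly 13 GB) of young-generation data behind the 11.1 s pause, versus 372,585 K (about 364 MB) behind the 0.3 s pause; the latter matches the 372584K left in the young generation by the ParNew that ran just before it.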

As we have noted in earlier emails, the real solution is to multi-thread the
CMS initial-mark pause so that the work can be done much faster.

An easier, if less pleasant and less efficient, alternative is to implement
CMSScavengeBeforeInitialMark, but that alone would not address
the large, fully used survivor space problem I mentioned above, only the
issue of scheduling the initial mark. (In that case the scavenge pause
would add to the initial-mark pause, although, because the scavenge is
parallel, it would likely be much faster even for a large Eden.)

I'd be curious to know if you get to the bottom of the cause for the long
delay between scavenge and initial mark pause.
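One way to look for that delay systematically is to scan the whole GC log and report, for each CMS-initial-mark pause, how long after the most recent ParNew scavenge it started. A minimal Python sketch, assuming each record's first line begins with a JVM-uptime timestamp as in Bernd's excerpts and carries the pause time on that same line; the function name is hypothetical, and records wrapped onto a second line simply get no pause value:

```python
import re

# Assumed: each GC record starts with a JVM-uptime timestamp, e.g. "159430.703: [GC ..."
TIMESTAMP = re.compile(r"^\s*(\d+\.\d+): ")
PAUSE = re.compile(r"CMS-initial-mark: .*?, ([\d.]+) secs\]")


def initial_mark_gaps(lines):
    """Yield (initial_mark_time, secs_since_last_parnew, pause_secs)."""
    last_parnew = None
    for line in lines:
        ts = TIMESTAMP.match(line)
        if ts is None:
            continue  # continuation of a wrapped record, or unrelated output
        t = float(ts.group(1))
        if "[ParNew:" in line:
            last_parnew = t
        elif "CMS-initial-mark" in line:
            p = PAUSE.search(line)
            pause = float(p.group(1)) if p else None
            gap = t - last_parnew if last_parnew is not None else None
            yield t, gap, pause


# The two scavenge/initial-mark pairs from the log, each record on one line.
log = """\
159430.703: [GC 159430.705: [ParNew: 20646923K->582368K(22649280K), 0.4311960 secs] 21710818K->1665223K(47815104K), 0.4343870 secs]
159607.370: [GC [1 CMS-initial-mark: 1082855K(25165824K)] 14734770K(47815104K), 11.1184690 secs]
166807.592: [GC 166807.594: [ParNew: 21200224K->372584K(22649280K), 0.4462060 secs] 22444233K->1629155K(47815104K), 0.4493750 secs]
166808.057: [GC [1 CMS-initial-mark: 1256570K(25165824K)] 1629155K(47815104K), 0.3039830 secs]
"""

for t, gap, pause in initial_mark_gaps(log.splitlines()):
    print(f"{t:.3f}: initial mark {gap:.3f} s after ParNew, pause {pause:.2f} s")
```

On this excerpt it reports a gap of roughly 176.7 s before the 11.1 s pause and under half a second before the 0.3 s pause, which is the pattern Bernd describes.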

regards.
-- ramki

On Tue, Aug 21, 2012 at 9:14 PM, Bernd Eckenfels <bernd.eckenfels at googlemail.com> wrote:

> On 22.08.2012, 04:41, Srinivas Ramakrishna <ysr1729 at gmail.com> wrote:
> > Initial mark is typically scheduled immediately after a scavenge, so no
> > timeout specification should be necessary. Perhaps I misunderstood your
> > question; maybe you can elaborate a bit more on what you want to
> > achieve?
>
> Well, I have a GC log which contains some STW pauses > 1s (which
> violates my SLA). If I check the GC log file, there are some initial marks
> and some remarks causing the problem. For the slow initial marks I see the
> pattern that the time difference to the preceding scavenge is large.
> The initial marks that run in under a second all happen directly after
> a scavenge.
>
> So here is a slow sample:
>
> 159430.703: [GC 159430.705: [ParNew: 20646923K->582368K(22649280K), 0.4311960 secs]
>                21710818K->1665223K(47815104K), 0.4343870 secs] [Times: user=1.92 sys=0.02, real=0.43 secs]
> 159607.370: [GC [1 CMS-initial-mark: 1082855K(25165824K)] 14734770K(47815104K), 11.1184690 secs]
>                [Times: user=11.06 sys=0.03, real=11.12 secs]
> 159618.490: [CMS-concurrent-mark-start]
> 159618.930: [CMS-concurrent-mark: 0.440/0.440 secs] [Times: user=4.59 sys=0.16, real=0.44 secs]
>
> Difference 176s, 11s STW
>
> And here is the next run, which is typical of the fast ones:
>
> 166807.592: [GC 166807.594: [ParNew: 21200224K->372584K(22649280K), 0.4462060 secs]
>                22444233K->1629155K(47815104K), 0.4493750 secs] [Times: user=1.43 sys=0.01, real=0.45 secs]
> 166808.057: [GC [1 CMS-initial-mark: 1256570K(25165824K)] 1629155K(47815104K), 0.3039830 secs]
>                [Times: user=0.31 sys=0.00, real=0.31 secs]
>
> Difference 0.4s, 0.3s STW
>
> I need to collect the actual JVM parameters, version and GC log file and
> will provide them. I am currently waiting for a run with CMSStatistics=2.
>
>
> Greetings
> Bernd
> _______________________________________________
> hotspot-gc-use mailing list
> hotspot-gc-use at openjdk.java.net
> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
>

