CMS initial mark pauses

Thu Oct 14 23:00:30 UTC 2010

Hi Adam --

...
> 
> I understood before that "initial" is not done in parallel.  I'm curious 
> - why not?

When it was first implemented, CMS did all its work single-threaded
over a serial scavenger. It was incrementally parallelized over time,
but because initial-mark pauses were usually not a concern (small Edens,
small survivor spaces, initial mark immediately following a scavenge),
parallelizing initial mark never rose high enough in priority. Clearly we
have reached a point where the old assumptions no longer hold and it is
time to parallelize it. Or, better still, to move to G1, which is fully
parallel and concurrent, and has other advantages as well.

> 
> I have CMSInitiatingOccupancyFraction=50 because I was concerned about 
> some finalization issues in our application, and I thought I remembered 
> reference processing wasn't done in young GC's.  After enabling 
> PrintReferenceGC, the logs imply  ParNewGC also clears references - is 
> that true?  If so, it may not be necessary for us to include that option 
> anyway.

Yes, scavenges do process unreachable Reference objects found in the young gen.
However, once these get into the old gen, you are right that you will need a
CMS cycle to identify them as unreachable and to process them appropriately.

> 
>     Here's an excerpt from the "workaround" section of that bug
>     (reproduced here because I cannot seem to get bugs.sun.com
>     to display it) :-
> 
>         This is not really a viable workaround since it might lead to
>         suboptimal
>         heap configuration:
>         (1) use no survivor spaces (at the risk of larger scavenge
>         pauses, larger remark pauses,
>            even concurrent mode failures)
>         (2) use a sufficiently large heap so as to be able to afford to
>         set a
>            mark initiation threshold above the low water-mark (after a major
>            collection cycle). This will keep init-mark's riding on the
>         coat-tails
>            of scavenges.
>         *** (#1 of 2): 2006-04-13 09:53:14 PDT xxxx at oracle.com
> 
> 
> 
> The customer's application appears to fit neatly in a 2.4G heap, and we 
> have -Xmx4g, so I believe we might be able to apply (2) here.  Is (1) 
> above required along with (2), or do these workarounds address the 
> problem independently?  I ask because (a) this customer is already 
> concerned about pause times, so I don't have a lot of room to increase 
> remark and scavenge times, and (b) I'm concerned about eliminating 
> survivor spaces since we've dealt with significant heap fragmentation in 
> the past.

Precisely: the two address the problem independently and their effects are
additive, but either by itself may not be sufficient and, as you pointed
out, (1) may not always be feasible.
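
To make (2) concrete, here is a rough sketch of the arithmetic in Python.
It only checks whether a given CMSInitiatingOccupancyFraction sits above
the old gen's low-water mark (its occupancy right after a major cycle).
The function name, the 5-point headroom, and the example sizes are
illustrative assumptions, not values taken from your logs.

```python
def min_initiating_fraction(old_capacity_mb, low_water_mark_mb, headroom_pct=5.0):
    """Return (low_water_pct, suggested_pct) for the old generation.

    low_water_pct is the old gen occupancy, in percent, right after a
    major collection cycle. suggested_pct adds some headroom on top of
    that (the 5-point default is an arbitrary illustrative choice) and
    is capped at 100.
    """
    if old_capacity_mb <= 0:
        raise ValueError("old_capacity_mb must be positive")
    low_water_pct = 100.0 * low_water_mark_mb / old_capacity_mb
    suggested_pct = min(100.0, low_water_pct + headroom_pct)
    return low_water_pct, suggested_pct


def threshold_is_above_low_water(fraction_pct, old_capacity_mb, low_water_mark_mb):
    """True if a CMS cycle would NOT be triggered again right after one ends."""
    low_water_pct, _ = min_initiating_fraction(old_capacity_mb, low_water_mark_mb)
    return fraction_pct > low_water_pct


if __name__ == "__main__":
    # Hypothetical sizes: -Xmx4g with a 1 GB young gen leaves 3072 MB of
    # old gen, and roughly 2.4 GB of that stays live after a major cycle.
    low, suggested = min_initiating_fraction(3072, 2458)
    print("low-water mark: %.1f%%, suggested threshold: %.1f%%" % (low, suggested))
    print("fraction=50 above low water?", threshold_is_above_low_water(50, 3072, 2458))
```

With those assumed sizes the low-water mark is about 80% of the old gen,
so a threshold of 50 would always be exceeded as soon as a cycle ends and
the next initial mark would no longer wait for a scavenge.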

> 
> One other data point is that we have a large number of mostly idle 
> threads (3826 at one count), with most of the idle threads holding onto 
> approximately 2MB of object data.  I don't know if that would 
> significantly contribute to the initial mark pause, but my intuition is 
> that it would increase the time if some of that time is spent marking 
> the stack locals.

Yes, that could be, but it is probably less significant than a large Eden or
survivor space. When a CMS initial-mark pause comes immediately after
a scavenge, the pause is much shorter, which suggests that the larger
contribution is from the large Eden. If you pour your GC logs into GCHisto,
you should see that the CMS initial-mark pauses increase as
the most recent scavenge becomes more distant (or you could plot that
in a spreadsheet and note the relationship).
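
If you would rather script that than use a spreadsheet, something along
these lines could pull the pairs out of the log. This is only a sketch:
it assumes logs produced with -XX:+PrintGCDetails and
-XX:+PrintGCTimeStamps, where each ParNew and CMS-initial-mark record
fits on one line, starts with an uptime stamp such as "1234.567: [GC",
and ends with ", 0.0123456 secs]". The function name is made up, and
the regular expressions may need adjusting for your JVM version.

```python
import re

# Uptime stamp at the start of a stop-the-world GC record (assumed format).
TIMESTAMP = re.compile(r"^(\d+\.\d+): \[GC")
# Every "<seconds> secs]" figure on the line; the last one is the total pause.
PAUSE = re.compile(r"(\d+\.\d+) secs\]")


def initial_mark_vs_scavenge_age(lines):
    """Return a list of (seconds_since_last_scavenge, initial_mark_pause_secs).

    The age is measured from the start stamp of the most recent ParNew
    record to the start stamp of the initial-mark record.
    """
    last_scavenge = None
    pairs = []
    for line in lines:
        match = TIMESTAMP.match(line)
        if match is None:
            continue  # concurrent-phase lines, wrapped lines, etc.
        stamp = float(match.group(1))
        if "[ParNew" in line:
            last_scavenge = stamp
        elif "CMS-initial-mark" in line and last_scavenge is not None:
            pauses = PAUSE.findall(line)
            if pauses:
                pairs.append((stamp - last_scavenge, float(pauses[-1])))
    return pairs


if __name__ == "__main__":
    import sys

    # Usage: python this_script.py gc.log  (prints two tab-separated columns)
    with open(sys.argv[1]) as log:
        for age, pause in initial_mark_vs_scavenge_age(log):
            print("%.3f\t%.4f" % (age, pause))
```

Plotting the second column against the first should show whether the
long initial marks are the ones furthest from a scavenge.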

-- ramki

> 
>  
> 
> 
> 
>         Also, if using iCMS (Incremental CMS), drop the Incremental
>         mode and revert to
>         vanilla CMS.
>         *** (#2 of 2): 2010-04-14 11:02:03 PDT xxxx at oracle.com
> 
> 
>     If you have support, you can try escalating it via your support channels
>     to get this addressed, especially if the workaround/retuning doesn't
>     do the job.
> 
>     -- ramki
> 
> 
> My option seems to be to eliminate the CMSInitiatingOccupancyFraction=50 
> and keep the -Xmx4g.  Would it be prudent to set -Xms4g also?
> 
> And the log excerpt from a steady-state in the application.  The sigma 
> on pause times for young gc and remark is 17ms and 26ms - they're like 
> clockwork.  The initial mark is higher, 334ms due to the large-valued 
> outliers.
> 
> 

...
_______________________________________________
hotspot-gc-use mailing list
hotspot-gc-use at openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use