“abort preclean due to time” in Concurrent Mark & Sweep

Tue May 3 17:17:49 UTC 2011

Hi LiLi --

On 05/03/11 03:31, Li Li wrote:
> hi all
>     I confronted a strange case. The hotspot jvm was always doing gc
> and consumed many cpu resources (from 50% to 300% cpu usage). When
> I turned on gc logging, I
> found "abort preclean due to time" in the gc logs.
>     So I googled and found some similar questions in
> http://stackoverflow.com/questions/1834501/abort-preclean-due-to-time-in-concurrent-mark-sweep
> and http://mail.openjdk.java.net/pipermail/hotspot-gc-dev/2008-October/000482.html.
> And http://blogs.sun.com/jonthecollector/entry/did_you_know is
> suggested to read.
>     I read the blog post but couldn't understand it well.
>     As it says, a CMS full gc has the following phases:
>         STW initial mark
>         Concurrent marking
>         Concurrent precleaning
>         STW remark
>         Concurrent sweeping
>         Concurrent reset
> 
>     "Ok, so here's the punch line for all this. When we're doing the
> precleaning we do the sampling of the young generation top for a fixed
> amount of time before starting the remark. That fixed amount of time
> is CMSMaxAbortablePrecleanTime and its default value is 5 seconds. The
> best situation is to have a minor collection happen during the
> sampling. When that happens the sampling is done over the entire
> region in the young generation from its start to its final top. If a
> minor collection is not done during that 5 seconds then the region
> below the first sample is 1 chunk and it might be the majority of the
> young generation. Such a chunking doesn't spread the work out evenly
> to the GC threads so reduces the effective parallelism. " --quoted
> from this post.
> 
>     In my opinion, concurrent precleaning is the preparation stage for
> remark. It splits the young generation into chunks so that remark can
> process them in parallel. It expects a young gc in order
> to split chunks evenly. If there is no young gc before the
> timeout (CMSMaxAbortablePrecleanTime), it seems this gc will fail and
> all following phases will be skipped.

Not quite. Reread the above para. What it says is that the splitting
might be uneven if the time between scavenges is much larger than the
default timeout because the first chunk may be much larger than the
rest, and it would be the "long pole" in the parallelization.
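
To make the "long pole" concrete, here is a toy model in Java. It is
only a sketch under assumptions: the class and method names are made up
for illustration, the offsets are arbitrary units, and it is not how
HotSpot itself is coded. It takes the sampled young generation tops as
chunk boundaries and reports what fraction of the region falls into the
largest chunk, which is a lower bound on the share of the work that one
GC thread ends up doing.

```java
import java.util.Arrays;

public class ChunkBalance {
    // Sizes of the chunks delimited by the sampled "top" offsets in [0, finalTop].
    // Illustrative model only: sampledTops must be ascending and <= finalTop.
    static long[] chunkSizes(long[] sampledTops, long finalTop) {
        long[] sizes = new long[sampledTops.length + 1];
        long prev = 0;
        for (int i = 0; i < sampledTops.length; i++) {
            sizes[i] = sampledTops[i] - prev;
            prev = sampledTops[i];
        }
        sizes[sampledTops.length] = finalTop - prev;
        return sizes;
    }

    // Fraction of the whole region covered by the largest chunk.
    static double longPoleFraction(long[] sizes) {
        long total = Arrays.stream(sizes).sum();
        long max = Arrays.stream(sizes).max().orElse(0L);
        return total == 0 ? 0.0 : (double) max / total;
    }

    public static void main(String[] args) {
        // Sampling started right after a scavenge: boundaries spread evenly.
        long[] even = chunkSizes(new long[] {25, 50, 75}, 100);
        // No scavenge during sampling: everything below the first sample is one chunk.
        long[] uneven = chunkSizes(new long[] {70, 80, 90}, 100);
        System.out.println("even:   " + longPoleFraction(even));
        System.out.println("uneven: " + longPoleFraction(uneven));
    }
}
```

In the second case one chunk holds 70% of the region, so adding more
remark threads cannot bring the pause below the time needed to scan
that one chunk.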

> 
>     So when the system load is light (which means there will be no
> minor gc), precleaning will always time out and full gc will always
> fail. cpu is wasted.

It won't fail. It'll be less parallel (i.e. less efficient, and would
have a longer pause time, for lesser work).

> 
>    Some suggested enlarging CMSMaxAbortablePrecleanTime. Maybe it can
> solve this problem. But the CMS collector is not like other collectors,
> which perform gc when the heap is full. It will
> perform gc when space usage is larger than 92% (68% for older versions
> of hotspot) or when the jvm feels it should do it. If this value is too large,
> it will stop the world longer.

If you are right at the edge, you may be right. But the idea is to make
the CMSMaxAbortablePrecleanTime about twice the inter-scavenge time
and then you will almost never have the uneven splitting. If you are
getting concurrent mode failure because of making CMSMaxAbortablePrecleanTime
too large, then you must be in a regime where your CMS trigger threshold
is much too high for comfort and preclean or not you run a high
risk of concurrent mode failure. I don't think keeping the timeout
small will save you there.
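
As a rough illustration of the "about twice the inter-scavenge time"
rule of thumb, the sketch below derives a candidate value from scavenge
timestamps. The class name, the method name and the sample timestamps
are hypothetical; in practice you would read the timestamps off your
own GC log. CMSMaxAbortablePrecleanTime is specified in milliseconds
(the 5 second default is 5000).

```java
public class PrecleanTimeout {
    // Suggest a timeout of twice the mean interval between scavenges.
    // scavengeTimesSec: ascending timestamps (seconds) of minor collections,
    // e.g. copied by hand from a GC log. Hypothetical helper, not a JVM API.
    static long suggestedTimeoutMillis(double[] scavengeTimesSec) {
        if (scavengeTimesSec.length < 2) {
            throw new IllegalArgumentException("need at least two scavenge timestamps");
        }
        double span = scavengeTimesSec[scavengeTimesSec.length - 1] - scavengeTimesSec[0];
        double meanIntervalSec = span / (scavengeTimesSec.length - 1);
        return Math.round(2.0 * meanIntervalSec * 1000.0);
    }

    public static void main(String[] args) {
        // Made-up sample: scavenges roughly every 4 seconds.
        double[] times = {10.0, 14.0, 18.5, 22.0};
        System.out.println("-XX:CMSMaxAbortablePrecleanTime=" + suggestedTimeoutMillis(times));
    }
}
```

Treat the result as a starting point to check against your logs, not
as a tuned value.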

> 
>      "Based on recent history, the concurrent collector maintains
> estimates of the time remaining before the tenured generation will be
> exhausted and of the time needed for a concurrent collection cycle.
> Based on these dynamic estimates, a concurrent collection cycle will
> be started with the aim of completing the collection cycle before the
> tenured generation is exhausted. These estimates are padded for
> safety, since the concurrent mode failure can be very costly.
> 
>       A concurrent collection will also start if the occupancy of the
> tenured generation exceeds an initiating occupancy, a percentage of
> the tenured generation. The default value of this initiating occupancy
> threshold is approximately 92%, but the value is subject to change
> from release to release. "
>     --quoted from
> http://www.oracle.com/technetwork/java/javase/gc-tuning-6-140523.html#cms
> 
> 
>     Another solution: "There is an option CMSScavengeBeforeRemark
> which is off by default. If turned on, it will cause a minor
> collection to occur just before the remark. That's good because it
> will reduce the remark pause. That's bad because there is a minor
> collection pause followed immediately by the remark pause which looks
> like 1 big fat pause."
> 
>      My question is why the collector is so stupid that it doesn't do
> it like this: if the system is busy, it works like before. Because
> it's busy, minor gc will occur and precleaning will succeed in the
> future. If the system is idling, it can adjust
> CMSMaxAbortablePrecleanTime or turn CMSScavengeBeforeRemark on.

Absolutely. There's in fact an open RFE to do just that, but we have been frying
more important fish recently and have not gotten to that RFE. I'll
dig up the CR id for you shortly.

-- ramki