Using CMS, any chance of forewarning a serial full GC is imminent?

Y. Srinivas Ramakrishna Y.S.Ramakrishna at Sun.COM
Sun Mar 21 02:18:31 UTC 2010


Hello St.Ack --

Stack wrote:
> Our app, a distributed database, usually does fine running CMS but
> when we trip a full serial GC, it's disruptive (speaking
> euphemistically).
> 
> Monitoring the running application, watching the logs or patching into
> the JVM-TI, is there any indicator that you know of that would give us
> forewarning of an imminent full serial GC?  If we had this, we could
> take evasive action.

Probably the most important inputs are the statistics that the CMS
collector maintains internally on the population spread of free blocks,
their historical demand and their recent demand. Some combination of
these should, at least theoretically, yield a suitable statistical
predictor of the imminence of promotion failure. Unfortunately we have
not yet synthesized such a predictor. Doing so while avoiding false
positives would probably be a challenge, and we would almost certainly
need help from an expert statistician.

If properly designed (and that's a big if), such a predictor could be
exported via a suitable GC MBean and polled to automate the triggering
of evasive action, i.e. move load to another node and trigger a GC on
this node. It would be an interesting distributed coordination problem
to avoid situations where a large majority of nodes decide at roughly
the same time that a full GC is imminent and try to push their load to
a neighbour that is ill-prepared to handle it.

Maybe you can tell us: when promotion failure occurs in your current
distributed system, is it an isolated incident on one node, or does it
have a more catastrophic quality where multiple nodes succumb at about
the same time, causing a promotion failure contagion, as it were, to
spread rapidly through the entire system?
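
To make the polling idea concrete, here is a minimal sketch in Java. It
does not implement the statistical predictor discussed above, since no
such predictor or MBean exists. Instead it polls the standard
java.lang.management memory pool beans and uses old generation occupancy
as a crude stand-in. Occupancy says nothing about free-block
fragmentation, so it can miss promotion failures or raise false alarms.
The class name, the 0.85 threshold, the pool-name matching and the
"shed load" action are all assumptions made for illustration.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

final class OldGenWatcher {

    // Heuristic lookup: pool names differ by collector and JVM version
    // (for example "CMS Old Gen" or "Tenured Gen"), so this may return null.
    static MemoryPoolMXBean findOldGenPool() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            String name = pool.getName();
            if (pool.getType() == MemoryType.HEAP
                    && (name.contains("Old Gen") || name.contains("Tenured"))) {
                return pool;
            }
        }
        return null;
    }

    // Fraction of the pool in use; 0.0 when the capacity is unknown.
    static double occupancy(long used, long capacity) {
        return capacity <= 0 ? 0.0 : (double) used / capacity;
    }

    // Crude proxy for "a full GC may be near": occupancy at or above threshold.
    static boolean shouldTakeEvasiveAction(double occupancy, double threshold) {
        return occupancy >= threshold;
    }

    public static void main(String[] args) throws InterruptedException {
        MemoryPoolMXBean oldGen = findOldGenPool();
        if (oldGen == null) {
            System.out.println("No old generation pool found for this collector");
            return;
        }
        final double threshold = 0.85; // assumed value, needs tuning per workload
        for (int i = 0; i < 3; i++) {  // a real monitor would loop indefinitely
            MemoryUsage usage = oldGen.getUsage();
            if (usage != null) {
                // getMax() is -1 when undefined, so fall back to committed size.
                long capacity = usage.getMax() > 0 ? usage.getMax() : usage.getCommitted();
                double occ = occupancy(usage.getUsed(), capacity);
                if (shouldTakeEvasiveAction(occ, threshold)) {
                    // Hypothetical hook: shed load, then request a GC on this node.
                    System.out.println("Old gen occupancy " + occ + ", would shed load");
                }
            }
            Thread.sleep(1000);
        }
    }
}
```

A sketch like this leaves the coordination problem untouched: if every
node uses the same threshold, nodes under similar load may well decide
to shed load at about the same time.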

> 
> Thanks,
> St.Ack
> 
> P.S. G1 is what we really need but going by the response up on this
> and by how easily our application crashes recent releases, it looks like
> it's going to be a good while before it'll work for our case (We're an
> open source database so talking to our Sun/Oracle vendor is not an
> option).

Do give G1 a try in 6u20 though. The reliability has been much improved
since 6u18.

Relatedly, and out of curiosity, focusing on a single node/JVM of your
system, what is the rough frequency of promotion failure that you see?
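
One rough way to estimate that frequency is to count failure events in
a node's GC log. The sketch below assumes the log was written with
-XX:+PrintGCDetails and that CMS marks these events with the strings
"promotion failed" and "concurrent mode failure". The exact log text
can vary between JVM versions, and the class name and command-line
usage are made up for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

final class PromotionFailureCounter {

    // Counts the log lines that contain the given marker string.
    static long countEvents(List<String> logLines, String marker) {
        long count = 0;
        for (String line : logLines) {
            if (line.contains(marker)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java PromotionFailureCounter <gc-log-file>");
            return;
        }
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        System.out.println("promotion failed: "
                + countEvents(lines, "promotion failed"));
        System.out.println("concurrent mode failure: "
                + countEvents(lines, "concurrent mode failure"));
    }
}
```

Dividing the counts by the time span the log covers gives a per-node
rate, and comparing timestamps across nodes would show whether failures
cluster in time.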

-- ramki

> _______________________________________________
> hotspot-gc-use mailing list
> hotspot-gc-use at openjdk.java.net
> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
