Intermittent issue with long concurrent marking phase

Srini Padman srini_was at yahoo.com
Fri Sep 30 13:05:13 UTC 2011


Hi Ramki,
 
Apologies, both for mis-reading your original response (re: long initial-mark phases) and for choosing the wrong list. Thank you very much for redirecting it to gc-use. 

I just want to clarify a couple of points from your last response, for the record.

To answer the question about the long stop-the-world initial marking phase: this is the longest I know of, but we have seen other instances where it lasted 3-4 seconds. In those cases as well, the "user/sys" times were much smaller than the "real" time so things clearly seem to be completely stalled. Also as a matter of background - the reason we moved to using the CMS collectors was that, prior to this, we were occasionally seeing extremely long (sometimes lasting more than a minute) full GCs. It is quite possible that the same factors that caused such long full GCs in the past are causing somewhat shorter (but still not _short_) initial mark with the CMS collector. In any case, I didn't view this as being related to 6692906 to begin with, and am glad to get confirmation that you don't think it is either.
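
[Aside: one way to pick such stalls out of a GC log is to compare the three values in each "[Times: ...]" entry. What follows is only a minimal Python sketch, assuming the log was written with -XX:+PrintGCDetails so that entries end with a "[Times: user=... sys=..., real=... secs]" suffix. The function name and both thresholds are hypothetical choices made for illustration, not anything defined by the JVM.]

```python
import re

# Matches the "[Times: user=... sys=..., real=... secs]" suffix of a GC log entry.
TIMES_RE = re.compile(
    r"\[Times: user=(?P<user>\d+\.\d+) sys=(?P<sys>\d+\.\d+), "
    r"real=(?P<real>\d+\.\d+) secs\]"
)


def find_stalled_pauses(lines, min_real_secs=1.0, max_cpu_fraction=0.25):
    """Return (line_number, user, sys, real) for entries that look stalled.

    An entry is flagged when its wall-clock ("real") time is at least
    min_real_secs and the CPU time (user + sys) is at most
    max_cpu_fraction of it. Both thresholds are arbitrary assumptions.
    """
    stalled = []
    for line_number, line in enumerate(lines, start=1):
        match = TIMES_RE.search(line)
        if match is None:
            continue  # not a line that carries timing information
        user = float(match.group("user"))
        sys_time = float(match.group("sys"))
        real = float(match.group("real"))
        if real >= min_real_secs and (user + sys_time) <= max_cpu_fraction * real:
            stalled.append((line_number, user, sys_time, real))
    return stalled
```

Passing it the lines of a log file, e.g. find_stalled_pauses(open("gc.log")) where gc.log is a placeholder name, would list the entries worth a closer look. User time can legitimately exceed real time when several GC threads run in parallel, so only the opposite case is flagged.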

Regarding your point "A definite symptom of 6692906 can be diagnosed if the JVM completely stalls into a livelock (a few threads in the JVM intermittently active, but your application will forever stop forward progress from that point on). It doesn't look like you have observed that latter symptom however?" - my understanding of the symptom (based on the summary at http://mail.openjdk.java.net/pipermail/hotspot-gc-dev/2010-February/001549.html, for example) was that the wait/stall resolves itself after tens of seconds for reasons unknown. Our symptom is closer to this, in that the application does not stop _forever_ from that point onwards. [It is entirely possible that there were further findings beyond what is in the posting that I am not aware of.]

We will initiate efforts to reach out to JVM support - of course, in the meanwhile, any feedback or help on this forum is very welcome!

Regards,
Srini.

--- On Thu, 9/29/11, Y. S. Ramakrishna <y.s.ramakrishna at oracle.com> wrote:


From: Y. S. Ramakrishna <y.s.ramakrishna at oracle.com>
Subject: Re: Intermittent issue with long concurrent marking phase
To: "Srini Padman" <srini_was at yahoo.com>
Cc: hotspot-gc-use at openjdk.java.net
Date: Thursday, September 29, 2011, 8:54 PM


Hi Srini -- As I indicated, if you cannot upgrade easily to test
if the issue is fixed, you should probably engage JVM support to
get to a proper diagnosis of the issue affecting your production systems.

more inline below ...

On 09/29/11 07:44, Srini Padman wrote:
> Hi Ramki,
>  Thank you very much for your reply.
>  It is not *always* that the concurrent marking phase takes this long, although it happens often enough. For example, in the full GC log corresponding to the snippet I pasted in my posting (attached, zipped) there is only that one instance.

I see that there is one instance of the long _initial-mark_ pause
and, as I stated, the whole process seems stalled at that time.
It's definitely not the stall/livelock issue of CR 6692906.

[In other words, you may be dealing with several different issues
here and you will need to disentangle them.]

>  I think I know why you are asking - based on my understanding of Bug # 6692906 (more accurately, based on discussions around it on this list), I was under the impression that such long CM phases will happen all the time (if they happen at all). Does the fact that it is intermittent raise the possibility that this is a different issue? I realize that you might not be able to answer this based on the bits of information you have, but perhaps the full GC log will tell you something that you don't already know.

Again, my question about once/many was only about the long initial mark pause
of which there is exactly one in your log.

The stall of mutators during concurrent marking, which you are
conjecturing is 6692906, is a different issue: for that you should
either upgrade and test, or seek JVM support help. It is definitely
possible that the symptoms of 6692906 will happen infrequently or
intermittently. A definite symptom of 6692906 can be diagnosed if
the JVM completely stalls into a livelock (a few threads in the
JVM intermittently active, but your application will forever
stop forward progress from that point on). It doesn't look like
you have observed that latter symptom however?

Sorry I can't really help more at this time, but perhaps someone
from the community may be able to. But I really suggest either
an upgrade or seeking JVM support help. (This is not a professional
support alias.)

In general, the GC-use alias is better suited to questions such as this.
GC-dev should be used for GC development questions involving
the main development trunk. Issues with the use of older versions
in the field should be addressed to hotspot-gc-use at o.j.n,
so I've taken the liberty of sending this to hotspot-gc-use at o.j.n.

All the best!
-- ramki

>  Regards,
> Srini.
> 
> --- On Thu, 9/29/11, Ramki Ramakrishna <y.s.ramakrishna at oracle.com> wrote:
> 
> 
>     From: Ramki Ramakrishna <y.s.ramakrishna at oracle.com>
>     Subject: Re: Intermittent issue with long concurrent marking phase
>     To: "Srini Padman" <srini_was at yahoo.com>
>     Cc: hotspot-gc-dev at openjdk.java.net
>     Date: Thursday, September 29, 2011, 4:24 AM
> 
>     Hi Srini -- (inline below)
> 
>     On 9/28/2011 4:50 AM, Srini Padman wrote:
>> 
>>     Questions:
>>          1\ is it clear based on the description above that the issue is
>>     identical to 6692906 (http://bugs.sun.com/view_bug.do?bug_id=6692906)?
>> 
> 
>     Very likely the same bug.
> 
>>     2\ will we benefit by upgrading to a more recent JRE [1.6.0_26
>>     being the one under consideration]?
>> 
> 
>     Definitely worth trying.
> 
>>     3\ I have seen recommendations to use
>>     "-XX:-CMSConcurrentMTEnabled" on some web forums - but I have
>>     concerns about this; if we don't allow for concurrent marking to
>>     use multiple threads, then isn't there a danger of marking
>>     proceeding so slowly that we might end up running out of memory
>>     [i.e., garbage created much faster than it is collected]?
>> 
> 
>     Your concerns are very legitimate (especially given the length of
>     the concurrent mark phase and the number of cores you have).
> 
>>          Any help is greatly appreciated. Please let me know if any
>>     additional information is needed at all. I haven't attached the
>>     full GC log (it caused problems with posting) but will gladly send
>>     it directly to anybody who would like.
>> 
> 
>     The long initial mark pause is definitely concerning -- Does it show
>     up regularly in the GC logs or is the snippet above an anomaly?
>     Curiously, as the process time shows, the user and system time are
>     both low but the elapsed time is very large. That looks like a total
>     stall of the process, and I have no conjectures based on available data.
> 
>     I suggest talking with your Java support folk if you reproduce this
>     after upgrading to 6u28 (or whatever).
> 
>     best regards.
>     -- ramki
> 
>> 
>>     Regards,
>>     Srini.
>> 

_______________________________________________
hotspot-gc-use mailing list
hotspot-gc-use at openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use