Long "stop-the-world" pauses in CMS GC mode

Tue Aug 30 00:19:52 PDT 2011

David, I missed one of the longer pauses that you'd specifically drawn 
attention to:-

> On 8/29/2011 2:25 AM, David Tavoularis wrote:
>> *1. Here is the first pattern : a _61-second pause_, but I don't see 
>> any suspicious message in GC logs:*
>> ...
>> 2011-08-24T10:25:07.776+0100: 44242.537: [GC 44301.853: [ParNew
>> Desired survivor size 53673984 bytes, new threshold 1 (max 4)
>> - age 1: 99505080 bytes, 99505080 total
>> : 943744K->104832K(943744K), 0.2010508 secs] 
>> 21542906K->20852742K(50226816K), 0.2022636 secs] *[Times: user=5.67 
>> sys=0.02, real=59.52 secs]*

If you look at the timestamps above, the GC event starts off at 
44242.537 seconds, but then the GC itself does not
commence until 44301.853 seconds, i.e. a full 59.32 seconds later. So 
the pause is associated not with
GC work itself (which is correctly reported as 202 ms), but rather with 
a preamble to the GC, perhaps
with bringing threads to a safepoint, I am guessing. Once again 
-XX:+PrintSafepointStatistics (which
i mentioned in previous email wrt the 20 s pause in the middle of 
noweher) would likely provide
some clues. I have heard apocryphal stories of -XX:+UseMembar having 
worked to get rid of
overly long safepointing pauses,. and I have heard -XX:-UseBiasedLocking 
for pauses associated
with bulk bias revocations. But, without +PrintSafepointStatistics data 
to draw inferences
from, those incantations would just constitute superstitious mumbo-jumbo.

-- ramki

>> Heap after GC invocations=1188 (full 12):
>> par new generation total 943744K, used 104832K [0xfffffff353c00000, 
>> 0xfffffff393c00000, 0xfffffff393c00000)
>> eden space 838912K, 0% used [0xfffffff353c00000, 0xfffffff353c00000, 
>> 0xfffffff386f40000)
>> from space 104832K, 100% used [0xfffffff38d5a0000, 
>> 0xfffffff393c00000, 0xfffffff393c00000)
>> to space 104832K, 0% used [0xfffffff386f40000, 0xfffffff386f40000, 
>> 0xfffffff38d5a0000)
>> concurrent mark-sweep generation total 49283072K, used 20747910K 
>> [0xfffffff393c00000, 0xffffffff53c00000, 0xffffffff53c00000)
>> concurrent-mark-sweep perm gen total 524288K, used 42905K 
>> [0xffffffff53c00000, 0xffffffff73c00000, 0xffffffff73c00000)
>> }
>> *Total time for which application threads were stopped: 61.5576519 
>> seconds*
>> Application time: 0.0245838 seconds
>> Total time for which application threads were stopped: 9.8331189 seconds
>> Application time: 0.0012626 seconds
>> Total time for which application threads were stopped: 0.0090404 seconds
>> Application time: 0.0008943 seconds
>> Total time for which application threads were stopped: 0.0020415 seconds
>> Application time: 0.0008181 seconds
>> Total time for which application threads were stopped: 0.2338605 seconds
>> Application time: 0.0018822 seconds
>>
>> The only suspicious thing is "[Times: user=5.67 sys=0.02, real=59.52 
>> secs]", which means that the "real" duration is a lot higher than 
>> "user" CPU time.
>> Because "sys" duration is low, it also means that the server is not 
>> swapping.
>> What could explain this 61 seconds pause ?
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20110830/8ef5e553/attachment.html