unexplained CMS pauses

Y.S.Ramakrishna at Sun.COM Y.S.Ramakrishna at Sun.COM
Mon Sep 28 16:37:26 PDT 2009


Hi Paul, that would be 100 us + minor gc time (=30 ms) = 30.1 ms.
It does not explain the 10- to 40-fold increase in txn times observed
here (of 300 ms - 1200 ms).

-- ramki

On 09/28/09 16:24, Paul Hohensee wrote:
> Might have nothing to do with concurrent mark running, rather with minor 
> collections
> happening during a java->native call.
> 
> Minor collections are stop-the-world, and if one occurs during a 
> java->native call, the
> java thread making the call to native will block on return from the call 
> to native until
> the minor collection is over.  If the outliers are outliers in the time 
> it takes to execute
> the native call, and the minor gc duration and timing are exactly right, 
> you could end
> up spending most of the minor gc time blocked.  Pause time would then be 
> 100ms
> + minor gc time.
> 
> Paul
> 
> Shane Cox wrote:
>> Ramki,
>> We're running Solaris 10 on 8-core Intel Xeon's.  1 Java instance per 
>> box.  JRE 1.6, update 14.  64-bit JVM.
>>
>> Our app is making JNI calls.  Each call requires approximately the 
>> same amount of work (so we expect them to perform similarly).  
>> Internally, we measure how long it takes to perform these calls. +99% 
>> of these calls complete in less than 100 micros.  However, we have 
>> outliers in the 300-1200ms range.  After some research, we have found 
>> that these extreme outliers coincide with the Concurrent Mark phase of 
>> CMS (based on timestamps), and ONLY if there is a minor GC during that 
>> phase.
>>
>> In a given day, our app will execute CMS collections 500-1000 times.  
>> Of these, less than 10 will have a minor collection execute during the 
>> Concurrent Mark.  For each of these, our app reports a large pause in 
>> the 300-1200ms range.  In fact, all of the large pauses reported by 
>> our app correlate with a  minor GC during a Concurrent Mark, without 
>> exception.
>>
>> Thanks
>>
>> On Mon, Sep 28, 2009 at 1:13 PM, <Y.S.Ramakrishna at sun.com 
>> <mailto:Y.S.Ramakrishna at sun.com>> wrote:
>>
>>     What platform are you on (#cpu's etc.)
>>     and when you say "app reports a pause of 300 ms"
>>     is it that the odd transaction sees a latency
>>     of 300 ms (coincident with concurrent mark),
>>     whereas most transactions complete much more
>>     quickly?
>>
>>     I am trying to first understand how you determine
>>     that the application is seeing "long pauses"
>>     when a minor gc occurs during concurrent mark.
>>
>>     PS: for example, if two gc pauses (say a scavenge of 30 ms
>>     and an initial mark of 13 ms occur in quick succession,
>>     your application might notice a  pause of 43 ms, etc.)
>>
>>     -- ramki
>>
>>
>>     On 09/28/09 10:07, Shane Cox wrote:
>>
>>         Our application is reporting long pauses when a minor GC
>>         occurs during the Concurrent Mark phase of CMS.  The output
>>         below is a specific example.  All of the GC pauses are less
>>         than 30ms (initial mark, remark, minor GC).  However, our app
>>         reported a 300ms pause.
>>
>>         56750.934: [GC [1 CMS-initial-mark: 702464K(1402496K)]
>>         719045K(1551616K), 0.0131859 secs]
>>         56750.947: [CMS-concurrent-mark-start]
>>         56752.133: [GC 56752.133: [ParNew: 144393K->12122K(149120K),
>>         0.0237615 secs] 846857K->719330K(1551616K), 0.0239988 secs]
>>         56752.162: [CMS-concurrent-mark: 1.188/1.215 secs]
>>         56752.162: [CMS-concurrent-preclean-start]
>>         56752.243: [CMS-concurrent-preclean: 0.070/0.081 secs]
>>         56752.243: [CMS-concurrent-abortable-preclean-start]
>>         56752.765: [CMS-concurrent-abortable-preclean: 0.143/0.522 secs]
>>         56752.766: [GC[YG occupancy: 77423 K (149120 K)]56752.766:
>>         [Rescan (parallel) , 0.0065730 secs]56752.773: [weak refs
>>         processing, 0.0001983 secs] [1 CMS-remark: 707208K(1402496K)]
>>         784631K(1551616K), 0.0068908 secs]
>>         56752.773: [CMS-concurrent-sweep-start]
>>         56753.209: [CMS-concurrent-sweep: 0.436/0.436 secs]
>>         56753.209: [CMS-concurrent-reset-start]
>>         56753.219: [CMS-concurrent-reset: 0.010/0.010 secs]
>>
>>
>>         We only observe this behavior when a minor GC occurs during
>>         the Concurrent Mark (which is rare).  Our app has reported
>>         pauses up to 1.2 seconds ... which is generally the time it
>>         takes to perform a Concurrent Mark.
>>
>>
>>         Any insight/help that you could provide would be much 
>> appreciated.
>>
>>         Thanks
>>
>>
>>



More information about the hotspot-gc-use mailing list