unexplained CMS pauses
Y.S.Ramakrishna at Sun.COM
Y.S.Ramakrishna at Sun.COM
Mon Sep 28 23:37:26 UTC 2009
Hi Paul, that would be 100 us + minor gc time (=30 ms) = 30.1 ms.
It does not explain the 10- to 40-fold increase in txn times observed
here (of 300 ms - 1200 ms).
-- ramki
On 09/28/09 16:24, Paul Hohensee wrote:
> Might have nothing to do with concurrent mark running, rather with minor
> collections
> happening during a java->native call.
>
> Minor collections are stop-the-world, and if one occurs during a
> java->native call, the
> java thread making the call to native will block on return from the call
> to native until
> the minor collection is over. If the outliers are outliers in the time
> it takes to execute
> the native call, and the minor gc duration and timing are exactly right,
> you could end
> up spending most of the minor gc time blocked. Pause time would then be
> 100ms
> + minor gc time.
>
> Paul
>
> Shane Cox wrote:
>> Ramki,
>> We're running Solaris 10 on 8-core Intel Xeon's. 1 Java instance per
>> box. JRE 1.6, update 14. 64-bit JVM.
>>
>> Our app is making JNI calls. Each call requires approximately the
>> same amount of work (so we expect them to perform similarly).
>> Internally, we measure how long it takes to perform these calls. +99%
>> of these calls complete in less than 100 micros. However, we have
>> outliers in the 300-1200ms range. After some research, we have found
>> that these extreme outliers coincide with the Concurrent Mark phase of
>> CMS (based on timestamps), and ONLY if there is a minor GC during that
>> phase.
>>
>> In a given day, our app will execute CMS collections 500-1000 times.
>> Of these, less than 10 will have a minor collection execute during the
>> Concurrent Mark. For each of these, our app reports a large pause in
>> the 300-1200ms range. In fact, all of the large pauses reported by
>> our app correlate with a minor GC during a Concurrent Mark, without
>> exception.
>>
>> Thanks
>>
>> On Mon, Sep 28, 2009 at 1:13 PM, <Y.S.Ramakrishna at sun.com
>> <mailto:Y.S.Ramakrishna at sun.com>> wrote:
>>
>> What platform are you on (#cpu's etc.)
>> and when you say "app reports a pause of 300 ms"
>> is it that the odd transaction sees a latency
>> of 300 ms (coincident with concurrent mark),
>> whereas most transactions complete much more
>> quickly?
>>
>> I am trying to first understand how you determine
>> that the application is seeing "long pauses"
>> when a minor gc occurs during concurrent mark.
>>
>> PS: for example, if two gc pauses (say a scavenge of 30 ms
>> and an initial mark of 13 ms occur in quick succession,
>> your application might notice a pause of 43 ms, etc.)
>>
>> -- ramki
>>
>>
>> On 09/28/09 10:07, Shane Cox wrote:
>>
>> Our application is reporting long pauses when a minor GC
>> occurs during the Concurrent Mark phase of CMS. The output
>> below is a specific example. All of the GC pauses are less
>> than 30ms (initial mark, remark, minor GC). However, our app
>> reported a 300ms pause.
>>
>> 56750.934: [GC [1 CMS-initial-mark: 702464K(1402496K)]
>> 719045K(1551616K), 0.0131859 secs]
>> 56750.947: [CMS-concurrent-mark-start]
>> 56752.133: [GC 56752.133: [ParNew: 144393K->12122K(149120K),
>> 0.0237615 secs] 846857K->719330K(1551616K), 0.0239988 secs]
>> 56752.162: [CMS-concurrent-mark: 1.188/1.215 secs]
>> 56752.162: [CMS-concurrent-preclean-start]
>> 56752.243: [CMS-concurrent-preclean: 0.070/0.081 secs]
>> 56752.243: [CMS-concurrent-abortable-preclean-start]
>> 56752.765: [CMS-concurrent-abortable-preclean: 0.143/0.522 secs]
>> 56752.766: [GC[YG occupancy: 77423 K (149120 K)]56752.766:
>> [Rescan (parallel) , 0.0065730 secs]56752.773: [weak refs
>> processing, 0.0001983 secs] [1 CMS-remark: 707208K(1402496K)]
>> 784631K(1551616K), 0.0068908 secs]
>> 56752.773: [CMS-concurrent-sweep-start]
>> 56753.209: [CMS-concurrent-sweep: 0.436/0.436 secs]
>> 56753.209: [CMS-concurrent-reset-start]
>> 56753.219: [CMS-concurrent-reset: 0.010/0.010 secs]
>>
>>
>> We only observe this behavior when a minor GC occurs during
>> the Concurrent Mark (which is rare). Our app has reported
>> pauses up to 1.2 seconds ... which is generally the time it
>> takes to perform a Concurrent Mark.
>>
>>
>> Any insight/help that you could provide would be much
>> appreciated.
>>
>> Thanks
>>
>>
>>
_______________________________________________
hotspot-gc-use mailing list
hotspot-gc-use at openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
More information about the hotspot-gc-dev
mailing list