unexplained CMS pauses
Paul Hohensee
Paul.Hohensee at Sun.COM
Mon Sep 28 16:39:38 PDT 2009
Ouch. Missed that.
Paul
Y.S.Ramakrishna at Sun.COM wrote:
> Hi Paul, that would be 100 us + minor gc time (=30 ms) = 30.1 ms.
> It does not explain the 10- to 40-fold increase in txn times observed
> here (of 300 ms - 1200 ms).
>
> -- ramki
>
> On 09/28/09 16:24, Paul Hohensee wrote:
>> Might have nothing to do with concurrent mark running, rather with
>> minor collections
>> happening during a java->native call.
>>
>> Minor collections are stop-the-world, and if one occurs during a
>> java->native call, the
>> java thread making the call to native will block on return from the
>> call to native until
>> the minor collection is over. If the outliers are outliers in the
>> time it takes to execute
>> the native call, and the minor gc duration and timing are exactly
>> right, you could end
>> up spending most of the minor gc time blocked. Pause time would then
>> be 100ms
>> + minor gc time.
>>
>> Paul
>>
>> Shane Cox wrote:
>>> Ramki,
>>> We're running Solaris 10 on 8-core Intel Xeon's. 1 Java instance
>>> per box. JRE 1.6, update 14. 64-bit JVM.
>>>
>>> Our app is making JNI calls. Each call requires approximately the
>>> same amount of work (so we expect them to perform similarly).
>>> Internally, we measure how long it takes to perform these calls.
>>> +99% of these calls complete in less than 100 micros. However, we
>>> have outliers in the 300-1200ms range. After some research, we have
>>> found that these extreme outliers coincide with the Concurrent Mark
>>> phase of CMS (based on timestamps), and ONLY if there is a minor GC
>>> during that phase.
>>>
>>> In a given day, our app will execute CMS collections 500-1000
>>> times. Of these, less than 10 will have a minor collection execute
>>> during the Concurrent Mark. For each of these, our app reports a
>>> large pause in the 300-1200ms range. In fact, all of the large
>>> pauses reported by our app correlate with a minor GC during a
>>> Concurrent Mark, without exception.
>>>
>>> Thanks
>>>
>>> On Mon, Sep 28, 2009 at 1:13 PM, <Y.S.Ramakrishna at sun.com
>>> <mailto:Y.S.Ramakrishna at sun.com>> wrote:
>>>
>>> What platform are you on (#cpu's etc.)
>>> and when you say "app reports a pause of 300 ms"
>>> is it that the odd transaction sees a latency
>>> of 300 ms (coincident with concurrent mark),
>>> whereas most transactions complete much more
>>> quickly?
>>>
>>> I am trying to first understand how you determine
>>> that the application is seeing "long pauses"
>>> when a minor gc occurs during concurrent mark.
>>>
>>> PS: for example, if two gc pauses (say a scavenge of 30 ms
>>> and an initial mark of 13 ms occur in quick succession,
>>> your application might notice a pause of 43 ms, etc.)
>>>
>>> -- ramki
>>>
>>>
>>> On 09/28/09 10:07, Shane Cox wrote:
>>>
>>> Our application is reporting long pauses when a minor GC
>>> occurs during the Concurrent Mark phase of CMS. The output
>>> below is a specific example. All of the GC pauses are less
>>> than 30ms (initial mark, remark, minor GC). However, our app
>>> reported a 300ms pause.
>>>
>>> 56750.934: [GC [1 CMS-initial-mark: 702464K(1402496K)]
>>> 719045K(1551616K), 0.0131859 secs]
>>> 56750.947: [CMS-concurrent-mark-start]
>>> 56752.133: [GC 56752.133: [ParNew: 144393K->12122K(149120K),
>>> 0.0237615 secs] 846857K->719330K(1551616K), 0.0239988 secs]
>>> 56752.162: [CMS-concurrent-mark: 1.188/1.215 secs]
>>> 56752.162: [CMS-concurrent-preclean-start]
>>> 56752.243: [CMS-concurrent-preclean: 0.070/0.081 secs]
>>> 56752.243: [CMS-concurrent-abortable-preclean-start]
>>> 56752.765: [CMS-concurrent-abortable-preclean: 0.143/0.522
>>> secs]
>>> 56752.766: [GC[YG occupancy: 77423 K (149120 K)]56752.766:
>>> [Rescan (parallel) , 0.0065730 secs]56752.773: [weak refs
>>> processing, 0.0001983 secs] [1 CMS-remark: 707208K(1402496K)]
>>> 784631K(1551616K), 0.0068908 secs]
>>> 56752.773: [CMS-concurrent-sweep-start]
>>> 56753.209: [CMS-concurrent-sweep: 0.436/0.436 secs]
>>> 56753.209: [CMS-concurrent-reset-start]
>>> 56753.219: [CMS-concurrent-reset: 0.010/0.010 secs]
>>>
>>>
>>> We only observe this behavior when a minor GC occurs during
>>> the Concurrent Mark (which is rare). Our app has reported
>>> pauses up to 1.2 seconds ... which is generally the time it
>>> takes to perform a Concurrent Mark.
>>>
>>>
>>> Any insight/help that you could provide would be much
>>> appreciated.
>>>
>>> Thanks
>>>
>>>
>>>
>
More information about the hotspot-gc-use
mailing list