unexpected full gc time spike

Fri May 17 10:29:48 PDT 2013

I found it

8005060: Promotion failure code does not scale 
<https://jbs.oracle.com/bugs/browse/JDK-8005060>


On 5/17/2013 10:05 AM, Srinivas Ramakrishna wrote:
> Looks like the search functionality of bugs.sun.com is no longer
> available. I tried searching the new bugzilla portal for the bug I had
> submitted around that time, but that doesn't bring up the bug when i
> use the normal search terms, so I do not know if the bug report is
> still in review or not, and whether it ever made it into the set of
> hotspot/gc bugs or not, but the Review ID i recvd was:-
>
>       "Your Report (Review ID: 2391561) - Promotion failure code does not scale "
>
> I'll try and dig up the (raw, tentative) patch and send it in soon.
>
> -- ramki
>
>
> On Fri, May 17, 2013 at 9:37 AM, Srinivas Ramakrishna <ysr1729 at gmail.com> wrote:
>> Hi Leon --
>>
>> Here's the history of that discussion, starting with this email
>> (follow subject thread):
>>
>> http://mail.openjdk.java.net/pipermail/hotspot-gc-use/2012-October/001370.html
>>
>> On Thu, May 16, 2013 at 10:06 PM, the.6th.month at gmail.com
>> <the.6th.month at gmail.com> wrote:
>>> hi, Ramki:
>>> btw, could you possibly explain what the bugs are and how those bugs affect
>>> the fallback fullgc time? I am really curious about the reason.
>>> thanks very much.
>>>
>>> all the best,
>>> Leon
>>>
>>> On 17 May 2013 12:39, "the.6th.month at gmail.com" <the.6th.month at gmail.com>
>>> wrote:
>>>> thanks very much indeed, hope we can see your patch soon
>>>>
>>>>
>>>> On 17 May 2013 12:38, Srinivas Ramakrishna <ysr1729 at gmail.com> wrote:
>>>>> Hi Leon --
>>>>>
>>>>> Yes, there are a couple of performance bugs related to promotion
>>>>> failure handling with ParNew+CMS that can cause this time to balloon.
>>>>> Here the unwind of the failed promotion took 177 s. I have at least a
>>>>> partial fix for this which I had written up a few months ago but never
>>>>> quite got around to collecting sufficient performance data to submit
>>>>> it as an official patch.
>>>>>
>>>>> I'll try and revive that patch and submit it... May be someone else
>>>>> can check if it helps sufficiently in the performance with promotion
>>>>> failure.
>>>>>
>>>>> -- ramki
>>>>>
>>>>>
>>>>> On Thu, May 16, 2013 at 9:19 PM, the.6th.month at gmail.com
>>>>> <the.6th.month at gmail.com> wrote:
>>>>>> hi, all:
>>>>>> We just had a situation that I don't quite understand with CMS gc. When
>>>>>> I
>>>>>> examined the gc log, I found that there was a cms gc which resulted in
>>>>>> a
>>>>>> parnew promotion failure and concurrent mode failure at the same time,
>>>>>> and
>>>>>> then the full gc lasted for slightly over three minutes. Here is the gc
>>>>>> log:
>>>>>> 2013-05-17T10:12:55.983+0800: 45168.774: [CMS-concurrent-mark:
>>>>>> 7.056/7.860
>>>>>> secs] [Times: user=14.90 sys=0.45, real=7.86 secs]
>>>>>> 2013-05-17T10:12:55.984+0800: 45168.775:
>>>>>> [CMS-concurrent-preclean-start]
>>>>>> 2013-05-17T10:12:56.753+0800: 45169.544: [CMS-concurrent-preclean:
>>>>>> 0.676/0.770 secs] [Times: user=0.83 sys=0.15, real=0.77 secs]
>>>>>> 2013-05-17T10:12:56.753+0800: 45169.544:
>>>>>> [CMS-concurrent-abortable-preclean-start]
>>>>>> 2013-05-17T10:12:58.460+0800: 45171.251: [GC 45171.252: [ParNew
>>>>>> (promotion
>>>>>> failed)
>>>>>> Desired survivor size 67108864 bytes, new threshold 1 (max 6)
>>>>>> - age   1:   70527216 bytes,   70527216 total
>>>>>> : 917504K->917504K(917504K), 177.3558880 secs]45348.608: [CMS CMS:
>>>>>> abort
>>>>>> preclean due to time 2013-05-17T10:15:56.197+0800: 45348.989:
>>>>>> [CMS-concurrent-abortable-preclean: 2.037/179.444 secs] [Times:
>>>>>> user=44.72
>>>>>> sys=13.59, real=179.45 secs]
>>>>>>   (concurrent mode failure): 3017323K->2177093K(3047424K), 16.5528620
>>>>>> secs]
>>>>>> 3879476K->2177093K(3964928K), [CMS Perm : 91333K->91008K(262144K)],
>>>>>> 193.9097970 secs] [Times: user=58.79 sys=13.55, real=193.91 secs]
>>>>>>
>>>>>> the usual cms full gc time was roughly 100ms-400ms, but this time it
>>>>>> lasted
>>>>>> for 193 seconds. I understand that when there's a parnew gc happens
>>>>>> during
>>>>>> cms and to space is not large enough to hold all survived objects, or
>>>>>> the
>>>>>> remaining space in old gen cannot cope with memory allocation in old
>>>>>> gen,
>>>>>> full gc happens. But I don't understand why it hangs so long.
>>>>>> I am using oracle jdk 1.6.0_37, and the jvm options we use are:
>>>>>> -Xms4000m -Xmx4000m -Xmn1G -Xss256k -XX:PermSize=256m
>>>>>> -XX:SurvivorRatio=6
>>>>>> -XX:MaxTenuringThreshold=6 -XX:+DisableExplicitGC -Xnoclassgc
>>>>>> -Xverify:none
>>>>>> -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled
>>>>>> -XX:+UseCMSCompactAtFullCollection -XX:+UseFastAccessorMethods
>>>>>> -XX:+CMSPermGenSweepingEnabled -XX:+CMSClassUnloadingEnabled
>>>>>> -XX:+UseCompressedOops -XX:CMSInitiatingOccupancyFraction=90
>>>>>> -XX:+UseCMSInitiatingOccupancyOnly
>>>>>>
>>>>>> Could it be a bug that results in the long full gc in case of promotion
>>>>>> failure or something else? Could anyone offer me some help, and I
>>>>>> really
>>>>>> appreciate your help.
>>>>>>
>>>>>> Looking forward to any reply.
>>>>>>
>>>>>> All the best,
>>>>>> Leon
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> hotspot-gc-use mailing list
>>>>>> hotspot-gc-use at openjdk.java.net
>>>>>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
>>>>>>
>>>>
> _______________________________________________
> hotspot-gc-use mailing list
> hotspot-gc-use at openjdk.java.net
> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20130517/a80f8f34/attachment.html