unexpected full gc time spike
Bengt Rutisson
bengt.rutisson at oracle.com
Sun May 19 22:39:40 PDT 2013
Hi Leon,
Here is the link to the bug that should be available to you:
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=8005060
Ramki,
The bug report made it in to the hotspot/gc bug set just a couple of
days after you filed it, in December 2012. It has been classified as an
enhancement and since we are in a bug fixing phase right now we don't
have anybody assigned to fixing this right now.
I haven't been following this thread closely enough to have a strong
opinion about whether it is a bug or an enhancement. Let me know if you
think we should update the the report to be a bug rather than an
enhancement.
If you have a patch it would be great to try to get it out for review.
Sounds like a good thing to fix no matter how we define the bug report :)
Thanks,
Bengt
On 5/18/13 11:05 AM, the.6th.month at gmail.com wrote:
> Hi, Jon & Ramki:
> Sorry I can't get access to that page, the browser says the webpage is
> currently unavailable. But I did look through the whole mail thread
> regarding this issue. I am wondering if I get it right. If each thread
> failed fast and started to fall back to single-threaded full gc
> immediately, there wouldn't be such a long pause. But under the
> current mechanism, there's no such a global flag to coordinate threads
> to fall back, and each thread tends to retry the allocation of new
> plab which could result in highly active locking contention and hence
> the extremely long pause.
> Is that correct?
>
> Leon
>
>
> On 18 May 2013 01:29, Jon Masamitsu <jon.masamitsu at oracle.com
> <mailto:jon.masamitsu at oracle.com>> wrote:
>
> I found it
>
> 8005060: Promotion failure code does not scale
> <https://jbs.oracle.com/bugs/browse/JDK-8005060>
>
>
>
> On 5/17/2013 10:05 AM, Srinivas Ramakrishna wrote:
>> Looks like the search functionality ofbugs.sun.com <http://bugs.sun.com> is no longer
>> available. I tried searching the new bugzilla portal for the bug I had
>> submitted around that time, but that doesn't bring up the bug when i
>> use the normal search terms, so I do not know if the bug report is
>> still in review or not, and whether it ever made it into the set of
>> hotspot/gc bugs or not, but the Review ID i recvd was:-
>>
>> "Your Report (Review ID: 2391561) - Promotion failure code does not scale "
>>
>> I'll try and dig up the (raw, tentative) patch and send it in soon.
>>
>> -- ramki
>>
>>
>> On Fri, May 17, 2013 at 9:37 AM, Srinivas Ramakrishna<ysr1729 at gmail.com> <mailto:ysr1729 at gmail.com> wrote:
>>> Hi Leon --
>>>
>>> Here's the history of that discussion, starting with this email
>>> (follow subject thread):
>>>
>>> http://mail.openjdk.java.net/pipermail/hotspot-gc-use/2012-October/001370.html
>>>
>>> On Thu, May 16, 2013 at 10:06 PM,the.6th.month at gmail.com <mailto:the.6th.month at gmail.com>
>>> <the.6th.month at gmail.com> <mailto:the.6th.month at gmail.com> wrote:
>>>> hi, Ramki:
>>>> btw, could you possibly explain what the bugs are and how those bugs affect
>>>> the fallback fullgc time? I am really curious about the reason.
>>>> thanks very much.
>>>>
>>>> all the best,
>>>> Leon
>>>>
>>>> On 17 May 2013 12:39,"the.6th.month at gmail.com" <mailto:the.6th.month at gmail.com> <the.6th.month at gmail.com> <mailto:the.6th.month at gmail.com>
>>>> wrote:
>>>>> thanks very much indeed, hope we can see your patch soon
>>>>>
>>>>>
>>>>> On 17 May 2013 12:38, Srinivas Ramakrishna<ysr1729 at gmail.com> <mailto:ysr1729 at gmail.com> wrote:
>>>>>> Hi Leon --
>>>>>>
>>>>>> Yes, there are a couple of performance bugs related to promotion
>>>>>> failure handling with ParNew+CMS that can cause this time to balloon.
>>>>>> Here the unwind of the failed promotion took 177 s. I have at least a
>>>>>> partial fix for this which I had written up a few months ago but never
>>>>>> quite got around to collecting sufficient performance data to submit
>>>>>> it as an official patch.
>>>>>>
>>>>>> I'll try and revive that patch and submit it... May be someone else
>>>>>> can check if it helps sufficiently in the performance with promotion
>>>>>> failure.
>>>>>>
>>>>>> -- ramki
>>>>>>
>>>>>>
>>>>>> On Thu, May 16, 2013 at 9:19 PM,the.6th.month at gmail.com <mailto:the.6th.month at gmail.com>
>>>>>> <the.6th.month at gmail.com> <mailto:the.6th.month at gmail.com> wrote:
>>>>>>> hi, all:
>>>>>>> We just had a situation that I don't quite understand with CMS gc. When
>>>>>>> I
>>>>>>> examined the gc log, I found that there was a cms gc which resulted in
>>>>>>> a
>>>>>>> parnew promotion failure and concurrent mode failure at the same time,
>>>>>>> and
>>>>>>> then the full gc lasted for slightly over three minutes. Here is the gc
>>>>>>> log:
>>>>>>> 2013-05-17T10:12:55.983+0800: 45168.774: [CMS-concurrent-mark:
>>>>>>> 7.056/7.860
>>>>>>> secs] [Times: user=14.90 sys=0.45, real=7.86 secs]
>>>>>>> 2013-05-17T10:12:55.984+0800: 45168.775:
>>>>>>> [CMS-concurrent-preclean-start]
>>>>>>> 2013-05-17T10:12:56.753+0800: 45169.544: [CMS-concurrent-preclean:
>>>>>>> 0.676/0.770 secs] [Times: user=0.83 sys=0.15, real=0.77 secs]
>>>>>>> 2013-05-17T10:12:56.753+0800: 45169.544:
>>>>>>> [CMS-concurrent-abortable-preclean-start]
>>>>>>> 2013-05-17T10:12:58.460+0800: 45171.251: [GC 45171.252: [ParNew
>>>>>>> (promotion
>>>>>>> failed)
>>>>>>> Desired survivor size 67108864 bytes, new threshold 1 (max 6)
>>>>>>> - age 1: 70527216 bytes, 70527216 total
>>>>>>> : 917504K->917504K(917504K), 177.3558880 secs]45348.608: [CMS CMS:
>>>>>>> abort
>>>>>>> preclean due to time 2013-05-17T10:15:56.197+0800: 45348.989:
>>>>>>> [CMS-concurrent-abortable-preclean: 2.037/179.444 secs] [Times:
>>>>>>> user=44.72
>>>>>>> sys=13.59, real=179.45 secs]
>>>>>>> (concurrent mode failure): 3017323K->2177093K(3047424K), 16.5528620
>>>>>>> secs]
>>>>>>> 3879476K->2177093K(3964928K), [CMS Perm : 91333K->91008K(262144K)],
>>>>>>> 193.9097970 secs] [Times: user=58.79 sys=13.55, real=193.91 secs]
>>>>>>>
>>>>>>> the usual cms full gc time was roughly 100ms-400ms, but this time it
>>>>>>> lasted
>>>>>>> for 193 seconds. I understand that when there's a parnew gc happens
>>>>>>> during
>>>>>>> cms and to space is not large enough to hold all survived objects, or
>>>>>>> the
>>>>>>> remaining space in old gen cannot cope with memory allocation in old
>>>>>>> gen,
>>>>>>> full gc happens. But I don't understand why it hangs so long.
>>>>>>> I am using oracle jdk 1.6.0_37, and the jvm options we use are:
>>>>>>> -Xms4000m -Xmx4000m -Xmn1G -Xss256k -XX:PermSize=256m
>>>>>>> -XX:SurvivorRatio=6
>>>>>>> -XX:MaxTenuringThreshold=6 -XX:+DisableExplicitGC -Xnoclassgc
>>>>>>> -Xverify:none
>>>>>>> -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled
>>>>>>> -XX:+UseCMSCompactAtFullCollection -XX:+UseFastAccessorMethods
>>>>>>> -XX:+CMSPermGenSweepingEnabled -XX:+CMSClassUnloadingEnabled
>>>>>>> -XX:+UseCompressedOops -XX:CMSInitiatingOccupancyFraction=90
>>>>>>> -XX:+UseCMSInitiatingOccupancyOnly
>>>>>>>
>>>>>>> Could it be a bug that results in the long full gc in case of promotion
>>>>>>> failure or something else? Could anyone offer me some help, and I
>>>>>>> really
>>>>>>> appreciate your help.
>>>>>>>
>>>>>>> Looking forward to any reply.
>>>>>>>
>>>>>>> All the best,
>>>>>>> Leon
>>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> hotspot-gc-use mailing list
>>>>>>> hotspot-gc-use at openjdk.java.net <mailto:hotspot-gc-use at openjdk.java.net>
>>>>>>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
>>>>>>>
>> _______________________________________________
>> hotspot-gc-use mailing list
>> hotspot-gc-use at openjdk.java.net <mailto:hotspot-gc-use at openjdk.java.net>
>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
>
>
> _______________________________________________
> hotspot-gc-use mailing list
> hotspot-gc-use at openjdk.java.net
> <mailto:hotspot-gc-use at openjdk.java.net>
> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
>
>
>
>
> _______________________________________________
> hotspot-gc-use mailing list
> hotspot-gc-use at openjdk.java.net
> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20130520/a7064d63/attachment.html
More information about the hotspot-gc-use
mailing list