unexpected full gc time spike

Sun May 19 22:39:40 PDT 2013

Hi Leon,

Here is the link to the bug that should be available to you:

http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=8005060

Ramki,

The bug report made it in to the hotspot/gc bug set just a couple of 
days after you filed it, in December 2012. It has been classified as an 
enhancement and since we are in a bug fixing phase right now we don't 
have anybody assigned to fixing this right now.

I haven't been following this thread closely enough to have a strong 
opinion about whether it is a bug or an enhancement. Let me know if you 
think we should update the  the report to be a bug rather than an 
enhancement.

If you have a patch it would be great to try to get it out for review. 
Sounds like a good thing to fix no matter how we define the bug report :)

Thanks,
Bengt

On 5/18/13 11:05 AM, the.6th.month at gmail.com wrote:
> Hi, Jon & Ramki:
> Sorry I can't get access to that page, the browser says the webpage is 
> currently unavailable. But I did  look through the whole mail thread 
> regarding this issue. I am wondering if I get it right. If each thread 
> failed fast and started to fall back to single-threaded full gc 
> immediately, there wouldn't be such a long pause. But under the 
> current mechanism, there's no such a global flag to coordinate threads 
> to fall back, and each thread tends to retry the allocation of new 
> plab which could result in highly active locking contention and hence 
> the extremely long pause.
> Is that correct?
>
> Leon
>
>
> On 18 May 2013 01:29, Jon Masamitsu <jon.masamitsu at oracle.com 
> <mailto:jon.masamitsu at oracle.com>> wrote:
>
>     I found it
>
>     8005060: Promotion failure code does not scale
>     <https://jbs.oracle.com/bugs/browse/JDK-8005060>
>
>
>
>     On 5/17/2013 10:05 AM, Srinivas Ramakrishna wrote:
>>     Looks like the search functionality ofbugs.sun.com  <http://bugs.sun.com>  is no longer
>>     available. I tried searching the new bugzilla portal for the bug I had
>>     submitted around that time, but that doesn't bring up the bug when i
>>     use the normal search terms, so I do not know if the bug report is
>>     still in review or not, and whether it ever made it into the set of
>>     hotspot/gc bugs or not, but the Review ID i recvd was:-
>>
>>           "Your Report (Review ID: 2391561) - Promotion failure code does not scale "
>>
>>     I'll try and dig up the (raw, tentative) patch and send it in soon.
>>
>>     -- ramki
>>
>>
>>     On Fri, May 17, 2013 at 9:37 AM, Srinivas Ramakrishna<ysr1729 at gmail.com>  <mailto:ysr1729 at gmail.com>  wrote:
>>>     Hi Leon --
>>>
>>>     Here's the history of that discussion, starting with this email
>>>     (follow subject thread):
>>>
>>>     http://mail.openjdk.java.net/pipermail/hotspot-gc-use/2012-October/001370.html
>>>
>>>     On Thu, May 16, 2013 at 10:06 PM,the.6th.month at gmail.com  <mailto:the.6th.month at gmail.com>
>>>     <the.6th.month at gmail.com>  <mailto:the.6th.month at gmail.com>  wrote:
>>>>     hi, Ramki:
>>>>     btw, could you possibly explain what the bugs are and how those bugs affect
>>>>     the fallback fullgc time? I am really curious about the reason.
>>>>     thanks very much.
>>>>
>>>>     all the best,
>>>>     Leon
>>>>
>>>>     On 17 May 2013 12:39,"the.6th.month at gmail.com"  <mailto:the.6th.month at gmail.com>  <the.6th.month at gmail.com>  <mailto:the.6th.month at gmail.com>
>>>>     wrote:
>>>>>     thanks very much indeed, hope we can see your patch soon
>>>>>
>>>>>
>>>>>     On 17 May 2013 12:38, Srinivas Ramakrishna<ysr1729 at gmail.com>  <mailto:ysr1729 at gmail.com>  wrote:
>>>>>>     Hi Leon --
>>>>>>
>>>>>>     Yes, there are a couple of performance bugs related to promotion
>>>>>>     failure handling with ParNew+CMS that can cause this time to balloon.
>>>>>>     Here the unwind of the failed promotion took 177 s. I have at least a
>>>>>>     partial fix for this which I had written up a few months ago but never
>>>>>>     quite got around to collecting sufficient performance data to submit
>>>>>>     it as an official patch.
>>>>>>
>>>>>>     I'll try and revive that patch and submit it... May be someone else
>>>>>>     can check if it helps sufficiently in the performance with promotion
>>>>>>     failure.
>>>>>>
>>>>>>     -- ramki
>>>>>>
>>>>>>
>>>>>>     On Thu, May 16, 2013 at 9:19 PM,the.6th.month at gmail.com  <mailto:the.6th.month at gmail.com>
>>>>>>     <the.6th.month at gmail.com>  <mailto:the.6th.month at gmail.com>  wrote:
>>>>>>>     hi, all:
>>>>>>>     We just had a situation that I don't quite understand with CMS gc. When
>>>>>>>     I
>>>>>>>     examined the gc log, I found that there was a cms gc which resulted in
>>>>>>>     a
>>>>>>>     parnew promotion failure and concurrent mode failure at the same time,
>>>>>>>     and
>>>>>>>     then the full gc lasted for slightly over three minutes. Here is the gc
>>>>>>>     log:
>>>>>>>     2013-05-17T10:12:55.983+0800: 45168.774: [CMS-concurrent-mark:
>>>>>>>     7.056/7.860
>>>>>>>     secs] [Times: user=14.90 sys=0.45, real=7.86 secs]
>>>>>>>     2013-05-17T10:12:55.984+0800: 45168.775:
>>>>>>>     [CMS-concurrent-preclean-start]
>>>>>>>     2013-05-17T10:12:56.753+0800: 45169.544: [CMS-concurrent-preclean:
>>>>>>>     0.676/0.770 secs] [Times: user=0.83 sys=0.15, real=0.77 secs]
>>>>>>>     2013-05-17T10:12:56.753+0800: 45169.544:
>>>>>>>     [CMS-concurrent-abortable-preclean-start]
>>>>>>>     2013-05-17T10:12:58.460+0800: 45171.251: [GC 45171.252: [ParNew
>>>>>>>     (promotion
>>>>>>>     failed)
>>>>>>>     Desired survivor size 67108864 bytes, new threshold 1 (max 6)
>>>>>>>     - age   1:   70527216 bytes,   70527216 total
>>>>>>>     : 917504K->917504K(917504K), 177.3558880 secs]45348.608: [CMS CMS:
>>>>>>>     abort
>>>>>>>     preclean due to time 2013-05-17T10:15:56.197+0800: 45348.989:
>>>>>>>     [CMS-concurrent-abortable-preclean: 2.037/179.444 secs] [Times:
>>>>>>>     user=44.72
>>>>>>>     sys=13.59, real=179.45 secs]
>>>>>>>       (concurrent mode failure): 3017323K->2177093K(3047424K), 16.5528620
>>>>>>>     secs]
>>>>>>>     3879476K->2177093K(3964928K), [CMS Perm : 91333K->91008K(262144K)],
>>>>>>>     193.9097970 secs] [Times: user=58.79 sys=13.55, real=193.91 secs]
>>>>>>>
>>>>>>>     the usual cms full gc time was roughly 100ms-400ms, but this time it
>>>>>>>     lasted
>>>>>>>     for 193 seconds. I understand that when there's a parnew gc happens
>>>>>>>     during
>>>>>>>     cms and to space is not large enough to hold all survived objects, or
>>>>>>>     the
>>>>>>>     remaining space in old gen cannot cope with memory allocation in old
>>>>>>>     gen,
>>>>>>>     full gc happens. But I don't understand why it hangs so long.
>>>>>>>     I am using oracle jdk 1.6.0_37, and the jvm options we use are:
>>>>>>>     -Xms4000m -Xmx4000m -Xmn1G -Xss256k -XX:PermSize=256m
>>>>>>>     -XX:SurvivorRatio=6
>>>>>>>     -XX:MaxTenuringThreshold=6 -XX:+DisableExplicitGC -Xnoclassgc
>>>>>>>     -Xverify:none
>>>>>>>     -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled
>>>>>>>     -XX:+UseCMSCompactAtFullCollection -XX:+UseFastAccessorMethods
>>>>>>>     -XX:+CMSPermGenSweepingEnabled -XX:+CMSClassUnloadingEnabled
>>>>>>>     -XX:+UseCompressedOops -XX:CMSInitiatingOccupancyFraction=90
>>>>>>>     -XX:+UseCMSInitiatingOccupancyOnly
>>>>>>>
>>>>>>>     Could it be a bug that results in the long full gc in case of promotion
>>>>>>>     failure or something else? Could anyone offer me some help, and I
>>>>>>>     really
>>>>>>>     appreciate your help.
>>>>>>>
>>>>>>>     Looking forward to any reply.
>>>>>>>
>>>>>>>     All the best,
>>>>>>>     Leon
>>>>>>>
>>>>>>>
>>>>>>>     _______________________________________________
>>>>>>>     hotspot-gc-use mailing list
>>>>>>>     hotspot-gc-use at openjdk.java.net  <mailto:hotspot-gc-use at openjdk.java.net>
>>>>>>>     http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
>>>>>>>
>>     _______________________________________________
>>     hotspot-gc-use mailing list
>>     hotspot-gc-use at openjdk.java.net  <mailto:hotspot-gc-use at openjdk.java.net>
>>     http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
>
>
>     _______________________________________________
>     hotspot-gc-use mailing list
>     hotspot-gc-use at openjdk.java.net
>     <mailto:hotspot-gc-use at openjdk.java.net>
>     http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
>
>
>
>
> _______________________________________________
> hotspot-gc-use mailing list
> hotspot-gc-use at openjdk.java.net
> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20130520/a7064d63/attachment.html