unexpected full gc time spike

Fri May 17 10:14:01 PDT 2013

thanks ramki, looking forward to it

Leon
On 18 May 2013 01:05, "Srinivas Ramakrishna" <ysr1729 at gmail.com> wrote:

> Looks like the search functionality of bugs.sun.com is no longer
> available. I tried searching the new Bugzilla portal for the bug I had
> submitted around that time, but the normal search terms don't bring it
> up, so I do not know whether the bug report is still in review or
> whether it ever made it into the set of hotspot/gc bugs. The Review ID
> I received was:
>
>      "Your Report (Review ID: 2391561) - Promotion failure code does not scale"
>
> I'll try and dig up the (raw, tentative) patch and send it in soon.
>
> -- ramki
>
>
> On Fri, May 17, 2013 at 9:37 AM, Srinivas Ramakrishna <ysr1729 at gmail.com> wrote:
> > Hi Leon --
> >
> > Here's the history of that discussion, starting with this email
> > (follow subject thread):
> >
> > http://mail.openjdk.java.net/pipermail/hotspot-gc-use/2012-October/001370.html
> >
> > On Thu, May 16, 2013 at 10:06 PM, the.6th.month at gmail.com
> > <the.6th.month at gmail.com> wrote:
> >> hi, Ramki:
> >> btw, could you possibly explain what the bugs are and how they affect
> >> the fallback full GC time? I am really curious about the reason.
> >> thanks very much.
> >>
> >> all the best,
> >> Leon
> >>
> >> On 17 May 2013 12:39, "the.6th.month at gmail.com" <the.6th.month at gmail.com>
> >> wrote:
> >>>
> >>> thanks very much indeed, hope we can see your patch soon
> >>>
> >>>
> >>> On 17 May 2013 12:38, Srinivas Ramakrishna <ysr1729 at gmail.com> wrote:
> >>>>
> >>>> Hi Leon --
> >>>>
> >>>> Yes, there are a couple of performance bugs related to promotion
> >>>> failure handling with ParNew+CMS that can cause this time to balloon.
> >>>> Here the unwind of the failed promotion took 177 s. I have at least a
> >>>> partial fix for this which I had written up a few months ago but never
> >>>> quite got around to collecting sufficient performance data to submit
> >>>> it as an official patch.
> >>>>
> >>>> I'll try and revive that patch and submit it... Maybe someone else
> >>>> can check whether it sufficiently improves performance under promotion
> >>>> failure.
> >>>>
> >>>> -- ramki
> >>>>
> >>>>
> >>>> On Thu, May 16, 2013 at 9:19 PM, the.6th.month at gmail.com
> >>>> <the.6th.month at gmail.com> wrote:
> >>>> > hi, all:
> >>>> > We just ran into a situation with CMS GC that I don't quite understand.
> >>>> > When I examined the GC log, I found a CMS cycle during which a ParNew
> >>>> > promotion failure and a concurrent mode failure occurred at the same
> >>>> > time, and the resulting full GC lasted slightly over three minutes.
> >>>> > Here is the GC log:
> >>>> > 2013-05-17T10:12:55.983+0800: 45168.774: [CMS-concurrent-mark: 7.056/7.860 secs] [Times: user=14.90 sys=0.45, real=7.86 secs]
> >>>> > 2013-05-17T10:12:55.984+0800: 45168.775: [CMS-concurrent-preclean-start]
> >>>> > 2013-05-17T10:12:56.753+0800: 45169.544: [CMS-concurrent-preclean: 0.676/0.770 secs] [Times: user=0.83 sys=0.15, real=0.77 secs]
> >>>> > 2013-05-17T10:12:56.753+0800: 45169.544: [CMS-concurrent-abortable-preclean-start]
> >>>> > 2013-05-17T10:12:58.460+0800: 45171.251: [GC 45171.252: [ParNew (promotion failed)
> >>>> > Desired survivor size 67108864 bytes, new threshold 1 (max 6)
> >>>> > - age   1:   70527216 bytes,   70527216 total
> >>>> > : 917504K->917504K(917504K), 177.3558880 secs]45348.608: [CMS CMS: abort preclean due to time 2013-05-17T10:15:56.197+0800: 45348.989: [CMS-concurrent-abortable-preclean: 2.037/179.444 secs] [Times: user=44.72 sys=13.59, real=179.45 secs]
> >>>> >  (concurrent mode failure): 3017323K->2177093K(3047424K), 16.5528620 secs] 3879476K->2177093K(3964928K), [CMS Perm : 91333K->91008K(262144K)], 193.9097970 secs] [Times: user=58.79 sys=13.55, real=193.91 secs]
> >>>> >
> >>>> > The usual CMS full GC time is roughly 100-400 ms, but this time it
> >>>> > lasted 193 seconds. I understand that a full GC happens when a ParNew
> >>>> > GC occurs during a CMS cycle and either the to-space is not large
> >>>> > enough to hold all surviving objects or the remaining space in the old
> >>>> > gen cannot accommodate the objects being promoted into it. But I don't
> >>>> > understand why it takes so long.
> >>>> > I am using Oracle JDK 1.6.0_37, and the JVM options we use are:
> >>>> > -Xms4000m -Xmx4000m -Xmn1G -Xss256k -XX:PermSize=256m
> >>>> > -XX:SurvivorRatio=6 -XX:MaxTenuringThreshold=6 -XX:+DisableExplicitGC
> >>>> > -Xnoclassgc -Xverify:none -XX:+UseParNewGC -XX:+UseConcMarkSweepGC
> >>>> > -XX:+CMSParallelRemarkEnabled -XX:+UseCMSCompactAtFullCollection
> >>>> > -XX:+UseFastAccessorMethods -XX:+CMSPermGenSweepingEnabled
> >>>> > -XX:+CMSClassUnloadingEnabled -XX:+UseCompressedOops
> >>>> > -XX:CMSInitiatingOccupancyFraction=90 -XX:+UseCMSInitiatingOccupancyOnly
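The sizes printed in the GC log follow directly from these options. Below is a worked sketch in Python, assuming HotSpot's usual sizing rules: `-XX:SurvivorRatio=6` splits the young gen into eden and two survivor spaces in a 6:1:1 ratio, ParNew reports its capacity as eden plus one survivor, and the desired survivor size uses the default `-XX:TargetSurvivorRatio=50`, which is not set on this command line. The helper names `heap_layout` and `tenuring_threshold` are hypothetical, and the threshold function is a simplified restatement of how the tenuring threshold is chosen, not HotSpot source.

```python
# Worked arithmetic for the heap layout implied by the JVM options above.
# All sizes are in KiB, matching the "K" units in the GC log.
MB = 1024

XMX = 4000 * MB               # -Xmx4000m
XMN = 1024 * MB               # -Xmn1G
SURVIVOR_RATIO = 6            # -XX:SurvivorRatio=6
MAX_TENURING = 6              # -XX:MaxTenuringThreshold=6
CMS_FRACTION = 90             # -XX:CMSInitiatingOccupancyFraction=90
TARGET_SURVIVOR_RATIO = 50    # assumed HotSpot default, not set explicitly


def heap_layout() -> dict:
    survivor = XMN // (SURVIVOR_RATIO + 2)   # eden + 2 survivors = 8 parts
    eden = survivor * SURVIVOR_RATIO
    old = XMX - XMN
    return {
        "survivor": survivor,
        "eden": eden,
        "young_capacity": eden + survivor,   # the capacity ParNew reports
        "old": old,
        "heap_capacity": eden + survivor + old,
        # Desired survivor size is reported in bytes, not KiB.
        "desired_survivor_bytes": survivor * 1024 * TARGET_SURVIVOR_RATIO // 100,
        "cms_trigger": old * CMS_FRACTION / 100,
    }


def tenuring_threshold(age_bytes: dict, desired: int) -> int:
    """Simplified: the first age at which the cumulative surviving volume
    exceeds the desired survivor size, capped at MaxTenuringThreshold."""
    total = 0
    for age in range(1, MAX_TENURING + 1):
        total += age_bytes.get(age, 0)
        if total > desired:
            return age
    return MAX_TENURING


if __name__ == "__main__":
    layout = heap_layout()
    for name, value in layout.items():
        print(f"{name:>24}: {value}")
    # Age table from the log: only age 1, with 70527216 bytes.
    threshold = tenuring_threshold({1: 70527216}, layout["desired_survivor_bytes"])
    print("new threshold:", threshold)
    # Old-gen occupancy when the full collection started, from the log.
    print(f"old gen occupancy: {3017323 / layout['old']:.1%}")
```

Every derived figure matches the log: 917504K young capacity, 3047424K old gen, 3964928K total heap, a 67108864-byte desired survivor size, and a new threshold of 1 because age 1 alone already exceeds it. The old gen was about 99% full when the full collection began, past the 90% CMS trigger, although that figure already includes whatever the failed ParNew managed to promote.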
> >>>> >
> >>>> > Could a bug be causing the long full GC after a promotion failure, or
> >>>> > is it something else? Any help would be really appreciated.
> >>>> >
> >>>> > Looking forward to any reply.
> >>>> >
> >>>> > All the best,
> >>>> > Leon
> >>>> >
> >>>> >
> >>>> > _______________________________________________
> >>>> > hotspot-gc-use mailing list
> >>>> > hotspot-gc-use at openjdk.java.net
> >>>> > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
> >>>> >
> >>>
> >>>
> >>
>