unexpected full gc time spike

Fri May 17 10:05:16 PDT 2013

Looks like the search functionality of bugs.sun.com is no longer
available. I tried searching the new bugzilla portal for the bug I had
submitted around that time, but my usual search terms do not bring it
up. So I do not know whether the bug report is still in review, or
whether it ever made it into the set of hotspot/gc bugs. The Review ID
I received was:

     "Your Report (Review ID: 2391561) - Promotion failure code does not scale "

I'll try and dig up the (raw, tentative) patch and send it in soon.

-- ramki

On Fri, May 17, 2013 at 9:37 AM, Srinivas Ramakrishna <ysr1729 at gmail.com> wrote:
> Hi Leon --
>
> Here's the history of that discussion, starting with this email
> (follow subject thread):
>
> http://mail.openjdk.java.net/pipermail/hotspot-gc-use/2012-October/001370.html
>
> On Thu, May 16, 2013 at 10:06 PM, the.6th.month at gmail.com
> <the.6th.month at gmail.com> wrote:
>> hi, Ramki:
>> btw, could you possibly explain what the bugs are and how those bugs affect
>> the fallback fullgc time? I am really curious about the reason.
>> thanks very much.
>>
>> all the best,
>> Leon
>>
>> On 17 May 2013 12:39, "the.6th.month at gmail.com" <the.6th.month at gmail.com>
>> wrote:
>>>
>>> thanks very much indeed, hope we can see your patch soon
>>>
>>>
>>> On 17 May 2013 12:38, Srinivas Ramakrishna <ysr1729 at gmail.com> wrote:
>>>>
>>>> Hi Leon --
>>>>
>>>> Yes, there are a couple of performance bugs related to promotion
>>>> failure handling with ParNew+CMS that can cause this time to balloon.
>>>> Here the unwind of the failed promotion took 177 s. I have at least a
>>>> partial fix for this which I had written up a few months ago but never
>>>> quite got around to collecting sufficient performance data to submit
>>>> it as an official patch.
>>>>
>>>> I'll try and revive that patch and submit it... Maybe someone else
>>>> can check whether it helps enough with performance under promotion
>>>> failure.
>>>>
>>>> -- ramki
>>>>
>>>>
>>>> On Thu, May 16, 2013 at 9:19 PM, the.6th.month at gmail.com
>>>> <the.6th.month at gmail.com> wrote:
>>>> > hi, all:
>>>> > We just had a situation that I don't quite understand with CMS gc.
>>>> > When I examined the gc log, I found that there was a cms gc which
>>>> > resulted in a parnew promotion failure and concurrent mode failure
>>>> > at the same time, and then the full gc lasted for slightly over
>>>> > three minutes. Here is the gc log:
>>>> > 2013-05-17T10:12:55.983+0800: 45168.774: [CMS-concurrent-mark: 7.056/7.860 secs] [Times: user=14.90 sys=0.45, real=7.86 secs]
>>>> > 2013-05-17T10:12:55.984+0800: 45168.775: [CMS-concurrent-preclean-start]
>>>> > 2013-05-17T10:12:56.753+0800: 45169.544: [CMS-concurrent-preclean: 0.676/0.770 secs] [Times: user=0.83 sys=0.15, real=0.77 secs]
>>>> > 2013-05-17T10:12:56.753+0800: 45169.544: [CMS-concurrent-abortable-preclean-start]
>>>> > 2013-05-17T10:12:58.460+0800: 45171.251: [GC 45171.252: [ParNew (promotion failed)
>>>> > Desired survivor size 67108864 bytes, new threshold 1 (max 6)
>>>> > - age   1:   70527216 bytes,   70527216 total
>>>> > : 917504K->917504K(917504K), 177.3558880 secs]45348.608: [CMS CMS: abort preclean due to time 2013-05-17T10:15:56.197+0800: 45348.989: [CMS-concurrent-abortable-preclean: 2.037/179.444 secs] [Times: user=44.72 sys=13.59, real=179.45 secs]
>>>> >  (concurrent mode failure): 3017323K->2177093K(3047424K), 16.5528620 secs] 3879476K->2177093K(3964928K), [CMS Perm : 91333K->91008K(262144K)], 193.9097970 secs] [Times: user=58.79 sys=13.55, real=193.91 secs]
>>>> >
>>>> > The usual cms full gc time was roughly 100ms-400ms, but this time
>>>> > it lasted for 193 seconds. I understand that when a parnew gc
>>>> > happens during cms and the to space is not large enough to hold all
>>>> > surviving objects, or the remaining space in old gen cannot cope
>>>> > with memory allocation in old gen, a full gc happens. But I don't
>>>> > understand why it hangs for so long.
>>>> > I am using oracle jdk 1.6.0_37, and the jvm options we use are:
>>>> > -Xms4000m -Xmx4000m -Xmn1G -Xss256k -XX:PermSize=256m
>>>> > -XX:SurvivorRatio=6
>>>> > -XX:MaxTenuringThreshold=6 -XX:+DisableExplicitGC -Xnoclassgc
>>>> > -Xverify:none
>>>> > -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled
>>>> > -XX:+UseCMSCompactAtFullCollection -XX:+UseFastAccessorMethods
>>>> > -XX:+CMSPermGenSweepingEnabled -XX:+CMSClassUnloadingEnabled
>>>> > -XX:+UseCompressedOops -XX:CMSInitiatingOccupancyFraction=90
>>>> > -XX:+UseCMSInitiatingOccupancyOnly
>>>> >
>>>> > Could a bug be causing the long full gc in case of promotion
>>>> > failure, or is it something else? I would really appreciate any
>>>> > help.
>>>> >
>>>> > Looking forward to any reply.
>>>> >
>>>> > All the best,
>>>> > Leon
>>>> >
>>>> >
>>>> > _______________________________________________
>>>> > hotspot-gc-use mailing list
>>>> > hotspot-gc-use at openjdk.java.net
>>>> > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
>>>> >
>>>
>>>
>>
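
For readers following the log quoted above: the statement that the unwind
of the failed promotion took 177 s can be checked by splitting the
stop-the-world record into its phases. The following is only a rough
sketch in Python, not a general GC log parser. The function name
pause_breakdown and both regular expressions are assumptions fitted to
this one record (with the mail line wrapping removed and the tenuring
lines omitted); other JDK versions and flag combinations print different
layouts.

```python
import re

# The stop-the-world record from the quoted log, mail wrapping removed.
# The two tenuring-distribution lines are omitted for brevity.
RECORD = (
    "2013-05-17T10:12:58.460+0800: 45171.251: [GC 45171.252: "
    "[ParNew (promotion failed)\n"
    ": 917504K->917504K(917504K), 177.3558880 secs]45348.608: "
    "[CMS CMS: abort preclean due to time "
    "2013-05-17T10:15:56.197+0800: 45348.989: "
    "[CMS-concurrent-abortable-preclean: 2.037/179.444 secs] "
    "[Times: user=44.72 sys=13.59, real=179.45 secs]\n"
    " (concurrent mode failure): 3017323K->2177093K(3047424K), "
    "16.5528620 secs] 3879476K->2177093K(3964928K), "
    "[CMS Perm : 91333K->91008K(262144K)], "
    "193.9097970 secs] [Times: user=58.79 sys=13.55, real=193.91 secs]"
)

# Assumption: a phase is "<before>K-><after>K(<capacity>K), <t> secs]".
PHASE = re.compile(r"\d+K->\d+K\(\d+K\), (\d+\.\d+) secs\]")
# Assumption: the overall pause follows the closing "[CMS Perm ...]" group.
TOTAL = re.compile(r"\)\], (\d+\.\d+) secs\]")


def pause_breakdown(record):
    """Return ([phase seconds, ...], total pause seconds) for one record."""
    phases = [float(s) for s in PHASE.findall(record)]
    total = float(TOTAL.search(record).group(1))
    return phases, total


phases, total = pause_breakdown(RECORD)
parnew, cms = phases  # ParNew unwind first, then the CMS full collection
print(f"ParNew (promotion failed): {parnew:.1f} s")
print(f"CMS (concurrent mode failure): {cms:.1f} s")
print(f"ParNew share of the pause: {parnew / total:.0%}")
```

On this record the two phases add up to the reported total pause to
within about a millisecond, and the ParNew phase accounts for the bulk of
it, which matches the point made in the thread that the time goes into
promotion failure handling rather than into the CMS full collection
itself.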