Does UseParallelOldGC guarantee better full GC performance?
Srinivas Ramakrishna
ysr1729 at gmail.com
Fri Apr 20 16:20:21 PDT 2012
Hi Leon -- (sorry for overloading standard replicated-database terminology
here, which may have confused you.)
Here's the relevant explanation from Peter Kessler:
http://markmail.org/message/fhoffb4ksczxk26q
The URL also contains the discussion earlier this year on this list that I
had alluded to before.
-- ramki
On Fri, Apr 20, 2012 at 8:01 AM, the.6th.month at gmail.com <the.6th.month at gmail.com> wrote:
> Hi, Srinivas:
> Can you explain more about "since in general the incidence of the deferred
> updates phase may be affected by the number and size of the deferred
> objects and their oop-richness"? I don't quite understand what it means,
> and if it isn't too much trouble, could you possibly give some explanation
> of what a deferred object is?
> Thanks a million.
>
> All the best,
> Leon
>
>
> On 20 April 2012 17:44, Srinivas Ramakrishna <ysr1729 at gmail.com> wrote:
>
>> BTW, maximal compaction doesn't happen every time; I think it happens on
>> the 4th gc and then every 20th gc or so.
>> It's those occasional gc's that would be impacted. (And that had been our
>> experience: generally good performance,
>> but the occasional much slower pause. I don't know if your experience is
>> similar.)
>>
>> No, I don't think excessive deadwood is an issue. What is an issue is how
>> well this keeps up,
>> since in general the incidence of the deferred updates phase may be
>> affected by the number and
>> size of the deferred objects and their oop-richness; so I am not sure how
>> good a mitigant
>> avoiding maximal compaction is for long-lived JVMs with churn of large
>> objects in the old
>> gen.
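>>
>> (If I remember right, that cadence is governed by the product flags
>> HeapFirstMaximumCompactionCount, default 3, and
>> HeapMaximumCompactionInterval, default 20 -- which lines up with the
>> "4th gc, then every 20th or so" behavior above. Worth double-checking
>> against the sources for your JDK before relying on it.)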
>>
>> -- ramki
>>
>>
>> On Thu, Apr 19, 2012 at 1:51 AM, the.6th.month at gmail.com <the.6th.month at gmail.com> wrote:
>>
>>> hi, Srinivas:
>>> That explains it. I do observe that no performance gain has been obtained
>>> through par old gc, judging by the JMX mark_sweep_time metric (I have a
>>> monitoring system collecting that and printing it out with rrdtool).
>>> Hopefully that's the result of maximum compaction, but I am keen to ask
>>> whether it will bring about any negative impact on performance, like
>>> leaving lots of fragmentation unreclaimed.
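>>>
>>> (For reference, this is roughly how I pull that number: a minimal sketch
>>> via the standard java.lang.management API. The bean name "PS MarkSweep"
>>> is, I believe, what HotSpot registers for the parallel collector's full
>>> collections:
>>>
>>> import java.lang.management.GarbageCollectorMXBean;
>>> import java.lang.management.ManagementFactory;
>>>
>>> public class GcTimes {
>>>     public static void main(String[] args) {
>>>         for (GarbageCollectorMXBean gc :
>>>                 ManagementFactory.getGarbageCollectorMXBeans()) {
>>>             long count = gc.getCollectionCount();
>>>             long totalMs = gc.getCollectionTime();
>>>             // "PS MarkSweep" covers the full (old-gen) collections;
>>>             // "PS Scavenge" covers the young-gen collections.
>>>             System.out.printf("%s: count=%d total=%dms avg=%.1fms%n",
>>>                     gc.getName(), count, totalMs,
>>>                     count == 0 ? 0.0 : (double) totalMs / count);
>>>         }
>>>     }
>>> }
>>> )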
>>>
>>> all the best
>>> Leon
>>> On Apr 19, 2012 4:07 AM, "Srinivas Ramakrishna" <ysr1729 at gmail.com>
>>> wrote:
>>>
>>>>
>>>>
>>>> On Wed, Apr 18, 2012 at 10:36 AM, Jon Masamitsu <jon.masamitsu at oracle.com> wrote:
>>>>
>>>>> Leon,
>>>>>
>>>>> I don't think I've actually seen logs with the same flags except
>>>>> changing parallel old for serial old, so it's hard for me to say.
>>>>> Simon's comment
>>>>>
>>>>> > Well, maybe. But it shows that the parallel collector does its work,
>>>>> > since you had a 41.91/13.06 = 3.2x gain on your 4 cores.
>>>>>
>>>>
>>>> I think Simon's "speed-up" is a bit misleading. He shows that the
>>>> wall-clock time of 13.06 s does user-time-equivalent work worth 41.91
>>>> seconds, so indeed a lot of user-level work is done in those 13.06
>>>> seconds. I'd call that "intrinsic parallelism" rather than speed-up.
>>>> It's a misleading way to define speed-up because (for all the user
>>>> cares about) all of that parallel work may be overhead of the parallel
>>>> algorithm, so that the bottom-line speed-up disappears. Rather, Simon
>>>> and Leon, you want to compare the wall-clock pause time seen with
>>>> parallel old against that seen with serial old (which I believe is
>>>> what Leon was referring to); that is how speed-up should be defined
>>>> when comparing a parallel algorithm with its serial counterpart.
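>>>>
>>>> To put numbers on that distinction, using only figures already in this
>>>> thread: user/real = 41.91/13.06 ~ 3.2 measures parallel work performed,
>>>> not speed-up. Speed-up = (serial old pause) / (parallel old pause); if
>>>> both pauses are ~6.6 s, the speed-up is ~1.0 despite the 3.2x
>>>> parallelism. Note also that Leon's full-gc log line reports user=6.62 s
>>>> against real=6.61 s, a ratio of ~1.0, i.e. that collection ran
>>>> essentially single-threaded (consistent with the "PSOldGen:" marker,
>>>> the serial collector).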
>>>>
>>>> Leon, in the past we observed (and you will likely find some discussion
>>>> in the archives) that a particular phase called the "deferred updates"
>>>> phase was taking the bulk of the time when we encountered longer pauses
>>>> with parallel old. That phase does its work single-threaded and so
>>>> exhibits lower parallelism. Typically, but not always, this would
>>>> happen during the full gc pauses in which maximal compaction was forced.
>>>> (By default that is done during the first few full collections and
>>>> every 20th or so thereafter.) We worked around it by turning off
>>>> maximal compaction and leaving the dense prefix alone.
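>>>>
>>>> (Concretely -- and treat this as a sketch to verify against your own
>>>> build rather than a recipe -- the workaround amounts to pushing the
>>>> maximal-compaction triggers out of reach, e.g. something like:
>>>>
>>>>   -XX:+UseParallelOldGC
>>>>   -XX:HeapFirstMaximumCompactionCount=<very large value>
>>>>   -XX:HeapMaximumCompactionInterval=<very large value>
>>>>
>>>> so that every full gc uses the dense-prefix policy instead.)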
>>>>
>>>> I believe a bug may have been filed following that discussion and it
>>>> had been my intention to
>>>> try and fix it (per discussion on the list). Unfortunately, other
>>>> matters intervened and I was
>>>> unable to get back to that work.
>>>>
>>>> PrintParallelGC{Task,Phase}Times (I think) will give you more
>>>> visibility into the various phases etc., and might help you diagnose
>>>> the performance issue.
>>>>
>>>> -- ramki
>>>>
>>>>
>>>>> says there is a parallel speed-up, however, so I'll let you
>>>>> investigate your application and leave it at that.
>>>>>
>>>>> Jon
>>>>>
>>>>>
>>>>> On 4/18/2012 9:27 AM, the.6th.month at gmail.com wrote:
>>>>> > Hi, Jon,
>>>>> > Yup, I know, but what is weird is that ParOldGen doesn't bring
>>>>> > better full gc performance as seen from the JMX metrics, yet does
>>>>> > bring unexpected swap consumption.
>>>>> > I am going to look into my application instead for some inspiration.
>>>>> >
>>>>> > Leon
>>>>> >
>>>>> > On 19 April 2012 00:19, Jon Masamitsu <jon.masamitsu at oracle.com> wrote:
>>>>> >
>>>>> >> Leon,
>>>>> >>
>>>>> >> In this log you see "PSOldGen:" as part of an entry, which says
>>>>> >> you're using the serial mark-sweep collector. Your later posts show
>>>>> >> "ParOldGen:" in the log; that is the parallel mark-sweep collector.
>>>>> >>
>>>>> >> Jon
>>>>> >>
>>>>> >>
>>>>> >> On 4/18/2012 1:58 AM, the.6th.month at gmail.com wrote:
>>>>> >>
>>>>> >> Hi, Simon:
>>>>> >>
>>>>> >> here is the full gc log you asked about.
>>>>> >> 2012-04-18T16:47:24.824+0800: 988.392: [GC
>>>>> >> Desired survivor size 14876672 bytes, new threshold 1 (max 15)
>>>>> >> [PSYoungGen: 236288K->8126K(247616K)] 4054802K->3830711K(4081472K), 0.0512250 secs] [Times: user=0.15 sys=0.00, real=0.05 secs]
>>>>> >>
>>>>> >> 2012-04-18T16:47:24.875+0800: 988.443: [Full GC [PSYoungGen: 8126K->0K(247616K)] [PSOldGen: 3822585K->1751429K(3833856K)] 3830711K->1751429K(4081472K) [PSPermGen: 81721K->81721K(262144K)], 6.6108630 secs] [Times: user=6.62 sys=0.00, real=6.61 secs]
>>>>> >>
>>>>> >> The full gc time is almost unchanged since I enabled ParallelOldGC.
>>>>> >>
>>>>> >> Do you have any recommendation for an appropriate young gen size?
>>>>> >>
>>>>> >> Thanks
>>>>> >>
>>>>> >> All the best,
>>>>> >> Leon
>>>>> >>
>>>>> >>
>>>>> >> On 18 April 2012 16:24, Simone Bordet <sbordet at intalio.com> wrote:
>>>>> >>
>>>>> >>
>>>>> >> Hi,
>>>>> >>
>>>>> >> On Wed, Apr 18, 2012 at 10:16, the.6th.month at gmail.com <the.6th.month at gmail.com> wrote:
>>>>> >>
>>>>> >> hi all:
>>>>> >> I'm currently using jdk 6u26. I just enabled UseParallelOldGC,
>>>>> >> expecting that it would improve full gc efficiency and decrease the
>>>>> >> mark-sweep time by using multiple cores. The JAVA_OPTS is as below:
>>>>> >> -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintTenuringDistribution
>>>>> >> -Xloggc:gc.log-server -Xms4000m -Xmx4000m -Xss256k -Xmn256m
>>>>> >> -XX:PermSize=256m -XX:+UseParallelOldGC -server
>>>>> >> -Dorg.apache.jasper.compiler.Parser.STRICT_QUOTE_ESCAPING=false
>>>>> >> As shown in the jinfo output, the settings have taken effect, and
>>>>> >> ParallelGCThreads is 4 since the jvm is running on a four-core
>>>>> >> server. But what's strange is that the mark-sweep time remains
>>>>> >> almost unchanged (at around 6-8 seconds). Am I missing something
>>>>> >> here? Does anyone have the same experience, or any idea about the
>>>>> >> reason?
>>>>> >> Thanks very much for the help.
>>>>> >>
>>>>> >> The young generation is fairly small for a 4GiB heap.
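>>>>> >>
>>>>> >> (As a rough, purely illustrative starting point -- not a
>>>>> >> recommendation for your workload -- one might try something like
>>>>> >> -Xmn800m or -Xmn1g on a 4 GiB heap and then re-measure; the right
>>>>> >> value depends on your allocation and promotion rates.)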
>>>>> >>
>>>>> >> Can we see the lines you mention from the logs?
>>>>> >>
>>>>> >> Simon
>>>>> >> --
>>>>> >> http://cometd.org
>>>>> >> http://intalio.com
>>>>> >> http://bordet.blogspot.com
>>>>> >> ----
>>>>> >> Finally, no matter how good the architecture and design are,
>>>>> >> to deliver bug-free software with optimal performance and
>>>>> reliability,
>>>>> >> the implementation technique must be flawless. Victoria Livschitz
>>>>
>>>>
>>
>