Parallel vs Serial Old (was Re: G1 issue: falling over to Full GC)

Srinivas Ramakrishna ysr1729 at gmail.com
Sun Nov 4 10:27:11 PST 2012


Hi Andreas --

Great to hear from you; it's been a while! Hope you are doing well!

On Sun, Nov 4, 2012 at 12:13 AM, Andreas Müller
<Andreas.Mueller at mgm-tp.com> wrote:

> Hi,
>
> I can confirm Vitaly’s observation that ParallelOldGC in many cases does
> not bring much benefit.
>
> Sometimes I saw the usr/real time ratio stay close to 1, and sometimes it
> was higher but with very little effect on the Full GC pause times.
>
> BTW, do you expect much effect with that option on a 2-CPU machine? What
> percentage range?
>

For 2 virtual CPUs, I've found it a wash. But anything above that
definitely improves average performance for whole-heap GCs, in my
experience. The occasional longer whole-heap GC can, however, make the
overall experience negative, because that pause is sometimes worse than a
serial GC pause (see remarks in my previous email).
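
If you want to try it anyway, here is a sketch of the relevant command
line; the thread count is purely illustrative, and MyApp is a placeholder
for your main class:

$ java -XX:+UseParallelOldGC -XX:ParallelGCThreads=2 -verbose:gc MyApp

UseParallelOldGC enables the parallel old collector, and ParallelGCThreads
caps its worker threads (on a 2-CPU box the default would already be small).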


>
> I also found that presentation again which claimed “-XX:+UseParallelOldGC
> (on by default with ParallelGC in JDK 6)”:
>
> http://www.austinjug.org/presentations/JDK6PerfUpdate_Dec2009.pdf
>
> It had confused me for a while because I could not get usr/real > 1
> during Full GC runs without adding -XX:+UseParallelOldGC explicitly.
>

Yes, that presentation was probably written when there was still some
debate about the default, and the change of default never made it in
because of a performance anomaly caught late in the release cycle.

I can confirm, for example, that with 7u5 ParallelOld is the default, and
with 6u29 it is not the default:


$ /usr/lib/jvm/jdk1.6.0_29/bin/java -XX:+PrintFlagsFinal -version | grep ParallelOldGC
     bool PrintParallelOldGCPhaseTimes              = false     {product}
     bool TraceParallelOldGCTasks                   = false     {product}
     bool UseParallelOldGC                          = false     {product}
     bool UseParallelOldGCCompacting                = true      {product}
     bool UseParallelOldGCDensePrefix               = true      {product}
java version "1.6.0_29"
Java(TM) SE Runtime Environment (build 1.6.0_29-b11)
Java HotSpot(TM) 64-Bit Server VM (build 20.4-b02, mixed mode)

$ /usr/lib/jvm/jdk1.7.0_05/bin/java -XX:+PrintFlagsFinal -version | grep ParallelOldGC
     bool PrintParallelOldGCPhaseTimes              = false     {product}
     bool TraceParallelOldGCTasks                   = false     {product}
     bool UseParallelOldGC                          = true      {product}
java version "1.7.0_05"
Java(TM) SE Runtime Environment (build 1.7.0_05-b06)
Java HotSpot(TM) 64-Bit Server VM (build 23.1-b03, mixed mode)

best regards.
-- ramki


>
> Best regards
>
> Andreas
>
> *From:* Srinivas Ramakrishna [mailto:ysr1729 at gmail.com]
> *Sent:* Saturday, November 3, 2012 08:04
> *To:* Vitaly Davidovich
> *Cc:* Charlie Hunt; Andreas Müller; hotspot-gc-use; Simone Bordet
> *Subject:* Parallel vs Serial Old (was Re: G1 issue: falling over to Full
> GC)
>
>
> [Edited subject line to show the actual subject of discussion in the last
> few emails in the thread]
>
> One issue I have found with ParallelOld vs Serial for sufficiently large
> heaps is that if there are large oop-rich objects,
> the deferred-updates phase, which is single-threaded and slow, greatly
> dominates the pause time. There's discussion of this
> in an earlier thread (late last year or early this year), and I promised
> to work on a patch although I never got around to it. We partially
> worked around it by preventing full compaction (i.e. compaction below
> the dense prefix; a sketch of the flags involved is below), but that
> doesn't work for all cases, for instance when an application churns
> large oop-rich objects (i.e. object arrays) through the old generation.
> I don't know if a CR was filed tracking that sighting and discussion.
>
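> A sketch, in case anyone wants to experiment with delaying maximum
> compaction; I'm recalling the flag names from memory (verify them with
> -XX:+PrintFlagsFinal), the values are illustrative only, and MyApp is a
> placeholder:
>
> $ java -XX:+UseParallelOldGC \
>        -XX:HeapMaximumCompactionInterval=1000000 \
>        -XX:HeapFirstMaximumCompactionCount=1000000 \
>        MyApp
>
> If I remember right, these control how many full GCs go by before
> ParallelOld performs a maximum compaction, i.e. one that leaves no dead
> space below the dense prefix.
>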
> Other than those anomalies, I have usually seen user/elapsed time ratios
> of 10-12 using 18 worker threads in
> the cases I recall. That does not, however, mean a speedup of 10-12x
> versus serial. More like 5-6x. YMMV, of course.
>
> -- ramki
>
> On Fri, Nov 2, 2012 at 4:29 PM, Vitaly Davidovich <vitalyd at gmail.com>
> wrote:
>
> To be honest, I didn't dig in yet, as I got the setup running in our plant
> towards the end of the day and only casually looked at basic GC timestamps
> for the full GCs.
>
> We do use some weak refs (no soft/phantom though), but I wouldn't call it
> heavy (or even medium for that matter). However, I'd have to look at what
> GC reports, as you mention, to make sure, but I'm pretty confident that
> it's not heavy. :)
>
> The server is dedicated to this sole Java process, and nothing else of
> significance (mem or cpu) is running on there.
>
> I'll try to investigate next week to see if anything sticks out. Regular
> old GC is sufficient for my use case now, so I'm merely trying to see if I
> can get some really cheap gains purely by enabling the parallel collector.
> :)
>
> Generally speaking though, what sort of (ballpark) speedup is expected for
> parallel old vs single-threaded? Let's say on a machine with a modest CPU
> count (8-16 hardware threads). I'd imagine any contention would
> significantly reduce the speedup factor for hugely parallel machines, but
> I'm curious about the modest space. Are there any known issues/scenarios
> that would nullify its benefit, other than what you've already mentioned?
>
> Thanks for all the advice and info.
>
> Sent from my phone
>
> On Nov 2, 2012 7:14 PM, "Charlie Hunt" <chunt at salesforce.com> wrote:
>
> Do you have GC logs you could share?
>
> We probably are gonna need more info on what's going on within
> ParallelOld. We might get some additional info from
> +PrintGCTaskTimeStamps or +PrintParallelOldGCPhaseTimes. I don't recall
> how intrusive they are, though. If you've got a lot of threads, we'll
> probably get a lot of data too. But hopefully there's something in there
> that lends a clue as to the issue. If there's contention, that suggests to
> me some contention in work stealing. IIRC, there's a way to get
> work-stealing info with +UseParallelOldGC, but my mind is drawing
> a blank. :-|
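>
> Something along these lines, say (the logging flags are as I remember
> them, so double-check against -XX:+PrintFlagsFinal on your build; MyApp
> is a placeholder):
>
> $ java -XX:+UseParallelOldGC -XX:+PrintGCDetails \
>        -XX:+PrintGCTaskTimeStamps -XX:+PrintParallelOldGCPhaseTimes \
>        -Xloggc:gc.log MyApp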
>
> Just off the top of my head, do you know if this app makes heavy use of
> Reference objects, i.e. < Weak | Soft | Phantom | Final > References?
>
> Adding +PrintReferenceGC will tell us what kind of overhead you're
> experiencing with reference processing. If you're seeing high values for
> reference processing, then you'll probably want to add
> -XX:+ParallelRefProcEnabled.
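>
> For example (an illustrative command line; MyApp is a placeholder):
>
> $ java -XX:+UseParallelOldGC -XX:+PrintGCDetails -XX:+PrintReferenceGC \
>        -XX:+ParallelRefProcEnabled -Xloggc:gc.log MyApp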
>
> I'd look at reference processing first, before looking at
> +PrintParallelOldGCPhaseTimes or +PrintGCTaskTimeStamps.
>
> Ooh, another thought: are there other Java apps running on the same
> system? If so, how many GC threads and application threads tend to be
> active at any given time?
>
> hths,
>
> charlie ...
>
> On Nov 2, 2012, at 5:42 PM, Vitaly Davidovich wrote:
>
> Thanks Charlie. At a quick glance, I didn't see it benefit my case today
> (~5 GB old): wall-clock time was roughly the same as single-threaded, but
> user time was quite high (7 secs wall, 37 secs user). This is on an 8-way
> Xeon Linux server.
>
> I seem to vaguely recall reading that parallel old sometimes performs
> worse than single-threaded old in some cases, perhaps due to some
> contention between GC threads.
>
> Anyway, I'll keep monitoring.
>
> Thanks
>
> Sent from my phone
>
> On Nov 2, 2012 10:15 AM, "Charlie Hunt" <chunt at salesforce.com> wrote:
>
> Yes, I'd recommend +UseParallelOldGC on 6u23 even though it's not
> auto-enabled.
>
> hths,
>
> charlie ...
>
> On Nov 2, 2012, at 8:04 AM, Vitaly Davidovich wrote:
>
> Hi Charlie,
>
> Out of curiosity, is UseParallelOldGC advisable on, say, 6u23? It's off by
> default, as you say, until 7u4, so I'm unsure whether that's for some
> good/specific reason or not.
>
> Thanks
>
> Sent from my phone
>
> On Nov 2, 2012 8:36 AM, "Charlie Hunt" <chunt at salesforce.com> wrote:
>
> Jumping in a bit late ...
>
> I strongly suggest that anyone evaluating G1 not use anything prior to
> 7u4. Even better, use (as of this writing) 7u9, or the latest production
> Java 7 HotSpot VM.
>
> Fwiw, I'm really liking what I am seeing in 7u9, with the exception of one
> issue (http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7143858),
> which is currently slated to be back-ported to a future Java 7 update
> (thanks to Monica, John Cuthbertson and Bengt for tackling this!).
>
> From looking at your observations and others' comments thus far, my
> initial reaction is that with a 1G Java heap, you might get the best
> results with -XX:+UseParallelOldGC. Are you using -XX:+UseParallelGC or
> -XX:+UseParallelOldGC? Or are you not setting a GC at all? Not until 7u4
> is -XX:+UseParallelOldGC automatically set for what's called "server
> class" machines when you don't specify a GC.
>
> The lengthy concurrent mark could be the result of the implementation of
> G1 in 6u*, or it could be that your system is swapping. Could you check
> whether your system is swapping? On Solaris you can monitor this using
> vmstat, observing not just free memory but also sr (scan rate) along with
> pi (page-ins) and po (page-outs). Seeing sr (page scan activity) together
> with low free memory and pi & po activity is a strong suggestion of
> swapping. Seeing low free memory and no sr activity is OK, i.e. no
> swapping.
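>
> For example (the numbers below are made up, just to show the columns):
>
> $ vmstat 5
>  kthr      memory            page
>  r b w   swap  free  re  mf pi po fr de sr  ...
>  0 0 0 842184 41936   2  14  0  0  0  0  0  ...
>
> A persistently non-zero sr together with low free memory is the red flag.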
>
> Additionally, you are right: "partial" was changed to "mixed" in the GC
> logs. For those interested in a bit of history, this change was made
> because we felt "partial" was misleading. What "partial" was intended to
> mean was a partial old-gen collection, which did occur. But that same GC
> event also included a young-gen GC. As a result, we changed the GC event
> name to "mixed", since that GC event was really a combination of a
> young-gen GC and a portion of an old-gen GC.
>
> Simone also has a good suggestion: include -XX:+PrintFlagsFinal and
> -showversion as part of the GC log data you collect, especially with G1
> continuing to improve and evolve.
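>
> In other words, something like this (the GC-logging flags here are
> illustrative, so adjust to your setup; MyApp is a placeholder):
>
> $ java -showversion -XX:+PrintFlagsFinal -XX:+UseG1GC \
>        -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:gc.log MyApp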
>
> Look forward to seeing your GC logs!
>
> hths,
>
> charlie ....
>
> On Nov 2, 2012, at 5:46 AM, Andreas Müller wrote:
>
> > Hi Simone,
> >
> >> 4972.437: [GC pause (partial), 1.89505180 secs]
> >> that I cannot decipher (to Monica - what does "partial" mean?), and no
> >> mixed GCs, which seems unusual as well.
> > Oops, I understand that now: 'partial' used to be what 'mixed' is now!
> > Our portal usually runs on Java 6u33. For the G1 tests I switched to 7u7
> > because I had learned that G1 is far from mature in 6u33.
> > But automatic deployments can overwrite the start script and thus switch
> > back to 6u33.
> >
> >> Are you sure you are actually using 1.7.0_u7?
> > I have checked that in the archived start scripts and the result,
> > unfortunately, is: no.
> > The 'good case' was actually running on 7u7 (that's why it was good),
> > but the 'bad case' was unwittingly run on 6u33 again.
> > That's the true reason why the results were so much worse and so
> > incomprehensible.
> > Thank you very much for looking at the log and for asking good questions!
> >
> > I'll try to repeat the test and post the results on this list.
> >
> > Regards
> > Andreas