Re: Parallel vs Serial Old (was Re: G1 issue: falling over to Full GC)

Andreas Müller Andreas.Mueller at mgm-tp.com
Sun Nov 4 00:13:34 PDT 2012


Hi,

I can confirm Vitaly's observation that ParallelOldGC in many cases does not bring much benefit.
Sometimes the usr/real time ratio stayed close to 1, and sometimes it was higher but with very little effect on the Full GC pause times.
BTW, do you expect much effect from that option on a 2-CPU machine? In what percentage range?

I also found the presentation again that claims "-XX:+UseParallelOldGC (on by default with ParallelGC in JDK 6)":
http://www.austinjug.org/presentations/JDK6PerfUpdate_Dec2009.pdf
It had confused me for a while, because I could not get usr/real > 1 during Full GC runs without adding -XX:+UseParallelOldGC explicitly.
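
For anyone who wants to check what a running JVM actually ended up with, rather than relying on documentation, here is a minimal sketch that reads the flag through the HotSpot diagnostic MXBean. It assumes a HotSpot JVM on Java 7 or later (ManagementFactory.getPlatformMXBean is not available in Java 6); the class and method names are made up for illustration. JVMs that do not know a flag reject it, so that case is handled explicitly.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import com.sun.management.VMOption;
import java.lang.management.ManagementFactory;

// Hypothetical helper: reports the value and origin of a HotSpot flag.
class ParallelOldCheck {

    static String describe(String flag) {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        if (bean == null) {
            return flag + ": no HotSpot diagnostic bean available";
        }
        try {
            VMOption opt = bean.getVMOption(flag);
            // The origin tells whether the value is a default or was set some other way.
            return flag + "=" + opt.getValue() + " (origin " + opt.getOrigin() + ")";
        } catch (IllegalArgumentException e) {
            // Thrown when the running JVM has no option with this name.
            return flag + ": unknown to this JVM";
        }
    }

    public static void main(String[] args) {
        System.out.println(describe("UseParallelGC"));
        System.out.println(describe("UseParallelOldGC"));
    }
}
```

Running it with the same GC options as the application should show whether the old-generation flag was really enabled or left at its default.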

Best regards
Andreas

From: Srinivas Ramakrishna [mailto:ysr1729 at gmail.com]
Sent: Saturday, November 3, 2012 08:04
To: Vitaly Davidovich
Cc: Charlie Hunt; Andreas Müller; hotspot-gc-use; Simone Bordet
Subject: Parallel vs Serial Old (was Re: G1 issue: falling over to Full GC)

[Edited subject line to show actual subject of discussion in last few emails in the thread]

One issue I have found with ParallelOld vs Serial for sufficiently large heaps is that if there are large oop-rich objects,
the deferred updates phase, which is single-threaded and slow, dominates the pause time. There's discussion of this
in an earlier thread (late last year or early this year), and I promised to work on a patch, although I never got around to it. We partially
worked around it by preventing full compaction (i.e. compaction below the dense prefix), but that doesn't work in all cases,
for instance when an application churns large oop-rich objects (i.e. object arrays) through the old generation.
I don't know whether a CR was filed to track that sighting and discussion.

Other than those anomalies, I have usually seen user/elapsed time ratios of 10-12 using 18 worker threads in
the cases I recall. That does not, however, mean a speedup of 10-12 versus serial. More like 5-6x. YMMV of course.

-- ramki
On Fri, Nov 2, 2012 at 4:29 PM, Vitaly Davidovich <vitalyd at gmail.com> wrote:

To be honest, I didn't dig in yet as I got the set up running in our plant towards the end of the day, and only casually looked at basic GC timestamps for the full GCs.

We do use some weak refs (no soft/phantom though), but I wouldn't call it heavy (or even medium), for that matter. I'd have to look at what GC reports, as you mention, to make sure, but I'm pretty confident that it's not heavy. :)

The server is dedicated to this sole java process, and nothing else of significance (mem or cpu) is running on there.

I'll try to investigate next week to see if anything sticks out.  Regular old GC is sufficient for my use case now, so I'm merely trying to see if I can get some really cheap gains purely by enabling the parallel collector. :)

Generally speaking though, what sort of (ballpark) speedup is expected for parallel old vs single threaded? Let's say on a machine with a modest CPU count (8-16 hardware threads).  I'd imagine any contention would significantly reduce the speedup factor for hugely parallel machines, but curious about the modest space.  Are there any known issues/scenarios that would nullify its benefit, other than what you've already mentioned?

Thanks for all the advice and info.

Sent from my phone
On Nov 2, 2012 7:14 PM, "Charlie Hunt" <chunt at salesforce.com> wrote:
Do you have GC logs you could share?

We probably are gonna need more info on what's going on within ParallelOld. We might get some additional info from +PrintGCTaskTimeStamps or +PrintParallelOldGCPhaseTimes. I don't recall how intrusive they are, though. If you've got a lot of threads, we'll probably get a lot of data too. But hopefully there's something in there that gives a clue as to the issue. If there's contention, that suggests to me some contention in work stealing. IIRC, there's a way to get work stealing info in ParallelOld GC, but my mind is drawing a blank. :-|

Just off the top of my head, do you know if this app makes heavy use of Reference objects, i.e. < Weak | Soft | Phantom | Final > References?

Adding +PrintReferenceGC will tell us what kind of overhead you're experiencing with reference processing. If you're seeing high reference processing times, then you'll probably want to add -XX:+ParallelRefProcEnabled.

I'd look at reference processing first before looking at the +PrintParallelOldGCPhaseTimes or +PrintGCTaskTimeStamps.

Ooh, another thought, are there other Java apps running on the same system?  If so, how many GC threads and application threads tend to be active at any given time?

hths,

charlie ...

On Nov 2, 2012, at 5:42 PM, Vitaly Davidovich wrote:



Thanks Charlie. At a quick glance, I didn't see it benefit my case today (~5gb old): wall clock time was roughly the same as single-threaded, but user time was quite high (7 secs wall, 37 secs user). This is on an 8-way Xeon Linux server.
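
The wall-versus-user comparison above can be computed from the "[Times: user=... sys=..., real=... secs]" entry that HotSpot appends to GC log lines. This is a minimal sketch assuming that log format. The sample line is constructed from the 7 s wall / 37 s user figures quoted above (the sys value is made up), the 7 s serial time stands in for "roughly the same", and the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for comparing CPU time and wall time of one GC event.
class GcTimes {
    private static final Pattern TIMES = Pattern.compile(
            "\\[Times: user=([0-9.]+) sys=([0-9.]+), real=([0-9.]+) secs\\]");

    final double user;
    final double sys;
    final double real;

    GcTimes(double user, double sys, double real) {
        this.user = user;
        this.sys = sys;
        this.real = real;
    }

    // Extracts the first [Times: ...] entry found in a GC log line.
    static GcTimes parse(String line) {
        Matcher m = TIMES.matcher(line);
        if (!m.find()) {
            throw new IllegalArgumentException("no [Times: ...] entry in: " + line);
        }
        return new GcTimes(Double.parseDouble(m.group(1)),
                           Double.parseDouble(m.group(2)),
                           Double.parseDouble(m.group(3)));
    }

    // usr/real: how many cores' worth of CPU the collection kept busy on average.
    double userToRealRatio() {
        return user / real;
    }

    // Actual speedup needs a serial measurement of the same collection.
    static double speedup(double serialRealSecs, double parallelRealSecs) {
        return serialRealSecs / parallelRealSecs;
    }

    public static void main(String[] args) {
        // Constructed sample line, not taken from a real log.
        GcTimes t = parse("[Times: user=37.00 sys=0.00, real=7.00 secs]");
        System.out.printf("usr/real = %.2f%n", t.userToRealRatio());
        System.out.printf("speedup vs. serial = %.2f%n", speedup(7.0, t.real));
    }
}
```

A usr/real ratio well above 1 only shows that several GC threads were busy. Whether the pause got shorter shows up only in the second number, which is why the two can diverge as described in this thread.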

I seem to vaguely recall reading that parallel old performs worse than single-threaded old in some cases, perhaps due to contention between GC threads.

Anyway, I'll keep monitoring though.

Thanks

Sent from my phone
On Nov 2, 2012 10:15 AM, "Charlie Hunt" <chunt at salesforce.com> wrote:
Yes, I'd recommend +UseParallelOldGC on 6u23 even though it's not auto-enabled.

hths,

charlie ...

On Nov 2, 2012, at 8:04 AM, Vitaly Davidovich wrote:



Hi Charlie,

Out of curiosity, is UseParallelOldGC advisable on, say, 6u23? It's off by default, as you say, until 7u4 so I'm unsure if that's for some good/specific reason or not.

Thanks

Sent from my phone
On Nov 2, 2012 8:36 AM, "Charlie Hunt" <chunt at salesforce.com> wrote:
Jumping in a bit late ...

I strongly suggest that anyone evaluating G1 not use anything prior to 7u4. Even better, use (as of this writing) 7u9, or the latest production Java 7 HotSpot VM.

Fwiw, I'm really liking what I am seeing in 7u9, with the exception of one issue (http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7143858), which is currently slated to be backported to a future Java 7 (thanks to Monica, John Cuthbertson and Bengt for tackling this!).

From looking at your observations and others' comments thus far, my initial reaction is that with a 1G Java heap, you might get the best results with -XX:+UseParallelOldGC. Are you using -XX:+UseParallelGC, or -XX:+UseParallelOldGC? Or are you not setting a GC? Not until 7u4 is -XX:+UseParallelOldGC automatically set for what's called "server class" machines when you don't specify a GC.

The lengthy concurrent mark could be the result of the implementation of G1 in 6u*, or it could be that your system is swapping. Could you check if your system is swapping? On Solaris you can monitor this using vmstat and observing not just free memory, but also sr == scan rate along with pi == page in and po == page out. Seeing sr (page scan activity) together with low free memory and pi & po activity strongly suggests swapping. Seeing low free memory and no sr activity is ok, i.e. no swapping.
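
The vmstat rule of thumb above can be written down as a small decision function. This is only a sketch of that heuristic: the threshold for "low" free memory is a caller-supplied assumption, "pi & po activity" is read here as either counter being non-zero, and all names are hypothetical. The numbers would come from a single vmstat sample.

```java
// Hypothetical encoding of the swapping heuristic described above.
class SwapCheck {
    enum Verdict { LIKELY_SWAPPING, OK, INCONCLUSIVE }

    // freeKb, sr, pi, po: values from one vmstat sample.
    // lowFreeKb: what counts as "low free memory" (an assumption, pick per system).
    static Verdict assess(long freeKb, long lowFreeKb, long sr, long pi, long po) {
        boolean lowFree = freeKb < lowFreeKb;
        boolean scanning = sr > 0;
        boolean paging = pi > 0 || po > 0;

        if (lowFree && scanning && paging) {
            return Verdict.LIKELY_SWAPPING; // all three signals together
        }
        if (!scanning) {
            return Verdict.OK; // no page scanning, even if free memory is low
        }
        return Verdict.INCONCLUSIVE; // scanning without the other signals
    }

    public static void main(String[] args) {
        // Made-up sample values, for illustration only.
        System.out.println(assess(50_000, 100_000, 120, 30, 45));
        System.out.println(assess(50_000, 100_000, 0, 0, 0));
    }
}
```

One sample proves little; applying the check over a run of vmstat intervals that covers the concurrent mark would be more convincing.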

Additionally, you are right. "partial" was changed to "mixed" in the GC logs. For those interested in a bit of history: this change was made since we felt "partial" was misleading. What partial was intended to mean was a partial old gen collection, which did occur. But that same GC event also included a young gen GC. As a result, we changed the GC event name to "mixed", since that GC event was really a combination of both a young gen GC and a portion of an old gen GC.

Simone also has a good suggestion with including -XX:+PrintFlagsFinal and -showversion as part of the GC log data to collect, especially with G1 continuing to improve and evolve.

Look forward to seeing your GC logs!

hths,

charlie ....

On Nov 2, 2012, at 5:46 AM, Andreas Müller wrote:

> Hi Simone,
>
>> 4972.437: [GC pause (partial), 1.89505180 secs]
>> that I cannot decipher (to Monica - what does "partial" mean?), and no mixed GCs, which seems unusual as well.
> Oops, I understand that now: 'partial' used to be what 'mixed' is now!
> Our portal usually runs on Java 6u33. For the G1 tests I switched to 7u7 because I had learned that G1 is far from mature in 6u33.
> But automatic deployments can overwrite the start script and thus switch back to 6u33.
>
>> Are you sure you are actually using 1.7.0_u7 ?
> I have checked that in the archived start scripts and the result, unfortunately, is: no.
> The 'good case' was actually running on 7u7 (that's why it was good), but the 'bad case' was unwittingly run on 6u33 again.
> That's the true reason why the results were so much worse and so incomprehensible.
> Thank you very much for looking at the log and for asking good questions!
>
> I'll try to repeat the test and post the results on this list.
>
> Regards
> Andreas
> _______________________________________________
> hotspot-gc-use mailing list
> hotspot-gc-use at openjdk.java.net
> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
