Parallel vs Serial Old (was Re: G1 issue: falling over to Full GC)

Sat Nov 3 00:03:30 PDT 2012

[Edited subject line to show actual subject of discussion in last few
emails in the thread]

One issue I have found with ParallelOld vs Serial on sufficiently large
heaps is that, when there are large oop-rich objects, the deferred updates
phase, which is single-threaded and slow, dominates the pause time. This
was discussed in an earlier thread (late last year or early this year), and
I promised to work on a patch but never got around to it. We partially
worked around it by preventing full compaction (i.e. compaction below the
dense prefix), but that doesn't work in all cases, for instance when an
application churns large oop-rich objects (i.e. object arrays) through the
old generation.
I don't know whether a CR was filed to track that sighting and discussion.

Other than those anomalies, I have usually seen user/elapsed time ratios of
10-12 using 18 worker threads in the cases I recall. That does not, however,
mean a speedup of 10-12x versus serial. More like 5-6x. YMMV of course.
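
For anyone who wants to pull that user/elapsed ratio out of their own logs,
here is a minimal sketch. It assumes the "[Times: user=... sys=..., real=...
secs]" suffix that -XX:+PrintGCDetails appends to each collection. The class
name GcTimes and its methods are made up for illustration and are not part of
any JDK tool. The sample line uses the 37 sec user / 7 sec wall figures
reported further down in this thread; the sys value was not reported, so the
0.00 is a placeholder. Keep in mind that the ratio only says how many GC
threads were busy on average, not how much faster the pause is than serial.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of HotSpot or the JDK.
final class GcTimes {
    // Matches the "[Times: user=37.00 sys=0.00, real=7.00 secs]" suffix
    // printed with -XX:+PrintGCDetails.
    private static final Pattern TIMES = Pattern.compile(
        "\\[Times: user=([0-9.]+) sys=([0-9.]+), real=([0-9.]+) secs\\]");

    final double user;  // CPU seconds in user mode, summed over all threads
    final double sys;   // CPU seconds in kernel mode, summed over all threads
    final double real;  // elapsed wall-clock seconds

    GcTimes(double user, double sys, double real) {
        this.user = user;
        this.sys = sys;
        this.real = real;
    }

    // Returns the parsed times, or null if the line has no [Times: ...] part.
    static GcTimes parse(String logLine) {
        Matcher m = TIMES.matcher(logLine);
        if (!m.find()) {
            return null;
        }
        return new GcTimes(Double.parseDouble(m.group(1)),
                           Double.parseDouble(m.group(2)),
                           Double.parseDouble(m.group(3)));
    }

    // Average number of GC threads kept busy in user mode during the pause.
    // Note: yields Infinity or NaN when real is logged as 0.00.
    double userToElapsed() {
        return user / real;
    }

    // Fraction of the given worker threads that were busy on average.
    double utilization(int workers) {
        return userToElapsed() / workers;
    }

    public static void main(String[] args) {
        // Sample line built from figures quoted in this thread (sys assumed).
        GcTimes t = parse("[Times: user=37.00 sys=0.00, real=7.00 secs]");
        System.out.printf("user/elapsed = %.1f, utilization of 8 threads = %.2f%n",
                          t.userToElapsed(), t.utilization(8));
    }
}
```

With the figures above the ratio works out to about 5.3 on 8 threads. That
shows the threads were mostly busy, yet the wall time was still no better
than serial, which is the kind of gap between ratio and speedup I mean.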

-- ramki

On Fri, Nov 2, 2012 at 4:29 PM, Vitaly Davidovich <vitalyd at gmail.com> wrote:

> To be honest, I didn't dig in yet as I got the set up running in our plant
> towards the end of the day, and only casually looked at basic GC timestamps
> for the full GCs.
>
> We do use some weak refs (no soft/phantom though), but I wouldn't call it
> heavy (or even medium) for that matter.  However, I'd have to look at what
> GC reports, as you mention, to make sure, but I'm pretty confident that
> it's not heavy. :)
>
> The server is dedicated to this sole java process, and nothing else of
> significance (mem or cpu) is running on there.
>
> I'll try to investigate next week to see if anything sticks out.  Regular
> old GC is sufficient for my use case now, so I'm merely trying to see if I
> can get some really cheap gains purely by enabling the parallel collector.
> :)
>
> Generally speaking though, what sort of (ballpark) speedup is expected for
> parallel old vs single threaded? Let's say on a machine with a modest CPU
> count (8-16 hardware threads).  I'd imagine any contention would
> significantly reduce the speedup factor for hugely parallel machines, but
> curious about the modest space.  Are there any known issues/scenarios that
> would nullify its benefit, other than what you've already mentioned?
>
> Thanks for all the advice and info.
>
> Sent from my phone
> On Nov 2, 2012 7:14 PM, "Charlie Hunt" <chunt at salesforce.com> wrote:
>
>> Do you have GC logs you could share?
>>
>> We're probably gonna need more info on what's going on within
>> ParallelOld.  We might get some additional info from
>> +PrintGCTaskTimeStamps or +PrintParallelOldGCPhaseTimes.  I don't recall
>> how intrusive they are though.  If you've got a lot of threads, we'll
>> probably get a lot of data too.  But, hopefully there's something in there
>> that lends a clue as to the issue.  If there's contention, that suggests to
>> me some contention in work stealing.  IIRC, there's a way to get work
>> stealing info in ParallelOld GC.  But, my mind is drawing a blank. :-|
>>
>> Just off the top of my head, do you know if this app makes heavy use of
>> Reference objects, i.e. < Weak | Soft | Phantom | Final > References?
>>
>> Adding +PrintReferenceGC will tell us what kind of overhead you're
>> experiencing with reference processing.  If you're seeing high values of
>> reference processing, then you'll probably want to add
>> -XX:+ParallelRefProcEnabled.
>>
>> I'd look at reference processing first before looking at the
>> +PrintParallelOldGCPhaseTimes or +PrintGCTaskTimeStamps.
>>
>> Ooh, another thought, are there other Java apps running on the same
>> system?  If so, how many GC threads and application threads tend to be
>> active at any given time?
>>
>> hths,
>>
>> charlie ...
>>
>> On Nov 2, 2012, at 5:42 PM, Vitaly Davidovich wrote:
>>
>> Thanks Charlie.  At a quick glance, I didn't see it benefit my case today
>> (~5gb old) - wall clock time was roughly same as single threaded, but user
>> time was quite high (7 secs wall, 37 sec user).  This is on an 8 way Xeon
>> Linux server.
>>
>> I seem to vaguely recall reading that parallel old sometimes performs
>> worse than single threaded old, perhaps due to some contention between
>> GC threads.
>>
>> Anyway, I'll keep monitoring though.
>>
>> Thanks
>>
>> Sent from my phone
>> On Nov 2, 2012 10:15 AM, "Charlie Hunt" <chunt at salesforce.com> wrote:
>>
>>> Yes, I'd recommend +UseParallelOldGC on 6u23 even though it's not
>>> auto-enabled.
>>>
>>> hths,
>>>
>>> charlie ...
>>>
>>> On Nov 2, 2012, at 8:04 AM, Vitaly Davidovich wrote:
>>>
>>> Hi Charlie,
>>>
>>> Out of curiosity, is UseParallelOldGC advisable on, say, 6u23? It's off
>>> by default, as you say, until 7u4 so I'm unsure if that's for some
>>> good/specific reason or not.
>>>
>>> Thanks
>>>
>>> Sent from my phone
>>> On Nov 2, 2012 8:36 AM, "Charlie Hunt" <chunt at salesforce.com> wrote:
>>>
>>>> Jumping in a bit late ...
>>>>
>>>> Strongly suggest to anyone evaluating G1 to not use anything prior to
>>>> 7u4.  And, even better if you use (as of this writing) 7u9, or the latest
>>>> production Java 7 HotSpot VM.
>>>>
>>>> Fwiw, I'm really liking what I am seeing in 7u9 with the exception of
>>>> one issue, (http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7143858),
>>>> which is currently slated to be backported to a future Java 7, (thanks
>>>> Monica, John Cuthbertson and Bengt for tackling this!).
>>>>
>>>> From looking at your observations and others' comments thus far, my
>>>> initial reaction is that with a 1G Java heap, you might get the best
>>>> results with -XX:+UseParallelOldGC.  Are you using -XX:+UseParallelGC, or
>>>> -XX:+UseParallelOldGC?  Or, are you not setting a GC?  Not until 7u4 is
>>>> -XX:+UseParallelOldGC automatically set for what's called "server class"
>>>> machines when you don't specify a GC.
>>>>
>>>> The lengthy concurrent mark could be the result of the implementation
>>>> of G1 in 6u*, or it could be that your system is swapping. Could you check
>>>> if your system is swapping?  On Solaris you can monitor this using vmstat
>>>> and observing not just free memory, but also sr == scan rate along
>>>> with pi == page in and po == page out.  Seeing sr (page scan activity)
>>>> together with low free memory and pi & po activity is a strong
>>>> indication of swapping.  Seeing low free memory and no sr activity is ok,
>>>> i.e. no swapping.
>>>>
>>>> Additionally, you are right.  "partial" was changed to "mixed" in the
>>>> GC logs.  For those interested in a bit of history .... this change was
>>>> made since we felt "partial" was misleading.  What partial was intended to
>>>> mean was a partial old gen collection, which did occur.  But, on that same
>>>> GC event it also included a young gen GC.  As a result, we changed the GC
>>>> event name to "mixed" since that GC event was really a combination of both
>>>> a young gen GC and portion of old gen GC.
>>>>
>>>> Simone also has a good suggestion with including -XX:+PrintFlagsFinal
>>>> and -showversion as part of the GC log data to collect, especially with G1
>>>> continuing to improve and evolve.
>>>>
>>>> Look forward to seeing your GC logs!
>>>>
>>>> hths,
>>>>
>>>> charlie ....
>>>>
>>>> On Nov 2, 2012, at 5:46 AM, Andreas Müller wrote:
>>>>
>>>> > Hi Simone,
>>>> >
>>>> >> 4972.437: [GC pause (partial), 1.89505180 secs]
>>>> >> that I cannot decipher (to Monica - what "partial" means ?), and no
>>>> >> mixed GCs, which seems unusual as well.
>>>> > Oops, I understand that now: 'partial' used to be what 'mixed' is now!
>>>> > Our portal usually runs on Java 6u33. For the G1 tests I switched to
>>>> > 7u7 because I had learned that G1 is far from mature in 6u33.
>>>> > But automatic deployments can overwrite the start script and thus
>>>> > switch back to 6u33.
>>>> >
>>>> >> Are you sure you are actually using 1.7.0_u7 ?
>>>> > I have checked that in the archived start scripts and the result,
>>>> > unfortunately, is: no.
>>>> > The 'good case' was actually running on 7u7 (that's why it was good),
>>>> > but the 'bad case' was unwittingly run on 6u33 again.
>>>> > That's the true reason why the results were so much worse and so
>>>> > incomprehensible.
>>>> > Thank you very much for looking at the log and for asking good
>>>> > questions!
>>>> >
>>>> > I'll try to repeat the test and post the results on this list.
>>>> >
>>>> > Regards
>>>> > Andreas
>>>> > _______________________________________________
>>>> > hotspot-gc-use mailing list
>>>> > hotspot-gc-use at openjdk.java.net
>>>> > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
>>>>