Parallel vs Serial Old (was Re: G1 issue: falling over to Full GC)

Vitaly Davidovich vitalyd at gmail.com
Sat Nov 3 07:42:51 PDT 2012


Thanks Ramki (and thanks for moving the thread to a new subject - should've
done that myself to avoid conflating it).

Hopefully I'll have time next week to investigate further.  If I do and
find anything of interest, I'll be sure to report back.

Have a good weekend,

Vitaly

Sent from my phone
On Nov 3, 2012 3:03 AM, "Srinivas Ramakrishna" <ysr1729 at gmail.com> wrote:

> [Edited subject line to show actual subject of discussion in last few
> emails in the thread]
>
> One issue I have found with ParallelOld vs Serial for sufficiently large
> heaps is that if there are large oop-rich objects, the deferred updates
> phase, which is single-threaded and slow, greatly dominates the pause time.
> There's discussion of this in an earlier thread (late last year or early
> this year), and I promised to work on a patch although I never got around
> to it. We partially worked around it by preventing full compaction (i.e.
> compaction below the dense prefix), but that doesn't work for all cases,
> for instance when an application churns large oop-rich objects (e.g.
> object arrays) through the old generation.
> I don't know if a CR was filed tracking that sighting and discussion.
>
> Other than those anomalies, I have usually seen user/elapsed time ratios
> of 10-12 using 18 worker threads in the cases I recall. That does not,
> however, mean a speedup of 10-12x versus serial. More like 5-6x. YMMV of
> course.
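>
> (To put rough, purely illustrative numbers on that: if a parallel old
> collection shows, say, 12 sec elapsed and 120 sec of user time, the
> user/elapsed ratio is 10, i.e. roughly 10 of the 18 workers were busy on
> average. But the parallel collector also does work the serial collector
> doesn't, so if the same collection would have taken 60-70 sec serially,
> the actual speedup is only 5-6x. Those figures are made up for the sake of
> the arithmetic, not measurements.)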
>
> -- ramki
>
> On Fri, Nov 2, 2012 at 4:29 PM, Vitaly Davidovich <vitalyd at gmail.com> wrote:
>
>> To be honest, I didn't dig in yet as I got the setup running in our
>> plant towards the end of the day, and only casually looked at basic GC
>> timestamps for the full GCs.
>>
>> We do use some weak refs (no soft/phantom though), but I wouldn't call it
>> heavy (or even medium, for that matter).  However, I'd have to look at what
>> the GC reports, as you mention, to make sure, but I'm pretty confident that
>> it's not heavy. :)
>>
>> The server is dedicated to this sole java process, and nothing else of
>> significance (mem or cpu) is running there.
>>
>> I'll try to investigate next week to see if anything sticks out.  Regular
>> old GC is sufficient for my use case now, so I'm merely trying to see if I
>> can get some really cheap gains purely by enabling the parallel collector.
>> :)
>>
>> Generally speaking though, what sort of (ballpark) speedup is expected
>> for parallel old vs single threaded? Let's say on a machine with a modest
>> CPU count (8-16 hardware threads).  I'd imagine any contention would
>> significantly reduce the speedup factor for hugely parallel machines, but
>> I'm curious about the modest range.  Are there any known issues/scenarios
>> that would nullify its benefit, other than what you've already mentioned?
>>
>> Thanks for all the advice and info.
>>
>> Sent from my phone
>> On Nov 2, 2012 7:14 PM, "Charlie Hunt" <chunt at salesforce.com> wrote:
>>
>>> Do you have GC logs you could share?
>>>
>>> We probably are gonna need more info on what's going on within
>>> ParallelOld.  We might get some additional info from
>>> +PrintGCTaskTimeStamps or +PrintParallelOldGCPhaseTimes.  I don't recall
>>> how intrusive they are though.  If you've got a lot of threads, we'll
>>> probably get a lot of data too.  But, hopefully there's something in there
>>> that lends a clue as to the issue.  If there is contention, my guess would
>>> be that it's in the work stealing.  IIRC, there's a way to get work stealing
>>> info with +UseParallelOldGC.  But, my mind is drawing a blank. :-|
>>>
>>> Just off the top of my head, do you know if this app makes heavy use of
>>> Reference objects, i.e. < Weak | Soft | Phantom | Final > References?
>>>
>>> Adding +PrintReferenceGC will tell us what kind of overhead you're
>>> experiencing with reference processing.  If you're seeing high values of
>>> reference processing, then you'll probably want to add
>>> -XX:+ParallelRefProcEnabled.
>>>
>>> I'd look at reference processing first before looking at the
>>> +PrintParallelOldGCPhaseTimes or +PrintGCTaskTimeStamps.
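>>>
>>> For example, something along these lines would capture all of that in one
>>> run (the heap sizes, log file name and "YourApp" are just placeholders;
>>> adjust them to your setup, and exact flag availability can vary a bit by
>>> JDK build):
>>>
>>>   java -XX:+UseParallelOldGC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps \
>>>        -XX:+PrintReferenceGC -XX:+PrintGCTaskTimeStamps \
>>>        -Xloggc:gc.log -Xms5g -Xmx5g YourApp
>>>
>>> and if reference processing does turn out to dominate, add
>>> -XX:+ParallelRefProcEnabled on top of that.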
>>>
>>> Ooh, another thought, are there other Java apps running on the same
>>> system?  If so, how many GC threads and application threads tend to be
>>> active at any given time?
>>>
>>> hths,
>>>
>>> charlie ...
>>>
>>> On Nov 2, 2012, at 5:42 PM, Vitaly Davidovich wrote:
>>>
>>> Thanks Charlie.  At a quick glance, I didn't see it benefit my case
>>> today (~5gb old) - wall clock time was roughly the same as single threaded,
>>> but user time was quite high (7 secs wall, 37 secs user).  This is on an
>>> 8-way Xeon Linux server.
>>>
>>> I seem to vaguely recall reading that parallel old sometimes performs
>>> worse than single-threaded old, perhaps due to contention between GC
>>> threads.
>>>
>>> Anyway, I'll keep monitoring though.
>>>
>>> Thanks
>>>
>>> Sent from my phone
>>> On Nov 2, 2012 10:15 AM, "Charlie Hunt" <chunt at salesforce.com> wrote:
>>>
>>>> Yes, I'd recommend +UseParallelOldGC on 6u23 even though it's not
>>>> auto-enabled.
>>>>
>>>> hths,
>>>>
>>>> charlie ...
>>>>
>>>> On Nov 2, 2012, at 8:04 AM, Vitaly Davidovich wrote:
>>>>
>>>> Hi Charlie,
>>>>
>>>> Out of curiosity, is UseParallelOldGC advisable on, say, 6u23? It's off
>>>> by default, as you say, until 7u4, so I'm unsure if that's for some
>>>> good/specific reason or not.
>>>>
>>>> Thanks
>>>>
>>>> Sent from my phone
>>>> On Nov 2, 2012 8:36 AM, "Charlie Hunt" <chunt at salesforce.com> wrote:
>>>>
>>>>> Jumping in a bit late ...
>>>>>
>>>>> I strongly suggest that anyone evaluating G1 not use anything prior to
>>>>> 7u4.  And, even better, use (as of this writing) 7u9, or the latest
>>>>> production Java 7 HotSpot VM.
>>>>>
>>>>> Fwiw, I'm really liking what I am seeing in 7u9, with the exception of
>>>>> one issue (http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7143858),
>>>>> which is currently slated to be backported to a future Java 7 update
>>>>> (thanks to Monica, John Cuthbertson and Bengt for tackling this!).
>>>>>
>>>>> From looking at your observations and others' comments thus far, my
>>>>> initial reaction is that with a 1G Java heap, you might get the best
>>>>> results with -XX:+UseParallelOldGC.  Are you using -XX:+UseParallelGC, or
>>>>> -XX:+UseParallelOldGC?  Or, are you not setting a GC?  Not until 7u4 is
>>>>> -XX:+UseParallelOldGC automatically set for what's called "server class"
>>>>> machines when you don't specify a GC.
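>>>>>
>>>>> (For what it's worth, IIRC the two flags aren't equivalent on those older
>>>>> releases: -XX:+UseParallelGC on its own gives you parallel young
>>>>> collections but still a single-threaded old gen compaction, while
>>>>> -XX:+UseParallelOldGC enables the multi-threaded old gen compactor as
>>>>> well.  So on anything before 7u4 you'd set it explicitly, e.g. something
>>>>> like
>>>>>
>>>>>   java -XX:+UseParallelOldGC -Xmx1g ... YourApp
>>>>>
>>>>> where the 1G sizing just mirrors the heap you mentioned and "YourApp" is
>>>>> a placeholder, rather than relying on the default ergonomics.)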
>>>>>
>>>>> The lengthy concurrent mark could be the result of the G1 implementation
>>>>> in 6u*, or it could be that your system is swapping. Could you check
>>>>> if your system is swapping?  On Solaris you can monitor this using vmstat,
>>>>> watching not just free memory but also sr (scan rate) along
>>>>> with pi (page in) and po (page out).  Seeing sr (page scan) activity
>>>>> together with low free memory and pi & po activity is a strong
>>>>> indication of swapping.  Seeing low free memory and no sr activity is ok,
>>>>> i.e. no swapping.
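>>>>>
>>>>> A simple way to watch for that is to leave something like
>>>>>
>>>>>   vmstat 5
>>>>>
>>>>> running alongside the test and keep an eye on free memory and the sr, pi
>>>>> and po columns over time.  (On Linux the closest equivalents are the
>>>>> si/so swap-in/swap-out columns, which should stay at or near zero if the
>>>>> JVM isn't being paged.)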
>>>>>
>>>>> Additionally, you are right: "partial" was changed to "mixed" in the
>>>>> GC logs.  For those interested in a bit of history ... this change was
>>>>> made since we felt "partial" was misleading.  What partial was intended to
>>>>> mean was a partial old gen collection, which did occur.  But, that same
>>>>> GC event also included a young gen GC.  As a result, we changed the GC
>>>>> event name to "mixed" since that GC event was really a combination of a
>>>>> young gen GC and a portion of an old gen GC.
>>>>>
>>>>> Simone also has a good suggestion with including -XX:+PrintFlagsFinal
>>>>> and -showversion as part of the GC log data to collect, especially with G1
>>>>> continuing to improve and evolve.
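>>>>>
>>>>> In practice that just means adding those two options to the command line
>>>>> and capturing the resulting stdout alongside the GC log, e.g. something
>>>>> like
>>>>>
>>>>>   java -showversion -XX:+PrintFlagsFinal <your usual flags> YourApp
>>>>>
>>>>> ("YourApp" and the flag placeholder are illustrative), so every log you
>>>>> post is tagged with the exact JVM version and the effective value of
>>>>> every flag.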
>>>>>
>>>>> Look forward to seeing your GC logs!
>>>>>
>>>>> hths,
>>>>>
>>>>> charlie ....
>>>>>
>>>>> On Nov 2, 2012, at 5:46 AM, Andreas Müller wrote:
>>>>>
>>>>> > Hi Simone,
>>>>> >
>>>>> >> 4972.437: [GC pause (partial), 1.89505180 secs]
>>>>> >> that I cannot decipher (to Monica - what does "partial" mean?), and no
>>>>> mixed GCs, which seems unusual as well.
>>>>> > Oops, I understand that now: 'partial' used to be what 'mixed' is
>>>>> now!
>>>>> > Our portal usually runs on Java 6u33. For the G1 tests I switched to
>>>>> 7u7 because I had learned that G1 is far from mature in 6u33.
>>>>> > But automatic deployments can overwrite the start script and thus
>>>>> switch back to 6u33.
>>>>> >
>>>>> >> Are you sure you are actually using 1.7.0_u7?
>>>>> > I have checked that in the archived start scripts and the result,
>>>>> unfortunately, is: no.
>>>>> > The 'good case' was actually running on 7u7 (that's why it was
>>>>> good), but the 'bad case' was unwittingly run on 6u33 again.
>>>>> > That's the true reason why the results were so much worse and so
>>>>> incomprehensible.
>>>>> > Thank you very much for looking at the log and for asking good
>>>>> questions!
>>>>> >
>>>>> > I'll try to repeat the test and post the results on this list.
>>>>> >
>>>>> > Regards
>>>>> > Andreas
>>>>> > _______________________________________________
>>>>> > hotspot-gc-use mailing list
>>>>> > hotspot-gc-use at openjdk.java.net
>>>>> > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
>>>>>
>>>>> _______________________________________________
>>>>> hotspot-gc-use mailing list
>>>>> hotspot-gc-use at openjdk.java.net
>>>>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
>>>>>
>>>>
>>>>
>>>
>> _______________________________________________
>> hotspot-gc-use mailing list
>> hotspot-gc-use at openjdk.java.net
>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
>>
>>
>

