RFR (M): 8146987: Improve Parallel GC Full GC by caching results of live_words_in_range() [Was: Re: [PATCH] enhancement to ParallelScavenge Full GC]

Fri Jan 15 14:10:36 UTC 2016

Hi Jon, Thomas

We initially implemented this for completeness of the execution logic.
We don't have detailed data showing the effect of lines 144-145 on
their own.
We believe these two lines incur little overhead when they are not hit,
and the logic can help cache more results in some cases, so we kept it.
Thanks, Thomas, for helping us demonstrate its effectiveness.
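
For readers following the thread, the branch in question can be sketched
as below. This is only a simplified stand-alone model under stated
assumptions, not the HotSpot code: the heap is a vector of 0/1 "live
word" flags, and LiveWordsCache, helper() and words_scanned are
hypothetical names introduced for illustration. Only the shape of the
decision (subtract the tail when it is shorter than the prefix,
otherwise recount the prefix) follows lines 143-148 quoted below.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model: live[i] == 1 means heap word i is live. All names here are
// hypothetical stand-ins, not the HotSpot ParMarkBitMap API.
struct LiveWordsCache {
  const std::vector<int>& live;
  size_t last_beg = 0, last_end = 0, last_ret = 0;
  bool valid = false;
  size_t words_scanned = 0;  // cost counter, for illustration only

  explicit LiveWordsCache(const std::vector<int>& bits) : live(bits) {}

  // Uncached count of live words in [beg, end).
  size_t helper(size_t beg, size_t end) {
    size_t n = 0;
    for (size_t i = beg; i < end; ++i) n += live[i];
    words_scanned += end - beg;
    return n;
  }

  // Cached count of live words in [beg, end), reusing the previous
  // result when the query starts at the same address.
  size_t live_words_in_range(size_t beg, size_t end) {
    if (!valid || beg != last_beg) {
      last_ret = helper(beg, end);            // no usable cache entry
    } else if (end > last_end) {
      last_ret += helper(last_end, end);      // extend to the right
    } else if (end < last_end) {
      if (end - beg > last_end - end) {
        // Tail is shorter than the prefix: subtract it (cf. line 145).
        last_ret -= helper(end, last_end);
      } else {
        // Prefix is shorter: recount it (cf. line 147).
        last_ret = helper(beg, end);
      }
    }  // end == last_end: cached value is already correct
    last_beg = beg;
    last_end = end;
    valid = true;
    return last_ret;
  }
};
```

In this model, a query for [0, 900) followed by one for [0, 800) scans
only the 100-word tail on the second call instead of the 800-word
prefix.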

2016-01-15 20:49 GMT+08:00 Thomas Schatzl <thomas.schatzl at oracle.com>:

> On Thu, 2016-01-14 at 12:09 -0800, Jon Masamitsu wrote:
> > http://cr.openjdk.java.net/~tschatzl/8146987/webrev.1/src/share/vm/gc/parallel/parMarkBitMap.cpp.frames.html
> >
> >  143   } else if (end_obj < last_obj) {
> >  144     if (pointer_delta((HeapWord*)end_obj, (HeapWord*)beg_addr) > pointer_delta((HeapWord*)last_obj, (HeapWord*)end_obj)) {
> >  145       last_ret = last_ret - live_words_in_range_helper((HeapWord*)end_obj, last_obj);
> >  146     } else {
> >  147       last_ret = live_words_in_range_helper(beg_addr, end_obj);
> >  148     }
> >
> > Did you measure the performance improvement afforded by lines
> > 144-145?
> >
> > The calculation of the new address is used in two cases.  One is when
> > the live object is being moved to its new location.  In that case
> > I would expect that the overwhelmingly common case would be
> > end_obj > last_obj.  The calculation of the new location for a live
> > object (where live_words_in_range() is used) proceeds from left to
> > right (lower to higher addresses) as each region is scanned looking
> > for live objects.  I would expect the execution of line 145 to be
> > seldom if ever, so that just using 147 would be fine.  The other case
> > is less clear.  When an object is being moved, the object references
> > within it are updated.  That access pattern seems like it would be
> > more random to me (fewer cache hits) but if you have data that shows
> > that line 145 is beneficial, that would be a good data point.
>
>   I did some measurements on which branches are taken in
> live_words_in_range() with SPECjbb2015 with constant IR (why constant
> IR? I had that setup and had used that for comparison runs
> before/after, see other email) and clamped down adaptive size policy
> (basically setting all heap sizes, 10g total heap, ~1g live set).
> See the patch here
> https://bugs.openjdk.java.net/secure/attachment/56451/reftypes.diff
> and the results at
> https://bugs.openjdk.java.net/secure/attachment/56452/range_helper_types.png
>
> That graph shows the kind of branching decisions taken during execution
> of every full gc. (X-axis is full gc number, y-axis relative branch
>  execution frequency). Labels include the index into the array
> containing the values.
>
> Ignore the first and last few full gcs, they are startup and shutdown
> related (i.e. system.gc's, whatever).
> From this you can see that actually the branch in lines 144-145 is
> rather important (yellow - 4), as it catches a large amount of
> references that would otherwise need a full call to
> live_words_in_range() (turquoise - 5).
>
> A quick run with lines 144-145 removed showed that dropping them is
> indeed a problem: around half of the improvement from this patch is
> lost if this condition is removed. So I would opt to keep it :)
>
> With the usual disclaimer that this is only a single data point.
>
> Thanks,
>   Thomas
>

-- 
Lei Tianyang