RFR (M): 8146987: Improve Parallel GC Full GC by caching results of live_words_in_range() [Was: Re: [PATCH] enhancement to ParallelScavenge Full GC]

Jon Masamitsu jon.masamitsu at oracle.com
Thu Jan 14 20:09:42 UTC 2016


http://cr.openjdk.java.net/~tschatzl/8146987/webrev.1/src/share/vm/gc/parallel/parMarkBitMap.cpp.frames.html

143 } else if (end_obj < last_obj) {
144 if (pointer_delta((HeapWord*)end_obj, (HeapWord*)beg_addr) > 
pointer_delta((HeapWord*)last_obj, (HeapWord*)end_obj)) {
145 last_ret = last_ret - live_words_in_range_helper((HeapWord*)end_obj, 
last_obj);
146 } else {
147 last_ret = live_words_in_range_helper(beg_addr, end_obj);
148 }

Did you measure the performance improvement afforded by  lines 144 - 145?

The calculation of the new address is used in two cases.  One is when
the live object is being moved to its new location.  In that case
I would expect that the overwhelmingly common case would be
end_obj > last_obj.  The calculation of the new location for a live object
(where live_words_in_range() is used) proceeds from left to right (lower
to higher addresses) as each region is scanned looking for live
objects.  I would expect the execution of line 145 to be seldom
if ever, so that just using 147 would be fine.  The other case is
less clear.   When an object is being moved, the object references
within it are updated.   That  access pattern seems like it would be more
random to me (fewer cache hits)  but if you have data that shows
that line 145 is  beneficial, that would be a good data point.

Jon



On 01/14/2016 06:54 AM, ray alex wrote:
>
> We previously evaluated some applications from JOlden, Dacapo and 
> SpecJVM2008 benchmark suites.
> We believe these tests could reflect the essential behaviors of 
> parallel GC's full gc.
>
> Specifically, the applications are:
> BiSort, Em3d, MST, TreeAdd, Health, Perimeter (JOlden)
> GCBench,
> H2 (Dacapo)
> Xml.validation, Compiler.compiler (SPECjvm2008)
> with a 1GB heap size for all of them (except Health with 3GB).
> The results are averagely 50% for them above.
>
> Actually, the heap size is a factor that impacts the frequency of full 
> gcs, but does not influence the effect of this approach.
> As long as a full gc is triggered, we believe this caching could bring 
> improvement, and the percentage could vary according to the access 
> patterns of objects in different programs.
> Though in some scenarios the improvement may be less, it almost does 
> not impose any overhead.
>
> Hoping these could be helpful for the testing.
>
> I'm looking forward to your new test result of the patch and the 
> upstream process.
> Please remind me if I could help on the test.
>
> Thanks!
>
>
> 2016-01-13 23:51 GMT+08:00 Thomas Schatzl <thomas.schatzl at oracle.com 
> <mailto:thomas.schatzl at oracle.com>>:
>
>     Hi Tianyang,
>
>     On Sun, 2016-01-10 at 22:16 +0800, ray alex wrote:
>     > Hi,
>     > By following the conventions, I remove the prefix "get_" and rename
>     > getter/setter to "last_query_*" in this patch version 4 (attached).
>     > Besides, I ran the measurements mentioned before (both for version 3
>     > and recheck this version 4 ), and there was no regressions.
>     >
>     > May I have your reviews on this patch?
>     >
>     > Thanks!
>
>      first, sorry for the delay, holidays and such...
>
>     I created a CR for this enhancement at
>     https://bugs.openjdk.java.net/browse/JDK-8146987
>
>     I also made some webrevs for this change that fix some minor issues:
>
>     base patch that includes some compilation fixes:
>     http://cr.openjdk.java.net/~tschatzl/8146987/webrev.0/
>     <http://cr.openjdk.java.net/%7Etschatzl/8146987/webrev.0/>
>
>     - recent changes introduced some merge errors in psParallelCompact,
>     mostly due to Unified logging
>     - the assert in psParallelCompact.cpp:3137 missed a parameter so the
>     sources did not compile in any debug mode.
>
>     Some minor style fixes:
>     http://cr.openjdk.java.net/~tschatzl/8146987/webrev.1/
>     <http://cr.openjdk.java.net/%7Etschatzl/8146987/webrev.1/> (full)
>     http://cr.openjdk.java.net/~tschatzl/8146987/webrev.0_to_1/
>     <http://cr.openjdk.java.net/%7Etschatzl/8146987/webrev.0_to_1/> (diff)
>
>     - made a few of the new methods private
>     - some coding style fixes: indentation, trailing white spaces,
>     copyright dates
>     - removed the new parMarkBitMap.inline.hpp again because its
>     methods were only called from parMarkBitmap.cpp, fixed includes to
>     parMarkBitmap.cpp.
>     - prepare some push message, please check
>
>     I did run internal gc testing on the .0 webrev (vm.gc testlist),
>     and did some performance verification (mainly on specjbb201x as
>     this was what was on hand). It does (as expected) improve the perf
>     of full gc, although the improvements are not as high as mentioned
>     in your initial email, around 15% average full gc time (for some
>     rather specific configuration). That is still quite good imo.
>
>     Reproducing the results has been one reason why this reply took a
>     while, it needs a bit of carefully setting up tests to get results
>     that are not drowned in noise due to gc occurring on widely
>     different heap contents, and particularly ergonomics.
>     Not sure how you managed to do this with something like dacapo.
>     SPECjvm2008 in "realistic" configurations does not do a lot of (if
>     any) full gcs.
>
>     The change looks good to me now, but I need to do some more
>     testing again with the .1 version, to make sure the cleanup did
>     not introduce any regressions anywhere.
>
>     Thanks,
>       Thomas
>
>
>
>
> -- 
> Lei Tianyang
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.org/pipermail/hotspot-gc-dev/attachments/20160114/b333dedf/attachment.htm>


More information about the hotspot-gc-dev mailing list