RFR (M): 8226913: Scale cards per chunk used during heap root scanning with region size

Thomas Schatzl thomas.schatzl at oracle.com
Tue Jul 16 08:49:10 UTC 2019


Hi Kim,

On Sat, 2019-07-13 at 19:59 -0400, Kim Barrett wrote:
> > On Jul 10, 2019, at 11:48 AM, Thomas Schatzl <
> > thomas.schatzl at oracle.com> wrote:
> > 
> > Hi all,
> > 
> >  can I have reviews for this change that scales the number of cards
> > processed per chunk depending on heap/heap region size to decrease
> > some unnecessary overhead when clearing it on larger heaps.
> > 
> > It is basically the answer to this comment in the code:
> >  96   // Random power of two number of cards we want to claim per
> > thread. This corresponds
> >  97   // to a 64k of memory work chunk area for every thread.
> >  98   // We use the same claim size as Parallel GC. No particular
> > measurements have been
> >  99   // performed to determine an optimal number.
> > 
> > I.e. the chosen number simply seemed suboptimal, and should
> > actually be determined adaptively.
> > 
> > CR:
> > https://bugs.openjdk.java.net/browse/JDK-8226913
> > Webrev:
> > http://cr.openjdk.java.net/~tschatzl/8226913/webrev
> > Testing:
> > hs-tier1-5, manually checking that correct chunk sizes are returned
> > for all possible region sizes
> > 
> > Thanks,
> >  Thomas
> 
> -------------------------------------------------------------------
> -----------
> src/hotspot/share/gc/g1/g1RemSet.cpp
>   98   // size we limit the total number of chunks to limit memory
> usage and maintenance
> 
> I think there should be a sentence break in "size we".

Added a comma.

> 
> -------------------------------------------------------------------
> -----------
> src/hotspot/share/gc/g1/g1RemSet.cpp
>  100   // Testing showed that 8 for 1M/2M region, 16 for 4M/8M
> regions, 32 for 16/32M regions
>  101   // seems to be such a good trade-off.
> ...
>  105     // The original formula is y = 0.4 * x - 5. The below
> formula is tweaked so
> 
> I'm not sure how that formula was derived (some linear fit?), but
> 
>   1u << ((log_region_size / 2) - 7) 
> 
> seems to give equivalent results for the given range and seems (to
> me) more "obviously" based on the given numbers.

Linear fit. I will use your formula though since it is much more
obvious.
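
For reference, a small standalone sketch of that calculation (the
function and variable names are made up for illustration; only the
formula and the measured values are from the discussion above):

  #include <cassert>

  // Number of chunks to split a region's card range into. Only region
  // sizes between 1M (log 20) and 32M (log 25) have been measured.
  static unsigned chunks_per_region(unsigned log_region_size) {
    // The bounds reflect what has actually been tested, not a
    // requirement of the calculation itself.
    assert(log_region_size >= 20 && log_region_size <= 25);
    // Equivalent to the linear fit over the given range: 8 chunks for
    // 1M/2M regions, 16 for 4M/8M, 32 for 16M/32M.
    return 1u << ((log_region_size / 2) - 7);
  }

  int main() {
    const unsigned expected[] = { 8, 8, 16, 16, 32, 32 }; // 1M .. 32M
    for (unsigned log = 20; log <= 25; ++log) {
      assert(chunks_per_region(log) == expected[log - 20]);
    }
    return 0;
  }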

> 
> -------------------------------------------------------------------
> -----------
> src/hotspot/share/gc/g1/g1RemSet.cpp
>  103     assert(log_region_size >= 20 && log_region_size <= 25,
>  104            "expected value in [20,25], but got %u",
> log_region_size);
> 
> This is hard-coded to the current region size range limits, and
> doesn't seem to be related to any requirements of the calculation
> being done. Any future expansion of the permitted region size range
> would fail here for no immediately obvious reason. The only reason to
> choose those bounds here seems to be that those are what's actually
> been tested, so as to not make any assumptions about what might be
> good outside that range.

That is the reason - if you do not mind, I would prefer the code to fail
so that the person changing this has to think about it once more (and do
at least cursory testing).
I added a comment indicating that.

If you feel strongly about this, I can remove the assert completely, or
at least keep a "reasonable" lower bound like 2^16 or so, so that the
calculation does not produce ridiculously low values or even underflow.
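
Just to illustrate the underflow concern (sketch only; the names and
the 2^16 bound are purely hypothetical, taken from the suggestion
above):

  #include <cassert>

  // Keep just a lower bound on the region size so that
  // (log_region_size / 2) - 7 cannot wrap around in unsigned
  // arithmetic, which it would for region sizes below 2^14.
  static unsigned chunks_per_region_min_bound(unsigned log_region_size) {
    const unsigned MinLogRegionSize = 16; // hypothetical minimum, 64K
    assert(log_region_size >= MinLogRegionSize);
    return 1u << ((log_region_size / 2) - 7);
  }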

> -------------------------------------------------------------------
> -----------
> src/hotspot/share/gc/g1/g1RemSet.cpp
>  370     return result * (HeapRegion::CardsPerRegion /
> get_chunks_per_region(HeapRegion::LogOfHRGrainBytes));
> 
> Instead of get_chunks_per_region(...), why not
> _scan_chunks_per_region?
> 
> -------------------------------------------------------------------
> -----------

Done.

New webrevs:
http://cr.openjdk.java.net/~tschatzl/8226913/webrev.0_to_1

http://cr.openjdk.java.net/~tschatzl/8226913/webrev.1

Thanks a lot,
  Thomas




