RFR [L][3/7]: 8197850: Calculate liveness in regions during marking
Thomas Schatzl
thomas.schatzl at oracle.com
Tue Mar 6 13:33:54 UTC 2018
On Tue, 2018-03-06 at 13:31 +0100, Stefan Johansson wrote:
> Hi Thomas,
>
> On 2018-03-05 15:41, Thomas Schatzl wrote:
> > Hi all,
> >
> > can I have reviews for this change that calculates the liveness
> > in regions (below nTAMS) during marking?
> >
> > This is required to be more clever about selecting the regions G1
> > is going to rebuild the remembered sets for (i.e. the well-known
> > G1MixedGCLiveThresholdPercent threshold).
> >
> > This slows down the marking part of the concurrent phase a bit, but
> > all alternatives were worse:
> >
> > - initially rebuild everything and then drop remembered sets for
> > regions for too highly occupied regions at cleanup: wastes lots of
> > space and memory maintaining these superfluous remembered sets.
> >
> > - choose regions using previous marking cycle data: this is
> > generally not a good idea if your marking cycles are infrequent
> > (which should be the normal case), as it tends to make too few
> > regions available for evacuation (e.g. eager reclaim may already
> > reclaim a lot). I.e. you need extra headroom, or be more aggressive
> > with rebuilding.
> >
> > The main change, next to the calls that add the liveness to some
> > internal data structure every time an object is marked, is some
> > per- thread cache that (G1RegionMarkStatsCache). It helps
> > decreasing the synchronization overhead the (atomic) increments in
> > the global result cause.
> >
> > It is a very simple fixed size hashmap where increments to the same
> > entries are collected, and on collision evicts the previous entry,
> > doing the atomic operation on the global result.
> >
> > There is also some unfortunate moving of code that is required to
> > flush these caches correctly during mark overflow handling.
> >
> > CR:
> > https://bugs.openjdk.java.net/browse/JDK-8197850
> > Webrev:
> > http://cr.openjdk.java.net/~tschatzl/8197850/webrev/index.html
>
> Look good, just some minor things.
>
> src/hotspot/share/gc/g1/g1ConcurrentMark.cpp
> 394 _region_mark_stats(NEW_C_HEAP_ARRAY(G1RegionMarkStats,
> max_regions, mtGC))
>
> max_regions is passed into the constructor but everywhere else in
> G1ConcurrentMark we use g1h->max_regions(). I would consider either
> calling g1h->max_regions() in the constructor too, or save the value
> in a member and use it everywhere.
Done using option 1), use g1h->max_regions() everywhere.
> ---
>src/hotspot/share/gc/g1/g1_globals.hpp
> 262 experimental(size_t, G1RegionMarkStatsCacheSize, 1024,
> ...
> 265 range(128, (max_juint / 2) + 1)
>
> Do we need this flag or should we scale it against the number of
> regions? If we want to keep it I think we should have a somewhat
> more restrictive range.
I would not want this to scale according to the number of heap regions,
because that is going to have a significant impact on pause times when
all of the thread's caches are evicted (The O(#regions) part).
The flag has only been added (rather late, admittedly) to fix potential
issues with marking time with really large heaps with tens of thousands
of regions.
Let me do some more measurements, I will get back to you with them
before giving you a new webrev.
> ---
> src/hotspot/share/gc/g1/g1RegionMarkStatsCache.hpp
>
> In G1RegionMarkStatsCacheEntry _region_idx is set to 0 when clearing.
> I don't see a problem with it right now, but it feels a bit wrong
> since index 0 is a valid region index. Maybe we could use another
> cleared marker.
Unfortunately there is none (one could use something like -1, but that
one is valid too), and it seems a waste of memory to reserve more space
for it.
I see your point that this is a bit ugly, but there should never be any
issue because of the initialization of the value part of an entry with
zero. It simply has no impact on totals if you add zero.
The code could initialize the cache with region indices from 0 to cache
size - 1, which are valid values given the hash function, what do you
think?
However that would slow down clearing a little (the compiler will just
compile it to a memset() in the loop right now).
Thanks,
Thomas
More information about the hotspot-gc-dev
mailing list