G1 RSet improvement ideas

Thu Jan 30 17:33:40 UTC 2014

Hi all,

I had a recent e-mail exchange with Bengt and Thomas and we touched a 
bit on G1 RSet work. RSets can be a major bottleneck in G1 right now, 
but I don't believe it's fundamental issue with the concept. IMHO, the 
implementation can be substantially improved to be much more efficient. 
I recently created a few issues in our (Twitter) bug system with some 
ideas on how to do that. I pasted all the text below (but replaced the 
internal issue #s with (1), (2), etc.).

Let me know if you have any ideas, thoughts, etc. on this.

Tony

----------

(1) Use a single RSet for all young regions

In G1 each region has its own Remembered Set (RSet). During a 
collection, all RSets of the Collection Set (CSet) regions need to be 
scanned. Given that all young regions are always collected together 
during the same collection, it will be beneficial to actually make all 
young regions point to the same physical RSet. This will have several 
advantages:

- lower overall RSet footprint (better to have one fuller RSet, than 
many mostly empty ones; this will also quickly eliminate duplicate RSet 
entries across all young regions)
- more efficient RSet scanning (the number of RSets scanned during a 
collection will not depend on the number of young regions in the CSet)

Minus any issues and assumptions that will need to be ironed out in the 
RSet code, doing this should be straightforward. During a GC, a new RSet 
is created and all Survivor regions created during that GC will be made 
to point to it. Then, as new Eden regions are allocated, they will also 
point to the same RSet. At the start of the next GC, all Eden / Survivor 
regions will be pointing to the same RSet. During young GCs, we'll only 
need to scan that RSet. During mixed GCs, we'll need to scan the young 
RSet and all the RSets of the old regions. But this will be an 
improvement over what we do now. We'll also need to decouple the young 
RSet freeing code from the region reclamation code so that we don't 
attempt to reclaim the same RSet multiple times.

----------

(2) G1 RSets: Use a single RSet for old regions of a single CSet

(This is a follow-up to (1))

For the same reasons that it's a good idea to use a single Remembered 
Set (RSet) for all young regions, it'd be also helpful to use a single 
RSet for all old regions that are added to a Collection Set (CSet). In 
combination with (1), this will result in only having either one (young 
GCs) or two (mixed GCs) RSets to scan during GC.

This will be quite a lot more work to do, compared to (1), given that we 
only choose the old regions to add to a CSet at the start of a GC. To 
achieve this we'll have to do a bit more advance preparation: choose 
groups of old regions to collect together (maybe do that concurrently to 
minimize STW work) and combine their RSets. So, at each mixed GC, we'll 
pick an old region group, and its associated RSet, and collect them 
along with the young regions.

----------

(3) G1 RSets: Use one RSet for each GC

(follow-up to (1) and (2))

If we manage to have one young RSet and one old RSet for each mixed GC, 
it'd be straightforward to combine the two to always end up having to 
scan a single RSet at every GC.

(2) proposes to create old region groups, each with its own combined 
RSet. We could take this further and grab this combined old RSet at the 
start of a GC and use that as the young RSet (for Survivor regions 
allocated during that GC and Eden regions allocated after that GC). So, 
at the start of the next GC, all regions in the CSet (Eden, Survivors, 
old region group) will have the same RSet.

----------

(4) G1 RSets: Filter old RSets before a collection

Old RSets tend to grow quite large and sometimes they contain a large 
amount of stale entries that need to be scanned for nothing during a GC. 
After concurrent marking has completed, but maybe before each mixed GC, 
we could introduce a concurrent process that scans RSets of regions that 
will be collected and filters out any stale entries (by scanning the 
cards that the RSet points to and removing any cards that don't point to 
the region any more). This should reduce the amount of RSet scanning 
work that's done during mixed GCs.

The easiest way to do this, to avoid any nasty races / MT issues, is to 
give each old region a new RSet (during a safepoint, say), store the old 
one somewhere, scan it (concurrently), and add any entries that need to 
be retained to the new one. At the end, we can just reclaim the old one. 
This way, any new updates to the new RSet will automatically happen 
through the normal path. Explicitly removing an entry, instead of 
recreating the RSet as described, will be much harder, given that we'd 
have to synchronize with other threads potentially trying to add the 
same entry.

This can be done in addition to (2). In fact, the two operations could 
also be combined (i.e. combined old region group's RSets and also filter 
them at the same time).

----------

(5) G1 RSets: Don't use the card table during RSet scanning to reduce 
duplication

Currently, while scanning Remembered Sets (RSets) during a GC we use the 
card table to mark whether a particular card has been scanned or not 
(since the same card might appear on multiple RSets). This is bad for a 
couple of reasons (the first one is the important one, the second one we 
could address reasonably easily):

- First, we have to clear the card table at the end of the GC, which 
requires a non-trivial amount of work.
- Second, the card marks during RSet scanning are not done atomically 
(atomics were deemed too expensive). This is safe (rescanning a card is 
not a correctness issue and it shouldn't happen very often anyway). 
However, this prevents some subtle optimizations (when one thread comes 
across a reference on a card that points into the CSet and pushes its 
location on the stack, it has to recheck that it still points into the 
CSet when it eventually pops it from the stack, given that another 
thread might also scan that card and update the reference meanwhile).

A better solution will be to somehow use the RSet data structure for 
each thread to claim RSet chunks. It can even be done in a destructive 
way, given that the RSet will be reclaimed at the end of the GC anyway.

This depends on (3), given we can only ensure that each card is claimed 
exactly once if it appears on a single RSet.

----------

(6) G1 RSets: Only create old region RSets when they are needed

Currently, each region has its own RSet from the beginning and that's 
maintained throughout the region's lifetime. For very long regions, it 
turns out that long-lived old regions end up with a lot of stale entries 
in their RSets (as ref fields are updated over time to point to 
different objects) which make them very bloated. An important 
observation is that the old region RSets are only really needed during 
mixed GCs after a concurrent cycle. Sometimes, concurrent cycles happen 
very infrequently (every few hours, maybe) so we constantly maintain the 
old region RSets and only take advantage of them very infrequently.

A better approach would be to eliminate RSets from old regions (say, set 
the RSet field to NULL for those regions so that any refinement 
operation will just do nothing for refs that point to such regions) and 
only recreate them when needed.

One approach, which has been done by other GCs, is to recreate the RSets 
during marking (as the marking phase will visit all live objects).

Maybe a better approach will be to finish the marking phase, do some 
quick analysis on the old regions to decide which to collect and which 
not to collect, and do a sweep phase to populate the RSets of the 
regions that will be collected. Only when that operation is done, mixed 
GCs will be allowed.

This is a (better but more involved) alternative to (4) as the recreated 
RSets will not be very long-lived, so they will hopefully not need 
further filtering.

This can be combined with the approach in (2), i.e., find the old region 
groups we'll collect, give all regions in each group a new RSet, and 
only then do the sweeping phase to create directly RSets per group 
instead of per region.

----------

(7) G1 RSets: Redesign for remembered set data structures

This is somewhat orthogonal to the other G1 RSet issues, but there 
should be a lot of gains to be made by rethinking / redesigning the RSet 
data structures. The concept of having multiple RSet granularities is 
quite interesting. Unfortunately, the implementation is quite 
inefficient. In particular, there are two hash tables (for fine and 
sparse entries) which is completely wasteful (requires two look-ups and 
extra footprint for the second table). Also, several operations require 
locks which affects scalability. Finally, the implementation is too 
dependent on the region abstraction (i,e., it's baked in the code that 
each RSet component represents a region-to-region relationship) which 
makes them very inflexible (RSset components can get very large when 
very large regions are used).

The goals of a redesign should be the following:

- At a high-level, the new structure should just be a card address -> 
bool map (false -> card not included in the RSet, true -> card included 
in the RSet).
- A subdivision of the heap, so that each such heap chunk is represented 
by a different structure, can still be used. However, chunks should not 
be tied to regions. I.e., a region might be covered by several chunks 
(helpful, when regions are large).
- The fine / medium / coarse granularity approach will still be useful 
to represent each chunk as efficiently as possible. However, there 
should be a single hash table that will keep track of the entries of all 
granularities.
- The RSet structure should have a facility for threads to claim entries 
to work on them. This is to facilitate (5).
- Addition operations should be entirely lock-free.
- RSet iterations should be done using iterators.
- A facility to efficiently do a union of various RSets will also be 
helpful (to facilitate (2)).

-- 
Tony Printezis | JVM/GC Engineer / VM Team | Twitter

@TonyPrintezis
tprintezis at twitter.com