RFR: 8305896: Alternative full GC forwarding [v25]

Wed May 3 21:42:24 UTC 2023

On Wed, 3 May 2023 19:19:44 GMT, Roman Kennke <rkennke at openjdk.org> wrote:

>> Currently, the full-GC modes of Serial, Shenandoah and G1 GCs are forwarding objects by over-writing the object header with the new object location. Unfortunately, for compact object headers ([JDK-8294992](https://bugs.openjdk.org/browse/JDK-8294992)) this would not work, because the crucial class information is also stored in the header, and we could no longer iterate over objects until the headers would be restored. Also, the preserved-headers tables would grow quite large.
>> 
>> I propose to use an alternative algorithm for full-GC (sliding-GC) forwarding that uses a special encoding so that the forwarding information fits into the lowest 32 bits of the header.
>> 
>> It exploits the insight that, with sliding GCs, objects from one region will only ever be forwarded to one of two possible target regions. For this to work, we need to divide the heap into equal-sized regions. This is already the case for Shenandoah and G1, and can easily be overlaid for Serial GC, by assuming either the whole heap as a single region (if it fits) or by using SpaceAlignment-sized virtual regions.
>> 
>> We also build and maintain a table that has N elements, where N is the number of regions. Each entry is two addresses, which are the start-address of the possible target regions for each source region.
>> 
>> With this, forwarding information would be encoded like this:
>>  - Bits 0 and 1: same as before, we put in '11' to indicate that the object is forwarded.
>>  - Bit 2: Used for 'fallback'-forwarding (see below)
>>  - Bit 3: Selects the target region 0 or 1. Look up the base address in the table (see above)
>>  - Bits 4..31: The number of heap words from the target base address
>> 
>> This works well for all sliding GCs in Serial, G1 and Shenandoah. The exception is G1's special 'serial compaction' mode, which acts as a last-last-ditch effort to squeeze more space out of the heap by re-forwarding the tails of the compaction chains. Unfortunately, this breaks the assumption of the sliding-forwarding-table. When that happens, we initialize a fallback table, which is a simple open hash-table, and set Bit 2 in the forwarding to indicate that we shall look up the forwardee in the fallback-table.
>> 
>> All the table accesses can be done unsynchronized because:
>> - Serial GC is single-threaded anyway
>> - In G1 and Shenandoah, GC worker threads divide up the work such that each worker processes a disjoint set of regions.
>> - G1 serial compaction is single-threaded
>> 
>> The change introduces a new (experimental) flag -XX:[+|-]UseAltGCForwarding. This flag is not really intended to be used by end-users. Instead, I intend to programmatically enable it with compact object headers once they arrive (i.e. -XX:+UseCompactObjectHeaders would turn on -XX:+UseAltGCForwarding), and the flag is also useful for testing purposes. Once compact object headers become the default and only implementation, the flag and old implementation could be removed. [JDK-8305898](https://bugs.openjdk.org/browse/JDK-8305898) would also use the same flag to enable an alternative self-forwarding approach (also in support of compact object headers).
>> 
>> The change also adds a utility class GCForwarding which calls the old or new implementation based on the flag. I think it would also be used for the self-forwarding change to be proposed soon (and separately).
>> 
>> I also experimented with a different forwarding approach that would use per-region hashtables, but shelved it for now, because performance was significantly worse than the sliding forwarding encoding. It will become useful later when we want to do 32-bit compact object headers, because then the sliding encoding will not be sufficient to hold forwarding pointers in the header.
>> 
>> Testing:
>>  - [x] hotspot_gc -UseAltGCForwarding
>>  - [x] hotspot_gc +UseAltGCForwarding
>>  - [x] tier1 -UseAltGCForwarding
>>  - [x] tier1 +UseAltGCForwarding
>>  - [x] tier2 -UseAltGCForwarding
>>  - [x] tier2 +UseAltGCForwarding
>
> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Fix type narrowing
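
The encoding described in the quoted PR text can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code under review: the type `SlidingForwardingSketch`, its members, and the use of `std::unordered_map` as the fallback table are all assumptions made for the example, and addresses are modelled as plain heap-word indices. Only the first two branches of the slot selection appear in the reviewed snippet; the remaining branches are guessed by symmetry.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical sketch; names and layout are assumptions, not HotSpot code.
using Word = uint64_t;                        // a heap-word index, counted from heap start

constexpr Word     UNUSED_BASE_WORD = UINT64_MAX;
constexpr uint32_t FORWARDED_BITS   = 0b11;       // bits 0..1: '11' marks a forwarded object
constexpr uint32_t FALLBACK_BIT     = 1u << 2;    // bit 2: forwardee lives in the fallback table
constexpr uint32_t ALT_REGION_BIT   = 1u << 3;    // bit 3: selects target region 0 or 1
constexpr unsigned OFFSET_SHIFT     = 4;          // bits 4..31: offset in heap words
constexpr Word     MAX_OFFSET       = (Word(1) << 28) - 1;

struct SlidingForwardingSketch {
  unsigned region_shift;                      // log2(region size in words)
  std::vector<Word> bases;                    // two target-region bases per source region
  std::unordered_map<Word, Word> fallback;    // stand-in for the open hash table

  SlidingForwardingSketch(unsigned shift, size_t num_regions)
    : region_shift(shift), bases(2 * num_regions, UNUSED_BASE_WORD) {}

  // Records the target region for 'from' if needed and returns the 32-bit encoding.
  uint32_t encode(Word from, Word to) {
    size_t base_idx = (from >> region_shift) * 2;
    Word to_base = (to >> region_shift) << region_shift;
    bool alt;
    if (bases[base_idx] == UNUSED_BASE_WORD) {            // primary is free
      bases[base_idx] = to_base; alt = false;
    } else if (bases[base_idx] == to_base) {              // primary matches
      alt = false;
    } else if (bases[base_idx + 1] == UNUSED_BASE_WORD) { // secondary is free
      bases[base_idx + 1] = to_base; alt = true;
    } else if (bases[base_idx + 1] == to_base) {          // secondary matches
      alt = true;
    } else {                                              // third target region: fall back
      fallback[from] = to;
      return FALLBACK_BIT | FORWARDED_BITS;
    }
    Word offset = to - to_base;
    assert(offset <= MAX_OFFSET);
    return (static_cast<uint32_t>(offset) << OFFSET_SHIFT)
         | (alt ? ALT_REGION_BIT : 0u) | FORWARDED_BITS;
  }

  // Reconstructs the forwardee from the encoding stored in the header of 'from'.
  Word decode(Word from, uint32_t bits) const {
    assert((bits & FORWARDED_BITS) == FORWARDED_BITS);
    if (bits & FALLBACK_BIT) {
      return fallback.at(from);
    }
    size_t base_idx = (from >> region_shift) * 2 + ((bits & ALT_REGION_BIT) ? 1 : 0);
    return bases[base_idx] + (bits >> OFFSET_SHIFT);
  }
};
```

With 1024-word regions (`region_shift == 10`), forwarding word 3000 to word 1500 stores base 1024 in the primary slot of region 2 and encodes offset 476. A third distinct target region for the same source region takes the fallback path.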

Changes requested by tschatzl (Reviewer).

src/hotspot/share/gc/shared/gc_globals.hpp line 699:

> 697:                                                                             \
> 698:   product(bool, UseAltGCForwarding, false, EXPERIMENTAL,                    \
> 699:           "Use alternative GC forwarding that preserves object headers")    \

I would strongly prefer if this were not a product flag at this time, but a develop flag.

It potentially decreases the performance of Serial GC full GCs by a significant amount with no upside at all (I am not that worried about G1 or other concurrent GCs). Can you give me reasons why an end user would ever consciously enable this flag?

Using a develop flag is only a minor annoyance for development - we already do that for other features like evacuation failure injection in G1. For end users this would result in (guaranteed) zero performance impact. 

This should be changed to a product flag only when compact object headers are added with Lilliput.

I do not know your schedule for upstreaming Lilliput, but if it were to miss JDK 21, people would suffer from this for the entire lifetime of JDK 21, which is an LTS release. (FWIW, I would suggest the same for a non-LTS release; it just seems worse in this situation.)

src/hotspot/share/gc/shared/slidingForwarding.inline.hpp line 43:

> 41: 
> 42: uint SlidingForwarding::region_index_containing(HeapWord* addr) {
> 43:   uint index = static_cast<uint>(pointer_delta(addr, _heap_start) >> _region_size_words_shift);

I believe it is possible to bias the array pointer to avoid the subtraction of `_heap_start`, like we do e.g. for the card table. See also `G1BiasedArray` or so for a kind of ready-made class implementing this.

I am not sure it will help a lot, but it would at least remove the subtraction and the load of the `_heap_start` value.
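
For readers unfamiliar with the biasing trick, here is a minimal sketch of the idea. It is not the HotSpot class mentioned above: `BiasedTableSketch` and its members are hypothetical, addresses are plain `uintptr_t` values, and it assumes the heap start is aligned to the region size. The bias is folded into the table's base address once, so a lookup needs only a shift and an add.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical illustration of a biased table; not the HotSpot implementation.
// Assumption: heap_start is a multiple of the region size (1 << region_shift bytes).
template <typename T>
class BiasedTableSketch {
  std::vector<T> _storage;     // one entry per region
  uintptr_t      _biased_base; // address of entry 0 minus the bias, kept as an integer
  unsigned       _shift;       // log2(region size in bytes)

public:
  BiasedTableSketch(uintptr_t heap_start, size_t num_regions, unsigned region_shift)
    : _storage(num_regions), _biased_base(0), _shift(region_shift) {
    // Fold the subtraction of the heap start into the base address once.
    _biased_base = reinterpret_cast<uintptr_t>(_storage.data())
                 - (heap_start >> region_shift) * sizeof(T);
  }

  // The biased base points into _storage, so copying would invalidate it.
  BiasedTableSketch(const BiasedTableSketch&) = delete;
  BiasedTableSketch& operator=(const BiasedTableSketch&) = delete;

  // Entry for the region containing addr: no load of, or subtraction by, heap_start.
  T& at(uintptr_t addr) {
    return *reinterpret_cast<T*>(_biased_base + (addr >> _shift) * sizeof(T));
  }

  // Plain indexed access, shown only for comparison with at().
  T& raw(size_t region_index) { return _storage[region_index]; }
};
```

The integer round-trip through `uintptr_t` keeps the sketch away from out-of-bounds pointer arithmetic. A real implementation may simply store a biased pointer, which is an implementation choice this example does not try to reproduce.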

src/hotspot/share/gc/shared/slidingForwarding.inline.hpp line 59:

> 57:     // Primary is free
> 58:     _bases_table[base_idx] = to_region_base;
> 59:   } else if (_bases_table[base_idx] == to_region_base) {

This probably won't help performance at all, but I would put the checks for the common cases, where the table values are already set (particularly the first one), first. I may be wrong about whether this is possible.
The `UNUSED_BASE` values in the tables will be encountered exactly once per entry.
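
One way the suggested ordering might look is sketched below. This is a hypothetical stand-alone function, not the reviewed code: `select_slot_sketch`, the sentinel `UNUSED_BASE_SKETCH`, and the `-1` return value for the fallback case are assumptions for illustration. The reordering is only valid if a real target base can never equal the sentinel.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sentinel; assumed never to be a valid region base address.
constexpr uintptr_t UNUSED_BASE_SKETCH = 0;

// Returns 0 or 1 for the slot holding to_region_base, or -1 when both slots
// already hold other bases (the case that would need the fallback table).
inline int select_slot_sketch(std::vector<uintptr_t>& bases_table,
                              size_t from_region_idx,
                              uintptr_t to_region_base) {
  size_t base_idx = from_region_idx * 2;
  // Common cases first: the slot was already set by an earlier object.
  if (bases_table[base_idx] == to_region_base) {
    return 0;
  }
  if (bases_table[base_idx + 1] == to_region_base) {
    return 1;
  }
  // Rare cases: each slot is seen unused at most once.
  if (bases_table[base_idx] == UNUSED_BASE_SKETCH) {
    bases_table[base_idx] = to_region_base;
    return 0;
  }
  if (bases_table[base_idx + 1] == UNUSED_BASE_SKETCH) {
    bases_table[base_idx + 1] = to_region_base;
    return 1;
  }
  return -1;
}
```

The hit on the primary slot now costs a single comparison, and the two initialization branches run at most once per source region.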

-------------

PR Review: https://git.openjdk.org/jdk/pull/13582#pullrequestreview-1411879100
PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1184309687
PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1184310440
PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1184313798