RFR: 8305896: Alternative full GC forwarding [v48]

Roman Kennke rkennke at openjdk.org
Fri Jun 16 14:59:28 UTC 2023


On Fri, 16 Jun 2023 13:02:48 GMT, Roman Kennke <rkennke at openjdk.org> wrote:

>> Currently, the full-GC modes of Serial, Shenandoah and G1 GCs forward objects by overwriting the object header with the new object location. Unfortunately, this would not work for compact object headers ([JDK-8294992](https://bugs.openjdk.org/browse/JDK-8294992)), because the crucial class information is also stored in the header, and we could no longer iterate over objects until the headers were restored. Also, the preserved-headers tables would grow quite large.
>> 
>> I propose to use an alternative algorithm for full-GC (sliding-GC) forwarding that uses a special encoding so that the forwarding information fits into the lowest 32 bits of the header.
>> 
>> It exploits the insight that, with sliding GCs, objects from one region will only ever be forwarded to one of two possible target regions. For this to work, we need to divide the heap into equal-sized regions. This is already the case for Shenandoah and G1, and can easily be overlaid for Serial GC, by assuming either the whole heap as a single region (if it fits) or by using SpaceAlignment-sized virtual regions.
>> 
>> We also build and maintain a table that has N entries, where N is the number of regions. Each entry holds two addresses: the start addresses of the two possible target regions for that source region.
>> 
>> With this, forwarding information would be encoded like this:
>>  - Bits 0 and 1: same as before, we put in '11' to indicate that the object is forwarded.
>>  - Bit 2: Used for 'fallback'-forwarding (see below)
>>  - Bit 3: Selects the target region 0 or 1. Look up the base address in the table (see above)
>>  - Bits 4..31: The number of heap words from the target base address
>> 
>> This works well for all sliding GCs in Serial, G1 and Shenandoah. The exception is in G1, where a special mode called 'serial compaction' acts as a last-last-ditch effort to squeeze more space out of the heap by re-forwarding the tails of the compaction chains. Unfortunately, this breaks the assumption of the sliding-forwarding table. When that happens, we initialize a fallback table, which is a simple open hash table, and set bit 2 in the forwarding to indicate that the forwardee must be looked up in the fallback table.
>> 
>> All the table accesses can be done unsynchronized because:
>> - Serial GC is single-threaded anyway
>> - In G1 and Shenandoah, GC worker threads divide up the work such that each worker processes a disjoint set of regions.
>> - G1 serial compaction is single-threaded
>> 
>> The c...
>
> Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 115 commits:
> 
>  - Further templatize Serial GC's adjust_pointers()
>  - Merge branch 'master' into JDK-8305896
>  - Specialize full-GC loops to get UseAltGCForwarding flag check out of hot paths
>  - Remove G1-only assert for fallback forwarding, and comment with explanation
>  - Merge branch 'master' into JDK-8305896
>  - Replace homegrown FallbackTable with a ResourceHashtable based impl
>  - Merge remote-tracking branch 'origin/JDK-8305896' into JDK-8305896
>  - Some more @shipilev comments
>  - Update src/hotspot/share/gc/shared/slidingForwarding.hpp
>    
>    Co-authored-by: Aleksey Shipilëv <shipilev at amazon.de>
>  - Align fake-heap without GCC warnings (duh)
>  - ... and 105 more: https://git.openjdk.org/jdk/compare/b412fc79...524f9c52
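
The header encoding from the quoted PR description (bits 0..1 as the forwarded tag, bit 2 for fallback, bit 3 selecting the target region, bits 4..31 holding the word offset) could be sketched roughly as follows. This is only an illustration of the bit layout, not the code in the PR; all names (`encode_forwarding`, `decode_forwarding`, `TargetBases`) are hypothetical, and a heap word is assumed to be pointer-sized.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical names throughout; this is not the HotSpot implementation.
using HeapWord = uintptr_t;                       // assumption: pointer-sized heap words

constexpr uint32_t FORWARDED_TAG = 0b11;          // bits 0..1: object is forwarded
constexpr uint32_t FALLBACK_BIT  = 1u << 2;       // bit 2: forwardee lives in the fallback table
constexpr uint32_t REGION_SHIFT  = 3;             // bit 3: target region 0 or 1
constexpr uint32_t OFFSET_SHIFT  = 4;             // bits 4..31: offset in heap words
constexpr uint32_t MAX_OFFSET    = (1u << 28) - 1;

// One table entry: the start addresses of the two possible target regions
// for a given source region.
struct TargetBases {
  HeapWord* base[2];
};

inline uint32_t encode_forwarding(unsigned target_index, size_t offset_words) {
  assert(target_index < 2);
  assert(offset_words <= MAX_OFFSET);
  return (static_cast<uint32_t>(offset_words) << OFFSET_SHIFT)
       | (static_cast<uint32_t>(target_index) << REGION_SHIFT)
       | FORWARDED_TAG;
}

inline bool is_forwarded(uint32_t low_bits) {
  return (low_bits & 0b11) == FORWARDED_TAG;
}

inline bool is_fallback(uint32_t low_bits) {
  return (low_bits & FALLBACK_BIT) != 0;
}

// Decodes a non-fallback forwarding using the table entry of the source region.
inline HeapWord* decode_forwarding(uint32_t low_bits, const TargetBases& bases) {
  assert(is_forwarded(low_bits) && !is_fallback(low_bits));
  unsigned target_index = (low_bits >> REGION_SHIFT) & 1u;
  size_t offset_words = low_bits >> OFFSET_SHIFT;
  return bases.base[target_index] + offset_words;
}
```

With 28 offset bits, one target region in this sketch can span at most 2^28 heap words; a forwarding with bit 2 set would bypass `decode_forwarding` and go to the fallback hash table instead.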

I found one more place where I would need to specialize a loop in MarkSweep::adjust_pointers(). I've run the performance test again and now get these results:

baseline: 379.34ms
noaltfwd: 367.73ms (-3.1%)
altfwd: 435.17ms (+14.7% vs baseline, +18.3% vs noaltfwd)

1. Not sure where the earlier 7% figure came from. I've repeated the experiment a couple of times and the results are very reliable.
2. Yes, that is a 3% improvement. I think I improved inlining a little bit, by moving stuff to places where the templates would be picked up. That's a reliable result, too (but feel free to run your own experiments).

I would prefer to keep the flag and the specialized loops. I really don't want users of the legacy (no-compact-headers) path to experience any bad performance or stability surprises, especially not when this is eventually backported to 21u or maybe even 17u.
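
For readers unfamiliar with the pattern, specializing a loop to get a flag check out of the hot path typically looks something like the sketch below: the loop is templated on a `bool`, and the runtime flag is tested once, outside the loop. This is a simplified illustration, not the actual `MarkSweep::adjust_pointers()` code; `Obj`, `forwardee`, `adjust_all` and the local `UseAltGCForwarding` variable are stand-ins.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the runtime flag.
static bool UseAltGCForwarding = false;

// Toy object carrying both forwarding representations, for illustration only.
struct Obj {
  std::size_t forwardee_plain;
  std::size_t forwardee_alt;
};

// ALT_FWD is a compile-time constant, so each instantiation keeps only
// one branch and the loop body contains no flag check.
template <bool ALT_FWD>
inline std::size_t forwardee(const Obj& o) {
  if (ALT_FWD) {
    return o.forwardee_alt;
  } else {
    return o.forwardee_plain;
  }
}

template <bool ALT_FWD>
std::size_t adjust_all(const std::vector<Obj>& objs) {
  std::size_t sum = 0;
  for (const Obj& o : objs) {
    sum += forwardee<ALT_FWD>(o);  // hot path: no runtime flag test here
  }
  return sum;
}

// The flag is read exactly once, outside the loop.
std::size_t adjust_all_dispatch(const std::vector<Obj>& objs) {
  return UseAltGCForwarding ? adjust_all<true>(objs) : adjust_all<false>(objs);
}
```

The cost is two copies of each specialized loop in the binary; the benefit is that the legacy path compiles to essentially the same code as before the flag existed.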

Do we want to move this PR forward, now that we have the jdk22 train open?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/13582#issuecomment-1594828570