RFR: 8343468: GenShen: Enable relocation of remembered set card tables [v5]

Aleksey Shipilev shade at openjdk.org
Tue Mar 4 20:09:58 UTC 2025


On Tue, 4 Mar 2025 04:13:33 GMT, Cesar Soares Lucas <cslucas at openjdk.org> wrote:

>> In the current Generational Shenandoah implementation, the pointers to the read and write card tables are established at JVM launch time and fixed during the whole of the application execution. Because they are considered constants, they are embedded as such in JIT-compiled code. 
>> 
>> The cleaning of dirty cards in the read card table is performed during the `init-mark` pause, and our experiments show that it represents a sizable portion of that phase's duration. This pull request makes the addresses of the read and write card tables dynamic, with the end goal of reducing the duration of the `init-mark` pause by moving the cleaning of the dirty cards in the read card table to the `reset` concurrent phase.
>> 
>> The idea is quite simple. Instead of using distinct read and write card tables for the entire duration of the JVM execution, we alternate which card table serves as the read/write table during each GC cycle. In the `reset` phase we concurrently clean the cards in the the current _read_ table so that when the cycle reaches the next `init-mark` phase we have a version of the card table totally clear. In the next `init-mark` pause we swap the pointers to the base of the read and write tables. When the `init-mark` finishes the mutator threads will operate on the table just cleaned in the `reset` phase; the GC will operate on the table that just turned the new _read_ table.
>> 
>> Most of the changes in the patch account for the fact that the write card table is no longer at a fixed address.
>> 
>> The primary benefit of this change is that it eliminates the need to copy and zero the remembered set during the init-mark Safepoint. A secondary benefit is that it allows us to replace the init-mark Safepoint with an `init-mark` handshake—something we plan to work on after this PR is merged.
>> 
>> Our internal performance testing showed a significant reduction in the duration of `init-mark` pauses and no statistically significant regression due to the dynamic loading of the card table address in JIT-compiled code.
>> 
>> Functional testing was performed on Linux, macOS, Windows running on x64, AArch64, and their respective 32-bit versions. I’d appreciate it if someone with access to RISC-V (@luhenry  ?) and PowerPC (@TheRealMDoerr ?)  platforms could review and test the changes for those platforms, as I have limited access to running tests on them.
>
> Cesar Soares Lucas has updated the pull request incrementally with two additional commits since the last revision:
> 
>  - Revert changes to shared cardTable.hpp
>  - Revert changes to shared cardTable.hpp

Much cleaner, thanks! I'll take another look later, but meanwhile, some comments:

src/hotspot/cpu/arm/gc/shared/cardTableBarrierSetAssembler_arm.cpp line 100:

> 98:   assert(bs->kind() == BarrierSet::CardTableBarrierSet,
> 99:          "Wrong barrier set kind");
> 100: 

Unnecessary deletion of blank line?

src/hotspot/cpu/x86/gc/shenandoah/shenandoahBarrierSetAssembler_x86.cpp line 655:

> 653: 
> 654: #ifndef _LP64
> 655:   __ pop(tmp1);

Sounds like `tmp1` is undefined here. Should be `tmp`?

src/hotspot/os_cpu/linux_arm/javaThread_linux_arm.cpp line 46:

> 44:   if (UseShenandoahGC) {
> 45:     _card_table_base = nullptr;
> 46:     return ;

Suggestion:

    return;

src/hotspot/os_cpu/linux_arm/javaThread_linux_arm.cpp line 50:

> 48:     _card_table_base = nullptr;
> 49:   }
> 50: 

Unnecessary removals of blank lines?

src/hotspot/share/ci/ciUtilities.cpp line 49:

> 47:   CardTableBarrierSet* ctbs = barrier_set_cast<CardTableBarrierSet>(bs);
> 48:   CardTable* ct = ctbs->card_table();
> 49:   SHENANDOAHGC_ONLY(assert(!UseShenandoahGC, "Shenandoah byte_map_base is not constant.");)

Here is a bit of a trick about the `Use${X}GC` flags: you don't need to guard them with `${X}GC_ONLY` macros. They are specifically designed that way: they reside in `gc_globals.hpp` without any feature flags.

src/hotspot/share/gc/shenandoah/shenandoahCardTable.cpp line 25:

> 23:  */
> 24: 
> 25: #include "gc/shenandoah/shenandoahThreadLocalData.hpp"

Includes should be sorted alphabetically.

src/hotspot/share/gc/shenandoah/shenandoahGeneration.cpp line 268:

> 266: 
> 267: void ShenandoahGeneration::prepare_gc() {
> 268: 

Unnecessary removal.

src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp line 258:

> 256:   if (ShenandoahCardBarrier) {
> 257:     ShenandoahThreadLocalData::set_card_table(Thread::current(), bs->card_table()->write_byte_map_base());
> 258:   }

Er. This sets up card table for VMThread, right? I am surprised we do not need this for other fields in `ShenandoahThreadLocalData`.

src/hotspot/share/gc/shenandoah/shenandoahScanRemembered.hpp line 407:

> 405:   ShenandoahCardCluster(ShenandoahDirectCardMarkRememberedSet* rs) {
> 406:     _rs = rs;
> 407:     _object_starts = NEW_C_HEAP_ARRAY(crossing_info, rs->total_cards()+1, mtGC);

What is this `+1`? This is #23882, right?

-------------

PR Review: https://git.openjdk.org/jdk/pull/23170#pullrequestreview-2656931853
PR Review Comment: https://git.openjdk.org/jdk/pull/23170#discussion_r1980148491
PR Review Comment: https://git.openjdk.org/jdk/pull/23170#discussion_r1979192037
PR Review Comment: https://git.openjdk.org/jdk/pull/23170#discussion_r1980147454
PR Review Comment: https://git.openjdk.org/jdk/pull/23170#discussion_r1980121669
PR Review Comment: https://git.openjdk.org/jdk/pull/23170#discussion_r1980118417
PR Review Comment: https://git.openjdk.org/jdk/pull/23170#discussion_r1980116049
PR Review Comment: https://git.openjdk.org/jdk/pull/23170#discussion_r1979940218
PR Review Comment: https://git.openjdk.org/jdk/pull/23170#discussion_r1979944657
PR Review Comment: https://git.openjdk.org/jdk/pull/23170#discussion_r1979158102


More information about the shenandoah-dev mailing list