RFR: 8343468: GenShen: Enable relocation of remembered set card tables

Y. Srinivas Ramakrishna ysr at openjdk.org
Tue Jan 21 23:28:35 UTC 2025


On Fri, 17 Jan 2025 21:16:00 GMT, Kelvin Nilsen <kdnilsen at openjdk.org> wrote:

>> In the current Generational Shenandoah implementation, the pointers to the read and write card tables are established at JVM launch time and fixed during the whole of the application execution. Because they are considered constants, they are embedded as such in JIT-compiled code. 
>> 
>> The cleaning of dirty cards in the read card table is performed during the `init-mark` pause, and our experiments show that it represents a sizable portion of that phase's duration. This pull request makes the addresses of the read and write card tables dynamic, with the end goal of reducing the duration of the `init-mark` pause by moving the cleaning of the dirty cards in the read card table to the `reset` concurrent phase.
>> 
>> The idea is quite simple. Instead of using distinct read and write card tables for the entire duration of the JVM execution, we alternate which card table serves as the read/write table during each GC cycle. In the `reset` phase we concurrently clean the cards in the the current _read_ table so that when the cycle reaches the next `init-mark` phase we have a version of the card table totally clear. In the next `init-mark` pause we swap the pointers to the base of the read and write tables. When the `init-mark` finishes the mutator threads will operate on the table just cleaned in the `reset` phase; the GC will operate on the table that just turned the new _read_ table.
>> 
>> Most of the changes in the patch account for the fact that the write card table is no longer at a fixed address.
>> 
>> The primary benefit of this change is that it eliminates the need to copy and zero the remembered set during the init-mark Safepoint. A secondary benefit is that it allows us to replace the init-mark Safepoint with an `init-mark` handshake—something we plan to work on after this PR is merged.
>> 
>> Our internal performance testing showed a significant reduction in the duration of `init-mark` pauses and no statistically significant regression due to the dynamic loading of the card table address in JIT-compiled code.
>> 
>> Functional testing was performed on Linux, macOS, Windows running on x64, AArch64, and their respective 32-bit versions. I’d appreciate it if someone with access to RISC-V (@luhenry  ?) and PowerPC (@TheRealMDoerr ?)  platforms could review and test the changes for those platforms, as I have limited access to running tests on them.
>
> src/hotspot/share/gc/shenandoah/shenandoahGeneration.cpp line 228:
> 
>> 226: // This avoids the need to synchronize reads of the table by the GC workers doing
>> 227: // remset scanning, on the one hand, with the dirtying of the table by mutators
>> 228: // and by the GC workers doing remset scans, on the other.
> 
> Remove "and by the the GC workers doing remset scans" from this comment, so it reads:
> "This avoids the need to synchronize reads of the table by the GC workers doing
> remset scanning, on the one hand, with the dirtying of the table by mutators on the other.

@kdnilsen , when are the existing intergenerational pointers (represented by (a subset of) the dirty cards in the read table) transferred to the write table?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23170#discussion_r1924486803


More information about the shenandoah-dev mailing list