RFR: 8344049: Shenandoah: Eliminate init-update-refs safepoint [v2]

Fri Jan 10 00:42:47 UTC 2025

On Thu, 9 Jan 2025 17:47:26 GMT, William Kemper <wkemper at openjdk.org> wrote:

>> Shenandoah typically takes 4 safepoints per GC cycle. Although Shenandoah itself does not spend much time on these safepoints, it may still take quite some time for all of the mutator threads to reach the safepoint. The occasionally long time-to-safepoint increases latency in the higher percentiles.
>> 
>> The `init-update-refs` safepoint is responsible for retiring GCLABs (and PLABs) used during evacuation. Once evacuation is complete, no threads will access these LABs. This need not be done on a safepoint. `init-update-refs` is also where the global and thread local copies of the `gc_state` are updated. However, here we are turning off the `WEAK_ROOTS` flag _after_ all of the unmarked weak referents have been `nulled` out, so this does not need to happen atomically with respect to the mutators. Neither is it necessary to change the other state flags (EVACUATION, UPDATE_REFS) atomically across all mutators.
>> 
>> Note that the `init-update-refs` safepoint is still taken if either verification or `ShenandoahPacing` are enabled.
>
> William Kemper has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 32 additional commits since the last revision:
> 
>  - Improve comments
>  - Merge remote-tracking branch 'jdk/master' into remove-init-update-refs-safepoint
>  - Fix comments
>  - Fix comment, revert unnecessary change
>  - Merge remote-tracking branch 'jdk/master' into remove-init-update-refs-safepoint
>  - Fix phase encoding to handle weak roots
>  - WIP: Use Threads::threads_do for propagating gc state (consolidated)
>  - WIP: Use Threads::threads_do for propagating gc state
>  - Remove unnecessary gc state propagations
>  - Encapsulate gc state
>  - ... and 22 more: https://git.openjdk.org/jdk/compare/e0773235...83ac7b49

src/hotspot/share/gc/shenandoah/shenandoahHeap.hpp line 368:

> 366:   // This updates the singular, global gc state. This call must happen on a safepoint.
> 367:   // However, in some cases (init update refs, e.g.), the gc state may change concurrently
> 368:   // and will be propagated to all threads by a handshake operation.

I am a little bit confused by the statement starting at "However, ...". Did you mean that the "local copy of the global state" may be changed outside of a safepoint but not the global state itself? 
I notice that `set_gc_state()` still asserts that we are at a safepoint:
https://github.com/openjdk/jdk/blob/83ac7b49d34081beb3ff58f1c159d22faacd077a/src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp#L2000

Ah, now I see that you use a different API for setting the global gc state outside of a safepoint.

If my understanding is correct, then we should probably rename the APIs such that the one that is expected to be set at a safepoint uses `set_gc_state_at_safepoint()` and the one that doesn't might use `set_gc_state_concurrent()` or something like that. That would be less confusing.

It also brings up the issue of what specific state predicates it's safe to test when. E.g. whether `is_gc_state()` can be safely tested any time during a safepoint or concurrently. I think it is safe, but explicitly stating this might be useful, not least because we seem to have one state change API that still asserts that we should be at a safepoint.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/22688#discussion_r1909623411