RFR: 8323634: Shenandoah: Document behavior of EvacOOM protocol [v5]
Y. Srinivas Ramakrishna
ysr at openjdk.org
Tue Feb 6 03:33:58 UTC 2024
On Wed, 24 Jan 2024 17:53:39 GMT, Kelvin Nilsen <kdnilsen at openjdk.org> wrote:
>> The protocol for handling OOM during evacuation is subtle and critical for correct operation. This PR does NOT change behavior. It provides improved documentation of existing behavior.
>
> Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision:
>
> Fix spelling error and mismatched parentheses.
I like the longer block comment you wrote in the .hpp file describing the protocol because it provides fuller context and defines the intent of the protocol in greater detail.
I am not sure which I would go with as both descriptions look good to me in their own way. The existing one has the benefit of being both concrete and concise. With that in mind, I also left a few comments on the original documentation on the left side.
I am good with whatever you choose to use.
The smaller individual documentation comments for each method look good to me, irrespective.
Sorry for the long delay in getting back on this. The protocol is subtle, and I like your ideas about potentially improving it in the future.
src/hotspot/share/gc/shenandoah/shenandoahEvacOOMHandler.cpp line 46:
> 44: void ShenandoahEvacOOMCounter::clear() {
> 45: assert(unmasked_count() == 0, "sanity");
> 46: Atomic::release_store_fence(&_bits, (jint)0);
Leaving a comment here but it applies to the comment at line 40 above, which reads:
// NOTE: It's ok to simply decrement, even with mask set, because unmasked value is positive.
May be leave a block comment at the start of the method at line 37 that states:
// Decrement the counter atomically, leaving the OOM bit unchanged at its original state.
Then, the comment at current line 40, could :
// The value is necessarily positive before we decrement, as we assert above, because
// this thread incremented it earlier. Since we atomically decrement a positive value,
// the state of the OOM bit is left unchanged at its original value.
src/hotspot/share/gc/shenandoah/shenandoahEvacOOMHandler.cpp line 52:
> 50: // associated with this counter. After all _num_counters OOM bits have been set, all threads newly attempting to enter_evacuation
> 51: // will be informed that they cannot allocate for evacation. Threads that entered evacuation before the OOM bit was set may
> 52: // continue to allocate for evacuation until they exit_evacuation.
This can simply state:
// Set the OOM bit, and optionally decrement the counter
I don't think you need to describe how this fits into the OOM protocol, at least not here. That confuses the documentation and the reader.
That can be put in the caller or in a block comment describing the protocol.
src/hotspot/share/gc/shenandoah/shenandoahEvacOOMHandler.cpp line 62:
> 60: jint other = Atomic::cmpxchg(&_bits, threads_in_evac, newval);
> 61: if (other == threads_in_evac) {
> 62: // Success: return so we can wait for other threads to stop allocating.
I would simplify this comment to:
// Successfully set the OOM bit (and optionally decremented the counter of threads_in_evac)
The context of what happens after we return should be described in the caller, not here.
src/hotspot/share/gc/shenandoah/shenandoahEvacOOMHandler.cpp line 65:
> 63: break;
> 64: } else {
> 65: // Failure: try again with updated new value.
Adding comment here, but applies to `ShenandoahEvacOOMCounter::try_increment()` below, lines 71-89, as a block comment before line 71; one could document it as:
// Unless OOM bit is set, increment the counter and return true.
// If OOM bit is set, simply return false without incrementing the counter.
The context of what the caller does, should be described in the caller, not in this method.
src/hotspot/share/gc/shenandoah/shenandoahEvacOOMHandler.hpp line 85:
> 83: * OOM-during-evac-handler. The handler allows multiple threads to enter and exit
> 84: * evacuation path, but on OOME it requires all threads that experienced OOME to wait
> 85: * for current threads to leave, and blocks other threads from entering. The counter state
After the period on line 85, I'd add one sentence:
// The counter not only tracks the number of threads in the evacuation path,
// but also whether any thread has encountered an OOM-during-evac. It thus
// captures all of the state needed to track the execution of the protocol.
src/hotspot/share/gc/shenandoah/shenandoahEvacOOMHandler.hpp line 87:
> 85: * for current threads to leave, and blocks other threads from entering. The counter state
> 86: * is striped across multiple cache lines to reduce contention when many threads attempt
> 87: * to enter or leave the protocol at the same time.
At the end of the period at line 87, I'd add:
// As a result, the protocol needs special steps, in the event of an OOM-during-evac,
// to ensure that all of the striped counters are zero before the protocol can terminate.
// Once the protocol terminates with the OOM bit set, no threads will attempt
// further allocations for evacuation, so any unresolved forwarding pointer uniquely
// to either its new already-forwarded location or to its original to-space location.
src/hotspot/share/gc/shenandoah/shenandoahEvacOOMHandler.hpp line 130:
> 128: * safepoint. Marking by Full GC will finish updating references that might
> 129: * be inconsistent within the heap, and will then compact all live memory within
> 130: * the heap.
I like the longer comment you wrote because it provides fuller context and defines the intent of the protocol in greater detail.
I am not sure which I would go with as both descriptions look good to me in their own way. With that in mind, I also left a few comments on the original documentation on the left side.
I am good with whatever you choose to use.
src/hotspot/share/gc/shenandoah/shenandoahEvacOOMHandler.hpp line 136:
> 134: * Maintain a count of how many threads are on an evac-path (which is allocating for evacuation)
> 135: *
> 136: * Upon entry of the evac-path, entering thread will attempt to increase the counter,
"atomically increment the counter, if the OOM-bit isn't set."
src/hotspot/share/gc/shenandoah/shenandoahEvacOOMHandler.hpp line 147:
> 145: *
> 146: *
> 147: * Upon exit, exiting thread will decrease the counter using atomic dec.
atomically decrement the counter; rather than "decrease the counter using atomic dec."
src/hotspot/share/gc/shenandoah/shenandoahEvacOOMHandler.hpp line 172:
> 170: * make the protocol more efficient.
> 171: *
> 172: * TODO: make refinements to the OOM-during-evac protocol so that it is less disruptive and more efficient.
May be all of this and the remainder of this comment in terms of improvements from line 162 above up to line 203 below should instead go in a JBS ticket, include here only a terse TODO with a pointer to the ticket for details:
// TODO: JDK-XXXX will investigate potential performance/efficiency improvements to this protocol.
src/hotspot/share/gc/shenandoah/shenandoahEvacOOMHandler.hpp line 212:
> 210: _oom_not_evacuating
> 211: };
> 212: volatile ShenandoahEvacuationState _evacuation_state;
Leave a single line of documentation here stating that this is an auxiliary field introduced just for the sake of checking an invariant of the protocol.
-------------
Marked as reviewed by ysr (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/17385#pullrequestreview-1842713670
PR Review Comment: https://git.openjdk.org/jdk/pull/17385#discussion_r1479084937
PR Review Comment: https://git.openjdk.org/jdk/pull/17385#discussion_r1479059992
PR Review Comment: https://git.openjdk.org/jdk/pull/17385#discussion_r1479060963
PR Review Comment: https://git.openjdk.org/jdk/pull/17385#discussion_r1479063865
PR Review Comment: https://git.openjdk.org/jdk/pull/17385#discussion_r1479121544
PR Review Comment: https://git.openjdk.org/jdk/pull/17385#discussion_r1479122394
PR Review Comment: https://git.openjdk.org/jdk/pull/17385#discussion_r1479174040
PR Review Comment: https://git.openjdk.org/jdk/pull/17385#discussion_r1479158182
PR Review Comment: https://git.openjdk.org/jdk/pull/17385#discussion_r1479123287
PR Review Comment: https://git.openjdk.org/jdk/pull/17385#discussion_r1479102766
PR Review Comment: https://git.openjdk.org/jdk/pull/17385#discussion_r1479066055
More information about the hotspot-gc-dev
mailing list