RFR: 8323634: Shenandoah: Document behavior of EvacOOM protocol [v3]

Kelvin Nilsen kdnilsen at openjdk.org
Tue Jan 16 21:40:56 UTC 2024


On Tue, 16 Jan 2024 21:22:37 GMT, Kelvin Nilsen <kdnilsen at openjdk.org> wrote:

>> src/hotspot/share/gc/shenandoah/shenandoahEvacOOMHandler.cpp line 208:
>> 
>>> 206: //  3. The count of threads authorized to evacuate for allocation has been decremented, because this thread is no
>>> 207: //     longer authorized.
>>> 208: //  4. We have waited for all evacuating threads to stop allocating, after which it is safe for this thread to resolve
>> 
>> This is where I think the implementation breaks down. If thread `A` is the _first_ thread to attempt evacuation and it fails _before_ any other thread has attempted to evacuate, then thread `A` "believes" that no other threads are evacuating and it returns immediately. How does the protocol then prevent other threads from proceeding with evacuation?
>
> Even when thread A is the first thread to attempt evacuation, it will:
> 1. Iterate through all counters and set the OOM bit on each one.  Any new thread that attempts to enter_evacuation() will check the OOM bit on its respective counter.  If the OOM bit is already set, that new thread will not be authorized to allocate.  If the thread happens to enter before this thread A has set its OOM bit, then we resolve this in the next step.
> 2. After setting the OOM bit on each counter, we wait_for_no_evac_threads before we consider it safe to make use of a from-space pointer.

So what about this scenario?
1. Thread A is setting the OOM bits on all counters. It has decremented its own count but has not yet finished setting all OOM bits.
2. Thread B newly tries to enter_evacuation(), so it invokes register_thread and finds that the OOM bit is set on its counter.  It waits for no_evac_threads(), and this returns immediately because no evac threads are running at this moment.
3. Now thread C newly tries to enter_evacuation(), so it invokes register_thread, but thread A has not yet set the OOM bit on thread C's counter, so thread C proceeds to enter_evac with authorization to evacuate.
4. Thread A will wait for thread C to finish evacuating, but thread B is not waiting for thread C to finish evacuating, so thread B may go on to use a from-space pointer while thread C is still evacuating.

This does appear to be a bug.  I think the fix is that register_thread also needs to repeat the loop that sets all OOM bits before it waits for no evac threads.
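
To make the interleaving concrete, here is a minimal single-threaded sketch that replays steps 1 through 4 against a toy model of the counters, once as described and once with the proposed fix. This is not the HotSpot code: ToyEvacOOM, try_enter, leave, set_all_oom_bits, Outcome and replay are hypothetical names, and I am assuming two counters with the OOM flag kept in the high bit of each one. The "wait" is modeled as a single check of whether any counter still has evacuators.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstdint>

// Toy model only. All names here are hypothetical, not the HotSpot sources.
constexpr uint32_t OOM_BIT = 0x80000000u;  // high bit: an evacuation OOM was seen
constexpr int NUM_COUNTERS = 2;

struct ToyEvacOOM {
  std::array<std::atomic<uint32_t>, NUM_COUNTERS> counters{};

  // Increment the evacuator count unless the OOM bit is set.
  // Returns true if the caller is authorized to evacuate.
  bool try_enter(int idx) {
    uint32_t v = counters[idx].load();
    while ((v & OOM_BIT) == 0) {
      if (counters[idx].compare_exchange_weak(v, v + 1)) return true;
    }
    return false;
  }
  void leave(int idx) { counters[idx].fetch_sub(1); }
  void set_oom_bit(int idx) { counters[idx].fetch_or(OOM_BIT); }
  void set_all_oom_bits() {
    for (int i = 0; i < NUM_COUNTERS; i++) set_oom_bit(i);
  }
  // True when no counter has a non-zero evacuator count.
  bool no_evac_threads() const {
    for (const auto& c : counters) {
      if ((c.load() & ~OOM_BIT) != 0) return false;
    }
    return true;
  }
};

struct Outcome {
  bool b_authorized;                   // did B get permission to evacuate?
  bool b_wait_returned_at_once;        // did B see "no evac threads" right away?
  bool c_authorized;                   // did C get permission to evacuate?
  bool c_evacuating_after_b_returned;  // is C evacuating after B stopped waiting?
};

// Replays the interleaving deterministically on one thread.
// A and B map to counter 0, C maps to counter 1.
Outcome replay(bool register_repeats_oom_loop) {
  ToyEvacOOM h;

  // Step 1: A entered evacuation, failed to allocate, decremented its own
  // count, and so far has set the OOM bit on counter 0 only.
  bool a_entered = h.try_enter(0);
  assert(a_entered);
  (void)a_entered;
  h.leave(0);
  h.set_oom_bit(0);

  // Step 2: B registers, finds the OOM bit set, and waits.
  bool b_authorized = h.try_enter(0);
  if (!b_authorized && register_repeats_oom_loop) {
    h.set_all_oom_bits();  // proposed fix: repeat the loop before waiting
  }
  bool b_wait_returned_at_once = h.no_evac_threads();

  // Step 3: C registers on counter 1, which A has not reached yet.
  bool c_authorized = h.try_enter(1);

  // Step 4: B already stopped waiting. Is anyone evacuating now?
  bool c_evacuating = !h.no_evac_threads();

  return {b_authorized, b_wait_returned_at_once, c_authorized, c_evacuating};
}
```

Calling replay(false) reproduces the scenario in this toy model: B's wait returns at once, yet C is authorized and still evacuating afterwards. Calling replay(true) has B set every OOM bit before it waits, so C is refused and nothing is evacuating once B proceeds. The redundant bit-setting is idempotent here because it is a plain atomic OR.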

I expect it is very rare for this to occur, and it only occurs when we're already in dire circumstances, so I don't think it's worth the effort to optimize the implementation to avoid having multiple threads redundantly set the OOM bits on every counter.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/17385#discussion_r1454070259


More information about the hotspot-gc-dev mailing list