RFR: 8323634: Shenandoah: Document behavior of EvacOOM protocol [v4]

William Kemper wkemper at openjdk.org
Wed Jan 17 18:17:51 UTC 2024


On Wed, 17 Jan 2024 00:26:16 GMT, Kelvin Nilsen <kdnilsen at openjdk.org> wrote:

>> So what about this scenario?
>> 1. Thread A is setting the OOM bits on all counters, has decremented its own count, but has not yet finished setting all OOM bits
>> 2. Thread B newly tries to enter_evacuation() so it invokes register_thread, and it finds the OOM bit is set. It waits for no_evac_threads() and this immediately returns without authorization to evacuate, because there are no evac threads running at this moment.
>> 3. Now thread C newly tries to enter_evacuation() so it invokes register_thread, but thread A has not yet set this thread's OOM bit, so thread C proceeds to enter_evac with authorization to evacuate.
>> 4. Thread A will wait for thread C to finish evacuating, but thread B is not waiting for thread C to finish evacuating.
>> 
>> This does appear to be a bug.  I think the fix is that register_thread also needs to repeat the loop that sets all OOM bits before it waits for no evac threads.
>> 
>> I expect it is very rare for this to occur, and it only occurs when we're already in dire circumstances, so I don't think it's worth the effort to optimize an implementation that avoids having multiple threads redundantly set the OOM bits on every counter.
>
> Upon further reflection, I think this can't happen either.  In particular:
> 1. Thread B, when it waits for no evac threads, is also waiting for all counters' OOM bits to be set.
> 2. So Thread B will not return until thread A has set all of the OOM bits.
> 3. Suppose thread C's counter has value 0 because Thread A has not yet set its OOM bit.  There's a race:
>     a. Thread A wants to change the counter to OOM bit
>     b. Thread C wants to change the counter to 1
> 4. If Thread A wins the race, then thread C will proceed without authorization to allocate.
> 5. If Thread C wins the race, then both thread A and B will spin until thread C finishes its evacuation and decrements the counter back to zero.

I agree. Though I haven't reproduced the scenario you're describing, I stepped through this code a few times with failure scenarios.
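
For readers following along, the race in steps 3 to 5 of the quoted analysis can be sketched with a single atomic word. This is only a simplified model, not the HotSpot code: EvacCounter, OOM_BIT, try_enter, leave, set_oom and quiescent are hypothetical names, and the sketch assumes the OOM marker is the high bit of a 32-bit counter and that the waiters poll for "OOM bit set and count zero".

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for one evacuation counter; NOT the HotSpot class.
// Assumption: the OOM marker is the high bit, the low bits hold the count.
constexpr uint32_t OOM_BIT = 0x80000000u;

struct EvacCounter {
    std::atomic<uint32_t> bits{0};

    // Thread C's move: increment the count unless the OOM bit is already set.
    // Returns true if the caller may evacuate.
    bool try_enter() {
        uint32_t cur = bits.load();
        while ((cur & OOM_BIT) == 0) {
            // On failure, compare_exchange_weak reloads cur and we re-check.
            if (bits.compare_exchange_weak(cur, cur + 1)) {
                return true;
            }
        }
        return false;  // lost the race to the thread setting the OOM bit
    }

    // Leave evacuation: drop the count, leaving the OOM bit untouched.
    void leave() {
        assert((bits.load() & ~OOM_BIT) != 0);  // caller must have entered
        bits.fetch_sub(1);
    }

    // Thread A's move: set the OOM bit while preserving any current count.
    void set_oom() { bits.fetch_or(OOM_BIT); }

    // What threads A and B poll in this model: OOM bit set AND count zero.
    // A counter whose OOM bit is not yet set is never quiescent.
    bool quiescent() const { return bits.load() == OOM_BIT; }
};
```

In this model, if set_oom() lands first, try_enter() returns false and the counter is immediately quiescent. If try_enter() lands first, quiescent() stays false after set_oom() until the entering thread calls leave(), so any waiter that requires every counter to be quiescent cannot return early.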

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/17385#discussion_r1456241140


More information about the hotspot-gc-dev mailing list