RFR: 8318986: Improve GenericWaitBarrier performance [v6]

Tue Nov 21 09:36:32 UTC 2023

On Tue, 21 Nov 2023 07:04:44 GMT, Robbin Ehn <rehn at openjdk.org> wrote:

>> If we fail we need to reload _state, when the other CPU just invalidated that cache-line.
>> Then a spin-pause just before would actually be bad, but since it so rarely happens it doesn't matter.
>> 
>> But some platforms do not have a CAS, so Atomic::cmpxchg may be a load-reserved/store-conditional sequence (RISC-V), which has the (at least theoretical) possibility of every thread failing the cmpxchg.
>> As LR/SC is a bit unpredictable, and there are a number of hw vendors, I think it's good to have this just in case.
>
> https://lore.kernel.org/all/20230910082911.3378782-10-guoren@kernel.org/

Actually, I think we don't need `SpinYield` in this particular place for a few reasons:
 1. We want to disarm as fast as possible, even if that means more contention, since we are on the "leaving safepoint" path in the VM thread here.
 2. No other `_state` update loop yields, so this loop is effectively low priority under contention, which makes (1) even worse.
 3. The `SpinYield` instance is shared between the `_state` CAS loop here and the wakeup backoff later. This subtly leaks `SpinYield` state between the phases: the aggressive backoff accrued under `_state` contention would carry over into handling signaling contention.
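
To illustrate the three points above, here is a minimal standalone sketch, not the actual HotSpot code. `Backoff` is a hypothetical stand-in for `SpinYield`, and `disarm`, `wait_until_drained`, and the state layout (tag in the upper 32 bits, waiter count in the lower 32 bits) are assumptions made for the example. The disarm loop retries immediately on failure, and the later phase gets its own fresh backoff object, so nothing carries over between the two.

```cpp
#include <atomic>
#include <cstdint>
#include <thread>

// Hypothetical stand-in for SpinYield: escalates to yielding after a few calls.
class Backoff {
  int _spins = 0;
 public:
  void wait() {
    if (++_spins > 8) {
      std::this_thread::yield();
    }
  }
  int spins() const { return _spins; }
};

// Phase 1: clear the tag with a plain CAS loop and no backoff.
// Assumed layout: tag in the upper 32 bits, waiter count in the lower 32 bits.
// Returns the number of waiters observed at disarm time.
inline int32_t disarm(std::atomic<int64_t>& state) {
  int64_t old_state = state.load(std::memory_order_relaxed);
  // On failure, compare_exchange_weak reloads old_state, so we retry at once.
  while (!state.compare_exchange_weak(old_state, old_state & 0xFFFFFFFFLL)) {
  }
  return static_cast<int32_t>(old_state & 0xFFFFFFFFLL);
}

// Phase 2: wait for outstanding wakeups with a fresh Backoff, so that no
// backoff accrued in phase 1 can carry over. Returns the number of waits.
inline int wait_until_drained(std::atomic<int32_t>& outstanding) {
  Backoff backoff;
  while (outstanding.load(std::memory_order_acquire) > 0) {
    backoff.wait();
  }
  return backoff.spins();
}
```

Under an LR/SC implementation the weak CAS may fail spuriously, which the retry loop absorbs without yielding.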

I removed this `wait()` in the new commit.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16404#discussion_r1400278309