RFR: 8366441: AArch64: Support WFET in OnSpinWait [v3]

Ruben duke at openjdk.org
Wed Feb 11 09:40:22 UTC 2026


On Mon, 9 Feb 2026 21:33:44 GMT, Ruben <duke at openjdk.org> wrote:

>> Implement OnSpinWait based on WFET - wait for event with timeout:
>>  - introduce OnSpinWaitDelay - the OnSpinWait time in nanoseconds;
>>  - the OnSpinWaitInstCount is expected to be 1 when WFET is used;
>>  - the waiting loop is followed by SB - to ensure following instructions aren't speculated until wait is finished;
>>  - the timer register is read via the self-synchronized view CNTVCTSS_EL0 to prevent the read being hoisted out of the loop.
>> 
>> The WFET and CNTVCTSS_EL0 read are added to aarch64-asmtest.py as hex values - using the instruction mnemonics would require support of -march=armv9-2.a, and consequently, the binutils 2.36+.
>
> Ruben has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains eight commits:
> 
>  - Set default OnSpinWaitDelay to 100
>  - Address review comments
>  - Apply PR review "Suggested changes" from @theRealAph
>  - Merge from mainline
>  - Fix bsd_aarch64 build
>  - Update
>    
>    - Address review comments
>    - Fix test
>    - Mark the support experimental
>    - Remove changes in src/hotspot/os_cpu/bsd_aarch64
>  - Merge from mainline
>  - 8366441: AArch64: Support WFET in OnSpinWait
>    
>    Implement OnSpinWait based on WFET - wait for event with timeout:
>     - introduce OnSpinWaitDelay - the OnSpinWait time in nanoseconds;
>     - the OnSpinWaitInstCount is expected to be 1 when WFET is used;
>     - the waiting loop is followed by SB - to ensure following instructions
>       aren't speculated until wait is finished;
>     - the timer register is read via the self-synchronized view
>       CNTVCTSS_EL0 to prevent the read being hoisted out of the loop.
>    
>    The WFET and CNTVCTSS_EL0 read are added to aarch64-asmtest.py as
>    hex values - using the instruction mnemonics would require support of
>    -march=armv9-2.a, and consequently, the binutils 2.36+.
>    
>    Co-authored-by: Stuart Monteith <stuart.monteith at arm.com>

Thank you for the feedback.

Based on the above data, the current WFET sequence cannot produce a delay shorter than 80ns: when a delay of 1ns is requested, the measured `onSpinWait delay` is 80ns on Arm Cortex-X925 and 142ns on Cortex-A725. Adjusting the sequence might lower this minimum, and the minimum of the current sequence might differ on other hardware.
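
As a rough illustration of how a nanosecond delay relates to a timer deadline, here is a minimal sketch that converts a requested delay into counter ticks, rounding up. The helper name `delay_ns_to_ticks` and the counter frequencies used below are assumptions for illustration only; they are not taken from the PR, and the actual sequence may compute its deadline differently.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not from the PR): convert a requested delay in
// nanoseconds into timer ticks for a counter running at counter_hz.
// The result is rounded up, so the tick count never represents less
// time than was requested.
// Assumption: delay_ns * counter_hz fits in 64 bits (true for short delays).
inline uint64_t delay_ns_to_ticks(uint64_t delay_ns, uint64_t counter_hz) {
  const uint64_t kNanosPerSecond = 1000000000ULL;
  assert(counter_hz > 0);
  // Integer form of ceil(delay_ns * counter_hz / 1e9).
  return (delay_ns * counter_hz + kNanosPerSecond - 1) / kNanosPerSecond;
}
```

With an assumed 1 GHz counter, `delay_ns_to_ticks(100, 1000000000)` gives 100 ticks. With an assumed 24 MHz counter, a 1ns request rounds up to one full tick, so a coarse counter alone can make the shortest possible wait longer than the requested value.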

The default delay of 100ns was chosen based on the above measurements, as it improves both the ProducerConsumer and SharedCounter microbenchmarks.

I am planning to change the default value to 40ns. I should probably also update the description of the option to specify that it sets the minimum delay, not the precise delay value.
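
To illustrate what "minimum delay" means in practice, here is a portable sketch of a deadline-based spin-wait. It is not the HotSpot code: the function name `spin_wait_at_least` is hypothetical, and it polls `std::chrono::steady_clock` where the AArch64 sequence would wait with WFET. It assumes the clock has nanosecond resolution, as in the common standard library implementations.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical portable stand-in for a timed spin-wait (not the PR's code).
// Waits until at least min_delay has elapsed and returns the time actually
// spent. The loop only exits once the deadline has passed, so the result
// is a lower-bounded delay, not an exact one.
inline std::chrono::nanoseconds spin_wait_at_least(std::chrono::nanoseconds min_delay) {
  using clock = std::chrono::steady_clock;
  assert(min_delay.count() >= 0);
  const clock::time_point start = clock::now();
  const auto deadline = start + min_delay;
  clock::time_point now = start;
  while (now < deadline) {
    // The AArch64 sequence would wait for an event or timeout here;
    // this sketch simply re-reads the clock.
    now = clock::now();
  }
  return std::chrono::duration_cast<std::chrono::nanoseconds>(now - start);
}
```

Calling `spin_wait_at_least(std::chrono::nanoseconds(40))` returns a duration of at least 40ns; how far above 40ns it lands depends on the cost of each loop iteration, which is the same reason the option is better described as setting a minimum.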

The option is proposed in this PR as experimental. Benchmarking on real-world workloads, as well as microbenchmarking on other hardware, might help determine the use cases where it is more helpful than the other `OnSpinWaitInst` options.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27030#issuecomment-3881046394

