RFR: 8366441: AArch64: Support WFET in OnSpinWait [v3]
Per Minborg
pminborg at openjdk.org
Tue Feb 10 07:24:42 UTC 2026
On Mon, 9 Feb 2026 21:33:44 GMT, Ruben <duke at openjdk.org> wrote:
>> Implement OnSpinWait based on WFET - wait for event with timeout:
>> - introduce OnSpinWaitDelay - the OnSpinWait time in nanoseconds;
>> - the OnSpinWaitInstCount is expected to be 1 when WFET is used;
>> - the waiting loop is followed by SB - to ensure following instructions aren't speculated until wait is finished;
>> - the timer register is read via the self-synchronized view CNTVCTSS_EL0 to prevent the read being hoisted out of the loop.
>>
>> The WFET and CNTVCTSS_EL0 read are added to aarch64-asmtest.py as hex values - using the instruction mnemonics would require support for -march=armv9-2.a and, consequently, binutils 2.36+.
>
> Ruben has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains eight commits:
>
> - Set default OnSpinWaitDelay to 100
> - Address review comments
> - Apply PR review "Suggested changes" from @theRealAph
> - Merge from mainline
> - Fix bsd_aarch64 build
> - Update
>
> - Address review comments
> - Fix test
> - Mark the support experimental
> - Remove changes in src/hotspot/os_cpu/bsd_aarch64
> - Merge from mainline
> - 8366441: AArch64: Support WFET in OnSpinWait
>
> Implement OnSpinWait based on WFET - wait for event with timeout:
> - introduce OnSpinWaitDelay - the OnSpinWait time in nanoseconds;
> - the OnSpinWaitInstCount is expected to be 1 when WFET is used;
> - the waiting loop is followed by SB - to ensure following instructions
> aren't speculated until wait is finished;
> - the timer register is read via the self-synchronized view
> CNTVCTSS_EL0 to prevent the read being hoisted out of the loop.
>
> The WFET and CNTVCTSS_EL0 read are added to aarch64-asmtest.py as
> hex values - using the instruction mnemonics would require support of
> -march=armv9-2.a and, consequently, binutils 2.36+.
>
> Co-authored-by: Stuart Monteith <stuart.monteith at arm.com>
Unless the default value of OnSpinWaitDelay is changed, this will have a relatively large impact on certain low-latency applications that wait for a CAS lock in a spin loop. Latencies in such applications are often measured in nanoseconds, so this is something that should be communicated to the community in some way.
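
For illustration, here is a minimal sketch of the kind of spin loop I have in mind. It is not taken from the PR or from any particular application; the class name SpinLockSketch and the runDemo helper are made up for this example. The only real API it relies on is Thread.onSpinWait() and AtomicBoolean. If I read the PR description correctly, with WFET enabled each Thread.onSpinWait() call in such a loop may wait up to roughly OnSpinWaitDelay nanoseconds before the CAS is retried.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical example: a CAS-based spin lock of the kind used in
// low-latency code. Names are illustrative only.
public class SpinLockSketch {
    private final AtomicBoolean locked = new AtomicBoolean(false);
    private long counter = 0; // guarded by the spin lock

    void lock() {
        // Retry the CAS until it succeeds; hint to the JVM that we are spinning.
        while (!locked.compareAndSet(false, true)) {
            Thread.onSpinWait();
        }
    }

    void unlock() {
        locked.set(false);
    }

    // Starts `threads` workers that each increment the shared counter
    // `incrementsPerThread` times under the lock, then returns the total.
    static long runDemo(int threads, int incrementsPerThread) {
        SpinLockSketch s = new SpinLockSketch();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < incrementsPerThread; j++) {
                    s.lock();
                    try {
                        s.counter++;
                    } finally {
                        s.unlock();
                    }
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return s.counter;
    }

    public static void main(String[] args) {
        System.out.println(runDemo(4, 100_000));
    }
}
```

In a loop like this, the lock hand-off latency is bounded below by how long a single Thread.onSpinWait() call takes, which is why the default delay matters for these applications.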
-------------
PR Comment: https://git.openjdk.org/jdk/pull/27030#issuecomment-3875804714