RFR: 8366441: AArch64: Support WFET in OnSpinWait [v2]

Ruben duke at openjdk.org
Thu Feb 5 17:53:32 UTC 2026


On Thu, 18 Dec 2025 16:31:04 GMT, Ruben <duke at openjdk.org> wrote:

>> Implement OnSpinWait based on WFET (wait for event with timeout):
>>  - introduce OnSpinWaitDelay, the OnSpinWait duration in nanoseconds;
>>  - OnSpinWaitInstCount is expected to be 1 when WFET is used;
>>  - the waiting loop is followed by SB to ensure that subsequent instructions are not speculatively executed until the wait has finished;
>>  - the timer register is read via the self-synchronized view CNTVCTSS_EL0 to prevent the read from being hoisted out of the loop.
>> 
>> The WFET instruction and the CNTVCTSS_EL0 read are added to aarch64-asmtest.py as hex values: using the instruction mnemonics would require support for -march=armv9-2.a and, consequently, binutils 2.36+.
>
> Ruben has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Fix bsd_aarch64 build
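
As a rough Java-level analogy of the timed wait loop described in the quoted summary (not the code this PR emits), the sketch below re-reads a clock inside the loop until a deadline has passed. `System.nanoTime()` merely stands in for the counter read and `Thread.onSpinWait()` for the wait instruction; the class and method names are hypothetical.

```java
public class TimedSpinSketch {
    // Spins until at least delayNanos nanoseconds have elapsed and returns
    // the elapsed time that was actually observed.
    public static long spinFor(long delayNanos) {
        final long start = System.nanoTime();
        long elapsed;
        // The clock is re-read on every iteration, so an early wake-up
        // simply leads to another wait rather than an early exit.
        while ((elapsed = System.nanoTime() - start) < delayNanos) {
            Thread.onSpinWait();
        }
        return elapsed;
    }

    public static void main(String[] args) {
        System.out.println("waited " + spinFor(1_000) + " ns");
    }
}
```

The observed time can exceed the requested delay, since the loop only checks the clock between waits.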

The data below was collected with this PR rebased onto 949370ab0e701cfcc68cb84dd0f91e5db41f4f45.
JMH options: `-f 5 -i 20`.
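
For readers unfamiliar with the pattern these measurements concern, here is a minimal sketch of a producer/consumer handoff in which both sides busy-wait with `Thread.onSpinWait()`. It is an illustration only and is not taken from the JDK microbenchmarks; the class name, method name, and single-slot protocol are assumptions made for the example.

```java
import java.util.concurrent.atomic.AtomicLong;

public class SpinWaitHandoff {
    // Passes the values 1..count from a producer thread to the caller through
    // a single shared slot (0 means "empty") and returns their sum.
    public static long transfer(int count) {
        final AtomicLong slot = new AtomicLong(0);
        Thread producer = new Thread(() -> {
            for (long v = 1; v <= count; v++) {
                // Busy-wait until the consumer has emptied the slot.
                while (!slot.compareAndSet(0, v)) {
                    Thread.onSpinWait();
                }
            }
        });
        producer.start();

        long sum = 0;
        for (int i = 0; i < count; i++) {
            long v;
            // Busy-wait until the producer has published a value.
            while ((v = slot.getAndSet(0)) == 0) {
                Thread.onSpinWait();
            }
            sum += v;
        }
        try {
            producer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(transfer(100_000));
    }
}
```

How long each `Thread.onSpinWait()` call pauses the core determines both how quickly a waiter notices the handoff and how much it disturbs the other thread, which is the trade-off the different configurations explore.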


Arm Cortex-A725. 10 cores.
+-------------------+--------------------------+---------------------------+------------------------+
| OnSpinWaitInst    | onSpinWait delay         | ProducerConsumer          | SharedCounter          |
+===================+==========================+===========================+========================+
| default           | 0.454    +/- 0.005 ns/op | 1797.534 +/- 7.137 us/op  | 38.229 +/- 1.507 ms/op |
| isb               | 14.735   +/- 0.049 ns/op | 1622.890 +/- 8.957 us/op  | 67.825 +/- 2.004 ms/op |
| sb                | 6.609    +/- 0.028 ns/op | 1756.784 +/- 8.068 us/op  | 56.593 +/- 2.682 ms/op |
| wfet delay=1      | 142.610  +/- 0.135 ns/op | 407.111  +/- 2.177 us/op  | 45.724 +/- 1.047 ms/op |
| wfet delay=10     | 142.706  +/- 0.053 ns/op | 412.207  +/- 2.491 us/op  | 44.549 +/- 0.879 ms/op |
| wfet delay=100    | 195.225  +/- 1.544 ns/op | 497.679  +/- 1.825 us/op  | 29.912 +/- 0.225 ms/op |
| wfet delay=1000   | 1105.214 +/- 0.048 ns/op | 1725.607 +/- 41.243 us/op | 13.124 +/- 0.044 ms/op |
+-------------------+--------------------------+---------------------------+------------------------+


Arm Cortex-X925. 10 cores.
+-------------------+--------------------------+---------------------------+------------------------+
| OnSpinWaitInst    | onSpinWait delay         | ProducerConsumer          | SharedCounter          |
+===================+==========================+===========================+========================+
| default           | 0.251    +/- 0.001 ns/op | 1292.315 +/- 6.435 us/op  | 34.149 +/- 0.725 ms/op |
| isb               | 8.577    +/- 0.015 ns/op | 1240.861 +/- 6.554 us/op  | 55.781 +/- 1.335 ms/op |
| sb                | 5.182    +/- 0.008 ns/op | 1281.250 +/- 6.326 us/op  | 39.042 +/- 0.858 ms/op |
| wfet delay=1      | 80.570   +/- 0.271 ns/op | 263.324  +/- 2.462 us/op  | 52.155 +/- 0.755 ms/op |
| wfet delay=10     | 80.676   +/- 0.251 ns/op | 256.965  +/- 2.230 us/op  | 53.537 +/- 0.580 ms/op |
| wfet delay=100    | 230.551  +/- 0.016 ns/op | 437.056  +/- 4.488 us/op  | 31.870 +/- 1.023 ms/op |
| wfet delay=1000   | 1087.637 +/- 0.089 ns/op | 1824.265 +/- 27.841 us/op | 9.842  +/- 0.168 ms/op |
+-------------------+--------------------------+---------------------------+------------------------+
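
To make the tables easier to compare, the following sketch computes a few ratios of the default configuration against a WFET configuration. The mean scores are copied from the tables above and the error bars are ignored; the class and method names are hypothetical.

```java
public class SpeedupSketch {
    // Ratio of baseline time to candidate time; values above 1 mean the
    // candidate configuration took less time per operation.
    public static double speedup(double baseline, double candidate) {
        return baseline / candidate;
    }

    public static void main(String[] args) {
        // ProducerConsumer (us/op): default vs wfet delay=1.
        System.out.printf("A725 ProducerConsumer: %.1fx%n", speedup(1797.534, 407.111));
        System.out.printf("X925 ProducerConsumer: %.1fx%n", speedup(1292.315, 263.324));
        // SharedCounter (ms/op): default vs wfet delay=1000.
        System.out.printf("A725 SharedCounter:    %.1fx%n", speedup(38.229, 13.124));
        System.out.printf("X925 SharedCounter:    %.1fx%n", speedup(34.149, 9.842));
    }
}
```

By these means, ProducerConsumer with wfet delay=1 is roughly 4.4x (A725) and 4.9x (X925) faster than the default, and SharedCounter with wfet delay=1000 is roughly 2.9x (A725) and 3.5x (X925) faster than the default.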

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27030#issuecomment-3855235598

