RFR: 8366441: AArch64: Support WFET in OnSpinWait [v2]
Ruben
duke at openjdk.org
Thu Feb 5 17:53:32 UTC 2026
On Thu, 18 Dec 2025 16:31:04 GMT, Ruben <duke at openjdk.org> wrote:
>> Implement OnSpinWait based on WFET (Wait For Event with Timeout):
>> - introduce OnSpinWaitDelay, the OnSpinWait time in nanoseconds;
>> - OnSpinWaitInstCount is expected to be 1 when WFET is used;
>> - the waiting loop is followed by SB, so that subsequent instructions aren't speculatively executed until the wait has finished;
>> - the timer register is read via its self-synchronized view, CNTVCTSS_EL0, to prevent the read from being hoisted out of the loop.
>>
>> The WFET instruction and the CNTVCTSS_EL0 read are added to aarch64-asmtest.py as hex values: using the instruction mnemonics would require support for -march=armv9.2-a and, consequently, binutils 2.36+.
>
> Ruben has updated the pull request incrementally with one additional commit since the last revision:
>
> Fix bsd_aarch64 build
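A rough model of the wait loop described in the quoted summary may help when reading the delay numbers below. This is a Python sketch under stated assumptions, not the HotSpot code: `read_counter` stands in for the CNTVCTSS_EL0 read, `wfet` stands in for the WFET instruction, and `ns_to_ticks` (rounding up, given a counter frequency such as the one reported by CNTFRQ_EL0) and `FakeTimer` are hypothetical helpers. The loop re-checks the counter because WFET can return before its deadline, for example when an event is signalled.

```python
def ns_to_ticks(delay_ns: int, counter_hz: int) -> int:
    """Hypothetical helper: convert a delay in ns to counter ticks.
    Rounds up (assumption) so a non-zero delay never becomes a zero-tick wait."""
    return -(-delay_ns * counter_hz // 1_000_000_000)


def spin_wait(delay_ns, counter_hz, read_counter, wfet):
    """Model of a WFET-based OnSpinWait: wait until the counter reaches a deadline.

    read_counter() models reading CNTVCTSS_EL0; wfet(deadline) models WFET,
    which may wake early, so the counter is re-read after every wake-up.
    """
    deadline = read_counter() + ns_to_ticks(delay_ns, counter_hz)
    wakeups = 0
    while True:
        wfet(deadline)
        wakeups += 1
        if read_counter() >= deadline:
            break
    # In the real sequence an SB (speculation barrier) would follow here.
    return deadline, wakeups


class FakeTimer:
    """Hypothetical simulated counter: each WFET wakes after at most
    early_step ticks, and never later than the deadline."""

    def __init__(self, start, early_step):
        self.now = start
        self.early_step = early_step

    def read(self):
        return self.now

    def wfet(self, deadline):
        if self.now < deadline:
            self.now = min(deadline, self.now + self.early_step)


timer = FakeTimer(start=5000, early_step=30)
print(spin_wait(100, 1_000_000_000, timer.read, timer.wfet))
```

With a 1 GHz counter, a 100 ns delay becomes 100 ticks; the simulated early wake-ups every 30 ticks force four WFET iterations before the deadline at tick 5100 is reached.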
The data below was collected with this PR rebased onto 949370ab0e701cfcc68cb84dd0f91e5db41f4f45.
JMH options: `-f 5 -i 20` (5 forks, 20 measurement iterations).
Arm Cortex-A725. 10 cores.
+-------------------+--------------------------+---------------------------+------------------------+
| OnSpinWaitInst | onSpinWait delay | ProducerConsumer | SharedCounter |
+===================+==========================+===========================+========================+
| default | 0.454 +/- 0.005 ns/op | 1797.534 +/- 7.137 us/op | 38.229 +/- 1.507 ms/op |
| isb | 14.735 +/- 0.049 ns/op | 1622.890 +/- 8.957 us/op | 67.825 +/- 2.004 ms/op |
| sb | 6.609 +/- 0.028 ns/op | 1756.784 +/- 8.068 us/op | 56.593 +/- 2.682 ms/op |
| wfet delay=1 | 142.610 +/- 0.135 ns/op | 407.111 +/- 2.177 us/op | 45.724 +/- 1.047 ms/op |
| wfet delay=10 | 142.706 +/- 0.053 ns/op | 412.207 +/- 2.491 us/op | 44.549 +/- 0.879 ms/op |
| wfet delay=100 | 195.225 +/- 1.544 ns/op | 497.679 +/- 1.825 us/op | 29.912 +/- 0.225 ms/op |
| wfet delay=1000 | 1105.214 +/- 0.048 ns/op | 1725.607 +/- 41.243 us/op | 13.124 +/- 0.044 ms/op |
+-------------------+--------------------------+---------------------------+------------------------+
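To make the Cortex-A725 ProducerConsumer and SharedCounter columns easier to compare, the following Python snippet computes each configuration's speedup over the default (baseline time divided by configuration time, since lower us/op and ms/op is better). It only uses the mean values from the table above and ignores the +/- error bounds.

```python
# Mean scores from the Cortex-A725 table: (ProducerConsumer us/op, SharedCounter ms/op).
a725 = {
    "default": (1797.534, 38.229),
    "isb": (1622.890, 67.825),
    "sb": (1756.784, 56.593),
    "wfet delay=1": (407.111, 45.724),
    "wfet delay=10": (412.207, 44.549),
    "wfet delay=100": (497.679, 29.912),
    "wfet delay=1000": (1725.607, 13.124),
}


def speedups(table, baseline="default"):
    """Return baseline_time / config_time per benchmark; >1 means faster than baseline."""
    base_pc, base_sc = table[baseline]
    return {name: (base_pc / pc, base_sc / sc) for name, (pc, sc) in table.items()}


for name, (pc, sc) in speedups(a725).items():
    print(f"{name:16} ProducerConsumer x{pc:5.2f}  SharedCounter x{sc:5.2f}")
```

On these numbers, wfet delay=1 is roughly 4.4x faster than the default on ProducerConsumer, while wfet delay=1000 is roughly 2.9x faster on SharedCounter. Because the error bounds are ignored, small ratios close to 1 (for example default vs sb on ProducerConsumer) should not be over-read.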
Arm Cortex-X925. 10 cores.
+-------------------+--------------------------+---------------------------+------------------------+
| OnSpinWaitInst | onSpinWait delay | ProducerConsumer | SharedCounter |
+===================+==========================+===========================+========================+
| default | 0.251 +/- 0.001 ns/op | 1292.315 +/- 6.435 us/op | 34.149 +/- 0.725 ms/op |
| isb | 8.577 +/- 0.015 ns/op | 1240.861 +/- 6.554 us/op | 55.781 +/- 1.335 ms/op |
| sb | 5.182 +/- 0.008 ns/op | 1281.250 +/- 6.326 us/op | 39.042 +/- 0.858 ms/op |
| wfet delay=1 | 80.570 +/- 0.271 ns/op | 263.324 +/- 2.462 us/op | 52.155 +/- 0.755 ms/op |
| wfet delay=10 | 80.676 +/- 0.251 ns/op | 256.965 +/- 2.230 us/op | 53.537 +/- 0.580 ms/op |
| wfet delay=100 | 230.551 +/- 0.016 ns/op | 437.056 +/- 4.488 us/op | 31.870 +/- 1.023 ms/op |
| wfet delay=1000 | 1087.637 +/- 0.089 ns/op | 1824.265 +/- 27.841 us/op | 9.842 +/- 0.168 ms/op |
+-------------------+--------------------------+---------------------------+------------------------+
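One way to read the onSpinWait delay column for the WFET rows is to subtract the requested OnSpinWaitDelay from the measured time. The snippet below does that arithmetic for both CPUs, assuming that the onSpinWait delay column reports the cost of a single onSpinWait call and that the `delay=` value in each row is the requested delay in nanoseconds.

```python
requested_ns = [1, 10, 100, 1000]  # wfet delay=1, 10, 100, 1000

# Measured onSpinWait delay (ns/op) for the wfet rows of each table.
measured_ns = {
    "Cortex-A725": [142.610, 142.706, 195.225, 1105.214],
    "Cortex-X925": [80.570, 80.676, 230.551, 1087.637],
}


def excess(measured, requested):
    """Measured time minus requested delay, per row, in ns."""
    return [round(m - r, 3) for m, r in zip(measured, requested)]


for cpu, values in measured_ns.items():
    print(cpu, excess(values, requested_ns))
```

On this data the measured time exceeds the requested delay by roughly 95 to 142 ns on the Cortex-A725 and roughly 70 to 131 ns on the Cortex-X925, which is why delay=1 and delay=10 give nearly identical results.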
-------------
PR Comment: https://git.openjdk.org/jdk/pull/27030#issuecomment-3855235598
More information about the hotspot-dev mailing list