RFR: 8261649: AArch64: Optimize LSE atomics in C++ code

Andrew Haley aph at openjdk.java.net
Wed Feb 17 18:17:43 UTC 2021


On Wed, 17 Feb 2021 18:06:55 GMT, Andrew Haley <aph at openjdk.org> wrote:

> Now that we have support for LSE atomics in C++ HotSpot source, we can generate much better code for them. In particular, the sequence we generate for CMPXCHG with a full two-way barrier, which uses two DMBs, is far from optimal.
> 
> Barrier-ordered-before, Arm Architecture Reference Manual B2.3:
> 
>    | Barrier instructions order prior Memory effects before subsequent
>    | Memory effects generated by the same Observer. A read or a write RW1
>    | is Barrier-ordered-before a read or a write RW2 from the same Observer
>    | if and only if RW1 appears in program order before RW2 and any of the
>    | following cases apply:
>    |
>    | [...]
>    |
>    | * RW1 appears in program order before an atomic instruction with both
>    | Acquire and Release semantics that appears in program order before RW2.
> 
> So a prior load or store cannot be reordered with the load of an atomic swap that has both Acquire and Release semantics. This Barrier-ordered-before relation, combined with sequential consistency, gives us everything we need for a full barrier. However, we still need a DMB after the cmpxchg to ensure that subsequent loads and stores cannot be reordered with the store performed by the atomic instruction.
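A minimal C++ sketch of the two CMPXCHG shapes, written with portable `std::atomic` rather than HotSpot's own `Atomic` API or its generated stubs. The function names are hypothetical. The AArch64 mapping in the comments is an assumption about typical compiler output with LSE enabled (for example `-march=armv8.1-a`): a `seq_cst` fence usually becomes `DMB ISH`, and an `acq_rel` compare-exchange usually becomes `CASAL`. Both functions return the value observed in `dest` before the operation, which is the usual cmpxchg contract.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical helpers (not HotSpot API): two ways to build a
// compare-and-exchange that behaves as a full two-way barrier.

// Old shape: barrier, unordered CAS, barrier (a DMB on each side).
int64_t cmpxchg_two_fences(std::atomic<int64_t>& dest,
                           int64_t compare_value, int64_t exchange_value) {
  std::atomic_thread_fence(std::memory_order_seq_cst);  // leading DMB
  int64_t expected = compare_value;
  // On failure, 'expected' is overwritten with the current value of dest.
  dest.compare_exchange_strong(expected, exchange_value,
                               std::memory_order_relaxed,
                               std::memory_order_relaxed);
  std::atomic_thread_fence(std::memory_order_seq_cst);  // trailing DMB
  return expected;  // value of dest before the operation
}

// New shape: acquire+release CAS (typically CASAL with LSE), then one DMB.
int64_t cmpxchg_casal_fence(std::atomic<int64_t>& dest,
                            int64_t compare_value, int64_t exchange_value) {
  int64_t expected = compare_value;
  dest.compare_exchange_strong(expected, exchange_value,
                               std::memory_order_acq_rel,
                               std::memory_order_acquire);
  // Trailing barrier: keeps later loads and stores from being
  // reordered with the store half of the CAS.
  std::atomic_thread_fence(std::memory_order_seq_cst);
  return expected;  // value of dest before the operation
}
```

The second shape drops the leading barrier because, by the Barrier-ordered-before rule quoted in the message, an atomic with both Acquire and Release semantics already orders earlier accesses before later ones; the trailing fence stays for the reason given there.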

This patch:

Moves memory barriers from the atomic_linux_aarch64 file into the stubs.
Rewrites the LSE versions of the stubs to be more efficient.
Fixes a race condition in stub generation.
Mostly leaves the pre-LSE stubs alone, except for an added PRFM (prefetch) instruction which, according to kernel engineers, improves performance.
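The first and last items in the list can be sketched together: the memory ordering lives inside the stub, and the C++ caller only calls through a function pointer. This is a hypothetical, portable C++ model, not the generated AArch64 code. `fetch_add_stub_t`, `fetch_add_llsc_stub`, and `atomic_fetch_and_add` are illustrative names; the `compare_exchange_weak` loop stands in for an LDXR/STXR retry loop; and the GCC/Clang builtin `__builtin_prefetch(p, 1)` stands in for the PRFM. On AArch64, GCC typically lowers a write prefetch to a `PRFM PSTL1KEEP`; which PRFM variant the patch uses is not stated in the message.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical stub type: the C++ wrapper calls generated code via a pointer.
using fetch_add_stub_t = int64_t (*)(std::atomic<int64_t>*, int64_t);

// Pre-LSE style stub: the barriers and the prefetch live here, not in the caller.
static int64_t fetch_add_llsc_stub(std::atomic<int64_t>* dest, int64_t add) {
  // Prefetch the cache line with intent to write before the retry loop.
  __builtin_prefetch(dest, 1);
  std::atomic_thread_fence(std::memory_order_seq_cst);  // leading barrier
  int64_t old = dest->load(std::memory_order_relaxed);
  // Retry loop modelling LDXR/STXR; on failure 'old' is refreshed.
  while (!dest->compare_exchange_weak(old, old + add,
                                      std::memory_order_relaxed,
                                      std::memory_order_relaxed)) {
  }
  std::atomic_thread_fence(std::memory_order_seq_cst);  // trailing barrier
  return old;  // value before the addition
}

static fetch_add_stub_t fetch_add_stub = fetch_add_llsc_stub;

// Caller: issues no fences of its own; ordering is the stub's job.
int64_t atomic_fetch_and_add(std::atomic<int64_t>* dest, int64_t add) {
  return fetch_add_stub(dest, add);
}
```

Keeping the fences inside the stub means an LSE stub can use a different, cheaper ordering sequence without any change to the caller.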

-------------

PR: https://git.openjdk.java.net/jdk/pull/2612

