RFR: 8261649: AArch64: Optimize LSE atomics in C++ code
Andrew Haley
aph at openjdk.java.net
Thu Feb 18 09:25:39 UTC 2021
On Wed, 17 Feb 2021 18:15:02 GMT, Andrew Haley <aph at openjdk.org> wrote:
>> Now that we have support for LSE atomics in C++ HotSpot source, we can generate much better code for them. In particular, the sequence we generate for CMPXCHG with a full two-way barrier using two DMBs is way suboptimal.
>>
>> Barrier-ordered-before, Arm Architecture Reference Manual B2.3:
>>
>> | Barrier instructions order prior Memory effects before subsequent
>> | Memory effects generated by the same Observer. A read or a write RW1
>> | is Barrier-ordered-before a read or a write RW2 from the same Observer
>> | if and only if RW1 appears in program order before RW2 and any of the
>> | following cases apply:
>> |
>> | [...]
>> |
>> | * RW1 appears in program order before an atomic instruction with both
>> | Acquire and Release semantics that appears in program order before RW2.
>>
>> So a prior load or store cannot be reordered with the load of an atomic swap with Acquire and Release semantics. This Barrier-ordered-before property, in combination with sequential consistency, gives us everything we need for a full barrier. However, we still need a DMB after the cmpxchg to ensure that subsequent loads and stores cannot be reordered with the store in the atomic instruction.
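A minimal C++ sketch of that reasoning follows. It assumes GCC/Clang inline assembly and a target compiled with LSE enabled, so that the ACLE macro `__ARM_FEATURE_ATOMICS` is defined. CASAL provides the Acquire+Release ordering for earlier accesses, and a single trailing DMB ISH prevents later accesses from being reordered with its store. This replaces the two-DMB sequence. The function name `cmpxchg_conservative` is hypothetical and is not the HotSpot stub. On other targets the sketch falls back to a sequentially consistent `__atomic_compare_exchange_n`.

```cpp
#include <cstdint>

// Hypothetical full-barrier ("conservative") 64-bit compare-and-exchange.
// Returns the value observed at *dest; the store happened iff it equals compare_value.
inline int64_t cmpxchg_conservative(int64_t* dest, int64_t compare_value,
                                    int64_t exchange_value) {
#if defined(__aarch64__) && defined(__ARM_FEATURE_ATOMICS)
  int64_t prev = compare_value;  // CASAL reads the compare value from this register
  __asm__ volatile(
      // Acquire+Release CAS: prior accesses are Barrier-ordered-before later ones.
      "casal %[prev], %[xchg], [%[dest]]\n\t"
      // Trailing barrier: later accesses cannot be reordered with the CAS store.
      "dmb ish"
      : [prev] "+r"(prev)
      : [xchg] "r"(exchange_value), [dest] "r"(dest)
      : "memory");
  return prev;
#else
  // Portable fallback: a sequentially consistent CAS. On failure, `expected`
  // is overwritten with the current value; on success it equals compare_value.
  int64_t expected = compare_value;
  __atomic_compare_exchange_n(dest, &expected, exchange_value, /*weak=*/false,
                              __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
  return expected;
#endif
}
```

Both paths return the value observed at `*dest`, so a caller detects success by comparing the result with `compare_value`.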
>
> This patch:
>
> Moves memory barriers from the atomic_linux_aarch64 file into the stubs.
> Rewrites the LSE versions of the stubs to be more efficient.
> Fixes a race condition in stub generation.
> Mostly leaves the pre-LSE stubs alone, except that I added a PRFM, which, according to kernel engineers, improves performance.
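For comparison, here is a sketch of a pre-LSE stub in the same style. It assumes that the conservative LL/SC sequence brackets an LDXR/STXR loop with DMB ISH on both sides. It also assumes that the added PRFM is `PRFM PSTL1STRM` (prefetch for store, L1, streaming), which is the hint the Linux kernel's LL/SC cmpxchg uses. The name `cmpxchg_llsc_conservative` is hypothetical, and the exact instruction sequence in HotSpot's stubs may differ.

```cpp
#include <cstdint>

// Hypothetical pre-LSE (LL/SC) full-barrier 64-bit compare-and-exchange.
// Returns the value observed at *dest.
inline int64_t cmpxchg_llsc_conservative(int64_t* dest, int64_t compare_value,
                                         int64_t exchange_value) {
#if defined(__aarch64__)
  int64_t prev;
  uint32_t status;
  __asm__ volatile(
      "dmb ish\n\t"                              // leading full barrier
      "prfm pstl1strm, [%[dest]]\n\t"            // prefetch the line for a store
      "1: ldxr %[prev], [%[dest]]\n\t"           // load-exclusive current value
      "cmp %[prev], %[cmp]\n\t"
      "b.ne 2f\n\t"                              // mismatch: no store
      "stxr %w[status], %[xchg], [%[dest]]\n\t"  // store-exclusive new value
      "cbnz %w[status], 1b\n\t"                  // lost exclusivity: retry
      "2: dmb ish"                               // trailing full barrier
      : [prev] "=&r"(prev), [status] "=&r"(status)
      : [dest] "r"(dest), [cmp] "r"(compare_value), [xchg] "r"(exchange_value)
      : "cc", "memory");
  return prev;
#else
  // Portable fallback: sequentially consistent CAS returning the observed value.
  int64_t expected = compare_value;
  __atomic_compare_exchange_n(dest, &expected, exchange_value, /*weak=*/false,
                              __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
  return expected;
#endif
}
```

The early-clobber (`=&r`) constraints keep `prev` and `status` out of the input registers, because both are written while `dest`, `compare_value` and `exchange_value` are still needed inside the retry loop.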
Closing because this is a duplicate.
-------------
PR: https://git.openjdk.java.net/jdk/pull/2612
More information about the hotspot-dev mailing list