RFR: 8261027: AArch64: Support for LSE atomics C++ HotSpot code [v3]

Volker Simonis simonis at openjdk.java.net
Tue Feb 9 16:53:40 UTC 2021


On Tue, 9 Feb 2021 15:07:05 GMT, Andrew Haley <aph at openjdk.org> wrote:

>> Go back a few years, and there were simple atomic load/store exclusive
>> instructions on Arm. Say you want to do an atomic increment of a
>> counter. You'd do an atomic load to get the counter into your local cache
>> in exclusive state, increment that counter locally, then write that
>> incremented counter back to memory with an atomic store. All the time
>> that cache line was in exclusive state, so you're guaranteed that
>> no-one else changed anything on that cache line while you had it.
>> 
>> This is hard to scale on a very large system (e.g. Fugaku) because if
>> many processors are incrementing that counter you get a lot of cache
>> line ping-ponging between cores.
>> 
>> So, Arm decided to add a locked memory increment instruction that
>> works without needing to load an entire line into local cache. It's a
>> single instruction that loads, increments, and writes back. The secret
>> is to send a cache control message to whichever processor owns the
>> cache line containing the count, tell that processor to increment the
>> counter and return the incremented value. That way cache coherency
>> traffic is minimized. This new set of instructions is known as Large
>> System Extensions, or LSE.
>> 
>> Unfortunately, in recent processors, the "old" load/store exclusive
>> instructions sometimes perform very badly. Therefore, it's now
>> necessary for software to detect which version of Arm it's running
>> on, and use the "new" LSE instructions if they're available. Otherwise
>> performance can be very poor under heavy contention.
>> 
>> GCC's -moutline-atomics does this by providing library calls which use
>> LSE if it's available, but this option is only provided on newer
>> versions of GCC. This is particularly problematic with older versions
>> of OpenJDK, which build using old GCC versions.
>> 
>> Also, I suspect that some other operating systems could use this.
>> Perhaps not macOS, given that all Apple CPUs support LSE, but
>> maybe Windows.
>
> Andrew Haley has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Properly align everything

src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 35:

> 33: #include "interpreter/interpreter.hpp"
> 34: #include "memory/universe.hpp"
> 35: #include "atomic_aarch64.hpp"

I think the convention is to put the includes in alphabetical order after `#include "precompiled.hpp"`.
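
As a rough sketch of the ordering I mean (the file's other includes are elided here, so the exact slot for `atomic_aarch64.hpp` depends on what sorts around it; for example, any headers under `asm/` would come before it):

```cpp
#include "precompiled.hpp"
// ... includes that sort before "atomic_aarch64.hpp" (elided) ...
#include "atomic_aarch64.hpp"
// ... other includes (elided) ...
#include "interpreter/interpreter.hpp"
#include "memory/universe.hpp"
```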

-------------

PR: https://git.openjdk.java.net/jdk/pull/2434

