RFR: 8261027: AArch64: Support for LSE atomics C++ HotSpot code

Andrew Haley aph at openjdk.java.net
Fri Feb 5 19:07:01 UTC 2021


Go back a few years, and there were simple atomic load/store exclusive
instructions on Arm. Say you want to do an atomic increment of a
counter. You'd do an atomic load to get the counter into your local cache
in exclusive state, increment that counter locally, then write that
incremented counter back to memory with an atomic store. The cache line
stays in exclusive state the whole time, so you're guaranteed that
no-one else changed anything on that cache line while you had it.
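
As a rough illustration of that load/increment/store sequence, here is
a minimal C++ sketch, not code from the patch. The function name
increment_llsc_style is made up for this example, and
compare_exchange_weak is used only as a portable stand-in for a
load-exclusive/store-exclusive pair: like a store-exclusive it may fail
and has to be retried.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical helper (not from the patch): increment a counter with a
// retry loop shaped like a load-exclusive / store-exclusive sequence.
inline uint64_t increment_llsc_style(std::atomic<uint64_t>& counter) {
  // "Load": read the current value.
  uint64_t old_value = counter.load(std::memory_order_relaxed);
  // "Store": try to publish old_value + 1. If another thread got in
  // first (or the attempt fails spuriously), old_value is refreshed
  // with the current contents and we try again.
  while (!counter.compare_exchange_weak(old_value, old_value + 1,
                                        std::memory_order_seq_cst,
                                        std::memory_order_relaxed)) {
  }
  return old_value + 1;  // the incremented value
}
```

Under contention every core runs this loop against the same cache
line, so the retries are where the cost comes from.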

This is hard to scale on a very large system (e.g. Fugaku) because if
many processors are incrementing that counter you get a lot of cache
line ping-ponging between cores.

So, Arm decided to add a locked memory increment instruction that
works without needing to load an entire line into local cache. It's a
single instruction that loads, increments, and writes back. The secret
is to send a cache control message to whichever processor owns the
cache line containing the count, tell that processor to increment the
counter and return the incremented value. That way cache coherency
traffic is minimized. This new set of instructions is known as Large
System Extensions, or LSE.
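
In C++ terms the single-instruction form corresponds to one atomic
read-modify-write. The sketch below is an illustration only and the
function name increment_single_rmw is hypothetical. Whether the
compiler actually emits an LSE instruction for it depends on the
target flags (for example, building for Armv8.1-A or later); that part
is an assumption about the toolchain, not something this code
controls.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical helper (not from the patch): one atomic
// read-modify-write with no retry loop in the source.
inline uint64_t increment_single_rmw(std::atomic<uint64_t>& counter) {
  // fetch_add returns the value held before the addition, so add 1 to
  // report the incremented value.
  return counter.fetch_add(1, std::memory_order_seq_cst) + 1;
}
```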

Unfortunately, in recent processors, the "old" load/store exclusive
instructions sometimes perform very badly. Therefore, it's now
necessary for software to detect which version of Arm it's running
on, and use the "new" LSE instructions if they're available. Otherwise
performance can be very poor under heavy contention.
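
One way to do that detection on Linux is to ask the kernel through the
auxiliary vector. This is a sketch under assumptions, not the code in
the patch: cpu_has_lse is a made-up name, and the fallback branch
simply reports "no LSE" on any platform other than Linux/AArch64.

```cpp
#if defined(__linux__) && defined(__aarch64__)
#include <sys/auxv.h>   // getauxval, AT_HWCAP
#include <asm/hwcap.h>  // HWCAP_ATOMICS
#endif

// Hypothetical helper: true if the kernel reports LSE atomics.
inline bool cpu_has_lse() {
#if defined(__linux__) && defined(__aarch64__)
  // HWCAP_ATOMICS is the arm64 hwcap bit for the Armv8.1 atomic
  // instructions.
  return (getauxval(AT_HWCAP) & HWCAP_ATOMICS) != 0;
#else
  // Assumption for this sketch only: elsewhere, behave as if LSE is
  // absent so callers take the load/store exclusive path.
  return false;
#endif
}
```

The result does not change while the process runs, so it would
normally be queried once at startup and cached.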

GCC's -moutline-atomics does this by providing library calls which use
LSE if it's available, but this option is only provided on newer
versions of GCC. This is particularly problematic with older versions
of OpenJDK, which build using old GCC versions.
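
The general shape of that kind of dispatch can be sketched with a
function pointer that is chosen once at startup. Everything named here
(fetch_add_default, fetch_add_lse_standin, select_atomic_helpers,
atomic_fetch_add) is hypothetical and is neither GCC's nor the patch's
implementation. In particular, the "LSE" helper is only a stand-in: a
real one would be written in assembly or built with LSE enabled.

```cpp
#include <atomic>
#include <cstdint>

using fetch_add_fn = uint64_t (*)(std::atomic<uint64_t>*, uint64_t);

// Stand-in for the load/store exclusive helper: a retry loop.
static uint64_t fetch_add_default(std::atomic<uint64_t>* p, uint64_t v) {
  uint64_t old_value = p->load(std::memory_order_relaxed);
  while (!p->compare_exchange_weak(old_value, old_value + v)) {
  }
  return old_value;
}

// Stand-in for the LSE helper (assumption: the real helper would use
// an LSE instruction; this portable version cannot guarantee that).
static uint64_t fetch_add_lse_standin(std::atomic<uint64_t>* p, uint64_t v) {
  return p->fetch_add(v);
}

// Current helper. Starts on the path that works everywhere.
static fetch_add_fn fetch_add_impl = fetch_add_default;

// Call once during startup, before other threads use the helpers.
void select_atomic_helpers(bool has_lse) {
  fetch_add_impl = has_lse ? fetch_add_lse_standin : fetch_add_default;
}

// Callers go through the pointer; returns the value before the add.
inline uint64_t atomic_fetch_add(std::atomic<uint64_t>* p, uint64_t v) {
  return fetch_add_impl(p, v);
}
```

The cost of this approach is an indirect call on every atomic
operation, in exchange for one binary that runs well on both kinds of
CPU.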

Also, I suspect that some other operating systems could use this.
Perhaps not macOS, given that all Apple CPUs support LSE, but
maybe Windows.

-------------

Commit messages:
 - Hoist load of stub pointer before FULL_MEM_BARRIER.
 - Oops
 - Move stuff around
 - Cleanup
 - Intermediate for perf test
 - Untabify
 - Intermediate
 - Intermediate

Changes: https://git.openjdk.java.net/jdk/pull/2434/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2434&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8261027
  Stats: 396 lines in 6 files changed: 362 ins; 6 del; 28 mod
  Patch: https://git.openjdk.java.net/jdk/pull/2434.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/2434/head:pull/2434

PR: https://git.openjdk.java.net/jdk/pull/2434

