RFC: linux-aarch64 and LSE support

Kim Barrett kim.barrett at oracle.com
Thu Sep 8 07:50:08 UTC 2022


> On Sep 7, 2022, at 5:30 AM, Andrew Haley <aph-open at littlepinkcloud.com> wrote:
> 
> On 9/7/22 05:41, Kim Barrett wrote:
>> I’m puzzled by this change:
>> https://bugs.openjdk.org/browse/JDK-8282322
>> 8282322: AArch64: Provide a means to eliminate all STREX family of instructions
>> (2022-07-08, jdk20, no backports)
>> It’s a followup to these changes:
>> https://bugs.openjdk.org/browse/JDK-8261027
>> 8261027: AArch64: Support for LSE atomics C++ HotSpot code
>> (2021-02-12, jdk17, backported to jdk11, but not jdk8)
>> 8261649: AArch64: Optimize LSE atomics in C++ code
>> (2021-02-19, jdk17, backported to jdk11, but not jdk8)
>> [Also related is this
>> https://bugs.openjdk.org/browse/JDK-8261660
>> AArch64: Race condition in stub code generation for LSE Atomics
>> (2021-02-12, jdk17, superseded by 8261649)]
>> which are essentially reimplementing gcc’s -moutline-atomics option. The point
>> of doing this is to allow those changes to be used while building jdk with gcc
>> versions that don't support -moutline-atomics, esp. for purposes of backports.
>> (That option arrived with gcc8.5/gcc9.4/gcc10, enabled by default in gcc10. I
>> guess some non-Oracle folks might still be using something earlier, esp. in
>> the jdk17 timeframe when LSE support was being added.)
> 
> Sure. The timing wasn't good: the outline-atomics patch went in to GCC
> in September 2019, but was back-ported to GCC releases by May/June 2021.
> Our patch was committed in Feb that year.

I don’t have a problem with 8261027 and 8261649.  Yeah, it’s kind of messy, but
I think appropriate for the time.

>> For jdk20+ (where 8282322 landed), I question whether the approach being taken
>> here really makes sense. If one is willing to assume a relatively recent
>> version of gcc is being used, then I think there is no reason to reimplement
>> the effect of -moutline-atomics. We could undo all three of those changes,
>> reverting back to using gcc __atomic intrinsics, and rely on -moutline-atomics
>> (explicitly requested for gcc8/9). In that case, nothing like 8282322 is
>> needed; just specify armv8.1-a or later when configuring the build (which is
>> already needed to activate the current 8282322 behavior) and LSE will be used,
>> regardless of -moutline-atomics (or if it is even supported, if you want to
>> use rr with an old gcc version).
>> That would be a lot simpler.  It also makes it easier to make further changes.
> True. It all feels a bit early to me, though. I don't think that OpenJDK has ever
> needed so recent compiler features before. We've always been very conservative
> about what we depend on. So it might be OK to do this, but (to me) it feels
> like sailing close to the wind.

We (Oracle) used to be extremely conservative about toolchain upgrades, but we
got better. (Thank you, Spectre and Meltdown.) I admit my opinion is colored
by that, and the benefits from it.

Note that OpenJDK recently dropped support for Visual Studio 2017, now
requiring some patch level of VS2019. That's a stronger restriction in a
similar timeframe. (The dropped versions of VS cannot build the latest JDK.)

Unlike that situation with VS, switching linux-aarch64 back to using __atomic
intrinsics would still work if using an old toolchain, but possibly at reduced
performance on spiffy new hardware. And I'm only proposing this going forward;
backports need not apply.

A benefit of such a change would be that if one knows what hardware is going
to be used, one can compile specifically for that and get best performance for
these operations for that hardware. I can easily imagine that being a common
scenario. The current approach actively prevents that.
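
To illustrate, here is a minimal sketch of what "just use the __atomic
intrinsics" looks like. The helper name add_and_fetch is hypothetical; it is
not HotSpot's Atomic::add, and the comments describe what I'd expect from gcc
rather than anything verified against a particular release.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not HotSpot code): atomically add and return the
// new value, leaving instruction selection entirely to the compiler.
//
// Expected gcc behavior on linux-aarch64 (an assumption, check the
// generated code for your toolchain):
//   -march=armv8.1-a (or later)     : inline LSE instruction (LDADD family)
//   -march=armv8-a -moutline-atomics: call to an out-of-line libgcc helper
//                                     that picks LSE or ll/sc at runtime
//   -march=armv8-a, no outline      : inline ll/sc (LDXR/STXR) loop
inline uint64_t add_and_fetch(volatile uint64_t* dest, uint64_t value) {
  return __atomic_add_fetch(dest, value, __ATOMIC_SEQ_CST);
}

// Small self-check showing the intended usage.
inline void check_add_and_fetch() {
  volatile uint64_t counter = 40;
  uint64_t result = add_and_fetch(&counter, 2);
  assert(result == 42);
  assert(counter == 42);
}
```

The source is the same in all three cases; only the build flags change, which
is what lets someone who knows their target hardware build specifically for it.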

How many folks are very conservative about gcc versions but not about JDK
versions? And should we be catering to them at the expense of folks who are
keeping up to date and willing to do tailored builds?

> I'm curious, though: wouldn't atomic bitset ops be based on CAS? If so, you don't
> need to care how CAS is implemented.

We can write them as an Atomic::cmpxchg loop, just as we can write Atomic::add
that way. But that isn't a particularly good way to write them. It's a loop
over an ll/sc loop, or a loop over an LSE CAS instruction. Better is a direct
ll/sc loop or an LSE bitop instruction.
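
A rough sketch of the two shapes, using std::atomic rather than HotSpot's
Atomic class so it stands alone. The function names are made up for
illustration, and the comments about generated code are what I'd expect, not
something checked for every compiler.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical: fetch-or written as a compare-exchange loop. Each
// iteration is itself either an ll/sc loop or an LSE CAS, so this is a
// loop around another retry mechanism.
inline uint64_t fetch_or_via_cas(std::atomic<uint64_t>& dest, uint64_t bits) {
  uint64_t old_value = dest.load(std::memory_order_relaxed);
  // On failure compare_exchange_weak reloads old_value with the current
  // contents, so the new value is recomputed on every retry.
  while (!dest.compare_exchange_weak(old_value, old_value | bits)) {
  }
  return old_value;
}

// Hypothetical: fetch-or expressed directly. The compiler can emit a
// single ll/sc loop, or one LSE bit-set instruction (LDSET family) when
// LSE is available.
inline uint64_t fetch_or_direct(std::atomic<uint64_t>& dest, uint64_t bits) {
  return dest.fetch_or(bits);
}

// Both forms return the previous value and leave the same result behind.
inline void check_fetch_or() {
  std::atomic<uint64_t> a{0x5};
  std::atomic<uint64_t> b{0x5};
  assert(fetch_or_via_cas(a, 0x3) == 0x5);
  assert(fetch_or_direct(b, 0x3) == 0x5);
  assert(a.load() == 0x7);
  assert(b.load() == 0x7);
}
```

The results are identical; the difference is only in how much retry machinery
ends up in the generated code.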


