RFC: linux-aarch64 and LSE support

Andrew Haley aph-open at littlepinkcloud.com
Fri Sep 9 09:24:32 UTC 2022


On 9/8/22 08:50, Kim Barrett wrote:
>> On Sep 7, 2022, at 5:30 AM, Andrew Haley <aph-open at littlepinkcloud.com> wrote:
>>
>> On 9/7/22 05:41, Kim Barrett wrote:
>>> I’m puzzled by this change:
>>> https://bugs.openjdk.org/browse/JDK-8282322
...
> 
>>> For jdk20+ (where 8282322 landed), I question whether the approach being taken
>>> here really makes sense. If one is willing to assume a relatively recent
>>> version of gcc is being used, then I think there is no reason to reimplement
>>> the effect of -moutline-atomics. We could undo all three of those changes,
>>> reverting back to using gcc __atomic intrinsics, and rely on -moutline-atomics
>>> (explicitly requested for gcc8/9). In that case, nothing like 8282322 is
>>> needed; just specify armv8.1-a or later when configuring the build (which is
>>> already needed to activate the current 8282322 behavior) and LSE will be used,
>>> regardless of -moutline-atomics (or if it is even supported, if you want to
>>> use rr with an old gcc version).
>>>
>>> That would be a lot simpler.  It also makes it easier to make further changes.
>> True. It all feels a bit early to me, though. I don't think that OpenJDK has ever
>> needed so recent compiler features before. We've always been very conservative
>> about what we depend on. So it might be OK to do this, but (to me) it feels
>> like sailing close to the wind.
> 
> We (Oracle) used to be extremely conservative about toolchain upgrades, but we
> got better. (Thank you, Spectre and Meltdown.) I admit my opinion is colored
> by that, and the benefits from it.

OK, so we have, in effect, a change of policy. That does make sense.

> Note that OpenJDK recently dropped support for Visual Studio 2017, now
> requiring some patch level of VS2019. That's a stronger restriction in a
> similar timeframe. (The dropped versions of VS cannot build the latest JDK.)
> 
> Unlike that situation with VS, switching linux-aarch64 back to using __atomic
> intrinsics would still work if using an old toolchain, but possibly at reduced
> performance on spiffy new hardware. And I'm only proposing this going forward;
> backports need not apply.

Don't underestimate the effect of using LL/SC on some Arm hardware. In some
cases it's so bad that it looks almost like the implementation is broken.

Another thing to bear in mind is that we have some reports of relatively
poor LSE performance on larger Huawei systems, so the mere presence of LSE
atomics does not always tell us we should use them. We have a switch.

As far as I can tell, outline-atomics doesn't support switching LSE off, but
maybe some systems have local patches.

[https://people.mpi-sws.org/~viktor/papers/netys2021-hmcs.pdf]
Disclaimer: I haven't tested if this is true, and it might be out of date.

> A benefit of such a change would be that if one knows what hardware is going
> to be used, one can compile specifically for that and get best performance for
> these operations for that hardware. I can easily imagine that being a common
> scenario. The current approach actively prevents that.

True, but the overhead is very low. It's two perfectly-predicted branches,
about as near as you can get to zero in practice. Less overhead than
outline-atomics.
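
To make "two perfectly-predicted branches" concrete, here is a minimal sketch
of runtime dispatch through a function pointer: one indirect call and one
return. It is not the HotSpot code. The names fetch_add_llsc, fetch_add_lse,
select_atomics and atomic_fetch_add are hypothetical, and both bodies use
std::atomic as portable stand-ins for what would be hand-written LL/SC and
LSE sequences on aarch64.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for an LL/SC (load-exclusive/store-exclusive) loop.
// Written as a CAS retry loop so the sketch runs on any platform.
static uint32_t fetch_add_llsc(std::atomic<uint32_t>* p, uint32_t v) {
  uint32_t old = p->load(std::memory_order_relaxed);
  while (!p->compare_exchange_weak(old, old + v)) {
    // On failure 'old' has been reloaded with the current value; retry.
  }
  return old;
}

// Hypothetical stand-in for a single LSE instruction.
static uint32_t fetch_add_lse(std::atomic<uint32_t>* p, uint32_t v) {
  return p->fetch_add(v);
}

using fetch_add_fn = uint32_t (*)(std::atomic<uint32_t>*, uint32_t);

// Starts at the conservative default; repointed once at startup.
static fetch_add_fn fetch_add_impl = fetch_add_llsc;

// Assumed startup hook: LSE is used only if the CPU has it AND the
// user-visible switch allows it, so it can be turned off on systems
// where LSE performs poorly.
void select_atomics(bool cpu_has_lse, bool use_lse_switch) {
  fetch_add_impl = (cpu_has_lse && use_lse_switch) ? fetch_add_lse
                                                   : fetch_add_llsc;
}

// Every caller goes through the pointer: an indirect call plus a return.
inline uint32_t atomic_fetch_add(std::atomic<uint32_t>* p, uint32_t v) {
  assert(p != nullptr);
  return fetch_add_impl(p, v);
}
```

Because the pointer is set once and never changes afterwards, the indirect
call target is stable, which is why the branches predict so well. The switch
argument is what keeps the ability to turn LSE off.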

> How many folks are very conservative about gcc versions but not about JDK
> versions? And should we be catering to them at the expense of folks who are
> keeping up to date and willing to do tailored builds?

That's an excellent question. I guess most Linux distros are now at GCC 8.5,
so perhaps there's no problem.

>> I'm curious, though: wouldn't atomic bitset ops be based on CAS? If so, you don't
>> need to care how CAS is implemented.
> 
> We can write them as an Atomic::cmpxchg loop, just as we can write Atomic::add
> that way. But that isn't a particularly good way to write them. It's a loop
> over an ll/sc loop, or a loop over an LSE CAS instruction. Better is a direct
> ll/sc loop or an LSE bitop instruction.

Oh! Does LSE actually have the ops you need for this?
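
The two styles being compared can be sketched as follows, using an atomic
"set bits" operation as the example. This is an illustration with
std::atomic, not HotSpot's Atomic API; the function names fetch_or_via_cas
and fetch_or_direct are hypothetical. For what it's worth, LSE does define
LDSET (atomic OR), LDCLR (atomic AND-NOT) and LDEOR (atomic XOR), so the
direct form can become a single instruction when building for armv8.1-a or
later.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Style 1: bit-set written as a loop around compare-and-swap. If the CAS
// itself is an LL/SC loop, this is a loop over a loop; with LSE it is a
// loop over a CAS instruction.
uint64_t fetch_or_via_cas(std::atomic<uint64_t>& word, uint64_t bits) {
  uint64_t old = word.load(std::memory_order_relaxed);
  while (!word.compare_exchange_weak(old, old | bits)) {
    // On failure 'old' holds the freshly observed value; retry with it.
  }
  return old;  // value before the bits were set
}

// Style 2: ask for the operation directly and let the compiler pick the
// best sequence for the target (a single LL/SC loop, or an LSE LDSET
// when the target architecture allows it).
uint64_t fetch_or_direct(std::atomic<uint64_t>& word, uint64_t bits) {
  return word.fetch_or(bits);  // value before the bits were set
}
```

Both functions return the previous value and leave the same result in
memory; the difference is only in how much retry machinery sits around the
update.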

So, if we revert to using the GCC intrinsics, we'd hurt performance on some
systems, and we'd lose some control. On the other hand it'd be cleaner. Much
cleaner.  :-)

-- 
Andrew Haley  (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

