RFC: linux-aarch64 and LSE support
Kim Barrett
kim.barrett at oracle.com
Fri Sep 9 21:50:35 UTC 2022
> On Sep 9, 2022, at 5:24 AM, Andrew Haley <aph-open at littlepinkcloud.com> wrote:
>
> On 9/8/22 08:50, Kim Barrett wrote:
>> We (Oracle) used to be extremely conservative about toolchain upgrades, but we
>> got better. (Thank you, spectre and meltdown.) I admit my opinion is colored
>> by that, and the benefits from it.
>
> OK, so we have, in effect, a change of policy. That does make sense.
That's a change in Oracle/JPG policy. Of course, other vendors may have other
policies. But I would think a linux-aarch64 vendor/distro that is tracking
up-to-date JDK versions would likely also be tracking reasonably up-to-date
gcc versions.
> Don't under-estimate the effect of using LL/SC on some Arm hardware. In some
> cases it's so bad that it looks almost like the implementation is broken.
>
> Another thing to bear in mind is that we have some reports of relatively
> poor LSE performance on larger Huawei systems, so the mere presence of LSE
> atomics does not always tell us we should use them. We have a switch.
>
> As far as I can tell, outline-atomics doesn't support switching LSE off, but
> maybe some systems have local patches.
I'd expect that to be something that would be dealt with by -mcpu, and
automatically by -moutline-atomics (if not now, then eventually - a hardware
vendor has a vested interest in optimizing gcc). But yes, maybe that kind of
configuration is not (yet) well handled.
>> A benefit of such a change would be that if one knows what hardware is going
>> to be used, one can compile specifically for that and get best performance for
>> these operations for that hardware. I can easily imagine that being a common
>> scenario. The current approach actively prevents that.
>
> True, but the overhead is very low. It's two perfectly-predicted branches,
> about as near as you can get to zero in practice. Less overhead than
> outline-atomics.
It's more than that, because of register shuffling going into and out of the
out-of-line helper routine (whoever supplies that routine). How measurable
is it? I don't know.
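
For concreteness, here is a rough sketch of the shape being discussed: every
atomic op goes through an out-of-line helper selected once at startup. This is
not the HotSpot code. All names below (fetch_add_stub_t, fetch_add_impl,
select_atomic_stubs, atomic_fetch_add) are hypothetical, and both helpers
simply forward to std::atomic so the sketch runs anywhere; in a real port they
would be an ll/sc loop and an LSE instruction respectively.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical signature for an out-of-line atomic helper.
using fetch_add_stub_t = uint64_t (*)(std::atomic<uint64_t>*, uint64_t);

// Stand-in for the ll/sc version of the helper (assumption: a real port
// would use hand-written assembly here).
uint64_t fetch_add_default(std::atomic<uint64_t>* dest, uint64_t value) {
  return dest->fetch_add(value);
}

// Stand-in for the LSE version of the helper (same caveat).
uint64_t fetch_add_lse(std::atomic<uint64_t>* dest, uint64_t value) {
  return dest->fetch_add(value);
}

// The helper in use; starts out as the default and may be switched once.
fetch_add_stub_t fetch_add_impl = fetch_add_default;

// Called once at startup, e.g. after CPU feature detection or a user switch.
void select_atomic_stubs(bool use_lse) {
  fetch_add_impl = use_lse ? fetch_add_lse : fetch_add_default;
}

// What every call site sees: arguments are moved into the calling
// convention's registers, then an indirect call and a return.
inline uint64_t atomic_fetch_add(std::atomic<uint64_t>* dest, uint64_t value) {
  return fetch_add_impl(dest, value);
}
```

The call and return are the well-predicted branches; the argument and result
moves around the call are the extra cost a compiler-inlined atomic would not
pay.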
>> How many folks are very conservative about gcc versions but not about JDK
>> versions? And should we be catering to them at the expense of folks who are
>> keeping up to date and willing to do tailored builds.
>
> That's an excellent question. I guess most Linux distros are now at 8.5,
> so perhaps there's no problem.
That would be my hope/guess, but it's not something I personally track.
>>> I'm curious, though: wouldn't atomic bitset ops be based on CAS? If so, you don't
>>> need to care how CAS is implemented.
>> We can write them as an Atomic::cmpxchg loop, just as we can write Atomic::add
>> that way. But that isn't a particularly good way to write them. It's a loop
>> over an ll/sc loop, or a loop over an LSE CAS instruction. Better is a direct
>> ll/sc loop or an LSE bitop instruction.
>
> Oh! Does LSE actually have the ops you need for this?
Yes!
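
To illustrate the difference, here is a minimal sketch using std::atomic
rather than HotSpot's Atomic class (that substitution is only to keep the
example self-contained). The function names are hypothetical. With LSE enabled
at compile time I would expect gcc to turn the direct form into a single
LDSET-family instruction, and without LSE into one ll/sc loop; the CAS-loop
form wraps whichever CAS implementation is in use in a second loop.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical helper: atomic OR written as a loop over compare-exchange.
// On failure, compare_exchange_weak stores the current value of dest into
// old_value, so the loop just retries with the refreshed value.
inline uint64_t fetch_or_via_cas(std::atomic<uint64_t>& dest, uint64_t bits) {
  uint64_t old_value = dest.load(std::memory_order_relaxed);
  while (!dest.compare_exchange_weak(old_value, old_value | bits)) {
    // retry
  }
  return old_value;  // value observed immediately before the update
}

// Hypothetical helper: the same operation expressed directly, leaving the
// compiler free to pick a single bit-op instruction or one ll/sc loop.
inline uint64_t fetch_or_direct(std::atomic<uint64_t>& dest, uint64_t bits) {
  return dest.fetch_or(bits);
}
```

Both return the previous value and leave the same result in memory; only the
generated code differs.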
> So, if we revert to using the GCC intrinsics, we'd hurt performance on some
> systems, and we'd lose some control. On the other hand it'd be cleaner. Much
> cleaner. :-)
Exactly.
So what do folks think?
Obviously, I can add support for the bitops using the existing structure (even
though I'd rather not) and we can revisit this somewhat messy situation again
later. Or we can clean up the code now.