RFR: 8321371: SpinPause() not implemented for bsd_aarch64/macOS

Fri Dec 15 09:58:41 UTC 2023

On Wed, 6 Dec 2023 14:01:49 GMT, Fredrik Bredberg <fbredberg at openjdk.org> wrote:

> The SpinPause() function only returns 0 on bsd_aarch64 (i.e. macOS)
> 
> This PR initially meant to implement SpinPause() for macOS on AArch64 by copying the source from linux_aarch64, but after having some internal discussions, it seems like the most reasonable thing to do is to implement SpinPause() using a single inline yield instruction.
> 
> Tested successfully on macosx-aarch64 tier1-tier5.

Let’s zoom out and look at the big picture for a bit.

So yield is the obvious instruction dedicated to this purpose in the ISA, and has been for a very long time. I suppose early ARM chips didn’t have a whole lot of concurrency, and implementing it wasn’t all that beneficial as you frankly didn’t spend considerable time spinning. And now we seemingly have come to a classic chicken-and-egg problem. Software people like us don’t want to use the obvious yield instruction intended for this exact purpose, because hardware vendors haven’t implemented it. And hardware vendors don’t want to implement it, because no software is using it.

It feels like we had a sort of similar situation with NEON vs SVE. All hardware was running NEON, and nobody was running SVE. That doesn’t make it very encouraging as a software developer to implement SVE support in software, for an imaginary chip that doesn’t exist. And the fact that software doesn’t implement SVE doesn’t make it very encouraging for hardware vendors to implement it. Yet we did it because it was the right thing to do, and no benchmark thanked us for it.

In the short term, it would seem like a better idea to use ISB, if you look at micro benchmarks. But what we are doing then is IMO what we tell our Java users not to do, and for good reason. We say “don’t use Unsafe to expose JDK and JVM internals whose workings you happen to know today, but won’t know tomorrow, just so you can look like a winner in a microbenchmark”. We are looking past the intended ISA contract at current implementations, and finding that, as a hack, the ISB instruction, which was designed to deal with cross modifying code and not at all for this, is currently a better fit for doing what the yield instruction should have done had it been implemented, because the current ISB implementation has a long latency. And then a couple of years later, when the next LTS is released, the Apple M3 is released. And then by the time we hit the bulk of mainstream adoption, maybe we cycle through M4 and M5, and perhaps we end up with a hyper-threaded core with 4 threads per core, and the ISB turns out to be a disaster as it impedes progress of the other threads on the core to avoid shared resources being tripped over by cross modifying code, which is the exact opposite behaviour of what a pause instruction should do. Or maybe it won’t happen and everything is fine. The point is that *we don’t know*.

We are now in a situation where on macOS the spinning hint does nothing. There is no cost over the baseline of making that spinning hint bind to the obvious yield instruction. It might not do a lot on Apple silicon today, and as of today, ISB would probably yield better results (pun intended). But we don’t have to choose ISB just because it looks better today. We can instead be bold and break the chicken-and-egg situation, and use yield. Then the ball is in the HW court to do the right thing.

Because of this, I still think the right thing to do is to bind SpinPause to the obvious ISA-intended “yield” instruction, even though it is currently a nop.
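
To make the suggestion concrete, here is a rough sketch of what binding the spin hint to a single inline yield could look like. It is not the code from the PR: `spin_pause_sketch` and `spin_until_set` are hypothetical names, the GCC/Clang inline asm syntax is assumed, and the return convention (1 when a hint was issued, 0 otherwise) is my own assumption for illustration.

```cpp
#include <atomic>

// Hypothetical stand-in for SpinPause(); not the actual HotSpot source.
// Assumed convention: returns 1 if a hardware hint was issued, 0 if nothing was done.
inline int spin_pause_sketch() {
#if defined(__aarch64__) && (defined(__GNUC__) || defined(__clang__))
  // The ISA's dedicated spin-wait hint. It may execute as a nop on current cores.
  __asm__ volatile("yield" ::: "memory");
  return 1;
#else
  // No hint on other targets in this sketch.
  return 0;
#endif
}

// Hypothetical caller: poll a flag a bounded number of times, hinting between polls.
inline bool spin_until_set(const std::atomic<bool>& flag, int max_spins) {
  for (int i = 0; i < max_spins; i++) {
    if (flag.load(std::memory_order_acquire)) {
      return true;
    }
    spin_pause_sketch();
  }
  // One last check after the spin budget is used up.
  return flag.load(std::memory_order_acquire);
}
```

The point of the shape is that callers only depend on the hint contract, so a future core that gives yield real behaviour benefits without any software change.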

-------------

PR Comment: https://git.openjdk.org/jdk/pull/16994#issuecomment-1857589031