RFR: 8321371: SpinPause() not implemented for bsd_aarch64/macOS
Erik Österlund
eosterlund at openjdk.org
Fri Dec 15 13:17:40 UTC 2023
On Fri, 15 Dec 2023 12:50:04 GMT, Evgeny Astigeevich <eastigeevich at openjdk.org> wrote:
>> Let’s zoom out and look at the big picture for a bit.
>>
>> So yield is the obvious instruction dedicated for this purpose in the ISA, and has been for a very long time. I suppose early ARM chips didn’t have a whole lot of concurrency and implementing it wasn’t all that beneficial as you frankly didn’t spend considerable time spinning. And now we seemingly have come to a classic chicken and egg problem. Software people like us don’t want to use the obvious yield instruction intended for this exact purpose, because hardware vendors haven’t implemented it. And hardware vendors don’t want to implement it, because no software is using it.
>>
>> It feels like we had a sort of similar situation with neon vs SVE. All hardware was running neon, and nobody was running SVE. That doesn’t make it very encouraging as a software developer to implement SVE support in software, for an imaginary chip that doesn’t exist. And the fact that software doesn’t implement SVE doesn’t make it very encouraging to implement it. Yet we did it because it was the right thing to do, and no benchmark thanked us for it.
>>
>> In the short term, it would seem like a better idea to use ISB, if you look at micro benchmarks. But what we are doing then is IMO what we tell our Java users not to do, and for good reason. We say “don’t use Unsafe to expose JDK and JVM internals that you happen to know how it works today, but you don’t know how it will work tomorrow, so you can look like a winner in a microbenchmark”. We are looking past the intended ISA contract, and look at current implementations today, and finding that as a hack, the ISB instruction which was designed to deal with cross modifying code, and not at all designed for this, is currently a better fit for doing what the yield instruction should have done had it been implemented, as the current ISB implementation has a long latency. And then a couple of years later, when the next LTS is released, a the Apple M3 is released. And then by the time we hit bulk of mainstream adoption, maybe we cycle through M4 and M5, and perhaps we end up with a
hyper threaded core with 4 threads per core, and the isb turns out to be a disaster as it impedes progress of the other threads on the core to avoid shared resources being tripped over by cross modifying code, which is the exact opposite behaviour of what a pause instruction should do. Or maybe it won’t happen and everything is fine. The point is that *we don’t know*.
>>
>> We are now in a situation w...
>
> @fisk I don't mind to use `yield`.
>
> `isb` appeared as a result of real customer cases and consultations with hardware engineers. It was a real problem to write a micro-benchmark. I think it's still a problem.
> We are aware that different implementations of AArch64 might need different spin pause implementations. This is why we provide options which instructions to use: `yield`, `nop` or `isb`.
>
> I think hardcoding `yield` is not a good idea. We should use the current implemented approach where `yield` is selected. If the current approach does not work because it is based on a generated stub in CodeCache. Let's modify it to get code in place instead of stub.
>
> @fbredber @fisk
> What do you think about the following solution:
>
> In `src/hotspot/cpu/aarch64/spin_wait_aarch64.hpp` we add `DEFAULT` to `SpinWait::Inst` which is set to `YIELD`. We also add a template function `exec<SpinWait::Instr>()` and its implementation `exec<SpinWait::YIELD>()`
>
>
> class SpinWait {
> public:
> enum Inst {
> NONE = -1,
> NOP,
> ISB,
> YIELD,
> DEFAULT = YIELD
> };
> ...
> };
>
> template<SpinWait::Inst inst> void exec();
>
> template<> exec<SpinWait::YIELD>() {
> asm volatile("yield" : : : "memory");
> }
>
>
> In `src/hotspot/os_cpu/bsd_aarch64/os_bsd_aarch64.cpp`:
>
>
> extern "C" {
> int SpinPause() {
> exec<SpinWait::DEFAULT>();
> return 1;
> }
>
>
> To enable onSpinWait intrinsic (this or a separate PR) you will need to set `yield` as default in `src/hotspot/cpu/aarch64/globals_aarch64.hpp`
>
> product(ccstr, OnSpinWaitInst, "yield", DIAGNOSTIC, \
> "The instruction to use to implement " \
> "java.lang.Thread.onSpinWait()." \
> "Options: none, nop, isb, yield.") \
>
> You will also need to update tests:
> - `test/hotspot/jtreg/compiler/onSpinWait/TestOnSpinWaitAArch64.java`
> - `test/hotspot/jtreg/compiler/onSpinWait/TestOnSpinWaitNoneAArch64.java`
@eastig I think the essence of what you are proposing - having a non-WX solution for making the instruction selection configurable, but defaulting to yield - sounds very reasonable. Good suggestion!
-------------
PR Comment: https://git.openjdk.org/jdk/pull/16994#issuecomment-1857865087
More information about the hotspot-runtime-dev
mailing list