[aarch64-port-dev ] RFR: 8186671: Use `yield` instruction in SpinPause on linux-aarch64
Ivan Krylov
ivankrylov.java at gmail.com
Thu Aug 24 19:16:22 UTC 2017
Just want to take a moment and appreciate the effort of bringing
onSpinWait to the ARM architecture.
The idea behind this method (or a hint to the JIT, if you will) is to find a
sweet spot between an ordinary spin loop and task preemption,
keeping the code hot while not eating as many CPU cycles.
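A minimal sketch of the usage pattern this hint targets, assuming Java 9 or
later. The class, field, and method names are hypothetical, and the 10 ms
producer delay is arbitrary. It only shows where the hint goes in a busy-wait
loop; it is not taken from the patch under review:

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class SpinWaitSketch {
    // Hypothetical flag that another thread eventually sets.
    static final AtomicBoolean READY = new AtomicBoolean(false);

    // Busy-wait until READY is set, hinting the runtime on every iteration.
    // Returns how many times the loop body ran.
    static long spinUntilReady() {
        long spins = 0;
        while (!READY.get()) {
            Thread.onSpinWait(); // hint only; what it maps to is platform-specific
            spins++;
        }
        return spins;
    }

    public static void main(String[] args) throws InterruptedException {
        Thread producer = new Thread(() -> {
            try {
                Thread.sleep(10); // arbitrary delay before publishing
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                READY.set(true); // always release the spinning thread
            }
        });
        producer.start();
        long spins = spinUntilReady();
        producer.join();
        System.out.println("flag observed after " + spins + " spins");
    }
}
```

The loop stays on the CPU the whole time; the hint is there so the platform
can make that spinning cheaper, which is the trade-off described above.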
I recall the discussion about the difference between x86 PAUSE and ARM
YIELD back when we did the original hotspot patch code review, and we
never reached a conclusion back then.
Looking at the charts that Dmitry provided it seems that the proposed
ARM implementation reaches the same goal.
Ivan
On 8/24/17 6:33 PM, Dmitry Chuyko wrote:
> On 08/23/2017 10:39 PM, White, Derek wrote:
>> Hi Andrew,
>>
>>> -----Original Message-----
>>> From: aarch64-port-dev [mailto:aarch64-port-dev-
>>> bounces at openjdk.java.net] On Behalf Of Andrew Haley
>>> Sent: Wednesday, August 23, 2017 12:32 PM
>>> To: aarch64-port-dev at openjdk.java.net
>>> Subject: Re: [aarch64-port-dev ] RFR: 8186671: Use `yield`
>>> instruction in
>>> SpinPause on linux-aarch64
>>>
>>> On 23/08/17 17:07, Dmitry Chuyko wrote:
>>>> Please review a change in SpinPause implementation.
>>>>
>>>> related study:
>>>> http://cr.openjdk.java.net/~dchuyko/8186670/yield/spinwait.html
>>>> rfe: https://bugs.openjdk.java.net/browse/JDK-8186671
>>>> webrev: http://cr.openjdk.java.net/~dchuyko/8186671/webrev.00/
>>>>
>>>> The function was moved to platform .S file and now contains yield
>>>> instruction.
>>> Re the use of YIELD for onSpinWait(), I think this probably would be a
>>> mistake:
> Andrew, thank you for the discussion but I don't quite get your point.
> Some thoughts and questions below.
>>> Intel's PAUSE is intended to improve the performance of spin-wait
>>> loops, whereas ARM's YIELD is intended to hint that the task
>>> performed by a thread is of low importance so that it could yield.
> If we read further in the YIELD subsection of the ARMv8 Reference Manual,
> it says: "Examples of when the YIELD instruction might be used include a
> thread that is sitting in a spin-lock", which to me is the case. If we look
> at Java usage, it is like
> ----
> else if ((LockSupport.nextSecondarySeed() &
>           OVERFLOW_YIELD_RATE) == 0)
>     Thread.yield();
> else
>     Thread.onSpinWait();
> ----
> Yield is also used in the kernel's cpu_relax() variants, which look
> semantically close.
>>> So, although the instructions look superficially similar, they have
>>> diametrically opposite semantics! But we won't really know if YIELD
>>> will make a spin loop faster until somebody implements it.
> I can imagine yield raising overall application throughput in the SMT
> case if it gives more cycles to the neighboring strand.
>
> How do you see typical yield usage with opposite semantics?
>
> What are other possible implementations for the intrinsic?
> I'd say that even issuing 2-4 NOP instructions may be useful in both
> the SMT and temporal MT cases.
>> I might re-word this to say that both PAUSE and YIELD were
>> implemented to improve system performance while running spin-loops.
>> Intel's PAUSE has several parts to it:
>> 1) Cancels out checking for memory order violation on out-of-order
>> reads to speed up spin-loop exit. Also referred to as "de-pipelining"?
>> 2) Adds a pause (delay) for some number of cycles (0, or 10-140) to
>> slow down the spin-loop.
>> - Memory updates cannot happen as quickly as instruction execution,
>> - Which may reduce power consumption, or:
>> 3) On hyper-threaded cores, may give core resources to the other
>> thread for some number of cycles. If the other thread was part of the
>> spin-loop transaction, this speeds up the spin-loop, otherwise it
>> speeds up the system as a whole.
>>
>> Aarch64's YIELD seems to address feature (3) only:
>> - A hint that the current thread is low priority and can yield.
>> - On an SMT system this should be able to release core resources
>> to other HW threads for some number of cycles.
>> (ARM ARM also talks about how this can be used to "suspend and resume
>> multiple software threads if it supports the capability", but if it's
>> talking about triggering thread rescheduling in the kernel I don't
>> see how the kernel gets notified.)
>>
>> But I assume that hardware vendors are allowed to implement NOPs like
>> YIELD with whatever latency they choose, so feature (2) can also be
>> supported by YIELD, depending on the implementation (which is true
>> for Intel as well). This could be independent of the core supporting
>> SMT.
>>
>> - Derek
>>
>> In any case we
>>> Re the use of yield in SpinPause(): this looks correct to me. OK.
> Good. This part seemed scarier.
>
> --
> Dmitry
>>>
>>> --
>>> Andrew Haley
>>> Java Platform Lead Engineer
>>> Red Hat UK Ltd. <https://www.redhat.com>
>>> EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671
>