[aarch64-port-dev ] RFR: 8186671: Use `yield` instruction in SpinPause on linux-aarch64

Ivan Krylov ivankrylov.java at gmail.com
Thu Aug 24 19:16:22 UTC 2017


I just want to take a moment to appreciate the effort of bringing 
onSpinWait to the ARM architecture.
The idea behind this method (or a hint to the JIT, if you will) is to find a 
sweet spot between an ordinary spin loop and a task preemption,
keeping the code hot while not eating as many CPU cycles.
I recall the discussion about the difference between x86 PAUSE and ARM 
YIELD back when we did the original hotspot patch code review, and we 
never reached a conclusion back then.
Looking at the charts that Dmitry provided it seems that the proposed 
ARM implementation reaches the same goal.
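
For readers who have not used the hint, here is a minimal sketch of the kind 
of loop it is meant for. The class and method names (SpinWaitSketch, 
awaitFlag) and the maxSpins bound are made up for illustration; only 
Thread.onSpinWait() and AtomicBoolean are real JDK APIs. Whether the JIT 
emits any instruction for the hint depends on the platform and the VM.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Illustrative sketch only; the class and method names are hypothetical.
final class SpinWaitSketch {
    private SpinWaitSketch() {}

    /**
     * Polls {@code flag} up to {@code maxSpins} times, hinting between polls
     * that the caller is spin-waiting. Returns true if the flag was seen set.
     */
    static boolean awaitFlag(AtomicBoolean flag, int maxSpins) {
        for (int i = 0; i < maxSpins; i++) {
            if (flag.get()) {
                return true;
            }
            // Hint to the runtime that this is a spin-wait loop. The JIT may
            // emit a platform instruction here (PAUSE on x86) or nothing.
            Thread.onSpinWait();
        }
        // One last check after the spin budget is used up.
        return flag.get();
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicBoolean ready = new AtomicBoolean(false);
        Thread setter = new Thread(() -> ready.set(true));
        setter.start();
        boolean seen = awaitFlag(ready, Integer.MAX_VALUE);
        setter.join();
        System.out.println("flag observed: " + seen);
    }
}
```

A caller that exhausts its spin budget would normally fall back to something 
heavier, such as Thread.yield() or parking, rather than keep spinning.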

Ivan


On 8/24/17 6:33 PM, Dmitry Chuyko wrote:
> On 08/23/2017 10:39 PM, White, Derek wrote:
>> Hi Andrew,
>>
>>> -----Original Message-----
>>> From: aarch64-port-dev [mailto:aarch64-port-dev-
>>> bounces at openjdk.java.net] On Behalf Of Andrew Haley
>>> Sent: Wednesday, August 23, 2017 12:32 PM
>>> To: aarch64-port-dev at openjdk.java.net
>>> Subject: Re: [aarch64-port-dev ] RFR: 8186671: Use `yield` instruction in
>>> SpinPause on linux-aarch64
>>>
>>> On 23/08/17 17:07, Dmitry Chuyko wrote:
>>>> Please review a change in SpinPause implementation.
>>>>
>>>> related study:
>>>> http://cr.openjdk.java.net/~dchuyko/8186670/yield/spinwait.html
>>>> rfe: https://bugs.openjdk.java.net/browse/JDK-8186671
>>>> webrev: http://cr.openjdk.java.net/~dchuyko/8186671/webrev.00/
>>>>
>>>> The function was moved to platform .S file and now contains yield
>>>> instruction.
>>> Re the use of YIELD for onSpinWait(), I think this probably would be a
>>> mistake:
> Andrew, thank you for the discussion but I don't quite get your point. 
> Some thoughts and questions below.
>>> Intel's PAUSE is intended to improve the performance of spin-wait
>>> loops, whereas ARM's YIELD is intended to hint that the task
>>> performed by a thread is of low importance so that it could yield.
> If we read further in the YIELD subsection of the ARMv8 Reference Manual, 
> it says: "Examples of when the YIELD instruction might be used include a 
> thread that is sitting in a spin-lock", which to me is the case here. If 
> we look at Java usage, it looks like this:
> ----
>         else if ((LockSupport.nextSecondarySeed() &
>                   OVERFLOW_YIELD_RATE) == 0)
>             Thread.yield();
>         else
>             Thread.onSpinWait();
> ----
> YIELD is also used in the kernel's cpu_relax() variants, which look 
> semantically close.
>>> So, despite the fact that the instructions look superficially similar,
>>> they have diametrically opposite semantics!  But we won't really know
>>> if YIELD will make a spin loop faster until somebody implements it.
> I can imagine YIELD raising overall application throughput in the SMT 
> case if it gives more cycles to the neighboring strand.
>
> How do you see typical yield usage with opposite semantics?
>
> What are other possible implementations for the intrinsic?
> I'd say that even issuing 2-4 NOP instructions may be useful in both 
> the SMT and temporal MT cases.
>> I might re-word this to say that both PAUSE and YIELD were 
>> implemented to improve system performance while running spin-loops.
>>   Intel's PAUSE has several parts to it:
>> 1) Cancels out checking for memory order violation on out-of-order 
>> reads to speed up spin-loop exit. Also referred to as "de-pipelining"?
>> 2) Adds a pause (delay) for some number of cycles (0, or 10-140) to 
>> slow down the spin-loop.
>>   - Memory updates cannot happen as quickly as instruction execution, 
>>     so polling faster gains nothing.
>>   - The delay may also reduce power consumption.
>> 3) On hyper-threaded cores, may give core resources to the other 
>> thread for some number of cycles. If the other thread was part of the 
>> spin-loop transaction, this speeds up the spin-loop, otherwise it 
>> speeds up the system as a whole.
>>
>> Aarch64's YIELD seems to address feature (3) only:
>>   - A hint that the current thread is low priority and can yield.
>>      - On an SMT system this should be able to release core resources 
>> to other HW threads for some number of cycles.
>> (ARM ARM also talks about how this can be used to "suspend and resume 
>> multiple software threads if it supports the capability", but if it's 
>> talking about triggering thread rescheduling in the kernel I don't 
>> see how the kernel gets notified.)
>>
>> But I assume that hardware vendors are allowed to implement NOPs like 
>> YIELD with whatever latency they choose, so feature (2) can also be 
>> supported by YIELD, depending on the implementation (which is true 
>> for Intel as well). This could be independent of the core supporting 
>> SMT.
>>
>>   - Derek
>>
>> In any case we
>>> Re the use of yield in SpinPause(): this looks correct to me.  OK.
> Good. This part seemed scarier.
>
> -- 
> Dmitry
>>>
>>> -- 
>>> Andrew Haley
>>> Java Platform Lead Engineer
>>> Red Hat UK Ltd. <https://www.redhat.com>
>>> EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671
>


