[aarch64-port-dev ] RFR: 8186671: Use `yield` instruction in SpinPause on linux-aarch64

White, Derek Derek.White at cavium.com
Wed Aug 23 19:39:28 UTC 2017


Hi Andrew,

> -----Original Message-----
> From: aarch64-port-dev [mailto:aarch64-port-dev-
> bounces at openjdk.java.net] On Behalf Of Andrew Haley
> Sent: Wednesday, August 23, 2017 12:32 PM
> To: aarch64-port-dev at openjdk.java.net
> Subject: Re: [aarch64-port-dev ] RFR: 8186671: Use `yield` instruction in
> SpinPause on linux-aarch64
> 
> On 23/08/17 17:07, Dmitry Chuyko wrote:
> > Please review a change in SpinPause implementation.
> >
> > related study:
> > http://cr.openjdk.java.net/~dchuyko/8186670/yield/spinwait.html
> > rfe: https://bugs.openjdk.java.net/browse/JDK-8186671
> > webrev: http://cr.openjdk.java.net/~dchuyko/8186671/webrev.00/
> >
> > The function was moved to platform .S file and now contains yield
> > instruction.
> 
> Re the use of YIELD for onSpinWait(), I think this probably would be a
> mistake: Intel's PAUSE is intended to improve the performance of spin-wait
> loops, whereas ARM's YIELD is intended to hint that the task performed by a
> thread is of low importance so that it could yield.
> So, although the instructions superficially look similar, they have
> diametrically opposite semantics!  But we won't really know if YIELD will
> make a spin loop faster until somebody implements it.

I might re-word this to say that both PAUSE and YIELD were implemented to improve system performance while running spin-loops.
 
Intel's PAUSE has several parts to it:
1) Cancels the check for memory-order violations on out-of-order reads, which speeds up spin-loop exit. Also referred to as "de-pipelining"?
2) Adds a pause (delay) for some number of cycles (0, or 10-140) to slow down the spin-loop.
 - Memory updates cannot happen as quickly as instructions execute, so little is lost by slowing the loop.
 - The delay may also reduce power consumption, or:
3) On hyper-threaded cores, may give core resources to the other thread for some number of cycles. If the other thread was part of the spin-loop transaction, this speeds up the spin-loop, otherwise it speeds up the system as a whole.
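
To make the discussion concrete, here is a minimal sketch of where such a hint sits in a spin-wait loop. It is not the HotSpot SpinPause code; the names spin_hint and spin_until_set are hypothetical, and the inline assembly assumes a GCC- or Clang-compatible compiler. On other targets the hint compiles to nothing.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical helper (not a HotSpot function): emit the architecture's
// spin-wait hint. Assumes GCC/Clang inline-asm syntax.
static inline void spin_hint() {
#if defined(__x86_64__) || defined(__i386__)
    __asm__ volatile("pause");   // Intel PAUSE
#elif defined(__aarch64__)
    __asm__ volatile("yield");   // AArch64 YIELD
#else
    // No known hint on this target: plain busy-wait.
#endif
}

// Spin until `flag` becomes true or `max_spins` iterations have elapsed.
// Returns true if the flag was observed set, false if the budget ran out.
static bool spin_until_set(const std::atomic<bool>& flag,
                           std::uint64_t max_spins) {
    for (std::uint64_t i = 0; i < max_spins; ++i) {
        if (flag.load(std::memory_order_acquire)) {
            return true;
        }
        spin_hint();  // executed once per failed check
    }
    // One last check after the final hint.
    return flag.load(std::memory_order_acquire);
}
```

The hint runs only after a failed check, so a waiter that finds the flag already set pays nothing for it. The spin budget is an illustrative choice; a real waiter would typically fall back to blocking once it is exhausted.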

AArch64's YIELD seems to address feature (3) only:
 - A hint that the current thread is low priority and can yield. 
    - On an SMT system this should be able to release core resources to other HW threads for some number of cycles.
(ARM ARM also talks about how this can be used to "suspend and resume multiple software threads if it supports the capability", but if it's talking about triggering thread rescheduling in the kernel I don't see how the kernel gets notified.)

But I assume that hardware vendors are allowed to implement NOPs like YIELD with whatever latency they choose, so feature (2) can also be supported by YIELD, depending on the implementation (which is true for Intel as well). This could be independent of the core supporting SMT.
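
Since the latency of the hint is implementation-defined, one way to see what a given core does is to time it directly. The following is a rough sketch only: cpu_relax_hint and average_hint_ns are hypothetical names, the inline assembly assumes a GCC- or Clang-compatible compiler, and the figure it returns includes loop overhead, so it is an upper bound rather than the latency of the instruction itself.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical helper: the architecture's spin-wait hint, or nothing
// on targets where no hint is known. Assumes GCC/Clang inline asm.
static inline void cpu_relax_hint() {
#if defined(__x86_64__) || defined(__i386__)
    __asm__ volatile("pause");
#elif defined(__aarch64__)
    __asm__ volatile("yield");
#endif
}

// Average wall-clock cost, in nanoseconds, of one loop iteration that
// issues the hint. Returns 0.0 when `iterations` is zero.
static double average_hint_ns(std::uint64_t iterations) {
    if (iterations == 0) {
        return 0.0;
    }
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    for (std::uint64_t i = 0; i < iterations; ++i) {
        cpu_relax_hint();
    }
    const auto stop = clock::now();
    const std::chrono::duration<double, std::nano> elapsed = stop - start;
    return elapsed.count() / static_cast<double>(iterations);
}
```

Calling average_hint_ns with a large iteration count on different machines should show whether the hint behaves like a plain NOP or inserts a noticeable delay. Results will vary with clock frequency and with whatever else is running on the core.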

 - Derek

In any case we 
> 
> Re the use of yield in SpinPause(): this looks correct to me.  OK.
> 
> --
> Andrew Haley
> Java Platform Lead Engineer
> Red Hat UK Ltd. <https://www.redhat.com>
> EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

