Spin Loop Hint support: Draft JEP proposal

Fri Oct 9 01:18:22 UTC 2015

On Oct 8, 2015, at 12:39 AM, Gil Tene <gil at azul.com> wrote:
> 
> On the one hand:
> 
> I like the idea of (an optional?) boolean parameter as a means of hinting at the thing that may terminate the spin. It's probably much more general than identifying a specific field or address. And it can be used to cover cases that poll multiple addresses (an or in the boolean) or look at a termination time. If the JVM can track down the boolean's evaluation to dependencies on specific memory state changes, it could pass it on to hardware, if such hardware exists.

Yep.  And there is a user-mode MWAIT in SPARC M7, today.  For Intel, Dave Dice wrote this up:
  https://blogs.oracle.com/dave/entry/monitor_mwait_for_spin_loops

Also, from a cross-platform POV, a boolean would provide an easy to use "hook" for profiling how often the polling is failing.  Failure frequency is an important input to the tuning of spin loops, isn't it?  Why not feed that info through to the JVM?
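
To make the profiling "hook" concrete, here is a minimal sketch in Java. It assumes a hypothetical helper class, ProfiledSpin, whose poll(boolean) method is not a JDK API; the counters only stand in for whatever profile data a JVM might gather internally, and Thread.yield() stands in for whatever hint the platform offers.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper, not a JDK API: a boolean-taking spin hint that
// counts how often the polled condition turned out to be false.
final class ProfiledSpin {
    private static final AtomicLong polls = new AtomicLong();
    private static final AtomicLong failures = new AtomicLong();

    private ProfiledSpin() {}

    // Returns the condition unchanged so it can drive the loop directly.
    static boolean poll(boolean satisfied) {
        polls.incrementAndGet();
        if (!satisfied) {
            failures.incrementAndGet();
            Thread.yield(); // stand-in for a platform-specific spin hint
        }
        return satisfied;
    }

    static long polls() { return polls.get(); }

    static long failures() { return failures.get(); }

    // Fraction of polls that failed; 0.0 if nothing has been polled yet.
    static double failureRate() {
        long n = polls.get();
        return n == 0 ? 0.0 : (double) failures.get() / n;
    }
}
```

A caller would write something like `while (!ProfiledSpin.poll(flag.get())) { }`, and a tuning layer could read failureRate() to decide how long to keep spinning before blocking.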

> On the other hand:
> 
> Unfortunately, I don't think that hardware support that can receive the address information exists right now,

(It does, on SPARC.)

> and if/when it does, I'm not sure the semantics of passing the boolean through are enough to cover the actual way to use such hardware when it becomes available.

The alternative is to have the JIT pattern-match for loop control around the call to Thread.yield. That is obviously less robust than having the user thread the poll condition bit through the poll primitive.

> It is probably premature to design a generic way to provide addresses and/or state to this "spin until something interesting changes" stuff without looking at working examples. A single watched address API is much more likely to fit current implementations without being fragile.
> 
> ARM v8's WFE is probably the most real user-mode-accessible thing for this right now (MWAIT isn't real yet, as it's not accessible from user mode). We can look at an example of how a spin loop needs to coordinate the use of WFE, SEVL, and the evaluation of a memory location with load-exclusive operations here: http://lxr.free-electrons.com/source/arch/arm64/include/asm/spinlock.h . The tricky part is that the SEVL needs to immediately precede the loop (and all accesses that need to be watched by the WFE), but can't be part of the loop (if it were in the loop, the WFE would always trigger immediately). But the code in the spinning loop can only track a single address (the exclusive tag in load exclusive applies only to the most recent address used), so it would be wrong to allow generic code in the spin (it would have to be code that watches exactly one address).
> 
> My suspicion is that the "right" way to capture the various ways a spin loop would need to interact with WFE logic will be different than tracking things that can generically affect the value of a boolean. E.g. the evaluation of the boolean could be based on multiple addresses, and since it's not clear (in the API) that this is a problem, the benefits derived would be fragile.

Having the JIT explore nearby loop structure for memory references is even more fragile.

If we can agree that (a) there are advantages to profiling the boolean parameter for all platforms, and (b) the single-poll-variable case is likely to be optimizable sooner *with* a parameter than *without*, maybe this is enough to tip the scales towards boolean parameter.

The idea would be that programmers would take a little extra thought when using yield(Z)Z, and get paid immediately from good profiling.  They would get paid again later if and when platforms analyze data dependencies on the Z.
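
For readers unfamiliar with the notation, yield(Z)Z is the JVM descriptor form for a method that takes a boolean and returns a boolean. A minimal sketch of the two loop shapes being compared follows; yieldUnless is a hypothetical name standing in for the proposed primitive, not an existing JDK method, and its body here is only a placeholder.

```java
import java.util.concurrent.atomic.AtomicBoolean;

final class SpinShapes {
    private SpinShapes() {}

    // Hypothetical yield(Z)Z-shaped primitive, not a JDK method.
    // It hands the condition back so the caller can loop on the result.
    static boolean yieldUnless(boolean done) {
        if (!done) {
            Thread.yield(); // placeholder for a real spin hint
        }
        return done;
    }

    // Shape 1: the hint sees nothing of the poll condition, so a JIT
    // would have to pattern-match the surrounding loop to find it.
    static void awaitPlain(AtomicBoolean flag) {
        while (!flag.get()) {
            Thread.yield();
        }
    }

    // Shape 2: the programmer threads the poll result through the hint,
    // so the condition is an explicit input to the primitive.
    static void awaitHinted(AtomicBoolean flag) {
        while (!yieldUnless(flag.get())) {
            // spin
        }
    }
}
```

Both methods return once another thread sets the flag; the difference is only in what the primitive gets to see.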

If there's no initial payoff, then, yes, it is hard to ask programmers to expend extra thought that only pays off on some platforms.

— John