[concurrency-interest] Spin Loop Hint support: Draft JEP proposal

Thu Oct 15 03:53:15 UTC 2015

I agree on the separation between spin hinting and monitor-like constructs, but not so much on the analogy to, or use of, the term "yield" to describe what is intended by spin hints.

On the name choice: things that include the word "yield" vs. spinLoopHint():

While the spinYield() example in your e-mail below can work from a semantic point of view in the same code, IMO the word "yield" suggests the exact opposite of what spinLoopHint() is intended to do or hint at. spinLoopHint() is motivated by wanting to spin while holding onto the CPU (intentionally not yielding to other processes), and by the wish to improve performance while doing so, primarily by reducing the reaction time to a loop-terminating event. So spinLoopHint() is something that very selfish spinning code does without reducing its selfishness. In contrast, yield() is virtually always done as an unselfish act (the very word suggests it). The general expectation with yield() calls is that the OS scheduler can make use of the core for other purposes: it is OK to switch away from the current task, since the thread may not be making progress, or may be willing to relinquish resources to be "nice" to others. As such, a yield() generally suggests a system call, a (relatively) large overhead, and a willingness to sacrifice reaction time in order to allow others to make use of the CPU.

My preferred choice of the spinLoopHint() name comes directly from how the behavior expectations of a PAUSE-like instruction are already expressed in relevant documents. E.g., Intel's documentation describes the PAUSE instruction as a "Spin Loop Hint".

On the no-args vs. "with some arg" variants:

With regard to passing a time value (e.g. the yield(long nanoSeconds) example in your e-mail below): a spinLoopHint() is a natural fit for loops that already do their own spinning, and as such it needs to let those loops handle their own composition: choices around termination conditions, state updates, and whether to back off or employ time- or count-based termination. Choosing and dictating a specific mechanism (nanoseconds) or terminating-condition checks would interfere with the composability of spinLoopHint() in such code. For example, the yield(long nanoSeconds) form (even if it were called spinLoopHint(long nanoSeconds)) would not be directly usable in the following code:

while (!(doneCond1 || doneCond2 || count++ > backOffThreshold)) {
	spinLoopHint();
}
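A self-contained sketch of that loop, assuming a hypothetical static spinLoopHint() that is simply a no-op here (the proposed JavaDoc below allows a nop). The two conditions are modeled as AtomicBoolean flags, and the threshold value of 1_000 is an arbitrary illustrative choice:

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class ComposableSpin {
    // Hypothetical stand-in for the proposed hint; a no-op is a valid implementation.
    static void spinLoopHint() {
    }

    // The loop owns its exit conditions and its count-based back-off threshold;
    // the hint just sits in the body.
    // Returns whether either condition is set once spinning stops; false means
    // the threshold was hit and the caller can pick its own back-off.
    static boolean spinUntil(AtomicBoolean doneCond1, AtomicBoolean doneCond2, int backOffThreshold) {
        int count = 0;
        while (!(doneCond1.get() || doneCond2.get() || count++ > backOffThreshold)) {
            spinLoopHint();
        }
        return doneCond1.get() || doneCond2.get();
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicBoolean done1 = new AtomicBoolean();
        AtomicBoolean done2 = new AtomicBoolean();

        // Nobody sets either flag, so the loop gives up at the threshold.
        System.out.println(spinUntil(done1, done2, 1_000)); // prints false

        // Another thread sets the second condition; after join() the loop exits at once.
        Thread setter = new Thread(() -> done2.set(true));
        setter.start();
        setter.join();
        System.out.println(spinUntil(done1, done2, 1_000)); // prints true
    }
}
```

What happens after a false return (yield, park, retry) stays the caller's decision, which is exactly what a duration argument on the hint would take away.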

Similarly, a no-args spinLoopHint() will cleanly drop into things like the Disruptor's busy-spin waitFor() variant (https://github.com/LMAX-Exchange/disruptor/blob/f29b3148c2eef3aa2dc5d5f570d7dde92b2f98ba/src/main/java/com/lmax/disruptor/BusySpinWaitStrategy.java#L28), but something that takes a nanoseconds argument would not.
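For illustration, a minimal busy-spin wait in the spirit of that waitFor(). This is a simplified, hypothetical analog, not the Disruptor's actual code: SequenceWaiter, waitFor, and the no-op spinLoopHint() are illustrative names. The consumer spins until a published cursor reaches the sequence it needs, and the no-args hint slots into the loop body without touching the termination logic:

```java
import java.util.concurrent.atomic.AtomicLong;

public class SequenceWaiter {
    // Hypothetical no-op stand-in for the proposed spinLoopHint().
    static void spinLoopHint() {
    }

    // Busy-spins until the published cursor reaches the requested sequence,
    // then returns the cursor value that was observed.
    static long waitFor(long sequence, AtomicLong cursor) {
        long available;
        while ((available = cursor.get()) < sequence) {
            spinLoopHint();
        }
        return available;
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicLong cursor = new AtomicLong(-1);
        Thread publisher = new Thread(() -> {
            for (long i = 0; i <= 5; i++) {
                cursor.set(i); // publish sequences 0..5
            }
        });
        publisher.start();
        long seen = waitFor(5, cursor);
        publisher.join();
        System.out.println(seen); // prints 5
    }
}
```

The loop has no timeout at all; a hint that required a duration would force one into it.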

I think that composability should be our main driver here. Java code already knows how to spin in many interesting ways; it just needs a way to hint that reaction time is more important than speed in the spin, and that's what I'm suggesting as the main purpose of spinLoopHint(). See the proposed JavaDoc below.
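As one example of the existing spinning the hint should compose with, here is a sketch of a bounded spin-then-block acquire built on ReentrantLock.tryLock(). The class and method names and the maxSpins budget are hypothetical, and spinLoopHint() is again a no-op stand-in:

```java
import java.util.concurrent.locks.ReentrantLock;

public class SpinThenBlock {
    // Hypothetical no-op stand-in for the proposed spinLoopHint().
    static void spinLoopHint() {
    }

    // Tries to take the lock by spinning for at most maxSpins failed attempts,
    // then falls back to a blocking acquire. Returns the number of failed attempts.
    static int acquire(ReentrantLock lock, int maxSpins) {
        int spins = 0;
        while (spins < maxSpins) {
            if (lock.tryLock()) {
                return spins;
            }
            spins++;
            spinLoopHint();
        }
        lock.lock(); // spin budget exhausted: block
        return spins;
    }

    public static void main(String[] args) {
        ReentrantLock lock = new ReentrantLock();
        int spins = acquire(lock, 100);
        try {
            // Uncontended, so tryLock() succeeds on the first attempt.
            System.out.println("acquired after " + spins + " failed spins"); // prints "acquired after 0 failed spins"
        } finally {
            lock.unlock();
        }
    }
}
```

The spin budget and the fallback are policy choices of the calling code; the hint only affects how each iteration executes.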

— Gil.

/**
 * Provide the JVM with a hint that this call is made from within a spinning
 * loop. The JVM may assume that the speed of executing the loop (e.g. in
 * terms of number of loop executions per second) is less important than the
 * reaction time to events that would cause the loop to terminate, or than
 * potential power savings that may be derived from possible execution
 * choices. The JVM will not slow down the loop execution to a point where
 * execution will be delayed indefinitely, but other choices of loop execution
 * speed are system-specific. Note that a nop is a valid implementation of
 * this hint.
 */
public static void spinLoopHint() {
}
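Because a nop is a valid implementation, library code can adopt such a hint without a hard dependency on it. This idea later shipped in Java 9 as Thread.onSpinWait() (JEP 285). The sketch below, using a hypothetical SpinHints class, calls that method when the running JVM provides it and otherwise degrades to a no-op:

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public final class SpinHints {
    // Handle to Thread.onSpinWait() if present (Java 9+), or null on older JVMs.
    private static final MethodHandle ON_SPIN_WAIT = lookupOnSpinWait();

    private static MethodHandle lookupOnSpinWait() {
        try {
            return MethodHandles.publicLookup()
                    .findStatic(Thread.class, "onSpinWait", MethodType.methodType(void.class));
        } catch (NoSuchMethodException | IllegalAccessException e) {
            return null; // hint not available: fall back to a nop
        }
    }

    private SpinHints() {
    }

    // Same contract as the proposed spinLoopHint(): only a hint, and a nop is valid.
    public static void spinLoopHint() {
        if (ON_SPIN_WAIT != null) {
            try {
                ON_SPIN_WAIT.invokeExact();
            } catch (RuntimeException | Error e) {
                throw e;
            } catch (Throwable t) {
                // Unreachable in practice: onSpinWait() declares no checked exceptions.
                throw new AssertionError(t);
            }
        }
    }

    public static boolean hintAvailable() {
        return ON_SPIN_WAIT != null;
    }

    public static void main(String[] args) {
        for (int i = 0; i < 3; i++) {
            spinLoopHint();
        }
        System.out.println("onSpinWait available: " + hintAvailable());
    }
}
```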

> On Oct 14, 2015, at 11:04 PM, Doug Lea <dl at cs.oswego.edu> wrote:
> 
> Some notes after reading follow-ups.
> 
> One question is whether there should be a method that clues in
> the JVM about what change is being waited for. This is the territory of
> monitor-like constructions (see below), as opposed to the
> yield/sleep-like constructions that Gil was initially proposing.
> 
> For these, the next question is whether this should be more
> like Thread.yield() vs Thread.sleep(). If it could be like
> sleep, then a new API might not be needed: JVMs could
> implement sleep(0, 1) (or any small value of nanosec arg)
> using a PAUSE instruction on platforms supporting them.
> But sleep is also required to check interrupt status,
> which means that at least one extra load would be needed
> in addition to PAUSE. So it seems that something yield-like
> (with no obligation to check interrupt) is still desirable,
> leading either to my original suggestion:
> 
>  /**
>   * A hint to the platform that the current thread is momentarily
>   * unable to progress...
>   */
>  public static void spinYield();
> 
> OR something more analogous to sleep, but without interrupt check:
> 
> /**
>  * A hint to the platform that the current thread is unlikely
>  * to progress for the indicated duration in nanoseconds...
>  */
>  public static void yield(long nanoSeconds);
> 
> When available, JVMs would implement small values via PAUSE,
> larger by calling plain yield(), but in no case promising to
> return in either at least or at most the given duration.
> While it is a little odd, it seems to cover John Rose's desire
> to force an argument dependency.
> 
> I think either of these would be OK.
> 
> We'd use this functionality in a few places inside java.util.concurrent.
> We can't do so as aggressively as some users might like: we
> generally bound spin-then-block constructions to an approximation
> of best-case unavailability (lock-hold etc) times, so as to
> work OK when systems are heavily loaded. When we have done more
> than this, we have gotten justifiable complaints. But we also
> have "try" and "poll" forms of almost everything so users can
> add additional spins themselves. Or create custom sync using
> base capabilities.
> 
> Back to the question of monitor-like constructions:
> 
> Low-level memory-wait instructions are limited in what they
> can wait for -- basically only changes at fixed addresses.
> This is not an easy fit for GCed languages where the address
> of a variable might change. However, there is at least one
> case where this can work: park/unpark are (and are nearly forced
> to be) implemented using an underlying native-level semaphore.
> So it should be possible to at least sometimes use MWAIT
> inside park to reduce unproductive context switches.
> The "sometimes" part might vary across platforms.
> In particular, the implementation of LockSupport.parkNanos
> could always just invoke an MWAIT-based intrinsic for small
> arguments. It would be great if people working on hotspot
> explored such options.
> 
> So for this particular application of MWAIT-like support
> (which should be vastly more common than other uses anyway),
> we could side-step for now analogs of proposed C++ "synchronics"
> and the like that would require unknown mechanics on
> still-unreleased VarHandles.
> 
> -Doug
>