[concurrency-interest] Spin Loop Hint support: Draft JEP proposal

Thu Oct 29 07:52:46 UTC 2015

[Sorry for the 4 day delay in response. JavaOne sort of got in the way]

I think we are looking at two separate and almost opposite motivations, each of which is potentially independently valid. Each can be characterized by answering the question: "How does adding this to an empty while(!ready) {} spin loop change things?".

Putting name selection aside, one motivation can be characterized with "If I add this to a spinning loop, keep spinning hard and don't relinquish resources any more than the empty loop would, but try to leave the spin as fast as possible. And it would be nice if power was conserved as a side effect." The other motivation can be characterized with "If I add this to a spin loop, I am indicating that I can't make useful progress unless stuff happens or some internal time limit is reached, and that it is ok to try and make better use of resources (including my CPU), relinquishing them more aggressively than the empty loop would. And it would be nice if reaction time was faster most of the time too."

The two motivations are diametrically opposed in their expected effect when compared to the behavior of an empty spin loop that does not contain them. Both can be validly implemented as a nop, but they "hint" in opposite directions. The former is what I have been calling a spin loop hint (in the "keep spinning and don't let go" sense), and the latter is a "spin/yield" (in the "ok to let go" sense). They have different uses.

> On Oct 24, 2015, at 11:09 AM, Doug Lea <dl at cs.oswego.edu> wrote:
> 
> 
> Here's one more attempt to explain why it would be a good idea
> to place, name, and specify this method in a way that is more
> general than "call this method only if you want a PAUSE instruction
> on a dedicated multicore x86":

I agree with the goal of not aiming at a processor specific behavior, and focusing on documenting intent and expectation. But I think that the intent suggested in the spinLoopHint() JavaDoc does that. As noted later in this e-mail, there are other things that the JVM can choose to do to work in the hint's intended direction.

> 
> On 10/15/2015 01:23 PM, Gil Tene wrote:
> ...
>> 
>> As noted in my proposed JavaDoc, I see the primary indication of the hint to
>> be that the reaction time to events that would cause the loop to exit (e.g.
>> in nanosecond units) is more important to the caller than the speed at which
>> the loop is executing (e.g. in "number of loop iterations per second" units).
> 
> Sure. This can also be stated:
> 
> class Thread { ...
> /**
>  * A hint to the platform that the current thread is momentarily
>  * unable to progress until the occurrence of one or more actions of
>  * one or more other threads (or that its containing loop is
>  * otherwise terminated).  The method is mainly applicable in
>  * spin-then-block constructions entailing a bounded number of
>  * re-checks of a condition, separated by spinYield(), followed if
>  * necessary with use of a blocking synchronization mechanism.  A
>  * spin-loop that invokes this method on each iteration is likely to
>  * be more responsive than it would otherwise be.
>  */
>  public static void spinYield();
> }

I like the "more responsive than it would otherwise be" part. That certainly describes how this is different from an empty loop. But the choice of "mainly applicable" in spinYield() is exactly opposite to the main use case spinLoopHint() is intended for (which is somewhere between "indefinite spinning" and "I don't care what kind of spinning"). This JavaDoc looks like a good description of spinYield() and its intended main use cases, but its stated intent and expectations (when compared to just doing an empty spin loop) work in the opposite direction of what spinLoopHint's intent and expectations need to be for its common use cases.

> 
>> Anyone running indefinite spin loops on a uniprocessor deserves whatever they
>> get. Yielding in order to help them out is not mercy. Let Darwin take care of
>> them instead.
>> 
>> But indefinite user-mode spinning on many-core systems is a valid and common
>> use case (see the disruptor link in my previous e-mail).
> 
>> In such situations the spinning loop should just be calling yield(), or
>> looping for a very short count (like your magic 64) and then yielding. A
>> "magically choose for me whether reaction time or throughput or being nice to
>> others is more important" call is not a useful hint IMO.
>> 
>> Like in my uniprocessor comment above, any program spinning indefinitely (or
>> for a non-trivial amount of time) with load > # cpus deserves what it gets.
> 
> The main problem here is that there are no APIs reporting whether
> load > # cpus, and no good prospects for them either, especially
> considering the use of hypervisors (that may intentionally mis-report)
> and tightly packed cloud nodes where the number of cpus currently
> available to a program may depend on random transient effects of
> co-placement with other services running on that node.

Since a simple empty spinning loop ( while(!ready){} ) is valid, even if/when stupid, on any such setup, I don't see how a hint needs to carry a higher burden of being able to know these things. Such empty loops are already being used in both indefinite and backing-off spinning situations, along with the risks, responsibilities, and sensitivities that performing such spinning carries. It is hard to argue against the obvious and very real benefits that indefinite spinning loops provide on well-provisioned many-core systems, in terms of latency behavior and reaction time when compared with back-off variants. Yes, they come with extra risks of performance degradation when control is lacking, but they are so useful that their existence proof probably trumps the "people shouldn't do this" argument.

So let's look at what each call would do compared to just having an empty loop: The starting point of a pure empty loop obviously does not imply or hint that the JVM should take extra steps to yield resources in the loop. The JVM/OS/Hypervisor certainly MAY do that, but there is no declaration of this intent, and probably no expectation that such yielding would be more likely in the loop than anywhere else in the code.

In the case where a spin hint is added:

while(!ready){ spinLoopHint(); };

The *only* intent declared [in my suggested JavaDoc for the hint] (above the empty loop implementation) is the wish to improve the speed of reacting to "ready" becoming true, and the willingness to sacrifice the "speed" of iteration (number of times/sec around the loop) in service of that wish. This would be the common case in indefinite spinning situations that are prevalent in many-core latency-sensitive stacks today. [e.g. I would expect https://github.com/LMAX-Exchange/disruptor/blob/f29b3148c2eef3aa2dc5d5f570d7dde92b2f98ba/src/main/java/com/lmax/disruptor/BusySpinWaitStrategy.java#L28 to elect to use the spinLoopHint() ]. It does no harm, and can only help the cause.

[Note that there is currently no way to achieve this hint in Javadom, leaving such busy spinning strategies written in Java at a disadvantage when compared to their C cousins executing on identical platforms. That's the gap that this proposed spinLoopHint() JEP is intended to close.]
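
To make the first shape concrete, here is a minimal sketch of an indefinite spin that carries the hint. The spinLoopHint() below is a hypothetical local stand-in implemented as a no-op (which the discussion above treats as a valid implementation); the class name, the spinUntil() helper, and the 10 ms producer delay are assumptions made only for illustration.

```java
import java.util.concurrent.atomic.AtomicBoolean;

class SpinHintSketch {
    // Hypothetical stand-in for the proposed spinLoopHint(). A no-op is a
    // valid implementation; a JVM could instead emit PAUSE on x86.
    static void spinLoopHint() { }

    // Spins until the flag becomes true and returns how many times the
    // loop body ran. The loop itself never yields or blocks.
    static long spinUntil(AtomicBoolean ready) {
        long iterations = 0;
        while (!ready.get()) {
            spinLoopHint();
            iterations++;
        }
        return iterations;
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicBoolean ready = new AtomicBoolean(false);
        Thread producer = new Thread(() -> {
            try {
                Thread.sleep(10); // assumed delay before the event occurs
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            ready.set(true); // the event the spinner is waiting for
        });
        producer.start();
        long iterations = spinUntil(ready);
        producer.join();
        System.out.println("left the spin after " + iterations + " iterations");
    }
}
```

The caller keeps full ownership of the decision to spin; the hint only expresses that leaving the loop quickly matters more than the iteration rate.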

In contrast:

while(!ready){ spinYield(); };

Would declare (per the JavaDoc suggested for spinYield()) a very different intent: I.e. an intent to spin but eventually back off, and a wish to relinquish resources (including the cpu itself) more aggressively than the empty loop would.
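
For comparison, a rough sketch of the spin-then-block construction that the spinYield() JavaDoc describes: a bounded number of re-checks separated by the call, followed by a blocking mechanism. The spinYield() here is a hypothetical no-op stand-in, the bound of 64 echoes the "magic 64" mentioned earlier in the thread, and the CountDownLatch is just one assumed choice of blocking mechanism.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;

class SpinThenBlockSketch {
    enum Outcome { SPUN, BLOCKED, INTERRUPTED }

    // Hypothetical stand-in for the suggested spinYield(); a no-op is valid.
    static void spinYield() { }

    static final int MAX_SPINS = 64; // assumed bound on re-checks

    // Re-checks the flag a bounded number of times, then falls back to
    // blocking on the latch. Reports which path was taken.
    static Outcome awaitReady(AtomicBoolean ready, CountDownLatch done) {
        for (int i = 0; i < MAX_SPINS; i++) {
            if (ready.get()) {
                return Outcome.SPUN;
            }
            spinYield();
        }
        try {
            done.await(); // give up the CPU until the producer signals
            return Outcome.BLOCKED;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return Outcome.INTERRUPTED;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicBoolean ready = new AtomicBoolean(false);
        CountDownLatch done = new CountDownLatch(1);
        Thread producer = new Thread(() -> {
            try {
                Thread.sleep(10); // assumed delay before the event occurs
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            ready.set(true);
            done.countDown(); // wake a waiter that has already blocked
        });
        producer.start();
        Outcome outcome = awaitReady(ready, done);
        producer.join();
        System.out.println("wait finished via " + outcome);
    }
}
```

Here the loop is expected to give up, which is the opposite expectation from the indefinite spin.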

While I can certainly envision new implementations that may want to use such a call, it would be useful to try and find actual places where this call would be made in current use cases. Since most of the desired effect can be achieved in current Java, there are already multiple implementations of non-indefinite spinning out there, and looking at them in this context may be useful.

Having done a cursory scan of a few such loops, I suspect that many current spin-then-backoff implementations are likely to avoid using such a fuzzy implementation because they would normally desire more control over the backoff logic. E.g. various "non-busy" WaitStrategy variants (see implementations of WaitStrategy found here: https://github.com/LMAX-Exchange/disruptor/tree/f29b3148c2eef3aa2dc5d5f570d7dde92b2f98ba/src/main/java/com/lmax/disruptor ) make specific choices about how to not busily-spin. Specific and current non-busy-spinning implementation variants include yielding, blocking, sleeping, blocking with a timeout, "lite" blocking, and a phased backoff strategy. I would expect that none of those would make use of the suggested spinYield() because each is making a different choice about backoff behavior. However, several of them (e.g. YieldingWaitStrategy and PhasedBackoffWaitStrategy) would probably make use of spinLoopHint() [in the spinning parts that have not yet decided to back off], even though they don't spin indefinitely. [here too, spinLoopHint does no harm, and can only help their cause].
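
As an illustration of a strategy that owns its backoff policy and uses the hint only while it is still spinning, here is a rough sketch. It is not the Disruptor's code; the class name, the SPIN_TRIES value of 100, and the no-op spinLoopHint() stand-in are all assumptions.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

class SpinThenYieldSketch {
    // Hypothetical stand-in for the proposed spinLoopHint(); a no-op is valid.
    static void spinLoopHint() { }

    static final int SPIN_TRIES = 100; // assumed length of the busy phase

    // Busy-spins with the hint for SPIN_TRIES failed checks, then switches
    // to Thread.yield(). Returns the number of yields performed.
    static int waitFor(BooleanSupplier ready) {
        int spinsLeft = SPIN_TRIES;
        int yields = 0;
        while (!ready.getAsBoolean()) {
            if (spinsLeft > 0) {
                spinsLeft--;
                spinLoopHint(); // still in the "don't let go" phase
            } else {
                Thread.yield(); // the strategy's own backoff decision
                yields++;
            }
        }
        return yields;
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicBoolean ready = new AtomicBoolean(false);
        Thread producer = new Thread(() -> {
            try {
                Thread.sleep(10); // assumed delay before the event occurs
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            ready.set(true);
        });
        producer.start();
        int yields = waitFor(ready::get);
        producer.join();
        System.out.println("condition observed after " + yields + " yields");
    }
}
```

The backoff choice (yield, sleep, block) stays in the strategy's hands; the hint never decides it.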

It does feel like letting the JVM (and underlying platform) implement spinning logic may be more desirable and portable for some things than specific strategy implementations written in Java, but evolving the proper API for such spins-with-backoff will probably entail studying the various things that may be expected of them (the richness of the Disruptor's not-entirely-busy strategies alone suggests that there is a need to indicate intent more clearly and richly than a single no-args call can do). I would submit that developing this API is orthogonal to the intent and purpose of the proposed spinLoopHint() JEP, and that we should work on it separately.

> And given that programmers cannot portably comply, the method must
> allow implementations that take the best course of action known to the JVM.

I agree, but in the sense of "best course of action for achieving the implied intent compared to an empty spin loop with no hint". We agree that a nop implementation is valid. Things that "do more than a nop" should strive to move the behavior in the indicated direction compared to that. Agreeing on what that direction is for each call is key.

There is a lot more than PAUSE that can be done in a spinLoopHint(), BTW. E.g. since a spinLoopHint() (in my suggested JavaDoc intent) indicates higher responsiveness as a priority, it would be valid and useful (but in no way required) for the JVM to work to *reduce* the likelihood of yielding the CPU in a loop that contained the hint. E.g. if there was some way for the JVM to communicate this preference to the underlying scheduling levels (OS, hypervisor, and even BIOS and HW power management), that would work to improve the behavior in the desired direction. I can envision interesting choices around isolcpus, tasksets, and weight decisions in cpu load balancing decisions, or even priorities. But I really have no desire to implement any of those at this time…

> Despite all of the above, I agree that an OK initial hotspot implementation
> is just to emit PAUSE if on x86 else no-op. It might be worth then
> experimenting with randomized branching etc on other platforms, and
> someday further exploring some cheap form of load detection, perhaps
> kicking in only upon repeated invocation.
> 
> -Doug
>