Spin Loop Hint support: Draft JEP proposal

Thu Oct 8 07:39:30 UTC 2015

On Oct 7, 2015, at 3:01 PM, John Rose <john.r.rose at oracle.com> wrote:

On Oct 5, 2015, at 2:41 AM, Andrew Haley <aph at redhat.com> wrote:

Hi Gil,

On 04/10/15 17:22, Gil Tene wrote:

Summary

Add an API that would allow Java code to hint that a spin loop is
being executed.

I don't think this will work for ARM, which has a rather different
spinlock mechanism.

Instead of PAUSE, we wait on a lock word with WFE.  WFE puts a core
into a lightweight sleep state waiting on a particular address (the
lock word) and a write to the lock word wakes it up.  This is very
useful and somewhat analogous to x86's MONITOR/MWAIT.

I can't immediately see how to generalize your proposal to ARM, which
is a shame.

Suggestion:  Allow the hint intrinsic to take an argument, from which
a JIT can infer a memory dependency (if one is in fact present).

Even if we are just targeting a PAUSE instruction, I think it is helpful
to the JIT to add more connection points (beyond control flow) between
the intrinsic and the surrounding loop.

class jdk.internal.vm.SpinLoop {
    /** Provides a hint to the processor that a spin loop is in progress.
     *  The boolean is returned unchanged.  The processor may assume
     *  that the loop is likely to continue as long as the boolean is false.
     *  The processor may pause or wait after a false result, if there is
     *  some reason to believe that the boolean argument, if re-evaluated,
     *  will be false again.  Any pausing behavior is system-specific.
     *  The processor may not pause indefinitely.
     *  <p>Example:
     * <blockquote><pre>{@code
MyMailbox mb = …;
while (true) {
  if (!pollSpinExit(mb.hasMail()))  continue;
  Object m = mb.getMail();
  if (m != null)  return m;
}
     * }</pre></blockquote>
     */
    @jdk.internal.HotSpotIntrinsicCandidate
    public static boolean pollSpinExit(boolean spinExit) { return spinExit; }
}

I'm going to guess that the extra hinting provided by the parameter would
make it easier for a JIT to generate MWAITs and WFEs.

On the one hand:

I like the idea of (an optional?) boolean parameter as a means of hinting at the thing that may terminate the spin. It's probably much more general than identifying a specific field or address. And it can be used to cover cases that poll multiple addresses (an OR in the boolean) or look at a termination time. If the JVM can track down the boolean's evaluation to dependencies on specific memory state changes, it could pass that on to hardware, if such hardware exists.

On the other hand:
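
As an illustration of the "multiple addresses or a termination time" case, here is a minimal sketch. It assumes a pass-through pollSpinExit like the one proposed above; the class name SpinExitSketch and the method awaitEither are hypothetical and exist only for this example.

```java
import java.util.concurrent.atomic.AtomicBoolean;

final class SpinExitSketch {
    // Hypothetical stand-in for the proposed intrinsic: returns its argument
    // unchanged. A JIT could replace it with PAUSE or something smarter.
    static boolean pollSpinExit(boolean spinExit) { return spinExit; }

    /**
     * Spins until either flag is set or the timeout elapses.
     * Returns true if a flag was observed, false on timeout.
     */
    static boolean awaitEither(AtomicBoolean a, AtomicBoolean b, long timeoutNanos) {
        final long start = System.nanoTime();
        while (true) {
            // The exit condition depends on two memory locations and on time,
            // so no single watched address describes it.
            boolean flagged = a.get() || b.get();
            boolean timedOut = System.nanoTime() - start >= timeoutNanos;
            if (pollSpinExit(flagged || timedOut)) return flagged;
        }
    }
}
```

The boolean carries all three exit reasons to the hint, but nothing in the signature tells the JIT or the hardware which addresses feed it.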

Unfortunately, I don't think that hardware support that can receive the address information exists right now, and if/when it does, I'm not sure the semantics of passing the boolean through are enough to cover the actual way to use such hardware when it becomes available. It is probably premature to design a generic way to provide addresses and/or state to this "spin until something interesting changes" stuff without looking at working examples. A single watched address API is much more likely to fit current implementations without being fragile.

ARM v8's WFE is probably the most real user-mode-accessible thing for this right now (MWAIT isn't real yet, as it's not accessible from user mode). We can look at an example of how a spin loop needs to coordinate the use of WFE, SEVL, and the evaluation of a memory location with load-exclusive operations here: http://lxr.free-electrons.com/source/arch/arm64/include/asm/spinlock.h . The tricky part is that the SEVL needs to immediately precede the loop (and all accesses that need to be watched by the WFE), but can't be part of the loop (if it were in the loop, the WFE would always trigger immediately). But the code in the spinning loop can only track a single address (the exclusive tag in load exclusive applies only to the most recent address used), so it would be wrong to allow generic code in the spin (it would have to be code that watches exactly one address).

My suspicion is that the "right" way to capture the various ways a spin loop would need to interact with WFE logic will be different from tracking things that can generically affect the value of a boolean. E.g. the evaluation of the boolean could be based on multiple addresses, and since it's not clear (in the API) that this is a problem, the benefits derived would be fragile. In addition, there can validly be state-mutating logic in the loop (e.g. counting), and implicitly re-executing that logic repeatedly inside a pollSpinExit(booleanThatOnlyWatchesOneAddress) call would seem "wrong" (the logic would presumably precede the call, and it would be surprising to see it execute more than once within the call).

I suspect that the right way to deal with WFE would be to provide an API that is closer to what it needs (and which is different from spin-hinting in the loop). E.g. some way to designate the beginning of the loop (so SEVL could be inserted right before it), some way to indicate the address that needs to use exclusive load in the loop, and some way to indicate that the loop is done. A possible way to do this is by wrapping the spin loop code and providing the address.

E.g.:

/**
 * Execute the spinCode repeatedly until it returns false. The processor
 * may assume that if the return value is true, it is likely to continue to
 * return true as long as the contents of the fieldToWatch field of the
 * objectToWatchFieldIn object do not change. The processor may therefore
 * pause or wait after a true result. The processor must not pause indefinitely,
 * but other pausing behavior is system-specific.
 */
void spinExecuteWhileTrue(BooleanSupplier spinCode, Field fieldToWatch, Object objectToWatchFieldIn);

This would probably be a good fit for the specific WFE/SEVL semantics: the loop is implicit to the call, so the SEVL can be placed ahead of it; the loop can perform a load exclusive on the designated field of the designated object, and the spinCode can then do whatever it wants, with the understanding that no address other than the fieldToWatch is being watched to provide a timely exit from the loop. [A similar variant can be done for watching array elements.]
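
To make the shape of that API concrete, here is a minimal sketch of a portable fallback, assuming a platform with no single-address wait at all. The class SingleAddressSpinSketch, the Mailbox type, awaitMail, and the no-op spinLoopHint are all hypothetical; an intrinsified version would presumably use the field and object arguments, which this fallback ignores.

```java
import java.lang.reflect.Field;
import java.util.function.BooleanSupplier;

final class SingleAddressSpinSketch {
    // Hypothetical placeholder for a plain spin-loop hint (e.g. PAUSE).
    // It is a no-op here.
    static void spinLoopHint() { }

    // Fallback: the loop lives inside the call, so an intrinsic could place
    // setup code (such as SEVL) ahead of it. The watched field and object
    // are unused in this version.
    static void spinExecuteWhileTrue(BooleanSupplier spinCode,
                                     Field fieldToWatch,
                                     Object objectToWatchFieldIn) {
        while (spinCode.getAsBoolean()) {
            spinLoopHint();
        }
    }

    // Hypothetical example type with exactly one watched field.
    static final class Mailbox {
        volatile Object mail;
    }

    // Spins until mb.mail is non-null, naming that field as the one to watch.
    static Object awaitMail(Mailbox mb) {
        final Field mailField;
        try {
            mailField = Mailbox.class.getDeclaredField("mail");
        } catch (NoSuchFieldException e) {
            throw new IllegalStateException(e);
        }
        spinExecuteWhileTrue(() -> mb.mail == null, mailField, mb);
        return mb.mail;
    }
}
```

In real use the Field lookup would be done once and cached rather than repeated on every wait.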

The same single-watched-address API will probably fit MONITOR/MWAIT if it becomes available, and possibly ll/sc variants in other CPUs too. But wider-watching variants (NCAS, TSX) will not be covered by this API. And common uses of the x86 PAUSE instruction wouldn't either (since they are not restricted to any fixed number of addresses). The good news is that even though the single-address-watching API only covers limited use cases, it can be easily implemented on architectures that only support spin hinting. So if someone's use case does fit into the API and is coded to that form, they are likely to gain benefits on both types of platforms.

This leads me to believe that we are looking at two different APIs:
- Spin loop hinting (matching the mature use cases of the PAUSE instruction in x86 and HW thread priority reduction in Power).
- Single-watched-address spinning, matching ARM v8's WFE/SEVL use case, and potential other single-address watchers (MONITOR/MWAIT, and potential ll/sc-based hints in other future CPUs).

I think the first use case is very mature and well understood, and certainly ready for a long term supported Java SE API. The second use case only applies to recently introduced hardware (ARM v8 right now), but it is fairly simple and *may* be useful more widely in the future.

Since it can be beneficially intrinsified on platforms that support the wider spin-hinting API, we could add the single-address-watching API to the JEP (as the two do seem related). I just worry that the questions about the usefulness and longevity of the single-address-watching use model may overshadow the simplicity and apparent slam-dunkness of the spin loop hinting solution.

Also, the boolean argument is easy to profile in the interpreter, if that's what
a VM wants to do.

For a similar mechanism (which again uses a boolean to provide IR
connection to a data dependency), see:

http://hg.openjdk.java.net/jdk9/jdk9/jdk/file/fe40b31c0e52/src/java.base/share/classes/java/lang/invoke/MethodHandleImpl.java#l697

In fact, something like the profileBoolean intrinsic might be useful to allow
spin loops to gather their own statistics.  Getting the array allocation right
might require an invokedynamic site (or some equivalent, like a static
method handle), in order to control the allocation of profile state per call site.
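
A minimal sketch of what such self-profiling could look like, assuming the counters array is simply passed in by the caller. The class SpinProfileSketch and the method profiledPollSpinExit are hypothetical names, and the per-call-site allocation question raised above is sidestepped here by letting the caller own the array.

```java
final class SpinProfileSketch {
    /**
     * Hypothetical profiling variant of the spin hint: returns the boolean
     * unchanged and counts outcomes. counters[0] counts false results
     * (spin continues), counters[1] counts true results (spin exits).
     * Updates are deliberately unsynchronized, so counts are approximate
     * when several threads share one array.
     */
    static boolean profiledPollSpinExit(boolean spinExit, int[] counters) {
        int idx = spinExit ? 1 : 0;
        // Saturate instead of overflowing.
        if (counters[idx] != Integer.MAX_VALUE) {
            counters[idx]++;
        }
        return spinExit;
    }
}
```

A caller could keep one static final int[2] per spin loop and read the ratio of the two counts to decide, for example, how long spinning is worthwhile before blocking.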

HTH
— John

P.S. I agree with others that this needs cooking, in a jdk.internal place,
before it is ready for SE.