request for advice: safepoints in the JSR 292 spec

John Rose john.r.rose at oracle.com
Fri Dec 10 18:04:11 PST 2010


On Dec 10, 2010, at 5:08 AM, Doug Lea wrote:

> On 12/09/10 19:09, John Rose wrote:
>> I started a thread on Google Groups to get more advice on safepoint-based
>> invalidation, which the EG is naming MutableCallSite#sync.
>> 
>> http://groups.google.com/group/jvm-languages/browse_thread/thread/9c9d3e84fc745676#
>> 
> 
> TL;DR version: The scheme you laid out here seems sound.

Thanks very much, Doug.  I assume you are primarily referring to part (a) of my question:

>> In order to work, the specification has to (a) make logical sense in the terms of the JMM, (b) be reasonably implementable by JVMs, and (c) be useful to programmers.

Does anyone have a take on (c)?  (See my comments below on using this API...)

> My only concern about this is whether it precludes
> further useful support for the general problem
> of "phased computation"...

I hope not, though JSRs 292 and 166 are an instance of:
  http://en.wikipedia.org/wiki/Conway's_Law

The worst case is that we are seeing the good being the enemy of the better.  The best case is that we are doing initial exercises in JDK 7 which will get properly structured in JDK 8.  I vote for the best case.

The device of associating a "nonce-volatile" with a safepoint-phased variable update is generally applicable, not just to MutableCallSite.  Perhaps once we have a suitable LValue abstraction (able to efficiently refer to x.f or a[i], which are your two implementations of phased variables), we can make a standard operation LValue.setOnceVolatile.
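
Roughly, the idiom looks like this (a minimal sketch only; PhasedVar, setOnceVolatile, and the field names are hypothetical, not a proposed API).  The volatile write of a fresh nonce publishes the preceding plain write; any reader that observes the new nonce is guaranteed by the JMM to also observe the new value:

  class PhasedVar<T> {
      private T value;                        // phased: constant within a phase
      private volatile Object nonce = new Object();

      void setOnceVolatile(T newValue) {      // writer, once per phase change
          value = newValue;                   // plain write of the phased variable
          nonce = new Object();               // nonce-volatile write publishes it
      }

      T get() {                               // reader
          Object observed = nonce;            // volatile read orders the next load
          return value;                       // sees the value of the observed phase
      }
  }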

Doug, you've asked before for LValue.  As we've discussed, in order to do LValue, the JVM needs very reliable scalarization.  This means removing certain obstacles in the JOM (Java Object Model), notably pointer comparison and monitor-per-object.  Tuples would also help.  The MLVM project needs to work on this stuff; the key shortage is the most valuable resource, which is people (esp. HotSpot hackers) to think and work on it.

> For phased computations, the main economies apply
> to "phased" variables -- those that may take a
> new value on each phase but are constant within phases.

Thanks for this concept.

> Sadly enough, I don't know of a recipe determining which
> of these options is best in a given situation. (Although
> I do know one of the ingredients: whether you require
> atomicity of updates across all parties upon advance.)
> Maybe some further study and exploration could arrive at a
> good recipe.

In the JSR 292 case, we are supporting languages (like Java!) which can dynamically change their type schemas, but do so infrequently, perhaps at phase changes.  The changes will in general require complex sets of call sites to be updated, implying a dependency mechanism (see the Switcher class for this).  But the actual change-over can be slow, and may involve safepoints, as is the case today with the devirtualization that happens when new classes are loaded.

(See Universe::flush_dependents_on around line 1100 of:
  http://hg.openjdk.java.net/jdk7/jdk7/hotspot/file/tip/src/share/vm/memory/universe.cpp )
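
For concreteness, here is a sketch of the dependency mechanism, written against the names that eventually shipped in JDK 7, where the EG's Switcher became java.lang.invoke.SwitchPoint.  One switch point can guard many call-site targets; invalidating it flips all of them to their fallback paths in one batch.  The fast and slow paths here are placeholders:

  import java.lang.invoke.*;

  class SchemaGuard {
      static String fast(String x) { return "fast:" + x; }  // specialized for current schema
      static String slow(String x) { return "slow:" + x; }  // re-resolves under a new schema

      public static void main(String[] args) throws Throwable {
          MethodHandles.Lookup lookup = MethodHandles.lookup();
          MethodType mt = MethodType.methodType(String.class, String.class);
          MethodHandle fastPath = lookup.findStatic(SchemaGuard.class, "fast", mt);
          MethodHandle slowPath = lookup.findStatic(SchemaGuard.class, "slow", mt);

          SwitchPoint schemaValid = new SwitchPoint();
          MethodHandle target = schemaValid.guardWithTest(fastPath, slowPath);
          System.out.println((String) target.invokeExact("a"));  // prints fast:a

          // A type-schema change invalidates every dependent path at once:
          SwitchPoint.invalidateAll(new SwitchPoint[] { schemaValid });
          System.out.println((String) target.invokeExact("b"));  // prints slow:b
      }
  }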

The idea is that the type schema change would be prepared (as a "next global state") by a writer thread #1.  At this point some call sites become "invalid in next state".  Those call sites would be switched, either by a switcher (which would in general control many call sites at once) or by editing them individually, and then calling 'sync' on them.

The new state of all call sites must at this point be consistent with both the current and next global state of the type schema.  Typically, a lock would be held on the schema by writer thread #1, and the updated call site paths (now switched in and sync-ed) would pile up on the schema lock.  After the next global state is installed as current, the writer thread #1 lets go of the lock, and callers are free to shake out the new state of affairs.

The shake-out may include installing optimized call paths in various call sites.  This means that a mutable call site might undergo two mutations, the first to get into a state consistent with both old and new schemas (perhaps seizing a lock, as above) and the second to optimize calling within the new state.  Both transitions generally need to grab a lock so they can have a consistent view of the global schema state.
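
In outline, writer thread #1's protocol might look like the following sketch.  (Assumptions: the schema lock, the handle names, and the placeholder methods are hypothetical; I spell the operation this thread calls 'sync' as the static MutableCallSite.syncAll, matching the eventual JDK 7 API.)

  import java.lang.invoke.*;

  class SchemaUpdater {
      private final Object schemaLock = new Object();

      void updateSchema(MutableCallSite[] dependents,
                        MethodHandle neutralPath,     // valid under old AND new schema
                        MethodHandle optimizedPath) { // valid under new schema only
          synchronized (schemaLock) {                 // writer thread #1
              // Transition 1: switch each dependent site to a path that is
              // consistent with both the current and the next global state.
              for (MutableCallSite site : dependents)
                  site.setTarget(neutralPath);
              MutableCallSite.syncAll(dependents);    // 'sync' in this thread's naming
              installNextSchemaAsCurrent();           // hypothetical placeholder
          }  // lock released; callers are free to shake out the new state

          // Transition 2 (possibly lazy and concurrent): optimize for the new state.
          synchronized (schemaLock) {                 // consistent view of the schema
              for (MutableCallSite site : dependents)
                  site.setTarget(optimizedPath);
              MutableCallSite.syncAll(dependents);    // or batched; see the next sketch
          }
      }

      private void installNextSchemaAsCurrent() { }   // hypothetical placeholder
  }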

As an extra complication, the editing of call sites is probably performed in a concurrent manner, and lazily.  There might be a writer thread #2, etc., performing updates to mutable call sites.  It is not clear to me whether the second write also needs a sync, but it looks like that may be the case to get performance.  Perhaps the right way to go is to avoid the second write, have the first write install the optimized path immediately, and queue a 'sync' request to be executed after a short delay.  One key concern is to avoid "safepoint storms" by batching 'sync' requests, as in the sketch below.
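
A minimal sketch of such a batcher (the naming and the delay are hypothetical): writers enqueue sites, and a scheduled task drains the queue and issues a single syncAll, hence on a safepoint-based implementation a single safepoint, for the whole batch:

  import java.lang.invoke.MutableCallSite;
  import java.util.*;
  import java.util.concurrent.*;

  class SyncBatcher {
      private final ConcurrentLinkedQueue<MutableCallSite> pending =
          new ConcurrentLinkedQueue<MutableCallSite>();
      private final ScheduledExecutorService timer =
          Executors.newSingleThreadScheduledExecutor();

      void requestSync(MutableCallSite site) {
          pending.add(site);
          // Redundant flushes are harmless: they drain an empty queue.
          timer.schedule(new Runnable() { public void run() { flush(); } },
                         10, TimeUnit.MILLISECONDS);  // delay chosen arbitrarily
      }

      void flush() {
          List<MutableCallSite> batch = new ArrayList<MutableCallSite>();
          for (MutableCallSite s; (s = pending.poll()) != null; )
              batch.add(s);
          if (!batch.isEmpty())  // one syncAll (one safepoint) per batch
              MutableCallSite.syncAll(batch.toArray(new MutableCallSite[0]));
      }
  }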

> In the mean time, note that only one of these options
> (explicit current+next) even has a reasonable interpretation
> in existing JVM-ese. The others are mismatched in that
> some of the operations have JMM-volatile semantics and some
> don't, so it doesn't make sense to declare the variable
> as either volatile or non-volatile.
> 
> John's proposal for MutableCallSite falls into this category,
> hence the controversy about how to express it.
> It is a specialization of the onAdvance (aka sync()) approach
> with in-place updates, and safepoint generations serving
> as phases. This seems reasonable given the likely need for
> atomicity during safepoints anyway.

I'd like to build both on top of LValue operations, but not in JDK 7.

Someone with a thesis to write should propose some conservative additions to the JMM to operate on LValues (or something like them).  They can start with nonce-volatiles, but there are probably better ideas waiting to be recognized.

> While ideally it would be nice to integrate this with
> other phased computation even now (I can imagine defining
> adjunct classes for use with j.u.c.Phaser), I'm not too
> tempted to do so right now, because it is not always the
> best available option.

Let's keep thinking about building blocks, for inclusion in a JDK 7+O(1).

> One
> way to help reach this goal is to improve JVM-level support
> so that the most structured parallel code is also the fastest
> parallel code.

Yes.  If the fastest code is unstructured, then fast code will be unreasonably expensive in most cases.  Which would be dumb economics.

-- John

