request for advice: safepoints in the JSR 292 spec
Rich Hickey
richhickey at gmail.com
Wed Dec 15 09:22:05 PST 2010
On Dec 15, 2010, at 1:31 AM, John Rose wrote:
> On Dec 13, 2010, at 12:19 AM, Rémi Forax wrote:
>> On 12/12/2010 05:02 PM, Rich Hickey wrote:
>>> Rémi's synchronized block only coordinates the activities of
>>> updaters.
>>> Other threads may, and in some cases may have to, see some of the
>>> setTarget results prior to the sync call, which could be a mess. The
>>> point of syncTargets is to make the setting and the visibility
>>> atomic,
>>> which synchronized cannot do.
>
> Yes, syncTargets would be very powerful. But it would also be hard to
> require of all JVMs. Speaking only for HotSpot, we don't do
> everything
> with safepoints, because they are expensive. We use racy updates
> whenever we can get away with it. The cost of a racy update is the
> co-existence of two states, unpredictably visible to various threads.
> I think that's a normal complexity cost for doing good language
> implementations.
>
> I think mapping a global atomic update to the JMM would require
> more "magic edges" in the happens-before graph. The proposal
> I posted, while weaker, has a correspondingly simpler impact on
> the JMM. This is another way of observing that JVMs are likely to
> have an easier time of adding the proposed functionality.
>
> So a globally atomic update is harder to implement and harder
> to specify. It is also overkill for a common use case, which is
> delayed optimization of call sites. See below...
>
>> Rich,
>> I don't think you can provide an optimized method handle when
>> syncing but
> >> more probably a default generic method that will later install a
>> fast path.
>
> Thanks, Remi, for explaining this. I'm going to pile on here.
>
> (I have one comment on your code; see below.)
>
> I think of the pattern Remi sketches as a 2-phase commit.
>
> Phase 0 and phase 2 are long-term phases. Phase 1 is brief but not
> atomic.
>
> Phase 0 is the reign of the old target T0, before any MCS.setTarget.
>
> Phase 1 starts when metadata locks are grabbed.
> Under the locks, MCS.setTarget installs a default generic method T1.
> (This is analogous to the JVM trick of setting a call site to the
> "unlinked" state.)
>
> T1 is not racy. It is careful to grab a reader lock on metadata.
> It is likely to install an optimized method T2, via a simple
> setTarget.
> This may happen after an invocation count, or after user-level
> profiling.
> (Therefore, it does not make sense to try to guess at T2 during
> phase 1.)
>
> The MCS.sync operation is performed during phase 1, after all
> relevant setTargets are done. It has the effect of excluding threads
> from observing target T0. (I.e., it "flushes" T0 from the system.)
>
> Phase 2 starts when metadata locks are released. During phase 2,
> individual threads eventually execute T1. T1 lazily decides to
> install T2.
> (Or several equivalent versions of T2.)
>
> Threads which observe T1 (because of caching or inlining) will perform
> sync actions which will force them to observe more recent call site
> targets.
>
> During phase 0, T2 cannot be observed. During phase 2, T0 cannot be
> observed.
> The intermediate target T1 can be observed during any phase.
>
> Compare that with Rich's proposed atomic syncTargets operation,
> which would exclude phase 1 and target T1, for better and worse.
>
> Another way of comparing syncTargets with setTarget+sync is
> simply that syncTargets excludes phase 1 and target T1, whereas
> the weaker proposal does not exclude T1.
>
> This weakness can also be described in terms of two reader
> threads, a Fast Reader and a Slow Reader. The Fast Reader
> sees the result of the writer's setTarget of T1 in the same
> nanosecond. The Slow Reader sees only T0 until it is
> forced to pick up T1 by the sync operation.
>
>
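For concreteness, here is a minimal sketch of the 2-phase commit pattern above. It assumes the names that JSR 292 finalized in Java 7 (MutableCallSite, with the static MutableCallSite.syncAll standing in for the MCS.sync discussed here); the lock, stub, and target names are illustrative, not part of any API:

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.MutableCallSite;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class TwoPhaseDemo {
    static final ReadWriteLock metadataLock = new ReentrantReadWriteLock();
    static final MutableCallSite site;
    static final MethodHandle T0, T1, T2;

    public static String oldPath(String x)  { return "T0:" + x; }  // phase 0 target
    public static String fastPath(String x) { return "T2:" + x; }  // optimized target

    // T1: the default generic stub. It grabs the reader lock (blocking if an
    // update is in phase 1), then lazily installs T2 via a plain setTarget.
    public static String genericStub(String x) throws Throwable {
        metadataLock.readLock().lock();
        try {
            site.setTarget(T2);                 // phase 2: lazy install of the fast path
            return (String) T2.invokeExact(x);
        } finally {
            metadataLock.readLock().unlock();
        }
    }

    static {
        try {
            MethodHandles.Lookup l = MethodHandles.lookup();
            MethodType mt = MethodType.methodType(String.class, String.class);
            T0 = l.findStatic(TwoPhaseDemo.class, "oldPath", mt);
            T1 = l.findStatic(TwoPhaseDemo.class, "genericStub", mt);
            T2 = l.findStatic(TwoPhaseDemo.class, "fastPath", mt);
            site = new MutableCallSite(T0);     // phase 0: the reign of T0
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Phase 1: under the writer lock, install the generic stub and flush T0.
    public static void updateMetadata() {
        metadataLock.writeLock().lock();
        try {
            // ... mutate language-level metadata here, then:
            site.setTarget(T1);
            MutableCallSite.syncAll(new MutableCallSite[] { site });
        } finally {
            metadataLock.writeLock().unlock();
        }
    }

    public static String run() {
        try {
            MethodHandle invoker = site.dynamicInvoker();
            String before = (String) invoker.invokeExact("x");  // T0
            updateMetadata();
            String after  = (String) invoker.invokeExact("x");  // T1, installs T2
            String again  = (String) invoker.invokeExact("x");  // direct T2
            return before + " " + after + " " + again;
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(run());  // T0:x T2:x T2:x
    }
}
```

Note that after the sync, no thread can still be running T0; T1 is the only target that can be observed in every phase.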
This API has several presumptions:
- There is some singular global metadata.
- It is protected by a lock. What if it is an immutable structure in
an AtomicReference and modified via CAS? Or passed by value to targets/
handles?
- Call sites will always use a two-phase update, i.e. they will need
to rediscover their binding vs directly target it (during metadata
update) given their cached data and the new metadata. You are
piggybacking on this presumption to provide a consistency hook
(blocking on the read lock).
While T1 is not strictly MT racy (given wholesale adoption of this
pattern), it is prone to error: should the update phase interleave
ordinary code and setTarget calls, it can see an inconsistent state
(i.e. nothing keeps call sites the update thread itself encounters
from moving to a T1.5 prior to the sync call, since the updater
thread already holds the lock). Many dynamic language implementations
use their own dynamic code in a way that could trip over this.
This is an API that only works correctly given an extremely
constrained pattern of use. That's fine, if unfortunate, but it needs
to go in the docs I think. It is essential that the pattern be: grab
the lock; then 'modify' all metadata (no setTargets); then update all
targets (no calls through call sites); then sync (under the lock);
and the targets *must be* stubs that grab the lock and then either
a) do nothing (if not lazy) or b) setTarget (under the lock), if lazy.
The phases you describe are derived from this pattern and do not fall
out of the API.
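A minimal sketch of that interleaving hazard, assuming a ReentrantReadWriteLock for the metadata lock and Java 7's MutableCallSite.syncAll; all other names are illustrative. The interleaving is deliberately single-threaded: it is the updater thread itself calling through a site it has already retargeted, which the lock cannot prevent, since that thread holds it:

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.MutableCallSite;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class InterleaveHazard {
    static final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    static volatile String metadata = "v0";
    static final MutableCallSite site;
    static final MethodHandle STUB;

    // Lazy stub: under the read lock, bake the current metadata into a
    // constant "fast path" and install it.
    public static String stub() {
        lock.readLock().lock();
        try {
            String snapshot = metadata;  // may be a half-finished update!
            site.setTarget(MethodHandles.constant(String.class, snapshot));
            return snapshot;
        } finally {
            lock.readLock().unlock();
        }
    }

    static {
        try {
            STUB = MethodHandles.lookup().findStatic(
                InterleaveHazard.class, "stub",
                MethodType.methodType(String.class));
            site = new MutableCallSite(STUB);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    public static String callThroughSite() {
        try {
            return (String) site.dynamicInvoker().invokeExact();
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    // Updater that interleaves ordinary code (a call through the site)
    // between its metadata writes and the final sync.
    public static String brokenUpdate() {
        lock.writeLock().lock();
        try {
            metadata = "v1-partial";            // first half of the update
            site.setTarget(STUB);
            // Ordinary dynamic-language code runs here and hits the site.
            // ReentrantReadWriteLock lets the write-lock holder also take
            // the read lock, so the stub runs against half-updated metadata
            // and installs a fast path for "v1-partial" -- permanently.
            String seen = callThroughSite();
            metadata = "v1";                    // second half of the update
            MutableCallSite.syncAll(new MutableCallSite[] { site });
            return seen;
        } finally {
            lock.writeLock().unlock();
        }
    }

    public static void main(String[] args) {
        brokenUpdate();
        // The update has fully completed, yet the site still answers with
        // the half-updated snapshot: the fast path was built too early.
        System.out.println(callThroughSite());  // prints v1-partial
    }
}
```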
I proposed one thing (syncTargets), and asked one question (does that
obsolete setTarget).
The use case Remi and you describe makes it clear that something like
setTarget (with ordinary field synchronization semantics) would still
have utility for two-phase call sites. It would be overkill to sync
again, presuming fast-path calculation is idempotent and acceptable
to do over in multiple threads. However, it's error prone - as you
pointed out, Remi's example is broken: setting the target outside of
the lock means the site can (permanently) miss an update.
If syncTargets were possible, you'd require no locks, and you might
want something more like a CAS-based swapTarget for lazy sites.
Coordination around metadata (if any) would be completely orthogonal.
invariant broken (writer):
{
    // calculate new metadata
    // determine new targets
    // update all callsites
    syncTargets(impacted callsites, new targets);
}

default generic method (reader):
{
    // check arguments
    // create a fast path
    swapTarget(this?, guard + fastpath);
}
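Neither proposed operation exists in JSR 292; the shims below only sketch their intended signatures and semantics on top of what the JDK does provide. In particular, this syncTargets is NOT globally atomic (that is exactly the part that would need VM support), and swapTarget approximates a target CAS with a per-site monitor:

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MutableCallSite;

// Hypothetical shims for the proposed syncTargets/swapTarget operations.
public class ProposedOps {
    // Proposed: atomically install new targets on a batch of sites and make
    // them visible to all threads at once. This shim does the setTargets and
    // then MutableCallSite.syncAll; unlike the real proposal, other threads
    // can observe a mix of old and new targets in between.
    public static void syncTargets(MutableCallSite[] sites,
                                   MethodHandle[] targets) {
        for (int i = 0; i < sites.length; i++) {
            sites[i].setTarget(targets[i]);
        }
        MutableCallSite.syncAll(sites);
    }

    // Proposed: CAS-style install for lazy sites, so a stale stub cannot
    // clobber a newer target. Approximated here with a per-site monitor;
    // a real implementation would be a true CAS on the call-site target.
    public static boolean swapTarget(MutableCallSite site,
                                     MethodHandle expected,
                                     MethodHandle next) {
        synchronized (site) {
            if (site.getTarget() != expected) {
                return false;  // someone updated the site first; keep theirs
            }
            site.setTarget(next);
            return true;
        }
    }
}
```

With these, the writer needs no ambient lock discipline: it calls syncTargets once over the impacted sites, and a lazy stub publishes its fast path via swapTarget, which fails harmlessly if a newer target has already been installed.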
I'll leave it to your expertise whether this combination is harder to
implement and harder to specify. That certainly dominates. The
benefits of syncTargets and swapTarget from an API perspective are
that, given appropriate regard for the expense of syncTargets, you
can't get it wrong: the API itself delivers the coordination
semantics, there is no elaborate pattern to replicate, and the field
of use is broader.
Rich