[jmm-dev] Does a volatile load have to see the exact volatile store to synchronize?

Tue May 1 14:06:04 UTC 2018

Indeed, Power never has been multi-copy atomic, and last I checked,
Power's Java implementations used lighter-weight barriers.

							Thanx, Paul

On Mon, Apr 30, 2018 at 09:17:42PM -0700, Daniel Lustig wrote:
> (re-sending as a jmm-dev subscriber...sorry for spamming the four of
> you on cc explicitly)
> 
> Hans' analysis looks right to me at first glance.  This is essentially
> "WRR+2W" at https://www.cl.cam.ac.uk/~pes20/ppc-supplemental/test6.pdf:
> two syncs there, plus the third sync either after each volatile store
> or before each volatile load.
> 
> In wg21.link/P0668r1, which I know many people here contributed to,
> there was a similar decision to make: weaken the spec (and retroactively
> possibly break some obscure code), or strengthen the fences in the
> mappings (and take the performance hit).  The consensus there seemed to
> be to weaken the spec, and I think rightfully so.  Is there any reason
> to think the consensus would ultimately be different here?  Maybe
> something Java-specific?
> 
> And FWIW, at least from our perspective at NVIDIA, Hans is right:
> non-multi-copy-atomicity is very much alive and well :)
> 
> Dan
> 
> On 4/30/2018 1:27 PM, Hans Boehm wrote:
> > I can't easily quantify the cost. I explicitly added some experts.
> > 
> > I believe that we would essentially have to stop using weak fences in the
> > volatile implementation, and just use heavy-weight syncs. Thus a load would
> > need two heavy-weight syncs and a store one, or the other way around.
> > 
> > I was hoping that anyone with an actual use case for the stronger version
> > would speak up. We do have some anecdotal evidence that it doesn't matter
> > much:
> > 
> > 1) AFAICT, all Power implementations have always been broken w.r.t. the
> > current spec, and nobody has complained.
> > 
> > 2) The standard de-facto programming model for Java consists mainly of DRF
> > + hacks for lazy initialization of immutable data (+ maybe incorrect
> > hacks). At least the first two of those are unaffected.
> > 
> > 3) This is a divergence from C++ that AFAIK has never really been
> > discussed. I suspect the only people who realized that are part of this
> > discussion.
> > 
> > I certainly share your concern about weakening this after the fact. OTOH,
> > we're not breaking portable code any more than it was already broken by
> > existing implementations.
> > 
> > I think the issue here is not just Power; it's also about constraints on
> > future processor designs (and possibly on software DSM implementations). As
> > much as I dislike reasoning about non-multi-copy-atomic architectures
> > as a programmer, a lot of architects seem to believe that they are likely
> > to stay in some form, for highly parallel systems. Requiring
> > synchronization between threads that don't directly communicate seems to be
> > inherently questionable, at least without a strong use case.
> > 
> > Hans
> > 
> > 
> > On Fri, Apr 27, 2018 at 6:27 PM, Brian Demsky <bdemsky at uci.edu> wrote:
> > 
> >> Hi Hans,
> >>
> >> Do you have an estimate of how much it would actually slow down Java on
> >> Power to implement the spec?  Or good reason to believe that code doesn’t
> >> rely on the specified behavior?
> >>
> >> Brian
> >>
> >>> On Apr 27, 2018, at 6:07 PM, Hans Boehm <boehm at acm.org> wrote:
> >>>
> >>> [ This was previously posted to a smaller audience. Reposting here as the
> >>> next step. ]
> >>>
> >>> This seems to be a new Java memory model problem uncovered in response to
> >>> the revision of "release sequences" in C++. wg21.link/P0982 has details.
> >>> But if you don't care about the C++ memory model, you can ignore all that
> >>> and just read the following.
> >>>
> >>> Clearly this isn't the only or most serious open Java memory model
> >>> problem. But I think it's actually one that has a fairly simple point
> >>> solution. And it may be worth fixing without a comprehensive solution.
> >>>
> >>> Problematic litmus test:
> >>>
> >>> Writing =plain for ordinary Java memory accesses and =vol for volatile ones,
> >>> consider
> >>>
> >>> Thread 1:
> >>> x =plain 1;
> >>> v =vol 1;
> >>>
> >>> Thread 2:
> >>> v =vol 2;
> >>>
> >>> Thread 3:
> >>> r1 =vol v;
> >>> r2 =plain x;
> >>>
> >>> Java disallows the final state, after joining all threads, of r1 = v = 2
> >>> and r2 = 0. Since in the end v = 2, Thread 2's assignment to v must have
> >>> followed Thread 1's in the synchronization order. And in Java a volatile
> >>> store synchronizes with all later (in synchronization order) volatile
> >>> loads (Property A). Thus Thread 1 must synchronize with Thread 3, and r2
> >>> must be 1.
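
The litmus test above can be written as a small Java harness. This is only a
sketch: the class name VolatileLitmus and the method runOnce are made up for
illustration. A run on a multi-copy-atomic machine (x86, for example) is not
expected to show the questioned outcome, so a count of zero there says nothing
about Power.

```java
final class VolatileLitmus {
    // Shared state for one run of the litmus test.
    static int x;            // plain field ("=plain" accesses)
    static volatile int v;   // volatile field ("=vol" accesses)

    /** Runs the three threads once and returns {r1, r2, final value of v}. */
    static int[] runOnce() {
        x = 0;
        v = 0;
        final int[] r = new int[2];
        Thread t1 = new Thread(() -> { x = 1; v = 1; });        // Thread 1
        Thread t2 = new Thread(() -> { v = 2; });               // Thread 2
        Thread t3 = new Thread(() -> { r[0] = v; r[1] = x; });  // Thread 3
        t1.start(); t2.start(); t3.start();
        try {
            t1.join(); t2.join(); t3.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new int[] { r[0], r[1], v };
    }

    public static void main(String[] args) {
        int questioned = 0;
        for (int i = 0; i < 10_000; i++) {
            int[] o = runOnce();
            // The outcome under discussion: r1 = v = 2 and r2 = 0.
            if (o[0] == 2 && o[2] == 2 && o[1] == 0) questioned++;
        }
        System.out.println("r1 = v = 2, r2 = 0 observed " + questioned + " times");
    }
}
```

Starting fresh threads for every run keeps the sketch simple but gives the
three accesses little chance to overlap; a dedicated tool such as jcstress
would be a better fit for actually hunting for the outcome.
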
> >>>
> >>> This diverges from the analogous C++ semantics. (The release sequence
> >>> problem there is a bit different.)
> >>>
> >>> The consensus of the experts in the other discussion is that this outcome
> >>> is in fact allowed on Power, with both of the standard compilation
> >>> models. Thus the spec and the implementations can't both be right in this
> >>> regard.
> >>>
> >>> IIRC, the JMM discussion that led to this, like the one that led to the
> >>> vaguely analogous C++ problem, was more of a "why not" argument than
> >>> anything solid. Which in retrospect was probably unwise in both cases.
> >>> That, combined with the fact that this is a C++ vs Java divergence, and
> >>> the expense of actually conforming to the current spec on Power, suggests
> >>> we may want to call this a spec problem.
> >>>
> >>> The concrete proposal would be to change the bullet (in 17.4.4)
> >>>
> >>> * A write to a volatile variable v (§8.3.1.4) synchronizes-with all
> >>> subsequent reads of v by any thread (where "subsequent" is defined
> >>> according to the synchronization order).
> >>>
> >>> to (for now)
> >>>
> >>> * A write w to a volatile variable v (§8.3.1.4) synchronizes-with any
> >>> read of v that observes the value written by w.
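
The difference between the two bullets can be checked mechanically on the
litmus test. The sketch below is a toy model, not anything from the JLS:
SyncWithRules, Action, and litmusOrder are hypothetical names, and it assumes
a single volatile variable v, so that a read observes the last write before
it in the synchronization order.

```java
import java.util.List;

final class SyncWithRules {
    /** One volatile access to v, identified by the thread that performs it. */
    static final class Action {
        final String thread;
        final boolean isWrite;
        Action(String thread, boolean isWrite) {
            this.thread = thread;
            this.isWrite = isWrite;
        }
    }

    /** Current bullet: w synchronizes-with every read that follows it in so. */
    static boolean currentRule(List<Action> so, Action w, Action r) {
        return w.isWrite && !r.isWrite && so.indexOf(w) < so.indexOf(r);
    }

    /** Proposed bullet: w synchronizes-with r only if r observes w's value. */
    static boolean proposedRule(List<Action> so, Action w, Action r) {
        if (!currentRule(so, w, r)) return false;
        // With one variable, r observes the last write that precedes it in so.
        for (int i = so.indexOf(r) - 1; i >= 0; i--) {
            if (so.get(i).isWrite) return so.get(i) == w;
        }
        return false;
    }

    /** Synchronization order of the litmus test when v ends up as 2 and r1 = 2. */
    static List<Action> litmusOrder() {
        return List.of(
            new Action("Thread 1", true),    // v =vol 1
            new Action("Thread 2", true),    // v =vol 2
            new Action("Thread 3", false));  // r1 =vol v
    }

    public static void main(String[] args) {
        List<Action> so = litmusOrder();
        Action w1 = so.get(0), w2 = so.get(1), r3 = so.get(2);
        System.out.println("current:  T1 sw T3? " + currentRule(so, w1, r3));
        System.out.println("proposed: T1 sw T3? " + proposedRule(so, w1, r3));
        System.out.println("proposed: T2 sw T3? " + proposedRule(so, w2, r3));
    }
}
```

With that order, the current bullet has Thread 1's store synchronizing with
Thread 3's load, which is what forces r2 = 1. Under the proposed bullet only
Thread 2's store does, so r2 = 0 is no longer ruled out.
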
> >>>
> >>> The reason I said "for now" is that I think we will eventually need
> >>> C++-style "release sequences" in order to prevent intervening RMW
> >>> operations from breaking the synchronizes-with relationship here. Without
> >>> that some fairly basic idioms, like reference counting, would look
> >>> different in Java and C++, with Java being needlessly slower. But RMW
> >>> operations aren't yet a thing in the JLS, so we can leave that in the
> >>> bucket of other things that will eventually need fixing.
> >>>
> >>> The argument for doing this now rather than later is that the spec
> >>> clearly promises something that fails to hold for major implementations.
> >>> And somewhat uniquely, in this case, we do know how to fix it. There is no
> >>> reason to provide misleading information here.
> >>>
> >>> Opinions?
> >>>
> >>> Hans
> >>>
> >>
> >>
> > 