[jmm-dev] Does a volatile load have to see the exact volatile store to synchronize?

Tue May 1 04:17:42 UTC 2018

(re-sending as a jmm-dev subscriber...sorry for spamming the four of
you on cc explicitly)

Hans' analysis looks right to me at first glance.  This is essentially
"WRR+2W" at https://www.cl.cam.ac.uk/~pes20/ppc-supplemental/test6.pdf:
two syncs there, plus the third sync either after each volatile store
or before each volatile load.

In wg21.link/P0668r1, which I know many people here contributed to,
there was a similar decision to make: weaken the spec (and retroactively
possibly break some obscure code), or strengthen the fences in the
mappings (and take the performance hit).  The consensus there seemed to
be to weaken the spec, and I think rightfully so.  Is there any reason
to think the consensus would ultimately be different here?  Maybe
something Java-specific?

And FWIW, at least from our perspective at NVIDIA, Hans is right:
non-multi-copy-atomicity is very much alive and well :)

Dan

On 4/30/2018 1:27 PM, Hans Boehm wrote:
> I can't easily quantify the cost. I explicitly added some experts.
> 
> I believe that we would essentially have to stop using weak fences in the
> volatile implementation, and just use heavy-weight syncs. Thus a load would
> need two heavy-weight syncs and a store one, or the other way around.
> 
> I was hoping that anyone with an actual use case for the stronger version
> would speak up. We do have some anecdotal evidence that it doesn't matter
> much:
> 
> 1) AFAICT, all Power implementations have always been broken w.r.t. the
> current spec, and nobody has complained.
> 
> 2) The standard de-facto programming model for Java consists mainly of DRF
> + hacks for lazy initialization of immutable data (+ maybe incorrect
> hacks). At least the first two of those are unaffected.
> 
> 3) This is a divergence from C++ that AFAIK has never really been
> discussed. I suspect the only people who realized that are part of this
> discussion.
> 
> I certainly share your concern about weakening this after the fact. OTOH,
> we're not breaking portable code any more than it was already broken by
> existing implementations.
> 
> I think the issue here is not just Power; it's also about constraints on
> future processor designs (and possibly on software DSM implementations). As
> much as I dislike reasoning about non-multi-copy-atomic architectures as a
> programmer, a lot of architects seem to believe that they are likely to
> stay in some form, for highly parallel systems. Requiring
> synchronization between threads that don't directly communicate seems to be
> inherently questionable, at least without a strong use case.
> 
> Hans
> 
> 
> On Fri, Apr 27, 2018 at 6:27 PM, Brian Demsky <bdemsky at uci.edu> wrote:
> 
>> Hi Hans,
>>
>> Do you have an estimate of how much it would actually slow down Java on
>> Power to implement the spec?  Or good reason to believe that code doesn’t
>> rely on the specified behavior?
>>
>> Brian
>>
>>> On Apr 27, 2018, at 6:07 PM, Hans Boehm <boehm at acm.org> wrote:
>>>
>>> [ This was previously posted to a smaller audience. Reposting here as the
>>> next step. ]
>>>
>>> This seems to be a new Java memory model problem uncovered in response to
>>> the revision of "release sequences" in C++. wg21.link/P0982 has details.
>>> But if you don't care about the C++ memory model, you can ignore all that
>>> and just read the following.
>>>
>>> Clearly this isn't the only or most serious open Java memory model problem.
>>> But I think it's actually one that has a fairly simple point solution. And
>>> it may be worth fixing without a comprehensive solution.
>>>
>>> Problematic litmus test:
>>>
>>> Writing =plain for ordinary Java memory accesses and =vol for volatile
>>> ones, consider
>>>
>>> Thread 1:
>>> x =plain 1;
>>> v =vol 1;
>>>
>>> Thread 2:
>>> v =vol 2;
>>>
>>> Thread 3:
>>> r1 =vol v;
>>> r2 =plain x;
>>>
>>> Java disallows the final state, after joining all threads, of r1 = v = 2
>>> and r2 = 0. Since in the end v = 2, Thread 2's assignment to v must have
>>> followed Thread 1's in the synchronization order. And in Java a volatile
>>> store synchronizes with all later (in synchronization order) volatile loads
>>> (Property A). Thus Thread 1 must synchronize with Thread 3, and r2 must be
>>> 1.
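
For readers who want to try the litmus test, here is a minimal Java sketch of it. The class and method names (VolatileLitmus, State, trial, run) are hypothetical and not from the thread. Spawning three fresh threads per trial is a crude way to provoke the race, so seeing zero hits on a given machine says nothing about whether the outcome is possible there; a purpose-built harness such as OpenJDK's jcstress is the better tool for real experiments.

```java
import java.util.concurrent.CountDownLatch;

/** Hypothetical harness for the litmus test above; all names are illustrative. */
final class VolatileLitmus {
    /** Fresh state per trial so that trials cannot interfere with each other. */
    static final class State {
        int x;           // accessed with "=plain"
        volatile int v;  // accessed with "=vol"
        int r1, r2;      // Thread 3's results, read by the caller after join()
    }

    /** Runs one trial; returns true if it ended with r1 = v = 2 and r2 = 0. */
    static boolean trial() {
        State s = new State();
        CountDownLatch go = new CountDownLatch(1);
        Thread t1 = new Thread(() -> { await(go); s.x = 1; s.v = 1; });
        Thread t2 = new Thread(() -> { await(go); s.v = 2; });
        Thread t3 = new Thread(() -> { await(go); s.r1 = s.v; s.r2 = s.x; });
        t1.start(); t2.start(); t3.start();
        go.countDown();  // release all three threads at roughly the same time
        join(t1); join(t2); join(t3);
        return s.r1 == 2 && s.v == 2 && s.r2 == 0;
    }

    /** Returns { trials showing the disputed outcome, all other trials }. */
    static long[] run(int trials) {
        long[] counts = new long[2];
        for (int i = 0; i < trials; i++) counts[trial() ? 0 : 1]++;
        return counts;
    }

    private static void await(CountDownLatch latch) {
        try { latch.await(); } catch (InterruptedException e) { throw new IllegalStateException(e); }
    }

    private static void join(Thread t) {
        try { t.join(); } catch (InterruptedException e) { throw new IllegalStateException(e); }
    }

    public static void main(String[] args) {
        long[] counts = run(10_000);
        System.out.println("r1 = v = 2, r2 = 0: " + counts[0] + " of " + (counts[0] + counts[1]));
    }
}
```

The latch only orders the main thread before the three workers, so it adds no happens-before edge between Thread 1 and Thread 3; the only candidate edge is the one through v that the message discusses.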
>>>
>>> This diverges from the analogous C++ semantics. (The release sequence
>>> problem there is a bit different.)
>>>
>>> The consensus of the experts in the other discussion is that this outcome
>>> is in fact allowed on Power, with both of the standard compilation models.
>>> Thus the spec and the implementations can't both be right in this regard.
>>>
>>> IIRC, the JMM discussion that led to this, like the one that led to the
>>> vaguely analogous C++ problem, was more of a "why not" argument than
>>> anything solid. Which in retrospect was probably unwise in both cases.
>>> That, combined with the fact that this is a C++ vs Java divergence, and the
>>> expense of actually conforming to the current spec on Power, suggests we
>>> may want to call this a spec problem.
>>>
>>> The concrete proposal would be to change the bullet (in 17.4.4)
>>>
>>> * A write to a volatile variable v (§8.3.1.4) synchronizes-with all
>>> subsequent reads of v by any thread (where "subsequent" is defined
>>> according to the synchronization order).
>>>
>>> to (for now)
>>>
>>> * A write w to a volatile variable v (§8.3.1.4) synchronizes-with any read
>>> of v that observes the value written by w.
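
To make the difference between the two bullets concrete, the sketch below enumerates every synchronization order of the three volatile actions in the litmus test and asks whether a given outcome is reachable under each wording. It is a hand-rolled toy model specialized to this one test, not an implementation of the JMM: it ignores causality requirements and everything else in 17.4, and all names (SyncRuleModel, Rule, allows) are hypothetical.

```java
/** Toy model of the litmus test only; names and structure are illustrative. */
final class SyncRuleModel {
    /** The three volatile actions: v =vol 1 (Thread 1), v =vol 2 (Thread 2), r1 =vol v (Thread 3). */
    enum Action { W1, W2, R }

    /** Current bullet vs. proposed bullet in 17.4.4. */
    enum Rule { ALL_SUBSEQUENT_READS, READS_FROM_ONLY }

    /** True if some synchronization order yields the given r1, final v, and r2 under the rule. */
    static boolean allows(Rule rule, int r1, int finalV, int r2) {
        Action[] all = Action.values();
        for (Action a : all) for (Action b : all) for (Action c : all) {
            if (a == b || b == c || a == c) continue;  // keep only permutations
            int v = 0, seenByRead = 0;
            boolean w1Done = false, w1BeforeRead = false;
            for (Action act : new Action[] { a, b, c }) {
                switch (act) {
                    case W1: v = 1; w1Done = true; break;
                    case W2: v = 2; break;
                    case R:  seenByRead = v; w1BeforeRead = w1Done; break;
                }
            }
            boolean w1SyncsWithRead = (rule == Rule.ALL_SUBSEQUENT_READS)
                    ? w1BeforeRead      // any earlier write synchronizes-with the read
                    : seenByRead == 1;  // only the write whose value is observed does
            // x =plain 1 precedes W1 in Thread 1 and r2 =plain x follows R in Thread 3.
            // With the edge, x = 1 happens-before the read of x, so r2 must be 1;
            // without it the plain read races and may return 0 or 1.
            boolean r2Ok = w1SyncsWithRead ? r2 == 1 : (r2 == 0 || r2 == 1);
            if (seenByRead == r1 && v == finalV && r2Ok) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        for (Rule rule : Rule.values()) {
            System.out.println(rule + " allows r1 = v = 2, r2 = 0: " + allows(rule, 2, 2, 0));
        }
    }
}
```

In this model the only order with r1 = 2 and a final v of 2 is W1, W2, R. The current wording gives Thread 1's store an edge to the load there and so rules out r2 = 0, while the proposed wording gives only Thread 2's store an edge and so permits it.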
>>>
>>> The reason I said "for now" is that I think we will eventually need
>>> C++-style "release sequences" in order to prevent intervening RMW
>>> operations from breaking the synchronizes-with relationship here. Without
>>> that some fairly basic idioms, like reference counting, would look
>>> different in Java and C++, with Java being needlessly slower. But RMW
>>> operations aren't yet a thing in the JLS, so we can leave that in the
>>> bucket of other things that will eventually need fixing.
>>>
>>> The argument for doing this now rather than later is that the spec clearly
>>> promises something that fails to hold for major implementations. And
>>> somewhat uniquely, in this case, we do know how to fix it. There is no
>>> reason to provide misleading information here.
>>>
>>> Opinions?
>>>
>>> Hans
>>>
>>
>>
> 