Presentation: Understanding OrderAccess

Mon Nov 28 16:29:28 UTC 2016

Hi David,

I'm sending this email again with a corrected subject and with a confusing statement removed. My spam filter had added "[JUNK]" to the subject. I have no clue what it didn't like. Sorry for that.

> Problem there, I think, is that fence() is really not special in that regard. You need to insert something between the two loads to force a globally consistent view of memory. But what part of fence() gives that guarantee?

This is really hard to explain. Maybe there are better explanations out there, but I'll give it a try:

I think the comment in orderAccess.hpp is not bad:
// Finally, we define a "fence" operation, as a bidirectional barrier.
// It guarantees that any memory access preceding the fence is not
// reordered w.r.t. any memory accesses subsequent to the fence in program
// order.

One can consider a fence as a global operation which separates a set of accesses A from a set of accesses B.
If A contains a load, the store it reads from, which may have been performed by another thread, has to be counted as part of A as well.
In particular, the StoreLoad part of the barrier must cover stores that were performed by other processors but observed by this one.
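
To make the scenario concrete, here is a minimal sketch of the IRIW test in standard C++11. It is not hotspot code: the function and variable names are made up for illustration, and it uses std::atomic with memory_order_seq_cst on all four accesses instead of OrderAccess::fence(), because that is the portable way to forbid the outcome in standard C++. On PPC, the mapping in [3] implements such a load with a sync in front of it.

```cpp
#include <atomic>
#include <thread>

// Hypothetical helper, not hotspot code: one run of the IRIW litmus test.
// Returns true if the two readers observed the two independent writes
// in opposite orders.
bool iriw_readers_disagree() {
    std::atomic<int> x{0}, y{0};
    int r1 = 0, r2 = 0, r3 = 0, r4 = 0;

    // Two independent writers, each storing to its own variable.
    std::thread w1([&] { x.store(1, std::memory_order_seq_cst); });
    std::thread w2([&] { y.store(1, std::memory_order_seq_cst); });

    // Two readers loading the variables in opposite orders.
    std::thread rd1([&] {
        r1 = x.load(std::memory_order_seq_cst);
        r2 = y.load(std::memory_order_seq_cst);
    });
    std::thread rd2([&] {
        r3 = y.load(std::memory_order_seq_cst);
        r4 = x.load(std::memory_order_seq_cst);
    });

    w1.join(); w2.join(); rd1.join(); rd2.join();

    // rd1 saw "x before y" while rd2 saw "y before x".
    // The single total order of seq_cst operations rules this out.
    return r1 == 1 && r2 == 0 && r3 == 1 && r4 == 0;
}
```

With weaker orderings on the loads (and no fence between them), a machine that is not multi-copy atomic is allowed to return true here.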

> Yeah I've seen the mappings but it is the conceptual model that I have a problem with. Andrew's reply makes it somewhat clearer - if every atomic op is seq-cst then you get a seq-cst execution ...
> but does that somehow bind all memory accesses not just those involved in the atomic ops? And how do non seq-cst atomic ops interact with seq-cst ones?

"Atomic operations tagged memory_order_seq_cst not only order memory the same way as release/acquire ordering (everything that happened-before a store in one thread becomes a visible side effect in the thread that did a load), but also establish a single total modification order of all atomic operations that are so tagged." [4]

So acquire+release ordering applies to all memory accesses, while the total modification order only applies to "atomic operations that are so tagged". This is pretty much like volatile vs. non-volatile in Java [5].
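
A small sketch of the first half of that statement, again in standard C++11 with made-up names (not hotspot code): only the flag is atomic, but the release/acquire pair also orders the plain accesses to the payload.

```cpp
#include <atomic>
#include <thread>

// Hypothetical example: 'payload' is a plain int, 'ready' is the only atomic.
int publish_and_read() {
    int payload = 0;
    std::atomic<bool> ready{false};
    int seen = -1;

    std::thread producer([&] {
        payload = 42;                                  // plain, non-atomic store
        ready.store(true, std::memory_order_release);  // publishes the store above
    });
    std::thread consumer([&] {
        // Spin until the release store has been observed.
        while (!ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        seen = payload;  // plain load; happens-after the store of 42
    });

    producer.join();
    consumer.join();
    return seen;
}
```

The consumer always reads 42, although payload itself is never accessed atomically. What release/acquire alone does not provide is a single order over several such flags that all threads agree on. That needs seq_cst.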

Best regards,
Martin

[4] http://en.cppreference.com/w/cpp/atomic/memory_order#Sequentially-consistent_ordering
[5] http://g.oswego.edu/dl/jmm/cookbook.html

-----Original Message-----
From: David Holmes [mailto:david.holmes at oracle.com]
Sent: Montag, 28. November 2016 13:56
To: Doerr, Martin <martin.doerr at sap.com>; hotspot-dev developers <hotspot-dev at openjdk.java.net>
Subject: Re: Presentation: Understanding OrderAccess

Hi Martin,

On 28/11/2016 8:43 PM, Doerr, Martin wrote:
> Hi David,
>
> I know, multi-copy atomicity is hard to understand. It is relevant for complex scenarios in which more than 2 threads are involved.
> I think a good explanation is given in the paper [1] which we had discussed some time ago (email thread [2]).
>
> The term "multiple-copy atomicity" is described as "... in a machine 
> which is not multiple-copy atomic, even if a write instruction is access-atomic, the write may become visible to different threads at different times ...".
>
> I think "IRIW" (in [1] "6.1 Extending SB to more threads: IRIW and RWC") is the most comprehensible example.
> The key property of the architectures is that "... writes can be propagated to different threads in different orders ...".

Thanks for the reminder of that discussion. :)

> A globally consistent order can be enforced by adding - in hotspot terms - OrderAccess::fence() between the read accesses.

Problem there, I think, is that fence() is really not special in that regard. You need to insert something between the two loads to force a globally consistent view of memory. But what part of fence() gives that guarantee? Maybe there is something we need to define for non-multi-copy-atomic architectures to use just for this purpose.

> Since you have asked about C++11, there's an example implementation for PPC [3].
> Load Seq Cst uses a heavy-weight sync instruction (OrderAccess::fence() in hotspot terms) before the load. Such "Load Seq Cst" operations observe writes in a globally consistent order.

Yeah I've seen the mappings but it is the conceptual model that I have a problem with. Andrew's reply makes it somewhat clearer - if every atomic op is seq-cst then you get a seq-cst execution ... but does that somehow bind all memory accesses not just those involved in the atomic ops? And how do non seq-cst atomic ops interact with seq-cst ones?

> Btw.: We have implemented the Java volatile accesses very similarly to [3] for PPC64 even though the recent Java memory model does not strictly require this implementation.
> But I guess the Java memory model is beyond the scope of your presentation.

Oh yes way out of scope! :)

Cheers,
David

> Best regards,
> Martin
>
>
> [1] http://www.cl.cam.ac.uk/~pes20/ppc-supplemental/test7.pdf
> [2] http://mail.openjdk.java.net/pipermail/core-libs-dev/2014-December/030212.html
> [3] http://open-std.org/jtc1/sc22/wg21/docs/papers/2008/n2745.html
>
>
> -----Original Message-----
> From: David Holmes [mailto:david.holmes at oracle.com]
> Sent: Montag, 28. November 2016 06:56
> To: Doerr, Martin <martin.doerr at sap.com>; hotspot-dev developers 
> <hotspot-dev at openjdk.java.net>
> Subject: Re: Presentation: Understanding OrderAccess
>
> Hi Martin
>
> On 24/11/2016 2:20 AM, Doerr, Martin wrote:
>> Hi David,
>>
>> thank you very much for the presentation. I think it provides a good guideline for hotspot development.
>
> Thanks.
>
>>
>> Would you like to add something about multi-copy atomicity?
>
> Not really. :)
>
>> E.g. there's a usage of OrderAccess::fence() in GenericTaskQueue<E, F, N>::pop_global which is only needed on platforms which don't provide this property (PPC and ARM).
>>
>> It is needed in the following scenario:
>> - Different threads write 2 variables.
>> - Readers of these 2 variables expect a globally consistent order of the write accesses.
>>
>> In this case, the readers must use OrderAccess::fence() between the 2 load accesses on platforms without "multi-copy atomicity".
>
> Hmmm ... I know this code was discussed at length a couple of years ago ... and I know I've probably forgotten most of what was discussed ... so I'll have to revisit this because this seems wrong ...
>
>> (While taking a look at it, I noticed that the condition "#if !(defined SPARC ||
>> defined IA32 || defined AMD64)" is not accurate and should be
>> improved. E.g. s390 is multi-copy atomic.)
>>
>>
>> I like that you have added our cmpxchg_memory_order definition. We implemented it even more conservatively than C++'s seq_cst on PPC64.
>
> I still can't get my head around the C++11 terminology for this and 
> how you are expected to use it - what does it mean for an individual 
> operation to be "sequentially consistent" ? :(
>
> Cheers,
> David
>
>>
>> Thanks and best regards,
>> Martin
>>
>>
>> -----Original Message-----
>> From: hotspot-dev [mailto:hotspot-dev-bounces at openjdk.java.net] On 
>> Behalf Of David Holmes
>> Sent: Mittwoch, 23. November 2016 06:08
>> To: hotspot-dev developers <hotspot-dev at openjdk.java.net>
>> Subject: Presentation: Understanding OrderAccess
>>
>> This is a presentation I recently gave internally to the runtime and serviceability teams that may be of more general interest to hotspot developers.
>>
>> http://cr.openjdk.java.net/~dholmes/presentations/Understanding-OrderAccess-v1.1.pdf
>>
>> Cheers,
>> David
>>