Presentation: Understanding OrderAccess

Tue Nov 29 13:08:16 UTC 2016

Hi David and Erik,

> But again that attribution of global properties is not something I think is necessarily implied or intended by OrderAccess.
> Or maybe it is, but as it is only an issue on non-multicopy-atomic systems, it has never been called out explicitly. ?? And
> those global properties must also be a part of the other barriers (as the fence is just the combination of them all) - but I
> don't know how you would describe the effects of the other barriers (like loadload) in "global" terms.

I think the global properties are implicitly assumed on multicopy-atomic systems, and most people don't think about them.
But they matter as soon as more than 2 threads are involved, especially on PPC64 and AArch64.
That's why I'd appreciate it if they could be added to the HotSpot documentation or presentations.
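
To make the multi-thread case concrete, here is a minimal IRIW sketch in C++11 terms. It is only an illustration, not HotSpot code: the names (IriwResult, run_iriw_once, readers_disagree, count_disagreements) are made up, and it uses seq_cst atomics instead of OrderAccess. With seq_cst loads, which according to the mapping in [3] get a sync before the load on PPC, the two readers must agree on the order of the two writes. With weaker loads and no fence() between them, a non-multicopy-atomic machine may let them disagree.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Values observed by the two readers in one IRIW run (hypothetical type).
struct IriwResult {
  int r1, r2;  // reader 1: loads x, then y
  int r3, r4;  // reader 2: loads y, then x
};

// Runs IRIW once: two independent writers, two independent readers.
IriwResult run_iriw_once() {
  std::atomic<int> x{0}, y{0};
  IriwResult res{};

  std::thread w1([&] { x.store(1, std::memory_order_seq_cst); });
  std::thread w2([&] { y.store(1, std::memory_order_seq_cst); });
  std::thread rd1([&] {
    res.r1 = x.load(std::memory_order_seq_cst);
    res.r2 = y.load(std::memory_order_seq_cst);
  });
  std::thread rd2([&] {
    res.r3 = y.load(std::memory_order_seq_cst);
    res.r4 = x.load(std::memory_order_seq_cst);
  });

  w1.join(); w2.join(); rd1.join(); rd2.join();
  return res;
}

// True if reader 1 saw "x before y" while reader 2 saw "y before x",
// i.e. the readers disagree on the order of the two writes.
bool readers_disagree(const IriwResult& r) {
  return r.r1 == 1 && r.r2 == 0 && r.r3 == 1 && r.r4 == 0;
}

// Repeats the test and counts disagreeing runs (must be 0 with seq_cst).
int count_disagreements(int iterations) {
  assert(iterations > 0);
  int disagreements = 0;
  for (int i = 0; i < iterations; ++i) {
    if (readers_disagree(run_iriw_once())) ++disagreements;
  }
  return disagreements;
}
```

Calling count_disagreements(n) should always return 0 here, because all four accesses take part in one total order. The interesting experiment on PPC64 would be to weaken the loads to acquire, but then a nonzero count is merely allowed, not guaranteed.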

Also, storeStore barriers are expected to be transitive, or "cumulative" as the property is called in the PPC64 documentation.
If one thread releases something which is based on something else which was written by another thread, a third thread
which acquires it is expected to see that in a consistent way. Do you agree?
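
A small sketch of what I mean by transitivity, again in C++11 terms and with made-up names (run_cumulative_release, data, flag), so please read it as an assumption-laden illustration and not as OrderAccess code. Thread A writes data, thread B observes it and then releases flag, and thread C acquires flag. C never synchronizes with A directly, but it is still expected to see A's write.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Returns the value of `data` that the third thread observes.
int run_cumulative_release() {
  std::atomic<int> data{0};  // written by thread A
  std::atomic<int> flag{0};  // released by thread B, acquired by thread C
  int seen = -1;             // written only by thread C, read after join

  std::thread a([&] { data.store(42, std::memory_order_release); });

  std::thread b([&] {
    // B bases its release on something written by another thread (A).
    while (data.load(std::memory_order_acquire) != 42) {
      std::this_thread::yield();
    }
    flag.store(1, std::memory_order_release);
  });

  std::thread c([&] {
    while (flag.load(std::memory_order_acquire) != 1) {
      std::this_thread::yield();
    }
    // A's store happens-before this load via B, so 42 must be visible.
    seen = data.load(std::memory_order_relaxed);
  });

  a.join(); b.join(); c.join();
  assert(seen != -1);  // thread C always assigns `seen`
  return seen;
}
```

In the C++ model this falls out of happens-before being transitive. On PPC64 the hardware has to provide it through cumulative barriers, which is the property I'd like to see mentioned.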

loadStore and loadLoad barriers are much simpler as they basically require the following accesses to occur late enough
without any global synchronization requirements.
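
For comparison, a two-thread message-passing sketch where the reader only needs local ordering. The names (run_message_passing, payload, ready) are hypothetical, and the comments mapping C++11 fences to loadLoad/loadStore and loadStore/storeStore are only a rough correspondence, not a statement about how any platform implements OrderAccess.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Returns the payload value the reader observes after seeing the flag.
int run_message_passing() {
  int payload = 0;            // plain, non-atomic data
  std::atomic<int> ready{0};  // flag published by the writer
  int seen = -1;              // written only by the reader, read after join

  std::thread writer([&] {
    payload = 42;
    // Roughly loadStore|storeStore: keeps the payload write before the flag.
    std::atomic_thread_fence(std::memory_order_release);
    ready.store(1, std::memory_order_relaxed);
  });

  std::thread reader([&] {
    while (ready.load(std::memory_order_relaxed) != 1) {
      std::this_thread::yield();
    }
    // Roughly loadLoad|loadStore: later accesses occur after the flag load.
    std::atomic_thread_fence(std::memory_order_acquire);
    seen = payload;
  });

  writer.join(); reader.join();
  assert(seen != -1);  // the reader always assigns `seen`
  return seen;
}
```

Here the reader's barrier only has to delay its own later accesses. No agreement with any third thread is involved.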

Best regards,
Martin

-----Original Message-----
From: David Holmes [mailto:david.holmes at oracle.com] 
Sent: Monday, 28 November 2016 22:22
To: Doerr, Martin <martin.doerr at sap.com>; hotspot-dev developers <hotspot-dev at openjdk.java.net>
Cc: ERIK.OSTERLUND <erik.osterlund at oracle.com>
Subject: Re: Presentation: Understanding OrderAccess

Hi Martin,

I've added Erik explicitly to the cc as he and I have been discussing fences and "visibility", and of course he most recently revised the descriptions in orderAccess.hpp.

On 29/11/2016 2:29 AM, Doerr, Martin wrote:
> Hi David,
>
> sending the email again with corrected subject + removed confusing statement. My spam filter had added "[JUNK]". I have no clue what it didn't like. Sorry for that.
>
>> Problem there, I think, is that fence() is really not special in that regard. You need to insert something between the two loads to force a globally consistent view of memory. But what part of fence() gives that guarantee?
>
> This is really hard to explain. Maybe there are better explanations out there, but I'll give it a try:
>
> I think the comment in orderAccess.hpp is not bad:
> // Finally, we define a "fence" operation, as a bidirectional barrier.
> // It guarantees that any memory access preceding the fence is not
> // reordered w.r.t. any memory accesses subsequent to the fence in program
> // order.
>
> One can consider a fence as a global operation which separates a set of accesses A from a set of accesses B.
> If A contains a load, one has to include the corresponding store which may have been performed by another thread into A.
> Especially the storeLoad part of the barrier must include stores performed by other processors but observed by this one.

But again that attribution of global properties is not something I think is necessarily implied or intended by OrderAccess. Or maybe it is, but as it is only an issue on non-multicopy-atomic systems, it has never been called out explicitly. ?? And those global properties must also be a part of the other barriers (as the fence is just the combination of them all) - but I don't know how you would describe the effects of the other barriers (like loadload) in "global" terms.

David
-----

>
>> Yeah I've seen the mappings but it is the conceptual model that I have a problem with. Andrew's reply makes it somewhat clearer - if every atomic op is seq-cst then you get a seq-cst execution ...
>> but does that somehow bind all memory accesses not just those involved in the atomic ops? And how do non seq-cst atomic ops interact with seq-cst ones?
>
> "Atomic operations tagged memory_order_seq_cst not only order memory the same way as release/acquire ordering (everything that happened-before a store in one thread becomes a visible side effect in the thread that did a load), but also establish a single total modification order of all atomic operations that are so tagged." [4]
>
> So acquire+release orders wrt. all memory accesses while the total modification order only applies to "atomic operations that are so tagged". This is pretty much like volatile vs. non-volatile in Java [5].
>
>
> Best regards,
> Martin
>
> [4] http://en.cppreference.com/w/cpp/atomic/memory_order#Sequentially-consistent_ordering
> [5] http://g.oswego.edu/dl/jmm/cookbook.html
>
>
> -----Original Message-----
> From: David Holmes [mailto:david.holmes at oracle.com]
> Sent: Monday, 28 November 2016 13:56
> To: Doerr, Martin <martin.doerr at sap.com>; hotspot-dev developers <hotspot-dev at openjdk.java.net>
> Subject: Re: Presentation: Understanding OrderAccess
>
> Hi Martin,
>
> On 28/11/2016 8:43 PM, Doerr, Martin wrote:
>> Hi David,
>>
>> I know, multi-copy atomicity is hard to understand. It is relevant for complex scenarios in which more than 2 threads are involved.
>> I think a good explanation is given in the paper [1] which we had discussed some time ago (email thread [2]).
>>
>> The term "multiple-copy atomicity" is described as "... in a machine
>> which is not multiple-copy atomic, even if a write instruction is access-atomic, the write may become visible to different threads at different times ...".
>>
>> I think "IRIW" (in [1] "6.1 Extending SB to more threads: IRIW and RWC") is the most comprehensible example.
>> The key property of the architectures is that "... writes can be propagated to different threads in different orders ...".
>
> Thanks for the reminder of that discussion. :)
>
>> A globally consistent order can be enforced by adding - in hotspot terms - OrderAccess::fence() between the read accesses.
>
> Problem there, I think, is that fence() is really not special in that regard. You need to insert something between the two loads to force a globally consistent view of memory. But what part of fence() gives that guarantee? Maybe there is something we need to define for non-multi-copy-atomic architectures to use just for this purpose.
>
>> Since you have asked about C++11, there's an example implementation for PPC [3].
>> Load Seq Cst uses a heavy-weight sync instruction (OrderAccess::fence() in hotspot terms) before the load. Such "Load Seq Cst" operations observe writes in a globally consistent order.
>
> Yeah I've seen the mappings but it is the conceptual model that I have a problem with. Andrew's reply makes it somewhat clearer - if every atomic op is seq-cst then you get a seq-cst execution ... but does that somehow bind all memory accesses not just those involved in the atomic ops? And how do non seq-cst atomic ops interact with seq-cst ones?
>
>> Btw.: We have implemented the Java volatile accesses very similarly to [3] for PPC64 even though the recent Java memory model does not strictly require this implementation.
>> But I guess the Java memory model is beyond the scope of your presentation.
>
> Oh yes way out of scope! :)
>
> Cheers,
> David
>
>> Best regards,
>> Martin
>>
>>
>> [1] http://www.cl.cam.ac.uk/~pes20/ppc-supplemental/test7.pdf
>> [2] http://mail.openjdk.java.net/pipermail/core-libs-dev/2014-December/030212.html
>> [3] http://open-std.org/jtc1/sc22/wg21/docs/papers/2008/n2745.html
>>
>>
>> -----Original Message-----
>> From: David Holmes [mailto:david.holmes at oracle.com]
>> Sent: Monday, 28 November 2016 06:56
>> To: Doerr, Martin <martin.doerr at sap.com>; hotspot-dev developers
>> <hotspot-dev at openjdk.java.net>
>> Subject: Re: Presentation: Understanding OrderAccess
>>
>> Hi Martin
>>
>> On 24/11/2016 2:20 AM, Doerr, Martin wrote:
>>> Hi David,
>>>
>>> thank you very much for the presentation. I think it provides a good guideline for hotspot development.
>>
>> Thanks.
>>
>>>
>>> Would you like to add something about multi-copy atomicity?
>>
>> Not really. :)
>>
>>> E.g. there's a usage of OrderAccess::fence() in GenericTaskQueue<E, F, N>::pop_global which is only needed on platforms which don't provide this property (PPC and ARM).
>>>
>>> It is needed in the following scenario:
>>> - Different threads write 2 variables.
>>> - Readers of these 2 variables expect a globally consistent order of the write accesses.
>>>
>>> In this case, the readers must use OrderAccess::fence() between the 2 load accesses on platforms without "multi-copy atomicity".
>>
>> Hmmm ... I know this code was discussed at length a couple of years ago ... and I know I've probably forgotten most of what was discussed ... so I'll have to revisit this because this seems wrong ...
>>
>>> (While taking a look at it, the condition "#if !(defined SPARC ||
>>> defined IA32 || defined AMD64)" is not accurate and should be
>>> improved. E.g. s390 is multi-copy atomic.)
>>>
>>>
>>> I like that you have added our cmpxchg_memory_order definition. We implemented it even more conservatively than C++'s seq_cst on PPC64.
>>
>> I still can't get my head around the C++11 terminology for this and
>> how you are expected to use it - what does it mean for an individual
>> operation to be "sequentially consistent" ? :(
>>
>> Cheers,
>> David
>>
>>>
>>> Thanks and best regards,
>>> Martin
>>>
>>>
>>> -----Original Message-----
>>> From: hotspot-dev [mailto:hotspot-dev-bounces at openjdk.java.net] On
>>> Behalf Of David Holmes
>>> Sent: Wednesday, 23 November 2016 06:08
>>> To: hotspot-dev developers <hotspot-dev at openjdk.java.net>
>>> Subject: Presentation: Understanding OrderAccess
>>>
>>> This is a presentation I recently gave internally to the runtime and serviceability teams that may be of more general interest to hotspot developers.
>>>
>>> http://cr.openjdk.java.net/~dholmes/presentations/Understanding-OrderAccess-v1.1.pdf
>>>
>>> Cheers,
>>> David
>>>