RFR(M): 8202080: Introduce ordering semantics for Atomic::add

Erik Österlund erik.osterlund at oracle.com
Mon Apr 23 09:44:59 UTC 2018


Hi David,

On 2018-04-22 14:30, David Holmes wrote:
> Erik,
>
> On 21/04/2018 5:53 PM, Erik Österlund wrote:
>> Hi Martin and David,
>>
>> I for one welcome using C++11 atomic semantics, and would like to 
>> outline why.
>>
>> 1) It is compliant with the JMM. Work has been done to unify the two. 
>> I think I outlined for my access decorators which JMM property links 
>> to each MO_ decorator, which corresponds to a C++11 semantic, that I 
>> also have a short description of.
>> 2) It has a well defined meaning, as lots of people with more time on 
>> their hands than us have spent many hours defining what it means.
>> 3) It is well known, and new developers that know either of C++11 
>> atomics or the JMM can immediately start working with it.
>> 4) Since it is so well-known, compilers know about it too. For 
>> platforms that don't offer a whole bunch of ABIs for these semantics, 
>> like x86, we have the opportunity to use intrinsics, and let the 
>> compiler know what we are trying to do. This allows more freedom such 
>> as using consume, that we can not represent with volatiles and inline 
>> assembly.
>> 5) For similar reasons, the hardware know about it too. If we had a 
>> notion of seq_cst (which is what this counter needs), the AArch64 
>> hardware could provide the most appropriate instructions that map 
>> directly to that.
>> 6) For our TSO platforms, it will make little difference. But for SAP 
>> and RedHat, this makes a much bigger difference, to the extent that 
>> PPC can not pragmatically use the conservative model we have 
>> provided. Therefore we have ended up in a situation where we have a 
>> contract that is not followed, and people like me writing workarounds 
>> out of politeness. I would rather have a weaker contract I can rely 
>> on, that is actually implemented properly in all backends, than have 
>> a strong contract that looks great, but is too unpragmatic to be 
>> implemented in the very backends where it matters the most, and hence 
>> is unreliable in practice.
>
> It has not been established that the weaker contract can be relied 
> upon. Simply saying "it seems to work fine" is not good enough. Plenty 
> of code with missing memory barriers seemed to work fine, right up 
> until it stopped working fine.
>
> I have no issues with expanding our support for memory access modes - 
> as long as we clearly document exactly what semantics they have - but 
> before applying a relaxed mode to an existing conservative operation 
> you have to "prove" that it is correct to do so. As I mentioned the 
> last time this was attempted with a relaxed cmpxchg, it was shown to 
> not be a trivial exercise to get right.

Just to be clear, I was talking about how I welcome being *able* to use 
weaker semantics in Atomic to accommodate the platforms that need it. I 
was not talking about whether the existing callsites of Atomic::add 
should be downgraded to a weaker ordering without analysis what the 
consequences would be. I think that is what you are skeptical against. I 
am too, and do not know how I feel about that. But let's analyze what 
would happen in terms of fencing for the platforms in our repo if that 
was the case.

I think the proposal is to change Atomic::add to have acq_rel semantics 
by default (which is what in practice PPC has been using), which would 
downgrade the expected memory ordering requirements on existing 
callsites to Atomic::add. Let's have a look at what would happen on 
those call sites on a selection of platforms:

1) x86/x64: No difference. Was a lock xadd, will continue to be so.
2) SPARC: No difference. Was load phi; cas phi; loop. Will continue be so.
3) AArch64: No difference. The GCC bindings expand to exclusive 
load-acquire/release-store loop. Will continue to be so (as the AArch64 
interpretation of acquire/release is sequentially consistent, and the 
AArch64 port apparently decided to stop at sequential consistency).
4) ARM: The ARM AArch64 port uses explicit ldaxr/stlxr bindings. Same 
storyas above. The non-AArch64 case uses a ldrex/strex loop with both 
leading and trailing dmb sy, when LL/SC is available and otherwise uses 
a not fenced at all loop, presumably because we already know there is no 
concurrency in such systems.
4) S390: No difference. Used atomic load and ALU or CAS loop, depending 
on hardware availability, and will continue to do so.
5) PPC: Can now remove bidirectional sync, and replace with leading 
lwsync and trailing branch + isync. This is seemingly the only platform 
yet that can take advantage of this and generate better code. Therefore, 
it is the only platform where things would look difference. But they 
already skipped the official contract and used fencing corresponding to 
acq_rel since a relatively long time. Therefore, in practice, they would 
not change either.

So I guess for the ports in our repo, we will continue to generate the 
same code as we did before, whether we have changed to a weaker contract 
or not. So it seems to me that whatever risk we take by blindly 
downgrading our Atomic::add callsites to be acq_rel, from the expected 
"conservative", is a risk that will only affect the PPC port (in our 
repo). And it is a risk that they have already taken.

Having said that, it would of course feel better if somebody had gone 
through all the call sites and checked that acq_rel is indeed 
sufficient. Perhaps SAP has done such analysis already?

Thanks,
/Erik

> David
> -----
>
>> 7) The weaker contract forces us to think more about our lock-free 
>> interactions and why it works. But I would argue we should not write 
>> lock-free code if we do not understand why it works anyway.
>>
>> Thanks,
>> /Erik
>>
>> On 2018-04-21 01:08, David Holmes wrote:
>>> Hi Martin,
>>>
>>> You've added:
>>>
>>>   enum cmpxchg_memory_order {
>>>     memory_order_relaxed = 0,
>>> !   memory_order_acquire = 2,
>>> !   memory_order_release = 3,
>>> !   memory_order_acq_rel = 4,
>>>
>>> but not defined them. What do they mean? Are they the same as 
>>> C++11/17? Do we actually want that? There was a discussion on all 
>>> this when the enum was introduced. Also note the enum is named 
>>> cmpxchg_memory_order - if we're going to generalise this then it 
>>> should be renamed and behaviour clearly specified.
>>>
>>> !   inline static D add(I add_value, D volatile* dest,
>>> !                       cmpxchg_memory_order order = 
>>> memory_order_acq_rel);
>>>
>>> The default should be conservative - to match what is stated in 
>>> atomic.hpp regarding expectations for all atomic r-m-w operations.
>>>
>>>  63   // All of the atomic operations that imply a read-modify-write 
>>> action
>>>   64   // guarantee a two-way memory barrier across that operation. 
>>> Historically
>>>   65   // these semantics reflect the strength of atomic operations 
>>> that are
>>>   66   // provided on SPARC/X86. We assume that strength is 
>>> necessary unless
>>>   67   // we can prove that a weaker form is sufficiently safe.
>>>
>>> That's how "conservative" was defined when this was introduced. Some 
>>> of the ports chose to not follow this "contract" and that's up to 
>>> them, but only those ports should be utilising weaker operations 
>>> unless it can be proven that a weaker form is always correct. The 
>>> onus is on the person changing the code. (There was long discussion 
>>> when a relaxed cmpxchg was attempted to be introduced and it was 
>>> shown to not be correct.)
>>>
>>> David
>>> -----
>>>
>>> On 21/04/2018 2:04 AM, Doerr, Martin wrote:
>>>> Hi,
>>>>
>>>> JDK-8195099 “Concurrent safe-memory-reclamation mechanism” 
>>>> introduced a usage of Atomic::add for which we’re not using the 
>>>> correct memory barriers on PPC64 and s390.
>>>>
>>>> Some more description can be found in the bug: 
>>>> https://bugs.openjdk.java.net/browse/JDK-8202080
>>>>
>>>> We can fix this by the following change:
>>>>
>>>> http://cr.openjdk.java.net/~mdoerr/8202080_atomic_add/webrev.00/
>>>>
>>>> Please note that I only implemented the platform code for linux 
>>>> x86, PPC64 and s390 so far. The remaining ones should be trivial 
>>>> and I’ll take care of them after I got some feedback.
>>>>
>>>> I can also do the same change for other atomic functions like sub, 
>>>> inc and dec if you like.
>>>>
>>>> Intention of this change is only to fix this new code and possibly 
>>>> other future enhancements, not to change the behavior of any older 
>>>> usages of Atomic::add.
>>>>
>>>> Feedback is welcome.
>>>>
>>>> Best regards,
>>>>
>>>> Martin
>>>>
>>



More information about the hotspot-runtime-dev mailing list