RFR 8201799: Build failures after JDK-8195099 (Concurrent safe-memory-reclamation mechanism)

Doerr, Martin martin.doerr at sap.com
Tue Apr 24 22:40:08 UTC 2018


Hi Erik,

thanks a lot for your explanations and for digging so much into PPC stuff.

So I think we should continue using memory_order_conservative instead of seq_cst for now.
Or we could introduce seq_cst and, for now, implement it like conservative on PPC64.
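
For illustration, the latter could look roughly like this on PPC64, along
the lines of the existing cmpxchg_pre_membar in atomic_linux_ppc.hpp (a
hypothetical sketch; a memory_order_seq_cst value does not exist in
Atomic today):

  enum cmpxchg_memory_order {
    memory_order_relaxed,
    memory_order_seq_cst,        // hypothetical addition
    memory_order_conservative
  };

  inline void cmpxchg_pre_membar(cmpxchg_memory_order order) {
    // For now, seq_cst takes the same full-fence path as conservative;
    // only relaxed skips the fence.
    if (order != memory_order_relaxed) {
      __asm__ __volatile__ ("sync" : : : "memory");
    }
  }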

Best regards,
Martin


-----Original Message-----
From: Erik Österlund <erik.osterlund at oracle.com> 
Sent: Tuesday, April 24, 2018 22:03
To: Doerr, Martin <martin.doerr at sap.com>; Andrew Haley <aph at redhat.com>; David Holmes <david.holmes at oracle.com>; Aleksey Shipilev <shade at redhat.com>; hotspot-dev at openjdk.java.net; Lindenmaier, Goetz <goetz.lindenmaier at sap.com>
Subject: Re: RFR 8201799: Build failures after JDK-8195099 (Concurrent safe-memory-reclamation mechanism)

Hi Martin,

On 2018-04-24 19:02, Doerr, Martin wrote:
> Hi Erik,
>
>> I was not suggesting removing the more conservative ordering property,
>> only allowing the use of seq_cst when you know that is the correct semantics.
> Ok, thanks for clarifying.
>
>> There are both leading sync and trailing sync conventions for
>> implementing C++11 seq_cst on PPC
> I'm not convinced that this can be freely selected.
> Is there a GCC switch? What about xlC on AIX?

It is not selectable by the compiler, which is why I think we should 
generate our own inline assembly where we explicitly choose the 
convention as either leading sync or trailing sync, depending on how we 
roll (cf. Repairing Sequential Consistency in C/C++11 
https://dl.acm.org/citation.cfm?id=3062352 where both leading and 
trailing sync bindings for C++11 are provided). Anything else seems 
unsafe (to me). When I talk about C++11 semantics, I don't necessarily 
mean the implementation chosen by specific compilers; I mean the 
semantic properties of seq_cst, acquire, release, acq_rel, consume, etc.
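
For reference, the two schemes look roughly like this in simplified
pseudo-assembly (my paraphrase of the mappings discussed in that paper,
not an authoritative rendering):

                     seq_cst load                  seq_cst store
  leading sync:      sync; ld; cmp; bc; isync      sync; st
  trailing sync:     ld; sync                      lwsync; st; sync

The point is that the full sync sits on opposite sides of the access in
the two conventions, so mixing conventions across platforms (or within
one code base) is where the subtle bugs come from.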

> The trailing sync convention would have a different weakness: Store + Cmpxchg
> The store will have to be SeqCst in order to generate correct code on PPC64.

Sure, if you move the fence from one side to another, but still have one 
side lacking a full fence, there will naturally be a possibility of 
reordering on the other side. And therefore I am not arguing we 
necessarily remove "conservative". Having said that - doesn't this look 
like a much less innocent mistake - having some sort of Dekker duality 
between a relaxed store (?!) and a seq_cst cmpxchg/load? I am biased to 
think that seems like a much worse and less subtle programmer mistake, 
and one that is easier to understand. We don't even do that on x86 today 
(we would use release_store_fence for the store in such a Dekker 
duality), and it would break more consistently on other platforms too. 
The comment in orderAccess.hpp states: "Use release_store_fence to 
update values like the thread state, where we don't want the current 
thread to continue until all our prior memory accesses (including the 
new thread state) are visible to other threads". Now we can have 
opinions about this "visibility" property, but we have arbitrarily tied 
it to the stores. Therefore, the fact is that such Dekker dualities use 
release_store_fence today, and there is no corresponding 
fence_load_acquire() operation. It would therefore seem easier, in 
HotSpot, to adopt a trailing sync seq_cst convention that fits the 
existing code conventions. Sorry for derailing further down the rabbit 
hole...
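
To make the duality concrete, a minimal sketch in today's HotSpot style
(the flag names are made up for illustration):

  volatile int _flag_a = 0;
  volatile int _flag_b = 0;

  // Thread A:
  OrderAccess::release_store_fence(&_flag_a, 1);  // store, then full fence
  int b = OrderAccess::load_acquire(&_flag_b);

  // Thread B:
  OrderAccess::release_store_fence(&_flag_b, 1);  // store, then full fence
  int a = OrderAccess::load_acquire(&_flag_a);

  // The trailing fences guarantee that a == 0 and b == 0 cannot both be
  // observed. Demote either store to a plain store and it may be
  // reordered with the subsequent load - the classic Dekker failure.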

> Even if we can select it in C++, it would be unfortunate if it differs from JMM.
> JMM should not be changed for the sake of C++ implementation cleanup.
> We have "JEP 188: Java Memory Model Update" for that.

Sure. The chosen bindings for the JMM would have to comply with that. I 
will stop myself from derailing into a conversation about that. :)

> From the other mail:
>> With C++11, you can choose both what memory ordering
>> semantics successful CAS and failing CAS should have
> You're right. We just need to use it correctly.

Indeed. :)

Thanks,
/Erik

> Thanks and best regards,
> Martin
>
>
> -----Original Message-----
> From: Erik Österlund [mailto:erik.osterlund at oracle.com]
> Sent: Tuesday, April 24, 2018 18:25
> To: Doerr, Martin <martin.doerr at sap.com>; Andrew Haley <aph at redhat.com>; David Holmes <david.holmes at oracle.com>; Aleksey Shipilev <shade at redhat.com>; hotspot-dev at openjdk.java.net; Lindenmaier, Goetz <goetz.lindenmaier at sap.com>
> Subject: Re: RFR 8201799: Build failures after JDK-8195099 (Concurrent safe-memory-reclamation mechanism)
>
> Hi Martin,
>
> On 2018-04-24 15:43, Doerr, Martin wrote:
>> Hi Erik,
>>
>> I would highly appreciate it if we can get the VM code to work correctly with C++11 semantics on PPC64.
>> But I think this needs to be done very carefully in order to avoid missing sync instructions between CAS and Load.
> I was not suggesting removing the more conservative ordering property,
> only allowing the use of seq_cst when you know that is the correct semantics.
>
>> E.g. the new global counter implementation uses (after fix):
>> Atomic::add(memory_order_conservative) + OrderAccess::load_acquire
>>
>> In C++11 this should become:
>> Cmpxchg SeqCst + Load SeqCst
>>
>> Cmpxchg SeqCst + any other Load would be incorrect on PPC64. So we will need to double-check all corresponding loads.
> This might be getting off topic, but here goes...
>
> There are both leading sync and trailing sync conventions for
> implementing C++11 seq_cst on PPC. What you are saying is true for the
> leading sync convention, but not for the trailing sync convention. In
> HotSpot, I think it is probably more desirable to bind the expectations
> for *all* seq_cst accesses to the trailing sync convention, to avoid
> this issue of a seq_cst store followed by a load_acquire being
> reordered. I don't think we want a situation where PPC implements
> leading sync (despite being able to choose trailing sync) while AArch64
> uses trailing sync. But perhaps this is derailing a bit...
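>
> To illustrate in plain C++11 terms (a simplified sketch of the pattern,
> not the actual GlobalCounter code; names are made up):
>
>   #include <atomic>
>   #include <cstdint>
>
>   std::atomic<uint64_t> global_counter{0};
>   std::atomic<uint64_t> thread_counter{0};  // per-thread in the real code
>
>   void writer_step() {
>     // Bump the global counter with seq_cst semantics...
>     global_counter.fetch_add(1, std::memory_order_seq_cst);
>     // ...and read the thread's counter with seq_cst as well. Under the
>     // leading sync convention on PPC64, a weaker load here could float
>     // above the fetch_add; under the trailing sync convention the
>     // fetch_add itself already ends with a full sync.
>     uint64_t t = thread_counter.load(std::memory_order_seq_cst);
>     (void)t;
>   }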
>
>> I've seen that you have moved the support_IRIW_for_not_multiple_copy_atomic_cpu fence into RawAccessBarrier<decorators>::load_internal. Ideally, RawAccessBarrier<decorators>::store_internal should use OrderAccess::release_store without a fence in the support_IRIW_for_not_multiple_copy_atomic_cpu case, but I'm not requesting to change this now. We'd get this by moving to C++11 semantics (which assumes support_IRIW_for_not_multiple_copy_atomic_cpu), right?
> That code should really call into Atomic/OrderAccess with seq_cst, and
> let the platform layer do what it needs to do.
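>
> For context, the shared code being discussed currently looks roughly
> like this (paraphrased and simplified, not a verbatim copy):
>
>   // MO_SEQ_CST load in shared code: the IRIW fence placement (a leading
>   // full fence on platforms like PPC64) is hard-coded here, instead of
>   // being delegated to the platform's Atomic/OrderAccess layer.
>   template <typename T>
>   inline T load_seq_cst(const volatile T* addr) {
>     if (support_IRIW_for_not_multiple_copy_atomic_cpu) {
>       OrderAccess::fence();
>     }
>     return OrderAccess::load_acquire(addr);
>   }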
>
> Thanks,
> /Erik
>
>> Best regards,
>> Martin
>>
>>
>> -----Original Message-----
>> From: Erik Österlund [mailto:erik.osterlund at oracle.com]
>> Sent: Monday, April 23, 2018 12:41
>> To: Andrew Haley <aph at redhat.com>; Doerr, Martin <martin.doerr at sap.com>; David Holmes <david.holmes at oracle.com>; Aleksey Shipilev <shade at redhat.com>; hotspot-dev at openjdk.java.net; Lindenmaier, Goetz <goetz.lindenmaier at sap.com>
>> Subject: Re: RFR 8201799: Build failures after JDK-8195099 (Concurrent safe-memory-reclamation mechanism)
>>
>> Hi Andrew,
>>
>> On 2018-04-23 12:20, Andrew Haley wrote:
>>> On 04/18/2018 05:37 PM, Doerr, Martin wrote:
>>>
>>>> It'd be great if we could specify semantics for Atomic::add like we
>>>> do for Atomic::cmpxchg.
>>>> However, the double-fence is a very bad default from a performance
>>>> perspective. I wonder if PPC64 is the only platform which gets hit.
>>> Probably not.
>>>
>>>> Would it be acceptable to set the default to
>>>> memory_order_acquire_release and specify memory_order_conservative
>>>> for the new usage? I think this would fit perfectly for PPC64, but
>>>> some people may not like it. One could say PPC64 is the problem, but
>>>> one could also say the VM code is the problem :-)
>>> Indeed.  Inasmuch as we use stronger memory ordering than we need in
>>> more than one place, the VM code is the problem.
>>>
>>>> I don't really like the straightforward fix to insert a fence with
>>>> #ifdef AIX.
>>> I agree.  It looks to me like it's sufficient if the increment of
>>> global_counter and the load of thread->get_rcu_counter() are
>>> sequentially consistent.  On at least one platform, AArch64, we can
>>> do that without additional fencing.
>> I would also like to have seq_cst available. In the Access API, I need
>> MO_SEQ_CST for JMM volatile support. I would like the semantic
>> requirements to go straight into Atomic, instead of doing an "if
>> (support_IRIW_for_not_multiple_copy_atomic_cpu) { OrderAccess::fence();
>> }" dance in shared code that may or may not make sense on a given
>> platform (among other reasons because it enforces whether the leading or
>> trailing sync convention is used in shared code, rather than leaving
>> that decision to the platform level). Plus I have a mouse pad that says
>> I love sequential consistency, although that might not be a valid
>> argument for introducing this to Atomic.
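>>
>> For illustration, the shape being argued for might look like this
>> (hypothetical API sketch; Atomic::add takes no ordering argument today):
>>
>>   // Shared code states the required semantics; the platform layer picks
>>   // the fencing (leading or trailing sync on PPC64, e.g. ldaddal on
>>   // AArch64 with no extra fence).
>>   Atomic::add(1u, &global_counter, memory_order_seq_cst);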
>>
>> Thanks,
>> /Erik


