RFR 8201799: Build failures after JDK-8195099 (Concurrent safe-memory-reclamation mechanism)
Erik Osterlund
erik.osterlund at oracle.com
Wed Apr 25 05:18:17 UTC 2018
Hi Martin,
> On 25 Apr 2018, at 00:40, Doerr, Martin <martin.doerr at sap.com> wrote:
>
> Hi Erik,
>
> thanks a lot for your explanations and for digging so much into PPC stuff.
>
> So I think we should continue using memory_order_conservative instead of seq_cst for now.
> Or we could introduce seq_cst and implement it like conservative on PPC64 for now.
Agreed. I’d say probably keep conservative for now, and take one step at a time. We can introduce seq_cst another day.
Thanks,
/Erik
> Best regards,
> Martin
>
>
> -----Original Message-----
> From: Erik Österlund <erik.osterlund at oracle.com>
> Sent: Dienstag, 24. April 2018 22:03
> To: Doerr, Martin <martin.doerr at sap.com>; Andrew Haley <aph at redhat.com>; David Holmes <david.holmes at oracle.com>; Aleksey Shipilev <shade at redhat.com>; hotspot-dev at openjdk.java.net; Lindenmaier, Goetz <goetz.lindenmaier at sap.com>
> Subject: Re: RFR 8201799: Build failures after JDK-8195099 (Concurrent safe-memory-reclamation mechanism)
>
> Hi Martin,
>
>> On 2018-04-24 19:02, Doerr, Martin wrote:
>> Hi Erik,
>>
>> I was not suggesting removing the more conservative ordering property,
>> only allowing the use of seq_cst when you know that is the correct semantics.
>> Ok, thanks for clarifying.
>>
>>> There are both leading sync and trailing sync conventions for
>>> implementing C++11 seq_cst on PPC
>> I'm not convinced that this can be freely selected.
>> Is there a GCC switch? What about xlC on AIX?
>
> It is not selectable by the compiler, which is why I think we should
> generate our own inline assembly where we explicitly choose the
> convention as either leading sync or trailing sync, depending on how we
> roll (cf. Repairing Sequential Consistency in C/C++11
> https://dl.acm.org/citation.cfm?id=3062352 where both leading and
> trailing sync bindings for C++11 are provided). Anything else seems
> unsafe (to me). When I talk about C++11 semantics, I don't necessarily
> mean the chosen implementation by specific compilers. I mean the
> semantic properties of seq_cst, acquire, release, acq_rel, consume, etc.
>
>> The trailing sync convention would have a different weakness: Store + Cmpxchg
>> The store will have to be SeqCst in order to generate correct code on PPC64.
>
> Sure, if you move the fence from one side to another, but still have one
> side lacking a full fence, there will naturally be a possibility of
> reordering on the other side, which is why I am not arguing we
> necessarily remove "conservative". Having said that - doesn't this look
> like a much less innocent mistake - having some sort of Dekker duality
> between a relaxed store (?!) and a seq_cst cmpxchg/load? I am biased to
> think that is a much worse and less subtle programmer mistake, and one
> that is easier to spot. We don't even do that on x86 today (we
> would use release_store_fence for the store in such a dekker duality),
> and it would break more consistently on other platforms too. The
> comment in orderAccess.hpp states: "Use release_store_fence to
> update values like the thread state, where we don't want the current
> thread to continue until all our prior memory accesses (including the
> new thread state) are visible to other threads". Now we can have
> opinions about this "visibility" property, but we have arbitrarily tied
> it to the stores. So the fact is that such Dekker dualities use
> release_store_fence today, and there is no corresponding
> fence_load_acquire() operation. Therefore, I think it would be easier,
> in HotSpot, to adopt a trailing sync seq_cst convention, to fit in with
> existing code conventions. Sorry for derailing further down the rabbit
> hole...
>
>> Even if we can select it in C++, it would be unfortunate if it differs from JMM.
>> JMM should not be changed for the sake of C++ implementation cleanup.
>> We have "JEP 188: Java Memory Model Update" for that.
>
> Sure. The chosen bindings for the JMM would have to comply with that. I
> will stop myself from derailing into a conversation about that. :)
>
>> From the other mail:
>>> With C++11, you can choose both what memory ordering
>>> semantics successful CAS and failing CAS should have
>> You're right. We just need to use it correctly.
>
> Indeed. :)
>
> Thanks,
> /Erik
>
>> Thanks and best regards,
>> Martin
>>
>>
>> -----Original Message-----
>> From: Erik Österlund [mailto:erik.osterlund at oracle.com]
>> Sent: Dienstag, 24. April 2018 18:25
>> To: Doerr, Martin <martin.doerr at sap.com>; Andrew Haley <aph at redhat.com>; David Holmes <david.holmes at oracle.com>; Aleksey Shipilev <shade at redhat.com>; hotspot-dev at openjdk.java.net; Lindenmaier, Goetz <goetz.lindenmaier at sap.com>
>> Subject: Re: RFR 8201799: Build failures after JDK-8195099 (Concurrent safe-memory-reclamation mechanism)
>>
>> Hi Martin,
>>
>>> On 2018-04-24 15:43, Doerr, Martin wrote:
>>> Hi Erik,
>>>
>>> I would highly appreciate it if we can get the VM code to work correctly with C++ 2011 semantics on PPC64.
>>> But I think this needs to be done very carefully in order to avoid missing sync instructions between CAS and Load.
>> I was not suggesting removing the more conservative ordering property,
>> only allowing the use of seq_cst when you know that is the correct semantics.
>>
>>> E.g. the new global counter implementation uses (after fix):
>>> Atomic::add(memory_order_conservative) + OrderAccess::load_acquire
>>>
>>> In C++2011 this should become:
>>> Cmpxchg SeqCst + Load SeqCst
>>>
>>> Cmpxchg SeqCst + any other Load would be incorrect on PPC64. So we will need to double-check all corresponding loads.
>> This might be getting out of topic, but here goes...
>>
>> There are both leading sync and trailing sync conventions for
>> implementing C++11 seq_cst on PPC. What you are saying is true for the
>> leading sync convention, but not for the trailing sync convention. In
>> HotSpot, I think it is probably more desirable to bind the expectations
>> for *all* seq_cst accesses to the trailing sync convention to avoid
>> this issue of seq_cst store followed by load_acquire being reordered. I
>> don't think we want a situation where PPC implements leading sync
>> (despite being able to choose trailing sync), when AArch64 uses trailing
>> sync. But perhaps this is derailing a bit...
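
For concreteness, the two conventions look roughly like this (a simplified
sketch; consult the bindings catalogued in the paper cited earlier for the
precise mappings):

```
Sketch of the two PPC64 seq_cst bindings:

  leading sync:   seq_cst store = sync; st
                  seq_cst load  = sync; ld; cmp; bc; isync

  trailing sync:  seq_cst store = st; sync
                  seq_cst load  = ld; sync   (roughly; details vary)

Either binding is correct on its own; mixing the two in one program
is what loses sequential consistency.
```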
>>
>>> I've seen that you have moved the support_IRIW_for_not_multiple_copy_atomic_cpu
>>> fence into RawAccessBarrier<decorators>::load_internal. Ideally,
>>> RawAccessBarrier<decorators>::store_internal should use
>>> OrderAccess::release_store without fence in case of
>>> support_IRIW_for_not_multiple_copy_atomic_cpu, but I'm not requesting
>>> to change this now. We'd get this by moving to C++ 2011 semantics
>>> (which assumes support_IRIW_for_not_multiple_copy_atomic_cpu), right?
>> That code should really call into Atomic/OrderAccess with seq_cst, and
>> let the platform layer do what it needs to do.
>>
>> Thanks,
>> /Erik
>>
>>> Best regards,
>>> Martin
>>>
>>>
>>> -----Original Message-----
>>> From: Erik Österlund [mailto:erik.osterlund at oracle.com]
>>> Sent: Montag, 23. April 2018 12:41
>>> To: Andrew Haley <aph at redhat.com>; Doerr, Martin <martin.doerr at sap.com>; David Holmes <david.holmes at oracle.com>; Aleksey Shipilev <shade at redhat.com>; hotspot-dev at openjdk.java.net; Lindenmaier, Goetz <goetz.lindenmaier at sap.com>
>>> Subject: Re: RFR 8201799: Build failures after JDK-8195099 (Concurrent safe-memory-reclamation mechanism)
>>>
>>> Hi Andrew,
>>>
>>>> On 2018-04-23 12:20, Andrew Haley wrote:
>>>>> On 04/18/2018 05:37 PM, Doerr, Martin wrote:
>>>>>
>>>>> It'd be great if we could specify semantics for Atomic::add like we
>>>>> do for Atomic::cmpxchg.
>>>>> However, the double-fence is a very bad default from a performance
>>>>> perspective. I wonder if PPC64 is the only platform which gets hit.
>>>> Probably not.
>>>>
>>>>> Would it be acceptable to set the default to
>>>>> memory_order_acquire_release and specify memory_order_conservative
>>>>> for the new usage? I think this would fit perfectly for PPC64, but
>>>>> some people may not like it. One could say PPC64 is the problem, but
>>>>> one could also say the VM code is the problem.
>>>> Indeed. In as much as we use stronger memory ordering than we need in
>>>> more than one place, the VM code is the problem.
>>>>
>>>>> I don't really like the straightforward fix to insert a fence with
>>>>> #ifdef AIX.
>>>> I agree. It looks to me like it's sufficient if the increment of
>>>> global_counter and the load of thread->get_rcu_counter() are
>>>> sequentially consistent. On at least one platform, AArch64, we can
>>>> do that without additional fencing.
>>> I would also like to have seq_cst available. In the Access API, I need
>>> MO_SEQ_CST for JMM volatile support. I would like the semantic
>>> requirements to go straight into Atomic, instead of doing a "if
>>> (support_IRIW_for_not_multiple_copy_atomic_cpu) { OrderAccess::fence();
>>> }" dance in shared code, that may or may not make sense at a given
>>> platform (among other reasons because it enforces whether leading or
>>> trailing sync convention is used in shared code, rather than leave that
>>> decision to the platform level). Plus I have a mouse pad that says I
>>> love sequential consistency, although that might not be a valid argument
>>> to introduce this to Atomic.
>>>
>>> Thanks,
>>> /Erik
>