RFR 8201799: Build failures after JDK-8195099 (Concurrent safe-memory-reclamation mechanism)
Erik Osterlund
erik.osterlund at oracle.com
Wed Apr 25 05:18:17 UTC 2018
Hi Martin,
> On 25 Apr 2018, at 00:40, Doerr, Martin <martin.doerr at sap.com> wrote:
>
> Hi Erik,
>
> thanks a lot for your explanations and for digging so much into PPC stuff.
>
> So I think we should continue using memory_order_conservative instead of seq_cst for now.
> Or we could introduce seq_cst and implement it like conservative on PPC64 for now.
Agreed. I’d say probably keep conservative for now, and take one step at a time. We can introduce seq_cst another day.
Thanks,
/Erik
> Best regards,
> Martin
>
>
> -----Original Message-----
> From: Erik Österlund <erik.osterlund at oracle.com>
> Sent: Dienstag, 24. April 2018 22:03
> To: Doerr, Martin <martin.doerr at sap.com>; Andrew Haley <aph at redhat.com>; David Holmes <david.holmes at oracle.com>; Aleksey Shipilev <shade at redhat.com>; hotspot-dev at openjdk.java.net; Lindenmaier, Goetz <goetz.lindenmaier at sap.com>
> Subject: Re: RFR 8201799: Build failures after JDK-8195099 (Concurrent safe-memory-reclamation mechanism)
>
> Hi Martin,
>
>> On 2018-04-24 19:02, Doerr, Martin wrote:
>> Hi Erik,
>>
>> I was not suggesting removing the more conservative ordering property,
>> only allowing the use of seq_cst when you know that is the correct semantics.
>> Ok, thanks for clarifying.
>>
>>> There are both leading sync and trailing sync conventions for
>>> implementing C++11 seq_cst on PPC
>> I'm not convinced that this can be freely selected.
>> Is there a GCC switch? What about xlC on AIX?
>
> It is not selectable by the compiler, which is why I think we should
> generate our own inline assembly where we explicitly choose the
> convention as either leading sync or trailing sync, depending on how we
> roll (cf. Repairing Sequential Consistency in C/C++11
> https://dl.acm.org/citation.cfm?id=3062352 where both leading and
> trailing sync bindings for C++11 are provided). Anything else seems
> unsafe (to me). When I talk about C++11 semantics, I don't necessarily
> mean the chosen implementation by specific compilers. I mean the
> semantic properties of seq_cst, acquire, release, acq_rel, consume, etc.
>
>> The trailing sync convention would have a different weakness: Store + Cmpxchg
>> The store will have to be SeqCst in order to generate correct code on PPC64.
>
> Sure, if you move the fence from one side to another, but still have one
> side lacking a full fence, there will naturally be a possibility of
> reordering on the other side, which is why I am not arguing we
> necessarily remove "conservative". Having said that - doesn't this look
> like a much less innocent mistake - having some sort of Dekker duality
> between a relaxed store (?!) and a seq_cst cmpxchg/load? I am biased to
> think that is a much worse and less subtle programmer mistake, and one
> that is easier to spot. We don't even do that on x86 today (we
> would use release_store_fence for the store in such a dekker duality),
> and it would break more consistently on other platforms too. The
> comment in orderAccess.hpp states: "Use release_store_fence to
> update values like the thread state, where we don't want the current
> thread to continue until all our prior memory accesses (including the
> new thread state) are visible to other threads". Now we can have
> opinions about this "visibility" property, but we have arbitrarily tied
> it to the stores. So the fact is that such Dekker dualities use
> release_store_fence today, and there is no corresponding
> fence_load_acquire() operation. Therefore, I think it would be easier,
> in HotSpot, to adopt a trailing sync seq_cst convention, to fit in with
> existing code conventions. Sorry for derailing further down the rabbit
> hole...
>
>> Even if we can select it in C++, it would be unfortunate if it differs from JMM.
>> JMM should not be changed for the sake of C++ implementation cleanup.
>> We have "JEP 188: Java Memory Model Update" for that.
>
> Sure. The chosen bindings for the JMM would have to comply with that. I
> will stop myself from derailing into a conversation about that. :)
>
>> From the other mail:
>>> With C++11, you can choose both what memory ordering
>>> semantics successful CAS and failing CAS should have
>> You're right. We just need to use it correctly.
>
> Indeed. :)
>
> Thanks,
> /Erik
>
>> Thanks and best regards,
>> Martin
>>
>>
>> -----Original Message-----
>> From: Erik Österlund [mailto:erik.osterlund at oracle.com]
>> Sent: Dienstag, 24. April 2018 18:25
>> To: Doerr, Martin <martin.doerr at sap.com>; Andrew Haley <aph at redhat.com>; David Holmes <david.holmes at oracle.com>; Aleksey Shipilev <shade at redhat.com>; hotspot-dev at openjdk.java.net; Lindenmaier, Goetz <goetz.lindenmaier at sap.com>
>> Subject: Re: RFR 8201799: Build failures after JDK-8195099 (Concurrent safe-memory-reclamation mechanism)
>>
>> Hi Martin,
>>
>>> On 2018-04-24 15:43, Doerr, Martin wrote:
>>> Hi Erik,
>>>
>>> I would highly appreciate it if we can get the VM code to work correctly with C++ 2011 semantics on PPC64.
>>> But I think this needs to be done very carefully in order to avoid missing sync instructions between CAS and Load.
>> I was not suggesting removing the more conservative ordering property,
>> only allowing the use of seq_cst when you know that is the correct semantics.
>>
>>> E.g. the new global counter implementation uses (after fix):
>>> Atomic::add(memory_order_conservative) + OrderAccess::load_acquire
>>>
>>> In C++2011 this should become:
>>> Cmpxchg SeqCst + Load SeqCst
>>>
>>> Cmpxchg SeqCst + any other Load would be incorrect on PPC64. So we will need to double-check all corresponding loads.
>> This might be getting out of topic, but here goes...
>>
>> There are both leading sync and trailing sync conventions for
>> implementing C++11 seq_cst on PPC. What you are saying is true for the
>> leading sync convention, but not for the trailing sync convention. In
>> HotSpot, I think it is probably more desirable to bind the expectations
>> for *all* seq_cst accesses to the trailing sync convention to avoid
>> this issue of seq_cst store followed by load_acquire being reordered. I
>> don't think we want a situation where PPC implements leading sync
>> (despite being able to choose trailing sync), when AArch64 uses trailing
>> sync. But perhaps this is derailing a bit...
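
For concreteness, the two conventions look roughly like this (a simplified
sketch; consult the bindings catalogued in the paper cited earlier for the
precise mappings):

```
Sketch of the two PPC64 seq_cst bindings:

  leading sync:   seq_cst store = sync; st
                  seq_cst load  = sync; ld; cmp; bc; isync

  trailing sync:  seq_cst store = st; sync
                  seq_cst load  = ld; sync   (roughly; details vary)

Either binding is correct on its own; mixing the two in one program
is what loses sequential consistency.
```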
>>
>>> I've seen that you have moved the support_IRIW_for_not_multiple_copy_atomic_cpu
>>> fence into RawAccessBarrier<decorators>::load_internal. Ideally,
>>> RawAccessBarrier<decorators>::store_internal should use
>>> OrderAccess::release_store without fence in case of
>>> support_IRIW_for_not_multiple_copy_atomic_cpu, but I'm not requesting
>>> to change this now. We'd get this by moving to C++ 2011 semantics
>>> (which assumes support_IRIW_for_not_multiple_copy_atomic_cpu), right?
>> That code should really call into Atomic/OrderAccess with seq_cst, and
>> let the platform layer do what it needs to do.
>>
>> Thanks,
>> /Erik
>>
>>> Best regards,
>>> Martin
>>>
>>>
>>> -----Original Message-----
>>> From: Erik Österlund [mailto:erik.osterlund at oracle.com]
>>> Sent: Montag, 23. April 2018 12:41
>>> To: Andrew Haley <aph at redhat.com>; Doerr, Martin <martin.doerr at sap.com>; David Holmes <david.holmes at oracle.com>; Aleksey Shipilev <shade at redhat.com>; hotspot-dev at openjdk.java.net; Lindenmaier, Goetz <goetz.lindenmaier at sap.com>
>>> Subject: Re: RFR 8201799: Build failures after JDK-8195099 (Concurrent safe-memory-reclamation mechanism)
>>>
>>> Hi Andrew,
>>>
>>>> On 2018-04-23 12:20, Andrew Haley wrote:
>>>>> On 04/18/2018 05:37 PM, Doerr, Martin wrote:
>>>>>
>>>>> It'd be great if we could specify semantics for Atomic::add like we
>>>>> do for Atomic::cmpxchg.
>>>>> However, the double-fence is a very bad default from a performance
>>>>> perspective. I wonder if PPC64 is the only platform which gets hit.
>>>> Probably not.
>>>>
>>>>> Would it be acceptable to set the default to
>>>>> memory_order_acquire_release and specify memory_order_conservative
>>>>> for the new usage? I think this would fit perfectly for PPC64, but
>>>>> some people may not like it. One could say PPC64 is the problem, but
>>>>> one could also say the VM code is the problem.
>>>> Indeed. In as much as we use stronger memory ordering than we need in
>>>> more than one place, the VM code is the problem.
>>>>
>>>>> I don't really like the straightforward fix to insert a fence with
>>>>> #ifdef AIX.
>>>> I agree. It looks to me like it's sufficient if the increment of
>>>> global_counter and the load of thread->get_rcu_counter() are
>>>> sequentially consistent. On at least one platform, AArch64, we can
>>>> do that without additional fencing.
>>> I would also like to have seq_cst available. In the Access API, I need
>>> MO_SEQ_CST for JMM volatile support. I would like the semantic
>>> requirements to go straight into Atomic, instead of doing a "if
>>> (support_IRIW_for_not_multiple_copy_atomic_cpu) { OrderAccess::fence();
>>> }" dance in shared code, that may or may not make sense at a given
>>> platform (among other reasons because it enforces whether leading or
>>> trailing sync convention is used in shared code, rather than leave that
>>> decision to the platform level). Plus I have a mouse pad that says I
>>> love sequential consistency, although that might not be a valid argument
>>> to introduce this to Atomic.
>>>
>>> Thanks,
>>> /Erik
>