RFR 8201799: Build failures after JDK-8195099 (Concurrent safe-memory-reclamation mechanism)
Erik Österlund
erik.osterlund at oracle.com
Tue Apr 24 16:27:53 UTC 2018
Hi Martin,
That is incorrect. With C++11 you can choose the memory ordering semantics
of both a successful and a failing CAS.
Thanks,
/Erik
On 2018-04-24 16:01, Doerr, Martin wrote:
> Hi again,
>
> another issue with C++11 on PPC64 is that a Cmpxchg with Acquire/AcqRel/SeqCst semantics does not acquire in case of a failing Cmpxchg, while our Atomic::PlatformCmpxchg does. I believe that there are places in HotSpot which rely on that. So I guess that all "else" cases will need to get double-checked and possibly be enhanced with a leading OrderAccess::acquire().
>
> Best regards,
> Martin
>
>
> -----Original Message-----
> From: Doerr, Martin
> Sent: Tuesday, 24 April 2018 15:44
> To: 'Erik Österlund' <erik.osterlund at oracle.com>; Andrew Haley <aph at redhat.com>; David Holmes <david.holmes at oracle.com>; Aleksey Shipilev <shade at redhat.com>; hotspot-dev at openjdk.java.net; Lindenmaier, Goetz <goetz.lindenmaier at sap.com>
> Subject: RE: RFR 8201799: Build failures after JDK-8195099 (Concurrent safe-memory-reclamation mechanism)
>
> Hi Erik,
>
> I would highly appreciate it if we could get the VM code to work correctly with C++11 semantics on PPC64.
> But I think this needs to be done very carefully in order to avoid missing sync instructions between a CAS and a subsequent load.
>
> E.g. the new global counter implementation uses (after the fix):
> Atomic::add(memory_order_conservative) + OrderAccess::load_acquire
>
> In C++11 this should become:
> Cmpxchg SeqCst + Load SeqCst
>
> Cmpxchg SeqCst + any other Load would be incorrect on PPC64. So we will need to double-check all corresponding loads.
>
> I've seen that you have moved the support_IRIW_for_not_multiple_copy_atomic_cpu fence into RawAccessBarrier<decorators>::load_internal. Ideally, RawAccessBarrier<decorators>::store_internal should use OrderAccess::release_store without a fence in case of support_IRIW_for_not_multiple_copy_atomic_cpu, but I'm not requesting that change now. We'd get this by moving to C++11 semantics (which assume support_IRIW_for_not_multiple_copy_atomic_cpu), right?
>
> Best regards,
> Martin
>
>
> -----Original Message-----
> From: Erik Österlund [mailto:erik.osterlund at oracle.com]
> Sent: Monday, 23 April 2018 12:41
> To: Andrew Haley <aph at redhat.com>; Doerr, Martin <martin.doerr at sap.com>; David Holmes <david.holmes at oracle.com>; Aleksey Shipilev <shade at redhat.com>; hotspot-dev at openjdk.java.net; Lindenmaier, Goetz <goetz.lindenmaier at sap.com>
> Subject: Re: RFR 8201799: Build failures after JDK-8195099 (Concurrent safe-memory-reclamation mechanism)
>
> Hi Andrew,
>
> On 2018-04-23 12:20, Andrew Haley wrote:
>> On 04/18/2018 05:37 PM, Doerr, Martin wrote:
>>
>>> It'd be great if we could specify semantics for Atomic::add like we
>>> do for Atomic::cmpxchg.
>>> However, the double-fence is a very bad default from a performance
>>> perspective. I wonder if PPC64 is the only platform which gets hit.
>> Probably not.
>>
>>> Would it be acceptable to set the default to
>>> memory_order_acquire_release and specify memory_order_conservative
>>> for the new usage? I think this would fit perfectly for PPC64, but
>>> some people may not like it. One could say PPC64 is the problem, but
>>> one could also say the VM code is the problem.
>> Indeed. In as much as we use stronger memory ordering than we need in
>> more than one place, the VM code is the problem.
>>
>>> I don't really like the straightforward fix to insert a fence with
>>> #ifdef AIX.
>> I agree. It looks to me like it's sufficient if the increment of
>> global_counter and the load of thread->get_rcu_counter() are
>> sequentially consistent. On at least one platform, AArch64, we can
>> do that without additional fencing.
> I would also like to have seq_cst available. In the Access API, I need
> MO_SEQ_CST for JMM volatile support. I would like the semantic
> requirements to go straight into Atomic, instead of doing an "if
> (support_IRIW_for_not_multiple_copy_atomic_cpu) { OrderAccess::fence();
> }" dance in shared code, which may or may not make sense on a given
> platform (among other reasons because it dictates in shared code whether
> a leading or trailing sync convention is used, rather than leaving that
> decision to the platform level). Plus I have a mouse pad that says I
> love sequential consistency, although that might not be a valid argument
> for introducing this to Atomic.
>
> Thanks,
> /Erik