RFR(M): 8155949: Support relaxed semantics in cmpxchg

Wed May 18 12:32:15 UTC 2016

Hi David,

ok, this comment specifies it clear enough:
   // these semantics reflect the strength of atomic operations that are
   // provided on SPARC/X86.
So my change does the right thing :-)

Thanks for your quick response and especially for pushing this change forward.

Best regards,
Martin

-----Original Message-----
From: David Holmes [mailto:david.holmes at oracle.com] 
Sent: Mittwoch, 18. Mai 2016 13:53
To: Doerr, Martin <martin.doerr at sap.com>; Kim Barrett <kim.barrett at oracle.com>
Cc: Hiroshi H Horii <HORII at jp.ibm.com>; Tim Ellison <Tim_Ellison at uk.ibm.com>; ppc-aix-port-dev at openjdk.java.net; hotspot-gc-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net
Subject: Re: RFR(M): 8155949: Support relaxed semantics in cmpxchg

On 18/05/2016 9:08 PM, Doerr, Martin wrote:
> Hi David,
>
> in comparison to C++11 or JEP 193, the hotspot C++ semantics are kind of unprecise for PPC64. That's the reason why we use 2 sync instructions which is the maximum conservative implementation.
>
> C++11 and JEP 193 specify which barriers are needed in case cmpxchg fails.

The hotspot semantics are quite simple - no difference between success 
or failure - just two-way barriers around the operations.

> In addition, I think one could implement " two-way memory barrier across that operation" for example as lwsync+cmpxchg+sync on PPC64 as well. But this wouldn't be multi-copy-atomic. It's unclear if this property is needed.

Yes multi-copy-atomicity is implicit in the "two way barriers" - nothing 
can be reordered in relation to the operation, so implicitly all 
observers see the same thing at the same time.

This may well be stronger than required by actual algorithms using the 
operations but as the comment block continues:

   // these semantics reflect the strength of atomic operations that are
   // provided on SPARC/X86. We assume that strength is necessary unless
   // we can prove that a weaker form is sufficiently safe.

Cheers,
David
-----

> Best regards,
> Martin
>
>
> -----Original Message-----
> From: David Holmes [mailto:david.holmes at oracle.com]
> Sent: Mittwoch, 18. Mai 2016 12:52
> To: Doerr, Martin <martin.doerr at sap.com>; Kim Barrett <kim.barrett at oracle.com>
> Cc: Hiroshi H Horii <HORII at jp.ibm.com>; Tim Ellison <Tim_Ellison at uk.ibm.com>; ppc-aix-port-dev at openjdk.java.net; hotspot-gc-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net
> Subject: Re: RFR(M): 8155949: Support relaxed semantics in cmpxchg
>
> On 18/05/2016 8:12 PM, Doerr, Martin wrote:
>> Hi Kim,
>>
>> thank you very much for the detailed review.
>>
>> I agree with your comments and I have made all your requested changes here:
>> http://cr.openjdk.java.net/~goetz/wr16/8155949-relaxed_cas/webrev.03/
>>
>> It's correct that the change changes the semantics of the conservative cmpxchg. In case of failure, we also execute the sync instruction, now.
>> Advantage is that the new implementation is maximum conservative by default. I think this makes sense as long as the semantics of the hotspot C++ cmpxchg are not clearly specified.
>
> What further specification are you looking for:
>
>    // All of the atomic operations that imply a read-modify-write action
>    // guarantee a two-way memory barrier across that operation.
>
> ??
>
> David
>
>
>> For performance optimization, we should better use (or introduce additional) enum values.
>>
>> Thanks and best regards,
>> Martin
>>
>>
>> -----Original Message-----
>> From: Kim Barrett [mailto:kim.barrett at oracle.com]
>> Sent: Mittwoch, 18. Mai 2016 03:26
>> To: Doerr, Martin <martin.doerr at sap.com>
>> Cc: Hiroshi H Horii <HORII at jp.ibm.com>; David Holmes <david.holmes at oracle.com>; Tim Ellison <Tim_Ellison at uk.ibm.com>; ppc-aix-port-dev at openjdk.java.net; hotspot-gc-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net
>> Subject: Re: RFR(M): 8155949: Support relaxed semantics in cmpxchg
>>
>>> On May 10, 2016, at 10:27 AM, Doerr, Martin <martin.doerr at sap.com> wrote:
>>>
>>> Hello everybody,
>>>
>>> thanks for finding this issue. New webrev is here:
>>> http://cr.openjdk.java.net/~mdoerr/8155949_relaxed_cas/webrev.02/
>>>
>>
>> ------------------------------------------------------------------------------
>> src/share/vm/runtime/atomic.hpp
>>   30 typedef enum cmpxchg_cmpxchg_memory_order {
>>   31   memory_order_relaxed,
>>   32   // Use value which doesn't interfere with C++2011. We need to be more conservative.
>>   33   memory_order_conservative = 8
>>   34 } cmpxchg_memory_order;
>>
>> This is C++, where enum tag names are types, so we don't need a
>> typedef here. Just use "enum cmpxchg_memory_order { ... };".
>>
>> ------------------------------------------------------------------------------
>> src/share/vm/runtime/atomic.cpp
>> 59 unsigned Atomic::cmpxchg(unsigned int exchange_value,
>>   60                            volatile unsigned int* dest, unsigned int compare_value,
>>   61                            cmpxchg_memory_order order) {
>>
>> Misaligned parameters.
>>
>> I'm surprised this was ever out-of-line. But with this change it's
>> quite bad to be out-of-line, as that's going to kill the constant
>> propogation of the order value.
>>
>> ------------------------------------------------------------------------------
>> src/os_cpu/linux_ppc/vm/atomic_linux_ppc.inline.hpp
>>
>> In each use, the cmpxchg_post_membar's are after the exit label,
>> whereas the acquire fences they are replacing were before the exit
>> label.  This means we'll be fencing on failure exit, where we weren't
>> doing so before.
>>
>> It's not clear whether this change is intentional.  Note that this
>> change is consistent with the C++11 one-order forms of cmpxchg, where
>> the single order argument is used as the sucess order and (with
>> potentially some modification) as the failure order.
>>
>> [I was going to suggest the asm goto syntax might be used to obtain
>> the original ordering, but "An asm goto statement cannot have
>> outputs." So some non-trivial restructuring will probably be needed to
>> get the original ordering.]
>>
>> Similarly in src/os_cpu/aix_ppc/vm/atomic_aix_ppc.inline.hpp.
>>
>> ------------------------------------------------------------------------------
>> src/os_cpu/linux_ppc/vm/atomic_linux_ppc.inline.hpp
>>
>> These repeated comments need updating:
>>  315   // Note that cmpxchg guarantees a two-way memory barrier across
>>  316   // the cmpxchg, so it's really a a 'fence_cmpxchg_acquire'
>>  317   // (see atomic.hpp).
>>
>> Similarly in src/os_cpu/aix_ppc/vm/atomic_aix_ppc.inline.hpp.
>>
>> ------------------------------------------------------------------------------
>> src/os_cpu/linux_ppc/vm/atomic_linux_ppc.inline.hpp
>>
>> [pre-existing]
>>
>> The cmpxchg asm sequences all originally looked like
>>
>>   /* fence */
>>   strasm_sync
>>   ...
>>   /* acquire */
>>   strasm_sync
>>   ...
>>
>> So they were using strasm_sync (the full fence) in both places, even
>> though the comments suggest it could/should have been
>>   strasm_fence ... strasm_acquire
>> However, the description in runtime/atomic.hpp seems to indicate
>> something stronger than "acquire" is required here, so the second
>> comment seems wrong. Maybe its a good thing the comments are being
>> removed by the proposed change.
>>
>> Similarly in other corresponding places in other files.
>>
>> ------------------------------------------------------------------------------
>> src/os_cpu/linux_aarch64/vm/atomic_linux_aarch64.inline.hpp
>>  138   return (void *) cmpxchg_ptr((intptr_t) exchange_value,
>>  139                               (volatile intptr_t*) dest,
>>  140                               (intptr_t) compare_value, order);
>>
>> I'd prefer the order argument be placed on its own line, rather than
>> added to an existing line where it's kind of hiding.
>>
>> Similarly in other corresponding places in other files.
>>
>> ------------------------------------------------------------------------------
>> src/os_cpu/bsd_zero/vm/atomic_bsd_zero.inline.hpp
>>  271 inline jint Atomic::cmpxchg(jint exchange_value,
>>  272                             volatile jint* dest,
>>  273                             jint compare_value, cmpxchg_memory_order order) {
>>
>> and similarly elsewhere, I'd prefer the order parameter be on it's own
>> line like the other parameters.
>>
>> Similarly in other corresponding places here and in other files.
>>
>> ------------------------------------------------------------------------------
>> src/os_cpu/bsd_zero/vm/atomic_bsd_zero.inline.hpp
>>  306 inline void* Atomic::cmpxchg_ptr(void* exchange_value,
>>  307                                  volatile void* dest,
>>  308                                  void* compare_value, cmpxchg_memory_order order) {
>>  309
>>  310   return (void *) cmpxchg_ptr((intptr_t) exchange_value,
>>  311                               (volatile intptr_t*) dest,
>>  312                               (intptr_t) compare_value);
>>  313 }
>>
>> Inner cmpxchg_ptr is missing the order argument. This will discard an
>> outer relaxed order.  (atomic_linux_zero is OK.)
>>
>> ------------------------------------------------------------------------------
>> src/share/vm/runtime/atomic.inline.hpp
>>
>> The unspecialized Atomic::cmpxchg for jbyte isn't passing the order
>> argument through to cmpxchg_general.  Of course, then we might want to
>> figure out what cmpxchg_general should be doing with the order.
>>
>> ------------------------------------------------------------------------------
>>