[aarch64-port-dev ] RFR: aarch64: minor improvements of atomic operations

Tue Nov 12 12:14:48 UTC 2019

> -----Original Message-----
> From: Yangfei (Felix)
> Sent: Tuesday, November 12, 2019 8:03 PM
> To: 'Andrew Dinn' <adinn at redhat.com>; Andrew Haley <aph at redhat.com>;
> aarch64-port-dev at openjdk.java.net
> Cc: hotspot-dev at openjdk.java.net
> Subject: RE: [aarch64-port-dev ] RFR: aarch64: minor improvements of atomic
> operations
> 
> > -----Original Message-----
> > From: Andrew Dinn [mailto:adinn at redhat.com]
> > Sent: Tuesday, November 12, 2019 5:42 PM
> > To: Andrew Haley <aph at redhat.com>; Yangfei (Felix)
> > <felix.yang at huawei.com>; aarch64-port-dev at openjdk.java.net
> > Cc: hotspot-dev at openjdk.java.net
> > Subject: Re: [aarch64-port-dev ] RFR: aarch64: minor improvements of
> > atomic operations
> >
> > On 12/11/2019 09:25, Andrew Haley wrote:
> > > On 11/12/19 8:37 AM, Yangfei (Felix) wrote:
> > >> This has been discussed somewhere before:
> > >> https://patchwork.kernel.org/patch/3575821/
> > >> Let's keep the current status for safe.
> > >
> > > Yes.
> > >
> > > It's been interesting to see the progress of this patch. I don't
> > > think it's the first time that someone has been tempted to change
> > > this code to make it "more efficient".
> > >
> > > I wonder if we could perhaps add a comment to that code so that it
> > > doesn't happen again. I'm not sure exactly what the patch should say
> > > beyond "do not touch". Perhaps something along the lines of "Do not
> > > touch this code unless you have at least Black Belt, 4th Dan in
> > > memory ordering."  :-)
> > >
> > > More seriously, maybe simply "Note that memory_order_conservative
> > > requires a full barrier after atomic stores. See
> > > https://patchwork.kernel.org/patch/3575821/"
> > Yes, that would be a help. It's particularly easy to get confused here
> > because we happily omit the ordering of an stlr store wrt subsequent
> > stores when the strl is implementing a Java volatile write or a Java cmpxchg.
> >
> > So, it might be worth adding a rider that implementing the full
> > memory_order_conservative semantics is necessary because VM code
> > relies on the strong ordering wrt writes that the cmpxchg is required to
> provide.
> >
> 
> I also suggest we implement these functions with inline assembly here.
> For Atomic::PlatformXchg, we may issue two consecutive full memory barriers
> with the current status.
> I used GCC 7.3.0 to compile the following function:
> 
> $ cat test.c
> long foo(long add_value, long volatile* dest, long exchange_value) {
>   long val = __sync_lock_test_and_set(dest, exchange_value);
> 
>   __sync_synchronize();
> 
>   return val;
> }
> 
> $ cat test.s
>         .arch armv8-a
>         .file   "test.c"
>         .text
>         .align  2
>         .p2align 3,,7
>         .global foo
>         .type   foo, %function
> foo:
> .L2:
>         ldxr    x0, [x1]
>         stxr    w3, x2, [x1]
>         cbnz    w3, .L2
>         dmb     ish           < ========
>         dmb     ish           < ========
>         ret
>         .size   foo, .-foo
>         .ident  "GCC: (GNU) 7.3.0"
>         .section        .note.GNU-stack,"", at progbits

Also this is different from the following sequence (stxr instead of stlxr).  

	<Access [A]>

	// atomic_op (B)
1:	ldxr	x0, [B]		// Exclusive load
	<op(B)>
	stlxr	w1, x0, [B]	// Exclusive store with release
	cbnz	w1, 1b
	dmb	ish		// Full barrier

	<Access [C]>

I think the two-way memory barrier may not be ensured for this case.  

Felix