[aarch64-port-dev ] RFR: aarch64: minor improvements of atomic operations

Yangfei (Felix) felix.yang at huawei.com
Tue Nov 12 12:02:34 UTC 2019


> -----Original Message-----
> From: Andrew Dinn [mailto:adinn at redhat.com]
> Sent: Tuesday, November 12, 2019 5:42 PM
> To: Andrew Haley <aph at redhat.com>; Yangfei (Felix)
> <felix.yang at huawei.com>; aarch64-port-dev at openjdk.java.net
> Cc: hotspot-dev at openjdk.java.net
> Subject: Re: [aarch64-port-dev ] RFR: aarch64: minor improvements of atomic
> operations
> 
> On 12/11/2019 09:25, Andrew Haley wrote:
> > On 11/12/19 8:37 AM, Yangfei (Felix) wrote:
> >> This has been discussed somewhere before:
> >> https://patchwork.kernel.org/patch/3575821/
> >> Let's keep the current status for safe.
> >
> > Yes.
> >
> > It's been interesting to see the progress of this patch. I don't think
> > it's the first time that someone has been tempted to change this code
> > to make it "more efficient".
> >
> > I wonder if we could perhaps add a comment to that code so that it
> > doesn't happen again. I'm not sure exactly what the patch should say
> > beyond "do not touch". Perhaps something along the lines of "Do not
> > touch this code unless you have at least Black Belt, 4th Dan in memory
> > ordering."  :-)
> >
> > More seriously, maybe simply "Note that memory_order_conservative
> > requires a full barrier after atomic stores. See
> > https://patchwork.kernel.org/patch/3575821/"
> Yes, that would be a help. It's particularly easy to get confused here because
> we happily omit the ordering of an stlr store wrt subsequent stores when the
> strl is implementing a Java volatile write or a Java cmpxchg.
> 
> So, it might be worth adding a rider that implementing the full
> memory_order_conservative semantics is necessary because VM code relies
> on the strong ordering wrt writes that the cmpxchg is required to provide.
> 

I also suggest we implement these functions with inline assembly here.  
For Atomic::PlatformXchg, we may issue two consecutive full memory barriers with the current status.  
I used GCC 7.3.0 to compile the following function:

$ cat test.c
long foo(long add_value, long volatile* dest, long exchange_value)
{
  long val = __sync_lock_test_and_set(dest, exchange_value);

  __sync_synchronize();

  return val;
}

$ cat test.s
        .arch armv8-a
        .file   "test.c"
        .text
        .align  2
        .p2align 3,,7
        .global foo
        .type   foo, %function
foo:
.L2:
        ldxr    x0, [x1]
        stxr    w3, x2, [x1]
        cbnz    w3, .L2
        dmb     ish           < ========
        dmb     ish           < ========
        ret
        .size   foo, .-foo
        .ident  "GCC: (GNU) 7.3.0"
        .section        .note.GNU-stack,"", at progbits


More information about the hotspot-dev mailing list