[aarch64-port-dev ] RFR: aarch64: minor improvements of atomic operations
Erik Österlund
erik.osterlund at oracle.com
Tue Nov 12 17:38:11 UTC 2019
Hi Felix,
I was hoping to stay out of this conversation, but couldn't resist
butting in unfortunately.
I have to agree with you - you are absolutely right. We have a mix of
the JMM, the C++ memory model, and HotSpot's own memory model, which
predates both. The JMM and the C++ memory model are indeed quite similar
now in terms of semantics (though there is some choice in how to
implement them), but the old memory model used in HotSpot is not. Ideally
we would have fewer memory models and just go with the one used by
C++/the JMM, and then we would only have to convince ourselves that the
compiler's implementation of seq_cst is compatible with the one we use
to implement the JMM in our JIT-compiled code. But it seems to me that
we are not there.
Last time I discussed this with Andrew Haley, we disagreed and didn't
really get anywhere. Andrew wanted to use the GCC intrinsics, and I was
arguing that we should use inline assembly, because a) the memory model
we are supporting is not the same as the one the intrinsics provide, and
b) we are relying on the implementation of the intrinsics to emit very
specific instruction sequences that are compatible with our memory
model, and it would be clearer if the inline assembly showed that we
used exactly the instructions we expected and not something unexpected,
which we would otherwise only find out by disassembling the code (ahem).
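To make that concrete, here is the kind of thing I mean - purely an
illustrative sketch, not the actual HotSpot code, with made-up operand
assignments (x1 = dest, x2 = expected value, x4 = new value):

```asm
// Sketch only: a "conservative" cmpxchg written so that the exact
// instruction sequence is visible in the source, rather than hidden
// behind an intrinsic.
1:      ldaxr   x0, [x1]        // exclusive load-acquire of *dest
        cmp     x0, x2          // compare with the expected value
        b.ne    2f              // mismatch: fail without storing
        stlxr   w3, x4, [x1]    // exclusive store-release of new value
        cbnz    w3, 1b          // lost the exclusive monitor: retry
2:      dmb     ish             // trailing full barrier
```

With something like this in the source, a reviewer can check the
sequence against the memory model directly instead of trusting the
compiler's expansion of the intrinsic.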
Now it looks like you have discovered that we sometimes emit a double
trailing dmb ish, and sometimes lack a leading dmb ish, if I am reading
this right. That makes the case stronger: by looking at the intrinsic
calls, it is not obvious what instruction sequence will be emitted, nor
whether that sequence is compatible with the memory model it is
implementing - you really have to disassemble it to find out what we
actually got. And it looks like what we got is not at all what we
wanted.
My hope is that the AArch64 port should use inline assembly as you
suggest, so we can see that the generated code is correct, as we wait
for the glorious future where all HotSpot code has been rewritten to
work with seq_cst (and we are *not* there now).
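For the exchange case in particular, an inline-assembly version could
also avoid the redundant second barrier that the intrinsic-based code
produces. Again only a sketch with hypothetical operands (x1 = dest,
x2 = new value, old value returned in x0) - whether the acquire/release
pairing plus a single trailing dmb is sufficient for our conservative
semantics is exactly the kind of thing that would need careful review:

```asm
// Hypothetical xchg sketch, not the actual HotSpot implementation.
1:      ldaxr   x0, [x1]        // exclusive load-acquire (old value)
        stlxr   w3, x2, [x1]    // exclusive store-release (new value)
        cbnz    w3, 1b          // retry if the exclusive store failed
        dmb     ish             // single trailing full barrier
```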
Having said that, now I will try to go and hide in a corner again...
Thanks,
/Erik
On 2019-11-12 13:14, Yangfei (Felix) wrote:
>> -----Original Message-----
>> From: Yangfei (Felix)
>> Sent: Tuesday, November 12, 2019 8:03 PM
>> To: 'Andrew Dinn' <adinn at redhat.com>; Andrew Haley <aph at redhat.com>;
>> aarch64-port-dev at openjdk.java.net
>> Cc: hotspot-dev at openjdk.java.net
>> Subject: RE: [aarch64-port-dev ] RFR: aarch64: minor improvements of atomic
>> operations
>>
>>> -----Original Message-----
>>> From: Andrew Dinn [mailto:adinn at redhat.com]
>>> Sent: Tuesday, November 12, 2019 5:42 PM
>>> To: Andrew Haley <aph at redhat.com>; Yangfei (Felix)
>>> <felix.yang at huawei.com>; aarch64-port-dev at openjdk.java.net
>>> Cc: hotspot-dev at openjdk.java.net
>>> Subject: Re: [aarch64-port-dev ] RFR: aarch64: minor improvements of
>>> atomic operations
>>>
>>> On 12/11/2019 09:25, Andrew Haley wrote:
>>>> On 11/12/19 8:37 AM, Yangfei (Felix) wrote:
>>>>> This has been discussed somewhere before:
>>>>> https://patchwork.kernel.org/patch/3575821/
>>>>> Let's keep the current status, to be safe.
>>>> Yes.
>>>>
>>>> It's been interesting to see the progress of this patch. I don't
>>>> think it's the first time that someone has been tempted to change
>>>> this code to make it "more efficient".
>>>>
>>>> I wonder if we could perhaps add a comment to that code so that it
>>>> doesn't happen again. I'm not sure exactly what the patch should say
>>>> beyond "do not touch". Perhaps something along the lines of "Do not
>>>> touch this code unless you have at least Black Belt, 4th Dan in
>>>> memory ordering." :-)
>>>>
>>>> More seriously, maybe simply "Note that memory_order_conservative
>>>> requires a full barrier after atomic stores. See
>>>> https://patchwork.kernel.org/patch/3575821/"
>>> Yes, that would be a help. It's particularly easy to get confused here
>>> because we happily omit the ordering of an stlr store wrt subsequent
>>> stores when the stlr is implementing a Java volatile write or a Java cmpxchg.
>>>
>>> So, it might be worth adding a rider that implementing the full
>>> memory_order_conservative semantics is necessary because VM code
>>> relies on the strong ordering wrt writes that the cmpxchg is required to
>> provide.
>> I also suggest we implement these functions with inline assembly here.
>> For Atomic::PlatformXchg, we may issue two consecutive full memory barriers
>> with the current status.
>> I used GCC 7.3.0 to compile the following function:
>>
>> $ cat test.c
>> long foo(long add_value, long volatile* dest, long exchange_value) {
>>     long val = __sync_lock_test_and_set(dest, exchange_value);
>>
>>     __sync_synchronize();
>>
>>     return val;
>> }
>>
>> $ cat test.s
>>         .arch   armv8-a
>>         .file   "test.c"
>>         .text
>>         .align  2
>>         .p2align 3,,7
>>         .global foo
>>         .type   foo, %function
>> foo:
>> .L2:
>>         ldxr    x0, [x1]
>>         stxr    w3, x2, [x1]
>>         cbnz    w3, .L2
>>         dmb     ish     < ========
>>         dmb     ish     < ========
>>         ret
>>         .size   foo, .-foo
>>         .ident  "GCC: (GNU) 7.3.0"
>>         .section        .note.GNU-stack,"",@progbits
> Also, the generated code differs from the following sequence (stxr instead of stlxr).
>
> <Access [A]>
>
> // atomic_op (B)
> 1: ldxr x0, [B] // Exclusive load
> <op(B)>
> stlxr w1, x0, [B] // Exclusive store with release
> cbnz w1, 1b
> dmb ish // Full barrier
>
> <Access [C]>
>
> I think the two-way (full barrier) ordering may not be ensured in this case.
>
> Felix