RFR(M): 8154736: enhancement of cmpxchg and copy_to_survivor for ppc64

David Holmes david.holmes at oracle.com
Wed May 4 05:55:29 UTC 2016


Hi Hiroshi,

Sorry for the delay on getting back to this.

On 25/04/2016 5:09 PM, Hiroshi H Horii wrote:
> Hi David,
>
> Thank you for your comments and questions.
>
>> 1. Are the current cmpxchg semantics exactly the same as
>> memory_order_seq_cst?
>
> This is very good question..
>
> I guess, cmpxchg needs a more conservative constraint for memory ordering
> than C++11, to add sync after a compare-and-exchange operation.
>
> Could someone give comments or thoughts?

I don't want to comment on the comparison with C++11. What I would 
prefer to see is an additional memory_order value (such as 
memory_order_ignored) which is the default for all methods declared to 
take a memory_order parameter. That way existing implementations are 
clearly ignoring the memory_order attribute and there is no potential 
for confusion as to whether the existing implementations equate to 
memory_order_seq_cst or not.

That said, I'm not sure it makes sense to add the memory_order parameter 
to all methods with "cas" in their name, e.g. oopDesc::cas_set_mark, 
oopDesc::cas_forward_to, unless those methods can sensibly be called 
with any value for memory_order - which seems highly unlikely. Perhaps 
those methods should identify the weakest form of memory_order they 
support and that should be hard-wired into them?

Thanks,
David

> memory_order_seq_cst is defined as
>     "Any operation with this memory order is both an acquire operation and
>      a release operation, plus a single total order exists in which all
> threads
>      observe all modifications (see below) in the same order."
> (http://en.cppreference.com/w/cpp/atomic/memory_order)
>
> In my environment, g++ and xlc generate following assemblies on ppc64le.
> (interestingly, they generates the same assemblies for any memory_order)
>
> g++ (4.9.2)
>     100008a4:   ac 04 00 7c     sync
>     100008a8:   28 50 20 7d     lwarx   r9,0,r10
>     100008ac:   00 18 09 7c     cmpw    r9,r3
>     100008b0:   0c 00 c2 40     bne-    100008bc
>     100008b4:   2d 51 80 7c     stwcx.  r4,0,r10
>     100008b8:   f0 ff c2 40     bne-    100008a8
>     100008bc:   2c 01 00 4c     isync
>
> xlc (13.1.3)
>     10000888:   ac 04 00 7c     sync
>     1000088c:   28 28 c0 7c     lwarx   r6,0,r5
>     10000890:   40 00 26 7c     cmpld   r6,r0
>     10000894:   0c 00 82 40     bne     100008a0
>     10000898:   2d 29 80 7c     stwcx.  r4,0,r5
>     1000089c:   f0 ff e2 40     bne+    1000088c
>     100008a0:   2c 01 00 4c     isync
>
> On the other hand, the current OpenJDK generates following assemblies.
>
>     508:   ac 04 00 7c     sync
>     50c:   00 00 5c e9     ld      r10,0(r28)
>     510:   00 50 3b 7c     cmpd    r27,r10
>     514:   1c 00 c2 40     bne-    530
>     518:   a8 40 5c 7d     ldarx   r10,r28,r8
>     51c:   00 50 3b 7c     cmpd    r27,r10
>     520:   10 00 c2 40     bne-    530
>     524:   ad 41 3c 7d     stdcx.  r9,r28,r8
>     528:   f0 ff c2 40     bne-    518
>     52c:   ac 04 00 7c     sync
>     530:   00 50 bb 7f     ...
>
> Though we can ignore 50c-514 (because they are a duplicated guard
> condition),
> the last sync instruction (52c) makes cmpxchg more strict than
> memory_order_seq_cst.
>
> In some cases, the last sync is necessary when this thread must be able
> to read
> all of the changes in the other threads while executing from 508 to 530
> (that processes compare-and-exchange).
>
>> 2. Has there been a discussion already, establishing that the modified
>> GC code can indeed use memory_order_relaxed? Otherwise who is
>> postulating that and based on what evidence?
>
> Volker and his colleagues have investigated the current GC codes
> according to this.
> http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016-April/019079.html
> However, I believe, we need comments of other GC experts to change
> the shared codes.
>
> Regards,
> Hiroshi
> -----------------------
> Hiroshi Horii, Ph.D.
> IBM Research - Tokyo
>
>
> David Holmes <david.holmes at oracle.com> wrote on 04/22/2016 21:57:07:
>
>> From: David Holmes <david.holmes at oracle.com>
>> To: Hiroshi H Horii/Japan/IBM at IBMJP, hotspot-runtime-
>> dev at openjdk.java.net, hotspot-gc-dev at openjdk.java.net
>> Cc: Tim Ellison <Tim_Ellison at uk.ibm.com>,
> ppc-aix-port-dev at openjdk.java.net
>> Date: 04/22/2016 21:58
>> Subject: Re: RFR(M): 8154736: enhancement of cmpxchg and
>> copy_to_survivor for ppc64
>>
>> Hi Hiroshi,
>>
>> Two initial questions:
>>
>> 1. Are the current cmpxchg semantics exactly the same as
>> memory_order_seq_cst?
>>
>> 2. Has there been a discussion already, establishing that the modified
>> GC code can indeed use memory_order_relaxed? Otherwise who is
>> postulating that and based on what evidence?
>>
>> Missing memory barriers have caused very difficult to track down bugs in
>> the past - very rare race conditions. So any relaxation here has to be
>> done with extreme confidence.
>>
>> Thanks,
>> David
>>
>> On 22/04/2016 10:28 PM, Hiroshi H Horii wrote:
>> > Dear all:
>> >
>> > Can I please request reviews for the following change?
>> >
>> > Code change:
>> > http://cr.openjdk.java.net/~mdoerr/8154736_copy_to_survivor/webrev.00/
>> > (I initially created and Martin enhanced so much)
>> >
>> > This change follows the discussion started from this mail.
>> > http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016-
>> April/018960.html
>> >
>> > Description:
>> > This change provides relaxed compare-and-exchange by introducing
>> > similar semantics of C++ atomic memory operators, enum memory_order.
>> > As described in atomic_linux_ppc.inline.hpp, the current
> implementation of
>> > cmpxchg is fence_cmpxchg_acquire. This implementation is useful for
>> > general purposes because twice calls of sync before and after
> cmpxchg will
>> > provide strict consistency. However, they sometimes cause overheads
>> > because
>> > sync instructions are very expensive in the current POWER chip design.
>> > In addition, for the other platforms, such as aarch64, this strict
>> > semantics
>> > may cause some overheads (according to the Andrew's mail).
>> > http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016-
>> April/019073.html
>> >
>> > With this change, callers can explicitly specify constraints of memory
>> > ordering
>> > for cmpxchg with an additional parameter, memory_order order.
>> >
>> > typedef enum memory_order {
>> >    memory_order_relaxed,
>> >    memory_order_consume,
>> >    memory_order_acquire,
>> >    memory_order_release,
>> >    memory_order_acq_rel,
>> >    memory_order_seq_cst
>> > } memory_order;
>> >
>> > Because the default value of the parameter is memory_order_seq_cst,
>> > existing codes can use the same semantics of cmpxchg without any
>> > modification. The relaxed cmpxchg is implemented only on ppc
>> > in this changeset. Therefore, the behavior on the other platforms will
>> > not be changed with this changeset.
>> >
>> > In addition, with the new parameter of cmpxchg, this change improves
>> > performance of copy_to_survivor in the parallel GC.
>> > copy_to_survivor changes forward pointers by using cmpxchg. This
>> > operation doesn't require any sync instructions.  A pointer is changed
>> > at most once in a GC and when cmpxchg fails, the latest pointer is
>> > available for the caller. cas_set_mark and cas_forward_to are extended
>> > with an additional memory_order parameter as cmpxchg and
> copy_to_survivor
>> > uses memory_order_relaxed to modify the forward pointers.
>> >
>> > Summary of source code changes:
>> >
>> > * src/share/vm/runtime/atomic.hpp
>> >       - Defines enum memory_order and adds a parameter to cmpxchg.
>> >
>> > * src/share/vm/runtime/atomic.cpp
>> > * src/os_cpu/bsd_x86/vm/atomic_bsd_x86.inline.hpp
>> > * src/os_cpu/bsd_zero/vm/atomic_bsd_zero.inline.hpp
>> > * src/os_cpu/linux_aarch64/vm/atomic_linux_aarch64.inline.hpp
>> > * src/os_cpu/linux_sparc/vm/atomic_linux_sparc.inline.hpp
>> > * src/os_cpu/linux_x86/vm/atomic_linux_x86.inline.hpp
>> > * src/os_cpu/linux_zero/vm/atomic_linux_zero.inline.hpp
>> > * src/os_cpu/solaris_sparc/vm/atomic_solaris_sparc.inline.hpp
>> > * src/os_cpu/solaris_x86/vm/atomic_solaris_x86.inline.hpp
>> > * src/os_cpu/windows_x86/vm/atomic_windows_x86.inline.hpp
>> >       - Added a parameter for each cmpxchg function to follow
>> >          the change of atomic.hpp. Their implementations are not
> changed.
>> >
>> > * src/os_cpu/aix_ppc/vm/atomic_aix_ppc.inline.hpp
>> > * src/os_cpu/linux_ppc/vm/atomic_linux_ppc.inline.hpp
>> >       - Added a parameter for each cmpxchg function to follow
>> >          the change of atomic.hpp. In addition, implementations
>> >          are changed corresponding to the specified memory_order.
>> >
>> > * src/share/vm/oops/oop.hpp
>> > * src/share/vm/oops/oop.inline.hpp
>> >       - Add a memory_order parameter to use relaxed cmpxchg in
>> >          cas_set_mark and cas_forward_to.
>> >
>> > * src/share/vm/gc/parallel/psPromotionManager.cpp
>> > * src/share/vm/gc/parallel/psPromotionManager.inline.hpp
>> >
>> > Martin tested this changeset  on linuxx86_64, linuxppc64le and
>> > darwinintel64.
>> > Though more time is needed to test on the other platform, we would
> like to
>> > ask
>> > reviews and start discussion on this changeset.
>> > I also tested this changeset with SPECjbb2013 and confirmed that gc
> pause
>> > time
>> > is reduced.
>> >
>> > Regards,
>> > Hiroshi
>> > -----------------------
>> > Hiroshi Horii, Ph.D.
>> > IBM Research - Tokyo
>> >
>> >
>>
>


More information about the hotspot-runtime-dev mailing list