RFR(M): 8154736: enhancement of cmpxchg and copy_to_survivor for ppc64

Hiroshi H Horii HORII at jp.ibm.com
Mon Apr 25 07:09:30 UTC 2016


Hi David,

Thank you for your comments and questions.

> 1. Are the current cmpxchg semantics exactly the same as 
> memory_order_seq_cst?

This is very good question..

I guess, cmpxchg needs a more conservative constraint for memory ordering
than C++11, to add sync after a compare-and-exchange operation. 

Could someone give comments or thoughts?

memory_order_seq_cst is defined as 
    "Any operation with this memory order is both an acquire operation and 

     a release operation, plus a single total order exists in which all 
threads
     observe all modifications (see below) in the same order."
(http://en.cppreference.com/w/cpp/atomic/memory_order)

In my environment, g++ and xlc generate following assemblies on ppc64le.
(interestingly, they generates the same assemblies for any memory_order)

g++ (4.9.2)
    100008a4:   ac 04 00 7c     sync 
    100008a8:   28 50 20 7d     lwarx   r9,0,r10
    100008ac:   00 18 09 7c     cmpw    r9,r3
    100008b0:   0c 00 c2 40     bne-    100008bc
    100008b4:   2d 51 80 7c     stwcx.  r4,0,r10
    100008b8:   f0 ff c2 40     bne-    100008a8
    100008bc:   2c 01 00 4c     isync

xlc (13.1.3)
    10000888:   ac 04 00 7c     sync 
    1000088c:   28 28 c0 7c     lwarx   r6,0,r5
    10000890:   40 00 26 7c     cmpld   r6,r0
    10000894:   0c 00 82 40     bne     100008a0
    10000898:   2d 29 80 7c     stwcx.  r4,0,r5
    1000089c:   f0 ff e2 40     bne+    1000088c
    100008a0:   2c 01 00 4c     isync

On the other hand, the current OpenJDK generates following assemblies.

    508:   ac 04 00 7c     sync 
    50c:   00 00 5c e9     ld      r10,0(r28)
    510:   00 50 3b 7c     cmpd    r27,r10
    514:   1c 00 c2 40     bne-    530
    518:   a8 40 5c 7d     ldarx   r10,r28,r8
    51c:   00 50 3b 7c     cmpd    r27,r10
    520:   10 00 c2 40     bne-    530
    524:   ad 41 3c 7d     stdcx.  r9,r28,r8
    528:   f0 ff c2 40     bne-    518
    52c:   ac 04 00 7c     sync 
    530:   00 50 bb 7f     ...

Though we can ignore 50c-514 (because they are a duplicated guard 
condition), 
the last sync instruction (52c) makes cmpxchg more strict than 
memory_order_seq_cst.

In some cases, the last sync is necessary when this thread must be able to 
read
all of the changes in the other threads while executing from 508 to 530 
(that processes compare-and-exchange).

> 2. Has there been a discussion already, establishing that the modified 
> GC code can indeed use memory_order_relaxed? Otherwise who is 
> postulating that and based on what evidence?

Volker and his colleagues have investigated the current GC codes according 
to this.
http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016-April/019079.html
However, I believe, we need comments of other GC experts to change 
the shared codes.

Regards,
Hiroshi
-----------------------
Hiroshi Horii, Ph.D.
IBM Research - Tokyo


David Holmes <david.holmes at oracle.com> wrote on 04/22/2016 21:57:07:

> From: David Holmes <david.holmes at oracle.com>
> To: Hiroshi H Horii/Japan/IBM at IBMJP, hotspot-runtime-
> dev at openjdk.java.net, hotspot-gc-dev at openjdk.java.net
> Cc: Tim Ellison <Tim_Ellison at uk.ibm.com>, 
ppc-aix-port-dev at openjdk.java.net
> Date: 04/22/2016 21:58
> Subject: Re: RFR(M): 8154736: enhancement of cmpxchg and 
> copy_to_survivor for ppc64
> 
> Hi Hiroshi,
> 
> Two initial questions:
> 
> 1. Are the current cmpxchg semantics exactly the same as 
> memory_order_seq_cst?
> 
> 2. Has there been a discussion already, establishing that the modified 
> GC code can indeed use memory_order_relaxed? Otherwise who is 
> postulating that and based on what evidence?
> 
> Missing memory barriers have caused very difficult to track down bugs in 

> the past - very rare race conditions. So any relaxation here has to be 
> done with extreme confidence.
> 
> Thanks,
> David
> 
> On 22/04/2016 10:28 PM, Hiroshi H Horii wrote:
> > Dear all:
> >
> > Can I please request reviews for the following change?
> >
> > Code change:
> > http://cr.openjdk.java.net/~mdoerr/8154736_copy_to_survivor/webrev.00/
> > (I initially created and Martin enhanced so much)
> >
> > This change follows the discussion started from this mail.
> > http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016-
> April/018960.html
> >
> > Description:
> > This change provides relaxed compare-and-exchange by introducing
> > similar semantics of C++ atomic memory operators, enum memory_order.
> > As described in atomic_linux_ppc.inline.hpp, the current 
implementation of
> > cmpxchg is fence_cmpxchg_acquire. This implementation is useful for
> > general purposes because twice calls of sync before and after cmpxchg 
will
> > provide strict consistency. However, they sometimes cause overheads
> > because
> > sync instructions are very expensive in the current POWER chip design.
> > In addition, for the other platforms, such as aarch64, this strict
> > semantics
> > may cause some overheads (according to the Andrew's mail).
> > http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016-
> April/019073.html
> >
> > With this change, callers can explicitly specify constraints of memory
> > ordering
> > for cmpxchg with an additional parameter, memory_order order.
> >
> > typedef enum memory_order {
> >    memory_order_relaxed,
> >    memory_order_consume,
> >    memory_order_acquire,
> >    memory_order_release,
> >    memory_order_acq_rel,
> >    memory_order_seq_cst
> > } memory_order;
> >
> > Because the default value of the parameter is memory_order_seq_cst,
> > existing codes can use the same semantics of cmpxchg without any
> > modification. The relaxed cmpxchg is implemented only on ppc
> > in this changeset. Therefore, the behavior on the other platforms will
> > not be changed with this changeset.
> >
> > In addition, with the new parameter of cmpxchg, this change improves
> > performance of copy_to_survivor in the parallel GC.
> > copy_to_survivor changes forward pointers by using cmpxchg. This
> > operation doesn't require any sync instructions.  A pointer is changed
> > at most once in a GC and when cmpxchg fails, the latest pointer is
> > available for the caller. cas_set_mark and cas_forward_to are extended
> > with an additional memory_order parameter as cmpxchg and 
copy_to_survivor
> > uses memory_order_relaxed to modify the forward pointers.
> >
> > Summary of source code changes:
> >
> > * src/share/vm/runtime/atomic.hpp
> >       - Defines enum memory_order and adds a parameter to cmpxchg.
> >
> > * src/share/vm/runtime/atomic.cpp
> > * src/os_cpu/bsd_x86/vm/atomic_bsd_x86.inline.hpp
> > * src/os_cpu/bsd_zero/vm/atomic_bsd_zero.inline.hpp
> > * src/os_cpu/linux_aarch64/vm/atomic_linux_aarch64.inline.hpp
> > * src/os_cpu/linux_sparc/vm/atomic_linux_sparc.inline.hpp
> > * src/os_cpu/linux_x86/vm/atomic_linux_x86.inline.hpp
> > * src/os_cpu/linux_zero/vm/atomic_linux_zero.inline.hpp
> > * src/os_cpu/solaris_sparc/vm/atomic_solaris_sparc.inline.hpp
> > * src/os_cpu/solaris_x86/vm/atomic_solaris_x86.inline.hpp
> > * src/os_cpu/windows_x86/vm/atomic_windows_x86.inline.hpp
> >       - Added a parameter for each cmpxchg function to follow
> >          the change of atomic.hpp. Their implementations are not 
changed.
> >
> > * src/os_cpu/aix_ppc/vm/atomic_aix_ppc.inline.hpp
> > * src/os_cpu/linux_ppc/vm/atomic_linux_ppc.inline.hpp
> >       - Added a parameter for each cmpxchg function to follow
> >          the change of atomic.hpp. In addition, implementations
> >          are changed corresponding to the specified memory_order.
> >
> > * src/share/vm/oops/oop.hpp
> > * src/share/vm/oops/oop.inline.hpp
> >       - Add a memory_order parameter to use relaxed cmpxchg in
> >          cas_set_mark and cas_forward_to.
> >
> > * src/share/vm/gc/parallel/psPromotionManager.cpp
> > * src/share/vm/gc/parallel/psPromotionManager.inline.hpp
> >
> > Martin tested this changeset  on linuxx86_64, linuxppc64le and
> > darwinintel64.
> > Though more time is needed to test on the other platform, we would 
like to
> > ask
> > reviews and start discussion on this changeset.
> > I also tested this changeset with SPECjbb2013 and confirmed that gc 
pause
> > time
> > is reduced.
> >
> > Regards,
> > Hiroshi
> > -----------------------
> > Hiroshi Horii, Ph.D.
> > IBM Research - Tokyo
> >
> >
> 


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/attachments/20160425/3b4978c0/attachment-0001.html>


More information about the ppc-aix-port-dev mailing list