RFR(M): 8154736: enhancement of cmpxchg and copy_to_survivor for ppc64

Hiroshi H Horii HORII at jp.ibm.com
Wed Apr 27 03:34:12 UTC 2016


Hi Martin,

> I think we had better use our own enum (e.g. like AccessKind in
> library_call.cpp).
> Otherwise we’ll get trouble when we switch to C++11.  Would you agree?

I agree. 

I think that, if we use our own enum with the C++11 semantics, callers of
cmpxchg need to issue a memory barrier after cmpxchg whenever all updates
made by other threads must be visible to the instructions that follow the
cmpxchg. Is that correct?
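
To make the question concrete, here is a minimal sketch in standard C++11
(not HotSpot code). The helper name cmpxchg_relaxed_then_fence and its
fence_after flag are my own assumptions for illustration. It performs a
relaxed compare-and-exchange and lets the caller request a full fence
afterwards.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helper, not part of HotSpot: a relaxed compare-and-exchange
// followed by an explicit full fence when the caller asks for one.
// Returns the value observed in *dest (equal to `expected` on success).
inline long cmpxchg_relaxed_then_fence(std::atomic<long>* dest,
                                       long expected,
                                       long desired,
                                       bool fence_after) {
  long observed = expected;
  // On failure, compare_exchange_strong writes the current value to `observed`.
  dest->compare_exchange_strong(observed, desired,
                                std::memory_order_relaxed,
                                std::memory_order_relaxed);
  if (fence_after) {
    // Full two-way barrier requested by the caller.
    std::atomic_thread_fence(std::memory_order_seq_cst);
  }
  return observed;
}

// Example use: the first CAS succeeds, the second fails and reports
// the value that is currently stored.
inline void example() {
  std::atomic<long> word(0);
  assert(cmpxchg_relaxed_then_fence(&word, 0, 42, true) == 0);
  assert(cmpxchg_relaxed_then_fence(&word, 0, 7, false) == 42);
  assert(word.load() == 42);
}
```

With such a split, only the callers that really need the trailing barrier
would pay for it.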

> Would it be better to split this bug into 2 and discuss the cmpxchg 
> interface change on the runtime list and the GC change on the gc list?

Do you mean that a new cmpxchg with relaxed semantics will be added first
and then used in the GC change? Or will the discussion of the GC change
start only after the discussion of the new cmpxchg interface?

Regards,
Hiroshi
-----------------------
Hiroshi Horii, Ph.D.
IBM Research - Tokyo


"Doerr, Martin" <martin.doerr at sap.com> wrote on 04/25/2016 19:25:15:

> From: "Doerr, Martin" <martin.doerr at sap.com>
> To: Hiroshi H Horii/Japan/IBM at IBMJP, David Holmes
> <david.holmes at oracle.com>
> Cc: "hotspot-gc-dev at openjdk.java.net"
> <hotspot-gc-dev at openjdk.java.net>,
> "hotspot-runtime-dev at openjdk.java.net"
> <hotspot-runtime-dev at openjdk.java.net>,
> "ppc-aix-port-dev at openjdk.java.net"
> <ppc-aix-port-dev at openjdk.java.net>, Tim Ellison
> <Tim_Ellison at uk.ibm.com>, Volker Simonis
> <volker.simonis at gmail.com>, "Lindenmaier, Goetz"
> <goetz.lindenmaier at sap.com>
> Date: 04/25/2016 19:26
> Subject: RE: RFR(M): 8154736: enhancement of cmpxchg and 
> copy_to_survivor for ppc64
> 
> Hi David and Hiroshi,
> 
> thank you very much for this interesting question and analysis.
> 
> I think we had better use our own enum (e.g. like AccessKind in
> library_call.cpp).
> Otherwise we’ll get trouble when we switch to C++11.  Would you agree?
> 
> Would it be better to split this bug into 2 and discuss the cmpxchg 
> interface change on the runtime list and the GC change on the gc list?
> 
> Best regards,
> Martin
> 
> From: Hiroshi H Horii [mailto:HORII at jp.ibm.com] 
> Sent: Montag, 25. April 2016 09:10
> To: David Holmes <david.holmes at oracle.com>
> Cc: hotspot-gc-dev at openjdk.java.net;
> hotspot-runtime-dev at openjdk.java.net;
> ppc-aix-port-dev at openjdk.java.net; Tim Ellison
> <Tim_Ellison at uk.ibm.com>; Volker Simonis <volker.simonis at gmail.com>;
> Doerr, Martin <martin.doerr at sap.com>; Lindenmaier, Goetz
> <goetz.lindenmaier at sap.com>
> Subject: Re: RFR(M): 8154736: enhancement of cmpxchg and 
> copy_to_survivor for ppc64
> 
> Hi David,
> 
> Thank you for your comments and questions.
> 
> > 1. Are the current cmpxchg semantics exactly the same as 
> > memory_order_seq_cst?
> 
> This is a very good question.
> 
> I guess HotSpot's cmpxchg requires a more conservative memory-ordering
> constraint than C++11 does, which is why a sync is added after the
> compare-and-exchange operation.
> 
> Could someone give comments or thoughts?
> 
> memory_order_seq_cst is defined as
>     "Any operation with this memory order is both an acquire operation
>      and a release operation, plus a single total order exists in which
>      all threads observe all modifications (see below) in the same order."
> (http://en.cppreference.com/w/cpp/atomic/memory_order)
> 
> In my environment, g++ and xlc generate the following assembly on ppc64le.
> (Interestingly, they generate the same assembly for any memory_order.)
> 
> g++ (4.9.2)
>     100008a4:   ac 04 00 7c     sync 
>     100008a8:   28 50 20 7d     lwarx   r9,0,r10
>     100008ac:   00 18 09 7c     cmpw    r9,r3
>     100008b0:   0c 00 c2 40     bne-    100008bc
>     100008b4:   2d 51 80 7c     stwcx.  r4,0,r10
>     100008b8:   f0 ff c2 40     bne-    100008a8
>     100008bc:   2c 01 00 4c     isync
> 
> xlc (13.1.3)
>     10000888:   ac 04 00 7c     sync 
>     1000088c:   28 28 c0 7c     lwarx   r6,0,r5
>     10000890:   40 00 26 7c     cmpld   r6,r0
>     10000894:   0c 00 82 40     bne     100008a0
>     10000898:   2d 29 80 7c     stwcx.  r4,0,r5
>     1000089c:   f0 ff e2 40     bne+    1000088c
>     100008a0:   2c 01 00 4c     isync
> 
> On the other hand, the current OpenJDK generates the following assembly.
> 
>     508:   ac 04 00 7c     sync 
>     50c:   00 00 5c e9     ld      r10,0(r28)
>     510:   00 50 3b 7c     cmpd    r27,r10
>     514:   1c 00 c2 40     bne-    530
>     518:   a8 40 5c 7d     ldarx   r10,r28,r8
>     51c:   00 50 3b 7c     cmpd    r27,r10
>     520:   10 00 c2 40     bne-    530
>     524:   ad 41 3c 7d     stdcx.  r9,r28,r8
>     528:   f0 ff c2 40     bne-    518
>     52c:   ac 04 00 7c     sync 
>     530:   00 50 bb 7f     ...
> 
> Though we can ignore 50c-514 (because they are a duplicated guard
> condition), the last sync instruction (52c) makes cmpxchg stricter than
> memory_order_seq_cst.
> 
> In some cases the last sync is necessary, namely when this thread must
> be able to read all of the changes that other threads made while it was
> executing 508 to 530 (the compare-and-exchange sequence).
> 
> > 2. Has there been a discussion already, establishing that the modified
> > GC code can indeed use memory_order_relaxed? Otherwise who is
> > postulating that and based on what evidence?
> 
> Volker and his colleagues have investigated the current GC code
> with this question in mind:
> http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016-April/019079.html
> However, I believe we need comments from other GC experts before
> changing the shared code.
> 
> Regards,
> Hiroshi
> -----------------------
> Hiroshi Horii, Ph.D.
> IBM Research - Tokyo
> 
> 
> David Holmes <david.holmes at oracle.com> wrote on 04/22/2016 21:57:07:
> 
> > From: David Holmes <david.holmes at oracle.com>
> > To: Hiroshi H Horii/Japan/IBM at IBMJP,
> > hotspot-runtime-dev at openjdk.java.net, hotspot-gc-dev at openjdk.java.net
> > Cc: Tim Ellison <Tim_Ellison at uk.ibm.com>,
> > ppc-aix-port-dev at openjdk.java.net
> > Date: 04/22/2016 21:58
> > Subject: Re: RFR(M): 8154736: enhancement of cmpxchg and 
> > copy_to_survivor for ppc64
> > 
> > Hi Hiroshi,
> > 
> > Two initial questions:
> > 
> > 1. Are the current cmpxchg semantics exactly the same as 
> > memory_order_seq_cst?
> > 
> > 2. Has there been a discussion already, establishing that the modified
> > GC code can indeed use memory_order_relaxed? Otherwise who is
> > postulating that and based on what evidence?
> > 
> > Missing memory barriers have caused very difficult to track down bugs
> > in the past - very rare race conditions. So any relaxation here has to
> > be done with extreme confidence.
> > 
> > Thanks,
> > David
> > 
> > On 22/04/2016 10:28 PM, Hiroshi H Horii wrote:
> > > Dear all:
> > >
> > > Can I please request reviews for the following change?
> > >
> > > Code change:
> > > http://cr.openjdk.java.net/~mdoerr/8154736_copy_to_survivor/webrev.00/
> > > (I created the initial version and Martin enhanced it considerably.)
> > >
> > > This change follows the discussion that started with this mail:
> > > http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016-April/018960.html
> > >
> > > Description:
> > > This change provides a relaxed compare-and-exchange by introducing
> > > semantics similar to the C++ atomic memory-ordering enum, memory_order.
> > > As described in atomic_linux_ppc.inline.hpp, the current implementation
> > > of cmpxchg is fence_cmpxchg_acquire. This implementation is useful for
> > > general purposes because the two sync calls, one before and one after
> > > the cmpxchg, provide strict consistency. However, they sometimes cause
> > > overhead because sync instructions are very expensive in the current
> > > POWER chip design. In addition, on other platforms such as aarch64,
> > > these strict semantics may cause some overhead (according to Andrew's
> > > mail):
> > > http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016-April/019073.html
> > >
> > > With this change, callers can explicitly specify the memory-ordering
> > > constraint for cmpxchg with an additional parameter, memory_order order.
> > >
> > > typedef enum memory_order {
> > >    memory_order_relaxed,
> > >    memory_order_consume,
> > >    memory_order_acquire,
> > >    memory_order_release,
> > >    memory_order_acq_rel,
> > >    memory_order_seq_cst
> > > } memory_order;
> > >
> > > Because the default value of the parameter is memory_order_seq_cst,
> > > existing code keeps the same cmpxchg semantics without any
> > > modification. The relaxed cmpxchg is implemented only on ppc
> > > in this changeset. Therefore, the behavior on the other platforms
> > > is not changed by this changeset.
> > >
> > > In addition, with the new parameter of cmpxchg, this change improves
> > > the performance of copy_to_survivor in the parallel GC.
> > > copy_to_survivor changes forward pointers by using cmpxchg. This
> > > operation doesn't require any sync instructions. A pointer is changed
> > > at most once in a GC, and when cmpxchg fails, the latest pointer is
> > > available to the caller. cas_set_mark and cas_forward_to are extended
> > > with an additional memory_order parameter, like cmpxchg, and
> > > copy_to_survivor uses memory_order_relaxed to modify the forward
> > > pointers.
> > >
> > > Summary of source code changes:
> > >
> > > * src/share/vm/runtime/atomic.hpp
> > >       - Defines enum memory_order and adds a parameter to cmpxchg.
> > >
> > > * src/share/vm/runtime/atomic.cpp
> > > * src/os_cpu/bsd_x86/vm/atomic_bsd_x86.inline.hpp
> > > * src/os_cpu/bsd_zero/vm/atomic_bsd_zero.inline.hpp
> > > * src/os_cpu/linux_aarch64/vm/atomic_linux_aarch64.inline.hpp
> > > * src/os_cpu/linux_sparc/vm/atomic_linux_sparc.inline.hpp
> > > * src/os_cpu/linux_x86/vm/atomic_linux_x86.inline.hpp
> > > * src/os_cpu/linux_zero/vm/atomic_linux_zero.inline.hpp
> > > * src/os_cpu/solaris_sparc/vm/atomic_solaris_sparc.inline.hpp
> > > * src/os_cpu/solaris_x86/vm/atomic_solaris_x86.inline.hpp
> > > * src/os_cpu/windows_x86/vm/atomic_windows_x86.inline.hpp
> > >       - Added a parameter to each cmpxchg function to follow
> > >          the change in atomic.hpp. Their implementations are not
> > >          changed.
> > >
> > > * src/os_cpu/aix_ppc/vm/atomic_aix_ppc.inline.hpp
> > > * src/os_cpu/linux_ppc/vm/atomic_linux_ppc.inline.hpp
> > >       - Added a parameter to each cmpxchg function to follow
> > >          the change in atomic.hpp. In addition, the implementations
> > >          are changed according to the specified memory_order.
> > >
> > > * src/share/vm/oops/oop.hpp
> > > * src/share/vm/oops/oop.inline.hpp
> > >       - Add a memory_order parameter to use relaxed cmpxchg in
> > >          cas_set_mark and cas_forward_to.
> > >
> > > * src/share/vm/gc/parallel/psPromotionManager.cpp
> > > * src/share/vm/gc/parallel/psPromotionManager.inline.hpp
> > >
> > > Martin tested this changeset on linuxx86_64, linuxppc64le and
> > > darwinintel64.
> > > Though more time is needed to test on the other platforms, we would
> > > like to ask for reviews and start the discussion on this changeset.
> > > I also tested this changeset with SPECjbb2013 and confirmed that GC
> > > pause time is reduced.
> > >
> > > Regards,
> > > Hiroshi
> > > -----------------------
> > > Hiroshi Horii, Ph.D.
> > > IBM Research - Tokyo
> > >
> > >
> > 
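
As a rough illustration of the interface described in the quoted proposal,
the sketch below adds a defaulted ordering parameter to a cmpxchg wrapper.
It is written against std::atomic and is not the HotSpot implementation.
The namespace sketch and the helpers to_std and failure_order are
assumptions made only for this example. The default here maps to the
C++11 seq_cst ordering and does not reproduce the two-sync
fence_cmpxchg_acquire behavior discussed above.

```cpp
#include <atomic>
#include <cassert>

namespace sketch {  // hypothetical namespace, keeps the names apart from std::

enum memory_order {
  memory_order_relaxed,
  memory_order_consume,
  memory_order_acquire,
  memory_order_release,
  memory_order_acq_rel,
  memory_order_seq_cst
};

// Map the sketch enum onto the standard C++11 enum.
inline std::memory_order to_std(memory_order order) {
  switch (order) {
    case memory_order_relaxed: return std::memory_order_relaxed;
    case memory_order_consume: return std::memory_order_consume;
    case memory_order_acquire: return std::memory_order_acquire;
    case memory_order_release: return std::memory_order_release;
    case memory_order_acq_rel: return std::memory_order_acq_rel;
    default:                   return std::memory_order_seq_cst;
  }
}

// The failure ordering of a compare-exchange must not be release or
// acq_rel, so weaken those two cases.
inline std::memory_order failure_order(std::memory_order success) {
  if (success == std::memory_order_acq_rel) return std::memory_order_acquire;
  if (success == std::memory_order_release) return std::memory_order_relaxed;
  return success;
}

// Returns the value found in *dest. Existing callers that omit `order`
// keep the strongest ordering this sketch offers.
inline long cmpxchg(long exchange_value, std::atomic<long>* dest,
                    long compare_value,
                    memory_order order = memory_order_seq_cst) {
  long observed = compare_value;
  const std::memory_order success = to_std(order);
  dest->compare_exchange_strong(observed, exchange_value,
                                success, failure_order(success));
  return observed;
}

}  // namespace sketch
```

A caller that needs no ordering would write
sketch::cmpxchg(new_value, &word, old_value, sketch::memory_order_relaxed),
while all other call sites stay unchanged.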

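The at-most-once property of the forward pointer mentioned in the quoted
description can be sketched as follows. The Obj type and forward_to
function are hypothetical stand-ins, not the real oop or cas_forward_to
code. Whether relaxed ordering is sufficient for a thread that goes on to
read the contents of the returned copy is exactly the question under
review, so this only shows the install-or-observe pattern.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for an object with a forwarding slot.
struct Obj {
  std::atomic<Obj*> forwardee{nullptr};
  int payload = 0;
};

// Try to install `copy` as the forwarding pointer of `old_obj` using a
// relaxed compare-and-exchange. Returns the copy that ended up installed:
// `copy` if this caller won, otherwise the pointer another caller installed.
inline Obj* forward_to(Obj* old_obj, Obj* copy) {
  Obj* expected = nullptr;
  if (old_obj->forwardee.compare_exchange_strong(expected, copy,
                                                 std::memory_order_relaxed,
                                                 std::memory_order_relaxed)) {
    return copy;
  }
  // The slot was already set; the failed CAS stored its value in `expected`.
  return expected;
}
```

Because the slot only ever changes from null to one value, a failed
exchange already hands the caller the final pointer, with no retry loop.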