RFR(M): 8154736: enhancement of cmpxchg and copy_to_survivor for ppc64

Doerr, Martin martin.doerr at sap.com
Wed Apr 27 14:46:18 UTC 2016


Hi Hiroshi,

> I think, to use the enum and semantics of C++11, callers of cmpxchg need to call
> memory-barrier after cmpxchg when all of updates in the other processes must be
> available for the following instructions of the cmpxchg. Correct?

I think the problem is that our current C++ code doesn’t use seq_cst semantics for volatile loads.
The sync at the end of the cmpxchg is only needed if a volatile load is following which is not preceded by a sync instruction.
As long as this is the case, I think we should keep the sync at the end of cmpxchg.

> Do you mean that a new cmpxchg with relaxed semantics will be added and used in
> the GC change? Or, after the discussion of the new cmpxchg interface, will be
> the discussion of the GC change started?

The second.

Best regards,
Martin

From: Hiroshi H Horii [mailto:HORII at jp.ibm.com]
Sent: Mittwoch, 27. April 2016 05:34
To: Doerr, Martin <martin.doerr at sap.com>
Cc: David Holmes <david.holmes at oracle.com>; Lindenmaier, Goetz <goetz.lindenmaier at sap.com>; hotspot-gc-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net; ppc-aix-port-dev at openjdk.java.net; Tim Ellison <Tim_Ellison at uk.ibm.com>; Volker Simonis <volker.simonis at gmail.com>
Subject: RE: RFR(M): 8154736: enhancement of cmpxchg and copy_to_survivor for ppc64

Hi Martin,

> I think we shouldn’t better use an own enum (e.g. like AccessKind in
> library_call.cpp).
> Otherwise we’ll get trouble when we switch to C++11.  Would you agree?

I agree.

I think, to use the enum and semantics of C++11, callers of cmpxchg need to call
memory-barrier after cmpxchg when all of updates in the other processes must be
available for the following instructions of the cmpxchg. Correct?

> Would it be better to split this bug into 2 and discuss the cmpxchg
> interface change on the runtime list and the GC change on the gc list?

Do you mean that a new cmpxchg with relaxed semantics will be added and used in
the GC change? Or, after the discussion of the new cmpxchg interface, will be
the discussion of the GC change started?

Regards,
Hiroshi
-----------------------
Hiroshi Horii, Ph.D.
IBM Research - Tokyo


"Doerr, Martin" <martin.doerr at sap.com<mailto:martin.doerr at sap.com>> wrote on 04/25/2016 19:25:15:

> From: "Doerr, Martin" <martin.doerr at sap.com<mailto:martin.doerr at sap.com>>
> To: Hiroshi H Horii/Japan/IBM at IBMJP, David Holmes <david.holmes at oracle.com<mailto:david.holmes at oracle.com>>
> Cc: "hotspot-gc-dev at openjdk.java.net<mailto:hotspot-gc-dev at openjdk.java.net>" <hotspot-gc-
> dev at openjdk.java.net<mailto:dev at openjdk.java.net>>, "hotspot-runtime-dev at openjdk.java.net<mailto:hotspot-runtime-dev at openjdk.java.net>"
> <hotspot-runtime-dev at openjdk.java.net<mailto:hotspot-runtime-dev at openjdk.java.net>>, "ppc-aix-port-
> dev at openjdk.java.net<mailto:dev at openjdk.java.net>" <ppc-aix-port-dev at openjdk.java.net<mailto:ppc-aix-port-dev at openjdk.java.net>>, Tim
> Ellison <Tim_Ellison at uk.ibm.com<mailto:Tim_Ellison at uk.ibm.com>>, Volker Simonis
> <volker.simonis at gmail.com<mailto:volker.simonis at gmail.com>>, "Lindenmaier, Goetz" <goetz.lindenmaier at sap.com<mailto:goetz.lindenmaier at sap.com>>
> Date: 04/25/2016 19:26
> Subject: RE: RFR(M): 8154736: enhancement of cmpxchg and
> copy_to_survivor for ppc64
>
> Hi David and Hiroshi,
>
> thank you very much for this interesting question and analysis.
>
> I think we shouldn’t better use an own enum (e.g. like AccessKind in
> library_call.cpp).
> Otherwise we’ll get trouble when we switch to C++11.  Would you agree?
>
> Would it be better to split this bug into 2 and discuss the cmpxchg
> interface change on the runtime list and the GC change on the gc list?
>
> Best regards,
> Martin
>
> From: Hiroshi H Horii [mailto:HORII at jp.ibm.com]
> Sent: Montag, 25. April 2016 09:10
> To: David Holmes <david.holmes at oracle.com<mailto:david.holmes at oracle.com>>
> Cc: hotspot-gc-dev at openjdk.java.net<mailto:hotspot-gc-dev at openjdk.java.net>; hotspot-runtime-
> dev at openjdk.java.net<mailto:dev at openjdk.java.net>; ppc-aix-port-dev at openjdk.java.net<mailto:ppc-aix-port-dev at openjdk.java.net>; Tim Ellison
> <Tim_Ellison at uk.ibm.com<mailto:Tim_Ellison at uk.ibm.com>>; Volker Simonis <volker.simonis at gmail.com<mailto:volker.simonis at gmail.com>>;
> Doerr, Martin <martin.doerr at sap.com<mailto:martin.doerr at sap.com>>; Lindenmaier, Goetz
> <goetz.lindenmaier at sap.com<mailto:goetz.lindenmaier at sap.com>>
> Subject: Re: RFR(M): 8154736: enhancement of cmpxchg and
> copy_to_survivor for ppc64
>
> Hi David,
>
> Thank you for your comments and questions.
>
> > 1. Are the current cmpxchg semantics exactly the same as
> > memory_order_seq_cst?
>
> This is very good question..
>
> I guess, cmpxchg needs a more conservative constraint for memory ordering
> than C++11, to add sync after a compare-and-exchange operation.
>
> Could someone give comments or thoughts?
>
> memory_order_seq_cst is defined as
>     "Any operation with this memory order is both an acquire operation and
>      a release operation, plus a single total order exists in which
> all threads
>      observe all modifications (see below) in the same order."
> (http://en.cppreference.com/w/cpp/atomic/memory_order)
>
> In my environment, g++ and xlc generate following assemblies on ppc64le.
> (interestingly, they generates the same assemblies for any memory_order)
>
> g++ (4.9.2)
>     100008a4:   ac 04 00 7c     sync
>     100008a8:   28 50 20 7d     lwarx   r9,0,r10
>     100008ac:   00 18 09 7c     cmpw    r9,r3
>     100008b0:   0c 00 c2 40     bne-    100008bc
>     100008b4:   2d 51 80 7c     stwcx.  r4,0,r10
>     100008b8:   f0 ff c2 40     bne-    100008a8
>     100008bc:   2c 01 00 4c     isync
>
> xlc (13.1.3)
>     10000888:   ac 04 00 7c     sync
>     1000088c:   28 28 c0 7c     lwarx   r6,0,r5
>     10000890:   40 00 26 7c     cmpld   r6,r0
>     10000894:   0c 00 82 40     bne     100008a0
>     10000898:   2d 29 80 7c     stwcx.  r4,0,r5
>     1000089c:   f0 ff e2 40     bne+    1000088c
>     100008a0:   2c 01 00 4c     isync
>
> On the other hand, the current OpenJDK generates following assemblies.
>
>     508:   ac 04 00 7c     sync
>     50c:   00 00 5c e9     ld      r10,0(r28)
>     510:   00 50 3b 7c     cmpd    r27,r10
>     514:   1c 00 c2 40     bne-    530
>     518:   a8 40 5c 7d     ldarx   r10,r28,r8
>     51c:   00 50 3b 7c     cmpd    r27,r10
>     520:   10 00 c2 40     bne-    530
>     524:   ad 41 3c 7d     stdcx.  r9,r28,r8
>     528:   f0 ff c2 40     bne-    518
>     52c:   ac 04 00 7c     sync
>     530:   00 50 bb 7f     ...
>
> Though we can ignore 50c-514 (because they are a duplicated guard condition),
> the last sync instruction (52c) makes cmpxchg more strict than
> memory_order_seq_cst.
>
> In some cases, the last sync is necessary when this thread must be
> able to read
> all of the changes in the other threads while executing from 508 to 530
> (that processes compare-and-exchange).
>
> > 2. Has there been a discussion already, establishing that the modified
> > GC code can indeed use memory_order_relaxed? Otherwise who is
> > postulating that and based on what evidence?
>
> Volker and his colleagues have investigated the current GC codes
> according to this.
> http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016-
> April/019079.html
> However, I believe, we need comments of other GC experts to change
> the shared codes.
>
> Regards,
> Hiroshi
> -----------------------
> Hiroshi Horii, Ph.D.
> IBM Research - Tokyo
>
>
> David Holmes <david.holmes at oracle.com<mailto:david.holmes at oracle.com>> wrote on 04/22/2016 21:57:07:
>
> > From: David Holmes <david.holmes at oracle.com<mailto:david.holmes at oracle.com>>
> > To: Hiroshi H Horii/Japan/IBM at IBMJP, hotspot-runtime-
> > dev at openjdk.java.net<mailto:dev at openjdk.java.net>, hotspot-gc-dev at openjdk.java.net<mailto:hotspot-gc-dev at openjdk.java.net>
> > Cc: Tim Ellison <Tim_Ellison at uk.ibm.com<mailto:Tim_Ellison at uk.ibm.com>>, ppc-aix-port-dev at openjdk.java.net<mailto:ppc-aix-port-dev at openjdk.java.net>
> > Date: 04/22/2016 21:58
> > Subject: Re: RFR(M): 8154736: enhancement of cmpxchg and
> > copy_to_survivor for ppc64
> >
> > Hi Hiroshi,
> >
> > Two initial questions:
> >
> > 1. Are the current cmpxchg semantics exactly the same as
> > memory_order_seq_cst?
> >
> > 2. Has there been a discussion already, establishing that the modified
> > GC code can indeed use memory_order_relaxed? Otherwise who is
> > postulating that and based on what evidence?
> >
> > Missing memory barriers have caused very difficult to track down bugs in
> > the past - very rare race conditions. So any relaxation here has to be
> > done with extreme confidence.
> >
> > Thanks,
> > David
> >
> > On 22/04/2016 10:28 PM, Hiroshi H Horii wrote:
> > > Dear all:
> > >
> > > Can I please request reviews for the following change?
> > >
> > > Code change:
> > > http://cr.openjdk.java.net/~mdoerr/8154736_copy_to_survivor/webrev.00/
> > > (I initially created and Martin enhanced so much)
> > >
> > > This change follows the discussion started from this mail.
> > > http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016-
> > April/018960.html
> > >
> > > Description:
> > > This change provides relaxed compare-and-exchange by introducing
> > > similar semantics of C++ atomic memory operators, enum memory_order.
> > > As described in atomic_linux_ppc.inline.hpp, the current implementation of
> > > cmpxchg is fence_cmpxchg_acquire. This implementation is useful for
> > > general purposes because twice calls of sync before and after cmpxchg will
> > > provide strict consistency. However, they sometimes cause overheads
> > > because
> > > sync instructions are very expensive in the current POWER chip design.
> > > In addition, for the other platforms, such as aarch64, this strict
> > > semantics
> > > may cause some overheads (according to the Andrew's mail).
> > > http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016-
> > April/019073.html
> > >
> > > With this change, callers can explicitly specify constraints of memory
> > > ordering
> > > for cmpxchg with an additional parameter, memory_order order.
> > >
> > > typedef enum memory_order {
> > >    memory_order_relaxed,
> > >    memory_order_consume,
> > >    memory_order_acquire,
> > >    memory_order_release,
> > >    memory_order_acq_rel,
> > >    memory_order_seq_cst
> > > } memory_order;
> > >
> > > Because the default value of the parameter is memory_order_seq_cst,
> > > existing codes can use the same semantics of cmpxchg without any
> > > modification. The relaxed cmpxchg is implemented only on ppc
> > > in this changeset. Therefore, the behavior on the other platforms will
> > > not be changed with this changeset.
> > >
> > > In addition, with the new parameter of cmpxchg, this change improves
> > > performance of copy_to_survivor in the parallel GC.
> > > copy_to_survivor changes forward pointers by using cmpxchg. This
> > > operation doesn't require any sync instructions.  A pointer is changed
> > > at most once in a GC and when cmpxchg fails, the latest pointer is
> > > available for the caller. cas_set_mark and cas_forward_to are extended
> > > with an additional memory_order parameter as cmpxchg and copy_to_survivor
> > > uses memory_order_relaxed to modify the forward pointers.
> > >
> > > Summary of source code changes:
> > >
> > > * src/share/vm/runtime/atomic.hpp
> > >       - Defines enum memory_order and adds a parameter to cmpxchg.
> > >
> > > * src/share/vm/runtime/atomic.cpp
> > > * src/os_cpu/bsd_x86/vm/atomic_bsd_x86.inline.hpp
> > > * src/os_cpu/bsd_zero/vm/atomic_bsd_zero.inline.hpp
> > > * src/os_cpu/linux_aarch64/vm/atomic_linux_aarch64.inline.hpp
> > > * src/os_cpu/linux_sparc/vm/atomic_linux_sparc.inline.hpp
> > > * src/os_cpu/linux_x86/vm/atomic_linux_x86.inline.hpp
> > > * src/os_cpu/linux_zero/vm/atomic_linux_zero.inline.hpp
> > > * src/os_cpu/solaris_sparc/vm/atomic_solaris_sparc.inline.hpp
> > > * src/os_cpu/solaris_x86/vm/atomic_solaris_x86.inline.hpp
> > > * src/os_cpu/windows_x86/vm/atomic_windows_x86.inline.hpp
> > >       - Added a parameter for each cmpxchg function to follow
> > >          the change of atomic.hpp. Their implementations are not changed.
> > >
> > > * src/os_cpu/aix_ppc/vm/atomic_aix_ppc.inline.hpp
> > > * src/os_cpu/linux_ppc/vm/atomic_linux_ppc.inline.hpp
> > >       - Added a parameter for each cmpxchg function to follow
> > >          the change of atomic.hpp. In addition, implementations
> > >          are changed corresponding to the specified memory_order.
> > >
> > > * src/share/vm/oops/oop.hpp
> > > * src/share/vm/oops/oop.inline.hpp
> > >       - Add a memory_order parameter to use relaxed cmpxchg in
> > >          cas_set_mark and cas_forward_to.
> > >
> > > * src/share/vm/gc/parallel/psPromotionManager.cpp
> > > * src/share/vm/gc/parallel/psPromotionManager.inline.hpp
> > >
> > > Martin tested this changeset  on linuxx86_64, linuxppc64le and
> > > darwinintel64.
> > > Though more time is needed to test on the other platform, we would like to
> > > ask
> > > reviews and start discussion on this changeset.
> > > I also tested this changeset with SPECjbb2013 and confirmed that gc pause
> > > time
> > > is reduced.
> > >
> > > Regards,
> > > Hiroshi
> > > -----------------------
> > > Hiroshi Horii, Ph.D.
> > > IBM Research - Tokyo
> > >
> > >
> >
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/attachments/20160427/463f3282/attachment-0001.html>


More information about the ppc-aix-port-dev mailing list