RFR(M): 8154736: enhancement of cmpxchg and copy_to_survivor for ppc64
Hiroshi H Horii
HORII at jp.ibm.com
Tue May 10 09:35:51 UTC 2016
Hi David,
> > Martin thankfully created a new webrev that include a change of
cmpxchg.
> > http://cr.openjdk.java.net/~mdoerr/8155949_relaxed_cas/webrev.00/
> > He has already tested it with AIX, linuxx86_64, linuxppc64le and
> > darwinintel64.
> > (Please tell me if I need to send a new mail for this PFR)
>
> Please do as it will be simpler to track that way.
I will send a new PFR for this cmxchg change.
> That is fine by me. I don't think "ignored" would be confused with
> "relaxed", but "conservative" is fine.
Sure.
> We are quickly running into a hard deadline with Feature Complete
> however - possibly less than 24 hours - for hotspot changes. If this
> doesn't get in in time I will see if I can shepherd it through the
> approval process.
Thanks.
Regards,
Hiroshi
-----------------------
Hiroshi Horii, Ph.D.
IBM Research - Tokyo
David Holmes <david.holmes at oracle.com> wrote on 05/10/2016 16:34:32:
> From: David Holmes <david.holmes at oracle.com>
> To: Hiroshi H Horii/Japan/IBM at IBMJP
> Cc: "Lindenmaier, Goetz" <goetz.lindenmaier at sap.com>, hotspot-gc-
> dev at openjdk.java.net, hotspot-runtime-dev at openjdk.java.net, "Doerr,
> Martin" <martin.doerr at sap.com>, ppc-aix-port-dev at openjdk.java.net,
> Tim Ellison <Tim_Ellison at uk.ibm.com>, Volker Simonis
> <volker.simonis at gmail.com>
> Date: 05/10/2016 16:35
> Subject: Re: RFR(M): 8154736: enhancement of cmpxchg and
> copy_to_survivor for ppc64
>
> Hi Hiroshi,
>
> On 6/05/2016 8:11 PM, Hiroshi H Horii wrote:
> > Hi David,
> >
> > Thank you for your comments.
> >
> > As Martin suggested me, I would like to separate this proposal to
> > - relaxing memory order of cmpxchg
> > - improvement of copy_to_survivior with relaxed cmpxchg
> > and discuss the former first.
> >
> > Martin thankfully created a new webrev that include a change of
cmpxchg.
> > http://cr.openjdk.java.net/~mdoerr/8155949_relaxed_cas/webrev.00/
> > He has already tested it with AIX, linuxx86_64, linuxppc64le and
> > darwinintel64.
> > (Please tell me if I need to send a new mail for this PFR)
>
> Please do as it will be simpler to track that way.
>
> >> What I would prefer to see is an additional memory_order value (such
as
> >> memory_order_ignored) which is the default for all methods declared
to
> >> take a memory_order parameter.
> >
> > We added simple enum to specify memory order in atomic.hpp as follows.
> >
> > typedef enum cmpxchg_cmpxchg_memory_order {
> > memory_order_relaxed,
> > memory_order_conservative
> > } cmpxchg_memory_order;
> >
> > All of cmpxchg functions have an argument of cmpxchg_memory_order
> > with a default value memory_order_conservative that uses the same
> > semantics with the existing cmpxchg and requires no change for the
existing
> > callers. If you think "memory_order_ignored" is better than
> > "memory_order_conservative", I will be happy to modify this change.
> > (I just thought, "ignored" may resemble "relaxed" and may make
> > people who are familiar with C++11's memory semantics confused.
> > I would like to know thoughts of native speakers.)
>
> That is fine by me. I don't think "ignored" would be confused with
> "relaxed", but "conservative" is fine.
>
> I will run the patch through our internal build system while you prepare
> the updated RFR. My only concern is "unused argument" warnings from the
> compiler. :)
>
> We are quickly running into a hard deadline with Feature Complete
> however - possibly less than 24 hours - for hotspot changes. If this
> doesn't get in in time I will see if I can shepherd it through the
> approval process.
>
> Thanks,
> David
>
>
> > Regards,
> > Hiroshi
> > -----------------------
> > Hiroshi Horii, Ph.D.
> > IBM Research - Tokyo
> >
> >
> > David Holmes <david.holmes at oracle.com> wrote on 05/04/2016 14:55:29:
> >
> >> From: David Holmes <david.holmes at oracle.com>
> >> To: Hiroshi H Horii/Japan/IBM at IBMJP
> >> Cc: hotspot-gc-dev at openjdk.java.net, hotspot-runtime-
> >> dev at openjdk.java.net, ppc-aix-port-dev at openjdk.java.net, Tim Ellison
> >> <Tim_Ellison at uk.ibm.com>, Volker Simonis <volker.simonis at gmail.com>,
> >> "Doerr, Martin" <martin.doerr at sap.com>, "Lindenmaier, Goetz"
> >> <goetz.lindenmaier at sap.com>
> >> Date: 05/04/2016 14:57
> >> Subject: Re: RFR(M): 8154736: enhancement of cmpxchg and
> >> copy_to_survivor for ppc64
> >>
> >> Hi Hiroshi,
> >>
> >> Sorry for the delay on getting back to this.
> >>
> >> On 25/04/2016 5:09 PM, Hiroshi H Horii wrote:
> >> > Hi David,
> >> >
> >> > Thank you for your comments and questions.
> >> >
> >> >> 1. Are the current cmpxchg semantics exactly the same as
> >> >> memory_order_seq_cst?
> >> >
> >> > This is very good question..
> >> >
> >> > I guess, cmpxchg needs a more conservative constraint for memory
> > ordering
> >> > than C++11, to add sync after a compare-and-exchange operation.
> >> >
> >> > Could someone give comments or thoughts?
> >>
> >> I don't want to comment on the comparison with C++11. What I would
> >> prefer to see is an additional memory_order value (such as
> >> memory_order_ignored) which is the default for all methods declared
to
> >> take a memory_order parameter. That way existing implementations are
> >> clearly ignoring the memory_order attribute and there is no potential
> >> for confusion as to whether the existing implementations equate to
> >> memory_order_seq_cst or not.
> >>
> >> That said, I'm not sure it makes sense to add the memory_order
parameter
> >> to all methods with "cas" in their name, e.g. oopDesc::cas_set_mark,
> >> oopDesc::cas_forward_to, unless those methods can sensibly be called
> >> with any value for memory_order - which seems highly unlikely.
Perhaps
> >> those methods should identify the weakest form of memory_order they
> >> support and that should be hard-wired into them?
> >>
> >> Thanks,
> >> David
> >>
> >> > memory_order_seq_cst is defined as
> >> > "Any operation with this memory order is both an acquire
> > operation and
> >> > a release operation, plus a single total order exists in which
all
> >> > threads
> >> > observe all modifications (see below) in the same order."
> >> > (http://en.cppreference.com/w/cpp/atomic/memory_order)
> >> >
> >> > In my environment, g++ and xlc generate following assemblies on
ppc64le.
> >> > (interestingly, they generates the same assemblies for any
memory_order)
> >> >
> >> > g++ (4.9.2)
> >> > 100008a4: ac 04 00 7c sync
> >> > 100008a8: 28 50 20 7d lwarx r9,0,r10
> >> > 100008ac: 00 18 09 7c cmpw r9,r3
> >> > 100008b0: 0c 00 c2 40 bne- 100008bc
> >> > 100008b4: 2d 51 80 7c stwcx. r4,0,r10
> >> > 100008b8: f0 ff c2 40 bne- 100008a8
> >> > 100008bc: 2c 01 00 4c isync
> >> >
> >> > xlc (13.1.3)
> >> > 10000888: ac 04 00 7c sync
> >> > 1000088c: 28 28 c0 7c lwarx r6,0,r5
> >> > 10000890: 40 00 26 7c cmpld r6,r0
> >> > 10000894: 0c 00 82 40 bne 100008a0
> >> > 10000898: 2d 29 80 7c stwcx. r4,0,r5
> >> > 1000089c: f0 ff e2 40 bne+ 1000088c
> >> > 100008a0: 2c 01 00 4c isync
> >> >
> >> > On the other hand, the current OpenJDK generates following
assemblies.
> >> >
> >> > 508: ac 04 00 7c sync
> >> > 50c: 00 00 5c e9 ld r10,0(r28)
> >> > 510: 00 50 3b 7c cmpd r27,r10
> >> > 514: 1c 00 c2 40 bne- 530
> >> > 518: a8 40 5c 7d ldarx r10,r28,r8
> >> > 51c: 00 50 3b 7c cmpd r27,r10
> >> > 520: 10 00 c2 40 bne- 530
> >> > 524: ad 41 3c 7d stdcx. r9,r28,r8
> >> > 528: f0 ff c2 40 bne- 518
> >> > 52c: ac 04 00 7c sync
> >> > 530: 00 50 bb 7f ...
> >> >
> >> > Though we can ignore 50c-514 (because they are a duplicated guard
> >> > condition),
> >> > the last sync instruction (52c) makes cmpxchg more strict than
> >> > memory_order_seq_cst.
> >> >
> >> > In some cases, the last sync is necessary when this thread must be
able
> >> > to read
> >> > all of the changes in the other threads while executing from 508 to
530
> >> > (that processes compare-and-exchange).
> >> >
> >> >> 2. Has there been a discussion already, establishing that the
modified
> >> >> GC code can indeed use memory_order_relaxed? Otherwise who is
> >> >> postulating that and based on what evidence?
> >> >
> >> > Volker and his colleagues have investigated the current GC codes
> >> > according to this.
> >> > http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016-
> >> April/019079.html
> >> > However, I believe, we need comments of other GC experts to change
> >> > the shared codes.
> >> >
> >> > Regards,
> >> > Hiroshi
> >> > -----------------------
> >> > Hiroshi Horii, Ph.D.
> >> > IBM Research - Tokyo
> >> >
> >> >
> >> > David Holmes <david.holmes at oracle.com> wrote on 04/22/2016
21:57:07:
> >> >
> >> >> From: David Holmes <david.holmes at oracle.com>
> >> >> To: Hiroshi H Horii/Japan/IBM at IBMJP, hotspot-runtime-
> >> >> dev at openjdk.java.net, hotspot-gc-dev at openjdk.java.net
> >> >> Cc: Tim Ellison <Tim_Ellison at uk.ibm.com>,
> >> > ppc-aix-port-dev at openjdk.java.net
> >> >> Date: 04/22/2016 21:58
> >> >> Subject: Re: RFR(M): 8154736: enhancement of cmpxchg and
> >> >> copy_to_survivor for ppc64
> >> >>
> >> >> Hi Hiroshi,
> >> >>
> >> >> Two initial questions:
> >> >>
> >> >> 1. Are the current cmpxchg semantics exactly the same as
> >> >> memory_order_seq_cst?
> >> >>
> >> >> 2. Has there been a discussion already, establishing that the
modified
> >> >> GC code can indeed use memory_order_relaxed? Otherwise who is
> >> >> postulating that and based on what evidence?
> >> >>
> >> >> Missing memory barriers have caused very difficult to track down
> > bugs in
> >> >> the past - very rare race conditions. So any relaxation here has
to be
> >> >> done with extreme confidence.
> >> >>
> >> >> Thanks,
> >> >> David
> >> >>
> >> >> On 22/04/2016 10:28 PM, Hiroshi H Horii wrote:
> >> >> > Dear all:
> >> >> >
> >> >> > Can I please request reviews for the following change?
> >> >> >
> >> >> > Code change:
> >> >> >
> > http://cr.openjdk.java.net/~mdoerr/8154736_copy_to_survivor/webrev.00/
> >> >> > (I initially created and Martin enhanced so much)
> >> >> >
> >> >> > This change follows the discussion started from this mail.
> >> >> > http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016-
> >> >> April/018960.html
> >> >> >
> >> >> > Description:
> >> >> > This change provides relaxed compare-and-exchange by introducing
> >> >> > similar semantics of C++ atomic memory operators, enum
memory_order.
> >> >> > As described in atomic_linux_ppc.inline.hpp, the current
> >> > implementation of
> >> >> > cmpxchg is fence_cmpxchg_acquire. This implementation is useful
for
> >> >> > general purposes because twice calls of sync before and after
> >> > cmpxchg will
> >> >> > provide strict consistency. However, they sometimes cause
overheads
> >> >> > because
> >> >> > sync instructions are very expensive in the current POWER chip
> > design.
> >> >> > In addition, for the other platforms, such as aarch64, this
strict
> >> >> > semantics
> >> >> > may cause some overheads (according to the Andrew's mail).
> >> >> > http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016-
> >> >> April/019073.html
> >> >> >
> >> >> > With this change, callers can explicitly specify constraints of
> > memory
> >> >> > ordering
> >> >> > for cmpxchg with an additional parameter, memory_order order.
> >> >> >
> >> >> > typedef enum memory_order {
> >> >> > memory_order_relaxed,
> >> >> > memory_order_consume,
> >> >> > memory_order_acquire,
> >> >> > memory_order_release,
> >> >> > memory_order_acq_rel,
> >> >> > memory_order_seq_cst
> >> >> > } memory_order;
> >> >> >
> >> >> > Because the default value of the parameter is
memory_order_seq_cst,
> >> >> > existing codes can use the same semantics of cmpxchg without any
> >> >> > modification. The relaxed cmpxchg is implemented only on ppc
> >> >> > in this changeset. Therefore, the behavior on the other
platforms
> > will
> >> >> > not be changed with this changeset.
> >> >> >
> >> >> > In addition, with the new parameter of cmpxchg, this change
improves
> >> >> > performance of copy_to_survivor in the parallel GC.
> >> >> > copy_to_survivor changes forward pointers by using cmpxchg. This
> >> >> > operation doesn't require any sync instructions. A pointer is
> > changed
> >> >> > at most once in a GC and when cmpxchg fails, the latest pointer
is
> >> >> > available for the caller. cas_set_mark and cas_forward_to are
> > extended
> >> >> > with an additional memory_order parameter as cmpxchg and
> >> > copy_to_survivor
> >> >> > uses memory_order_relaxed to modify the forward pointers.
> >> >> >
> >> >> > Summary of source code changes:
> >> >> >
> >> >> > * src/share/vm/runtime/atomic.hpp
> >> >> > - Defines enum memory_order and adds a parameter to
cmpxchg.
> >> >> >
> >> >> > * src/share/vm/runtime/atomic.cpp
> >> >> > * src/os_cpu/bsd_x86/vm/atomic_bsd_x86.inline.hpp
> >> >> > * src/os_cpu/bsd_zero/vm/atomic_bsd_zero.inline.hpp
> >> >> > * src/os_cpu/linux_aarch64/vm/atomic_linux_aarch64.inline.hpp
> >> >> > * src/os_cpu/linux_sparc/vm/atomic_linux_sparc.inline.hpp
> >> >> > * src/os_cpu/linux_x86/vm/atomic_linux_x86.inline.hpp
> >> >> > * src/os_cpu/linux_zero/vm/atomic_linux_zero.inline.hpp
> >> >> > * src/os_cpu/solaris_sparc/vm/atomic_solaris_sparc.inline.hpp
> >> >> > * src/os_cpu/solaris_x86/vm/atomic_solaris_x86.inline.hpp
> >> >> > * src/os_cpu/windows_x86/vm/atomic_windows_x86.inline.hpp
> >> >> > - Added a parameter for each cmpxchg function to follow
> >> >> > the change of atomic.hpp. Their implementations are not
> >> > changed.
> >> >> >
> >> >> > * src/os_cpu/aix_ppc/vm/atomic_aix_ppc.inline.hpp
> >> >> > * src/os_cpu/linux_ppc/vm/atomic_linux_ppc.inline.hpp
> >> >> > - Added a parameter for each cmpxchg function to follow
> >> >> > the change of atomic.hpp. In addition, implementations
> >> >> > are changed corresponding to the specified
memory_order.
> >> >> >
> >> >> > * src/share/vm/oops/oop.hpp
> >> >> > * src/share/vm/oops/oop.inline.hpp
> >> >> > - Add a memory_order parameter to use relaxed cmpxchg in
> >> >> > cas_set_mark and cas_forward_to.
> >> >> >
> >> >> > * src/share/vm/gc/parallel/psPromotionManager.cpp
> >> >> > * src/share/vm/gc/parallel/psPromotionManager.inline.hpp
> >> >> >
> >> >> > Martin tested this changeset on linuxx86_64, linuxppc64le and
> >> >> > darwinintel64.
> >> >> > Though more time is needed to test on the other platform, we
would
> >> > like to
> >> >> > ask
> >> >> > reviews and start discussion on this changeset.
> >> >> > I also tested this changeset with SPECjbb2013 and confirmed that
gc
> >> > pause
> >> >> > time
> >> >> > is reduced.
> >> >> >
> >> >> > Regards,
> >> >> > Hiroshi
> >> >> > -----------------------
> >> >> > Hiroshi Horii, Ph.D.
> >> >> > IBM Research - Tokyo
> >> >> >
> >> >> >
> >> >>
> >> >
> >>
> >
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/attachments/20160510/79b9f845/attachment-0001.html>
More information about the ppc-aix-port-dev
mailing list