RFR(M): 8154736: enhancement of cmpxchg and copy_to_survivor for ppc64
David Holmes
david.holmes at oracle.com
Thu Sep 29 22:16:16 UTC 2016
On 30/09/2016 12:47 AM, Carsten Varming wrote:
> Dear Hiroshi,
>
> In hotspot/src/share/vm/gc/parallel/psPromotionManager.inline.hpp:266
> the log line reads data from the forwardee even when the CAS fails. I
> believe those reads will be unsafe without barriers after the copy of
> the content of the object.
I find it extremely hard to reason about a barrier-less cmpxchg in general.
If you feel that the use of new_obj->size() is potentially unsafe then
the fact we return new_obj means that any use of new_obj by the caller
may also potentially be unsafe.
David
-----
> hotspot/src/share/vm/gc/parallel/psPromotionManager.inline.hpp:288 same
> problem as in line 266
>
> I would argue that the logging should only happen if the thread
> successfully copied the object and CAS failures should be logged
> separately without reading data from the forwardee.
>
> BTW, unrelated to your change: It seems like the logging in line 266
> should be guarded by something like "if (log_develop_is_enabled(Trace,
> gc, scavenge)" like the logging in line 288.
>
> Carsten
>
> On Thu, Sep 29, 2016 at 8:00 AM, Hiroshi H Horii <HORII at jp.ibm.com
> <mailto:HORII at jp.ibm.com>> wrote:
>
> Hi all,
>
> Can I please request reviews for a change for 8154736 that improve
> copy_to_survivor performance of ppc64 and aarch64?
> If possible, I would like to include this change into jdk9.
>
> 8154736 includes two changes, cmpxchg and copy_to_suvivor, and the
> former
> was resolved as 8155949.
> Now, I would like to ask a review for the remaining, copy_to_suvivor
> change.
>
> webrev:
> http://cr.openjdk.java.net/~mdoerr/8154736_copy_to_survivor/webrev.01/
> <http://cr.openjdk.java.net/~mdoerr/8154736_copy_to_survivor/webrev.01/>
> JIRA: https://bugs.openjdk.java.net/browse/JDK-8154736
> <https://bugs.openjdk.java.net/browse/JDK-8154736>
>
> I tested this change with SPECjbb2013. Also, I re-check that relaxed
> cmpxchg is available for changing forwarding pointers. However, because
> this change is sensitive, we need more reviews not only from
> compiler-dev,
> but also from gc-dev.
>
> Regards,
> Hiroshi
> -----------------------
> Hiroshi Horii, Ph.D.
> IBM Research - Tokyo
>
>
>
>
> From: David Holmes <david.holmes at oracle.com
> <mailto:david.holmes at oracle.com>>
> To: "Doerr, Martin" <martin.doerr at sap.com
> <mailto:martin.doerr at sap.com>>, Hiroshi H
> Horii/Japan/IBM at IBMJP
> Cc: Tim Ellison <Tim_Ellison at uk.ibm.com
> <mailto:Tim_Ellison at uk.ibm.com>>,
> "ppc-aix-port-dev at openjdk.java.net
> <mailto:ppc-aix-port-dev at openjdk.java.net>"
> <ppc-aix-port-dev at openjdk.java.net
> <mailto:ppc-aix-port-dev at openjdk.java.net>>,
> "hotspot-gc-dev at openjdk.java.net
> <mailto:hotspot-gc-dev at openjdk.java.net>"
> <hotspot-gc-dev at openjdk.java.net
> <mailto:hotspot-gc-dev at openjdk.java.net>>,
> "hotspot-runtime-dev at openjdk.java.net
> <mailto:hotspot-runtime-dev at openjdk.java.net>"
> <hotspot-runtime-dev at openjdk.java.net
> <mailto:hotspot-runtime-dev at openjdk.java.net>>
> Date: 05/10/2016 19:31
> Subject: Re: RFR(M): 8154736: enhancement of cmpxchg and
> copy_to_survivor for ppc64
>
>
>
> On 10/05/2016 7:41 PM, Doerr, Martin wrote:
> > Hi David,
> >
> > thank you very much for testing the other platforms.
> >
> > Here's an updated webrev:
> > http://cr.openjdk.java.net/~mdoerr/8155949_relaxed_cas/webrev.01/
> <http://cr.openjdk.java.net/~mdoerr/8155949_relaxed_cas/webrev.01/>
>
> Thanks. Second test run on its way.
>
> David
> -----
>
> > Best regards,
> > Martin
> >
> > -----Original Message-----
> > From: hotspot-runtime-dev [
> mailto:hotspot-runtime-dev-bounces at openjdk.java.net
> <mailto:hotspot-runtime-dev-bounces at openjdk.java.net>] On Behalf Of
> David
> Holmes
> > Sent: Dienstag, 10. Mai 2016 11:11
> > To: Hiroshi H Horii <HORII at jp.ibm.com <mailto:HORII at jp.ibm.com>>
> > Cc: Tim Ellison <Tim_Ellison at uk.ibm.com
> <mailto:Tim_Ellison at uk.ibm.com>>;
> ppc-aix-port-dev at openjdk.java.net
> <mailto:ppc-aix-port-dev at openjdk.java.net>;
> hotspot-gc-dev at openjdk.java.net
> <mailto:hotspot-gc-dev at openjdk.java.net>;
> hotspot-runtime-dev at openjdk.java.net
> <mailto:hotspot-runtime-dev at openjdk.java.net>
> > Subject: Re: RFR(M): 8154736: enhancement of cmpxchg and
> copy_to_survivor for ppc64
> >
> > The fix seems incomplete for solaris:
> >
> > make/Main.gmk:232: recipe for target 'hotspot' failed
> >
> "/opt/jprt/T/P1/073516.daholme/s/hotspot/src/os_cpu/solaris_x86/vm/atomic_solaris_x86.inline.hpp",
> > line 124: Error: Too many arguments in call to
> > "_Atomic_cmpxchg_long(long, volatile long*, long)".
> >
> "/opt/jprt/T/P1/073516.daholme/s/hotspot/src/os_cpu/solaris_x86/vm/atomic_solaris_x86.inline.hpp",
> > line 128: Error: Too many arguments in call to
> > "_Atomic_cmpxchg_long(long, volatile long*, long)".
> >
> > David
> >
> > On 10/05/2016 5:34 PM, David Holmes wrote:
> >> Hi Hiroshi,
> >>
> >> On 6/05/2016 8:11 PM, Hiroshi H Horii wrote:
> >>> Hi David,
> >>>
> >>> Thank you for your comments.
> >>>
> >>> As Martin suggested me, I would like to separate this proposal to
> >>> - relaxing memory order of cmpxchg
> >>> - improvement of copy_to_survivior with relaxed cmpxchg
> >>> and discuss the former first.
> >>>
> >>> Martin thankfully created a new webrev that include a change of
> cmpxchg.
> >>>
> http://cr.openjdk.java.net/~mdoerr/8155949_relaxed_cas/webrev.00/
> <http://cr.openjdk.java.net/~mdoerr/8155949_relaxed_cas/webrev.00/>
> >>> He has already tested it with AIX, linuxx86_64, linuxppc64le and
> >>> darwinintel64.
> >>> (Please tell me if I need to send a new mail for this PFR)
> >>
> >> Please do as it will be simpler to track that way.
> >>
> >>>> What I would prefer to see is an additional memory_order value
> (such
> as
> >>>> memory_order_ignored) which is the default for all methods declared
> to
> >>>> take a memory_order parameter.
> >>>
> >>> We added simple enum to specify memory order in atomic.hpp as
> follows.
> >>>
> >>> typedef enum cmpxchg_cmpxchg_memory_order {
> >>> memory_order_relaxed,
> >>> memory_order_conservative
> >>> } cmpxchg_memory_order;
> >>>
> >>> All of cmpxchg functions have an argument of cmpxchg_memory_order
> >>> with a default value memory_order_conservative that uses the same
> >>> semantics with the existing cmpxchg and requires no change for the
> >>> existing
> >>> callers. If you think "memory_order_ignored" is better than
> >>> "memory_order_conservative", I will be happy to modify this change.
> >>> (I just thought, "ignored" may resemble "relaxed" and may make
> >>> people who are familiar with C++11's memory semantics confused.
> >>> I would like to know thoughts of native speakers.)
> >>
> >> That is fine by me. I don't think "ignored" would be confused with
> >> "relaxed", but "conservative" is fine.
> >>
> >> I will run the patch through our internal build system while you
> prepare
> >> the updated RFR. My only concern is "unused argument" warnings
> from the
> >> compiler. :)
> >>
> >> We are quickly running into a hard deadline with Feature Complete
> >> however - possibly less than 24 hours - for hotspot changes. If this
> >> doesn't get in in time I will see if I can shepherd it through the
> >> approval process.
> >>
> >> Thanks,
> >> David
> >>
> >>
> >>> Regards,
> >>> Hiroshi
> >>> -----------------------
> >>> Hiroshi Horii, Ph.D.
> >>> IBM Research - Tokyo
> >>>
> >>>
> >>> David Holmes <david.holmes at oracle.com
> <mailto:david.holmes at oracle.com>> wrote on 05/04/2016 14:55:29:
> >>>
> >>>> From: David Holmes <david.holmes at oracle.com
> <mailto:david.holmes at oracle.com>>
> >>>> To: Hiroshi H Horii/Japan/IBM at IBMJP
> >>>> Cc: hotspot-gc-dev at openjdk.java.net
> <mailto:hotspot-gc-dev at openjdk.java.net>, hotspot-runtime-
> >>>> dev at openjdk.java.net <mailto:dev at openjdk.java.net>,
> ppc-aix-port-dev at openjdk.java.net
> <mailto:ppc-aix-port-dev at openjdk.java.net>, Tim Ellison
> >>>> <Tim_Ellison at uk.ibm.com <mailto:Tim_Ellison at uk.ibm.com>>,
> Volker Simonis <volker.simonis at gmail.com
> <mailto:volker.simonis at gmail.com>>,
> >>>> "Doerr, Martin" <martin.doerr at sap.com
> <mailto:martin.doerr at sap.com>>, "Lindenmaier, Goetz"
> >>>> <goetz.lindenmaier at sap.com <mailto:goetz.lindenmaier at sap.com>>
> >>>> Date: 05/04/2016 14:57
> >>>> Subject: Re: RFR(M): 8154736: enhancement of cmpxchg and
> >>>> copy_to_survivor for ppc64
> >>>>
> >>>> Hi Hiroshi,
> >>>>
> >>>> Sorry for the delay on getting back to this.
> >>>>
> >>>> On 25/04/2016 5:09 PM, Hiroshi H Horii wrote:
> >>>>> Hi David,
> >>>>>
> >>>>> Thank you for your comments and questions.
> >>>>>
> >>>>>> 1. Are the current cmpxchg semantics exactly the same as
> >>>>>> memory_order_seq_cst?
> >>>>>
> >>>>> This is very good question..
> >>>>>
> >>>>> I guess, cmpxchg needs a more conservative constraint for memory
> >>> ordering
> >>>>> than C++11, to add sync after a compare-and-exchange operation.
> >>>>>
> >>>>> Could someone give comments or thoughts?
> >>>>
> >>>> I don't want to comment on the comparison with C++11. What I would
> >>>> prefer to see is an additional memory_order value (such as
> >>>> memory_order_ignored) which is the default for all methods declared
> to
> >>>> take a memory_order parameter. That way existing
> implementations are
> >>>> clearly ignoring the memory_order attribute and there is no
> potential
> >>>> for confusion as to whether the existing implementations equate to
> >>>> memory_order_seq_cst or not.
> >>>>
> >>>> That said, I'm not sure it makes sense to add the memory_order
> parameter
> >>>> to all methods with "cas" in their name, e.g.
> oopDesc::cas_set_mark,
> >>>> oopDesc::cas_forward_to, unless those methods can sensibly be
> called
> >>>> with any value for memory_order - which seems highly unlikely.
> Perhaps
> >>>> those methods should identify the weakest form of memory_order they
> >>>> support and that should be hard-wired into them?
> >>>>
> >>>> Thanks,
> >>>> David
> >>>>
> >>>>> memory_order_seq_cst is defined as
> >>>>> "Any operation with this memory order is both an acquire
> >>> operation and
> >>>>> a release operation, plus a single total order exists in
> which
> >>>> all
> >>>>> threads
> >>>>> observe all modifications (see below) in the same order."
> >>>>> (http://en.cppreference.com/w/cpp/atomic/memory_order
> <http://en.cppreference.com/w/cpp/atomic/memory_order>)
> >>>>>
> >>>>> In my environment, g++ and xlc generate following assemblies on
> >>>> ppc64le.
> >>>>> (interestingly, they generates the same assemblies for any
> >>>> memory_order)
> >>>>>
> >>>>> g++ (4.9.2)
> >>>>> 100008a4: ac 04 00 7c sync
> >>>>> 100008a8: 28 50 20 7d lwarx r9,0,r10
> >>>>> 100008ac: 00 18 09 7c cmpw r9,r3
> >>>>> 100008b0: 0c 00 c2 40 bne- 100008bc
> >>>>> 100008b4: 2d 51 80 7c stwcx. r4,0,r10
> >>>>> 100008b8: f0 ff c2 40 bne- 100008a8
> >>>>> 100008bc: 2c 01 00 4c isync
> >>>>>
> >>>>> xlc (13.1.3)
> >>>>> 10000888: ac 04 00 7c sync
> >>>>> 1000088c: 28 28 c0 7c lwarx r6,0,r5
> >>>>> 10000890: 40 00 26 7c cmpld r6,r0
> >>>>> 10000894: 0c 00 82 40 bne 100008a0
> >>>>> 10000898: 2d 29 80 7c stwcx. r4,0,r5
> >>>>> 1000089c: f0 ff e2 40 bne+ 1000088c
> >>>>> 100008a0: 2c 01 00 4c isync
> >>>>>
> >>>>> On the other hand, the current OpenJDK generates following
> assemblies.
> >>>>>
> >>>>> 508: ac 04 00 7c sync
> >>>>> 50c: 00 00 5c e9 ld r10,0(r28)
> >>>>> 510: 00 50 3b 7c cmpd r27,r10
> >>>>> 514: 1c 00 c2 40 bne- 530
> >>>>> 518: a8 40 5c 7d ldarx r10,r28,r8
> >>>>> 51c: 00 50 3b 7c cmpd r27,r10
> >>>>> 520: 10 00 c2 40 bne- 530
> >>>>> 524: ad 41 3c 7d stdcx. r9,r28,r8
> >>>>> 528: f0 ff c2 40 bne- 518
> >>>>> 52c: ac 04 00 7c sync
> >>>>> 530: 00 50 bb 7f ...
> >>>>>
> >>>>> Though we can ignore 50c-514 (because they are a duplicated guard
> >>>>> condition),
> >>>>> the last sync instruction (52c) makes cmpxchg more strict than
> >>>>> memory_order_seq_cst.
> >>>>>
> >>>>> In some cases, the last sync is necessary when this thread must be
> >>>> able
> >>>>> to read
> >>>>> all of the changes in the other threads while executing from
> 508 to
> >>>> 530
> >>>>> (that processes compare-and-exchange).
> >>>>>
> >>>>>> 2. Has there been a discussion already, establishing that the
> >>>> modified
> >>>>>> GC code can indeed use memory_order_relaxed? Otherwise who is
> >>>>>> postulating that and based on what evidence?
> >>>>>
> >>>>> Volker and his colleagues have investigated the current GC codes
> >>>>> according to this.
> >>>>>
> http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016-
> <http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016->
> >>>> April/019079.html
> >>>>> However, I believe, we need comments of other GC experts to change
> >>>>> the shared codes.
> >>>>>
> >>>>> Regards,
> >>>>> Hiroshi
> >>>>> -----------------------
> >>>>> Hiroshi Horii, Ph.D.
> >>>>> IBM Research - Tokyo
> >>>>>
> >>>>>
> >>>>> David Holmes <david.holmes at oracle.com
> <mailto:david.holmes at oracle.com>> wrote on 04/22/2016 21:57:07:
> >>>>>
> >>>>>> From: David Holmes <david.holmes at oracle.com
> <mailto:david.holmes at oracle.com>>
> >>>>>> To: Hiroshi H Horii/Japan/IBM at IBMJP, hotspot-runtime-
> >>>>>> dev at openjdk.java.net <mailto:dev at openjdk.java.net>,
> hotspot-gc-dev at openjdk.java.net <mailto:hotspot-gc-dev at openjdk.java.net>
> >>>>>> Cc: Tim Ellison <Tim_Ellison at uk.ibm.com
> <mailto:Tim_Ellison at uk.ibm.com>>,
> >>>>> ppc-aix-port-dev at openjdk.java.net
> <mailto:ppc-aix-port-dev at openjdk.java.net>
> >>>>>> Date: 04/22/2016 21:58
> >>>>>> Subject: Re: RFR(M): 8154736: enhancement of cmpxchg and
> >>>>>> copy_to_survivor for ppc64
> >>>>>>
> >>>>>> Hi Hiroshi,
> >>>>>>
> >>>>>> Two initial questions:
> >>>>>>
> >>>>>> 1. Are the current cmpxchg semantics exactly the same as
> >>>>>> memory_order_seq_cst?
> >>>>>>
> >>>>>> 2. Has there been a discussion already, establishing that the
> >>>> modified
> >>>>>> GC code can indeed use memory_order_relaxed? Otherwise who is
> >>>>>> postulating that and based on what evidence?
> >>>>>>
> >>>>>> Missing memory barriers have caused very difficult to track down
> >>> bugs in
> >>>>>> the past - very rare race conditions. So any relaxation here has
> >>>> to be
> >>>>>> done with extreme confidence.
> >>>>>>
> >>>>>> Thanks,
> >>>>>> David
> >>>>>>
> >>>>>> On 22/04/2016 10:28 PM, Hiroshi H Horii wrote:
> >>>>>>> Dear all:
> >>>>>>>
> >>>>>>> Can I please request reviews for the following change?
> >>>>>>>
> >>>>>>> Code change:
> >>>>>>>
> >>>
> http://cr.openjdk.java.net/~mdoerr/8154736_copy_to_survivor/webrev.00/
> <http://cr.openjdk.java.net/~mdoerr/8154736_copy_to_survivor/webrev.00/>
> >>>>>>> (I initially created and Martin enhanced so much)
> >>>>>>>
> >>>>>>> This change follows the discussion started from this mail.
> >>>>>>>
> http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016-
> <http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016->
> >>>>>> April/018960.html
> >>>>>>>
> >>>>>>> Description:
> >>>>>>> This change provides relaxed compare-and-exchange by introducing
> >>>>>>> similar semantics of C++ atomic memory operators, enum
> >>>> memory_order.
> >>>>>>> As described in atomic_linux_ppc.inline.hpp, the current
> >>>>> implementation of
> >>>>>>> cmpxchg is fence_cmpxchg_acquire. This implementation is useful
> for
> >>>>>>> general purposes because twice calls of sync before and after
> >>>>> cmpxchg will
> >>>>>>> provide strict consistency. However, they sometimes cause
> overheads
> >>>>>>> because
> >>>>>>> sync instructions are very expensive in the current POWER chip
> >>> design.
> >>>>>>> In addition, for the other platforms, such as aarch64, this
> strict
> >>>>>>> semantics
> >>>>>>> may cause some overheads (according to the Andrew's mail).
> >>>>>>>
> http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016-
> <http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016->
> >>>>>> April/019073.html
> >>>>>>>
> >>>>>>> With this change, callers can explicitly specify constraints of
> >>> memory
> >>>>>>> ordering
> >>>>>>> for cmpxchg with an additional parameter, memory_order order.
> >>>>>>>
> >>>>>>> typedef enum memory_order {
> >>>>>>> memory_order_relaxed,
> >>>>>>> memory_order_consume,
> >>>>>>> memory_order_acquire,
> >>>>>>> memory_order_release,
> >>>>>>> memory_order_acq_rel,
> >>>>>>> memory_order_seq_cst
> >>>>>>> } memory_order;
> >>>>>>>
> >>>>>>> Because the default value of the parameter is
> memory_order_seq_cst,
> >>>>>>> existing codes can use the same semantics of cmpxchg without any
> >>>>>>> modification. The relaxed cmpxchg is implemented only on ppc
> >>>>>>> in this changeset. Therefore, the behavior on the other
> platforms
> >>> will
> >>>>>>> not be changed with this changeset.
> >>>>>>>
> >>>>>>> In addition, with the new parameter of cmpxchg, this change
> >>>> improves
> >>>>>>> performance of copy_to_survivor in the parallel GC.
> >>>>>>> copy_to_survivor changes forward pointers by using cmpxchg. This
> >>>>>>> operation doesn't require any sync instructions. A pointer is
> >>> changed
> >>>>>>> at most once in a GC and when cmpxchg fails, the latest
> pointer is
> >>>>>>> available for the caller. cas_set_mark and cas_forward_to are
> >>> extended
> >>>>>>> with an additional memory_order parameter as cmpxchg and
> >>>>> copy_to_survivor
> >>>>>>> uses memory_order_relaxed to modify the forward pointers.
> >>>>>>>
> >>>>>>> Summary of source code changes:
> >>>>>>>
> >>>>>>> * src/share/vm/runtime/atomic.hpp
> >>>>>>> - Defines enum memory_order and adds a parameter to
> cmpxchg.
> >>>>>>>
> >>>>>>> * src/share/vm/runtime/atomic.cpp
> >>>>>>> * src/os_cpu/bsd_x86/vm/atomic_bsd_x86.inline.hpp
> >>>>>>> * src/os_cpu/bsd_zero/vm/atomic_bsd_zero.inline.hpp
> >>>>>>> * src/os_cpu/linux_aarch64/vm/atomic_linux_aarch64.inline.hpp
> >>>>>>> * src/os_cpu/linux_sparc/vm/atomic_linux_sparc.inline.hpp
> >>>>>>> * src/os_cpu/linux_x86/vm/atomic_linux_x86.inline.hpp
> >>>>>>> * src/os_cpu/linux_zero/vm/atomic_linux_zero.inline.hpp
> >>>>>>> * src/os_cpu/solaris_sparc/vm/atomic_solaris_sparc.inline.hpp
> >>>>>>> * src/os_cpu/solaris_x86/vm/atomic_solaris_x86.inline.hpp
> >>>>>>> * src/os_cpu/windows_x86/vm/atomic_windows_x86.inline.hpp
> >>>>>>> - Added a parameter for each cmpxchg function to follow
> >>>>>>> the change of atomic.hpp. Their implementations are not
> >>>>> changed.
> >>>>>>>
> >>>>>>> * src/os_cpu/aix_ppc/vm/atomic_aix_ppc.inline.hpp
> >>>>>>> * src/os_cpu/linux_ppc/vm/atomic_linux_ppc.inline.hpp
> >>>>>>> - Added a parameter for each cmpxchg function to follow
> >>>>>>> the change of atomic.hpp. In addition, implementations
> >>>>>>> are changed corresponding to the specified
> memory_order.
> >>>>>>>
> >>>>>>> * src/share/vm/oops/oop.hpp
> >>>>>>> * src/share/vm/oops/oop.inline.hpp
> >>>>>>> - Add a memory_order parameter to use relaxed cmpxchg in
> >>>>>>> cas_set_mark and cas_forward_to.
> >>>>>>>
> >>>>>>> * src/share/vm/gc/parallel/psPromotionManager.cpp
> >>>>>>> * src/share/vm/gc/parallel/psPromotionManager.inline.hpp
> >>>>>>>
> >>>>>>> Martin tested this changeset on linuxx86_64, linuxppc64le and
> >>>>>>> darwinintel64.
> >>>>>>> Though more time is needed to test on the other platform, we
> would
> >>>>> like to
> >>>>>>> ask
> >>>>>>> reviews and start discussion on this changeset.
> >>>>>>> I also tested this changeset with SPECjbb2013 and confirmed that
> gc
> >>>>> pause
> >>>>>>> time
> >>>>>>> is reduced.
> >>>>>>>
> >>>>>>> Regards,
> >>>>>>> Hiroshi
> >>>>>>> -----------------------
> >>>>>>> Hiroshi Horii, Ph.D.
> >>>>>>> IBM Research - Tokyo
> >>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>
>
>
>
>
>
>
More information about the hotspot-gc-dev
mailing list