RFR(M): 8154736: enhancement of cmpxchg and copy_to_survivor for ppc64

Thu Sep 29 14:47:29 UTC 2016

Dear Hiroshi,

In hotspot/src/share/vm/gc/parallel/psPromotionManager.inline.hpp:266 the
log line reads data from the forwardee even when the CAS fails. I believe
those reads will be unsafe without barriers after the copy of the content
of the object.

hotspot/src/share/vm/gc/parallel/psPromotionManager.inline.hpp:288 same
problem as in line 266

I would argue that the logging should only happen if the thread
successfully copied the object and CAS failures should be logged separately
without reading data from the forwardee.

BTW, unrelated to your change: It seems like the logging in line 266 should
be guarded by something like "if (log_develop_is_enabled(Trace, gc,
scavenge)" like the logging in line 288.

Carsten

On Thu, Sep 29, 2016 at 8:00 AM, Hiroshi H Horii <HORII at jp.ibm.com> wrote:

> Hi all,
>
> Can I please request reviews for a change for 8154736 that improve
> copy_to_survivor performance of ppc64 and aarch64?
> If possible, I would like to include this change into jdk9.
>
> 8154736 includes two changes, cmpxchg and copy_to_suvivor, and the former
> was resolved as 8155949.
> Now, I would like to ask a review for the remaining, copy_to_suvivor
> change.
>
> webrev:
> http://cr.openjdk.java.net/~mdoerr/8154736_copy_to_survivor/webrev.01/
> JIRA: https://bugs.openjdk.java.net/browse/JDK-8154736
>
> I tested this change with SPECjbb2013. Also, I re-check that relaxed
> cmpxchg is available for changing forwarding pointers. However, because
> this change is sensitive, we need more reviews not only from compiler-dev,
> but also from gc-dev.
>
> Regards,
> Hiroshi
> -----------------------
> Hiroshi Horii, Ph.D.
> IBM Research - Tokyo
>
>
>
>
> From:   David Holmes <david.holmes at oracle.com>
> To:     "Doerr, Martin" <martin.doerr at sap.com>, Hiroshi H
> Horii/Japan/IBM at IBMJP
> Cc:     Tim Ellison <Tim_Ellison at uk.ibm.com>,
> "ppc-aix-port-dev at openjdk.java.net" <ppc-aix-port-dev at openjdk.java.net>,
> "hotspot-gc-dev at openjdk.java.net" <hotspot-gc-dev at openjdk.java.net>,
> "hotspot-runtime-dev at openjdk.java.net"
> <hotspot-runtime-dev at openjdk.java.net>
> Date:   05/10/2016 19:31
> Subject:        Re: RFR(M): 8154736: enhancement of cmpxchg and
> copy_to_survivor for ppc64
>
>
>
> On 10/05/2016 7:41 PM, Doerr, Martin wrote:
> > Hi David,
> >
> > thank you very much for testing the other platforms.
> >
> > Here's an updated webrev:
> > http://cr.openjdk.java.net/~mdoerr/8155949_relaxed_cas/webrev.01/
>
> Thanks. Second test run on its way.
>
> David
> -----
>
> > Best regards,
> > Martin
> >
> > -----Original Message-----
> > From: hotspot-runtime-dev [
> mailto:hotspot-runtime-dev-bounces at openjdk.java.net] On Behalf Of David
> Holmes
> > Sent: Dienstag, 10. Mai 2016 11:11
> > To: Hiroshi H Horii <HORII at jp.ibm.com>
> > Cc: Tim Ellison <Tim_Ellison at uk.ibm.com>;
> ppc-aix-port-dev at openjdk.java.net; hotspot-gc-dev at openjdk.java.net;
> hotspot-runtime-dev at openjdk.java.net
> > Subject: Re: RFR(M): 8154736: enhancement of cmpxchg and
> copy_to_survivor for ppc64
> >
> > The fix seems incomplete for solaris:
> >
> > make/Main.gmk:232: recipe for target 'hotspot' failed
> >
> "/opt/jprt/T/P1/073516.daholme/s/hotspot/src/os_cpu/
> solaris_x86/vm/atomic_solaris_x86.inline.hpp",
> > line 124: Error: Too many arguments in call to
> > "_Atomic_cmpxchg_long(long, volatile long*, long)".
> >
> "/opt/jprt/T/P1/073516.daholme/s/hotspot/src/os_cpu/
> solaris_x86/vm/atomic_solaris_x86.inline.hpp",
> > line 128: Error: Too many arguments in call to
> > "_Atomic_cmpxchg_long(long, volatile long*, long)".
> >
> > David
> >
> > On 10/05/2016 5:34 PM, David Holmes wrote:
> >> Hi Hiroshi,
> >>
> >> On 6/05/2016 8:11 PM, Hiroshi H Horii wrote:
> >>> Hi David,
> >>>
> >>> Thank you for your comments.
> >>>
> >>> As Martin suggested me, I would like to separate this proposal to
> >>>   - relaxing memory order of cmpxchg
> >>>   - improvement of copy_to_survivior with relaxed cmpxchg
> >>> and discuss the former first.
> >>>
> >>> Martin thankfully created a new webrev that include a change of
> cmpxchg.
> >>> http://cr.openjdk.java.net/~mdoerr/8155949_relaxed_cas/webrev.00/
> >>> He has already tested it with AIX, linuxx86_64, linuxppc64le and
> >>> darwinintel64.
> >>> (Please tell me if I need to send a new mail for this PFR)
> >>
> >> Please do as it will be simpler to track that way.
> >>
> >>>> What I would prefer to see is an additional memory_order value (such
> as
> >>>> memory_order_ignored) which is the default for all methods declared
> to
> >>>> take a memory_order parameter.
> >>>
> >>> We added simple enum to specify memory order in atomic.hpp as follows.
> >>>
> >>> typedef enum cmpxchg_cmpxchg_memory_order {
> >>>   memory_order_relaxed,
> >>>   memory_order_conservative
> >>> } cmpxchg_memory_order;
> >>>
> >>> All of cmpxchg functions have an argument of cmpxchg_memory_order
> >>> with a default value memory_order_conservative that uses the same
> >>> semantics with the existing cmpxchg and requires no change for the
> >>> existing
> >>> callers. If you think "memory_order_ignored" is better than
> >>> "memory_order_conservative", I will be happy to modify this change.
> >>> (I just thought, "ignored" may resemble "relaxed" and may make
> >>> people who are familiar with C++11's memory semantics confused.
> >>> I would like to know thoughts of native speakers.)
> >>
> >> That is fine by me. I don't think "ignored" would be confused with
> >> "relaxed", but "conservative" is fine.
> >>
> >> I will run the patch through our internal build system while you
> prepare
> >> the updated RFR. My only concern is "unused argument" warnings from the
> >> compiler. :)
> >>
> >> We are quickly running into a hard deadline with Feature Complete
> >> however - possibly less than 24 hours - for hotspot changes. If this
> >> doesn't get in in time I will see if I can shepherd it through the
> >> approval process.
> >>
> >> Thanks,
> >> David
> >>
> >>
> >>> Regards,
> >>> Hiroshi
> >>> -----------------------
> >>> Hiroshi Horii, Ph.D.
> >>> IBM Research - Tokyo
> >>>
> >>>
> >>> David Holmes <david.holmes at oracle.com> wrote on 05/04/2016 14:55:29:
> >>>
> >>>> From: David Holmes <david.holmes at oracle.com>
> >>>> To: Hiroshi H Horii/Japan/IBM at IBMJP
> >>>> Cc: hotspot-gc-dev at openjdk.java.net, hotspot-runtime-
> >>>> dev at openjdk.java.net, ppc-aix-port-dev at openjdk.java.net, Tim Ellison
> >>>> <Tim_Ellison at uk.ibm.com>, Volker Simonis <volker.simonis at gmail.com>,
> >>>> "Doerr, Martin" <martin.doerr at sap.com>, "Lindenmaier, Goetz"
> >>>> <goetz.lindenmaier at sap.com>
> >>>> Date: 05/04/2016 14:57
> >>>> Subject: Re: RFR(M): 8154736: enhancement of cmpxchg and
> >>>> copy_to_survivor for ppc64
> >>>>
> >>>> Hi Hiroshi,
> >>>>
> >>>> Sorry for the delay on getting back to this.
> >>>>
> >>>> On 25/04/2016 5:09 PM, Hiroshi H Horii wrote:
> >>>>> Hi David,
> >>>>>
> >>>>> Thank you for your comments and questions.
> >>>>>
> >>>>>> 1. Are the current cmpxchg semantics exactly the same as
> >>>>>> memory_order_seq_cst?
> >>>>>
> >>>>> This is very good question..
> >>>>>
> >>>>> I guess, cmpxchg needs a more conservative constraint for memory
> >>> ordering
> >>>>> than C++11, to add sync after a compare-and-exchange operation.
> >>>>>
> >>>>> Could someone give comments or thoughts?
> >>>>
> >>>> I don't want to comment on the comparison with C++11. What I would
> >>>> prefer to see is an additional memory_order value (such as
> >>>> memory_order_ignored) which is the default for all methods declared
> to
> >>>> take a memory_order parameter. That way existing implementations are
> >>>> clearly ignoring the memory_order attribute and there is no potential
> >>>> for confusion as to whether the existing implementations equate to
> >>>> memory_order_seq_cst or not.
> >>>>
> >>>> That said, I'm not sure it makes sense to add the memory_order
> parameter
> >>>> to all methods with "cas" in their name, e.g. oopDesc::cas_set_mark,
> >>>> oopDesc::cas_forward_to, unless those methods can sensibly be called
> >>>> with any value for memory_order - which seems highly unlikely.
> Perhaps
> >>>> those methods should identify the weakest form of memory_order they
> >>>> support and that should be hard-wired into them?
> >>>>
> >>>> Thanks,
> >>>> David
> >>>>
> >>>>> memory_order_seq_cst is defined as
> >>>>>     "Any operation with this memory order is both an acquire
> >>> operation and
> >>>>>      a release operation, plus a single total order exists in which
> >>>> all
> >>>>> threads
> >>>>>      observe all modifications (see below) in the same order."
> >>>>> (http://en.cppreference.com/w/cpp/atomic/memory_order)
> >>>>>
> >>>>> In my environment, g++ and xlc generate following assemblies on
> >>>> ppc64le.
> >>>>> (interestingly, they generates the same assemblies for any
> >>>> memory_order)
> >>>>>
> >>>>> g++ (4.9.2)
> >>>>>     100008a4:   ac 04 00 7c     sync
> >>>>>     100008a8:   28 50 20 7d     lwarx   r9,0,r10
> >>>>>     100008ac:   00 18 09 7c     cmpw    r9,r3
> >>>>>     100008b0:   0c 00 c2 40     bne-    100008bc
> >>>>>     100008b4:   2d 51 80 7c     stwcx.  r4,0,r10
> >>>>>     100008b8:   f0 ff c2 40     bne-    100008a8
> >>>>>     100008bc:   2c 01 00 4c     isync
> >>>>>
> >>>>> xlc (13.1.3)
> >>>>>     10000888:   ac 04 00 7c     sync
> >>>>>     1000088c:   28 28 c0 7c     lwarx   r6,0,r5
> >>>>>     10000890:   40 00 26 7c     cmpld   r6,r0
> >>>>>     10000894:   0c 00 82 40     bne     100008a0
> >>>>>     10000898:   2d 29 80 7c     stwcx.  r4,0,r5
> >>>>>     1000089c:   f0 ff e2 40     bne+    1000088c
> >>>>>     100008a0:   2c 01 00 4c     isync
> >>>>>
> >>>>> On the other hand, the current OpenJDK generates following
> assemblies.
> >>>>>
> >>>>>     508:   ac 04 00 7c     sync
> >>>>>     50c:   00 00 5c e9     ld      r10,0(r28)
> >>>>>     510:   00 50 3b 7c     cmpd    r27,r10
> >>>>>     514:   1c 00 c2 40     bne-    530
> >>>>>     518:   a8 40 5c 7d     ldarx   r10,r28,r8
> >>>>>     51c:   00 50 3b 7c     cmpd    r27,r10
> >>>>>     520:   10 00 c2 40     bne-    530
> >>>>>     524:   ad 41 3c 7d     stdcx.  r9,r28,r8
> >>>>>     528:   f0 ff c2 40     bne-    518
> >>>>>     52c:   ac 04 00 7c     sync
> >>>>>     530:   00 50 bb 7f     ...
> >>>>>
> >>>>> Though we can ignore 50c-514 (because they are a duplicated guard
> >>>>> condition),
> >>>>> the last sync instruction (52c) makes cmpxchg more strict than
> >>>>> memory_order_seq_cst.
> >>>>>
> >>>>> In some cases, the last sync is necessary when this thread must be
> >>>> able
> >>>>> to read
> >>>>> all of the changes in the other threads while executing from 508 to
> >>>> 530
> >>>>> (that processes compare-and-exchange).
> >>>>>
> >>>>>> 2. Has there been a discussion already, establishing that the
> >>>> modified
> >>>>>> GC code can indeed use memory_order_relaxed? Otherwise who is
> >>>>>> postulating that and based on what evidence?
> >>>>>
> >>>>> Volker and his colleagues have investigated the current GC codes
> >>>>> according to this.
> >>>>> http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016-
> >>>> April/019079.html
> >>>>> However, I believe, we need comments of other GC experts to change
> >>>>> the shared codes.
> >>>>>
> >>>>> Regards,
> >>>>> Hiroshi
> >>>>> -----------------------
> >>>>> Hiroshi Horii, Ph.D.
> >>>>> IBM Research - Tokyo
> >>>>>
> >>>>>
> >>>>> David Holmes <david.holmes at oracle.com> wrote on 04/22/2016 21:57:07:
> >>>>>
> >>>>>> From: David Holmes <david.holmes at oracle.com>
> >>>>>> To: Hiroshi H Horii/Japan/IBM at IBMJP, hotspot-runtime-
> >>>>>> dev at openjdk.java.net, hotspot-gc-dev at openjdk.java.net
> >>>>>> Cc: Tim Ellison <Tim_Ellison at uk.ibm.com>,
> >>>>> ppc-aix-port-dev at openjdk.java.net
> >>>>>> Date: 04/22/2016 21:58
> >>>>>> Subject: Re: RFR(M): 8154736: enhancement of cmpxchg and
> >>>>>> copy_to_survivor for ppc64
> >>>>>>
> >>>>>> Hi Hiroshi,
> >>>>>>
> >>>>>> Two initial questions:
> >>>>>>
> >>>>>> 1. Are the current cmpxchg semantics exactly the same as
> >>>>>> memory_order_seq_cst?
> >>>>>>
> >>>>>> 2. Has there been a discussion already, establishing that the
> >>>> modified
> >>>>>> GC code can indeed use memory_order_relaxed? Otherwise who is
> >>>>>> postulating that and based on what evidence?
> >>>>>>
> >>>>>> Missing memory barriers have caused very difficult to track down
> >>> bugs in
> >>>>>> the past - very rare race conditions. So any relaxation here has
> >>>> to be
> >>>>>> done with extreme confidence.
> >>>>>>
> >>>>>> Thanks,
> >>>>>> David
> >>>>>>
> >>>>>> On 22/04/2016 10:28 PM, Hiroshi H Horii wrote:
> >>>>>>> Dear all:
> >>>>>>>
> >>>>>>> Can I please request reviews for the following change?
> >>>>>>>
> >>>>>>> Code change:
> >>>>>>>
> >>> http://cr.openjdk.java.net/~mdoerr/8154736_copy_to_survivor/webrev.00/
> >>>>>>> (I initially created and Martin enhanced so much)
> >>>>>>>
> >>>>>>> This change follows the discussion started from this mail.
> >>>>>>> http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016-
> >>>>>> April/018960.html
> >>>>>>>
> >>>>>>> Description:
> >>>>>>> This change provides relaxed compare-and-exchange by introducing
> >>>>>>> similar semantics of C++ atomic memory operators, enum
> >>>> memory_order.
> >>>>>>> As described in atomic_linux_ppc.inline.hpp, the current
> >>>>> implementation of
> >>>>>>> cmpxchg is fence_cmpxchg_acquire. This implementation is useful
> for
> >>>>>>> general purposes because twice calls of sync before and after
> >>>>> cmpxchg will
> >>>>>>> provide strict consistency. However, they sometimes cause
> overheads
> >>>>>>> because
> >>>>>>> sync instructions are very expensive in the current POWER chip
> >>> design.
> >>>>>>> In addition, for the other platforms, such as aarch64, this strict
> >>>>>>> semantics
> >>>>>>> may cause some overheads (according to the Andrew's mail).
> >>>>>>> http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016-
> >>>>>> April/019073.html
> >>>>>>>
> >>>>>>> With this change, callers can explicitly specify constraints of
> >>> memory
> >>>>>>> ordering
> >>>>>>> for cmpxchg with an additional parameter, memory_order order.
> >>>>>>>
> >>>>>>> typedef enum memory_order {
> >>>>>>>    memory_order_relaxed,
> >>>>>>>    memory_order_consume,
> >>>>>>>    memory_order_acquire,
> >>>>>>>    memory_order_release,
> >>>>>>>    memory_order_acq_rel,
> >>>>>>>    memory_order_seq_cst
> >>>>>>> } memory_order;
> >>>>>>>
> >>>>>>> Because the default value of the parameter is
> memory_order_seq_cst,
> >>>>>>> existing codes can use the same semantics of cmpxchg without any
> >>>>>>> modification. The relaxed cmpxchg is implemented only on ppc
> >>>>>>> in this changeset. Therefore, the behavior on the other platforms
> >>> will
> >>>>>>> not be changed with this changeset.
> >>>>>>>
> >>>>>>> In addition, with the new parameter of cmpxchg, this change
> >>>> improves
> >>>>>>> performance of copy_to_survivor in the parallel GC.
> >>>>>>> copy_to_survivor changes forward pointers by using cmpxchg. This
> >>>>>>> operation doesn't require any sync instructions.  A pointer is
> >>> changed
> >>>>>>> at most once in a GC and when cmpxchg fails, the latest pointer is
> >>>>>>> available for the caller. cas_set_mark and cas_forward_to are
> >>> extended
> >>>>>>> with an additional memory_order parameter as cmpxchg and
> >>>>> copy_to_survivor
> >>>>>>> uses memory_order_relaxed to modify the forward pointers.
> >>>>>>>
> >>>>>>> Summary of source code changes:
> >>>>>>>
> >>>>>>> * src/share/vm/runtime/atomic.hpp
> >>>>>>>       - Defines enum memory_order and adds a parameter to cmpxchg.
> >>>>>>>
> >>>>>>> * src/share/vm/runtime/atomic.cpp
> >>>>>>> * src/os_cpu/bsd_x86/vm/atomic_bsd_x86.inline.hpp
> >>>>>>> * src/os_cpu/bsd_zero/vm/atomic_bsd_zero.inline.hpp
> >>>>>>> * src/os_cpu/linux_aarch64/vm/atomic_linux_aarch64.inline.hpp
> >>>>>>> * src/os_cpu/linux_sparc/vm/atomic_linux_sparc.inline.hpp
> >>>>>>> * src/os_cpu/linux_x86/vm/atomic_linux_x86.inline.hpp
> >>>>>>> * src/os_cpu/linux_zero/vm/atomic_linux_zero.inline.hpp
> >>>>>>> * src/os_cpu/solaris_sparc/vm/atomic_solaris_sparc.inline.hpp
> >>>>>>> * src/os_cpu/solaris_x86/vm/atomic_solaris_x86.inline.hpp
> >>>>>>> * src/os_cpu/windows_x86/vm/atomic_windows_x86.inline.hpp
> >>>>>>>       - Added a parameter for each cmpxchg function to follow
> >>>>>>>          the change of atomic.hpp. Their implementations are not
> >>>>> changed.
> >>>>>>>
> >>>>>>> * src/os_cpu/aix_ppc/vm/atomic_aix_ppc.inline.hpp
> >>>>>>> * src/os_cpu/linux_ppc/vm/atomic_linux_ppc.inline.hpp
> >>>>>>>       - Added a parameter for each cmpxchg function to follow
> >>>>>>>          the change of atomic.hpp. In addition, implementations
> >>>>>>>          are changed corresponding to the specified memory_order.
> >>>>>>>
> >>>>>>> * src/share/vm/oops/oop.hpp
> >>>>>>> * src/share/vm/oops/oop.inline.hpp
> >>>>>>>       - Add a memory_order parameter to use relaxed cmpxchg in
> >>>>>>>          cas_set_mark and cas_forward_to.
> >>>>>>>
> >>>>>>> * src/share/vm/gc/parallel/psPromotionManager.cpp
> >>>>>>> * src/share/vm/gc/parallel/psPromotionManager.inline.hpp
> >>>>>>>
> >>>>>>> Martin tested this changeset  on linuxx86_64, linuxppc64le and
> >>>>>>> darwinintel64.
> >>>>>>> Though more time is needed to test on the other platform, we would
> >>>>> like to
> >>>>>>> ask
> >>>>>>> reviews and start discussion on this changeset.
> >>>>>>> I also tested this changeset with SPECjbb2013 and confirmed that
> gc
> >>>>> pause
> >>>>>>> time
> >>>>>>> is reduced.
> >>>>>>>
> >>>>>>> Regards,
> >>>>>>> Hiroshi
> >>>>>>> -----------------------
> >>>>>>> Hiroshi Horii, Ph.D.
> >>>>>>> IBM Research - Tokyo
> >>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>
>
>
>
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.org/pipermail/hotspot-gc-dev/attachments/20160929/4dd7f3a4/attachment.htm>