<font size=2 face="sans-serif">Hi David,</font><br><br><font size=2 face="sans-serif">Thank you for your comments.</font><br><br><font size=2 face="sans-serif">As Martin suggested me, I would like
to separate this proposal to</font><br><font size=2 face="sans-serif"> - relaxing memory order of cmpxchg</font><br><font size=2 face="sans-serif"> - improvement of copy_to_survivior
with relaxed cmpxchg</font><br><font size=2 face="sans-serif">and discuss the former first.</font><br><br><font size=2 face="sans-serif">Martin thankfully created a new webrev
that include a change of cmpxchg.</font><br><a href=http://cr.openjdk.java.net/~mdoerr/8155949_relaxed_cas/webrev.00/><font size=2 color=blue face="sans-serif">http://cr.openjdk.java.net/~mdoerr/8155949_relaxed_cas/webrev.00/</font></a><br><font size=2 face="sans-serif">He has already tested it with AIX, linuxx86_64,
linuxppc64le and darwinintel64.</font><br><font size=2 face="sans-serif">(Please tell me if I need to send a
new mail for this PFR)</font><br><br><font size=2 face="sans-serif">> What I would prefer to see is an
additional memory_order value (such as </font><br><font size=2 face="sans-serif">> memory_order_ignored) which is
the default for all methods declared to </font><br><font size=2 face="sans-serif">> take a memory_order parameter.
</font><br><br><font size=2 face="sans-serif">We added simple enum to specify memory
order in atomic.hpp as follows.</font><br><br><font size=2 face="sans-serif">typedef enum cmpxchg_cmpxchg_memory_order
{</font><br><font size=2 face="sans-serif"> memory_order_relaxed,</font><br><font size=2 face="sans-serif"> memory_order_conservative</font><br><font size=2 face="sans-serif">} cmpxchg_memory_order;</font><br><br><font size=2 face="sans-serif">All of cmpxchg functions have an argument
of cmpxchg_memory_order</font><br><font size=2 face="sans-serif">with a default value memory_order_conservative
that uses the same </font><br><font size=2 face="sans-serif">semantics with the existing cmpxchg
and requires no change for the existing</font><br><font size=2 face="sans-serif">callers. If you think "memory_order_ignored"
is better than </font><br><font size=2 face="sans-serif">"memory_order_conservative",
I will be happy to modify this change. </font><br><font size=2 face="sans-serif">(I just thought, "ignored"
may resemble "relaxed" and may make </font><br><font size=2 face="sans-serif">people who are familiar with C++11's
memory semantics confused.</font><br><font size=2 face="sans-serif">I would like to know thoughts of native
speakers.)</font><br><br><font size=2 face="sans-serif">Regards,<br>Hiroshi<br>-----------------------<br>Hiroshi Horii, Ph.D.<br>IBM Research - Tokyo<br></font><br><br><tt><font size=2>David Holmes <david.holmes@oracle.com> wrote
on 05/04/2016 14:55:29:<br><br>> From: David Holmes <david.holmes@oracle.com></font></tt><br><tt><font size=2>> To: Hiroshi H Horii/Japan/IBM@IBMJP</font></tt><br><tt><font size=2>> Cc: hotspot-gc-dev@openjdk.java.net, hotspot-runtime-<br>> dev@openjdk.java.net, ppc-aix-port-dev@openjdk.java.net, Tim Ellison<br>> <Tim_Ellison@uk.ibm.com>, Volker Simonis <volker.simonis@gmail.com>,<br>> "Doerr, Martin" <martin.doerr@sap.com>, "Lindenmaier,
Goetz" <br>> <goetz.lindenmaier@sap.com></font></tt><br><tt><font size=2>> Date: 05/04/2016 14:57</font></tt><br><tt><font size=2>> Subject: Re: RFR(M): 8154736: enhancement of
cmpxchg and <br>> copy_to_survivor for ppc64</font></tt><br><tt><font size=2>> <br>> Hi Hiroshi,<br>> <br>> Sorry for the delay on getting back to this.<br>> <br>> On 25/04/2016 5:09 PM, Hiroshi H Horii wrote:<br>> > Hi David,<br>> ><br>> > Thank you for your comments and questions.<br>> ><br>> >> 1. Are the current cmpxchg semantics exactly the same as<br>> >> memory_order_seq_cst?<br>> ><br>> > This is very good question..<br>> ><br>> > I guess, cmpxchg needs a more conservative constraint for memory
ordering<br>> > than C++11, to add sync after a compare-and-exchange operation.<br>> ><br>> > Could someone give comments or thoughts?<br>> <br>> I don't want to comment on the comparison with C++11. What I would
<br>> prefer to see is an additional memory_order value (such as <br>> memory_order_ignored) which is the default for all methods declared
to <br>> take a memory_order parameter. That way existing implementations are
<br>> clearly ignoring the memory_order attribute and there is no potential
<br>> for confusion as to whether the existing implementations equate to
<br>> memory_order_seq_cst or not.<br>> <br>> That said, I'm not sure it makes sense to add the memory_order parameter
<br>> to all methods with "cas" in their name, e.g. oopDesc::cas_set_mark,
<br>> oopDesc::cas_forward_to, unless those methods can sensibly be called
<br>> with any value for memory_order - which seems highly unlikely. Perhaps
<br>> those methods should identify the weakest form of memory_order they
<br>> support and that should be hard-wired into them?<br>> <br>> Thanks,<br>> David<br>> <br>> > memory_order_seq_cst is defined as<br>> > "Any operation with this memory order is both
an acquire operation and<br>> > a release operation, plus a single total
order exists in which all<br>> > threads<br>> > observe all modifications (see below) in
the same order."<br>> > (</font></tt><a href=http://en.cppreference.com/w/cpp/atomic/memory_order><tt><font size=2>http://en.cppreference.com/w/cpp/atomic/memory_order</font></tt></a><tt><font size=2>)<br>> ><br>> > In my environment, g++ and xlc generate following assemblies
on ppc64le.<br>> > (interestingly, they generates the same assemblies for any memory_order)<br>> ><br>> > g++ (4.9.2)<br>> > 100008a4: ac 04 00 7c sync<br>> > 100008a8: 28 50 20 7d lwarx
r9,0,r10<br>> > 100008ac: 00 18 09 7c cmpw
r9,r3<br>> > 100008b0: 0c 00 c2 40 bne-
100008bc<br>> > 100008b4: 2d 51 80 7c stwcx.
r4,0,r10<br>> > 100008b8: f0 ff c2 40 bne-
100008a8<br>> > 100008bc: 2c 01 00 4c isync<br>> ><br>> > xlc (13.1.3)<br>> > 10000888: ac 04 00 7c sync<br>> > 1000088c: 28 28 c0 7c lwarx
r6,0,r5<br>> > 10000890: 40 00 26 7c cmpld
r6,r0<br>> > 10000894: 0c 00 82 40 bne
100008a0<br>> > 10000898: 2d 29 80 7c stwcx.
r4,0,r5<br>> > 1000089c: f0 ff e2 40 bne+
1000088c<br>> > 100008a0: 2c 01 00 4c isync<br>> ><br>> > On the other hand, the current OpenJDK generates following assemblies.<br>> ><br>> > 508: ac 04 00 7c sync<br>> > 50c: 00 00 5c e9 ld
r10,0(r28)<br>> > 510: 00 50 3b 7c cmpd
r27,r10<br>> > 514: 1c 00 c2 40 bne-
530<br>> > 518: a8 40 5c 7d ldarx
r10,r28,r8<br>> > 51c: 00 50 3b 7c cmpd
r27,r10<br>> > 520: 10 00 c2 40 bne-
530<br>> > 524: ad 41 3c 7d stdcx. r9,r28,r8<br>> > 528: f0 ff c2 40 bne-
518<br>> > 52c: ac 04 00 7c sync<br>> > 530: 00 50 bb 7f ...<br>> ><br>> > Though we can ignore 50c-514 (because they are a duplicated guard<br>> > condition),<br>> > the last sync instruction (52c) makes cmpxchg more strict than<br>> > memory_order_seq_cst.<br>> ><br>> > In some cases, the last sync is necessary when this thread must
be able<br>> > to read<br>> > all of the changes in the other threads while executing from
508 to 530<br>> > (that processes compare-and-exchange).<br>> ><br>> >> 2. Has there been a discussion already, establishing that
the modified<br>> >> GC code can indeed use memory_order_relaxed? Otherwise who
is<br>> >> postulating that and based on what evidence?<br>> ><br>> > Volker and his colleagues have investigated the current GC codes<br>> > according to this.<br>> > </font></tt><a href="http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016-"><tt><font size=2>http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016-</font></tt></a><tt><font size=2><br>> April/019079.html<br>> > However, I believe, we need comments of other GC experts to change<br>> > the shared codes.<br>> ><br>> > Regards,<br>> > Hiroshi<br>> > -----------------------<br>> > Hiroshi Horii, Ph.D.<br>> > IBM Research - Tokyo<br>> ><br>> ><br>> > David Holmes <david.holmes@oracle.com> wrote on 04/22/2016
21:57:07:<br>> ><br>> >> From: David Holmes <david.holmes@oracle.com><br>> >> To: Hiroshi H Horii/Japan/IBM@IBMJP, hotspot-runtime-<br>> >> dev@openjdk.java.net, hotspot-gc-dev@openjdk.java.net<br>> >> Cc: Tim Ellison <Tim_Ellison@uk.ibm.com>,<br>> > ppc-aix-port-dev@openjdk.java.net<br>> >> Date: 04/22/2016 21:58<br>> >> Subject: Re: RFR(M): 8154736: enhancement of cmpxchg and<br>> >> copy_to_survivor for ppc64<br>> >><br>> >> Hi Hiroshi,<br>> >><br>> >> Two initial questions:<br>> >><br>> >> 1. Are the current cmpxchg semantics exactly the same as<br>> >> memory_order_seq_cst?<br>> >><br>> >> 2. Has there been a discussion already, establishing that
the modified<br>> >> GC code can indeed use memory_order_relaxed? Otherwise who
is<br>> >> postulating that and based on what evidence?<br>> >><br>> >> Missing memory barriers have caused very difficult to track
down bugs in<br>> >> the past - very rare race conditions. So any relaxation here
has to be<br>> >> done with extreme confidence.<br>> >><br>> >> Thanks,<br>> >> David<br>> >><br>> >> On 22/04/2016 10:28 PM, Hiroshi H Horii wrote:<br>> >> > Dear all:<br>> >> ><br>> >> > Can I please request reviews for the following change?<br>> >> ><br>> >> > Code change:<br>> >> > </font></tt><a href=http://cr.openjdk.java.net/~mdoerr/8154736_copy_to_survivor/webrev.00/><tt><font size=2>http://cr.openjdk.java.net/~mdoerr/8154736_copy_to_survivor/webrev.00/</font></tt></a><tt><font size=2><br>> >> > (I initially created and Martin enhanced so much)<br>> >> ><br>> >> > This change follows the discussion started from this
mail.<br>> >> > </font></tt><a href="http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016-"><tt><font size=2>http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016-</font></tt></a><tt><font size=2><br>> >> April/018960.html<br>> >> ><br>> >> > Description:<br>> >> > This change provides relaxed compare-and-exchange by
introducing<br>> >> > similar semantics of C++ atomic memory operators, enum
memory_order.<br>> >> > As described in atomic_linux_ppc.inline.hpp, the current<br>> > implementation of<br>> >> > cmpxchg is fence_cmpxchg_acquire. This implementation
is useful for<br>> >> > general purposes because twice calls of sync before
and after<br>> > cmpxchg will<br>> >> > provide strict consistency. However, they sometimes
cause overheads<br>> >> > because<br>> >> > sync instructions are very expensive in the current
POWER chip design.<br>> >> > In addition, for the other platforms, such as aarch64,
this strict<br>> >> > semantics<br>> >> > may cause some overheads (according to the Andrew's
mail).<br>> >> > </font></tt><a href="http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016-"><tt><font size=2>http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016-</font></tt></a><tt><font size=2><br>> >> April/019073.html<br>> >> ><br>> >> > With this change, callers can explicitly specify constraints
of memory<br>> >> > ordering<br>> >> > for cmpxchg with an additional parameter, memory_order
order.<br>> >> ><br>> >> > typedef enum memory_order {<br>> >> > memory_order_relaxed,<br>> >> > memory_order_consume,<br>> >> > memory_order_acquire,<br>> >> > memory_order_release,<br>> >> > memory_order_acq_rel,<br>> >> > memory_order_seq_cst<br>> >> > } memory_order;<br>> >> ><br>> >> > Because the default value of the parameter is memory_order_seq_cst,<br>> >> > existing codes can use the same semantics of cmpxchg
without any<br>> >> > modification. The relaxed cmpxchg is implemented only
on ppc<br>> >> > in this changeset. Therefore, the behavior on the other
platforms will<br>> >> > not be changed with this changeset.<br>> >> ><br>> >> > In addition, with the new parameter of cmpxchg, this
change improves<br>> >> > performance of copy_to_survivor in the parallel GC.<br>> >> > copy_to_survivor changes forward pointers by using cmpxchg.
This<br>> >> > operation doesn't require any sync instructions. A
pointer is changed<br>> >> > at most once in a GC and when cmpxchg fails, the latest
pointer is<br>> >> > available for the caller. cas_set_mark and cas_forward_to
are extended<br>> >> > with an additional memory_order parameter as cmpxchg
and<br>> > copy_to_survivor<br>> >> > uses memory_order_relaxed to modify the forward pointers.<br>> >> ><br>> >> > Summary of source code changes:<br>> >> ><br>> >> > * src/share/vm/runtime/atomic.hpp<br>> >> > - Defines enum memory_order and
adds a parameter to cmpxchg.<br>> >> ><br>> >> > * src/share/vm/runtime/atomic.cpp<br>> >> > * src/os_cpu/bsd_x86/vm/atomic_bsd_x86.inline.hpp<br>> >> > * src/os_cpu/bsd_zero/vm/atomic_bsd_zero.inline.hpp<br>> >> > * src/os_cpu/linux_aarch64/vm/atomic_linux_aarch64.inline.hpp<br>> >> > * src/os_cpu/linux_sparc/vm/atomic_linux_sparc.inline.hpp<br>> >> > * src/os_cpu/linux_x86/vm/atomic_linux_x86.inline.hpp<br>> >> > * src/os_cpu/linux_zero/vm/atomic_linux_zero.inline.hpp<br>> >> > * src/os_cpu/solaris_sparc/vm/atomic_solaris_sparc.inline.hpp<br>> >> > * src/os_cpu/solaris_x86/vm/atomic_solaris_x86.inline.hpp<br>> >> > * src/os_cpu/windows_x86/vm/atomic_windows_x86.inline.hpp<br>> >> > - Added a parameter for each cmpxchg
function to follow<br>> >> > the change of atomic.hpp.
Their implementations are not<br>> > changed.<br>> >> ><br>> >> > * src/os_cpu/aix_ppc/vm/atomic_aix_ppc.inline.hpp<br>> >> > * src/os_cpu/linux_ppc/vm/atomic_linux_ppc.inline.hpp<br>> >> > - Added a parameter for each cmpxchg
function to follow<br>> >> > the change of atomic.hpp.
In addition, implementations<br>> >> > are changed corresponding
to the specified memory_order.<br>> >> ><br>> >> > * src/share/vm/oops/oop.hpp<br>> >> > * src/share/vm/oops/oop.inline.hpp<br>> >> > - Add a memory_order parameter
to use relaxed cmpxchg in<br>> >> > cas_set_mark and cas_forward_to.<br>> >> ><br>> >> > * src/share/vm/gc/parallel/psPromotionManager.cpp<br>> >> > * src/share/vm/gc/parallel/psPromotionManager.inline.hpp<br>> >> ><br>> >> > Martin tested this changeset on linuxx86_64, linuxppc64le
and<br>> >> > darwinintel64.<br>> >> > Though more time is needed to test on the other platform,
we would<br>> > like to<br>> >> > ask<br>> >> > reviews and start discussion on this changeset.<br>> >> > I also tested this changeset with SPECjbb2013 and confirmed
that gc<br>> > pause<br>> >> > time<br>> >> > is reduced.<br>> >> ><br>> >> > Regards,<br>> >> > Hiroshi<br>> >> > -----------------------<br>> >> > Hiroshi Horii, Ph.D.<br>> >> > IBM Research - Tokyo<br>> >> ><br>> >> ><br>> >><br>> ><br>> <br></font></tt><BR>