RFR(M): 8154736: enhancement of cmpxchg and copy_to_survivor for ppc64

David Holmes david.holmes at oracle.com
Tue May 10 09:11:20 UTC 2016


The fix seems incomplete for solaris:

make/Main.gmk:232: recipe for target 'hotspot' failed
"/opt/jprt/T/P1/073516.daholme/s/hotspot/src/os_cpu/solaris_x86/vm/atomic_solaris_x86.inline.hpp", 
line 124: Error: Too many arguments in call to 
"_Atomic_cmpxchg_long(long, volatile long*, long)".
"/opt/jprt/T/P1/073516.daholme/s/hotspot/src/os_cpu/solaris_x86/vm/atomic_solaris_x86.inline.hpp", 
line 128: Error: Too many arguments in call to 
"_Atomic_cmpxchg_long(long, volatile long*, long)".

David

On 10/05/2016 5:34 PM, David Holmes wrote:
> Hi Hiroshi,
>
> On 6/05/2016 8:11 PM, Hiroshi H Horii wrote:
>> Hi David,
>>
>> Thank you for your comments.
>>
>> As Martin suggested me, I would like to separate this proposal to
>>   - relaxing memory order of cmpxchg
>>   - improvement of copy_to_survivior with relaxed cmpxchg
>> and discuss the former first.
>>
>> Martin thankfully created a new webrev that include a change of cmpxchg.
>> http://cr.openjdk.java.net/~mdoerr/8155949_relaxed_cas/webrev.00/
>> He has already tested it with AIX, linuxx86_64, linuxppc64le and
>> darwinintel64.
>> (Please tell me if I need to send a new mail for this PFR)
>
> Please do as it will be simpler to track that way.
>
>>> What I would prefer to see is an additional memory_order value (such as
>>> memory_order_ignored) which is the default for all methods declared to
>>> take a memory_order parameter.
>>
>> We added simple enum to specify memory order in atomic.hpp as follows.
>>
>> typedef enum cmpxchg_cmpxchg_memory_order {
>>   memory_order_relaxed,
>>   memory_order_conservative
>> } cmpxchg_memory_order;
>>
>> All of cmpxchg functions have an argument of cmpxchg_memory_order
>> with a default value memory_order_conservative that uses the same
>> semantics with the existing cmpxchg and requires no change for the
>> existing
>> callers. If you think "memory_order_ignored" is better than
>> "memory_order_conservative", I will be happy to modify this change.
>> (I just thought, "ignored" may resemble "relaxed" and may make
>> people who are familiar with C++11's memory semantics confused.
>> I would like to know thoughts of native speakers.)
>
> That is fine by me. I don't think "ignored" would be confused with
> "relaxed", but "conservative" is fine.
>
> I will run the patch through our internal build system while you prepare
> the updated RFR. My only concern is "unused argument" warnings from the
> compiler. :)
>
> We are quickly running into a hard deadline with Feature Complete
> however - possibly less than 24 hours - for hotspot changes. If this
> doesn't get in in time I will see if I can shepherd it through the
> approval process.
>
> Thanks,
> David
>
>
>> Regards,
>> Hiroshi
>> -----------------------
>> Hiroshi Horii, Ph.D.
>> IBM Research - Tokyo
>>
>>
>> David Holmes <david.holmes at oracle.com> wrote on 05/04/2016 14:55:29:
>>
>>> From: David Holmes <david.holmes at oracle.com>
>>> To: Hiroshi H Horii/Japan/IBM at IBMJP
>>> Cc: hotspot-gc-dev at openjdk.java.net, hotspot-runtime-
>>> dev at openjdk.java.net, ppc-aix-port-dev at openjdk.java.net, Tim Ellison
>>> <Tim_Ellison at uk.ibm.com>, Volker Simonis <volker.simonis at gmail.com>,
>>> "Doerr, Martin" <martin.doerr at sap.com>, "Lindenmaier, Goetz"
>>> <goetz.lindenmaier at sap.com>
>>> Date: 05/04/2016 14:57
>>> Subject: Re: RFR(M): 8154736: enhancement of cmpxchg and
>>> copy_to_survivor for ppc64
>>>
>>> Hi Hiroshi,
>>>
>>> Sorry for the delay on getting back to this.
>>>
>>> On 25/04/2016 5:09 PM, Hiroshi H Horii wrote:
>>> > Hi David,
>>> >
>>> > Thank you for your comments and questions.
>>> >
>>> >> 1. Are the current cmpxchg semantics exactly the same as
>>> >> memory_order_seq_cst?
>>> >
>>> > This is very good question..
>>> >
>>> > I guess, cmpxchg needs a more conservative constraint for memory
>> ordering
>>> > than C++11, to add sync after a compare-and-exchange operation.
>>> >
>>> > Could someone give comments or thoughts?
>>>
>>> I don't want to comment on the comparison with C++11. What I would
>>> prefer to see is an additional memory_order value (such as
>>> memory_order_ignored) which is the default for all methods declared to
>>> take a memory_order parameter. That way existing implementations are
>>> clearly ignoring the memory_order attribute and there is no potential
>>> for confusion as to whether the existing implementations equate to
>>> memory_order_seq_cst or not.
>>>
>>> That said, I'm not sure it makes sense to add the memory_order parameter
>>> to all methods with "cas" in their name, e.g. oopDesc::cas_set_mark,
>>> oopDesc::cas_forward_to, unless those methods can sensibly be called
>>> with any value for memory_order - which seems highly unlikely. Perhaps
>>> those methods should identify the weakest form of memory_order they
>>> support and that should be hard-wired into them?
>>>
>>> Thanks,
>>> David
>>>
>>> > memory_order_seq_cst is defined as
>>> >     "Any operation with this memory order is both an acquire
>> operation and
>>> >      a release operation, plus a single total order exists in which
>>> all
>>> > threads
>>> >      observe all modifications (see below) in the same order."
>>> > (http://en.cppreference.com/w/cpp/atomic/memory_order)
>>> >
>>> > In my environment, g++ and xlc generate following assemblies on
>>> ppc64le.
>>> > (interestingly, they generates the same assemblies for any
>>> memory_order)
>>> >
>>> > g++ (4.9.2)
>>> >     100008a4:   ac 04 00 7c     sync
>>> >     100008a8:   28 50 20 7d     lwarx   r9,0,r10
>>> >     100008ac:   00 18 09 7c     cmpw    r9,r3
>>> >     100008b0:   0c 00 c2 40     bne-    100008bc
>>> >     100008b4:   2d 51 80 7c     stwcx.  r4,0,r10
>>> >     100008b8:   f0 ff c2 40     bne-    100008a8
>>> >     100008bc:   2c 01 00 4c     isync
>>> >
>>> > xlc (13.1.3)
>>> >     10000888:   ac 04 00 7c     sync
>>> >     1000088c:   28 28 c0 7c     lwarx   r6,0,r5
>>> >     10000890:   40 00 26 7c     cmpld   r6,r0
>>> >     10000894:   0c 00 82 40     bne     100008a0
>>> >     10000898:   2d 29 80 7c     stwcx.  r4,0,r5
>>> >     1000089c:   f0 ff e2 40     bne+    1000088c
>>> >     100008a0:   2c 01 00 4c     isync
>>> >
>>> > On the other hand, the current OpenJDK generates following assemblies.
>>> >
>>> >     508:   ac 04 00 7c     sync
>>> >     50c:   00 00 5c e9     ld      r10,0(r28)
>>> >     510:   00 50 3b 7c     cmpd    r27,r10
>>> >     514:   1c 00 c2 40     bne-    530
>>> >     518:   a8 40 5c 7d     ldarx   r10,r28,r8
>>> >     51c:   00 50 3b 7c     cmpd    r27,r10
>>> >     520:   10 00 c2 40     bne-    530
>>> >     524:   ad 41 3c 7d     stdcx.  r9,r28,r8
>>> >     528:   f0 ff c2 40     bne-    518
>>> >     52c:   ac 04 00 7c     sync
>>> >     530:   00 50 bb 7f     ...
>>> >
>>> > Though we can ignore 50c-514 (because they are a duplicated guard
>>> > condition),
>>> > the last sync instruction (52c) makes cmpxchg more strict than
>>> > memory_order_seq_cst.
>>> >
>>> > In some cases, the last sync is necessary when this thread must be
>>> able
>>> > to read
>>> > all of the changes in the other threads while executing from 508 to
>>> 530
>>> > (that processes compare-and-exchange).
>>> >
>>> >> 2. Has there been a discussion already, establishing that the
>>> modified
>>> >> GC code can indeed use memory_order_relaxed? Otherwise who is
>>> >> postulating that and based on what evidence?
>>> >
>>> > Volker and his colleagues have investigated the current GC codes
>>> > according to this.
>>> > http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016-
>>> April/019079.html
>>> > However, I believe, we need comments of other GC experts to change
>>> > the shared codes.
>>> >
>>> > Regards,
>>> > Hiroshi
>>> > -----------------------
>>> > Hiroshi Horii, Ph.D.
>>> > IBM Research - Tokyo
>>> >
>>> >
>>> > David Holmes <david.holmes at oracle.com> wrote on 04/22/2016 21:57:07:
>>> >
>>> >> From: David Holmes <david.holmes at oracle.com>
>>> >> To: Hiroshi H Horii/Japan/IBM at IBMJP, hotspot-runtime-
>>> >> dev at openjdk.java.net, hotspot-gc-dev at openjdk.java.net
>>> >> Cc: Tim Ellison <Tim_Ellison at uk.ibm.com>,
>>> > ppc-aix-port-dev at openjdk.java.net
>>> >> Date: 04/22/2016 21:58
>>> >> Subject: Re: RFR(M): 8154736: enhancement of cmpxchg and
>>> >> copy_to_survivor for ppc64
>>> >>
>>> >> Hi Hiroshi,
>>> >>
>>> >> Two initial questions:
>>> >>
>>> >> 1. Are the current cmpxchg semantics exactly the same as
>>> >> memory_order_seq_cst?
>>> >>
>>> >> 2. Has there been a discussion already, establishing that the
>>> modified
>>> >> GC code can indeed use memory_order_relaxed? Otherwise who is
>>> >> postulating that and based on what evidence?
>>> >>
>>> >> Missing memory barriers have caused very difficult to track down
>> bugs in
>>> >> the past - very rare race conditions. So any relaxation here has
>>> to be
>>> >> done with extreme confidence.
>>> >>
>>> >> Thanks,
>>> >> David
>>> >>
>>> >> On 22/04/2016 10:28 PM, Hiroshi H Horii wrote:
>>> >> > Dear all:
>>> >> >
>>> >> > Can I please request reviews for the following change?
>>> >> >
>>> >> > Code change:
>>> >> >
>> http://cr.openjdk.java.net/~mdoerr/8154736_copy_to_survivor/webrev.00/
>>> >> > (I initially created and Martin enhanced so much)
>>> >> >
>>> >> > This change follows the discussion started from this mail.
>>> >> > http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016-
>>> >> April/018960.html
>>> >> >
>>> >> > Description:
>>> >> > This change provides relaxed compare-and-exchange by introducing
>>> >> > similar semantics of C++ atomic memory operators, enum
>>> memory_order.
>>> >> > As described in atomic_linux_ppc.inline.hpp, the current
>>> > implementation of
>>> >> > cmpxchg is fence_cmpxchg_acquire. This implementation is useful for
>>> >> > general purposes because twice calls of sync before and after
>>> > cmpxchg will
>>> >> > provide strict consistency. However, they sometimes cause overheads
>>> >> > because
>>> >> > sync instructions are very expensive in the current POWER chip
>> design.
>>> >> > In addition, for the other platforms, such as aarch64, this strict
>>> >> > semantics
>>> >> > may cause some overheads (according to the Andrew's mail).
>>> >> > http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016-
>>> >> April/019073.html
>>> >> >
>>> >> > With this change, callers can explicitly specify constraints of
>> memory
>>> >> > ordering
>>> >> > for cmpxchg with an additional parameter, memory_order order.
>>> >> >
>>> >> > typedef enum memory_order {
>>> >> >    memory_order_relaxed,
>>> >> >    memory_order_consume,
>>> >> >    memory_order_acquire,
>>> >> >    memory_order_release,
>>> >> >    memory_order_acq_rel,
>>> >> >    memory_order_seq_cst
>>> >> > } memory_order;
>>> >> >
>>> >> > Because the default value of the parameter is memory_order_seq_cst,
>>> >> > existing codes can use the same semantics of cmpxchg without any
>>> >> > modification. The relaxed cmpxchg is implemented only on ppc
>>> >> > in this changeset. Therefore, the behavior on the other platforms
>> will
>>> >> > not be changed with this changeset.
>>> >> >
>>> >> > In addition, with the new parameter of cmpxchg, this change
>>> improves
>>> >> > performance of copy_to_survivor in the parallel GC.
>>> >> > copy_to_survivor changes forward pointers by using cmpxchg. This
>>> >> > operation doesn't require any sync instructions.  A pointer is
>> changed
>>> >> > at most once in a GC and when cmpxchg fails, the latest pointer is
>>> >> > available for the caller. cas_set_mark and cas_forward_to are
>> extended
>>> >> > with an additional memory_order parameter as cmpxchg and
>>> > copy_to_survivor
>>> >> > uses memory_order_relaxed to modify the forward pointers.
>>> >> >
>>> >> > Summary of source code changes:
>>> >> >
>>> >> > * src/share/vm/runtime/atomic.hpp
>>> >> >       - Defines enum memory_order and adds a parameter to cmpxchg.
>>> >> >
>>> >> > * src/share/vm/runtime/atomic.cpp
>>> >> > * src/os_cpu/bsd_x86/vm/atomic_bsd_x86.inline.hpp
>>> >> > * src/os_cpu/bsd_zero/vm/atomic_bsd_zero.inline.hpp
>>> >> > * src/os_cpu/linux_aarch64/vm/atomic_linux_aarch64.inline.hpp
>>> >> > * src/os_cpu/linux_sparc/vm/atomic_linux_sparc.inline.hpp
>>> >> > * src/os_cpu/linux_x86/vm/atomic_linux_x86.inline.hpp
>>> >> > * src/os_cpu/linux_zero/vm/atomic_linux_zero.inline.hpp
>>> >> > * src/os_cpu/solaris_sparc/vm/atomic_solaris_sparc.inline.hpp
>>> >> > * src/os_cpu/solaris_x86/vm/atomic_solaris_x86.inline.hpp
>>> >> > * src/os_cpu/windows_x86/vm/atomic_windows_x86.inline.hpp
>>> >> >       - Added a parameter for each cmpxchg function to follow
>>> >> >          the change of atomic.hpp. Their implementations are not
>>> > changed.
>>> >> >
>>> >> > * src/os_cpu/aix_ppc/vm/atomic_aix_ppc.inline.hpp
>>> >> > * src/os_cpu/linux_ppc/vm/atomic_linux_ppc.inline.hpp
>>> >> >       - Added a parameter for each cmpxchg function to follow
>>> >> >          the change of atomic.hpp. In addition, implementations
>>> >> >          are changed corresponding to the specified memory_order.
>>> >> >
>>> >> > * src/share/vm/oops/oop.hpp
>>> >> > * src/share/vm/oops/oop.inline.hpp
>>> >> >       - Add a memory_order parameter to use relaxed cmpxchg in
>>> >> >          cas_set_mark and cas_forward_to.
>>> >> >
>>> >> > * src/share/vm/gc/parallel/psPromotionManager.cpp
>>> >> > * src/share/vm/gc/parallel/psPromotionManager.inline.hpp
>>> >> >
>>> >> > Martin tested this changeset  on linuxx86_64, linuxppc64le and
>>> >> > darwinintel64.
>>> >> > Though more time is needed to test on the other platform, we would
>>> > like to
>>> >> > ask
>>> >> > reviews and start discussion on this changeset.
>>> >> > I also tested this changeset with SPECjbb2013 and confirmed that gc
>>> > pause
>>> >> > time
>>> >> > is reduced.
>>> >> >
>>> >> > Regards,
>>> >> > Hiroshi
>>> >> > -----------------------
>>> >> > Hiroshi Horii, Ph.D.
>>> >> > IBM Research - Tokyo
>>> >> >
>>> >> >
>>> >>
>>> >
>>>
>>


More information about the hotspot-runtime-dev mailing list