RFR(M): 8154736: enhancement of cmpxchg and copy_to_survivor for ppc64

David Holmes david.holmes at oracle.com
Tue May 10 10:30:36 UTC 2016


On 10/05/2016 7:41 PM, Doerr, Martin wrote:
> Hi David,
>
> thank you very much for testing the other platforms.
>
> Here's an updated webrev:
> http://cr.openjdk.java.net/~mdoerr/8155949_relaxed_cas/webrev.01/

Thanks. Second test run on its way.

David
-----

> Best regards,
> Martin
>
> -----Original Message-----
> From: hotspot-runtime-dev [mailto:hotspot-runtime-dev-bounces at openjdk.java.net] On Behalf Of David Holmes
> Sent: Dienstag, 10. Mai 2016 11:11
> To: Hiroshi H Horii <HORII at jp.ibm.com>
> Cc: Tim Ellison <Tim_Ellison at uk.ibm.com>; ppc-aix-port-dev at openjdk.java.net; hotspot-gc-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net
> Subject: Re: RFR(M): 8154736: enhancement of cmpxchg and copy_to_survivor for ppc64
>
> The fix seems incomplete for solaris:
>
> make/Main.gmk:232: recipe for target 'hotspot' failed
> "/opt/jprt/T/P1/073516.daholme/s/hotspot/src/os_cpu/solaris_x86/vm/atomic_solaris_x86.inline.hpp",
> line 124: Error: Too many arguments in call to
> "_Atomic_cmpxchg_long(long, volatile long*, long)".
> "/opt/jprt/T/P1/073516.daholme/s/hotspot/src/os_cpu/solaris_x86/vm/atomic_solaris_x86.inline.hpp",
> line 128: Error: Too many arguments in call to
> "_Atomic_cmpxchg_long(long, volatile long*, long)".
>
> David
>
> On 10/05/2016 5:34 PM, David Holmes wrote:
>> Hi Hiroshi,
>>
>> On 6/05/2016 8:11 PM, Hiroshi H Horii wrote:
>>> Hi David,
>>>
>>> Thank you for your comments.
>>>
>>> As Martin suggested me, I would like to separate this proposal to
>>>   - relaxing memory order of cmpxchg
>>>   - improvement of copy_to_survivior with relaxed cmpxchg
>>> and discuss the former first.
>>>
>>> Martin thankfully created a new webrev that include a change of cmpxchg.
>>> http://cr.openjdk.java.net/~mdoerr/8155949_relaxed_cas/webrev.00/
>>> He has already tested it with AIX, linuxx86_64, linuxppc64le and
>>> darwinintel64.
>>> (Please tell me if I need to send a new mail for this PFR)
>>
>> Please do as it will be simpler to track that way.
>>
>>>> What I would prefer to see is an additional memory_order value (such as
>>>> memory_order_ignored) which is the default for all methods declared to
>>>> take a memory_order parameter.
>>>
>>> We added simple enum to specify memory order in atomic.hpp as follows.
>>>
>>> typedef enum cmpxchg_cmpxchg_memory_order {
>>>   memory_order_relaxed,
>>>   memory_order_conservative
>>> } cmpxchg_memory_order;
>>>
>>> All of cmpxchg functions have an argument of cmpxchg_memory_order
>>> with a default value memory_order_conservative that uses the same
>>> semantics with the existing cmpxchg and requires no change for the
>>> existing
>>> callers. If you think "memory_order_ignored" is better than
>>> "memory_order_conservative", I will be happy to modify this change.
>>> (I just thought, "ignored" may resemble "relaxed" and may make
>>> people who are familiar with C++11's memory semantics confused.
>>> I would like to know thoughts of native speakers.)
>>
>> That is fine by me. I don't think "ignored" would be confused with
>> "relaxed", but "conservative" is fine.
>>
>> I will run the patch through our internal build system while you prepare
>> the updated RFR. My only concern is "unused argument" warnings from the
>> compiler. :)
>>
>> We are quickly running into a hard deadline with Feature Complete
>> however - possibly less than 24 hours - for hotspot changes. If this
>> doesn't get in in time I will see if I can shepherd it through the
>> approval process.
>>
>> Thanks,
>> David
>>
>>
>>> Regards,
>>> Hiroshi
>>> -----------------------
>>> Hiroshi Horii, Ph.D.
>>> IBM Research - Tokyo
>>>
>>>
>>> David Holmes <david.holmes at oracle.com> wrote on 05/04/2016 14:55:29:
>>>
>>>> From: David Holmes <david.holmes at oracle.com>
>>>> To: Hiroshi H Horii/Japan/IBM at IBMJP
>>>> Cc: hotspot-gc-dev at openjdk.java.net, hotspot-runtime-
>>>> dev at openjdk.java.net, ppc-aix-port-dev at openjdk.java.net, Tim Ellison
>>>> <Tim_Ellison at uk.ibm.com>, Volker Simonis <volker.simonis at gmail.com>,
>>>> "Doerr, Martin" <martin.doerr at sap.com>, "Lindenmaier, Goetz"
>>>> <goetz.lindenmaier at sap.com>
>>>> Date: 05/04/2016 14:57
>>>> Subject: Re: RFR(M): 8154736: enhancement of cmpxchg and
>>>> copy_to_survivor for ppc64
>>>>
>>>> Hi Hiroshi,
>>>>
>>>> Sorry for the delay on getting back to this.
>>>>
>>>> On 25/04/2016 5:09 PM, Hiroshi H Horii wrote:
>>>>> Hi David,
>>>>>
>>>>> Thank you for your comments and questions.
>>>>>
>>>>>> 1. Are the current cmpxchg semantics exactly the same as
>>>>>> memory_order_seq_cst?
>>>>>
>>>>> This is very good question..
>>>>>
>>>>> I guess, cmpxchg needs a more conservative constraint for memory
>>> ordering
>>>>> than C++11, to add sync after a compare-and-exchange operation.
>>>>>
>>>>> Could someone give comments or thoughts?
>>>>
>>>> I don't want to comment on the comparison with C++11. What I would
>>>> prefer to see is an additional memory_order value (such as
>>>> memory_order_ignored) which is the default for all methods declared to
>>>> take a memory_order parameter. That way existing implementations are
>>>> clearly ignoring the memory_order attribute and there is no potential
>>>> for confusion as to whether the existing implementations equate to
>>>> memory_order_seq_cst or not.
>>>>
>>>> That said, I'm not sure it makes sense to add the memory_order parameter
>>>> to all methods with "cas" in their name, e.g. oopDesc::cas_set_mark,
>>>> oopDesc::cas_forward_to, unless those methods can sensibly be called
>>>> with any value for memory_order - which seems highly unlikely. Perhaps
>>>> those methods should identify the weakest form of memory_order they
>>>> support and that should be hard-wired into them?
>>>>
>>>> Thanks,
>>>> David
>>>>
>>>>> memory_order_seq_cst is defined as
>>>>>     "Any operation with this memory order is both an acquire
>>> operation and
>>>>>      a release operation, plus a single total order exists in which
>>>> all
>>>>> threads
>>>>>      observe all modifications (see below) in the same order."
>>>>> (http://en.cppreference.com/w/cpp/atomic/memory_order)
>>>>>
>>>>> In my environment, g++ and xlc generate following assemblies on
>>>> ppc64le.
>>>>> (interestingly, they generates the same assemblies for any
>>>> memory_order)
>>>>>
>>>>> g++ (4.9.2)
>>>>>     100008a4:   ac 04 00 7c     sync
>>>>>     100008a8:   28 50 20 7d     lwarx   r9,0,r10
>>>>>     100008ac:   00 18 09 7c     cmpw    r9,r3
>>>>>     100008b0:   0c 00 c2 40     bne-    100008bc
>>>>>     100008b4:   2d 51 80 7c     stwcx.  r4,0,r10
>>>>>     100008b8:   f0 ff c2 40     bne-    100008a8
>>>>>     100008bc:   2c 01 00 4c     isync
>>>>>
>>>>> xlc (13.1.3)
>>>>>     10000888:   ac 04 00 7c     sync
>>>>>     1000088c:   28 28 c0 7c     lwarx   r6,0,r5
>>>>>     10000890:   40 00 26 7c     cmpld   r6,r0
>>>>>     10000894:   0c 00 82 40     bne     100008a0
>>>>>     10000898:   2d 29 80 7c     stwcx.  r4,0,r5
>>>>>     1000089c:   f0 ff e2 40     bne+    1000088c
>>>>>     100008a0:   2c 01 00 4c     isync
>>>>>
>>>>> On the other hand, the current OpenJDK generates following assemblies.
>>>>>
>>>>>     508:   ac 04 00 7c     sync
>>>>>     50c:   00 00 5c e9     ld      r10,0(r28)
>>>>>     510:   00 50 3b 7c     cmpd    r27,r10
>>>>>     514:   1c 00 c2 40     bne-    530
>>>>>     518:   a8 40 5c 7d     ldarx   r10,r28,r8
>>>>>     51c:   00 50 3b 7c     cmpd    r27,r10
>>>>>     520:   10 00 c2 40     bne-    530
>>>>>     524:   ad 41 3c 7d     stdcx.  r9,r28,r8
>>>>>     528:   f0 ff c2 40     bne-    518
>>>>>     52c:   ac 04 00 7c     sync
>>>>>     530:   00 50 bb 7f     ...
>>>>>
>>>>> Though we can ignore 50c-514 (because they are a duplicated guard
>>>>> condition),
>>>>> the last sync instruction (52c) makes cmpxchg more strict than
>>>>> memory_order_seq_cst.
>>>>>
>>>>> In some cases, the last sync is necessary when this thread must be
>>>> able
>>>>> to read
>>>>> all of the changes in the other threads while executing from 508 to
>>>> 530
>>>>> (that processes compare-and-exchange).
>>>>>
>>>>>> 2. Has there been a discussion already, establishing that the
>>>> modified
>>>>>> GC code can indeed use memory_order_relaxed? Otherwise who is
>>>>>> postulating that and based on what evidence?
>>>>>
>>>>> Volker and his colleagues have investigated the current GC codes
>>>>> according to this.
>>>>> http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016-
>>>> April/019079.html
>>>>> However, I believe, we need comments of other GC experts to change
>>>>> the shared codes.
>>>>>
>>>>> Regards,
>>>>> Hiroshi
>>>>> -----------------------
>>>>> Hiroshi Horii, Ph.D.
>>>>> IBM Research - Tokyo
>>>>>
>>>>>
>>>>> David Holmes <david.holmes at oracle.com> wrote on 04/22/2016 21:57:07:
>>>>>
>>>>>> From: David Holmes <david.holmes at oracle.com>
>>>>>> To: Hiroshi H Horii/Japan/IBM at IBMJP, hotspot-runtime-
>>>>>> dev at openjdk.java.net, hotspot-gc-dev at openjdk.java.net
>>>>>> Cc: Tim Ellison <Tim_Ellison at uk.ibm.com>,
>>>>> ppc-aix-port-dev at openjdk.java.net
>>>>>> Date: 04/22/2016 21:58
>>>>>> Subject: Re: RFR(M): 8154736: enhancement of cmpxchg and
>>>>>> copy_to_survivor for ppc64
>>>>>>
>>>>>> Hi Hiroshi,
>>>>>>
>>>>>> Two initial questions:
>>>>>>
>>>>>> 1. Are the current cmpxchg semantics exactly the same as
>>>>>> memory_order_seq_cst?
>>>>>>
>>>>>> 2. Has there been a discussion already, establishing that the
>>>> modified
>>>>>> GC code can indeed use memory_order_relaxed? Otherwise who is
>>>>>> postulating that and based on what evidence?
>>>>>>
>>>>>> Missing memory barriers have caused very difficult to track down
>>> bugs in
>>>>>> the past - very rare race conditions. So any relaxation here has
>>>> to be
>>>>>> done with extreme confidence.
>>>>>>
>>>>>> Thanks,
>>>>>> David
>>>>>>
>>>>>> On 22/04/2016 10:28 PM, Hiroshi H Horii wrote:
>>>>>>> Dear all:
>>>>>>>
>>>>>>> Can I please request reviews for the following change?
>>>>>>>
>>>>>>> Code change:
>>>>>>>
>>> http://cr.openjdk.java.net/~mdoerr/8154736_copy_to_survivor/webrev.00/
>>>>>>> (I initially created and Martin enhanced so much)
>>>>>>>
>>>>>>> This change follows the discussion started from this mail.
>>>>>>> http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016-
>>>>>> April/018960.html
>>>>>>>
>>>>>>> Description:
>>>>>>> This change provides relaxed compare-and-exchange by introducing
>>>>>>> similar semantics of C++ atomic memory operators, enum
>>>> memory_order.
>>>>>>> As described in atomic_linux_ppc.inline.hpp, the current
>>>>> implementation of
>>>>>>> cmpxchg is fence_cmpxchg_acquire. This implementation is useful for
>>>>>>> general purposes because twice calls of sync before and after
>>>>> cmpxchg will
>>>>>>> provide strict consistency. However, they sometimes cause overheads
>>>>>>> because
>>>>>>> sync instructions are very expensive in the current POWER chip
>>> design.
>>>>>>> In addition, for the other platforms, such as aarch64, this strict
>>>>>>> semantics
>>>>>>> may cause some overheads (according to the Andrew's mail).
>>>>>>> http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016-
>>>>>> April/019073.html
>>>>>>>
>>>>>>> With this change, callers can explicitly specify constraints of
>>> memory
>>>>>>> ordering
>>>>>>> for cmpxchg with an additional parameter, memory_order order.
>>>>>>>
>>>>>>> typedef enum memory_order {
>>>>>>>    memory_order_relaxed,
>>>>>>>    memory_order_consume,
>>>>>>>    memory_order_acquire,
>>>>>>>    memory_order_release,
>>>>>>>    memory_order_acq_rel,
>>>>>>>    memory_order_seq_cst
>>>>>>> } memory_order;
>>>>>>>
>>>>>>> Because the default value of the parameter is memory_order_seq_cst,
>>>>>>> existing codes can use the same semantics of cmpxchg without any
>>>>>>> modification. The relaxed cmpxchg is implemented only on ppc
>>>>>>> in this changeset. Therefore, the behavior on the other platforms
>>> will
>>>>>>> not be changed with this changeset.
>>>>>>>
>>>>>>> In addition, with the new parameter of cmpxchg, this change
>>>> improves
>>>>>>> performance of copy_to_survivor in the parallel GC.
>>>>>>> copy_to_survivor changes forward pointers by using cmpxchg. This
>>>>>>> operation doesn't require any sync instructions.  A pointer is
>>> changed
>>>>>>> at most once in a GC and when cmpxchg fails, the latest pointer is
>>>>>>> available for the caller. cas_set_mark and cas_forward_to are
>>> extended
>>>>>>> with an additional memory_order parameter as cmpxchg and
>>>>> copy_to_survivor
>>>>>>> uses memory_order_relaxed to modify the forward pointers.
>>>>>>>
>>>>>>> Summary of source code changes:
>>>>>>>
>>>>>>> * src/share/vm/runtime/atomic.hpp
>>>>>>>       - Defines enum memory_order and adds a parameter to cmpxchg.
>>>>>>>
>>>>>>> * src/share/vm/runtime/atomic.cpp
>>>>>>> * src/os_cpu/bsd_x86/vm/atomic_bsd_x86.inline.hpp
>>>>>>> * src/os_cpu/bsd_zero/vm/atomic_bsd_zero.inline.hpp
>>>>>>> * src/os_cpu/linux_aarch64/vm/atomic_linux_aarch64.inline.hpp
>>>>>>> * src/os_cpu/linux_sparc/vm/atomic_linux_sparc.inline.hpp
>>>>>>> * src/os_cpu/linux_x86/vm/atomic_linux_x86.inline.hpp
>>>>>>> * src/os_cpu/linux_zero/vm/atomic_linux_zero.inline.hpp
>>>>>>> * src/os_cpu/solaris_sparc/vm/atomic_solaris_sparc.inline.hpp
>>>>>>> * src/os_cpu/solaris_x86/vm/atomic_solaris_x86.inline.hpp
>>>>>>> * src/os_cpu/windows_x86/vm/atomic_windows_x86.inline.hpp
>>>>>>>       - Added a parameter for each cmpxchg function to follow
>>>>>>>          the change of atomic.hpp. Their implementations are not
>>>>> changed.
>>>>>>>
>>>>>>> * src/os_cpu/aix_ppc/vm/atomic_aix_ppc.inline.hpp
>>>>>>> * src/os_cpu/linux_ppc/vm/atomic_linux_ppc.inline.hpp
>>>>>>>       - Added a parameter for each cmpxchg function to follow
>>>>>>>          the change of atomic.hpp. In addition, implementations
>>>>>>>          are changed corresponding to the specified memory_order.
>>>>>>>
>>>>>>> * src/share/vm/oops/oop.hpp
>>>>>>> * src/share/vm/oops/oop.inline.hpp
>>>>>>>       - Add a memory_order parameter to use relaxed cmpxchg in
>>>>>>>          cas_set_mark and cas_forward_to.
>>>>>>>
>>>>>>> * src/share/vm/gc/parallel/psPromotionManager.cpp
>>>>>>> * src/share/vm/gc/parallel/psPromotionManager.inline.hpp
>>>>>>>
>>>>>>> Martin tested this changeset  on linuxx86_64, linuxppc64le and
>>>>>>> darwinintel64.
>>>>>>> Though more time is needed to test on the other platform, we would
>>>>> like to
>>>>>>> ask
>>>>>>> reviews and start discussion on this changeset.
>>>>>>> I also tested this changeset with SPECjbb2013 and confirmed that gc
>>>>> pause
>>>>>>> time
>>>>>>> is reduced.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Hiroshi
>>>>>>> -----------------------
>>>>>>> Hiroshi Horii, Ph.D.
>>>>>>> IBM Research - Tokyo
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>


More information about the ppc-aix-port-dev mailing list