enhancement of cmpxchg and copy_to_survivor for ppc64

David Holmes david.holmes at oracle.com
Tue Apr 19 00:03:39 UTC 2016


Hi Carsten,

On 18/04/2016 11:01 PM, Carsten Varming wrote:
> Dear Hiroshi and David,
>
> I took a brief look at the patch and the idea looks good to me, in fact,
> a bunch of the garbage collectors could benefit from similar changes (G1
> and ParNew both use oopDesc::forward_to_atomic which does not call
> cas_set_mark, and they could both benefit from the weaker semantics).
>
> An important question is: Should the shared parts of hotspot move
> towards weaker memory models? If yes, then everybody should review code
> assuming the weaker semantics. If no, then there really isn't room for
> patches like this one :(.

The shared code should be written in a form that is correct on the 
weakest memory model we support - always using the appropriate Atomic 
and/or OrderAccess operations. On individual platforms those operations 
then reduce - sometimes to no-ops - based on the platform semantics.

But that also requires we have a comprehensive set of Atomic and/or 
OrderAccess operations to choose from, so that we don't define 
algorithms that are unnecessarily excessive in their use of memory 
barriers. That said, establishing that a given algorithm+data-structure 
can be coded in a more relaxed form is a significant challenge in terms 
of "proving" correctness and adequate testing. I expect conservatism to 
rule in many places.

More on Volker's response ...

Thanks,
David

> The patch needs some work (IMHO). The name cas_set_mark must be renamed
> to reflect the weaker semantics, and the ifdefs in the shared code
> should be removed. All platforms should be changed as the Atomic API is
> extended, and platforms with weaker memory models should implement the
> weaker semantics. Finally, oopDesc::forward_to_atomic should use the
> weaker semantics, preferable by calling cas_set_mark instead of calling
> into Atomic directly.
>
> BTW. what is going on in oopDesc::forward_to_atomic? If curMark !=
> oldMark, then curMark must have been marked (a similar fact is checked
> by "assert(is_forwarded())" (modulo a short race)), oldMark is set to
> curMark and then oldMark->is_marked is checked in the look condition.
> Why is there a loop? It seems much simpler if the assert is changed to a
> "guarantee(curMark->is_marked)" and the loop is changed to an if
> statement. Oh, and an assert is probably sufficient.
>
> My 2 cents,
> Carsten
>
> On Mon, Apr 18, 2016 at 12:38 AM, David Holmes <david.holmes at oracle.com
> <mailto:david.holmes at oracle.com>> wrote:
>
>     Hi Hiroshi,
>
>     On 18/04/2016 12:15 PM, Hiroshi H Horii wrote:
>
>         Hi David,
>
>         Thank you for your replying and sorry that I didn't append my
>         diff file
>         when the discussion was forwarded to this mailing list. I
>         appended my
>         original diff file (hg diff -g) to this mail.
>
>           > It is fine for ppc to have variations of cmpxchg with
>         different memory
>           > barrier semantics, but the shared API must not be affected
>         as there is a
>           > requirement that the basic form of this operation provide "full
>           > bi-directional fence" semantics. Note that these semantics
>         are not in
>           > place to fulfill Java Memory Model requirements, but are an
>         internal
>           > contract in hotspot.
>
>         Sure. Probably, it is better for me to modify my patch because it
>         changes the internal contract. I will create a new patch that
>         adds new
>         cmpxchg functions for ppc.
>
>
>     I think this is only usable from PPC specific code, not from the
>     shared code as per your original patch. The oopDesc::cas_set_mark
>     may be written to expect the full bi-directional fence that is
>     required by the atomic.hpp contract. If we break that contract we
>     would have to prove correctness along all code paths using that code
>     - well the ppc64 folk would have to do that :). But I would object
>     to the platform-specific code in the shared file - sorry.
>
>     Thanks,
>     David
>
>           > Also can you get someone to host the webrev
>           > for you on cr.openjdk.java.net <http://cr.openjdk.java.net>?
>         Or else include the diff in the bug
>         report.
>
>         I will ask someone to create webrev after my next patch is created.
>
>
>
>         Regards,
>         Hiroshi
>         -----------------------
>         Hiroshi Horii, Ph.D.
>         IBM Research - Tokyo
>
>
>         David Holmes <david.holmes at oracle.com
>         <mailto:david.holmes at oracle.com>> wrote on 04/16/2016 16:43:20:
>
>           > From: David Holmes <david.holmes at oracle.com
>         <mailto:david.holmes at oracle.com>>
>           > To: Christian Thalinger <christian.thalinger at oracle.com
>         <mailto:christian.thalinger at oracle.com>>, Hiroshi H
>           > Horii/Japan/IBM at IBMJP
>           > Cc: Tim Ellison <Tim_Ellison at uk.ibm.com
>         <mailto:Tim_Ellison at uk.ibm.com>>, ppc-aix-port-
>           > dev at openjdk.java.net <mailto:dev at openjdk.java.net>,
>         hotspot-runtime-dev at openjdk.java.net
>         <mailto:hotspot-runtime-dev at openjdk.java.net>
>           > Date: 04/16/2016 16:46
>           > Subject: Re: enhancement of cmpxchg and copy_to_survivor for
>         ppc64
>
>           >
>           > Hi Hiroshi,
>           >
>           > As the diff file does not survive the mail process I can't
>         see the
>           > actual proposed changes. There doesn't seem to be a bug for
>         this so
>           > could you please file one. Also can you get someone to host
>         the webrev
>           > for you on cr.openjdk.java.net <http://cr.openjdk.java.net>?
>         Or else include the diff in the bug
>         report.
>           >
>           > It is fine for ppc to have variations of cmpxchg with
>         different memory
>           > barrier semantics, but the shared API must not be affected
>         as there is a
>           > requirement that the basic form of this operation provide "full
>           > bi-directional fence" semantics. Note that these semantics
>         are not in
>           > place to fulfill Java Memory Model requirements, but are an
>         internal
>           > contract in hotspot.
>           >
>           > Thanks,
>           > David
>           >
>           > On 12/04/2016 3:59 AM, Christian Thalinger wrote:
>           > > [This should be on hotspot-runtime-dev.  BCC’ing
>         hotspot-compiler-dev.]
>           > >
>           > >> On Apr 8, 2016, at 12:53 AM, Hiroshi H Horii
>         <HORII at jp.ibm.com <mailto:HORII at jp.ibm.com>> wrote:
>           > >>
>           > >> Dear all:
>           > >>
>           > >> Can I please request reviews for the following change?
>           > >> This change was created for JDK 9 and ppc64.
>           > >>
>           > >> Description:
>           > >> This change adds options of compare-and-exchange for POWER
>         architecture.
>           > >> As described in atomic_linux_ppc.inline.hpp, the current
>         implementation of
>           > >> cmpxchg is fence_cmpxchg_acquire. This implementation is
>         useful for
>           > >> general purposes because twice calls of sync before and after
>         cmpxchg will
>           > >> keep consistency. However, they sometimes cause overheads
>         because
>           > >> sync instructions are very expensive in the current POWER
>         chip design.
>           > >> With this change, callers can explicitly specify to run
>         fence and
>           > acquire with
>           > >> two additional bool parameters. Because their default
>         values are
>         "true",
>           > >> it is not necessary to modify existing cmpxchg calls.
>           > >>
>           > >> In addition, with the new parameters of cmpxchg, this
>         change improves
>           > >> performance of copy_to_survivor in the parallel GC.
>           > >> copy_to_survivor changes forward pointers by using
>         cmpxchg. This
>           > >> operation doesn't require any sync instructions, in my
>         understanding.
>           > >> A pointer is changed at most once in a GC and when
>         cmpxchg fails,
>           > >> the latest pointer is available for the caller.
>           > >>
>           > >> When I evaluated SPECjbb2013 (slightly customized because
>         obsolete
>         grizzly
>           > >> doesn't support new version format of Java 9), pause time
>         of young
>         GC was
>           > >> reduced from 10% to 20%.
>           > >>
>           > >> Summary of source code changes:
>           > >>
>           > >> * src/share/vm/runtime/atomic.hpp
>           > >> * src/share/vm/runtime/atomic.cpp
>           > >> * src/os_cpu/linux_ppc/vm/atomic_linux_ppc.inline.hpp
>           > >>         - Add two arguments of fence and acquire to
>         cmpxchg only
>         for PPC64.
>           > >>           Though cmpxchg in atomic_linux_ppc.inline.hpp
>         has some
>         branches,
>           > >>           they are reduced while inlining to callers.
>           > >>
>           > >> * src/share/vm/oops/oop.inline.hpp
>           > >>        - Changed cas_set_mark to call cmpxchg without
>         fence and
>         acquire.
>           > >>           cas_set_mark is called only by cas_forward_to
>         that is
>           > called only by
>           > >>           copy_to_survivor_space and oop_promotion_failed in
>           > >>           psPromotionManager.
>           > >>
>           > >> Code change:
>           > >>
>           > >>     Please see an attached diff file that was generated
>         with "hg
>         diff -g"
>           > >>     under the latest hotspot directory.
>           > >>
>           > >> Passed test:
>           > >>      SPECjbb2013 (customized)
>           > >>
>           > >> * I believe some other cmpxchg will be optimized by
>         reducing fence
>           > >>    or acquire because twice calls of sync are too
>         conservative
>         toimplement
>           > >>    Java memory model.
>           > >>
>           > >>
>           > >>
>           > >> Regards,
>           > >> Hiroshi
>           > >> -----------------------
>           > >> Hiroshi Horii, Ph.D.
>           > >> IBM Research - Tokyo
>           > >>
>           > >
>           >
>
>


More information about the ppc-aix-port-dev mailing list