RFR(M): 8154736: enhancement of cmpxchg and copy_to_survivor for ppc64

Michihiro Horie HORIE at jp.ibm.com
Mon Jun 4 11:30:40 UTC 2018


>Hi Michihiro,
>
>looks good to me, too. Thanks for adding comments where you have changed
memory barriers.
>Pushed to submission repo and our internal testing.

Thanks a lot, Martin!


Best regards,
--
Michihiro,
IBM Research - Tokyo



From:	"Doerr, Martin" <martin.doerr at sap.com>
To:	Michihiro Horie <HORIE at jp.ibm.com>, "Erik Österlund"
            <erik.osterlund at oracle.com>
Cc:	"Andrew Haley (aph at redhat.com)" <aph at redhat.com>,
            "david.holmes at oracle.com" <david.holmes at oracle.com>,
            "hotspot-gc-dev at openjdk.java.net"
            <hotspot-gc-dev at openjdk.java.net>, "Kim Barrett"
            <kim.barrett at oracle.com>, "ppc-aix-port-dev at openjdk.java.net"
            <ppc-aix-port-dev at openjdk.java.net>
Date:	2018/06/04 19:17
Subject:	RE: RFR(M): 8154736: enhancement of cmpxchg and
            copy_to_survivor for ppc64



Hi Michihiro,

looks good to me, too. Thanks for adding comments where you have changed
memory barriers.
Pushed to submission repo and our internal testing.

Best regards,
Martin


From: Michihiro Horie [mailto:HORIE at jp.ibm.com]
Sent: Freitag, 1. Juni 2018 17:37
To: Erik Österlund <erik.osterlund at oracle.com>
Cc: Andrew Haley (aph at redhat.com) <aph at redhat.com>;
david.holmes at oracle.com; hotspot-gc-dev at openjdk.java.net; Kim Barrett
<kim.barrett at oracle.com>; Doerr, Martin <martin.doerr at sap.com>;
ppc-aix-port-dev at openjdk.java.net
Subject: Re: RFR(M): 8154736: enhancement of cmpxchg and copy_to_survivor
for ppc64



>Hi Michihiro,
>
>Looks good to me.

Thanks a lot, Erik!


Best regards,
--
Michihiro,
IBM Research - Tokyo

Inactive hide details for "Erik Österlund" ---2018/06/02 00:15:15---Hi
Michihiro, Looks good to me."Erik Österlund" ---2018/06/02 00:15:15---Hi
Michihiro, Looks good to me.

From: "Erik Österlund" <erik.osterlund at oracle.com>
To: Michihiro Horie <HORIE at jp.ibm.com>, "Doerr, Martin" <
martin.doerr at sap.com>
Cc: "Andrew Haley (aph at redhat.com)" <aph at redhat.com>, "
david.holmes at oracle.com" <david.holmes at oracle.com>, "
hotspot-gc-dev at openjdk.java.net" <hotspot-gc-dev at openjdk.java.net>, Kim
Barrett <kim.barrett at oracle.com>, "ppc-aix-port-dev at openjdk.java.net" <
ppc-aix-port-dev at openjdk.java.net>
Date: 2018/06/02 00:15
Subject: Re: RFR(M): 8154736: enhancement of cmpxchg and copy_to_survivor
for ppc64




Hi Michihiro,

Looks good to me.

Thanks,
/Erik

On 2018-06-01 17:08, Michihiro Horie wrote:
            Hi Kim, Erik, and Martin,

            Thank you very much for reminding me that an acquire barrier in
            the else-statement for “!test_mark->is_marked()” is necessary
            under the criteria of not relying on the consume.

            I uploaded a new webrev :
            http://cr.openjdk.java.net/~mhorie/8154736/webrev.13/
            This change uses forwardee_acquire(), which would generate
            better code on ARM.

            Necessary barriers are located in all the paths in
            copy_to_survivor_space, and the returned new_obj can be safely
            handled in the caller sites.

            I measured SPECjbb2015 with the latest webrev. Critical-jOPS
            improved by 5%. Since my previous measurement with implicit
            consume showed 6% improvement, adding acquire barriers degraded
            the performance a little, but 5% is still good enough.


            Best regards,
            --
            Michihiro,
            IBM Research - Tokyo

            Inactive

          hide details for "Doerr, Martin"
            ---2018/05/30

          16:18:09---Hi Erik, the current
            implementation works on PPC

          because of "Doerr, Martin"
            ---2018/05/30 16:18:09---Hi Erik, the current implementation
            works on PPC because of "MP+sync+addr".

            From: "Doerr, Martin" <martin.doerr at sap.com>
            To: "Erik Österlund" <erik.osterlund at oracle.com>, Kim Barrett
            <kim.barrett at oracle.com>, Michihiro Horie <HORIE at jp.ibm.com>,
            "Andrew Haley (aph at redhat.com)" <aph at redhat.com>
            Cc: "david.holmes at oracle.com" <david.holmes at oracle.com>,
            "hotspot-gc-dev at openjdk.java.net"
            <hotspot-gc-dev at openjdk.java.net>,
            "ppc-aix-port-dev at openjdk.java.net"
            <ppc-aix-port-dev at openjdk.java.net>
            Date: 2018/05/30 16:18
            Subject: RE: RFR(M): 8154736: enhancement of cmpxchg and
            copy_to_survivor for ppc64




            Hi Erik,

            the current implementation works on PPC because of "MP+sync
            +addr".
            So we already rely on ordering of "load volatile field" +
            "implicit consume" on the reader's side. We have never seen any
            issues related to this with the compilers we have been using
            during the ~10 years the PPC implementation exists.

            PPC supports "MP+lwsync+addr" the same way, so Michihiro's
            proposal doesn't make it unreliable for PPC.

            But I'm ok with evaluating acquire barriers although they are
            not required by the PPC/ARM memory models.
            ARM/aarch64 will also be affected when the o->forwardee uses
            load_acquire. So somebody should check the impact. If it is not
            acceptable we may need to introduce explicit consume.

            Implicit consume is also bad in shared code because somebody
            may want to run it on DEC Alpha.

            Thanks and best regards,
            Martin


            -----Original Message-----
            From: Erik Österlund [mailto:erik.osterlund at oracle.com]
            Sent: Dienstag, 29. Mai 2018 14:01
            To: Doerr, Martin <martin.doerr at sap.com>; Kim Barrett
            <kim.barrett at oracle.com>; Michihiro Horie <HORIE at jp.ibm.com>
            Cc: david.holmes at oracle.com; Gustavo Bueno Romero
            <gromero at br.ibm.com>; hotspot-dev at openjdk.java.net;
            hotspot-gc-dev at openjdk.java.net;
            ppc-aix-port-dev at openjdk.java.net
            Subject: Re: RFR(M): 8154736: enhancement of cmpxchg and
            copy_to_survivor for ppc64

            Hi Martin and Michihiro,

            On 2018-05-29 12:30, Doerr, Martin wrote:
            > Hi Kim,
            >
            > I'm trying to understand how this is related to Michihiro's
            change. The else path of the initial test is not affected by it
            AFAICS.
            > So it sounds like a request to fix the current implementation
            in addition to what his original intend was.

            I think we are just trying to nail down the correct fencing and
            just go
            for that. And yes, this is arguably a pre-existing problem, but
            in a
            race involving the very same accesses that we are changing the
            fencing
            for. So it is not completely unrelated I suppose.

            In particular, hotspot has code that assumes that if you on the
            writer
            side issue a full fence before publishing a pointer to newly
            initialized
            data, then the initializing stores and their side effects
            should be
            globally "visible" across the system before the pointer to it
            is
            published, and hence elide the need for acquire on the loading
            side,
            without relying on retained data dependencies on the loader
            side. I
            believe this code falls under that category. It is assumed that
            the
            leading fence of the CAS publishing the forwarding pointer
            makes the
            initializing stores globally observable before publishing a
            pointer to
            the initialized data, hence assuming that any loads able to
            observe the
            new pointer would not rely on acquire or data dependent loads
            to
            correctly read the initialized data.

            Unfortunately, this is not reliable in the IRIW case, as per
            the litmus
            test "MP+sync+ctrl" as described in "Understanding POWER
            multiprocessors" (https://dl.acm.org/citation.cfm?id=1993520),
            as
            opposed to "MP+sync+addr" that gets away with it because of the
            data
            dependency (not IRIW). Similarly, an isync does the job too on
            the
            reader side as shown in MP+sync+ctrlisync. So while what I
            believe was
            the previous reasoning that the leading sync of the CAS would
            elide the
            necessity for acquire on the reader side without relying on
            data
            dependent loads (implicit consume), I think that assumption was
            wrong in
            the first place and that we do indeed need explicit acquire
            (even with
            the precious conservative CAS fencing) in this context to not
            rely on
            implicit consume semantics generating the required data
            dependent loads
            on the reader side. In practice though, the leading sync of the
            CAS has
            been enough to generate the correct machine code. Now, with the
            leading
            sync removed, we are increasing the possible holes in the
            generated
            machine code due to this flawed reasoning. So it would be nice
            to do
            something more sound instead that does not make such
            assumptions.

            > Anyway, I agree with that implicit consume is not good. And I
            think it would be good to treat both o->forwardee() the same
            way.
            > What about keeping memory_order_release for the CAS and using
            acquire for both o->forwardee()?
            > The case in which the CAS succeeds is safe because the
            current thread has created new_obj so it doesn't need memory
            barriers to access it.

            Sure, that sounds good to me.

            Thanks,
            /Erik

            > Thanks and best regards,
            > Martin
            >
            >
            > -----Original Message-----
            > From: Kim Barrett [mailto:kim.barrett at oracle.com]
            > Sent: Dienstag, 29. Mai 2018 01:54
            > To: Michihiro Horie <HORIE at jp.ibm.com>
            > Cc: Erik Osterlund <erik.osterlund at oracle.com>;
            david.holmes at oracle.com; Gustavo Bueno Romero
            <gromero at br.ibm.com>; hotspot-dev at openjdk.java.net;
            hotspot-gc-dev at openjdk.java.net;
            ppc-aix-port-dev at openjdk.java.net; Doerr, Martin
            <martin.doerr at sap.com>
            > Subject: Re: RFR(M): 8154736: enhancement of cmpxchg and
            copy_to_survivor for ppc64
            >
            >> On May 28, 2018, at 4:12 AM, Michihiro Horie
            <HORIE at jp.ibm.com> wrote:
            >>
            >> Hi Erik,
            >>
            >> Thank you very much for your review.
            >>
            >> I understood that implicit consume should not be used in the
            shared code. Also, I believe performance degradation would be
            negligible even if we use acquire.
            >>
            >> New webrev uses memory_order_acq_rel:
            http://cr.openjdk.java.net/~mhorie/8154736/webrev.10
            > This is missing the acquire barrier on the else branch for
            the initial test, so fails to meet
            > the previously described minimal requirements for even
            possibly being sufficient.  Any
            > analysis of weakening the CAS barriers must consider that
            test and successor code.
            >
            > In the analysis, it’s not just the lexically nearby
            debugging / logging code that needs to be
            > considered; the forwardee is being returned to caller(s) that
            will presumably do something
            > with that object.
            >
            > Since the whole point of this discussion is performance, any
            proposed change should come
            > with performance information.
            >



-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/attachments/20180604/15d5db8d/attachment-0001.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: graycol.gif
Type: image/gif
Size: 105 bytes
Desc: not available
URL: <http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/attachments/20180604/15d5db8d/graycol-0001.gif>


More information about the ppc-aix-port-dev mailing list