RFR(M): 8154736: enhancement of cmpxchg and copy_to_survivor for ppc64
Doerr, Martin
martin.doerr at sap.com
Tue Jun 5 07:42:53 UTC 2018
Thanks for the contribution, and thanks everybody for discussing and reviewing.
I’ve pushed it.
Best regards,
Martin
From: Michihiro Horie [mailto:HORIE at jp.ibm.com]
Sent: Dienstag, 5. Juni 2018 00:42
To: Kim Barrett <kim.barrett at oracle.com>
Cc: Andrew Haley (aph at redhat.com) <aph at redhat.com>; david.holmes at oracle.com; Erik Österlund <erik.osterlund at oracle.com>; hotspot-gc-dev at openjdk.java.net; Doerr, Martin <martin.doerr at sap.com>; ppc-aix-port-dev at openjdk.java.net
Subject: Re: RFR(M): 8154736: enhancement of cmpxchg and copy_to_survivor for ppc64
>> On Jun 1, 2018, at 11:08 AM, Michihiro Horie <HORIE at jp.ibm.com<mailto:HORIE at jp.ibm.com>> wrote:
>>
>> Hi Kim, Erik, and Martin,
>>
>> Thank you very much for reminding me that an acquire barrier in the else-statement for “!test_mark->is_marked()” is necessary under the criteria of not relying on the consume.
>>
>> I uploaded a new webrev : http://cr.openjdk.java.net/~mhorie/8154736/webrev.13/
>> This change uses forwardee_acquire(), which would generate better code on ARM.
>>
>> Necessary barriers are located in all the paths in copy_to_survivor_space, and the returned new_obj can be safely handled in the caller sites.
>>
>> I measured SPECjbb2015 with the latest webrev. Critical-jOPS improved by 5%. Since my previous measurement with implicit consume showed 6% improvement, adding acquire barriers degraded the performance a little, but 5% is still good enough.
>
>Looks good.
Thanks a lot, Kim!
Best regards,
--
Michihiro,
IBM Research - Tokyo
[Inactive hide details for Kim Barrett ---2018/06/05 05:08:48---> On Jun 1, 2018, at 11:08 AM, Michihiro Horie <HORIE at jp.ibm.com]Kim Barrett ---2018/06/05 05:08:48---> On Jun 1, 2018, at 11:08 AM, Michihiro Horie <HORIE at jp.ibm.com<mailto:HORIE at jp.ibm.com>> wrote: >
From: Kim Barrett <kim.barrett at oracle.com<mailto:kim.barrett at oracle.com>>
To: Michihiro Horie <HORIE at jp.ibm.com<mailto:HORIE at jp.ibm.com>>
Cc: "Doerr, Martin" <martin.doerr at sap.com<mailto:martin.doerr at sap.com>>, "Andrew Haley (aph at redhat.com<mailto:aph at redhat.com>)" <aph at redhat.com<mailto:aph at redhat.com>>, "david.holmes at oracle.com<mailto:david.holmes at oracle.com>" <david.holmes at oracle.com<mailto:david.holmes at oracle.com>>, "Erik Österlund" <erik.osterlund at oracle.com<mailto:erik.osterlund at oracle.com>>, "hotspot-gc-dev at openjdk.java.net<mailto:hotspot-gc-dev at openjdk.java.net>" <hotspot-gc-dev at openjdk.java.net<mailto:hotspot-gc-dev at openjdk.java.net>>, "ppc-aix-port-dev at openjdk.java.net<mailto:ppc-aix-port-dev at openjdk.java.net>" <ppc-aix-port-dev at openjdk.java.net<mailto:ppc-aix-port-dev at openjdk.java.net>>
Date: 2018/06/05 05:08
Subject: Re: RFR(M): 8154736: enhancement of cmpxchg and copy_to_survivor for ppc64
________________________________
> On Jun 1, 2018, at 11:08 AM, Michihiro Horie <HORIE at jp.ibm.com<mailto:HORIE at jp.ibm.com>> wrote:
>
> Hi Kim, Erik, and Martin,
>
> Thank you very much for reminding me that an acquire barrier in the else-statement for “!test_mark->is_marked()” is necessary under the criteria of not relying on the consume.
>
> I uploaded a new webrev : http://cr.openjdk.java.net/~mhorie/8154736/webrev.13/
> This change uses forwardee_acquire(), which would generate better code on ARM.
>
> Necessary barriers are located in all the paths in copy_to_survivor_space, and the returned new_obj can be safely handled in the caller sites.
>
> I measured SPECjbb2015 with the latest webrev. Critical-jOPS improved by 5%. Since my previous measurement with implicit consume showed 6% improvement, adding acquire barriers degraded the performance a little, but 5% is still good enough.
Looks good.
>
>
> Best regards,
> --
> Michihiro,
> IBM Research - Tokyo
>
> "Doerr, Martin" ---2018/05/30 16:18:09---Hi Erik, the current implementation works on PPC because of "MP+sync+addr".
>
> From: "Doerr, Martin" <martin.doerr at sap.com<mailto:martin.doerr at sap.com>>
> To: "Erik Österlund" <erik.osterlund at oracle.com<mailto:erik.osterlund at oracle.com>>, Kim Barrett <kim.barrett at oracle.com<mailto:kim.barrett at oracle.com>>, Michihiro Horie <HORIE at jp.ibm.com<mailto:HORIE at jp.ibm.com>>, "Andrew Haley (aph at redhat.com<mailto:aph at redhat.com>)" <aph at redhat.com<mailto:aph at redhat.com>>
> Cc: "david.holmes at oracle.com<mailto:david.holmes at oracle.com>" <david.holmes at oracle.com<mailto:david.holmes at oracle.com>>, "hotspot-gc-dev at openjdk.java.net<mailto:hotspot-gc-dev at openjdk.java.net>" <hotspot-gc-dev at openjdk.java.net<mailto:hotspot-gc-dev at openjdk.java.net>>, "ppc-aix-port-dev at openjdk.java.net<mailto:ppc-aix-port-dev at openjdk.java.net>" <ppc-aix-port-dev at openjdk.java.net<mailto:ppc-aix-port-dev at openjdk.java.net>>
> Date: 2018/05/30 16:18
> Subject: RE: RFR(M): 8154736: enhancement of cmpxchg and copy_to_survivor for ppc64
>
>
>
>
> Hi Erik,
>
> the current implementation works on PPC because of "MP+sync+addr".
> So we already rely on ordering of "load volatile field" + "implicit consume" on the reader's side. We have never seen any issues related to this with the compilers we have been using during the ~10 years the PPC implementation exists.
>
> PPC supports "MP+lwsync+addr" the same way, so Michihiro's proposal doesn't make it unreliable for PPC.
>
> But I'm ok with evaluating acquire barriers although they are not required by the PPC/ARM memory models.
> ARM/aarch64 will also be affected when the o->forwardee uses load_acquire. So somebody should check the impact. If it is not acceptable we may need to introduce explicit consume.
>
> Implicit consume is also bad in shared code because somebody may want to run it on DEC Alpha.
>
> Thanks and best regards,
> Martin
>
>
> -----Original Message-----
> From: Erik Österlund [mailto:erik.osterlund at oracle.com]
> Sent: Dienstag, 29. Mai 2018 14:01
> To: Doerr, Martin <martin.doerr at sap.com<mailto:martin.doerr at sap.com>>; Kim Barrett <kim.barrett at oracle.com<mailto:kim.barrett at oracle.com>>; Michihiro Horie <HORIE at jp.ibm.com<mailto:HORIE at jp.ibm.com>>
> Cc: david.holmes at oracle.com<mailto:david.holmes at oracle.com>; Gustavo Bueno Romero <gromero at br.ibm.com<mailto:gromero at br.ibm.com>>; hotspot-dev at openjdk.java.net<mailto:hotspot-dev at openjdk.java.net>; hotspot-gc-dev at openjdk.java.net<mailto:hotspot-gc-dev at openjdk.java.net>; ppc-aix-port-dev at openjdk.java.net<mailto:ppc-aix-port-dev at openjdk.java.net>
> Subject: Re: RFR(M): 8154736: enhancement of cmpxchg and copy_to_survivor for ppc64
>
> Hi Martin and Michihiro,
>
> On 2018-05-29 12:30, Doerr, Martin wrote:
> > Hi Kim,
> >
> > I'm trying to understand how this is related to Michihiro's change. The else path of the initial test is not affected by it AFAICS.
> > So it sounds like a request to fix the current implementation in addition to what his original intend was.
>
> I think we are just trying to nail down the correct fencing and just go
> for that. And yes, this is arguably a pre-existing problem, but in a
> race involving the very same accesses that we are changing the fencing
> for. So it is not completely unrelated I suppose.
>
> In particular, hotspot has code that assumes that if you on the writer
> side issue a full fence before publishing a pointer to newly initialized
> data, then the initializing stores and their side effects should be
> globally "visible" across the system before the pointer to it is
> published, and hence elide the need for acquire on the loading side,
> without relying on retained data dependencies on the loader side. I
> believe this code falls under that category. It is assumed that the
> leading fence of the CAS publishing the forwarding pointer makes the
> initializing stores globally observable before publishing a pointer to
> the initialized data, hence assuming that any loads able to observe the
> new pointer would not rely on acquire or data dependent loads to
> correctly read the initialized data.
>
> Unfortunately, this is not reliable in the IRIW case, as per the litmus
> test "MP+sync+ctrl" as described in "Understanding POWER
> multiprocessors" (https://dl.acm.org/citation.cfm?id=1993520), as
> opposed to "MP+sync+addr" that gets away with it because of the data
> dependency (not IRIW). Similarly, an isync does the job too on the
> reader side as shown in MP+sync+ctrlisync. So while what I believe was
> the previous reasoning that the leading sync of the CAS would elide the
> necessity for acquire on the reader side without relying on data
> dependent loads (implicit consume), I think that assumption was wrong in
> the first place and that we do indeed need explicit acquire (even with
> the precious conservative CAS fencing) in this context to not rely on
> implicit consume semantics generating the required data dependent loads
> on the reader side. In practice though, the leading sync of the CAS has
> been enough to generate the correct machine code. Now, with the leading
> sync removed, we are increasing the possible holes in the generated
> machine code due to this flawed reasoning. So it would be nice to do
> something more sound instead that does not make such assumptions.
>
> > Anyway, I agree with that implicit consume is not good. And I think it would be good to treat both o->forwardee() the same way.
> > What about keeping memory_order_release for the CAS and using acquire for both o->forwardee()?
> > The case in which the CAS succeeds is safe because the current thread has created new_obj so it doesn't need memory barriers to access it.
>
> Sure, that sounds good to me.
>
> Thanks,
> /Erik
>
> > Thanks and best regards,
> > Martin
> >
> >
> > -----Original Message-----
> > From: Kim Barrett [mailto:kim.barrett at oracle.com]
> > Sent: Dienstag, 29. Mai 2018 01:54
> > To: Michihiro Horie <HORIE at jp.ibm.com<mailto:HORIE at jp.ibm.com>>
> > Cc: Erik Osterlund <erik.osterlund at oracle.com<mailto:erik.osterlund at oracle.com>>; david.holmes at oracle.com<mailto:david.holmes at oracle.com>; Gustavo Bueno Romero <gromero at br.ibm.com<mailto:gromero at br.ibm.com>>; hotspot-dev at openjdk.java.net<mailto:hotspot-dev at openjdk.java.net>; hotspot-gc-dev at openjdk.java.net<mailto:hotspot-gc-dev at openjdk.java.net>; ppc-aix-port-dev at openjdk.java.net<mailto:ppc-aix-port-dev at openjdk.java.net>; Doerr, Martin <martin.doerr at sap.com<mailto:martin.doerr at sap.com>>
> > Subject: Re: RFR(M): 8154736: enhancement of cmpxchg and copy_to_survivor for ppc64
> >
> >> On May 28, 2018, at 4:12 AM, Michihiro Horie <HORIE at jp.ibm.com<mailto:HORIE at jp.ibm.com>> wrote:
> >>
> >> Hi Erik,
> >>
> >> Thank you very much for your review.
> >>
> >> I understood that implicit consume should not be used in the shared code. Also, I believe performance degradation would be negligible even if we use acquire.
> >>
> >> New webrev uses memory_order_acq_rel: http://cr.openjdk.java.net/~mhorie/8154736/webrev.10
> > This is missing the acquire barrier on the else branch for the initial test, so fails to meet
> > the previously described minimal requirements for even possibly being sufficient. Any
> > analysis of weakening the CAS barriers must consider that test and successor code.
> >
> > In the analysis, it’s not just the lexically nearby debugging / logging code that needs to be
> > considered; the forwardee is being returned to caller(s) that will presumably do something
> > with that object.
> >
> > Since the whole point of this discussion is performance, any proposed change should come
> > with performance information.
> >
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/attachments/20180605/ff303a82/attachment-0001.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image001.gif
Type: image/gif
Size: 105 bytes
Desc: image001.gif
URL: <http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/attachments/20180605/ff303a82/image001-0001.gif>
More information about the ppc-aix-port-dev
mailing list