RFR(M): 8154736: enhancement of cmpxchg and copy_to_survivor for ppc64

Kim Barrett kim.barrett at oracle.com
Mon Jun 4 20:08:40 UTC 2018


> On Jun 1, 2018, at 11:08 AM, Michihiro Horie <HORIE at jp.ibm.com> wrote:
> 
> Hi Kim, Erik, and Martin,
> 
> Thank you very much for reminding me that an acquire barrier in the else-statement for “!test_mark->is_marked()” is necessary under the criteria of not relying on the consume. 
> 
> I uploaded a new webrev : http://cr.openjdk.java.net/~mhorie/8154736/webrev.13/
> This change uses forwardee_acquire(), which would generate better code on ARM. 
> 
> Necessary barriers are located in all the paths in copy_to_survivor_space, and the returned new_obj can be safely handled in the caller sites.
> 
> I measured SPECjbb2015 with the latest webrev. Critical-jOPS improved by 5%. Since my previous measurement with implicit consume showed 6% improvement, adding acquire barriers degraded the performance a little, but 5% is still good enough.

Looks good.

> 
> 
> Best regards,
> --
> Michihiro,
> IBM Research - Tokyo
> 
> "Doerr, Martin" ---2018/05/30 16:18:09---Hi Erik, the current implementation works on PPC because of "MP+sync+addr".
> 
> From: "Doerr, Martin" <martin.doerr at sap.com>
> To: "Erik Österlund" <erik.osterlund at oracle.com>, Kim Barrett <kim.barrett at oracle.com>, Michihiro Horie <HORIE at jp.ibm.com>, "Andrew Haley (aph at redhat.com)" <aph at redhat.com>
> Cc: "david.holmes at oracle.com" <david.holmes at oracle.com>, "hotspot-gc-dev at openjdk.java.net" <hotspot-gc-dev at openjdk.java.net>, "ppc-aix-port-dev at openjdk.java.net" <ppc-aix-port-dev at openjdk.java.net>
> Date: 2018/05/30 16:18
> Subject: RE: RFR(M): 8154736: enhancement of cmpxchg and copy_to_survivor for ppc64
> 
> 
> 
> 
> Hi Erik,
> 
> the current implementation works on PPC because of "MP+sync+addr".
> So we already rely on ordering of "load volatile field" + "implicit consume" on the reader's side. We have never seen any issues related to this with the compilers we have been using during the ~10 years the PPC implementation exists.
> 
> PPC supports "MP+lwsync+addr" the same way, so Michihiro's proposal doesn't make it unreliable for PPC.
> 
> But I'm ok with evaluating acquire barriers although they are not required by the PPC/ARM memory models.
> ARM/aarch64 will also be affected when the o->forwardee uses load_acquire. So somebody should check the impact. If it is not acceptable we may need to introduce explicit consume.
> 
> Implicit consume is also bad in shared code because somebody may want to run it on DEC Alpha.
> 
> Thanks and best regards,
> Martin
> 
> 
> -----Original Message-----
> From: Erik Österlund [mailto:erik.osterlund at oracle.com] 
> Sent: Dienstag, 29. Mai 2018 14:01
> To: Doerr, Martin <martin.doerr at sap.com>; Kim Barrett <kim.barrett at oracle.com>; Michihiro Horie <HORIE at jp.ibm.com>
> Cc: david.holmes at oracle.com; Gustavo Bueno Romero <gromero at br.ibm.com>; hotspot-dev at openjdk.java.net; hotspot-gc-dev at openjdk.java.net; ppc-aix-port-dev at openjdk.java.net
> Subject: Re: RFR(M): 8154736: enhancement of cmpxchg and copy_to_survivor for ppc64
> 
> Hi Martin and Michihiro,
> 
> On 2018-05-29 12:30, Doerr, Martin wrote:
> > Hi Kim,
> >
> > I'm trying to understand how this is related to Michihiro's change. The else path of the initial test is not affected by it AFAICS.
> > So it sounds like a request to fix the current implementation in addition to what his original intend was.
> 
> I think we are just trying to nail down the correct fencing and just go 
> for that. And yes, this is arguably a pre-existing problem, but in a 
> race involving the very same accesses that we are changing the fencing 
> for. So it is not completely unrelated I suppose.
> 
> In particular, hotspot has code that assumes that if you on the writer 
> side issue a full fence before publishing a pointer to newly initialized 
> data, then the initializing stores and their side effects should be 
> globally "visible" across the system before the pointer to it is 
> published, and hence elide the need for acquire on the loading side, 
> without relying on retained data dependencies on the loader side. I 
> believe this code falls under that category. It is assumed that the 
> leading fence of the CAS publishing the forwarding pointer makes the 
> initializing stores globally observable before publishing a pointer to 
> the initialized data, hence assuming that any loads able to observe the 
> new pointer would not rely on acquire or data dependent loads to 
> correctly read the initialized data.
> 
> Unfortunately, this is not reliable in the IRIW case, as per the litmus 
> test "MP+sync+ctrl" as described in "Understanding POWER 
> multiprocessors" (https://dl.acm.org/citation.cfm?id=1993520), as 
> opposed to "MP+sync+addr" that gets away with it because of the data 
> dependency (not IRIW). Similarly, an isync does the job too on the 
> reader side as shown in MP+sync+ctrlisync. So while what I believe was 
> the previous reasoning that the leading sync of the CAS would elide the 
> necessity for acquire on the reader side without relying on data 
> dependent loads (implicit consume), I think that assumption was wrong in 
> the first place and that we do indeed need explicit acquire (even with 
> the precious conservative CAS fencing) in this context to not rely on 
> implicit consume semantics generating the required data dependent loads 
> on the reader side. In practice though, the leading sync of the CAS has 
> been enough to generate the correct machine code. Now, with the leading 
> sync removed, we are increasing the possible holes in the generated 
> machine code due to this flawed reasoning. So it would be nice to do 
> something more sound instead that does not make such assumptions.
> 
> > Anyway, I agree with that implicit consume is not good. And I think it would be good to treat both o->forwardee() the same way.
> > What about keeping memory_order_release for the CAS and using acquire for both o->forwardee()?
> > The case in which the CAS succeeds is safe because the current thread has created new_obj so it doesn't need memory barriers to access it.
> 
> Sure, that sounds good to me.
> 
> Thanks,
> /Erik
> 
> > Thanks and best regards,
> > Martin
> >
> >
> > -----Original Message-----
> > From: Kim Barrett [mailto:kim.barrett at oracle.com]
> > Sent: Dienstag, 29. Mai 2018 01:54
> > To: Michihiro Horie <HORIE at jp.ibm.com>
> > Cc: Erik Osterlund <erik.osterlund at oracle.com>; david.holmes at oracle.com; Gustavo Bueno Romero <gromero at br.ibm.com>; hotspot-dev at openjdk.java.net; hotspot-gc-dev at openjdk.java.net; ppc-aix-port-dev at openjdk.java.net; Doerr, Martin <martin.doerr at sap.com>
> > Subject: Re: RFR(M): 8154736: enhancement of cmpxchg and copy_to_survivor for ppc64
> >
> >> On May 28, 2018, at 4:12 AM, Michihiro Horie <HORIE at jp.ibm.com> wrote:
> >>
> >> Hi Erik,
> >>
> >> Thank you very much for your review.
> >>
> >> I understood that implicit consume should not be used in the shared code. Also, I believe performance degradation would be negligible even if we use acquire.
> >>
> >> New webrev uses memory_order_acq_rel: http://cr.openjdk.java.net/~mhorie/8154736/webrev.10
> > This is missing the acquire barrier on the else branch for the initial test, so fails to meet
> > the previously described minimal requirements for even possibly being sufficient.  Any
> > analysis of weakening the CAS barriers must consider that test and successor code.
> >
> > In the analysis, it’s not just the lexically nearby debugging / logging code that needs to be
> > considered; the forwardee is being returned to caller(s) that will presumably do something
> > with that object.
> >
> > Since the whole point of this discussion is performance, any proposed change should come
> > with performance information.
> >




More information about the ppc-aix-port-dev mailing list