RFR(M): 8154736: enhancement of cmpxchg and copy_to_survivor for ppc64

Erik Österlund erik.osterlund at oracle.com
Fri Jun 1 15:18:03 UTC 2018


Hi Michihiro,

Looks good to me.

Thanks,
/Erik

On 2018-06-01 17:08, Michihiro Horie wrote:
>
> Hi Kim, Erik, and Martin,
>
> Thank you very much for reminding me that an acquire barrier in the 
> else-statement for “!test_mark->is_marked()” is necessary under the 
> criteria of not relying on the consume.
>
> I uploaded a new webrev : 
> http://cr.openjdk.java.net/~mhorie/8154736/webrev.13/ 
> <http://cr.openjdk.java.net/%7Emhorie/8154736/webrev.13/>
> This change uses forwardee_acquire(), which would generate better code 
> on ARM.
>
> Necessary barriers are located in all the paths in 
> copy_to_survivor_space, and the returned new_obj can be safely handled 
> in the caller sites.
>
> I measured SPECjbb2015 with the latest webrev. Critical-jOPS improved 
> by 5%. Since my previous measurement with implicit consume showed 6% 
> improvement, adding acquire barriers degraded the performance a 
> little, but 5% is still good enough.
>
>
> Best regards,
> --
> Michihiro,
> IBM Research - Tokyo
>
> Inactive hide details for "Doerr, Martin" ---2018/05/30 16:18:09---Hi 
> Erik, the current implementation works on PPC because of "Doerr, 
> Martin" ---2018/05/30 16:18:09---Hi Erik, the current implementation 
> works on PPC because of "MP+sync+addr".
>
> From: "Doerr, Martin" <martin.doerr at sap.com>
> To: "Erik Österlund" <erik.osterlund at oracle.com>, Kim Barrett 
> <kim.barrett at oracle.com>, Michihiro Horie <HORIE at jp.ibm.com>, "Andrew 
> Haley (aph at redhat.com)" <aph at redhat.com>
> Cc: "david.holmes at oracle.com" <david.holmes at oracle.com>, 
> "hotspot-gc-dev at openjdk.java.net" <hotspot-gc-dev at openjdk.java.net>, 
> "ppc-aix-port-dev at openjdk.java.net" <ppc-aix-port-dev at openjdk.java.net>
> Date: 2018/05/30 16:18
> Subject: RE: RFR(M): 8154736: enhancement of cmpxchg and 
> copy_to_survivor for ppc64
>
> ------------------------------------------------------------------------
>
>
>
> Hi Erik,
>
> the current implementation works on PPC because of "MP+sync+addr".
> So we already rely on ordering of "load volatile field" + "implicit 
> consume" on the reader's side. We have never seen any issues related 
> to this with the compilers we have been using during the ~10 years the 
> PPC implementation exists.
>
> PPC supports "MP+lwsync+addr" the same way, so Michihiro's proposal 
> doesn't make it unreliable for PPC.
>
> But I'm ok with evaluating acquire barriers although they are not 
> required by the PPC/ARM memory models.
> ARM/aarch64 will also be affected when the o->forwardee uses 
> load_acquire. So somebody should check the impact. If it is not 
> acceptable we may need to introduce explicit consume.
>
> Implicit consume is also bad in shared code because somebody may want 
> to run it on DEC Alpha.
>
> Thanks and best regards,
> Martin
>
>
> -----Original Message-----
> From: Erik Österlund [mailto:erik.osterlund at oracle.com]
> Sent: Dienstag, 29. Mai 2018 14:01
> To: Doerr, Martin <martin.doerr at sap.com>; Kim Barrett 
> <kim.barrett at oracle.com>; Michihiro Horie <HORIE at jp.ibm.com>
> Cc: david.holmes at oracle.com; Gustavo Bueno Romero 
> <gromero at br.ibm.com>; hotspot-dev at openjdk.java.net; 
> hotspot-gc-dev at openjdk.java.net; ppc-aix-port-dev at openjdk.java.net
> Subject: Re: RFR(M): 8154736: enhancement of cmpxchg and 
> copy_to_survivor for ppc64
>
> Hi Martin and Michihiro,
>
> On 2018-05-29 12:30, Doerr, Martin wrote:
> > Hi Kim,
> >
> > I'm trying to understand how this is related to Michihiro's change. 
> The else path of the initial test is not affected by it AFAICS.
> > So it sounds like a request to fix the current implementation in 
> addition to what his original intend was.
>
> I think we are just trying to nail down the correct fencing and just go
> for that. And yes, this is arguably a pre-existing problem, but in a
> race involving the very same accesses that we are changing the fencing
> for. So it is not completely unrelated I suppose.
>
> In particular, hotspot has code that assumes that if you on the writer
> side issue a full fence before publishing a pointer to newly initialized
> data, then the initializing stores and their side effects should be
> globally "visible" across the system before the pointer to it is
> published, and hence elide the need for acquire on the loading side,
> without relying on retained data dependencies on the loader side. I
> believe this code falls under that category. It is assumed that the
> leading fence of the CAS publishing the forwarding pointer makes the
> initializing stores globally observable before publishing a pointer to
> the initialized data, hence assuming that any loads able to observe the
> new pointer would not rely on acquire or data dependent loads to
> correctly read the initialized data.
>
> Unfortunately, this is not reliable in the IRIW case, as per the litmus
> test "MP+sync+ctrl" as described in "Understanding POWER
> multiprocessors" (https://dl.acm.org/citation.cfm?id=1993520), as
> opposed to "MP+sync+addr" that gets away with it because of the data
> dependency (not IRIW). Similarly, an isync does the job too on the
> reader side as shown in MP+sync+ctrlisync. So while what I believe was
> the previous reasoning that the leading sync of the CAS would elide the
> necessity for acquire on the reader side without relying on data
> dependent loads (implicit consume), I think that assumption was wrong in
> the first place and that we do indeed need explicit acquire (even with
> the precious conservative CAS fencing) in this context to not rely on
> implicit consume semantics generating the required data dependent loads
> on the reader side. In practice though, the leading sync of the CAS has
> been enough to generate the correct machine code. Now, with the leading
> sync removed, we are increasing the possible holes in the generated
> machine code due to this flawed reasoning. So it would be nice to do
> something more sound instead that does not make such assumptions.
>
> > Anyway, I agree with that implicit consume is not good. And I think 
> it would be good to treat both o->forwardee() the same way.
> > What about keeping memory_order_release for the CAS and using 
> acquire for both o->forwardee()?
> > The case in which the CAS succeeds is safe because the current 
> thread has created new_obj so it doesn't need memory barriers to 
> access it.
>
> Sure, that sounds good to me.
>
> Thanks,
> /Erik
>
> > Thanks and best regards,
> > Martin
> >
> >
> > -----Original Message-----
> > From: Kim Barrett [mailto:kim.barrett at oracle.com]
> > Sent: Dienstag, 29. Mai 2018 01:54
> > To: Michihiro Horie <HORIE at jp.ibm.com>
> > Cc: Erik Osterlund <erik.osterlund at oracle.com>; 
> david.holmes at oracle.com; Gustavo Bueno Romero <gromero at br.ibm.com>; 
> hotspot-dev at openjdk.java.net; hotspot-gc-dev at openjdk.java.net; 
> ppc-aix-port-dev at openjdk.java.net; Doerr, Martin <martin.doerr at sap.com>
> > Subject: Re: RFR(M): 8154736: enhancement of cmpxchg and 
> copy_to_survivor for ppc64
> >
> >> On May 28, 2018, at 4:12 AM, Michihiro Horie <HORIE at jp.ibm.com> wrote:
> >>
> >> Hi Erik,
> >>
> >> Thank you very much for your review.
> >>
> >> I understood that implicit consume should not be used in the shared 
> code. Also, I believe performance degradation would be negligible even 
> if we use acquire.
> >>
> >> New webrev uses memory_order_acq_rel: 
> http://cr.openjdk.java.net/~mhorie/8154736/webrev.10 
> <http://cr.openjdk.java.net/%7Emhorie/8154736/webrev.10>
> > This is missing the acquire barrier on the else branch for the 
> initial test, so fails to meet
> > the previously described minimal requirements for even possibly 
> being sufficient.  Any
> > analysis of weakening the CAS barriers must consider that test and 
> successor code.
> >
> > In the analysis, it’s not just the lexically nearby debugging / 
> logging code that needs to be
> > considered; the forwardee is being returned to caller(s) that will 
> presumably do something
> > with that object.
> >
> > Since the whole point of this discussion is performance, any 
> proposed change should come
> > with performance information.
> >
>
>
>
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/attachments/20180601/35f71c70/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: image/gif
Size: 105 bytes
Desc: not available
URL: <http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/attachments/20180601/35f71c70/attachment.gif>


More information about the ppc-aix-port-dev mailing list