<html>
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
Hi Michihiro,<br>
<br>
Looks good to me.<br>
<br>
Thanks,<br>
/Erik<br>
<br>
<div class="moz-cite-prefix">On 2018-06-01 17:08, Michihiro Horie
wrote:<br>
</div>
<blockquote
cite="mid:OFCF19FA4A.D56AE1C0-ON0025829F.00522EA7-4925829F.00532D73@notes.na.collabserv.com"
type="cite">
<p><font size="2">Hi Kim, Erik, and Martin,</font><br>
<br>
<font size="2">Thank you very much for reminding me that an
acquire barrier in the else-statement for
“!test_mark->is_marked()” is necessary if we are not to
rely on implicit consume. </font><br>
<br>
<font size="2">I uploaded a new webrev: </font><font size="2"><a
moz-do-not-send="true"
href="http://cr.openjdk.java.net/%7Emhorie/8154736/webrev.13/">http://cr.openjdk.java.net/~mhorie/8154736/webrev.13/</a></font><br>
<font size="2">This change uses forwardee_acquire(), which would
generate better code on ARM.</font><font size="2"> </font><br>
<br>
<font size="2">Necessary barriers are located in all the paths
in copy_to_survivor_space, and the returned new_obj can be
safely handled in the caller sites.</font><br>
<br>
<font size="2">I measured SPECjbb2015 with the latest webrev.
Critical-jOPS improved by 5%. Since my previous measurement
with implicit consume showed a 6% improvement, adding acquire
barriers degraded the performance a little, but 5% is still
good enough.</font><br>
<br>
<br>
<font size="2">Best regards,</font><br>
<font size="2">--</font><br>
<font size="2">Michihiro,</font><br>
<font size="2">IBM Research - Tokyo</font><br>
<br>
<br>
<font size="2" color="#5F5F5F">From: </font><font size="2">"Doerr,
Martin" <a class="moz-txt-link-rfc2396E" href="mailto:martin.doerr@sap.com"><martin.doerr@sap.com></a></font><br>
<font size="2" color="#5F5F5F">To: </font><font size="2">"Erik
Österlund" <a class="moz-txt-link-rfc2396E" href="mailto:erik.osterlund@oracle.com"><erik.osterlund@oracle.com></a>, Kim Barrett
<a class="moz-txt-link-rfc2396E" href="mailto:kim.barrett@oracle.com"><kim.barrett@oracle.com></a>, Michihiro Horie
<a class="moz-txt-link-rfc2396E" href="mailto:HORIE@jp.ibm.com"><HORIE@jp.ibm.com></a>, "Andrew Haley (<a class="moz-txt-link-abbreviated" href="mailto:aph@redhat.com">aph@redhat.com</a>)"
<a class="moz-txt-link-rfc2396E" href="mailto:aph@redhat.com"><aph@redhat.com></a></font><br>
<font size="2" color="#5F5F5F">Cc: </font><font size="2"><a class="moz-txt-link-rfc2396E" href="mailto:david.holmes@oracle.com">"david.holmes@oracle.com"</a>
<a class="moz-txt-link-rfc2396E" href="mailto:david.holmes@oracle.com"><david.holmes@oracle.com></a>,
<a class="moz-txt-link-rfc2396E" href="mailto:hotspot-gc-dev@openjdk.java.net">"hotspot-gc-dev@openjdk.java.net"</a>
<a class="moz-txt-link-rfc2396E" href="mailto:hotspot-gc-dev@openjdk.java.net"><hotspot-gc-dev@openjdk.java.net></a>,
<a class="moz-txt-link-rfc2396E" href="mailto:ppc-aix-port-dev@openjdk.java.net">"ppc-aix-port-dev@openjdk.java.net"</a>
<a class="moz-txt-link-rfc2396E" href="mailto:ppc-aix-port-dev@openjdk.java.net"><ppc-aix-port-dev@openjdk.java.net></a></font><br>
<font size="2" color="#5F5F5F">Date: </font><font size="2">2018/05/30
16:18</font><br>
<font size="2" color="#5F5F5F">Subject: </font><font size="2">RE:
RFR(M): 8154736: enhancement of cmpxchg and copy_to_survivor
for ppc64</font><br>
</p>
<hr style="color:#8091A5; " align="left" size="2"
noshade="noshade" width="100%"><br>
<br>
<br>
<tt><font size="2">Hi Erik,<br>
<br>
the current implementation works on PPC because of
"MP+sync+addr".<br>
So we already rely on ordering of "load volatile field" +
"implicit consume" on the reader's side. We have never seen
any issues related to this with the compilers we have been
using during the ~10 years the PPC implementation has existed.<br>
<br>
PPC supports "MP+lwsync+addr" the same way, so Michihiro's
proposal doesn't make it unreliable for PPC.<br>
<br>
But I'm ok with evaluating acquire barriers although they are
not required by the PPC/ARM memory models.<br>
ARM/aarch64 will also be affected when the o->forwardee
uses load_acquire. So somebody should check the impact. If it
is not acceptable we may need to introduce explicit consume.<br>
<br>
Implicit consume is also bad in shared code because somebody
may want to run it on DEC Alpha.<br>
<br>
Thanks and best regards,<br>
Martin<br>
<br>
<br>
-----Original Message-----<br>
From: Erik Österlund [</font></tt><tt><font size="2"><a
moz-do-not-send="true"
href="mailto:erik.osterlund@oracle.com">mailto:erik.osterlund@oracle.com</a></font></tt><tt><font
size="2">] <br>
Sent: Tuesday, May 29, 2018 14:01<br>
To: Doerr, Martin <a class="moz-txt-link-rfc2396E" href="mailto:martin.doerr@sap.com"><martin.doerr@sap.com></a>; Kim Barrett
<a class="moz-txt-link-rfc2396E" href="mailto:kim.barrett@oracle.com"><kim.barrett@oracle.com></a>; Michihiro Horie
<a class="moz-txt-link-rfc2396E" href="mailto:HORIE@jp.ibm.com"><HORIE@jp.ibm.com></a><br>
Cc: <a class="moz-txt-link-abbreviated" href="mailto:david.holmes@oracle.com">david.holmes@oracle.com</a>; Gustavo Bueno Romero
<a class="moz-txt-link-rfc2396E" href="mailto:gromero@br.ibm.com"><gromero@br.ibm.com></a>; <a class="moz-txt-link-abbreviated" href="mailto:hotspot-dev@openjdk.java.net">hotspot-dev@openjdk.java.net</a>;
<a class="moz-txt-link-abbreviated" href="mailto:hotspot-gc-dev@openjdk.java.net">hotspot-gc-dev@openjdk.java.net</a>;
<a class="moz-txt-link-abbreviated" href="mailto:ppc-aix-port-dev@openjdk.java.net">ppc-aix-port-dev@openjdk.java.net</a><br>
Subject: Re: RFR(M): 8154736: enhancement of cmpxchg and
copy_to_survivor for ppc64<br>
<br>
Hi Martin and Michihiro,<br>
<br>
On 2018-05-29 12:30, Doerr, Martin wrote:<br>
> Hi Kim,<br>
><br>
> I'm trying to understand how this is related to
Michihiro's change. The else path of the initial test is not
affected by it AFAICS.<br>
> So it sounds like a request to fix the current
implementation in addition to what his original intent was.<br>
<br>
I think we are just trying to nail down the correct fencing
and just go <br>
for that. And yes, this is arguably a pre-existing problem,
but in a <br>
race involving the very same accesses that we are changing the
fencing <br>
for. So it is not completely unrelated I suppose.<br>
<br>
In particular, HotSpot has code that assumes that if the
writer <br>
side issues a full fence before publishing a pointer to newly
initialized <br>
data, then the initializing stores and their side effects
should be <br>
globally "visible" across the system before the pointer to it
is <br>
published, and hence elide the need for acquire on the loading
side, <br>
without relying on retained data dependencies on the loader
side. I <br>
believe this code falls under that category. It is assumed
that the <br>
leading fence of the CAS publishing the forwarding pointer
makes the <br>
initializing stores globally observable before publishing a
pointer to <br>
the initialized data, hence assuming that any loads able to
observe the <br>
new pointer would not rely on acquire or data dependent loads
to <br>
correctly read the initialized data.<br>
<br>
Unfortunately, this is not reliable in the IRIW case, as per
the litmus <br>
test "MP+sync+ctrl" as described in "Understanding POWER <br>
multiprocessors" (</font></tt><tt><font size="2"><a
moz-do-not-send="true"
href="https://dl.acm.org/citation.cfm?id=1993520">https://dl.acm.org/citation.cfm?id=1993520</a></font></tt><tt><font
size="2">), as <br>
opposed to "MP+sync+addr" that gets away with it because of
the data <br>
dependency (not IRIW). Similarly, an isync does the job too on
the <br>
reader side as shown in MP+sync+ctrlisync. So I believe the <br>
previous reasoning was that the leading sync of the CAS would
elide the <br>
need for acquire on the reader side without relying on
data <br>
dependent loads (implicit consume). I think that assumption
was wrong in <br>
the first place, and that we do indeed need explicit acquire
(even with <br>
the previous conservative CAS fencing) in this context so as
not to rely on <br>
implicit consume semantics generating the required data
dependent loads <br>
on the reader side. In practice though, the leading sync of
the CAS has <br>
been enough to generate the correct machine code. Now, with
the leading <br>
sync removed, we are increasing the possible holes in the
generated <br>
machine code due to this flawed reasoning. So it would be nice
to do <br>
something more sound instead that does not make such
assumptions.<br>
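<br>
A minimal sketch of the sounder pattern described above, written in portable C++ (std::atomic) rather than HotSpot's own Atomic/OrderAccess API. The names Payload, g_published, publish, read_when_published and run_demo are hypothetical and exist only for this illustration. The point is that the reader uses an explicit acquire load, so its read of the initialized field does not depend on the compiler preserving a data dependency.<br>
<br>
```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical stand-in for a freshly initialized object; not HotSpot code.
struct Payload {
  int field = 0;
};

// Hypothetical shared slot through which the pointer is published.
std::atomic<Payload*> g_published{nullptr};

// Writer: initialize, issue a full fence, then publish the pointer.
// The fence plays the role of the conservative "leading sync".
void publish(Payload* p) {
  assert(p != nullptr);
  p->field = 42;                                        // initializing store
  std::atomic_thread_fence(std::memory_order_seq_cst);  // writer-side fence
  g_published.store(p, std::memory_order_relaxed);      // publish
}

// Reader: the acquire load synchronizes with the writer's fence, so the
// later read of p->field is ordered after the pointer load by the memory
// model itself, not by an implicit address dependency.
int read_when_published() {
  Payload* p;
  while ((p = g_published.load(std::memory_order_acquire)) == nullptr) {
    std::this_thread::yield();
  }
  return p->field;
}

// Runs one writer and one reader and returns what the reader observed.
int run_demo() {
  static Payload payload;
  payload.field = 0;
  g_published.store(nullptr, std::memory_order_relaxed);
  int seen = -1;
  std::thread reader([&seen] { seen = read_when_published(); });
  std::thread writer([] { publish(&payload); });
  writer.join();
  reader.join();
  return seen;
}
```
<br>
Under these assumptions run_demo() returns 42 on every run. Replacing the acquire load with a relaxed load would leave the reader relying on dependency ordering that C++ does not guarantee, which is the implicit-consume situation discussed above.<br>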
<br>
> Anyway, I agree that implicit consume is not good.
And I think it would be good to treat both o->forwardee()
the same way.<br>
> What about keeping memory_order_release for the CAS and
using acquire for both o->forwardee()?<br>
> The case in which the CAS succeeds is safe because the
current thread has created new_obj so it doesn't need memory
barriers to access it.<br>
<br>
Sure, that sounds good to me.<br>
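<br>
The proposal above (release on the CAS, acquire on both forwardee loads) can be sketched roughly as follows. This is an illustration in portable C++, not the actual copy_to_survivor_space code; Obj, its forwardee field and forward_to are hypothetical names, and a null forwardee stands in for "not yet marked".<br>
<br>
```cpp
#include <atomic>
#include <cassert>

// Hypothetical miniature of an object with a forwarding slot.
// Names are illustrative and do not match the HotSpot sources.
struct Obj {
  int payload = 0;
  std::atomic<Obj*> forwardee{nullptr};
};

// Tries to install `copy` as the forwardee of `old_obj` and returns the
// object that ended up as the forwardee.
Obj* forward_to(Obj* old_obj, Obj* copy) {
  // Initial test: if another thread already forwarded the object, load
  // with acquire so that thread's initializing stores are visible.
  Obj* seen = old_obj->forwardee.load(std::memory_order_acquire);
  if (seen != nullptr) {
    return seen;
  }

  copy->payload = old_obj->payload;  // initialize the copy before publishing

  Obj* expected = nullptr;
  if (old_obj->forwardee.compare_exchange_strong(
          expected, copy,
          std::memory_order_release,    // success: publish our copy
          std::memory_order_relaxed)) { // failure: no ordering from the CAS
    // We created `copy` ourselves, so no barrier is needed to access it.
    return copy;
  }

  // The CAS lost the race: reload the winner's pointer with acquire
  // before handing it to the caller.
  return old_obj->forwardee.load(std::memory_order_acquire);
}
```
<br>
In a single-threaded call sequence the first forward_to on an object returns the caller's own copy, and every later call returns that same copy through the acquire path.<br>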
<br>
Thanks,<br>
/Erik<br>
<br>
> Thanks and best regards,<br>
> Martin<br>
><br>
><br>
> -----Original Message-----<br>
> From: Kim Barrett [</font></tt><tt><font size="2"><a
moz-do-not-send="true" href="mailto:kim.barrett@oracle.com">mailto:kim.barrett@oracle.com</a></font></tt><tt><font
size="2">]<br>
> Sent: Tuesday, May 29, 2018 01:54<br>
> To: Michihiro Horie <a class="moz-txt-link-rfc2396E" href="mailto:HORIE@jp.ibm.com"><HORIE@jp.ibm.com></a><br>
> Cc: Erik Osterlund <a class="moz-txt-link-rfc2396E" href="mailto:erik.osterlund@oracle.com"><erik.osterlund@oracle.com></a>;
<a class="moz-txt-link-abbreviated" href="mailto:david.holmes@oracle.com">david.holmes@oracle.com</a>; Gustavo Bueno Romero
<a class="moz-txt-link-rfc2396E" href="mailto:gromero@br.ibm.com"><gromero@br.ibm.com></a>; <a class="moz-txt-link-abbreviated" href="mailto:hotspot-dev@openjdk.java.net">hotspot-dev@openjdk.java.net</a>;
<a class="moz-txt-link-abbreviated" href="mailto:hotspot-gc-dev@openjdk.java.net">hotspot-gc-dev@openjdk.java.net</a>;
<a class="moz-txt-link-abbreviated" href="mailto:ppc-aix-port-dev@openjdk.java.net">ppc-aix-port-dev@openjdk.java.net</a>; Doerr, Martin
<a class="moz-txt-link-rfc2396E" href="mailto:martin.doerr@sap.com"><martin.doerr@sap.com></a><br>
> Subject: Re: RFR(M): 8154736: enhancement of cmpxchg and
copy_to_survivor for ppc64<br>
><br>
>> On May 28, 2018, at 4:12 AM, Michihiro Horie
<a class="moz-txt-link-rfc2396E" href="mailto:HORIE@jp.ibm.com"><HORIE@jp.ibm.com></a> wrote:<br>
>><br>
>> Hi Erik,<br>
>><br>
>> Thank you very much for your review.<br>
>><br>
>> I understood that implicit consume should not be used
in the shared code. Also, I believe performance degradation
would be negligible even if we use acquire.<br>
>><br>
>> New webrev uses memory_order_acq_rel: </font></tt><tt><font
size="2"><a moz-do-not-send="true"
href="http://cr.openjdk.java.net/%7Emhorie/8154736/webrev.10">http://cr.openjdk.java.net/~mhorie/8154736/webrev.10</a></font></tt><tt><font
size="2"><br>
> This is missing the acquire barrier on the else branch
for the initial test, so fails to meet<br>
> the previously described minimal requirements for even
possibly being sufficient. Any<br>
> analysis of weakening the CAS barriers must consider that
test and successor code.<br>
><br>
> In the analysis, it’s not just the lexically nearby
debugging / logging code that needs to be<br>
> considered; the forwardee is being returned to caller(s)
that will presumably do something<br>
> with that object.<br>
><br>
> Since the whole point of this discussion is performance,
any proposed change should come<br>
> with performance information.<br>
><br>
<br>
<br>
</font></tt><br>
<br>
</blockquote>
<br>
</body>
</html>