Hi Erik,

Thank you very much for your review.

I understand that implicit consume should not be used in the shared code. I also believe the performance degradation from using acquire would be negligible.

The new webrev uses memory_order_acq_rel: http://cr.openjdk.java.net/~mhorie/8154736/webrev.10


Best regards,
--
Michihiro,
IBM Research - Tokyo


From: Erik Osterlund <erik.osterlund@oracle.com>
To: Michihiro Horie <HORIE@jp.ibm.com>
Cc: Kim Barrett <kim.barrett@oracle.com>, "hotspot-dev@openjdk.java.net" <hotspot-dev@openjdk.java.net>, "ppc-aix-port-dev@openjdk.java.net" <ppc-aix-port-dev@openjdk.java.net>, Gustavo Bueno Romero <gromero@br.ibm.com>, "david.holmes@oracle.com" <david.holmes@oracle.com>, "hotspot-gc-dev@openjdk.java.net" <hotspot-gc-dev@openjdk.java.net>
Date: 2018/05/28 15:48
Subject: Re: RFR(M): 8154736: enhancement of cmpxchg and copy_to_survivor for ppc64





Hi Michihiro,

In your analysis, you state that the failing CAS path today already relies on implicit consume ordering: reading forwardee() after the failed CAS is missing an acquire, and hence accesses into the reloaded forwardee would rely on (implicit) data dependencies on that reloaded value.

That part of the analysis seems wrong to me. Today even a failed CAS has acquire semantics (and stronger), and the reloaded forwardee always has the same value as was observed in the failed CAS (in this context). Accesses into the forwardee are therefore already ordered by that acquire, and no data dependency on the reloaded forwardee is needed or relied upon.
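
As a rough illustration of the point about failed-CAS ordering, here is a minimal sketch in standard C++ using std::atomic rather than HotSpot's Atomic::cmpxchg. The Obj type and the forward_or_get helper are hypothetical stand-ins, not code from the webrev. It assumes a forwarding pointer that starts as null and is installed exactly once.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for an object with a forwarding pointer;
// this is not HotSpot's oopDesc or markOop.
struct Obj {
    std::atomic<Obj*> forwardee{nullptr};
    int payload{0};
};

// Try to install `copy` as the forwardee of `from`.
// Returns `copy` if this thread won the race, otherwise the forwardee
// that another thread installed first.
inline Obj* forward_or_get(Obj& from, Obj* copy) {
    Obj* expected = nullptr;
    if (from.forwardee.compare_exchange_strong(
            expected, copy,
            std::memory_order_acq_rel,     // success: release our copy, acquire prior state
            std::memory_order_acquire)) {  // failure: acquire the winner's writes
        return copy;
    }
    // On failure, `expected` holds the value the CAS observed. Because the
    // failed CAS is an acquire, reads through this pointer are ordered after
    // the winner's initialization without relying on a data dependency.
    return expected;
}
```

With acquire on the failure path, a caller can dereference the returned pointer directly. If the failure ordering were relaxed, the same read would be correct only under consume-style dependency ordering, which is what the shared code avoids.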

We do not use implicit consume in the shared C++ code. If you find any instances of that, it is a bug and should be purged with fire. Even explicit consume is currently strongly discouraged. Implicit consume is unreliable, especially in a project with many platforms.

If you insist on using more fragile semantics that are known to be unreliable, I would like to at least know what measurable performance difference you observe between the semantics Kim proposed and the elided acquire variant you insist on. My gut feeling tells me that a double sync is very intrusive, but that an isync scheduled almost immediately after an lwsync should be significantly less intrusive.

Thanks,
/Erik

On 28 May 2018, at 03:28, Michihiro Horie <HORIE@jp.ibm.com> wrote: