RFR: AArch64: 8179954: AArch64: C1 and C2 volatile accesses are not sequentially consistent

Wed May 10 06:21:18 UTC 2017

On 10/05/17 05:04, David Holmes wrote:
> I'm somewhat confused by this description. Outside of AArch64 the
> general approach, for C1 and Unsafe at least, is that a volatile-read is
> a load-acquire() (or a fence-load-acquire if you want the IRIW support)
> and a volatile write is a release-store-fence (or just release-store
> with IRIW support). Does AArch64 not follow this pattern?

No.  AArch64 has its own sequentially-consistent load and store
instructions (LDAR and STLR), which are designed to provide just
enough for volatiles but no more.  These are preferable to using
fences, but that's hard in C1 because shared code inserts fences
regardless of the target machine.  This is wrong, but it's legacy
code.

> I'm trying to see if the issue here is the original code generation or a 
> subtle incompatibility between the ld-acq/st-rel instructions and 
> explicit DMB.

I wouldn't be surprised.  The problem is that the approach taken in
HotSpot is much too naive.  There is not an exact correspondence
between real processors' fence instructions and what we need for
HotSpot.  The best mappings are here:

https://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html

The ones we need for volatiles are the Seq Cst set.

As you can see, the choices of instruction sequences are different for
different processors.  C1 (and all the compilers) should delegate this
to the back ends, but instead they try to map volatile accesses onto
acquire/release/fence.
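
To make concrete what "Seq Cst" has to guarantee for volatiles, here
is a minimal store-buffering litmus test in Java.  It is only a
sketch: the class name StoreBufferingLitmus and the method
countForbidden are made up for illustration and are not part of
HotSpot or its test suite.  Each thread writes one volatile and then
reads the other; under the Java memory model both reads returning 0
is forbidden, whichever instruction sequence the back end picks.

```java
public class StoreBufferingLitmus {
    // Hypothetical shared variables; both are volatile, so accesses
    // to them must appear in a single total order.
    static volatile int x, y;

    // Runs the litmus test `iterations` times and counts how often
    // both threads read 0, which sequential consistency forbids.
    static int countForbidden(int iterations) throws InterruptedException {
        int forbidden = 0;
        for (int i = 0; i < iterations; i++) {
            x = 0;
            y = 0;
            // Plain array is fine here: Thread.join() makes the
            // workers' writes visible to this thread.
            final int[] r = new int[2];
            Thread t1 = new Thread(() -> { x = 1; r[0] = y; });
            Thread t2 = new Thread(() -> { y = 1; r[1] = x; });
            t1.start();
            t2.start();
            t1.join();
            t2.join();
            if (r[0] == 0 && r[1] == 0) {
                forbidden++;
            }
        }
        return forbidden;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("forbidden outcomes: " + countForbidden(10_000));
    }
}
```

A correct JVM should never report a non-zero count.  A clean run
proves nothing, though: a test like this can only expose a bug, and
only if the timing happens to cooperate.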

PPC has special code which is #ifdef'd in the shared code in the
compilers, so I'm sure it gets this right.

Andrew.