[concurrency-interest] RFR: 8065804: JEP 171: Clarifications/corrections for fence intrinsics

Sun Dec 7 22:58:06 UTC 2014

On 6/12/2014 7:29 AM, Martin Buchholz wrote:
> On Thu, Dec 4, 2014 at 5:36 PM, David Holmes <david.holmes at oracle.com> wrote:
>> Martin,
>>
>> On 2/12/2014 6:46 AM, Martin Buchholz wrote:
>
>> Is this finalized then? You can only make one commit per CR.
>
> Right.  I'd like to commit and then perhaps do another round of clarifications.
>
>> I still find this entire comment block to be misguided and misplaced:
>>
>> !     // Fences, also known as memory barriers, or membars.
>> !     // See hotspot sources for more details:
>> !     // orderAccess.hpp memnode.hpp unsafe.cpp
>> !     //
>> !     // One way of implementing Java language-level volatile variables using
>> !     // fences (but there is often a better way without) is by:
>> !     // translating a volatile store into the sequence:
>> !     // - storeFence()
>> !     // - relaxed store
>> !     // - fullFence()
>> !     // and translating a volatile load into the sequence:
>> !     // - if (CPU_NOT_MULTIPLE_COPY_ATOMIC) fullFence()
>> !     // - relaxed load
>> !     // - loadFence()
>> !     // The full fence on volatile stores ensures the memory model guarantee of
>> !     // sequential consistency on most platforms.  On some platforms (ppc) we
>> !     // need an additional full fence between volatile loads as well (see
>> !     // hotspot's CPU_NOT_MULTIPLE_COPY_ATOMIC).
>
> Even I think this comment is marginal - I will delete it.  But
> consider this a plea for better documentation of the hotspot
> internals.

Okay, but Unsafe.java is not the place to document anything about hotspot.
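
For readers following the thread, the recipe in the quoted comment block can be
sketched in plain Java. This is only an illustration, not how hotspot implements
volatiles: FencedCell and its methods are hypothetical names, the boolean flag
merely stands in for hotspot's CPU_NOT_MULTIPLE_COPY_ATOMIC, and the
java.lang.invoke.VarHandle static fences (Java 9+) are used here as stand-ins
for Unsafe.storeFence(), Unsafe.fullFence() and Unsafe.loadFence().

```java
import java.lang.invoke.VarHandle;

// Illustrative only: a hand-written version of the fence recipe from the
// quoted comment. Real code should simply declare the field volatile.
final class FencedCell {
    // Hypothetical flag standing in for hotspot's CPU_NOT_MULTIPLE_COPY_ATOMIC.
    static final boolean CPU_NOT_MULTIPLE_COPY_ATOMIC = false;

    // Deliberately not volatile: ordering comes from the explicit fences.
    private int value;

    void volatileLikeStore(int v) {
        VarHandle.releaseFence(); // plays the role of storeFence()
        value = v;                // plain store, standing in for the relaxed store
        VarHandle.fullFence();    // plays the role of fullFence()
    }

    int volatileLikeLoad() {
        if (CPU_NOT_MULTIPLE_COPY_ATOMIC) {
            VarHandle.fullFence(); // extra full fence before the load (the ppc case)
        }
        int v = value;            // plain load, standing in for the relaxed load
        VarHandle.acquireFence(); // plays the role of loadFence()
        return v;
    }

    public static void main(String[] args) {
        FencedCell cell = new FencedCell();
        cell.volatileLikeStore(42);
        System.out.println(cell.volatileLikeLoad());
    }
}
```

A plain Java field access is weaker than a true "relaxed" access, so treat this
purely as a reading aid for the sequence of fences.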

>> Why do you want this description here? It has no relevance to the API itself,
>> nor to how volatiles are implemented in the VM. And as I said in the bug
>> report CPU_NOT_MULTIPLE_COPY_ATOMIC exists only for platforms that want to
>> implement IRIW (none of our platforms are multiple-copy-atomic, but only PPC
>> sets this so that it employs IRIW).
>
> I believe the comment _does_ reflect hotspot's current implementation
> (entirely from exploring the sources).
> I believe it's correct to say "all of the platforms are
> multiple-copy-atomic except PPC".

Here is the definition of multi-copy atomicity from the ARM architecture 
manual:

"In a multiprocessing system, writes to a memory location are multi-copy 
atomic if the following conditions are both true:
• All writes to the same location are serialized, meaning they are 
observed in the same order by all observers, although some observers 
might not observe all of the writes.
• A read of a location does not return the value of a write until all 
observers observe that write."

The first condition is met by Total-Store-Order (TSO) systems like x86
and sparc, but not by relaxed-memory-order (RMO) systems like ARM and
PPC. However, the second condition is not met simply by having TSO. If
the local processor can see a write from its own store buffer before
that write is visible to other processors, then we do not have
multi-copy atomicity, and I believe that is the case for x86 and sparc.
Hence none of our supported platforms are multi-copy-atomic as far as I
can see.
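
The store-buffer point can be made concrete with a toy model. This is a minimal
sketch and does not describe any real CPU: StoreBufferModel, Core, store, load
and drain are all hypothetical names, and "memory" is just a map. It only shows
a core reading its own buffered write before any other core can observe it,
which is the behaviour that breaks the second condition of the definition.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: each core has a private store buffer that it consults
// before shared memory (store-to-load forwarding).
final class StoreBufferModel {
    static final class Core {
        private final Map<String, Integer> memory;                       // shared memory
        private final Map<String, Integer> storeBuffer = new HashMap<>(); // private to this core

        Core(Map<String, Integer> sharedMemory) {
            this.memory = sharedMemory;
        }

        // The write is buffered locally and is not yet visible to other cores.
        void store(String addr, int value) {
            storeBuffer.put(addr, value);
        }

        // A load is satisfied from the core's own buffer first, then from memory.
        int load(String addr) {
            Integer forwarded = storeBuffer.get(addr);
            return forwarded != null ? forwarded : memory.getOrDefault(addr, 0);
        }

        // Draining the buffer makes the buffered writes globally visible.
        void drain() {
            memory.putAll(storeBuffer);
            storeBuffer.clear();
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> memory = new HashMap<>();
        Core a = new Core(memory);
        Core b = new Core(memory);

        a.store("x", 1);
        System.out.println("before drain: A sees x=" + a.load("x") + ", B sees x=" + b.load("x"));
        a.drain();
        System.out.println("after drain:  A sees x=" + a.load("x") + ", B sees x=" + b.load("x"));
    }
}
```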

> I believe hotspot must implement IRIW correctly to fulfil the promise
> of sequential consistency for standard Java, so on ppc volatile reads
> get a full fence, which leads us back to the ppc pointer chasing
> performance problem that started all of this.

Note that nothing in the JSR-133 cookbook allows for IRIW, even on x86 
and sparc. The key feature needed for IRIW is a load barrier that forces 
global memory synchronization to ensure that all processors see writes 
at the same time. I'm not even sure we can force that on x86 and sparc! 
Such a load barrier negates the need for some store barriers as defined 
in the cookbook.

My understanding, which could be wrong, is that the JMM implies 
linearizability of volatile accesses, which in turn provides the IRIW 
property. It is also my understanding that linearizability is a 
necessary property for current proof systems to be applicable. However,
absence of proof is not proof of absence, and it doesn't follow that
code that doesn't rely on IRIW is incorrect if IRIW is not ensured on a
system. As has been stated many times now, no practical lock-free
algorithm in the literature seems to rely on IRIW. So I still hope
that IRIW can somehow be removed, because implementing it will impact
everything related to the JMM in hotspot.
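
For reference, the IRIW (independent reads of independent writes) shape under
discussion can be written as a small Java litmus test. This is a rough sketch:
IriwLitmus and countForbidden are hypothetical names, and starting four fresh
threads per run is a poor way to provoke races (a dedicated harness such as
OpenJDK's jcstress is the proper tool). It only shows which outcome is at
issue: the two readers disagreeing about the order of the two independent
volatile writes.

```java
// Rough IRIW litmus sketch: two writers, two readers, volatile x and y.
final class IriwLitmus {
    static final class State {
        volatile int x, y;
        int r1, r2, r3, r4; // written by the readers, read only after join()
    }

    // Returns how many runs showed reader 1 seeing x before y while
    // reader 2 saw y before x.
    static int countForbidden(int iterations) {
        int forbidden = 0;
        for (int i = 0; i < iterations; i++) {
            State s = new State();
            Thread[] threads = {
                new Thread(() -> s.x = 1),                       // writer 1
                new Thread(() -> s.y = 1),                       // writer 2
                new Thread(() -> { s.r1 = s.x; s.r2 = s.y; }),   // reader 1: x then y
                new Thread(() -> { s.r3 = s.y; s.r4 = s.x; }),   // reader 2: y then x
            };
            for (Thread t : threads) {
                t.start();
            }
            try {
                for (Thread t : threads) {
                    t.join(); // join() makes the readers' results visible here
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for litmus threads", e);
            }
            if (s.r1 == 1 && s.r2 == 0 && s.r3 == 1 && s.r4 == 0) {
                forbidden++;
            }
        }
        return forbidden;
    }

    public static void main(String[] args) {
        System.out.println("forbidden IRIW outcomes: " + countForbidden(1000));
    }
}
```

With x and y declared volatile, a conforming JVM must never report a non-zero
count; the cost of guaranteeing that on ppc is what this thread is about.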

David
-----