[concurrency-interest] RFR: 8065804: JEP 171: Clarifications/corrections for fence intrinsics

Mon Dec 8 08:42:11 UTC 2014

On Sun, Dec 7, 2014 at 2:58 PM, David Holmes <david.holmes at oracle.com> wrote:

>> I believe the comment _does_ reflect hotspot's current implementation
>> (entirely from exploring the sources).
>> I believe it's correct to say "all of the platforms are
>> multiple-copy-atomic except PPC".

... current hotspot sources don't contain ARM support.

> Here is the definition of multi-copy atomicity from the ARM architecture
> manual:
>
> "In a multiprocessing system, writes to a memory location are multi-copy
> atomic if the following conditions are both true:
> • All writes to the same location are serialized, meaning they are observed
> in the same order by all observers, although some observers might not
> observe all of the writes.
> • A read of a location does not return the value of a write until all
> observers observe that write."

The hotspot sources give

"""
// To assure the IRIW property on processors that are not multiple copy
// atomic, sync instructions must be issued between volatile reads to
// assure their ordering, instead of after volatile stores.
// (See "A Tutorial Introduction to the ARM and POWER Relaxed Memory Models"
// by Luc Maranget, Susmit Sarkar and Peter Sewell, INRIA/Cambridge)
#ifdef CPU_NOT_MULTIPLE_COPY_ATOMIC
const bool support_IRIW_for_not_multiple_copy_atomic_cpu = true;
"""

and the referenced paper gives

"""
on POWER and ARM, two threads can observe writes to different locations
in different orders, even in the absence of any thread-local reordering.
In other words, the architectures are not multiple-copy atomic [Col92].
"""

which strongly suggests that x86 and sparc are OK, i.e. that IRIW already
holds there without extra fences between volatile reads.
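
For concreteness, here is IRIW (independent reads of independent writes)
written as a Java litmus test. This is only a sketch: the class and method
names (IriwLitmus, runOnce, readersDisagree) are made up for illustration
and appear nowhere in hotspot. Two writers store to x and y, and two readers
read them in opposite orders. With volatile fields the JMM forbids the
outcome where the readers disagree about the order of the two writes.

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical litmus harness; all names here are illustrative only.
class IriwLitmus {
    // Two independent locations. Being volatile, accesses to them must
    // appear sequentially consistent under the JMM.
    static volatile int x;
    static volatile int y;

    // Runs the four threads once. Returns {r1, r2, r3, r4} where
    // reader A does r1 = x; r2 = y and reader B does r3 = y; r4 = x.
    static int[] runOnce() {
        x = 0;
        y = 0;
        final int[] r = new int[4];
        final CountDownLatch start = new CountDownLatch(1);
        Thread[] threads = {
            new Thread(() -> { await(start); x = 1; }),
            new Thread(() -> { await(start); y = 1; }),
            new Thread(() -> { await(start); r[0] = x; r[1] = y; }),
            new Thread(() -> { await(start); r[2] = y; r[3] = x; }),
        };
        for (Thread t : threads) t.start();
        start.countDown();
        try {
            // join() makes the readers' writes to r[] visible to the caller.
            for (Thread t : threads) t.join();
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return r;
    }

    static void await(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
    }

    // The IRIW-violating outcome: reader A saw x before y,
    // reader B saw y before x.
    static boolean readersDisagree(int[] r) {
        return r[0] == 1 && r[1] == 0 && r[2] == 1 && r[3] == 0;
    }
}
```

A harness this crude will rarely provoke the interesting interleavings, so
never observing the outcome is not evidence about any particular CPU; it
only pins down which result is at issue.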

> The first condition is met by Total-Store-Order (TSO) systems like x86 and
> sparc, but not by relaxed-memory-order (RMO) systems like ARM and PPC.
> However, the second condition is not met simply by having TSO. If the local
> processor can see a write from the local store buffer prior to it being
> visible to other processors, then we do not have multi-copy atomicity and I
> believe that is true for x86 and sparc. Hence none of our supported
> platforms are multi-copy-atomic as far as I can see.
>
>> I believe hotspot must implement IRIW correctly to fulfil the promise
>> of sequential consistency for standard Java, so on ppc volatile reads
>> get a full fence, which leads us back to the ppc pointer chasing
>> performance problem that started all of this.
>
>
> Note that nothing in the JSR-133 cookbook allows for IRIW, even on x86 and
> sparc. The key feature needed for IRIW is a load barrier that forces global
> memory synchronization to ensure that all processors see writes at the same
> time. I'm not even sure we can force that on x86 and sparc! Such a load
> barrier negates the need for some store barriers as defined in the cookbook.
>
> My understanding, which could be wrong, is that the JMM implies
> linearizability of volatile accesses, which in turn provides the IRIW
> property. It is also my understanding that linearizability is a necessary
> property for current proof systems to be applicable. However, absence of
> proof is not proof of absence, and it doesn't follow that code that doesn't
> rely on IRIW is incorrect if IRIW is not ensured on a system. As has been
> stated many times now, no practical lock-free algorithm in the literature
> seems to rely on IRIW. So I still hope that IRIW can somehow be removed
> because implementing it will impact everything related to the JMM in
> hotspot.
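
To make the two barrier placements concrete, here is a rough sketch of what
the hotspot comment quoted earlier describes: a full fence after each
volatile store (cookbook style) versus a full fence before each volatile
read (the placement used when the CPU is not multiple-copy atomic). It is an
illustration under assumptions, not hotspot code. It uses the static fence
methods on java.lang.invoke.VarHandle, which arrived in Java 9 as the public
counterpart of the Unsafe fence intrinsics discussed in this thread, and the
class and method names are invented.

```java
import java.lang.invoke.VarHandle;

// Illustrative only: a plain field plus explicit fences stands in for the
// barriers a JIT would emit around a volatile access.
class FencePlacementSketch {
    private int value; // deliberately not volatile

    // Cookbook-style placement: the expensive fence follows the store.
    void storeCookbook(int v) {
        VarHandle.releaseFence(); // earlier accesses stay before the store
        value = v;
        VarHandle.fullFence();    // StoreLoad: store before later loads
    }

    int loadCookbook() {
        int v = value;
        VarHandle.acquireFence(); // later accesses stay after the load
        return v;
    }

    // IRIW-preserving placement for CPUs that are not multiple-copy
    // atomic: the full fence moves in front of the load instead.
    void storeIriw(int v) {
        VarHandle.releaseFence();
        value = v;                // no trailing full fence
    }

    int loadIriw() {
        VarHandle.fullFence();    // e.g. a sync on PPC before the load
        int v = value;
        VarHandle.acquireFence();
        return v;
    }
}
```

The second pair is why read-heavy code such as pointer chasing pays the
cost on ppc: every volatile read carries the full fence, not every store.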