RFR: 8181085: Race condition in method resolution may produce spurious NullPointerException
Erik Österlund
erik.osterlund at oracle.com
Mon Jun 5 11:52:02 UTC 2017
Hi Andrew,
On 2017-06-03 11:00, Andrew Haley wrote:
> On 30/05/17 11:57, Erik Österlund wrote:
>
>> The issue is not whether an algorithm depends on IRIW or not. The
>> issue is that we have to explicitly reason about IRIW to prove that
>> it works. The lack of IRIW violates seq_cst and by extension
>> linearization points that rely on seq_cst, and by extension
>> algorithms that rely on linearization points. By breaking the very
>> building blocks that were used to reason about algorithms and their
>> correctness, we rely on chance for it to work...
> I've been thinking about this some more, and one thing you said has
> been baffling me. I don't think that I have come across anywhere in
> HotSpot that uses seq_cst internally, and I don't think there is even
> any support for it in OrderAccess. The only thing that I know of
> where we actually need seq.cst is in the C++ interpreter and JNI's
> support for volatile fields, and there we simulate seq.cst by using
> release_fence; store; full fence.
Yes, I miss sequential consistency in hotspot too. In fact, my mouse pad
has "sequential consistency <3" printed on it. :) For atomic operations
such as CAS, we do use what we refer to as conservative memory ordering,
which appears to be closely related to SC.
Other than that, the main idea of SC, on an abstract level, is to have
(at least) the guarantees of the weaker acquire/release as well as the
effect of a full fence between any pair of SC accesses, so that they can
agree upon some total ordering of SC accesses (let's not dig too far
into details and interactions between SC and acquire/release on the same
atomic objects just yet). In hotspot, rather than having such semantics,
we have instead elected to accomplish that with interleaved
OrderAccess::fence() calls where this is required. In the cases you
mentioned, the fence is guarded by if statements checking whether
"support_IRIW_for_not_multiple_copy_atomic_cpu" is set, which controls
the convention for where the fence is placed.
I believe and hope that if we had had an SC memory ordering, it
would have been used more.
> But we could do that with the
> seq.cst C++11 functions instead.
Or build our own SC bindings. In fact, my new Access API does come with
sequential consistency as one memory ordering constraint. Hopefully that
will not be shot down...
The benefit of rolling our own SC mappings is that we control the ABI
and conventions. Some immediate benefits are:
1) It is compliant with our dynamically generated code interacting with
our runtime. For example, the most popular bindings for SC on
non-multiple-copy-atomic CPUs seem to be either what is commonly
referred to as the "leading sync" or the "trailing sync" convention, where a
full fence is placed either before or after every SC access to ensure
that every two SC accesses are interleaved with a full fence. Both the
leading sync and trailing sync conventions were proven correct [1]. But
subsequently, the proof was shown to have a loophole, and it turns out
that the trailing sync bindings are not always safe, as they may violate
e.g. IRIW when weaker acquire accesses are combined with SC accesses on
the same locations [2]. Yet the currently recommended ARMv7 bindings
appear to be trailing sync [3], because that convention has been
determined to be faster.
Recently, both trailing sync and leading sync bindings have been
proposed that correct the currently known flaws of C++11 seq_cst [4].
These bindings are not necessarily compliant with lock-free races
between our runtime and our dynamically generated code.
2) The trailing sync and leading sync conventions used by C++11
compilers are not ABI compliant. And there is no real way for us to know
if a leading sync or trailing sync convention is used by the C++
compiler without fully selling our souls to uncontracted compiler
internals. Any subsequent compiler upgrade might break the shady
contract we thought we had with our dynamically generated code. This is
a fundamental issue with relying on C++11 atomics.
3) We can be compliant with our own JMM and not have to think about the
interactions and differences between our JMM and the evolving C++11
memory model when we write synchronized code.
4) Rather than having trailing sync on some architectures and leading
sync on others, which leads to observably different behaviour when combined
with dynamically generated code, I would ideally like to have the same
conventions on all machines so that any such behavioural difference for
generated code looks the same on all platforms. This is not the case for
C++11 atomics. Currently recommended bindings use leading sync for PPC
and trailing sync for ARMv7 [3]. And they will behave slightly differently.
5) The ARM manuals talk about "visibility" of stores as part of their
contract, a property that is too specific to fit into the C++11 memory
model, yet might be useful for a JVM that needs to interact with
dynamically generated code. In particular, if an ARMv8 stlr
instruction guarantees "visibility" (which I think is better thought
of as not reordering with any subsequent accesses in program order, since
a memory model should preferably talk about constraints between memory
access orderings), then this is equivalent to always having a trailing
fence() on SC stores, and hence unlike e.g. a C++11 leading sync SC
store binding for PPC. I would argue we do want this stronger property,
as our current generated code occasionally depends on it.
6) We can decide upon what properties we think are important
independently of which direction C++ decides to go. They can e.g. decide
to buy weaker consistency for a failing CAS in the name of
micro-optimization, and we can elect not to buy that bag of problems.
Hope you bought my car salesman pitch.
> Of course I have a motive for digging into this: I'm the lead of the
> AArch64-port project. That processor (uniquely?) has real support for
> seq.cst, and it'd be nice to use it without throwing a full fence at
> every volatile store.
I understand your motivation. At least I think we both want some kind of
SC semantics in hotspot in one shape or another. :)
Thanks,
/Erik
[1] "Clarifying and Compiling C/C++ Concurrency: from C++11 to POWER",
M. Batty et al., POPL'12
[2] "Counterexamples and Proof Loophole for the C/C++ to POWER and ARMv7
Trailing-Sync Compiler Mappings", Y. A. Manerkar et al., 2016
[3] https://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html
[4] "Repairing sequential consistency in C/C++11", O. Lahav et al., PLDI'17
More information about the jdk10-dev mailing list