RFR: 8181085: Race condition in method resolution may produce spurious NullPointerException

Erik Österlund erik.osterlund at oracle.com
Mon Jun 5 11:52:02 UTC 2017


Hi Andrew,

On 2017-06-03 11:00, Andrew Haley wrote:
> On 30/05/17 11:57, Erik Österlund wrote:
>
>> The issue is not whether an algorithm depends on IRIW or not. The
>> issue is that we have to explicitly reason about IRIW to prove that
>> it works.  The lack of IRIW violates seq_cst and by extension
>> linearization points that rely on seq_cst, and by extension
>> algorithms that rely on linearization points. By breaking the very
>> building blocks that were used to reason about algorithms and their
>> correctness, we rely on chance for it to work...
> I've been thinking about this some more, and one thing you said has
> been baffling me.  I don't think that I have come across anywhere in
> HotSpot that uses seq_cst internally, and I don't think there is even
> any support for it in OrderAccess.  The only thing that I know of
> where we actually need seq.cst is in the C++ interpreter and JNI's
> support for volatile fields, and there we simulate seq.cst by using
> release_fence; store; full fence.

Yes, I miss sequential consistency in hotspot too. In fact, my mouse pad 
has "sequential consistency <3" printed on it. :) For atomic operations 
such as CAS, we do use what we refer to as conservative memory ordering, 
which appears to be closely related to SC.
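
To make "conservative" a bit more concrete, here is a minimal sketch of
the intended ordering semantics, expressed with C++11 atomics (the names
and the use of std::atomic are mine for illustration; the real thing is
per-platform assembly and intrinsics in hotspot, not this code):

  #include <atomic>

  std::atomic<int> cell{0};

  // Illustrative only: "conservative" ordering around a CAS amounts to
  // a full two-way fence on both sides of the atomic operation, so it
  // is totally ordered with respect to other such operations.
  bool conservative_cas(int expected, int desired) {
    std::atomic_thread_fence(std::memory_order_seq_cst); // full fence before
    bool ok = cell.compare_exchange_strong(expected, desired,
                                           std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_seq_cst); // full fence after
    return ok;
  }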

Other than that, the main idea of SC, at an abstract level, is to have
(at least) the guarantees of the weaker acquire/release, plus the effect
of a full fence between any pair of SC accesses, so that all threads can
agree on some total ordering of the SC accesses (let's not dig too far
into the details and interactions between SC and acquire/release on the
same atomic objects just yet). In hotspot, rather than having such
semantics as a memory ordering, we have instead elected to accomplish
this with interleaved OrderAccess::fence() calls where required. In the
cases you mentioned, the fence is guarded by if statements checking
whether "support_IRIW_for_not_multiple_copy_atomic_cpu" is set, to
control the convention of where the fence is placed.
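
Roughly, the shape of that convention is the following (an illustrative
sketch with C++11 atomics standing in for OrderAccess and the code
generators; the structure is made up for exposition, not copied from the
actual interpreter/compiler code):

  #include <atomic>

  // Per-CPU constant; true on non-multiple-copy-atomic machines.
  const bool support_IRIW_for_not_multiple_copy_atomic_cpu = true;

  std::atomic<int> field{0};

  // Java volatile store: release store, plus a trailing full fence on
  // machines where the fence is not moved to the load side.
  void volatile_store(int v) {
    field.store(v, std::memory_order_release);
    if (!support_IRIW_for_not_multiple_copy_atomic_cpu) {
      std::atomic_thread_fence(std::memory_order_seq_cst); // trailing fence
    }
  }

  // Java volatile load: leading full fence instead, when the flag is set.
  int volatile_load() {
    if (support_IRIW_for_not_multiple_copy_atomic_cpu) {
      std::atomic_thread_fence(std::memory_order_seq_cst); // leading fence
    }
    return field.load(std::memory_order_acquire);
  }

  // Either way, any two volatile accesses by one thread end up separated
  // by a full fence, which is what restores IRIW-style ordering on
  // non-multiple-copy-atomic machines.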

I believe and hope that if we had had an SC memory ordering, it would
have been used more.

> But we could do that with the
> seq.cst C++11 functions instead.

Or build our own SC bindings. In fact, my new Access API does come with 
sequential consistency as one memory ordering constraint. Hopefully that 
will not be shot down...

The benefit of rolling our own SC mappings is that we control the ABI 
and conventions. Some immediate benefits are:

1) It is compliant with our dynamically generated code interacting with
our runtime. For example, the most popular bindings for SC on
non-multiple-copy-atomic CPUs seem to be what are commonly referred to
as the "leading sync" and "trailing sync" conventions, where a full
fence is placed either before or after every SC access so that every
two SC accesses are separated by a full fence (I sketch the two
conventions below, after point 6). Both the leading sync and trailing
sync conventions were proven correct [1]. But the proof was subsequently
shown to be flawed, and it turns out that the trailing sync bindings are
not always safe, as they may violate e.g. IRIW when weaker acquire
accesses are combined with SC accesses on the same locations [2]. Yet
the currently recommended ARMv7 bindings appear to be trailing sync [3],
because they have been determined to be faster. Recently, both trailing
sync and leading sync bindings have been proposed that correct the
currently known flaws of C++11 seq_cst [4]. These bindings are not
necessarily compliant with lock-free races between our runtime and our
dynamically generated code.

2) The choice between the trailing sync and leading sync conventions
made by C++11 compilers is not mandated by any ABI. There is no real way
for us to know whether a leading sync or trailing sync convention is
used by the C++ compiler without selling our souls to uncontracted
compiler internals, and any subsequent compiler upgrade might break the
shady contract we thought we had with our dynamically generated code.
This is a fundamental issue with relying on C++11 atomics.

3) We can be compliant with our own JMM and not have to think about the 
interactions and differences between our JMM and the evolving C++11 
memory model when we write synchronized code.

4) Rather than having trailing sync on some architectures and leading
sync on others, leading to observably different behaviour when combined
with dynamically generated code, I would ideally like to have the same
convention on all machines, so that any such behavioural difference for
generated code looks the same on all platforms. This is not the case for
C++11 atomics: the currently recommended bindings use leading sync for
PPC and trailing sync for ARMv7 [3], and they will behave slightly
differently.

5) The ARM manuals talk about "visibility" of stores as part of their
contract - a property that is too specific to fit into the C++11 memory
model, yet might be useful for a JVM that needs to interact with
dynamically generated code. In particular, if the ARMv8 stlr instruction
guarantees "visibility" (which I think is better thought of as not
reordering with any subsequent accesses in program order, since a memory
model should preferably talk about constraints between memory access
orderings), then this is equivalent to always having a trailing fence()
on SC stores, and hence unlike e.g. a C++11 leading sync SC store
binding for PPC. I would argue we do want this stronger property, as our
current generated code occasionally depends on it.

6) We can decide which properties we think are important independently
of which direction C++ decides to go. They can e.g. decide to buy weaker
consistency for a failing CAS in the name of micro-optimization, and we
can elect not to buy that bag of problems.
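
To make the leading sync and trailing sync conventions from point 1 more
concrete, here is a rough sketch of them using C++11 fences and relaxed
accesses as stand-ins for the per-architecture instruction sequences
(illustrative only; the real mappings are instruction sequences such as
sync/lwsync on PPC or dmb/stlr on ARM, see [3]):

  #include <atomic>

  std::atomic<int> g{0};

  // "Leading sync": a full fence before every SC access.
  void sc_store_leading(int v) {
    std::atomic_thread_fence(std::memory_order_seq_cst);  // sync
    g.store(v, std::memory_order_relaxed);
  }
  int sc_load_leading() {
    std::atomic_thread_fence(std::memory_order_seq_cst);  // sync
    int v = g.load(std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_acquire);  // ctrl dep + isync
    return v;
  }

  // "Trailing sync": a full fence after every SC access.
  void sc_store_trailing(int v) {
    std::atomic_thread_fence(std::memory_order_release);  // lwsync
    g.store(v, std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_seq_cst);  // sync
  }
  int sc_load_trailing() {
    int v = g.load(std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_seq_cst);  // sync
    return v;
  }

  // Within one convention, any two SC accesses are separated by a full
  // fence. The trouble starts when SC accesses are mixed with weaker
  // (e.g. acquire) accesses to the same locations, or with code generated
  // under the other convention - which is exactly the kind of mixing that
  // happens between our runtime and our dynamically generated code.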

Hope you bought my car salesman pitch.

> Of course I have a motive for digging into this: I'm the lead of the
> AArch64-port project.  That processor (uniquely?) has real support for
> seq.cst, and it'd be nice to use it without throwing a full fence at
> every volatile store.

I understand your motivation. At least I think we both want some kind of 
SC semantics in hotspot in one shape or another. :)

Thanks,
/Erik

[1] "Clarifying and Compiling C/C++ Concurrency: from C++11 to POWER", 
M. Betty et. al., POPL'12
[2] "Counterexamples and Proof Loophole for the C/C++ to POWER and ARMv7 
Trailing-Sync Compiler Mappings", Y. A. Manerka, 2016
[3] https://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html
[4] "Repairing sequential consistency in C/C++ 11", O. Lahav, PLDI'17

