RFR (S) 8182397: Race in field updates when creating ArrayKlasses can lead to crash

Tue Jul 25 08:29:15 UTC 2017

Hi Andrew,

On 2017-07-24 19:40, Andrew Haley wrote:
> On 24/07/17 17:44, Erik Osterlund wrote:
>> Sorry for jumping in to the conversation a bit late.
>> Since David Holmes is on vacation, I have to take his place in
>> questioning the elision of acquire on the reader side being
>> okay. Because I know he would if he was here.
>>
>> In hotspot we almost *never* elide the acquire on the reader side,
>> relying on dependent loads.
> I'm sure we do in plenty of places, it's just undocumented or maybe
> even unknown to the original author.

That should be purged from the code. With fire.

> We certainly do this a lot in
> the code that the JIT compilers generate, as you note below.

That is more okay when we are in control of the code generation.
Consume-like semantics have the following levels of support:

1) No required support: Machines with strong enough memory ordering to 
guarantee all normal loads and stores have acquire/release semantics 
require no support.

2) Compiler support: Some machines can rely on dependent loads, but what 
looks like a dependent load in the C++ source code might not translate 
to dependent loads in the compiled machine code. In order to enforce 
that, the data dependency chains from the source code must be retained 
in the generated machine code. The compiler needs to be explicitly told 
that it needs to keep them and not perform some otherwise perfectly 
valid compiler optimizations that would break the dependencies from the 
source code in the generated machine code.
That is what consume semantics do in C++11 atomics. They tell the 
compiler to keep the dependency chain from the source code in the 
generated machine code. Without corresponding compiler support, we do 
not have any guarantees that the generated machine code will retain the 
data dependencies. It probably will in this case, but "probably" is not 
a particularly attractive guarantee, and not something I would ever rely 
upon. Without a contract with the compiler that it will retain the data 
dependency like C++11 consume semantics, we can not rely on such data 
dependencies, and must therefore use acquire instead until we start 
supporting such semantics in OrderAccess. And we certainly do not 
support it today.

3) Hardware reordering: Some machines like the alpha also requires 
fencing even on dependent loads, and should use that instead.

As for the JIT-compiled code - it is fine to use consume-like semantics 
if we control the code generation and know that the generated machine 
code will respect the data dependencies that we wanted it to. But since 
we do not control the C++ code generation, and do not have access to 
tools like C++11 consume semantics, we can not just elide acquire and 
hope for the best. Instead we have to use acquire until OrderAccess 
starts supporting consume (if it ever will).

What we perhaps could do is to add shared code to OrderAccess for 
load_consume, implement them as load_acquire, so that we at least 
remember where in the code we could have gotten away with dependent 
loads and exploit it, and later on plug it into C++11 consume so that 
C++ compilers that support such contracts can utilize it on a platform 
by platform basis.

>> This would be an exception to that rule, and I do not see why this
>> is warranted. In order to utilize the dependent load properties in
>> shared code, we need memory ordering semantics for this. That memory
>> ordering semantics is called consume in C++11, and has arguably
>> turned out to be a terrible idea. Hotspot does not have a consume
>> memory ordering like C++11, that allows utilizing dependent loads,
>> and I do not think we intend to introduce it any time soon unless
>> there are very compelling reasons to do so. In its absence, we
>> should use load_acquire instead, and not micro optimize it away.
>>
>> AFAIK, compiler implementers are not happy with C++11 consume memory
>> ordering semantics and implement it with acquire instead as there
>> are too many problems with consume.
> I strongly disagree.  Consume is fine now: I think you're dealing with
> out-of-date information.  It took a while to figure out how to specify
> it, that's all.

Perhaps. Last time I looked, compilers would conservatively emit an 
acquire when asked for consume. Perhaps that has been improved upon 
since the specification was improved. Nevertheless, I doubt "just treat 
it like a normal load" was the answer to that, which is what we are 
doing here.

>> To elide acquire in an acquire release pair in shared code, very
>> strong motivation is needed. Like for example how constructors
>> publish final object references with release but relies on dependent
>> load in JIT compiled code to elide acquire, because using acquire
>> semantics on all reference loads is too expensive, so it is
>> warranted here.
> Right.  So we already depend on that property, so...

Yes - where we can enforce the generated code respects the data 
dependency. Where we can not enforce it, we use acquire.

>> Therefore, I am wondering if perhaps we want to add an acquire on
>> the load side anyway, so we do not have to introduce consume memory
>> ordering semantics or something similar in the shared code
>> base. Using a normal load without any memory ordering semantics in
>> shared code to act like a load consume rings alarm clocks in my
>> head.
> I don't think it's that much of a big deal, really.  Putting in an
> acquire fence every time we read a java mirror strikes me as madness
> when consume is all we need on any target.  Even Power PC doesn't need
> a fence instruction there.  At least make sure that targets can turn
> that off: we should not pay the price for something we don't need.

Conversely, those targets would pay the price of compromised correctness 
- a price higher than performance regressions. Performance can only be 
optimized within the legal bounds of correctness. Therefore we can only 
elide fences if we have well defined memory ordering semantics for doing 
that and somehow control that the generated code adheres to those 
semantics and respects them. We simply do not have consume semantics 
today, and therefore to ensure correctness, we must use acquire, until 
the day comes when we support acquire. Doing anything else jeopardizes 
correctness. And doing that for a premature optimization seems like 
madness to me. If we are going to knowingly compromise the correctness 
of the synchronization, I want to at least know that this would cause a 
significant enough regression to motivate doing anything about it and 
finding ways to safely elide it, rather than assuming by default that 
eliding acquire (without anything well defined like compiler consume 
support to take its place) in the name of performance is the right thing 
to do, without measuring what that performance impact would be. And that 
was a surprisingly long sentence.

Thanks,
/Erik