RFR (S) 8182397: Race in field updates when creating ArrayKlasses can lead to crash
Erik Österlund
erik.osterlund at oracle.com
Tue Jul 25 08:29:15 UTC 2017
Hi Andrew,
On 2017-07-24 19:40, Andrew Haley wrote:
> On 24/07/17 17:44, Erik Osterlund wrote:
>> Sorry for jumping in to the conversation a bit late.
>> Since David Holmes is on vacation, I have to take his place in
>> questioning the elision of acquire on the reader side being
>> okay. Because I know he would if he was here.
>>
>> In hotspot we almost *never* elide the acquire on the reader side,
>> relying on dependent loads.
> I'm sure we do in plenty of places, it's just undocumented or maybe
> even unknown to the original author.
That should be purged from the code. With fire.
> We certainly do this a lot in
> the code that the JIT compilers generate, as you note below.
That is more okay when we are in control of the code generation.
Consume-like semantics have the following levels of support:
1) No required support: Machines with strong enough memory ordering to
guarantee all normal loads and stores have acquire/release semantics
require no support.
2) Compiler support: Some machines can rely on dependent loads, but what
looks like a dependent load in the C++ source code might not translate
to dependent loads in the compiled machine code. In order to enforce
that, the data dependency chains from the source code must be retained
in the generated machine code. The compiler needs to be explicitly told
that it needs to keep them and not perform some otherwise perfectly
valid compiler optimizations that would break the dependencies from the
source code in the generated machine code.
That is what consume semantics do in C++11 atomics. They tell the
compiler to keep the dependency chain from the source code in the
generated machine code. Without corresponding compiler support, we do
not have any guarantees that the generated machine code will retain the
data dependencies. It probably will in this case, but "probably" is not
a particularly attractive guarantee, and not something I would ever rely
upon. Without a contract with the compiler that it will retain the data
dependency like C++11 consume semantics, we can not rely on such data
dependencies, and must therefore use acquire instead until we start
supporting such semantics in OrderAccess. And we certainly do not
support it today.
3) Hardware reordering: Some machines like the alpha also requires
fencing even on dependent loads, and should use that instead.
As for the JIT-compiled code - it is fine to use consume-like semantics
if we control the code generation and know that the generated machine
code will respect the data dependencies that we wanted it to. But since
we do not control the C++ code generation, and do not have access to
tools like C++11 consume semantics, we can not just elide acquire and
hope for the best. Instead we have to use acquire until OrderAccess
starts supporting consume (if it ever will).
What we perhaps could do is to add shared code to OrderAccess for
load_consume, implement them as load_acquire, so that we at least
remember where in the code we could have gotten away with dependent
loads and exploit it, and later on plug it into C++11 consume so that
C++ compilers that support such contracts can utilize it on a platform
by platform basis.
>> This would be an exception to that rule, and I do not see why this
>> is warranted. In order to utilize the dependent load properties in
>> shared code, we need memory ordering semantics for this. That memory
>> ordering semantics is called consume in C++11, and has arguably
>> turned out to be a terrible idea. Hotspot does not have a consume
>> memory ordering like C++11, that allows utilizing dependent loads,
>> and I do not think we intend to introduce it any time soon unless
>> there are very compelling reasons to do so. In its absence, we
>> should use load_acquire instead, and not micro optimize it away.
>>
>> AFAIK, compiler implementers are not happy with C++11 consume memory
>> ordering semantics and implement it with acquire instead as there
>> are too many problems with consume.
> I strongly disagree. Consume is fine now: I think you're dealing with
> out-of-date information. It took a while to figure out how to specify
> it, that's all.
Perhaps. Last time I looked, compilers would conservatively emit an
acquire when asked for consume. Perhaps that has been improved upon
since the specification was improved. Nevertheless, I doubt "just treat
it like a normal load" was the answer to that, which is what we are
doing here.
>> To elide acquire in an acquire release pair in shared code, very
>> strong motivation is needed. Like for example how constructors
>> publish final object references with release but relies on dependent
>> load in JIT compiled code to elide acquire, because using acquire
>> semantics on all reference loads is too expensive, so it is
>> warranted here.
> Right. So we already depend on that property, so...
Yes - where we can enforce the generated code respects the data
dependency. Where we can not enforce it, we use acquire.
>> Therefore, I am wondering if perhaps we want to add an acquire on
>> the load side anyway, so we do not have to introduce consume memory
>> ordering semantics or something similar in the shared code
>> base. Using a normal load without any memory ordering semantics in
>> shared code to act like a load consume rings alarm clocks in my
>> head.
> I don't think it's that much of a big deal, really. Putting in an
> acquire fence every time we read a java mirror strikes me as madness
> when consume is all we need on any target. Even Power PC doesn't need
> a fence instruction there. At least make sure that targets can turn
> that off: we should not pay the price for something we don't need.
Conversely, those targets would pay the price of compromised correctness
- a price higher than performance regressions. Performance can only be
optimized within the legal bounds of correctness. Therefore we can only
elide fences if we have well defined memory ordering semantics for doing
that and somehow control that the generated code adheres to those
semantics and respects them. We simply do not have consume semantics
today, and therefore to ensure correctness, we must use acquire, until
the day comes when we support acquire. Doing anything else jeopardizes
correctness. And doing that for a premature optimization seems like
madness to me. If we are going to knowingly compromise the correctness
of the synchronization, I want to at least know that this would cause a
significant enough regression to motivate doing anything about it and
finding ways to safely elide it, rather than assuming by default that
eliding acquire (without anything well defined like compiler consume
support to take its place) in the name of performance is the right thing
to do, without measuring what that performance impact would be. And that
was a surprisingly long sentence.
Thanks,
/Erik
More information about the hotspot-dev
mailing list