RFR: 8181085: Race condition in method resolution may produce spurious NullPointerException
Erik Österlund
erik.osterlund at oracle.com
Tue May 30 10:57:52 UTC 2017
On 2017-05-30 10:50, Andrew Haley wrote:
> On 29/05/17 13:20, Erik Österlund wrote:
>
>> As a conclusion, by using volatile in addition to OrderAccess you
>> rely on standardized compiler semantics (at least for
>> volatile-to-volatile re-orderings and re-loading, but not for
>> volatile-to-nonvolatile, but that's another can of worms), and
>> regrettably if you rely on OrderAccess memory model doing what it
>> says it will do, then it should indeed work without volatile, but to
>> make that work, OrderAccess relies on non-standardized
>> compiler-specific barriers. In practice it should work well on all
>> our supported compilers without volatile. And if it didn't, it would
>> indeed be a bug in OrderAccess that needs to be solved in
>> OrderAccess.
> Right. And, target by target, we can rework OrderAccess to use real
> C++11 atomics, and everything will be better. Eventually.
I do not completely disagree, but I see drawbacks with that too. I am
not convinced C++11 is a silver bullet.
Note that we lose some explicit control that might end up biting us. And
when it does, it will be even harder to detect as we have sold ourselves
to the C++11 atomic silver bullet, abstracting away the generated code.
For example, C++11 atomic accesses were designed to play nicely with
other C++11 atomic accesses. Both the load-side and store-side have to
look in very specific ways for e.g. the seq_cst semantics to hold. For
example, depending on whether you want seq_cst to have IRIW constraints or not,
some PPC compiler could choose to have the sync instruction on either
the load side or the store side. Since all seq_cst accesses are
controlled by C++11 and go through their compiler, they can make that
choice as the accesses stay inside of their "ABI". But the choice needs
to be consistent with the choice we make in the JVM and our hand crafted
assembly. That is, our hand crafted assembly code has to go by the same
"ABI". And we can no longer guarantee it does as we have lost control
over what instructions are generated.
One concrete example that comes to mind is the JNIFastGetField
optimization on ARMv7.
The memory model of ARMv7 does not respect causality between loads and
stores. Therefore, in theory (and maybe in practice), problems can arise
when three threads are involved in a synchronization dance where
consistent causality chains are assumed.
In the JNIFastGetField optimization we do the following with hand coded
assembly:
1) load safepoint counter (written by VM thread)
2) speculatively load primitive value from object (possibly clobbered by
a GC thread)
3) load safepoint counter again (written by VM thread) and check it did
not change
These loads are all normal loads in hand coded assembly.
Now for this synchronization to work, it is assumed that the store to
the safepoint counter observed at 1) happens-before the store observed
by the speculative load of the primitive value at 2). Due to the lack of
causality in the memory model, this is explicitly not guaranteed to hold
with normal loads and stores on ARMv7, and hence unless we had proper
synchronization in the runtime, we could observe clobbered values from
these optimized JNI getters and think they are okay. But since our
OrderAccess::fence translates to dmb sy specifically (which is
conservative), the store will bubble up to the top level of the
hierarchical memory model of ARMv7, and therefore we can break the
pathological causality chain issue in the JVM by issuing
OrderAccess::fence when storing the safepoint counter values. That way,
our hand crafted assembly will work with normal loads in the fast path.
If OrderAccess::fence translated to anything other than dmb sy, this
would break. So if we went with the proposed C++11 mappings from
https://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html then e.g. a
seq_cst store on ARMv7 would translate into dmb ish; str; dmb ish, and
that would not suffice to break the causality chain.
Of course, it might be that the OS layer saved us from the above
pathology anyway, either intentionally or unintentionally, but that is
besides the point I am trying to make. The point is that the hand coded
assembly and other dynamically generated code has to emit accesses that
are compatible with the machine code generated by OrderAccess in the
runtime. And when we give that control away to C++11 and abstract away
the generated machine code, things might go horribly wrong in the most
unexpected and obscure ways that I think would be a nightmare to debug.
Having said that, I am not arguing that C++11 is a bad idea either. I
would just like to balance out the view that C++11 is a synchronization
silver bullet for the JVM that is simply a superior solution without any
pitfalls and that doing anything else is wrong. There are things to be
considered there as well, like the extent of possible ABI incompatibilities.
> It's important that we do so because racy accesses are undefined
> behaviour in C++11. (And, arguably, before that, but I'm not going to
> go there.)
What paragraph are we referring to here that would break OrderAccess in
C++11?
>> Personally though, I am a helmet-on-synchronization kind of person,
>> so I would take precaution anyway and use volatile whenever
>> possible, because 1) it makes the code more readable, and 2) it
>> provides one extra layer of safety that is more standardized. It
>> seems that over the years it has happened multiple times that we
>> assumed OrderAccess is bullet proof, and then realized that it
>> wasn't and observed a crash that would never have happened if the
>> code was written in a helmet-on-synchronization way. At least that's
>> how I feel about it.
> I have no problem with that. What I *do* have a problem with is the
> use of volatile to fix bugs that really need to be corrected with
> proper barriers.
I think you misunderstood me here. I did not propose to use volatile so
we don't have to fix bugs in OrderAccess. On the contrary, I said if we find
such issues, we should definitely fix them in OrderAccess. But despite
that, I personally like the pragmatic safety approach, and would use
volatile in my lock-free code anyway to make it a) more readable, and b)
provide an extra level of safety against our increasingly aggressive
compilers. It's like wearing a helmet when biking. You don't expect to
fall and should not fall, but why take risks if you don't have to and
there is an easy way of preventing disaster if that happens. At least
that's how I think about it myself.
>> Now one might argue that by using C++11 atomics that are
>> standardized, all these problems would go away as we would rely on
>> standardized primitives and then just trust the compiler.
> And I absolutely do argue that. In fact, it is the only correct way
> to go with C++11 compilers. IMO.
Not entirely convinced that statement is true as I think I mentioned
previously.
>> But then there could arise problems when the C++ compiler decides to
>> be less conservative than we want, e.g. by not doing fence in
>> sequentially consistent loads to optimize for non-multiple copy
>> atomic CPUs arguing that IRIW issues that violate sequential
>> consistency are non-issues in practice.
> A C++ compiler will not decide to do that. C++ compiler authors know
> well enough what sequential consistency means. Besides, if there is
> any idiom in the JVM that actually requires IRIW we should remove it
> as soon as possible.
With the risk of going slightly off topic... C++ compiler authors have
indeed done that in the past. And I have a huge problem with this. I
think the exposed model semantics need to be respected. If the model
says seq_cst, then the generated code should be seq_cst and not "almost"
seq_cst. You can't expect users of a memory model to have to know that
it intentionally (rather than accidentally) violates the guarantees
because it was considered close enough and that nobody should be able to
observe the difference in practice. (don't get me started on that one)
The issue is not whether an algorithm depends on IRIW or not. The issue
is that we have to explicitly reason about IRIW to prove that it works.
The lack of IRIW constraints violates seq_cst and by extension
linearization points that rely on seq_cst, and by extension algorithms
that rely on
linearization points. By breaking the very building blocks that were
used to reason about algorithms and their correctness, we rely on chance
for it to work. The algorithm may or may not work. It probably does work
without IRIW constraints in the vast majority of cases. But we have to
explicitly reason about that expanded state machine of possible races
caused by IRIW issues to actually know that it works rather than leaving
it to chance. Reasoning about this extended state machine can take a lot
of work and puts the bar unreasonably high for writing synchronized code
in my opinion. And I think the alternative of leaving it to chance
(albeit with good odds) seems like an unfortunate choice.
>> That makes those loads "almost" sequentially consistent, which might
>> be good enough. But it feels good to have a choice here to be more
>> conservative. To have the synchronization helmet on.
> I have no real problem with that. Using volatile has the problem,
> from my point of view, that it might conceal bugs that would be
> revealed on a weakly-ordered machine that you or I then have to fix,
> but I can live with it.
I do not see how using volatile has anything to do with weakly ordered
machines. We use it where it is specifically compiler reorderings that
need to be prevented.
If it is not just a compiler reordering that needed to be prevented,
then of course the use of volatile is incorrect and a bug.
Either way, relying on C++11 atomics might also conceal bugs that would
be revealed on a weakly-ordered machine due to conflicting ABIs between
the statically generated C++ code and the dynamically generated code, as
previously mentioned.
Thanks,
/Erik