(XS) RFR: 8046210: Missing memory barrier when reading init_lock
Doerr, Martin
martin.doerr at sap.com
Mon Sep 1 10:03:58 UTC 2014
Hi David,
thanks for posting this bug.
PPC64 processors do indeed perform this kind of load reordering across predicted branches.
The sequence "load x, then compare and branch depending on x" is sufficient to order that load with respect to subsequent stores,
but subsequent loads may be executed speculatively and so reorder with the load of x.
The cheapest way to order the load of x with respect to subsequent loads on PPC64 is:
load x, compare and trap (or branch) depending on x, then issue an isync instruction, which discards speculatively executed loads.
This is what we implemented in OrderAccess::load_acquire (orderAccess_linux_ppc.inline.hpp).
Best regards,
Martin
-----Original Message-----
From: hotspot-runtime-dev [mailto:hotspot-runtime-dev-bounces at openjdk.java.net] On Behalf Of David Holmes
Sent: Monday, September 1, 2014 06:49
To: hotspot-runtime-dev at openjdk.java.net
Subject: (XS) RFR: 8046210: Missing memory barrier when reading init_lock
The bug for this is marked confidential so I won't bother with the link.
The init_lock is used to guard the initialization of a class. It is
created by the thread that will do the initialization and cleared (so it
can be gc'd) once initialization is complete. Concurrent attempts to
initialize the class will block on the init_lock; after
initialization is complete that is reflected in the class's state
and no thread will attempt to use the init_lock. Because there is an
inherent race between updating the state (after which the init_lock is no
longer needed) and clearing the init_lock field, the clearing has to be
done under a strong memory barrier - which it is, e.g.:
  this_k->set_init_state(fully_initialized);
  this_k->fence_and_clear_init_lock();
where we have:
void InstanceKlass::fence_and_clear_init_lock() {
  // make sure previous stores are all done, notably the init_state.
  OrderAccess::storestore();
  java_lang_Class::set_init_lock(java_mirror(), NULL);
  assert(!is_not_initialized(), "class must be initialized now");
}
However we overlooked a potential problem with the code that reads the
init_lock and the initialization state. Such code typically has this form:
  oop init_lock = this_k->init_lock();
  ObjectLocker ol(init_lock, THREAD, init_lock != NULL);
  <read initialization state>
  if (initialized) return;
  else do_initialization();
If this is called when initialization is already complete then init_lock
will be NULL and so the ObjectLocker in fact is a noop, and we then read
the initialization state, see it was initialized and return as there is
nothing to do.
What this overlooked was that the compiler and/or hardware might reorder
the loads of the init_lock and the state field, such that we read a
state that indicates initialization is not complete, but we then read
the init_lock as NULL because the initialization has now in fact
completed. This can lead to different failure modes but most typically a
crash because we will call waitUninterruptibly() on the init_lock object
(which is in fact NULL). (Given the 'distance' between the two loads
this was surprising, but we were seeing such crashes on
relaxed-memory-ordering hardware.)
The fix is much simpler than this explanation :) - simply add a loadload()
barrier between the reading of init_lock and the reading of state. As
this can happen in a few places it is internalized into the init_lock()
accessor (just as the fence is internalized into
fence_and_clear_init_lock()).
webrev: http://cr.openjdk.java.net/~dholmes/8046210/webrev/
Dan Daugherty observed that as it is only the NULL value of
init_lock that can potentially lead to a problem, the barrier could be
conditional on that, i.e.:
  if (init_lock == NULL)
    OrderAccess::loadload();
I haven't done that because my suspicion is that the presence of the
loadload(), even in a conditional branch, will actually effect the same
barrier. For architectures where loadload() is simply a compiler barrier
(eg x86) I suspect the presence of the condition will prevent the
compiler from doing a reorder: if the branch to the loadload() is
taken, then any optimistic load of the initialization state would have
to be thrown away and a second load issued. Similar logic holds for the
hardware case - will the hardware issue reordered loads based on branch
prediction only to have to reissue the load if the prediction was wrong?
I suspect not, but it is speculation on my part. I'd be interested to
hear from any of our optimization experts on that count. It may also be
that (for x86 at least) the branch logic would be more expensive than
unconditionally doing the atomic store to a stack variable?
Thanks,
David