(XS) RFR: 8046210: Missing memory barrier when reading init_lock

David Holmes david.holmes at oracle.com
Wed Sep 3 01:30:17 UTC 2014


Thanks Dan!

Pushing to hs-rt.

David

On 2/09/2014 11:06 PM, Daniel D. Daugherty wrote:
>  > webrev: http://cr.openjdk.java.net/~dholmes/8046210/webrev/
>
> src/share/vm/oops/instanceKlass.cpp
>      No comments.
>
> Thumbs up!
>
> Dan
>
>
> On 9/1/14 2:14 PM, David Holmes wrote:
>> It was very remiss of me not to mention that Bill Pittore did all the
>> investigation that resulted in discovering this missing barrier.
>>
>> David
>>
>> On 1/09/2014 2:48 PM, David Holmes wrote:
>>> The bug for this is marked confidential so I won't bother with the link.
>>>
>>> The init_lock is used to guard the initialization of a class. It is
>>> created by the thread that will do the initialization and cleared (so it
>>> can be gc'd) once initialization is complete. Concurrent attempts to
>>> initialize the class will block on the init_lock; after
>>> initialization is complete that is reflected in the class's state
>>> and no thread will attempt to use the init_lock. Because there is an
>>> inherent race between updating the state (so that the init_lock is no
>>> longer needed) and clearing the init_lock field, the clearing has to be
>>> done under a strong memory barrier - which it is, e.g.:
>>>
>>>    this_k->set_init_state (fully_initialized);
>>>    this_k->fence_and_clear_init_lock();
>>>
>>> where we have:
>>>
>>> void InstanceKlass::fence_and_clear_init_lock() {
>>>     // make sure previous stores are all done, notably the init_state.
>>>     OrderAccess::storestore();
>>>     java_lang_Class::set_init_lock(java_mirror(), NULL);
>>>     assert(!is_not_initialized(), "class must be initialized now");
>>> }
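[Editor's note: the publication idiom above can be modelled in standalone C++11 with std::atomic. The names below are illustrative, not the HotSpot sources, and a release fence is a conservative stand-in for OrderAccess::storestore():]

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Illustrative model of the writer side: publish init_state, then
// clear the lock slot, with an ordering fence between the two stores.
static int g_lock_obj;                                // stand-in for the lock oop
static std::atomic<int*> g_init_lock{&g_lock_obj};
static std::atomic<int>  g_init_state{0};             // 0 = being_initialized

void fence_and_clear_init_lock_model() {
    g_init_state.store(2, std::memory_order_relaxed); // 2 = fully_initialized
    // Make sure previous stores are all done, notably the init_state
    // (storestore in HotSpot; a release fence is a safe approximation).
    std::atomic_thread_fence(std::memory_order_release);
    g_init_lock.store(nullptr, std::memory_order_relaxed);
}
```

The fence guarantees that any thread which observes the NULL lock slot will also observe the fully_initialized state, provided the reader orders its loads correspondingly.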
>>>
>>> However we overlooked a potential problem with the code that reads the
>>> init_lock and the initialization state. Such code typically has this
>>> form:
>>>
>>>   oop init_lock = this_k->init_lock();
>>>   ObjectLocker ol(init_lock, THREAD, init_lock != NULL);
>>>   <read initialization state>
>>>   if (initialized) return;
>>>   else do_initialization();
>>>
>>> If this is called when initialization is already complete then init_lock
>>> will be NULL and so the ObjectLocker in fact is a noop, and we then read
>>> the initialization state, see it was initialized and return as there is
>>> nothing to do.
>>>
>>> What this overlooked was that the compiler and/or hardware might reorder
>>> the loads of the init_lock and the state field, such that we read a
>>> state that indicates initialization is not complete, but we then read
>>> the init_lock as NULL because the initialization has now in fact
>>> completed. This can lead to different failure modes but most typically a
>>> crash because we will call waitUninterruptibly() on the init_lock object
>>> (which is in fact NULL). (Given the 'distance' between the two loads
>>> this was surprising, but we were seeing such crashes on
>>> relaxed-memory-ordering hardware.)
>>>
>>> The fix is much simpler than this explanation :) simply add a loadload()
>>> barrier between the read of the init_lock and the read of the state. As
>>> this can happen in a few places it is internalized into the init_lock()
>>> accessor (just as the fence is internalized into
>>> fence_and_clear_init_lock()).
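[Editor's note: in the same standalone-C++ modelling style (illustrative names, not the actual HotSpot code; an acquire fence stands in for OrderAccess::loadload()), the fixed reader side looks roughly like:]

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Illustrative model of the fixed reader side: load the lock slot,
// then fence before any subsequent load of init_state, so a NULL
// lock can never be paired with a stale "not initialized" state.
static int g_lock_obj;                                // stand-in for the lock oop
static std::atomic<int*> g_init_lock{&g_lock_obj};
static std::atomic<int>  g_init_state{0};             // 0 = being_initialized

int* read_init_lock_model() {
    int* lock = g_init_lock.load(std::memory_order_relaxed);
    // Prevent the later load of init_state from being reordered before
    // this load of init_lock (loadload in HotSpot; acquire fence here).
    std::atomic_thread_fence(std::memory_order_acquire);
    return lock;
}
```

With the fence inside the accessor, every caller that reads the lock and then the state gets the required ordering for free.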
>>>
>>> webrev: http://cr.openjdk.java.net/~dholmes/8046210/webrev/
>>>
>>> Dan Daugherty observed that, as it is only a NULL value of
>>> init_lock that can potentially lead to a problem, the barrier could be
>>> made conditional on that, i.e.:
>>>
>>> if (init_lock == NULL)
>>>    OrderAccess::loadload();
>>>
>>> I haven't done that because my suspicion is that the presence of the
>>> loadload(), even in a conditional branch, will actually effect the same
>>> barrier. For architectures where loadload() is simply a compiler-barrier
>>> (eg x86) I suspect the presence of the condition will prevent the
>>> compiler from doing a reorder as if the branch to the loadload() is
>>> followed then any optimistic load of the initialization state would have
>>> to be thrown away and a second load issued. Similar logic holds for the
>>> hardware case - will the hardware issue reordered loads based on branch
>>> prediction only to have to reissue the load if the prediction was wrong?
>>> I suspect not, but it is speculation on my part. I'd be interested to
>>> hear from any of our optimization experts on that count. It may also be
>>> that (for x86 at least) the branch logic would be more expensive than
>>> unconditionally doing the atomic store to a stack variable?
>>>
>>> Thanks,
>>> David
>


More information about the hotspot-runtime-dev mailing list