(XS) RFR: 8046210: Missing memory barrier when reading init_lock

Tue Sep 2 13:06:38 UTC 2014

 > webrev: http://cr.openjdk.java.net/~dholmes/8046210/webrev/

src/share/vm/oops/instanceKlass.cpp
     No comments.

Thumbs up!

Dan

On 9/1/14 2:14 PM, David Holmes wrote:
> It was very remiss of me not to mention that Bill Pittore did all the 
> investigation that resulted in discovering this missing barrier.
>
> David
>
> On 1/09/2014 2:48 PM, David Holmes wrote:
>> The bug for this is marked confidential so I won't bother with the link.
>>
>> The init_lock is used to guard the initialization of class. It is
>> created by the thread that will do the initialization and cleared (so it
>> can be gc'd) once initialization is complete. Concurrent attempts to
>> initialize the class will block on the init_lock, then after
>> initialization is complete that will be reflected in the class's state
>> and no thread will attempt to use the init_lock. Because there is an
>> inherent race with updating the state such that the init_lock is not
>> needed, and clearing the init_lock field, this has to be done under a
>> strong memory barrier - which it is eg:
>>
>>    this_k->set_init_state (fully_initialized);
>>    this_k->fence_and_clear_init_lock();
>>
>> where we have:
>>
>> void InstanceKlass::fence_and_clear_init_lock() {
>>     // make sure previous stores are all done, notably the init_state.
>>     OrderAccess::storestore();
>>     java_lang_Class::set_init_lock(java_mirror(), NULL);
>>     assert(!is_not_initialized(), "class must be initialized now");
>> }
>>
>> However we overlooked a potential problem with the code that reads the
>> init_lock and the initialization state. Such code typically has this 
>> form:
>>
>>   oop init_lock = this_k->init_lock();
>>   ObjectLocker ol(init_lock, THREAD, init_lock != NULL);
>>   <read initialization state>
>>   if (initialized) return;
>>   else do _initialization();
>>
>> If this is called when initialization is already complete then init_lock
>> will be NULL and so the ObjectLocker in fact is a noop, and we then read
>> the initialization state, see it was initialized and return as there is
>> nothing to do.
>>
>> What this overlooked was that the compiler and/or hardware might reorder
>> the loads of the init_lock and the state field, such that we read a
>> state that indicates initialization is not complete, but we then read
>> the init_lock as NULL because the initialization has now in fact
>> completed. This can lead to different failure modes but most typically a
>> crash because we will call waitUninterruptibly() on the init_lock object
>> (which is in fact NULL). (Given the 'distance' between the two loads
>> this was surprising, but we were seeing such crashes on
>> relaxed-memory-ordering hardware.)
>>
>> The fix is much simpler than this explanation :) simply add a loadload()
>> barrier between the reading of init_lock and the reading of state. As
>> this can happen in a few places it is internalized into the int_lock()
>> accessor (just as the fence is internalized into
>> fence_and_clear_init_lock().
>>
>> webrev: http://cr.openjdk.java.net/~dholmes/8046210/webrev/
>>
>> Dan Daugherty observed that as it is only the non-NULL value of
>> init_lock that can potentially lead to a problem, the barrier could be
>> conditional on that ie:
>>
>> if (init_lock == NULL)
>>    OrderAccess::loadload();
>>
>> I haven't done that because my suspicion is that the presence of the
>> loadload(), even in a conditional branch, will actually effect the same
>> barrier. For architectures where loadload() is simply a compiler-barrier
>> (eg x86) I suspect the presence of the condition will prevent the
>> compiler from doing a reorder as if the branch to the loadload() is
>> followed then any optimistic load of the initialization state would have
>> to be thrown away and a second load issued. Similar logic holds for the
>> hardware case - will the hardware issue reordered loads based on branch
>> prediction only to have to reissue the load if the prediction was wrong?
>> I suspect not, but it is speculation on my part. I'd be interested to
>> hear from any of our optimization experts on that count. It may also be
>> that (for x86 at least) the branch logic would be more expensive than
>> unconditionally doing the atomic store to a stack variable?
>>
>> Thanks,
>> David