(XS) RFR: 8046210: Missing memory barrier when reading init_lock

Mon Sep 1 11:17:22 UTC 2014

Hi Martin,

On 1/09/2014 8:03 PM, Doerr, Martin wrote:
> Hi David,
>
> thanks for posting this bug.
>
> PPC64 processors indeed do such kind of load reordering across predicted branches.
>
> load x, compare and branch depending on x, is sufficient to order this load with respect to succeeding stores,
> but succeeding loads may get executed speculatively and reorder with the load x.
>
> The cheapest way to order the load of x with respect to succeeding loads on PPC64 is:
> load x, compare and trap or branch depending on x, then issue an isync instruction which prevents speculative loads.
> This is what we implemented in OrderAccess::load_acquire (orderAccess_linux_ppc.inline.hpp).

Yes we know this pattern too. In this case putting a load_acquire at the 
point of the actual load is a bit more awkward, hence the separation. Do 
you have concerns about issuing a separate loadload() barrier here?

Thanks,
David

> Best regards,
> Martin
>
>
> -----Original Message-----
> From: hotspot-runtime-dev [mailto:hotspot-runtime-dev-bounces at openjdk.java.net] On Behalf Of David Holmes
> Sent: Montag, 1. September 2014 06:49
> To: hotspot-runtime-dev at openjdk.java.net
> Subject: (XS) RFR: 8046210: Missing memory barrier when reading init_lock
>
> The bug for this is marked confidential so I won't bother with the link.
>
> The init_lock is used to guard the initialization of class. It is
> created by the thread that will do the initialization and cleared (so it
> can be gc'd) once initialization is complete. Concurrent attempts to
> initialize the class will block on the init_lock, then after
> initialization is complete that will be reflected in the class's state
> and no thread will attempt to use the init_lock. Because there is an
> inherent race with updating the state such that the init_lock is not
> needed, and clearing the init_lock field, this has to be done under a
> strong memory barrier - which it is eg:
>
>     this_k->set_init_state (fully_initialized);
>     this_k->fence_and_clear_init_lock();	
>
> where we have:
>
> void InstanceKlass::fence_and_clear_init_lock() {
>      // make sure previous stores are all done, notably the init_state.
>      OrderAccess::storestore();
>      java_lang_Class::set_init_lock(java_mirror(), NULL);
>      assert(!is_not_initialized(), "class must be initialized now");
> }
>
> However we overlooked a potential problem with the code that reads the
> init_lock and the initialization state. Such code typically has this form:
>
>    oop init_lock = this_k->init_lock();
>    ObjectLocker ol(init_lock, THREAD, init_lock != NULL);
>    <read initialization state>
>    if (initialized) return;
>    else do _initialization();
>
> If this is called when initialization is already complete then init_lock
> will be NULL and so the ObjectLocker in fact is a noop, and we then read
> the initialization state, see it was initialized and return as there is
> nothing to do.
>
> What this overlooked was that the compiler and/or hardware might reorder
> the loads of the init_lock and the state field, such that we read a
> state that indicates initialization is not complete, but we then read
> the init_lock as NULL because the initialization has now in fact
> completed. This can lead to different failure modes but most typically a
> crash because we will call waitUninterruptibly() on the init_lock object
> (which is in fact NULL). (Given the 'distance' between the two loads
> this was surprising, but we were seeing such crashes on
> relaxed-memory-ordering hardware.)
>
> The fix is much simpler than this explanation :) simply add a loadload()
> barrier between the reading of init_lock and the reading of state. As
> this can happen in a few places it is internalized into the int_lock()
> accessor (just as the fence is internalized into
> fence_and_clear_init_lock().
>
> webrev: http://cr.openjdk.java.net/~dholmes/8046210/webrev/
>
> Dan Daugherty observed that as it is only the non-NULL value of
> init_lock that can potentially lead to a problem, the barrier could be
> conditional on that ie:
>
> if (init_lock == NULL)
>     OrderAccess::loadload();
>
> I haven't done that because my suspicion is that the presence of the
> loadload(), even in a conditional branch, will actually effect the same
> barrier. For architectures where loadload() is simply a compiler-barrier
> (eg x86) I suspect the presence of the condition will prevent the
> compiler from doing a reorder as if the branch to the loadload() is
> followed then any optimistic load of the initialization state would have
> to be thrown away and a second load issued. Similar logic holds for the
> hardware case - will the hardware issue reordered loads based on branch
> prediction only to have to reissue the load if the prediction was wrong?
> I suspect not, but it is speculation on my part. I'd be interested to
> hear from any of our optimization experts on that count. It may also be
> that (for x86 at least) the branch logic would be more expensive than
> unconditionally doing the atomic store to a stack variable?
>
> Thanks,
> David
>