Request for review: 6922246: Deadlock during intensive class loading in parallel

Wed Oct 20 14:36:42 PDT 2010


Paul Hohensee wrote:
>  Atomic::cmpxchg() guarantees a membar, even if the actual hw cas
> instruction (assuming there is one: ppc used 
> load-locked/store-conditional)
> doesn't guarantee it.  So use Atomic::cmpxchg() with confidence. :)
>
Great! thanks.
Coleen
> Paul
>
> On 10/20/10 4:43 PM, Tom Rodriguez wrote:
>> On Oct 20, 2010, at 7:24 AM, Coleen Phillimore wrote:
>>
>>> Thank you for the discussion.   I am reimplementing this with a CAS 
>>> as Tom suggested.  I guess it's not such a simple fix, anyway, see 
>>> below:
>>>
>>> David Holmes wrote:
>>>> Igor Veresov said the following on 10/20/10 18:31:
>>>>> On 10/20/10 12:16 AM, David Holmes wrote:
>>>>>> Igor Veresov said the following on 10/20/10 16:52:
>>>>>>> There's no question a store-store barrier is needed after 291 for
>>>>>>> non-TSO if we were to make it lock-free (with the current code 
>>>>>>> monitor
>>>>>>> lock should contain such barrier anyway).
>>>>>> There's no requirement for a storestore barrier when acquiring a 
>>>>>> lock.
>>>>>> You will happen to get that on sparc and x86 if the lock acquisition
>>>>>> involves a CAS, but that is not true on other archs. In general 
>>>>>> terms it
>>>>>> is considered safe to move stores into a critical section at 
>>>>>> which time
>>>>>> it can be reordered with other stores inside the critical section.
>>>>> Yes, that's what I meant. Basically a lock should guarantee that 
>>>>> all stores before the lock precede stores in the critical section, 
>>>>> which is essentially a storestore barrier.
>>>> But my point is that locks in general don't guarantee this.
>>>>
>>> Does a CAS guarantee this?  If there isn't a guarantee, I will add 
>>> an Atomic::storestore(), plus make _method_data volatile.
>> It should.
>>
>>>>>> Right. If it were not safe/correct for more than one thread to 
>>>>>> create a
>>>>>> MDO, and have all but one thrown away, then we'd have a different
>>>>>> problem to solve. Which reminds we, if another thread does beat 
>>>>>> us to
>>>>>> storing the MDO then we need to delete the one we created.
>>>>> Yes, perhaps we can try to deallocate. Although I suppose the race 
>>>>> should occur infrequently and GC will take care of the unused copy 
>>>>> anyway.
>>>> Ah! Is this a GC'able oop we are creating? I had assumed it was a 
>>>> C/C++ object.
>>>>
>>> Right now the extra MDO would get GC'ed because nothing references 
>>> it and it's in the PermGen.  But we're moving everything out of the 
>>> PermGen. When it's not allocated in the PermGen, this extra MDO will 
>>> get deallocated when the memory region associated with it's 
>>> classloader is deallocated.  I think we should revisit this thing 
>>> though at that point because it would be leaked with the 
>>> bootclassloader.
>> Freeing won't be easy in the new world since we're using pointer 
>> bumping.  We might actually need to use a lock for this so that we 
>> don't allocate wasted space, though maybe it's rare enough that we 
>> could ignore it.
>>
>> tom
>>
>>> Thanks,
>>> Coleen
>>>> David
>>>>
>>>>> igor
>>>>>
>>>>>> Cheers,
>>>>>> David
>>>>>>
>>>>>>> igor
>>>>>>>
>>>>>>> On 10/19/10 3:59 PM, David Holmes wrote:
>>>>>>>> Hi Coleen,
>>>>>>>>
>>>>>>>> coleen phillimore said the following on 10/20/10 08:11:
>>>>>>>>> Summary: allocate methodData before taking out the 
>>>>>>>>> MethodData_lock
>>>>>>>>> open webrev at http://cr.openjdk.java.net/~coleenp/6922246/
>>>>>>>>> bug link at http://bugs.sun.com/view_bug.do?bug_id=6922246
>>>>>>>>>
>>>>>>>>> Thanks to Ramki for the diagnosis and suggestion for the fix.
>>>>>>>> This is a variation on the unsafe double-checked-locking idiom.
>>>>>>>>
>>>>>>>> 286 if (method->method_data() == NULL) {
>>>>>>>> 291 methodDataOop method_data = oopFactory::new_methodData(method,
>>>>>>>> CHECK);
>>>>>>>> 292 MutexLocker ml(MethodData_lock, THREAD);
>>>>>>>> 295 if (method->method_data() == NULL) {
>>>>>>>> 296 method->set_method_data(method_data);
>>>>>>>>
>>>>>>>> In general terms any stores associated with the method_data 
>>>>>>>> construction
>>>>>>>> are not guaranteed to be visible by the time set_method_data is 
>>>>>>>> called.
>>>>>>>> A second thread could then see a non-null method->method_data() 
>>>>>>>> but
>>>>>>>> access uninitialized method_data values.
>>>>>>>>
>>>>>>>> A general solution would need memory barriers before 286 and 
>>>>>>>> after 291
>>>>>>>> to ensure correctness on all platforms. A simpler, though perhaps
>>>>>>>> heavier solution would be to use a MutexUnlocker around the 
>>>>>>>> allocation:
>>>>>>>>
>>>>>>>> MutexLocker ml(MethodData_lock, THREAD);
>>>>>>>> if (method->method_data() == NULL) {
>>>>>>>> methodDataOop method_data = NULL;
>>>>>>>> { // need new block for scope
>>>>>>>> MutexUnLocker ml(MethodData_lock, THREAD);
>>>>>>>> method_data = oopFactory::new_methodData(method, CHECK);
>>>>>>>> }
>>>>>>>> // someone might have beaten us to it
>>>>>>>> if (method->method_data() == NULL) {
>>>>>>>> method->set_method_data(method_data);
>>>>>>>> }
>>>>>>>> }
>>>>>>>>
>>>>>>>> A lock-less solution as Tom proposed would also require suitable
>>>>>>>> memory-barriers.
>>>>>>>>
>>>>>>>> David