Request for review: 6922246: Deadlock during intensive class loading in parallel

Wed Oct 20 14:12:11 PDT 2010

  Atomic::cmpxchg() guarantees a membar, even if the actual hw cas
instruction (assuming there is one: ppc used load-locked/store-conditional)
doesn't guarantee it.  So use Atomic::cmpxchg() with confidence. :)

Paul

On 10/20/10 4:43 PM, Tom Rodriguez wrote:
> On Oct 20, 2010, at 7:24 AM, Coleen Phillimore wrote:
>
>> Thank you for the discussion.   I am reimplementing this with a CAS as Tom suggested.  I guess it's not such a simple fix, anyway, see below:
>>
>> David Holmes wrote:
>>> Igor Veresov said the following on 10/20/10 18:31:
>>>> On 10/20/10 12:16 AM, David Holmes wrote:
>>>>> Igor Veresov said the following on 10/20/10 16:52:
>>>>>> There's no question a store-store barrier is needed after 291 for
>>>>>> non-TSO if we were to make it lock-free (with the current code monitor
>>>>>> lock should contain such barrier anyway).
>>>>> There's no requirement for a storestore barrier when acquiring a lock.
>>>>> You will happen to get that on sparc and x86 if the lock acquisition
>>>>> involves a CAS, but that is not true on other archs. In general terms it
>>>>> is considered safe to move stores into a critical section at which time
>>>>> it can be reordered with other stores inside the critical section.
>>>> Yes, that's what I meant. Basically a lock should guarantee that all stores before the lock precede stores in the critical section, which is essentially a storestore barrier.
>>> But my point is that locks in general don't guarantee this.
>>>
>> Does a CAS guarantee this?  If there isn't a guarantee, I will add an Atomic::storestore(), plus make _method_data volatile.
> It should.
>
>>>>> Right. If it were not safe/correct for more than one thread to create a
>>>>> MDO, and have all but one thrown away, then we'd have a different
>>>>> problem to solve. Which reminds we, if another thread does beat us to
>>>>> storing the MDO then we need to delete the one we created.
>>>> Yes, perhaps we can try to deallocate. Although I suppose the race should occur infrequently and GC will take care of the unused copy anyway.
>>> Ah! Is this a GC'able oop we are creating? I had assumed it was a C/C++ object.
>>>
>> Right now the extra MDO would get GC'ed because nothing references it and it's in the PermGen.  But we're moving everything out of the PermGen. When it's not allocated in the PermGen, this extra MDO will get deallocated when the memory region associated with it's classloader is deallocated.  I think we should revisit this thing though at that point because it would be leaked with the bootclassloader.
> Freeing won't be easy in the new world since we're using pointer bumping.  We might actually need to use a lock for this so that we don't allocate wasted space, though maybe it's rare enough that we could ignore it.
>
> tom
>
>> Thanks,
>> Coleen
>>> David
>>>
>>>> igor
>>>>
>>>>> Cheers,
>>>>> David
>>>>>
>>>>>> igor
>>>>>>
>>>>>> On 10/19/10 3:59 PM, David Holmes wrote:
>>>>>>> Hi Coleen,
>>>>>>>
>>>>>>> coleen phillimore said the following on 10/20/10 08:11:
>>>>>>>> Summary: allocate methodData before taking out the MethodData_lock
>>>>>>>> open webrev at http://cr.openjdk.java.net/~coleenp/6922246/
>>>>>>>> bug link at http://bugs.sun.com/view_bug.do?bug_id=6922246
>>>>>>>>
>>>>>>>> Thanks to Ramki for the diagnosis and suggestion for the fix.
>>>>>>> This is a variation on the unsafe double-checked-locking idiom.
>>>>>>>
>>>>>>> 286 if (method->method_data() == NULL) {
>>>>>>> 291 methodDataOop method_data = oopFactory::new_methodData(method,
>>>>>>> CHECK);
>>>>>>> 292 MutexLocker ml(MethodData_lock, THREAD);
>>>>>>> 295 if (method->method_data() == NULL) {
>>>>>>> 296 method->set_method_data(method_data);
>>>>>>>
>>>>>>> In general terms any stores associated with the method_data construction
>>>>>>> are not guaranteed to be visible by the time set_method_data is called.
>>>>>>> A second thread could then see a non-null method->method_data() but
>>>>>>> access uninitialized method_data values.
>>>>>>>
>>>>>>> A general solution would need memory barriers before 286 and after 291
>>>>>>> to ensure correctness on all platforms. A simpler, though perhaps
>>>>>>> heavier solution would be to use a MutexUnlocker around the allocation:
>>>>>>>
>>>>>>> MutexLocker ml(MethodData_lock, THREAD);
>>>>>>> if (method->method_data() == NULL) {
>>>>>>> methodDataOop method_data = NULL;
>>>>>>> { // need new block for scope
>>>>>>> MutexUnLocker ml(MethodData_lock, THREAD);
>>>>>>> method_data = oopFactory::new_methodData(method, CHECK);
>>>>>>> }
>>>>>>> // someone might have beaten us to it
>>>>>>> if (method->method_data() == NULL) {
>>>>>>> method->set_method_data(method_data);
>>>>>>> }
>>>>>>> }
>>>>>>>
>>>>>>> A lock-less solution as Tom proposed would also require suitable
>>>>>>> memory-barriers.
>>>>>>>
>>>>>>> David