[Fwd: Re: Request for Review (XS): 6704010: Internal Error(src/share/vm/interpreter/interpreterRuntime.cpp:1106)]

Thu Dec 2 03:29:52 PST 2010

Thank you for the review and for pointing out this special problem. I
must confess I haven't thought about it initially.

But I've checked it now and found that 'SignatureHandlerLibrary_lock'
is exclusively used in  'SignatureHandlerLibrary::add()', and I don't
think that during native error handling new signature handlers will be
created.

Nevertheless I've slightly adjusted the change as suggested by Coleen
to get the assertion out of the lock scope:

http://www.progdoc.de/webrev/6704010.v2/

Volker

On Thu, Dec 2, 2010 at 3:46 AM, David Holmes <David.Holmes at oracle.com> wrote:
> Moving the assertion into a locked region may potentially cause a deadlock
> if the assertion fails and we go to fatal error handling. We'd need to check
> that the lock in question is not held in a nested fashion with another lock
> that might be needed in the fatal error path.
>
> That said there are already potential deadlocks in fatal error handling with
> some debug/tracing options, so maybe that's a better trade-off than getting
> the false failure.
>
> Just a thought.
>
> David Holmes
>
> Coleen Phillimore said the following on 12/02/10 09:08:
>>
>> Volker,
>> This is great.  This was on my bug list.  It looks good to me and I'll be
>> happy to push it for you.
>> Thanks!
>> Coleen
>>
>> Vladimir Kozlov wrote:
>>>
>>> Sending to runtime alias since it is RT bug.
>>>
>>> -------- Original Message --------
>>> Subject: Re: Request for Review (XS): 6704010: Internal Error
>>>  (src/share/vm/interpreter/interpreterRuntime.cpp:1106)
>>> Date: Wed, 1 Dec 2010 19:44:40 +0100
>>> From: Volker Simonis <volker.simonis at gmail.com>
>>> To: hotspot compiler <hotspot-compiler-dev at openjdk.java.net>
>>>
>>> http://www.progdoc.de/webrev/6704010/
>>>
>>> This problem occurs very rarely and it is therefore hardly possible to
>>> reliably reproduce it. However after looking at it for a while I think
>>> I found the cause of the problem:
>>>
>>> The SignatureHandlerLibrary class uses two static GrowableArrays,
>>> '_handlers' and '_fingerprints', to store method fingerprints along
>>> with their corresponding signature handlers. But GrowableArrays are
>>> are NOT synchronized in any way if accessed from multiple threads. To
>>> avoid races it is therefore necessary to synchronize different threads
>>> which access the same GrowableArray. This is done for most of the code
>>> in 'SignatureHandlerLibrary::add()' which is protected by 'MutexLocker
>>> mu(SignatureHandlerLibrary_lock)'.
>>>
>>> However the assertion at the end of the method is outside the scope of
>>> this MutexLocker which can lead to an illegal view on the
>>> corresponding GrowableArray data structures. Here'S what may happen in
>>> detail:
>>>
>>>  -thread one enters the section protected by the MutexLocker and adds
>>> another fingerprint/handler to the growable arrays by calling
>>> 'GrowableArray::append()'. This can lead to an implicit call to
>>> 'GrowableArray::grow()' if the arrays size exhausted. In the 'grow()'
>>> method a new data array will be allocated, the contents of the old
>>> array will be copied into the new one and then (1) the storage
>>> occupied by the old array will be freed and (2) the address of the new
>>> storage will be assigned to the internal data pointer. Between (1) and
>>> (2) there's a small (like once every six month or so:) chance that the
>>> thread will be unscheduled and another thread is allocating the memory
>>> which has just been freed.
>>>
>>> - thread two is just in the assertion code and trying to query the
>>> GrowableArrays '_handlers' and '_fingerprints' by calling
>>> 'GrowableArray::find()'. If this happens right in the interval
>>> between the points (1) and (2) described above, the 'find()' method
>>> may return incorrect results which can spuriously trigger the
>>> assertion.
>>>
>>> Fixing this can be easily done by enclosing the assertion into a
>>> region which is protected by the same Mutex like the main body of
>>> 'SignatureHandlerLibrary::add()'.
>>>
>>> As I wrote, I couldn't reproduce this in the HostSpot, so no
>>> regression test. I was however able to reproduce the described
>>> scenario with a standalone GrowableArray class and a small test
>>> program consisting of one writer thread which appends a constant value
>>> to a GrowableArray and ten reader threads which are constantly trying
>>> to find that special value in the array on a dual core x86_64 Linux
>>> box.
>>>
>>> As a last side note, the problem could also be made even more
>>> unlikely, by changing the implementation of   'GrowableArray::grow()'
>>> from:
>>>
>>> template<class E> void GrowableArray<E>::grow(int j) {
>>>    int old_max = _max;
>>>    ...
>>>    if (on_C_heap() && _data != NULL) {
>>>      FreeHeap(_data);
>>>    }
>>>    _data = newData;
>>> }
>>>
>>> to:
>>>
>>> template<class E> void GrowableArray<E>::grow(int j) {
>>>    int old_max = _max;
>>>    ...
>>>    _old_data = _data;
>>>    _data = newData;
>>>    if (on_C_heap() && _old_data != NULL) {
>>>      FreeHeap(_old_data);
>>>    }
>>> }
>>>
>>> On the other hand that's still not 100% safe ('_data' is not volatile,
>>> other memory models, ..) and I think GrowableArray is definitely not
>>> intended for parallel access and therefore such a change may
>>> unnecessarily hide other problems where this precondition is violated.
>>> So maybe nulling out the '_data' array before freeing it may be a good
>>> idea in the debug build to catch such errors!
>>>
>>> Regards,
>>> Volker
>