RFR: 8032901: WaitForMultipleObjects() return value not handled appropriately
David Holmes
david.holmes at oracle.com
Thu May 15 01:48:25 UTC 2014
On 14/05/2014 11:18 PM, Aleksej Efimov wrote:
> David, Vitaly,
>
> I totally agree with Vitaly's explanation (Vitaly - thank you for that)
> and code in shmemBase.c (the usage of enterMutex() function for
> sending/receiving bytes through shared memory connection) illustrates on
> how the connection shutdown event is used as a "cancellation token".
Thanks for clarifying that.
So if we were to encounter an abandoned mutex the code would presently
have acquired the mutex but return an error, thus preventing a
subsequent release, and preventing other threads from acquiring (but
allowing current thread to recurisevely acquire. So this could both hang
and cause data corruption.
The new code will still return an error but release the mutex. So no
more hangs (other than by conditions caused by data corruption) but more
opportunity for data corruption.
Obviously it depends on exactly what happens in the critical sections
guarded by this mutex, but in general I don't agree with this rationale
for making the change:
204 /* If the mutex is abandoned we want to return an error
205 * and also release the mutex so that we don't cause
206 * other threads to be blocked. If a mutex was abandoned
207 * we are in scary state. Data may be corrupted or inconsistent
208 * but it is still better to let other threads run (and possibly
209 * crash) than having them blocked (on the premise that a crash
210 * is always easier to debug than a hang).
Considering something has to have gone drastically wrong for the mutex
to become abandoned, I'm more inclined to consider this a fatal error
and abort.
But I'll let the serviceability folk chime in here.
Thanks,
David
> Thank you,
> -Aleksej
>
>
> On 05/14/2014 01:05 PM, David Holmes wrote:
>> On 14/05/2014 11:06 AM, Vitaly Davidovich wrote:
>>> In windows, you acquire a mutex by waiting on it using one of the wait
>>> functions, one of them employed in the code in question. If
>>> WaitForMultipleObjects succeeds and returns the index of the mutex,
>>> current thread has ownership now.
>>
>> Yes I understand the basic mechanics :)
>>
>>> It's also common to use multi wait functions where the event is a
>>> "cancelation token", e.g. manual reset event; this allows someone to
>>> cancel waiting on mutex acquisition and return from the wait function.
>>> Presumably that's the case here, but I'll let Aleksej confirm; just
>>> wanted to throw this out there in the meantime :).
>>
>> Ah I see - yes cancellable lock acquisition would make sense.
>>
>> Thanks,
>> David
>>
>>> Sent from my phone
>>>
>>> On May 13, 2014 6:46 PM, "David Holmes" <david.holmes at oracle.com
>>> <mailto:david.holmes at oracle.com>> wrote:
>>>
>>> Hi Aleksej,
>>>
>>> Thanks for the doc references regarding abandonment.
>>>
>>> Let me rephrase my question. What is this logic trying to achieve by
>>> waiting on both a mutex and an event? Do we already own the mutex
>>> when this function is called?
>>>
>>> David
>>>
>>> On 13/05/2014 11:19 PM, Aleksej Efimov wrote:
>>>
>>> David,
>>>
>>> The Windows has a different terminology for mutex objects (much
>>> differs
>>> from the POSIX one). This one link gave me some understanding of
>>> it [1].
>>>
>>> Here is the MSDN [1] description of what "abandoned mutex" is:
>>> " If a thread terminates without releasing its ownership of a
>>> mutex
>>> object, the mutex object is considered to be abandoned. A
>>> waiting thread
>>> can acquire ownership of an abandoned mutex object, but the wait
>>> function will return*WAIT_ABANDONED*to indicate that the mutex
>>> object is
>>> abandoned. An abandoned mutex object indicates that an error has
>>> occurred and that any shared resource being protected by the
>>> mutex
>>> object is in an undefined state. If the thread proceeds as
>>> though the
>>> mutex object had not been abandoned, it is no longer considered
>>> abandoned after the thread releases its ownership. This restores
>>> normal
>>> behavior if a handle to the mutex object is subsequently
>>> specified in a
>>> wait function."
>>>
>>>
>>> What does it mean to wait on mutex and ownership of the mutex
>>> object:
>>> "Any thread with a handle to a mutex object can use one of
>>> thewait
>>> functions
>>> <http://msdn.microsoft.com/en-__gb/library/windows/desktop/__ms687069%28v=vs.85%29.aspx
>>> <http://msdn.microsoft.com/en-gb/library/windows/desktop/ms687069%28v=vs.85%29.aspx>>to
>>> request ownership of the mutex object. If the mutex object is
>>> owned by
>>> another thread, the wait function blocks the requesting thread
>>> until the
>>> owning thread releases the mutex object using the*ReleaseMutex*
>>> <http://msdn.microsoft.com/en-__gb/library/windows/desktop/__ms685066%28v=vs.85%29.aspx
>>> <http://msdn.microsoft.com/en-gb/library/windows/desktop/ms685066%28v=vs.85%29.aspx>>__function."
>>>
>>> How we can release mutex and wait on already owned mutex:
>>> " After a thread obtains ownership of a mutex, it can specify
>>> the same
>>> mutex in repeated calls to the wait-functions
>>> <http://msdn.microsoft.com/en-__gb/library/windows/desktop/__ms687069%28v=vs.85%29.aspx
>>> <http://msdn.microsoft.com/en-gb/library/windows/desktop/ms687069%28v=vs.85%29.aspx>>__without
>>> blocking its execution. This prevents a thread from deadlocking
>>> itself
>>> while waiting for a mutex that it already owns. To release its
>>> ownership
>>> under such circumstances, the thread must call*ReleaseMutex*
>>> <http://msdn.microsoft.com/en-__gb/library/windows/desktop/__ms685066%28v=vs.85%29.aspx
>>> <http://msdn.microsoft.com/en-gb/library/windows/desktop/ms685066%28v=vs.85%29.aspx>>__once
>>> for each time that the mutex satisfied the conditions of a wait
>>> function."
>>>
>>> [1]
>>> http://msdn.microsoft.com/en-__gb/library/windows/desktop/__ms684266(v=vs.85).aspx
>>> <http://msdn.microsoft.com/en-gb/library/windows/desktop/ms684266(v=vs.85).aspx>
>>>
>>> -Aleksej
>>>
>>> On 05/13/2014 04:00 PM, David Holmes wrote:
>>>
>>> I don't understand this one at all. What is an "abandoned
>>> mutex"? For
>>> that matter why does the code wait on a mutex and an event?
>>> Do we
>>> already own the mutex? If so what does it mean to wait on
>>> it? If not
>>> then how can we release it?
>>>
>>> ???
>>>
>>> Thanks,
>>> David
>>>
>>>
>>> On 13/05/2014 8:57 PM, Alan Bateman wrote:
>>>
>>>
>>> This is debugger's shared memory transport so cc'ing
>>> serviceability-dev
>>> as that is there this code is maintained.
>>>
>>> Is there a test case or any outline of the conditions
>>> that cause this? I
>>> think that would be useful to understand the issue
>>> further.
>>>
>>> -Alan
>>>
>>> On 13/05/2014 11:46, Aleksej Efimov wrote:
>>>
>>> Hi,
>>>
>>> Can I have a review for 8032901 bug [1] fix [2].
>>> There is a possible
>>> case when 'WaitForMultipleObjects' function can
>>> return the
>>> WAIT_ABANDONED_0 [3] error value.
>>> In such case it's better to release the mutex and
>>> return error value.
>>> This will prevent other threads to be blocked on
>>> abandoned mutex.
>>>
>>> Thank you,
>>> Aleksej
>>>
>>> [1]
>>> https://bugs.openjdk.java.net/__browse/JDK-8032901
>>> <https://bugs.openjdk.java.net/browse/JDK-8032901>
>>> [2]
>>> http://cr.openjdk.java.net/~__aefimov/8032901/9/webrev.00/
>>> <http://cr.openjdk.java.net/~aefimov/8032901/9/webrev.00/>
>>> [3]
>>> http://msdn.microsoft.com/en-__gb/library/windows/desktop/__ms687025(v=vs.85).aspx
>>> <http://msdn.microsoft.com/en-gb/library/windows/desktop/ms687025(v=vs.85).aspx>
>>>
>>>
>>>
>>>
>
More information about the core-libs-dev
mailing list