RFR: 6498581: ThreadInterruptTest3 produces wrong output on Windows

Yumin Qi yumin.qi at oracle.com
Wed Feb 19 21:24:49 UTC 2014


Karen,

   Thanks. I will work with compiler folks on the intrinsic modification.

Yumin



On 2/19/2014 1:15 PM, Karen Kinnear wrote:
> Yumin,
>
> The code fix looks good.
>
> The question is whether to leave in the intrinsic or not since it could cause inconsistent data
> to be returned.
>
> One way to deal with this would be to modify the intrinsic to only optimize the case in
> which the thread is not interrupted, i.e.
> t = Thread.current() && (!TLS._osthread._interrupted)
>
> I think that would remove the inconsistency. Yes, it would only optimize a subset, but the
> not interrupted subset is probably the one that makes the difference.
>
> Can you work with the compiler folks to make that modification if it makes sense to you?
>
> thanks,
> Karen
>
> On Jan 17, 2014, at 2:01 AM, David Holmes wrote:
>
>> Hi Yumin,
>>
>> No need to bother the compiler folks with this (still cc'd but can be dropped on any reply). We just need to decide if we can tolerate this inconsistency between the intrinsic and non-intrinsic forms. Otherwise we may choose to disable the intrinsic on Windows - I really don't see that it can be adding much value as Thread.currentThread().isInterrupted(false) is hardly a performance critical action. But we would need benchmarking to confirm that.
>>
>> David
>>
>> On 17/01/2014 4:10 PM, Yumin Qi wrote:
>>> Now I got it:
>>>
>>> //------------------------inline_native_isInterrupted------------------
>>> // private native boolean java.lang.Thread.isInterrupted(boolean
>>> ClearInterrupted);
>>> bool LibraryCallKit::inline_native_isInterrupted() {
>>>    // Add a fast path to t.isInterrupted(clear_int):
>>>    //   (t == Thread.current() && (!TLS._osthread._interrupted ||
>>> !clear_int))
>>>    //   ? TLS._osthread._interrupted : /*slow path:*/
>>> t.isInterrupted(clear_int)
>>>    // So, in the common case that the interrupt bit is false,
>>>    // we avoid making a call into the VM.  Even if the interrupt bit
>>>    // is true, if the clear_int argument is false, we avoid the VM call.
>>>    // However, if the receiver is not currentThread, we must call the VM,
>>>    // because there must be some locking done around the operation.
>>>
>>>    // We only go to the fast case code if we pass two guards.
>>>    // Paths which do not pass are accumulated in the slow_region.
>>>
>>> this code does not work both before and after the fix then.
>>> Maybe we should not go with fast path then. CC to compiler team.
>>>
>>> Thanks
>>> Yumin
>>>
>>>
>>> On 1/16/2014 3:11 PM, David Holmes wrote:
>>>> On 17/01/2014 7:24 AM, Yumin Qi wrote:
>>>>> Dan,
>>>>>
>>>>>
>>>>> On 1/16/2014 12:48 PM, Daniel D. Daugherty wrote:
>>>>>> On 1/16/14 1:42 PM, Yumin Qi wrote:
>>>>>>> David,
>>>>>>>
>>>>>>>   Thanks for the detail analysis so far and supply suggested fix.
>>>>>>>
>>>>>>> On 1/15/2014 10:21 PM, David Holmes wrote:
>>>>>>>> Hi Yumin,
>>>>>>>>
>>>>>>>> Unfortunately the complexities of this continue to expose
>>>>>>>> themselves :(
>>>>>>>>
>>>>>>>> On 15/01/2014 11:00 AM, Yumin Qi wrote:
>>>>>>>>> Can I have your codereview of the change
>>>>>>>>>
>>>>>>>>> http://cr.openjdk.java.net/~minqi/6498581/webrev00
>>>>>>>>> <http://cr.openjdk.java.net/%7Eminqi/6498581/webrev00/>
>>>>>>>>>
>>>>>>>>> Summary: There is race condition between os::interrupt and
>>>>>>>>> os::is_interrupted on Windows.  See bug comments in detail. When
>>>>>>>>> thread
>>>>>>>>> sleep check if it gets interrupted, it may see interrupted but not
>>>>>>>>> really interrupted so cause spurious waking up (early return from
>>>>>>>>> sleep). Fix by checking if a real interrupt sent to thread interrupt
>>>>>>>>> event can prevent such false return. On windows we can get away the
>>>>>>>>> interrupted field but to keep consistent with other platforms, I
>>>>>>>>> choose
>>>>>>>>> to leave it as it is.
>>>>>>>>>
>>>>>>>>> Contributed-By: David Holmes
>>>>>>>> Not really. :) I suggested using WaitForSingleObject on the
>>>>>>>> interrupt event to detect if the event was signalled or not, and
>>>>>>>> hence whether the thread was interrupted or not. I had envisaged it
>>>>>>>> replacing use of the interrupted field altogether on windows - but I
>>>>>>>> had overlooked the fact that the _interrupted field of osThread is
>>>>>>>> maintained in shared code, and further that there is an intrinsic
>>>>>>>> for Thread.isInterrupted that uses that field!
>>>>>>>>
>>>>>>> I can add myself after you, the main contributor for this fix.
>>>>>>>> The fix you have provided in os::is_interrupted deals nicely with
>>>>>>>> the race between setting/clearing the field and setting/clearing the
>>>>>>>> event. Because of the strong ordering in os::interrupt:
>>>>>>>>
>>>>>>>> 1. osthread->set_interrupted(true);
>>>>>>>> 2. OrderAccess::release();
>>>>>>>> 3. SetEvent(osthread->interrupt_event());
>>>>>>>>
>>>>>>>> we can require both conditions to be met to define that the thread
>>>>>>>> is indeed interrupted. If we have set the field but not the event
>>>>>>>> then is_interrupted will return false without touching the field or
>>>>>>>> the event. If the thread then blocks it will either unblock
>>>>>>>> immediately if setEvent has now been called, or will unblock when
>>>>>>>> setEvent is eventually called. That all seems to work fine.
>>>>>>>>
>>>>>>>> The intrinsic is a fast-path that inlines the access to the
>>>>>>>> _interrupted field, but only for the current thread and only when
>>>>>>>> not trying to also clear the interrupted state (ie only for
>>>>>>>> Thread.currentThread().isInterrupted()**). That seems mostly
>>>>>>>> harmless even if the thread is concurrently halfway through being
>>>>>>>> interrupted (ie the field is set but not the event) as any interrupt
>>>>>>>> responsive operation will at some point call the actual
>>>>>>>> os::is_interrupted() runtime code.
>>>>>>>>
>>>>>>>> The only glitch I see is that it would now be possible for the
>>>>>>>> following code to throw the exception:
>>>>>>>>
>>>>>>>> boolean a = Thread.currentThread().isInterrupted();
>>>>>>>> boolean b = Thread.interrupted();
>>>>>>>> if (a && !b)
>>>>>>>>   throw new Error("inconsistent Thread interrupt state observed");
>>>>>>>>
>>>>>>> Hope no one will write code like this --- but there is other
>>>>>>> possibility that multiple interrupting actions sent to the thread so
>>>>>>> such check can not guarantee it returns consistent result. I see the
>>>>>>> problem here but think such coding is very rare.  With we return real
>>>>>>> status of Event, 'a' get false means the Event is not set yet ----
>>>>>>> and os::is_interrupted returns false is right.
>>>>>> I think you may have missed David's point here. The earlier check:
>>>>>>
>>>>>>     boolean a = Thread.currentThread().isInterrupted();
>>>>>>
>>>>>> is returning 'true' and a later check:
>>>>>>
>>>>>>     boolean b = Thread.interrupted();
>>>>>>
>>>>>> is returning 'false'. If variable 'a' is 'true', then 'b' must
>>>>>> also be 'true' to be consistent.
>>>>>>
>>>>> 1) a = true, b = true
>>>>> a: it will not clear interrupted field. So if we get a 'true', means
>>>>>      interrupted field is true and Event signaled. But from code:
>>>>>
>>>>>    if (interrupted && clear_interrupted) {
>>>>>      osthread->set_interrupted(false);
>>>>>      ResetEvent(osthread->interrupt_event());
>>>>>    } // Otherwise leave the interrupted state alone
>>>>>
>>>>>    We did not do this since clear_interrupted is false.
>>>> You are missing the point. If Thread.currentThread().isInterrupted()
>>>> is compiled at runtime using the intrinisc then it will return true if
>>>> the field is set even if the Event is not. At that point the
>>>> non-intrinsic Thread.interrupted would return false. This would give
>>>> the appearance that the thread was interrupted then not interrupted
>>>> but that is impossible based on the specs for those methods.
>>>>
>>>> David
>>>> -----
>>>>
>>>>
>>>>> b: now we query with clear_interrupted is true.
>>>>>     Note the Event still signaled ( We assume no other interrupt actions
>>>>> from other threads), the os::is_interrupted will return 'true' since
>>>>>     field is 'true' and Event in signaled status.
>>>>>     We do the clear before return 'true' so next if we have call say, c,
>>>>> will get 'false'.
>>>>>
>>>>> 2) a = false b= true
>>>>> There is a situation like
>>>>> a:  get a 'false' ---- the interrupt is going on and SetEvent not called
>>>>> yet, but field already set to interrupted so get a 'false'.
>>>>> b: When b  called, interrupt action done, so  now  field is interrupted
>>>>> and Event  signaled. We get a 'true'.
>>>>>
>>>>> (Assume only one interrupt)
>>>>>
>>>>> But, anyway, a = false and b = true should not be a problem I think. We
>>>>> treat interrupted as both field and Event set.
>>>>> It can not get a is true and b is false in the situation.
>>>>>
>>>>> Thanks
>>>>> Yumin
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>> Dan
>>>>>>
>>>>>>
>>>>>>>> I don't think this is a realistic situation given for the issue to
>>>>>>>> arise you need the interrupting thread to be effectively descheduled
>>>>>>>> after setting the field but before setting the event. But still it
>>>>>>>> is possible and could be observable. I'm inclined to accept this
>>>>>>>> possibility but I think Karen should make that call. I also think we
>>>>>>>> would need to document this possibility just in case it does arise.
>>>>>>>>
>>>>>>>> ** I've no idea why this intrinsic exists as it is a fairly uncommon
>>>>>>>> usage. I guess at some point it was thought that this would be
>>>>>>>> common, but most interrupt responsive code uses Thread.interrupted
>>>>>>>> to clear the interrupt state as well as query it.
>>>>>>>>
>>>>>>> agree.
>>>>>>>
>>>>>>> Thanks
>>>>>>> Yumin
>>>>>>>> Thanks,
>>>>>>>> David
>>>>>>>>
>>>>>>>>> Tests: vm.quick.testlist, specjbb2005, original test case, JPRT
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>> Yumin
>>>>>>>>>
>>>>>>>>>



More information about the hotspot-runtime-dev mailing list