RFR: 6498581: ThreadInterruptTest3 produces wrong output on Windows

Tue Feb 25 15:37:14 PST 2014

I think Vladimir has a good point - the way I read this, Windows is the only platform
that has this issue, so making this conditional for Windows makes a lot of sense. Thanks
for doing this Yumin.

thanks,
Karen

On Feb 25, 2014, at 5:55 PM, Vladimir Kozlov wrote:

> Repeating my question from other thread.
> 
> Why you want to remove it on all platforms and not for Windows only?
> 
> thanks,
> Vladimir
> 
> On 2/25/14 2:38 PM, Yumin Qi wrote:
>> Hi, Karen and all
>> 
>>   I have modified inline_native_isInterrupted to eliminate second fast
>> path. Please check webrev at
>> 
>>   http://cr.openjdk.java.net/~minqi/6498581/webrev01/
>> 
>>   Also need review from compiler team if the changes is OK.
>> 
>> Thanks
>> Yumin
>> 
>> 
>> On 2/19/2014 1:15 PM, Karen Kinnear wrote:
>>> Yumin,
>>> 
>>> The code fix looks good.
>>> 
>>> The question is whether to leave in the intrinsic or not since it
>>> could cause inconsistent data
>>> to be returned.
>>> 
>>> One way to deal with this would be to modify the intrinsic to only
>>> optimize the case in
>>> which the thread is not interrupted, i.e.
>>> t = Thread.current() && (!TLS._osthread._interrupted)
>>> 
>>> I think that would remove the inconsistency. Yes, it would only
>>> optimize a subset, but the
>>> not interrupted subset is probably the one that makes the difference.
>>> 
>>> Can you work with the compiler folks to make that modification if it
>>> makes sense to you?
>>> 
>>> thanks,
>>> Karen
>>> 
>>> On Jan 17, 2014, at 2:01 AM, David Holmes wrote:
>>> 
>>>> Hi Yumin,
>>>> 
>>>> No need to bother the compiler folks with this (still cc'd but can be
>>>> dropped on any reply). We just need to decide if we can tolerate this
>>>> inconsistency between the intrinsic and non-intrinsic forms.
>>>> Otherwise we may choose to disable the intrinsic on Windows - I
>>>> really don't see that it can be adding much value as
>>>> Thread.currentThread().isInterrupted(false) is hardly a performance
>>>> critical action. But we would need benchmarking to confirm that.
>>>> 
>>>> David
>>>> 
>>>> On 17/01/2014 4:10 PM, Yumin Qi wrote:
>>>>> Now I got it:
>>>>> 
>>>>> //------------------------inline_native_isInterrupted------------------
>>>>> // private native boolean java.lang.Thread.isInterrupted(boolean
>>>>> ClearInterrupted);
>>>>> bool LibraryCallKit::inline_native_isInterrupted() {
>>>>>   // Add a fast path to t.isInterrupted(clear_int):
>>>>>   //   (t == Thread.current() && (!TLS._osthread._interrupted ||
>>>>> !clear_int))
>>>>>   //   ? TLS._osthread._interrupted : /*slow path:*/
>>>>> t.isInterrupted(clear_int)
>>>>>   // So, in the common case that the interrupt bit is false,
>>>>>   // we avoid making a call into the VM.  Even if the interrupt bit
>>>>>   // is true, if the clear_int argument is false, we avoid the VM
>>>>> call.
>>>>>   // However, if the receiver is not currentThread, we must call
>>>>> the VM,
>>>>>   // because there must be some locking done around the operation.
>>>>> 
>>>>>   // We only go to the fast case code if we pass two guards.
>>>>>   // Paths which do not pass are accumulated in the slow_region.
>>>>> 
>>>>> this code does not work both before and after the fix then.
>>>>> Maybe we should not go with fast path then. CC to compiler team.
>>>>> 
>>>>> Thanks
>>>>> Yumin
>>>>> 
>>>>> 
>>>>> On 1/16/2014 3:11 PM, David Holmes wrote:
>>>>>> On 17/01/2014 7:24 AM, Yumin Qi wrote:
>>>>>>> Dan,
>>>>>>> 
>>>>>>> 
>>>>>>> On 1/16/2014 12:48 PM, Daniel D. Daugherty wrote:
>>>>>>>> On 1/16/14 1:42 PM, Yumin Qi wrote:
>>>>>>>>> David,
>>>>>>>>> 
>>>>>>>>>  Thanks for the detail analysis so far and supply suggested fix.
>>>>>>>>> 
>>>>>>>>> On 1/15/2014 10:21 PM, David Holmes wrote:
>>>>>>>>>> Hi Yumin,
>>>>>>>>>> 
>>>>>>>>>> Unfortunately the complexities of this continue to expose
>>>>>>>>>> themselves :(
>>>>>>>>>> 
>>>>>>>>>> On 15/01/2014 11:00 AM, Yumin Qi wrote:
>>>>>>>>>>> Can I have your codereview of the change
>>>>>>>>>>> 
>>>>>>>>>>> http://cr.openjdk.java.net/~minqi/6498581/webrev00
>>>>>>>>>>> <http://cr.openjdk.java.net/%7Eminqi/6498581/webrev00/>
>>>>>>>>>>> 
>>>>>>>>>>> Summary: There is race condition between os::interrupt and
>>>>>>>>>>> os::is_interrupted on Windows.  See bug comments in detail. When
>>>>>>>>>>> thread
>>>>>>>>>>> sleep check if it gets interrupted, it may see interrupted but
>>>>>>>>>>> not
>>>>>>>>>>> really interrupted so cause spurious waking up (early return from
>>>>>>>>>>> sleep). Fix by checking if a real interrupt sent to thread
>>>>>>>>>>> interrupt
>>>>>>>>>>> event can prevent such false return. On windows we can get
>>>>>>>>>>> away the
>>>>>>>>>>> interrupted field but to keep consistent with other platforms, I
>>>>>>>>>>> choose
>>>>>>>>>>> to leave it as it is.
>>>>>>>>>>> 
>>>>>>>>>>> Contributed-By: David Holmes
>>>>>>>>>> Not really. :) I suggested using WaitForSingleObject on the
>>>>>>>>>> interrupt event to detect if the event was signalled or not, and
>>>>>>>>>> hence whether the thread was interrupted or not. I had
>>>>>>>>>> envisaged it
>>>>>>>>>> replacing use of the interrupted field altogether on windows -
>>>>>>>>>> but I
>>>>>>>>>> had overlooked the fact that the _interrupted field of osThread is
>>>>>>>>>> maintained in shared code, and further that there is an intrinsic
>>>>>>>>>> for Thread.isInterrupted that uses that field!
>>>>>>>>>> 
>>>>>>>>> I can add myself after you, the main contributor for this fix.
>>>>>>>>>> The fix you have provided in os::is_interrupted deals nicely with
>>>>>>>>>> the race between setting/clearing the field and
>>>>>>>>>> setting/clearing the
>>>>>>>>>> event. Because of the strong ordering in os::interrupt:
>>>>>>>>>> 
>>>>>>>>>> 1. osthread->set_interrupted(true);
>>>>>>>>>> 2. OrderAccess::release();
>>>>>>>>>> 3. SetEvent(osthread->interrupt_event());
>>>>>>>>>> 
>>>>>>>>>> we can require both conditions to be met to define that the thread
>>>>>>>>>> is indeed interrupted. If we have set the field but not the event
>>>>>>>>>> then is_interrupted will return false without touching the
>>>>>>>>>> field or
>>>>>>>>>> the event. If the thread then blocks it will either unblock
>>>>>>>>>> immediately if setEvent has now been called, or will unblock when
>>>>>>>>>> setEvent is eventually called. That all seems to work fine.
>>>>>>>>>> 
>>>>>>>>>> The intrinsic is a fast-path that inlines the access to the
>>>>>>>>>> _interrupted field, but only for the current thread and only when
>>>>>>>>>> not trying to also clear the interrupted state (ie only for
>>>>>>>>>> Thread.currentThread().isInterrupted()**). That seems mostly
>>>>>>>>>> harmless even if the thread is concurrently halfway through being
>>>>>>>>>> interrupted (ie the field is set but not the event) as any
>>>>>>>>>> interrupt
>>>>>>>>>> responsive operation will at some point call the actual
>>>>>>>>>> os::is_interrupted() runtime code.
>>>>>>>>>> 
>>>>>>>>>> The only glitch I see is that it would now be possible for the
>>>>>>>>>> following code to throw the exception:
>>>>>>>>>> 
>>>>>>>>>> boolean a = Thread.currentThread().isInterrupted();
>>>>>>>>>> boolean b = Thread.interrupted();
>>>>>>>>>> if (a && !b)
>>>>>>>>>>  throw new Error("inconsistent Thread interrupt state observed");
>>>>>>>>>> 
>>>>>>>>> Hope no one will write code like this --- but there is other
>>>>>>>>> possibility that multiple interrupting actions sent to the
>>>>>>>>> thread so
>>>>>>>>> such check can not guarantee it returns consistent result. I see
>>>>>>>>> the
>>>>>>>>> problem here but think such coding is very rare.  With we return
>>>>>>>>> real
>>>>>>>>> status of Event, 'a' get false means the Event is not set yet ----
>>>>>>>>> and os::is_interrupted returns false is right.
>>>>>>>> I think you may have missed David's point here. The earlier check:
>>>>>>>> 
>>>>>>>>    boolean a = Thread.currentThread().isInterrupted();
>>>>>>>> 
>>>>>>>> is returning 'true' and a later check:
>>>>>>>> 
>>>>>>>>    boolean b = Thread.interrupted();
>>>>>>>> 
>>>>>>>> is returning 'false'. If variable 'a' is 'true', then 'b' must
>>>>>>>> also be 'true' to be consistent.
>>>>>>>> 
>>>>>>> 1) a = true, b = true
>>>>>>> a: it will not clear interrupted field. So if we get a 'true', means
>>>>>>>     interrupted field is true and Event signaled. But from code:
>>>>>>> 
>>>>>>>   if (interrupted && clear_interrupted) {
>>>>>>>     osthread->set_interrupted(false);
>>>>>>>     ResetEvent(osthread->interrupt_event());
>>>>>>>   } // Otherwise leave the interrupted state alone
>>>>>>> 
>>>>>>>   We did not do this since clear_interrupted is false.
>>>>>> You are missing the point. If Thread.currentThread().isInterrupted()
>>>>>> is compiled at runtime using the intrinisc then it will return true if
>>>>>> the field is set even if the Event is not. At that point the
>>>>>> non-intrinsic Thread.interrupted would return false. This would give
>>>>>> the appearance that the thread was interrupted then not interrupted
>>>>>> but that is impossible based on the specs for those methods.
>>>>>> 
>>>>>> David
>>>>>> -----
>>>>>> 
>>>>>> 
>>>>>>> b: now we query with clear_interrupted is true.
>>>>>>>    Note the Event still signaled ( We assume no other interrupt
>>>>>>> actions
>>>>>>> from other threads), the os::is_interrupted will return 'true' since
>>>>>>>    field is 'true' and Event in signaled status.
>>>>>>>    We do the clear before return 'true' so next if we have call
>>>>>>> say, c,
>>>>>>> will get 'false'.
>>>>>>> 
>>>>>>> 2) a = false b= true
>>>>>>> There is a situation like
>>>>>>> a:  get a 'false' ---- the interrupt is going on and SetEvent not
>>>>>>> called
>>>>>>> yet, but field already set to interrupted so get a 'false'.
>>>>>>> b: When b  called, interrupt action done, so  now  field is
>>>>>>> interrupted
>>>>>>> and Event  signaled. We get a 'true'.
>>>>>>> 
>>>>>>> (Assume only one interrupt)
>>>>>>> 
>>>>>>> But, anyway, a = false and b = true should not be a problem I
>>>>>>> think. We
>>>>>>> treat interrupted as both field and Event set.
>>>>>>> It can not get a is true and b is false in the situation.
>>>>>>> 
>>>>>>> Thanks
>>>>>>> Yumin
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>>> Dan
>>>>>>>> 
>>>>>>>> 
>>>>>>>>>> I don't think this is a realistic situation given for the issue to
>>>>>>>>>> arise you need the interrupting thread to be effectively
>>>>>>>>>> descheduled
>>>>>>>>>> after setting the field but before setting the event. But still it
>>>>>>>>>> is possible and could be observable. I'm inclined to accept this
>>>>>>>>>> possibility but I think Karen should make that call. I also
>>>>>>>>>> think we
>>>>>>>>>> would need to document this possibility just in case it does
>>>>>>>>>> arise.
>>>>>>>>>> 
>>>>>>>>>> ** I've no idea why this intrinsic exists as it is a fairly
>>>>>>>>>> uncommon
>>>>>>>>>> usage. I guess at some point it was thought that this would be
>>>>>>>>>> common, but most interrupt responsive code uses Thread.interrupted
>>>>>>>>>> to clear the interrupt state as well as query it.
>>>>>>>>>> 
>>>>>>>>> agree.
>>>>>>>>> 
>>>>>>>>> Thanks
>>>>>>>>> Yumin
>>>>>>>>>> Thanks,
>>>>>>>>>> David
>>>>>>>>>> 
>>>>>>>>>>> Tests: vm.quick.testlist, specjbb2005, original test case, JPRT
>>>>>>>>>>> 
>>>>>>>>>>> Thanks
>>>>>>>>>>> Yumin
>>>>>>>>>>> 
>>>>>>>>>>> 
>>