RFR: 6498581: ThreadInterruptTest3 produces wrong output on Windows

Thu Jan 16 12:48:18 PST 2014

On 1/16/14 1:42 PM, Yumin Qi wrote:
> David,
>
>   Thanks for the detail analysis so far and supply suggested fix.
>
> On 1/15/2014 10:21 PM, David Holmes wrote:
>> Hi Yumin,
>>
>> Unfortunately the complexities of this continue to expose themselves :(
>>
>> On 15/01/2014 11:00 AM, Yumin Qi wrote:
>>> Can I have your codereview of the change
>>>
>>> http://cr.openjdk.java.net/~minqi/6498581/webrev00
>>> <http://cr.openjdk.java.net/%7Eminqi/6498581/webrev00/>
>>>
>>> Summary: There is race condition between os::interrupt and
>>> os::is_interrupted on Windows.  See bug comments in detail. When thread
>>> sleep check if it gets interrupted, it may see interrupted but not
>>> really interrupted so cause spurious waking up (early return from
>>> sleep). Fix by checking if a real interrupt sent to thread interrupt
>>> event can prevent such false return. On windows we can get away the
>>> interrupted field but to keep consistent with other platforms, I choose
>>> to leave it as it is.
>>>
>>> Contributed-By: David Holmes
>>
>> Not really. :) I suggested using WaitForSingleObject on the interrupt 
>> event to detect if the event was signalled or not, and hence whether 
>> the thread was interrupted or not. I had envisaged it replacing use 
>> of the interrupted field altogether on windows - but I had overlooked 
>> the fact that the _interrupted field of osThread is maintained in 
>> shared code, and further that there is an intrinsic for 
>> Thread.isInterrupted that uses that field!
>>
> I can add myself after you, the main contributor for this fix.
>> The fix you have provided in os::is_interrupted deals nicely with the 
>> race between setting/clearing the field and setting/clearing the 
>> event. Because of the strong ordering in os::interrupt:
>>
>>  1. osthread->set_interrupted(true);
>>  2. OrderAccess::release();
>>  3. SetEvent(osthread->interrupt_event());
>>
>> we can require both conditions to be met to define that the thread is 
>> indeed interrupted. If we have set the field but not the event then 
>> is_interrupted will return false without touching the field or the 
>> event. If the thread then blocks it will either unblock immediately 
>> if setEvent has now been called, or will unblock when setEvent is 
>> eventually called. That all seems to work fine.
>>
>> The intrinsic is a fast-path that inlines the access to the 
>> _interrupted field, but only for the current thread and only when not 
>> trying to also clear the interrupted state (ie only for 
>> Thread.currentThread().isInterrupted()**). That seems mostly harmless 
>> even if the thread is concurrently halfway through being interrupted 
>> (ie the field is set but not the event) as any interrupt responsive 
>> operation will at some point call the actual os::is_interrupted() 
>> runtime code.
>>
>> The only glitch I see is that it would now be possible for the 
>> following code to throw the exception:
>>
>> boolean a = Thread.currentThread().isInterrupted();
>> boolean b = Thread.interrupted();
>> if (a && !b)
>>   throw new Error("inconsistent Thread interrupt state observed");
>>
> Hope no one will write code like this --- but there is other 
> possibility that multiple interrupting actions sent to the thread so 
> such check can not guarantee it returns consistent result.  I see the 
> problem here but think such coding is very rare.  With we return real 
> status of Event, 'a' get false means the Event is not set yet ---- and 
> os::is_interrupted returns false is right.

I think you may have missed David's point here. The earlier check:

     boolean a = Thread.currentThread().isInterrupted();

is returning 'true' and a later check:

     boolean b = Thread.interrupted();

is returning 'false'. If variable 'a' is 'true', then 'b' must
also be 'true' to be consistent.

Dan

>
>> I don't think this is a realistic situation given for the issue to 
>> arise you need the interrupting thread to be effectively descheduled 
>> after setting the field but before setting the event. But still it is 
>> possible and could be observable. I'm inclined to accept this 
>> possibility but I think Karen should make that call. I also think we 
>> would need to document this possibility just in case it does arise.
>>
>> ** I've no idea why this intrinsic exists as it is a fairly uncommon 
>> usage. I guess at some point it was thought that this would be 
>> common, but most interrupt responsive code uses Thread.interrupted to 
>> clear the interrupt state as well as query it.
>>
> agree.
>
> Thanks
> Yumin
>> Thanks,
>> David
>>
>>>
>>> Tests: vm.quick.testlist, specjbb2005, original test case, JPRT
>>>
>>> Thanks
>>> Yumin
>>>
>>>
>