A race problem about select in a small time window
Sean Chou
zhouyx at linux.vnet.ibm.com
Tue Feb 19 08:30:55 UTC 2013
Hi Rob,
Is there any progress on this?
On Tue, Jan 15, 2013 at 5:10 AM, Rob McKenna <rob.mckenna at oracle.com> wrote:
> Apologies folks, I managed to overlook this completely. Sean, it's on my
> radar and I'll get back to you soon.
>
> -Rob
>
>
> On 21/12/12 15:54, Alan Bateman wrote:
>
>>
>> I don't have cycles to look at this one (too much going on for M6) but
>> Rob McKenna (cc'ed) might.
>>
>> On 17/12/2012 08:56, Sean Chou wrote:
>>
>>> Hello,
>>>
>>> Here is the problem in detail: there is a small time window in which a
>>> race among 3 threads makes select() always return 0 without blocking.
>>>
>>> I wrote a testcase
>>> (http://cr.openjdk.java.net/~zhouyx/OJDK-714/webrev0.2/)
>>> which needs to modify the library code to reproduce, because the time
>>> window is small.
>>>
>>> The scenario to reproduce is described below, using Tx for thread x:
>>>
>>> 1. T1 (the user code) is selecting on a channel (call it C). It has just
>>> returned from the native select function, and the NIO library select
>>> method is checking whether the returned channel is interested in the
>>> event; then 2 happens;
>>> 2. T2 is closing channel C. It has just set the open variable to false
>>> but has not yet actually closed the channel; then 3 happens;
>>> 3. T3 sets the interestOps of the channel to 0. // 0 means the channel is
>>> not interested in anything; normally the channel will be put into the
>>> cancel list.
>>>
>>> In this scenario, T1 returns from select with 0, which means no channel
>>> is selected (channel C, returned from the native invocation, is not
>>> interested in anything, so it is not returned to the application). Then
>>> T1 invokes select again (usually in a loop; this is how select is
>>> designed to be used). In the normal case, the select method checks for
>>> channels that should be cancelled and removes them from the set to be
>>> selected, then goes on to the native select function.
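
For readers unfamiliar with the loop mentioned above, here is a minimal sketch of the usual pattern, assuming a single java.nio.channels.Pipe as the event source. The class and method names (SelectLoopSketch, readOneByte) are made up for illustration and are not part of the testcase. The branch taken when select() returns 0 is the one that would spin in the reported race.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.util.Iterator;

// Hypothetical illustration only; not code from the webrev.
class SelectLoopSketch {
    // Runs a bounded select loop over one pipe and returns the bytes read.
    static int readOneByte() {
        try (Selector selector = Selector.open()) {
            Pipe pipe = Pipe.open();
            try (Pipe.SourceChannel source = pipe.source();
                 Pipe.SinkChannel sink = pipe.sink()) {
                source.configureBlocking(false);
                source.register(selector, SelectionKey.OP_READ);
                sink.write(ByteBuffer.wrap(new byte[] {42}));

                int total = 0;
                // Bounded so the sketch cannot loop forever.
                for (int i = 0; i < 10 && total == 0; i++) {
                    int n = selector.select(1000);
                    if (n == 0) {
                        // Nothing selected: go around again. If select()
                        // keeps returning 0 without blocking, this loop
                        // becomes a busy spin.
                        continue;
                    }
                    Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                    while (it.hasNext()) {
                        SelectionKey key = it.next();
                        it.remove();
                        if (key.isValid() && key.isReadable()) {
                            int r = source.read(ByteBuffer.allocate(8));
                            if (r > 0) {
                                total += r;
                            }
                        }
                    }
                }
                return total;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In a real application the loop is normally unbounded, which is why a select() that returns 0 immediately on every call shows up as high CPU usage.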
>>>
>>> The problem is: the select method first checks whether the channel is
>>> closed; if it is closed, the select method doesn't put it into the
>>> cancel list.
>>>
>>> In the above scenario, channel C is marked as closed but is not actually
>>> closed yet, and its interestOps has been set to 0 (which means cancel).
>>> So the select method doesn't put C into the cancel list (due to the
>>> problem), which means the native select set still contains channel C.
>>> So the native select always returns C and the NIO select always returns
>>> 0, until the channel is finally closed.
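
The interleaving can be replayed sequentially in a toy model. This is only a sketch of the behaviour described above, not the JDK implementation: every name in it (CancelRaceModel, Chan, ToySelector, skipClosed, spinCount) is hypothetical, and the "native" set is modelled as a plain Java set whose members are always reported as ready.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Toy model of the reported race; all names are hypothetical.
class CancelRaceModel {
    static final class Chan {
        boolean open = true;   // set to false by close() before the real close
        int interestOps = 1;   // non-zero: interested in some event
    }

    static final class ToySelector {
        final Set<Chan> nativeSet = new LinkedHashSet<>();
        final boolean skipClosed; // true models the behaviour in the report

        ToySelector(boolean skipClosed) {
            this.skipClosed = skipClosed;
        }

        void register(Chan c) {
            nativeSet.add(c);
        }

        // Housekeeping before the native call: drop channels whose
        // interestOps is 0, unless closed channels are skipped.
        void processCancelled() {
            nativeSet.removeIf(c -> c.interestOps == 0 && !(skipClosed && !c.open));
        }

        // Every channel still in nativeSet is treated as ready; only those
        // with a non-zero interest set are reported to the caller.
        int select() {
            processCancelled();
            int selected = 0;
            for (Chan c : nativeSet) {
                if (c.interestOps != 0) {
                    selected++;
                }
            }
            return selected;
        }
    }

    // Counts select() calls that return 0 while C is still in the native
    // set, i.e. calls that would return immediately instead of blocking.
    static int spinCount(boolean skipClosed, int rounds) {
        Chan c = new Chan();
        ToySelector selector = new ToySelector(skipClosed);
        selector.register(c);
        c.open = false;      // step 2 (T2): marked closed, not yet closed
        c.interestOps = 0;   // step 3 (T3): interest set cleared
        int spins = 0;
        for (int i = 0; i < rounds; i++) {
            if (selector.select() == 0 && !selector.nativeSet.isEmpty()) {
                spins++;
            }
        }
        return spins;
    }
}
```

With skipClosed set to true, every round counts as a spin because C never leaves the set; with it set to false, C is removed on the first call and no round counts as a spin.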
>>>
>>>
>>> The testcase: http://cr.openjdk.java.net/~zhouyx/OJDK-714/webrev0.2/
>>>
>>> A working fix: http://cr.openjdk.java.net/~zhouyx/OJDK-714/webrev_fix/
>>>
>>>
>>> Please have a look.
>>>
>>>
>>>
>>
>
--
Best Regards,
Sean Chou