A race problem about select in a small time window

Sean Chou zhouyx at linux.vnet.ibm.com
Sun Jan 13 21:28:16 PST 2013


Hello Rob,

Do you have some time to take a look at this issue?


On Fri, Dec 21, 2012 at 11:54 PM, Alan Bateman <Alan.Bateman at oracle.com> wrote:

>
> I don't have cycles to look at this one (too much going on for M6) but Rob
> McKenna (cc'ed) might.
>
>
> On 17/12/2012 08:56, Sean Chou wrote:
>
>> Hello,
>>
>> Here is the problem in detail: there is a small time window in which a race
>> among three threads makes select() always return 0 without blocking.
>>
>> I wrote a testcase (http://cr.openjdk.java.net/~zhouyx/OJDK-714/webrev0.2/)
>> which needs to modify the library code to reproduce the problem, because
>> the time window is small.
>>
>>
>> The scenario to reproduce it is described below, using Tx for thread x:
>>
>> 1. T1 (the user code) is selecting on a channel (call it C). It has just
>> returned from the native select function, and the NIO library's select
>> method is checking whether the returned channel is interested in the
>> event; then 2 happens.
>> 2. T2 is closing channel C. It has just set the open variable to false but
>> has not yet actually closed the channel; then 3 happens.
>> 3. T3 sets the interestOps of the channel to 0. // 0 means the channel is
>> not interested in anything; the channel will normally be put into the
>> cancel list.
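
To illustrate what step 3 looks like from user code, here is a minimal sketch using a java.nio Pipe. The class name InterestOpsZeroDemo and the use of a pipe are my own assumptions for illustration; this is not the testcase from the webrev, and it does not reproduce the race. It only shows that a channel with pending data is not reported while its interest set is 0.

```java
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

public class InterestOpsZeroDemo {
    /** Returns {keys selected with interestOps == 0, keys selected with OP_READ}. */
    static int[] run() throws Exception {
        Pipe pipe = Pipe.open();
        try (Selector selector = Selector.open()) {
            pipe.source().configureBlocking(false);
            SelectionKey key = pipe.source().register(selector, SelectionKey.OP_READ);

            // Make the source end readable.
            pipe.sink().write(ByteBuffer.wrap(new byte[] {1}));

            // Interest set 0: data is pending, but the key is not interested in it.
            key.interestOps(0);
            int withZero = selector.selectNow();

            // Restore read interest: the same pending byte is now reported.
            key.interestOps(SelectionKey.OP_READ);
            int withRead = selector.select(1000);

            return new int[] {withZero, withRead};
        } finally {
            pipe.sink().close();
            pipe.source().close();
        }
    }

    public static void main(String[] args) throws Exception {
        int[] r = run();
        System.out.println("interestOps=0 selected " + r[0] + ", OP_READ selected " + r[1]);
    }
}
```
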
>>
>> In this scenario, T1 returns from select with a return value of 0, which
>> means no channel is selected (because channel C, returned from the native
>> invocation, is not interested in anything, it is not returned to the
>> application). T1 then invokes select again (usually in a loop; this is how
>> select is designed to be used). In the normal case, the select method
>> checks whether any channels should be cancelled and removes them from the
>> set to be selected. Then it goes on to the native select function.
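
For reference, the kind of user loop meant here might look like the sketch below. The class name SelectLoopSketch and the pipe setup are assumptions made only to keep the example self-contained. The point is the `continue` branch: when select() reports nothing, the loop simply calls select() again, so a select() that keeps returning 0 without blocking turns this loop into a busy spin.

```java
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.util.Iterator;

public class SelectLoopSketch {
    /** Reads three bytes from a pipe through a select loop; returns the bytes read. */
    static int run() throws Exception {
        Pipe pipe = Pipe.open();
        try (Selector selector = Selector.open()) {
            pipe.source().configureBlocking(false);
            pipe.source().register(selector, SelectionKey.OP_READ);
            pipe.sink().write(ByteBuffer.wrap(new byte[] {1, 2, 3}));

            ByteBuffer buf = ByteBuffer.allocate(16);
            int total = 0;
            while (total < 3) {
                // Normally blocks until a key is ready or the timeout expires.
                if (selector.select(1000) == 0) {
                    continue; // nothing selected: go round and select again
                }
                Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                while (it.hasNext()) {
                    SelectionKey key = it.next();
                    it.remove(); // selected keys must be removed by the caller
                    if (key.isReadable()) {
                        buf.clear();
                        int n = pipe.source().read(buf);
                        if (n < 0) {
                            return total; // end of stream
                        }
                        total += n;
                    }
                }
            }
            return total;
        } finally {
            pipe.sink().close();
            pipe.source().close();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("bytes read: " + run());
    }
}
```
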
>>
>> The problem is that the select method first checks whether the channel is
>> closed; if it is closed, the select method doesn't put it into the cancel
>> list.
>>
>> In the above scenario, channel C is in the closed state but not actually
>> closed yet, and its interestOps has been set to 0 (which means cancel). So
>> the select method doesn't put C into the cancel list (due to the problem
>> above), which means the native select set still contains channel C. So the
>> native select always returns C and the NIO select always returns 0, until
>> the channel is finally closed.
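
The interleaving can be replayed sequentially in a toy model, sketched below. Everything in it (StaleKeyModel, ToyChannel, the skipClosedChannels flag) is hypothetical and only mirrors the description above; it is not the JDK selector code and not the fix in the webrev. With the closed-channel check in place, C stays in the poll set and every pass is a "native ready, nothing selected" spin; without it, C is dropped on the first pass.

```java
import java.util.HashSet;
import java.util.Set;

public class StaleKeyModel {
    /** Hypothetical stand-in for a registered channel; not a JDK class. */
    static final class ToyChannel {
        boolean open = true;   // set to false at the start of close()
        int interestOps = 1;   // 0 means "interested in nothing"
    }

    /**
     * Replays the T2/T3 interleaving, then runs `passes` select passes.
     * Returns how many passes had a ready channel at the "native" level
     * but reported 0 selected channels to the caller.
     */
    static int spins(boolean skipClosedChannels, int passes) {
        Set<ToyChannel> pollSet = new HashSet<>();
        ToyChannel c = new ToyChannel();
        pollSet.add(c);

        c.open = false;      // T2: close has begun, descriptor still registered
        c.interestOps = 0;   // T3: interest set cleared

        int spins = 0;
        for (int i = 0; i < passes; i++) {
            // Cancellation pass: drop channels with an empty interest set,
            // except that the modelled bug skips channels already marked closed.
            pollSet.removeIf(ch -> ch.interestOps == 0
                    && (ch.open || !skipClosedChannels));

            // "Native" poll: in this model every channel left in the set is ready.
            int nativeReady = pollSet.size();

            // Only channels with a non-empty interest set reach the caller.
            int selected = 0;
            for (ToyChannel ch : pollSet) {
                if (ch.interestOps != 0) {
                    selected++;
                }
            }
            if (nativeReady > 0 && selected == 0) {
                spins++;
            }
        }
        return spins;
    }

    public static void main(String[] args) {
        System.out.println("with closed-channel check:    " + spins(true, 5) + " spins");
        System.out.println("without closed-channel check: " + spins(false, 5) + " spins");
    }
}
```
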
>>
>>
>> The testcase: http://cr.openjdk.java.net/~zhouyx/OJDK-714/webrev0.2/
>>
>> A working fix: http://cr.openjdk.java.net/~zhouyx/OJDK-714/webrev_fix/
>>
>>
>> Please have a look.
>>
>>
>>
>


-- 
Best Regards,
Sean Chou