A race problem about select in a small time window

Rob McKenna rob.mckenna at oracle.com
Mon Jan 14 13:10:45 PST 2013


Apologies folks, I managed to overlook this completely. Sean, it's on my
radar and I'll get back to you soon.

     -Rob

On 21/12/12 15:54, Alan Bateman wrote:
>
> I don't have cycles to look at this one (too much going on for M6) but 
> Rob McKenna (cc'ed) might.
>
> On 17/12/2012 08:56, Sean Chou wrote:
>> Hello,
>>
>> Here is the problem in detail: there is a small time window in which
>> a race among three threads makes select() always return 0 without
>> blocking.
>>
>> I wrote a test case
>> (http://cr.openjdk.java.net/~zhouyx/OJDK-714/webrev0.2/) which
>> requires modifying the library code to reproduce the problem, because
>> the time window is small.
>>
>> The scenario to reproduce it is described below, using Tx for thread x:
>>
>> 1. T1 (the user code) is selecting on a channel (call it C). It has
>> just returned from the native select function, and the NIO library's
>> select method is checking whether the returned channel is interested
>> in the event; then step 2 happens.
>> 2. T2 is closing channel C. It has just set the open variable to
>> false but has not yet actually closed the channel; then step 3
>> happens.
>> 3. T3 sets the interestOps of the channel to 0. // 0 means the channel
>> is not interested in anything; the channel will normally be put into
>> the cancel list.
>>
>> In this scenario, T1 returns from select with 0, which means no
>> channel is selected (because channel C, returned from the native
>> invocation, is not interested in anything, it is not returned to the
>> application). T1 then invokes select again (usually in a loop, which
>> is how select is designed to be used). Normally, the select method
>> checks whether any channels should be cancelled and removes them from
>> the set to be selected, then calls the native select function.
>>
>> The problem is that the select method first checks whether the
>> channel is closed; if it is, the select method doesn't put it into
>> the cancel list.
>>
>> In the above scenario, channel C is marked as closed but not yet
>> actually closed, and its interestOps is set to 0 (which means
>> cancel). So the select method doesn't put C into the cancel list (due
>> to the problem above), which means the native select set still
>> contains channel C. The native select therefore always returns C and
>> the NIO select always returns 0, until the channel is finally closed.
>>
>>
>> The test case: http://cr.openjdk.java.net/~zhouyx/OJDK-714/webrev0.2/
>>
>> A working fix:
>> http://cr.openjdk.java.net/~zhouyx/OJDK-714/webrev_fix/
>>
>>
>> Please have a look.
>>
>>
>
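
For anyone following the quoted report, the sequence Sean describes can be
modelled with a small toy sketch in Java. This is neither the JDK's selector
code nor the linked fix: ToyChannel, ToySelector, skipClosedChannels and
countSpins are hypothetical names invented for illustration, and the model
assumes the native layer keeps reporting C as ready. It only shows why a
channel that is flagged closed, has no interest ops, and is skipped by the
cancel step would make every select call return 0 without blocking.

```java
import java.util.ArrayList;
import java.util.List;

public class SelectRaceModel {
    // Hypothetical stand-in for a registered channel; not a JDK class.
    static final class ToyChannel {
        boolean open = true;          // flag that close() flips first (step 2)
        boolean reallyClosed = false; // true only once the descriptor is closed
        int interestOps = 1;          // non-zero: interested in some event
        boolean ready = true;         // assumption: native layer reports it ready
    }

    // Hypothetical stand-in for the selector's bookkeeping.
    static final class ToySelector {
        final List<ToyChannel> nativeSet = new ArrayList<>();
        final boolean skipClosedChannels; // true models the reported behaviour

        ToySelector(boolean skipClosedChannels) {
            this.skipClosedChannels = skipClosedChannels;
        }

        // Remove channels that should no longer be polled.
        void processCancelled() {
            nativeSet.removeIf(c -> {
                if (c.reallyClosed) {
                    return true;
                }
                if (skipClosedChannels && !c.open) {
                    return false; // flagged closed: never put on the cancel list
                }
                return c.interestOps == 0;
            });
        }

        // Returns the number of selected channels, or -1 where a real
        // select() would block because nothing in the native set is ready.
        int select() {
            processCancelled();
            boolean woke = false;
            int selected = 0;
            for (ToyChannel c : nativeSet) {
                if (c.ready) {
                    woke = true;          // native select returns immediately
                    if (c.interestOps != 0) {
                        selected++;       // only interested channels are reported
                    }
                }
            }
            return woke ? selected : -1;
        }
    }

    // Replays steps 2 and 3 of the report, then counts how many select
    // calls return 0 without blocking.
    static int countSpins(boolean skipClosedChannels, int iterations) {
        ToySelector selector = new ToySelector(skipClosedChannels);
        ToyChannel c = new ToyChannel();
        selector.nativeSet.add(c);
        c.open = false;     // step 2: T2 has set the flag, channel not yet closed
        c.interestOps = 0;  // step 3: T3 clears the interest ops
        int spins = 0;
        for (int i = 0; i < iterations; i++) {
            if (selector.select() == 0) {
                spins++;
            }
        }
        return spins;
    }

    public static void main(String[] args) {
        System.out.println("skipping closed channels: " + countSpins(true, 5) + " spins");
        System.out.println("cancelling normally:      " + countSpins(false, 5) + " spins");
    }
}
```

In this model every call spins while the closed check skips C, and none do
once C is removed from the set; how the real fix achieves that is in the
webrev linked above, not in this sketch.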



More information about the nio-dev mailing list