A race problem about select in a small time window
Rob McKenna
rob.mckenna at oracle.com
Mon Feb 25 22:18:51 UTC 2013
Apologies for the delay, Sean,
Just to be clear: it appears we're neglecting to add the channel to the
EPollArrayWrapper idle set because we rely on ch.isOpen() to decide
whether we need to.
It makes sense to me that any update with u.events == 0 should indeed be
put into the idle set regardless of whether the channel isOpen(). It will
always be removed later by release(), and both release() and
updateRegistrations() synchronize on updateList.
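To make the argument concrete, here is a minimal, hypothetical Java model of the bookkeeping described above. It is not the EPollArrayWrapper source: the class and member names (IdleSetModel, ModelEpoll, polledFds, skipClosedChannels, stillPolledAfterRace) are illustrative assumptions, and a HashSet of file descriptors stands in for the kernel epoll set. It replays steps 2 and 3 of the scenario quoted below, followed by one updateRegistrations() pass, with and without the isOpen() check.

```java
import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.Set;

// Hypothetical, heavily simplified model of the idle-set decision discussed
// in this thread. This is NOT the JDK's EPollArrayWrapper code.
class IdleSetModel {

    static final class Channel {
        final int fd;
        volatile boolean open = true; // T2 clears this before the fd is really closed
        Channel(int fd) { this.fd = fd; }
        boolean isOpen() { return open; }
    }

    static final class Update {
        final Channel ch;
        final int events;
        Update(Channel ch, int events) { this.ch = ch; this.events = events; }
    }

    static final class ModelEpoll {
        final ArrayDeque<Update> updateList = new ArrayDeque<>();
        final Set<Integer> polledFds = new HashSet<>(); // stands in for the kernel epoll set
        final Set<Channel> idleSet = new HashSet<>();
        final boolean skipClosedChannels; // true = keep the ch.isOpen() check

        ModelEpoll(boolean skipClosedChannels) { this.skipClosedChannels = skipClosedChannels; }

        void add(Channel ch) { synchronized (updateList) { polledFds.add(ch.fd); } }

        // Queues an interest change, e.g. from interestOps(0) on another thread.
        void setInterest(Channel ch, int events) {
            synchronized (updateList) { updateList.add(new Update(ch, events)); }
        }

        // Applied at the start of each select pass.
        void updateRegistrations() {
            synchronized (updateList) {
                Update u;
                while ((u = updateList.poll()) != null) {
                    if (skipClosedChannels && !u.ch.isOpen())
                        continue; // update dropped: the fd stays polled
                    if (u.events == 0) {
                        // no interest: park the channel and stop polling its fd
                        if (idleSet.add(u.ch))
                            polledFds.remove(u.ch.fd);
                    } else {
                        idleSet.remove(u.ch);
                        polledFds.add(u.ch.fd);
                    }
                }
            }
        }

        // Called once the channel is finally closed; also clears the idle set.
        void release(Channel ch) {
            synchronized (updateList) {
                idleSet.remove(ch);
                polledFds.remove(ch.fd);
            }
        }
    }

    // Replays T2 (open = false) and T3 (interest 0), then T1's next pass.
    // Returns true if the fd would still be polled afterwards.
    static boolean stillPolledAfterRace(boolean skipClosedChannels) {
        ModelEpoll ep = new ModelEpoll(skipClosedChannels);
        Channel c = new Channel(7);
        ep.add(c);
        c.open = false;           // T2: close() has started, fd not yet released
        ep.setInterest(c, 0);     // T3: interestOps(0)
        ep.updateRegistrations(); // T1: next select() pass
        return ep.polledFds.contains(c.fd);
    }

    public static void main(String[] args) {
        System.out.println("with isOpen() check: still polled = " + stillPolledAfterRace(true));
        System.out.println("without the check:   still polled = " + stillPolledAfterRace(false));
    }
}
```

In this model, keeping the check leaves the descriptor registered, which corresponds to the real selector waking up on every pass and returning 0; dropping it parks the channel in the idle set until release() removes it, with both paths serialized on updateList.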
I'll try to get hold of Alan tomorrow to run it past him (in any
case, you'll need him to review this, as I'm not a reviewer) and to
figure out whether I'm missing something about the ch.isOpen() check.
-Rob
On 19/02/13 08:30, Sean Chou wrote:
> Hi Rob,
>
> Is there any progress?
>
>
> On Tue, Jan 15, 2013 at 5:10 AM, Rob McKenna <rob.mckenna at oracle.com> wrote:
>
> Apologies, folks, I managed to overlook this completely. Sean, it's
> on my radar and I'll get back to you soon.
>
> -Rob
>
>
> On 21/12/12 15:54, Alan Bateman wrote:
>
>
> I don't have cycles to look at this one (too much going on for
> M6), but Rob McKenna (cc'ed) might.
>
> On 17/12/2012 08:56, Sean Chou wrote:
>
> Hello,
>
> Here is the problem in detail: there is a small time window
> in which a race among 3 threads makes select() always return 0
> without blocking.
>
> I wrote a testcase
> (http://cr.openjdk.java.net/~zhouyx/OJDK-714/webrev0.2/), which
> needs a modification to the library code to reproduce the
> problem, because the time window is small.
>
> The reproduction scenario is described below, using Tx for
> thread x:
>
> 1. T1 (the user code) is selecting on a channel (say C).
> It has just returned from the native select function, and
> the NIO library's select method is checking whether the
> returned channel is interested in the event; then step 2
> happens.
> 2. T2 is closing channel C. It has just set the open
> variable to false but has not yet actually closed the
> channel; then step 3 happens.
> 3. T3 sets the interestOps of the channel to 0. (0 means
> the channel is not interested in anything; normally the
> channel would then be put into the cancel list.)
>
> In this scenario, T1 returns from select with 0, meaning
> no channel is selected (channel C, returned by the native
> invocation, has no interest in anything, so it is not
> returned to the application). T1 then invokes select again
> (usually in a loop, which is how select is designed to be
> used). Normally, the select method checks whether any
> channels should be cancelled and removes them from the set
> to be selected, and then calls the native select function.
>
> The problem is that the select method first checks whether
> the channel is closed; if it is, the select method doesn't
> put it into the cancel list.
>
> In the above scenario, channel C is marked as closed but is
> not actually closed yet, and its interestOps has been set to 0
> (which means cancel). So the select method doesn't put C into
> the cancel list, which means the native select set still
> contains channel C. As a result, the native select always
> returns C and the NIO select always returns 0, until the
> channel is finally closed.
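From the application's side, the symptom shows up in the ordinary select loop mentioned above. The following sketch uses only the public java.nio.channels API (Selector, Pipe, SelectionKey); the SelectLoopSketch class, the runLoop helper, and its zero-return counter are illustrative assumptions, not part of the original report. When nothing is ready, select(1000) blocks for up to a second; in the race described here it would instead return 0 immediately on every pass, so the counter would keep climbing until the channel was finally closed.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.util.Iterator;

class SelectLoopSketch {
    // Runs a conventional select loop until one read event has been handled
    // or maxIterations passes are used up. Returns how many select() calls
    // came back with 0 before the read (or in total, if no read happened).
    static int runLoop(Selector selector, int maxIterations) throws IOException {
        int zeroReturns = 0;
        for (int i = 0; i < maxIterations; i++) {
            int n = selector.select(1000); // waits up to 1s unless a key is ready
            if (n == 0) {
                zeroReturns++;
                continue; // nothing selected: go straight back into select()
            }
            Iterator<SelectionKey> it = selector.selectedKeys().iterator();
            while (it.hasNext()) {
                SelectionKey key = it.next();
                it.remove();
                if (key.isReadable()) {
                    ByteBuffer buf = ByteBuffer.allocate(16);
                    ((Pipe.SourceChannel) key.channel()).read(buf);
                    return zeroReturns;
                }
            }
        }
        return zeroReturns;
    }

    public static void main(String[] args) throws IOException {
        Pipe pipe = Pipe.open();
        pipe.source().configureBlocking(false); // required before register()
        try (Selector selector = Selector.open()) {
            pipe.source().register(selector, SelectionKey.OP_READ);
            pipe.sink().write(ByteBuffer.wrap(new byte[] {42})); // make the source readable
            System.out.println("zero returns before the read: " + runLoop(selector, 3));
        } finally {
            pipe.source().close();
            pipe.sink().close();
        }
    }
}
```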
>
>
> The testcase:
> http://cr.openjdk.java.net/~zhouyx/OJDK-714/webrev0.2/
>
> A working fix:
> http://cr.openjdk.java.net/~zhouyx/OJDK-714/webrev_fix/
>
>
> Please have a look.
>
> --
> Best Regards,
> Sean Chou
More information about the nio-dev mailing list