Problems persist in KQueueSelectorProvider (Mac) in 7u6 ea

Alan Bateman Alan.Bateman at oracle.com
Mon Aug 13 01:11:11 PDT 2012


On 11/08/2012 00:36, David M. Lloyd wrote:
> We're consistently seeing issues under load on Mac with 
> KQueueSelectorProvider.
>
> There are two possibly related symptoms: the KQueueSelectorImpl is 
> going into a mode where select() does not block, despite the continued 
> emptiness of the selected key set; and FileDispatcherImpl#preClose0 is 
> hanging, presumably in dup2(), trying to close a socket.
>
> My current hypothesis is that some evil race condition exists and is 
> being tripped between kqueue and dup2 (a relatively rare way to close 
> a socket, at least until NIO came along I guess).  My thought though 
> is that sockets should not be preclosed this way: instead it would be 
> better to use shutdown(fd, SHUT_RDWR), which would effectively 
> preclose the socket and hopefully dodge this issue.
>
> I'm hopefully going to have time to try out a patch which does this, 
> but I'm taking a couple weeks off starting tonight so I may not have 
> time, so we shall see.
The kqueue Selector that is in 7u6 was contributed by Apple via the 
macosx-port project. I believe (but can't be sure) that it's the same as 
what they have in their jdk6 and jdk5. It would be interesting to know 
if you run into this issue with jdk6 (I don't know if it is possible for 
you to try that).

I'm also interested to know whether you tried 7u4 or 7u5. While we 
included the kqueue Selector in those releases, it wasn't used by 
default because the kqueue Selector was failing several of the OpenJDK 
Selector tests. We fixed those issues in 8 and 7u6 as part of restoring 
it to be the default Selector.

To get another thing out of the way, do you have any native code in this 
server that might be using file descriptors? (I mention it on the off 
chance that a non-socket has somehow been registered.)

I think the most interesting issue from the above is the hang in 
preClose0. That is a dup2, so it behaves the same as close. (By the way, 
dup2 is used because it is not safe to close a socket and release the 
file descriptor in a multi-threaded environment like this; we have been 
doing the same in classic networking since jdk1.4.) I think we need a 
stack trace to know why preClose0 is hanging; I assume it is hanging in 
the kernel. We are aware of problems closing files when there is 
concurrent I/O; these cause a hang in the kernel in either dup2 or 
close. This is why a number of tests are currently excluded on Mac (the 
same issues are observed with Apple's jdk6 too). If we can understand why 
dup2 is hanging in the kernel then it may explain the spin too.
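
For reference, here is a minimal Java sketch of the scenario that the pre-close step exists to handle: one thread is blocked in a read while another thread closes the same channel. It does not call dup2 or preClose0 directly (those are internal to the JDK); it only exercises the public SocketChannel API on a loopback connection. The class and method names are hypothetical, and the 200 ms sleep is an assumption to give the reader time to block.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch: close a channel while another thread is blocked reading it.
class CloseWhileBlockedSketch {

    /** Returns whatever the blocked reader threw, or null if it threw nothing. */
    static Throwable closeDuringRead() throws Exception {
        try (ServerSocketChannel server = ServerSocketChannel.open()) {
            // Bind to an ephemeral port on the loopback interface.
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            SocketChannel client = SocketChannel.open(server.getLocalAddress());
            SocketChannel accepted = server.accept();

            AtomicReference<Throwable> seen = new AtomicReference<>();
            CountDownLatch started = new CountDownLatch(1);
            Thread reader = new Thread(() -> {
                started.countDown();
                try {
                    // Nothing is ever written by the peer, so this read blocks.
                    client.read(ByteBuffer.allocate(16));
                } catch (Throwable t) {
                    seen.set(t);
                }
            });
            reader.start();
            started.await();
            Thread.sleep(200);   // assumption: enough time for the read to block

            client.close();      // close from a second thread, concurrent with the read
            reader.join(5000);   // on a healthy platform the reader is released promptly
            accepted.close();
            return seen.get();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("reader saw: " + closeDuringRead());
    }
}
```

The reader is expected to be released with a ClosedChannelException (normally its subclass AsynchronousCloseException). If close() or the reader never returns, a thread dump taken at that point is the stack trace asked for above.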

I see Jason's reply on setInterest and there is indeed a problem there. 
The specification is that changing the interest set is effective at the 
next select operation, but this Selector is doing it asynchronously. This 
needs to be changed to batch the changes until the next select, as is 
done in the other Selector implementations. I will create a bug for that.
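
To illustrate the batching described above, here is a minimal sketch of deferring interest-set changes until the next select. It is a toy model, not the code in KQueueSelectorImpl or any other JDK Selector: keys are plain ints, select() only applies the queued changes and does no polling, and every name in it is hypothetical.

```java
import java.nio.channels.SelectionKey;
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Hypothetical model of a selector that batches interest changes.
class DeferredInterestSketch {

    /** One queued change: a key id (an int stand-in for a real key) and its new ops. */
    private static final class Update {
        final int keyId;
        final int ops;
        Update(int keyId, int ops) { this.keyId = keyId; this.ops = ops; }
    }

    private final Object lock = new Object();
    private final Queue<Update> pending = new ArrayDeque<>();
    private final Map<Integer, Integer> registered = new HashMap<>();

    /** Callable from any thread: records the change but does not apply it yet. */
    void setInterest(int keyId, int ops) {
        synchronized (lock) {
            pending.add(new Update(keyId, ops));
        }
    }

    /** Applies all queued changes, in order, and returns how many were applied. */
    int select() {
        synchronized (lock) {
            int applied = 0;
            Update u;
            while ((u = pending.poll()) != null) {
                registered.put(u.keyId, u.ops);
                applied++;
            }
            // A real selector would poll the kernel here with the updated interest sets.
            return applied;
        }
    }

    /** The interest ops currently in effect for a key, or null if none were applied. */
    Integer effectiveInterest(int keyId) {
        synchronized (lock) {
            return registered.get(keyId);
        }
    }

    public static void main(String[] args) {
        DeferredInterestSketch s = new DeferredInterestSketch();
        s.setInterest(1, SelectionKey.OP_READ);
        System.out.println("before select: " + s.effectiveInterest(1));
        s.select();
        System.out.println("after select: " + s.effectiveInterest(1));
    }
}
```

The point of the queue is that a change made from another thread never touches the registered state while a select is in progress; it only becomes visible at the start of the next one.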

-Alan



