Problems persist in KQueueSelectorProvider (Mac) in 7u6 ea
Alan Bateman
Alan.Bateman at oracle.com
Mon Aug 13 01:11:11 PDT 2012
On 11/08/2012 00:36, David M. Lloyd wrote:
> We're consistently seeing issues under load on Mac with
> KQueueSelectorProvider.
>
> There are two possibly related symptoms: the KQueueSelectorImpl is
> going into a mode where select() does not block, despite the continued
> emptiness of the selected key set; and FileDispatcherImpl#preClose0 is
> hanging, presumably in dup2(), trying to close a socket.
>
> My current hypothesis is that some evil race condition exists and is
> being tripped between kqueue and dup2 (a relatively rare way to close
> a socket, at least until NIO came along I guess). My thought though
> is that sockets should not be preclosed this way: instead it would be
> better to use shutdown(fd, SHUT_RDWR), which would effectively
> preclose the socket and hopefully dodge this issue.
>
> I'm hopefully going to have time to try out a patch which does this,
> but I'm taking a couple weeks off starting tonight so I may not have
> time, so we shall see.
The kqueue Selector that is in 7u6 was contributed by Apple via the
macosx-port project. I believe (but can't be sure) that it's the same as
what they have in their jdk6 and jdk5. It would be interesting to know
if you run into this issue with jdk6 (I don't know if it is possible for
you to try that).
I'm also interested to know whether you tried 7u4 or 7u5. While we
included the kqueue Selector in those releases, it wasn't used by
default because the kqueue Selector was failing several of the OpenJDK
Selector tests. We fixed those issues in 8 and 7u6 as part of restoring
it to be the default Selector.
To get another thing out of the way, do you have any native code in this
server that might be using file descriptors? (mentioning it on the off
chance that somehow a non-socket has been registered).
I think the most interesting issue from the above is the hang in
preClose0. That is a dup2, so it's the same as close. (By the way, dup2
is used because it is not safe to close a socket and release the file
descriptor in multi-threaded environments like this; we have been doing
the same in classic networking since jdk1.4.) I think we need to get a
stack trace to know why preClose0 is hanging; I assume it is hanging in
the kernel. We are aware of problems closing files when there is
concurrent I/O; these cause a hang in the kernel in either dup2 or
close. This is why a number of tests are currently excluded on Mac (the
same issues are observed with Apple's jdk6 too). If we can understand
why dup2 is hanging in the kernel then it may explain the spin too.
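For anyone not familiar with the dup2 pre-close idea, here is a minimal
sketch of it in C. It is not the JDK source: the names preclose_init,
preclose and release_fd are made up for illustration, and the marker
descriptor is assumed to be one end of a socketpair whose peer has been
closed. The point is only that dup2 releases the original socket while
the descriptor number stays allocated, so it cannot be handed out again
while another thread may still be using it.

```c
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical names throughout; a sketch, not the JDK implementation. */
static int preclose_fd = -1;

/* Create the marker descriptor once: one end of a socketpair whose
 * peer is closed, so a read on it reports end-of-stream. */
int preclose_init(void) {
    int sp[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sp) < 0)
        return -1;
    preclose_fd = sp[0];
    close(sp[1]);
    return 0;
}

/* Redirect fd to the marker. The socket that fd referred to is
 * released, but the descriptor number itself remains allocated. */
int preclose(int fd) {
    return dup2(preclose_fd, fd) < 0 ? -1 : 0;
}

/* Final step, to be called only once no thread is still using fd. */
int release_fd(int fd) {
    return close(fd);
}
```

A caller would run preclose_init() once at startup, call preclose(fd)
when the channel is closed, and call release_fd(fd) only after every
thread that might be blocked on fd has been accounted for.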
I see Jason's reply on setInterest and there is indeed a problem there.
The specification is that changing the interest set is effective at the
next select operation but this Selector is doing it asynchronously. This
needs to be changed to batch the changes to the next select as is done
in the other Selector implementations. I will create a bug for that.
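To illustrate what batching to the next select means, here is a toy
selector in C built on poll(). Everything in it (toy_selector,
toy_set_interest, toy_select, MAX_KEYS) is hypothetical and it is not
how any JDK Selector is written. It is also not thread-safe; a real
implementation would need to guard the pending changes with a lock.
Changing the interest set only records the change, and the change is
applied at the start of the next select.

```c
#include <poll.h>
#include <stddef.h>

#define MAX_KEYS 64   /* arbitrary limit for this sketch */

/* Hypothetical toy selector; all names are illustrative. */
struct toy_selector {
    struct pollfd active[MAX_KEYS];  /* interest sets passed to poll() */
    short pending[MAX_KEYS];         /* queued interest sets */
    int has_pending[MAX_KEYS];       /* 1 if pending[i] awaits applying */
    size_t nkeys;
};

void toy_init(struct toy_selector *s) {
    s->nkeys = 0;
}

/* Register fd with an empty interest set; returns a key index or -1. */
int toy_register(struct toy_selector *s, int fd) {
    if (s->nkeys == MAX_KEYS)
        return -1;
    size_t i = s->nkeys++;
    s->active[i].fd = fd;
    s->active[i].events = 0;
    s->active[i].revents = 0;
    s->pending[i] = 0;
    s->has_pending[i] = 0;
    return (int)i;
}

/* Only records the change; the active interest set is not touched. */
void toy_set_interest(struct toy_selector *s, int key, short events) {
    s->pending[key] = events;
    s->has_pending[key] = 1;
}

/* Apply the queued changes, then poll. Returns poll()'s result. */
int toy_select(struct toy_selector *s, int timeout_ms) {
    for (size_t i = 0; i < s->nkeys; i++) {
        if (s->has_pending[i]) {
            s->active[i].events = s->pending[i];
            s->has_pending[i] = 0;
        }
    }
    return poll(s->active, (nfds_t)s->nkeys, timeout_ms);
}
```

With this shape, a call to toy_set_interest() made while another select
is in progress cannot alter what that select is waiting for; it is
picked up by the following toy_select() call.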
-Alan