8198928: (so) SocketChannel connect may deadlock if closed at around same time that connect fails
Alan Bateman
Alan.Bateman at oracle.com
Sun Mar 4 20:22:19 UTC 2018
This is a deadlock found with one of the HTTP client tests. When
establishing a connection with a SocketChannel then the channel is
specified to be closed if the connection cannot be established. If the
SocketChannel is configured non-blocking and connect (or finishConnect)
is closing the channel due to a connect failure then it can potentially
deadlock with another thread that attempts to close the channel at the
same time. The deadlock arises because connect/finishConnect invoke
close (closeLock) while holding the read and write locks. Another thread
closing the channel acquires the clockLock, the actual close
implementation then attempts to acquire the read and write locks in
order to coordinate with threads doing non-blocking operations on the
change.
Historical note: The implementation of connect/finishConnect has always
called close while holding the read and write locks but it hasn't been
problem in the past. The recent changes to to separate the blocking and
non-blocking code paths have changed the locking during close which is
why we are only seeing this now.
The proposed change is to connect/finishConnect so that these methods
don't close the channel until after they've released the read and write
locks. The changes mean there is a window where some other thread might
call finishConnect or some other operation so a few other small changes
are needed to deal with that. It would be possible to introduce a
special state for this case this adds complexity so I've decided against
that.
The webrev with the proposed changes and test are here:
http://cr.openjdk.java.net/~alanb/8198928/webrev/index.html
In terms of behavior changes then an observer polling
isConnectionPending will now observe the connection is pending even if
the channel is configured blocking. Previously, isConnectionPending
would never return true if connect was called when configured blocking.
Another observer polling with getRemoteAddress will now observe the
remote address while the connection is pending. This was always true
with socket channels configured non-blocking, not so with channels
configured blocking. The spec for getRemoteAddress is that the remote
address is only returned when the channel is connected but it can be
useful to observe it while a connection is established. I will create a
separate JIRA issue to track this as it's a long standing spec vs.
implementation difference.
As I mentioned, if the connection cannot be established, there is a
small window for other threads to call finishConnect or some other
operation before the channel is closed. This is mostly benign except on
platforms where the connection error isn't available after it has been
obtained. To deal with this requires checkConnect to check for POLLHUP,
which is should have done anyway. While in the area I did some cleanup
on the Windows implementation of checkConnect.
All tests are passing with these changes, including the HTTP client
tests where this issue was initially observed as a test timing out
intermittently.
-Alan
More information about the nio-dev
mailing list