8198928: (so) SocketChannel connect may deadlock if closed at around same time that connect fails

Alan Bateman Alan.Bateman at oracle.com
Sun Mar 4 20:22:19 UTC 2018


This is a deadlock found with one of the HTTP client tests. When 
establishing a connection with a SocketChannel then the channel is 
specified to be closed if the connection cannot be established. If the 
SocketChannel is configured non-blocking and connect (or finishConnect) 
is closing the channel due to a connect failure then it can potentially 
deadlock with another thread that attempts to close the channel at the 
same time. The deadlock arises because connect/finishConnect invoke 
close (closeLock) while holding the read and write locks. Another thread 
closing the channel acquires the clockLock, the actual close 
implementation then attempts to acquire the read and write locks in 
order to coordinate with threads doing non-blocking operations on the 
change.

Historical note: The implementation of connect/finishConnect has always 
called close while holding the read and write locks but it hasn't been 
problem in the past. The recent changes to to separate the blocking and 
non-blocking code paths have changed the locking during close which is 
why we are only seeing this now.

The proposed change is to connect/finishConnect so that these methods 
don't close the channel until after they've released the read and write 
locks. The changes mean there is a window where some other thread might 
call finishConnect or some other operation so a few other small changes 
are needed to deal with that. It would be possible to introduce a 
special state for this case this adds complexity so I've decided against 
that.

The webrev with the proposed changes and test are here:
   http://cr.openjdk.java.net/~alanb/8198928/webrev/index.html

In terms of behavior changes then an observer polling 
isConnectionPending will now observe the connection is pending even if 
the channel is configured blocking. Previously, isConnectionPending 
would never return true if connect was called when configured blocking.

Another observer polling with getRemoteAddress will now observe the 
remote address while the connection is pending. This was always true 
with socket channels configured non-blocking, not so with channels 
configured blocking. The spec for getRemoteAddress is that the remote 
address is only returned when the channel is connected but it can be 
useful to observe it while a connection is established. I will create a 
separate JIRA issue to track this as it's a long standing spec vs. 
implementation difference.

As I mentioned, if the connection cannot be established, there is a 
small window for other threads to call finishConnect or some other 
operation before the channel is closed. This is mostly benign except on 
platforms where the connection error isn't available after it has been 
obtained. To deal with this requires checkConnect to check for POLLHUP, 
which is should have done anyway. While in the area I did some cleanup 
on the Windows implementation of checkConnect.

All tests are passing with these changes, including the HTTP client 
tests where this issue was initially observed as a test timing out 
intermittently.

-Alan



More information about the nio-dev mailing list