UNIX domain sockets bug on Windows

Mike Hearn mike at hydraulic.software
Thu Nov 10 09:06:02 UTC 2022


Hi,

A customer reported that our app doesn't work on most, but not all, of
their Windows machines. We cannot reproduce it, but the offending line
is simple:

   HttpClient.newBuilder().followRedirects(HttpClient.Redirect.NORMAL).build()

The stack trace shows clearly that this is a bug in either Windows or
Java. While opening a NIO Selector (specifically, the wakeup pipe that
WEPollSelectorImpl creates via PipeImpl), the JDK attempts to open a
UNIX domain socket, and this goes wrong:

  Caused by: java.io.IOException: Unable to establish loopback connection
    at java.base/sun.nio.ch.PipeImpl$Initializer.run(PipeImpl.java:101)
    at java.base/sun.nio.ch.PipeImpl$Initializer.run(PipeImpl.java:67)
    at java.base/java.security.AccessController.doPrivileged(AccessController.java:569)
    at java.base/sun.nio.ch.PipeImpl.<init>(PipeImpl.java:195)
    at java.base/sun.nio.ch.WEPollSelectorImpl.<init>(WEPollSelectorImpl.java:78)
    at java.base/sun.nio.ch.WEPollSelectorProvider.openSelector(WEPollSelectorProvider.java:33)
    at java.base/java.nio.channels.Selector.open(Selector.java:295)
    ... 11 more
  Caused by: java.net.SocketException: Invalid argument: connect
    at java.base/sun.nio.ch.UnixDomainSockets.connect0(Native Method)
    at java.base/sun.nio.ch.UnixDomainSockets.connect(UnixDomainSockets.java:148)
    at java.base/sun.nio.ch.UnixDomainSockets.connect(UnixDomainSockets.java:144)
    at java.base/sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:851)
    at java.base/java.nio.channels.SocketChannel.open(SocketChannel.java:285)
    at java.base/sun.nio.ch.PipeImpl$Initializer$LoopbackConnector.run(PipeImpl.java:131)
    at java.base/sun.nio.ch.PipeImpl$Initializer.run(PipeImpl.java:83)
    ... 17 more
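
For anyone who wants to check a machine without our app, a minimal
probe might look like the following. The class and method names are
hypothetical. It calls Selector.open(), the entry point at the bottom
of the trace, and Pipe.open(), which I'm assuming goes through the same
Windows PipeImpl code, and prints the full cause chain of any failure.

```java
import java.io.IOException;
import java.nio.channels.Pipe;
import java.nio.channels.Selector;
import java.util.ArrayList;
import java.util.List;

public class SelectorProbe {

    /** Opens and closes a Selector, the call at the bottom of the trace above. */
    static String probeSelector() {
        try (Selector selector = Selector.open()) {
            return "OK (" + selector.getClass().getName() + ")";
        } catch (IOException e) {
            return "FAILED " + causeChain(e);
        }
    }

    /** Opens and closes a Pipe; assumed to reach the same Windows PipeImpl code. */
    static String probePipe() {
        try {
            Pipe pipe = Pipe.open();
            try (Pipe.SinkChannel sink = pipe.sink(); Pipe.SourceChannel source = pipe.source()) {
                return "OK";
            }
        } catch (IOException e) {
            return "FAILED " + causeChain(e);
        }
    }

    /** Collects an exception and all of its causes, outermost first. */
    static List<String> causeChain(Throwable t) {
        List<String> chain = new ArrayList<>();
        for (Throwable c = t; c != null; c = c.getCause()) {
            chain.add(c.toString());
        }
        return chain;
    }

    public static void main(String[] args) {
        System.out.println("Selector.open(): " + probeSelector());
        System.out.println("Pipe.open():     " + probePipe());
    }
}
```

It can be run directly with "java SelectorProbe.java" (single-file
launch, JDK 11+) on an affected machine to see whether the error
reproduces outside the app.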

In July the change to use a UNIX domain socket in PipeImpl was rolled
back in JDK17 (JDK-8280233) after the discovery of a race condition
in Winsock that Microsoft apparently aren't going to fix any time soon
(or if they do, they won't backport the fix via Windows Update). But the
rollback is described as temporary, and in JDK-8280944 the selector
wakeup code (which seems to be what's breaking here) is switched back
to domain sockets, because that codepath wouldn't hit the specific
Winsock race.

This bug seems to be slightly different, as the failure isn't a hang
but rather always an exception. Therefore:

1. Does anyone know what might cause this specific "Invalid argument:
connect" error? Is it possible that some Windows users have UNIX domain
sockets disabled by their admins, for example? The machines that fail
are joined to a managed Windows network.

2. From reading the code, it appears that problems may have been found
during the initial implementation: the pre-rollback code has a
"noUnixDomainSockets" flag that is set if opening or binding the server
side of the socket fails, in which case it falls back to TCP. But the
exception I'm seeing comes from opening the client side of the pipe
(sc1 = SocketChannel.open(sa)), and there's no fallback there. Maybe
this code needs to be made more robust against UNIX domain sockets not
working?

3. Overall, it looks plausible that there are more bugs in Microsoft's
domain socket implementation that only appear on some machines. It may
be worth not re-enabling this code and continuing with the TCP-based
wakeup implementation or, at the very least, providing a system
property that allows the old codepath to be used.
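
To make points 2 and 3 concrete, here is a rough sketch of the kind of
fallback I mean, written against the public java.nio API (JDK 16+)
rather than the JDK's internal PipeImpl. It tries a UNIX domain socket
loopback pair first and falls back to TCP loopback if any step fails,
including the client-side connect. The class name, the Pair record and
the "example.loopback.forceTcp" property are all hypothetical, for
illustration only, and it doesn't attempt to reproduce everything
PipeImpl does.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.StandardProtocolFamily;
import java.net.UnixDomainSocketAddress;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.file.Files;
import java.nio.file.Path;

public class LoopbackPair {

    // Hypothetical system property, for illustration only (not a real JDK flag).
    static final String FORCE_TCP_PROPERTY = "example.loopback.forceTcp";

    /** Both connected ends of the loopback connection, plus the transport used. */
    record Pair(SocketChannel client, SocketChannel server, String transport)
            implements AutoCloseable {
        @Override
        public void close() throws IOException {
            try { client.close(); } finally { server.close(); }
        }
    }

    /** Tries a UNIX domain socket pair first and falls back to TCP on any failure. */
    static Pair open() throws IOException {
        Exception unixFailure = null;
        if (!Boolean.getBoolean(FORCE_TCP_PROPERTY)) {
            try {
                return openUnix();
            } catch (IOException | UnsupportedOperationException e) {
                // Unlike a server-side-only check, this also covers the client connect.
                unixFailure = e;
            }
        }
        try {
            return openTcp();
        } catch (IOException e) {
            if (unixFailure != null) e.addSuppressed(unixFailure);
            throw e;
        }
    }

    static Pair openUnix() throws IOException {
        Path dir = Files.createTempDirectory("loopback");
        Path socketPath = dir.resolve("s");
        UnixDomainSocketAddress address = UnixDomainSocketAddress.of(socketPath);
        try (ServerSocketChannel listener = ServerSocketChannel.open(StandardProtocolFamily.UNIX)) {
            listener.bind(address);
            // Mirrors the client-side connect that fails with "Invalid argument" in the trace.
            SocketChannel client = SocketChannel.open(address);
            try {
                return new Pair(client, listener.accept(), "unix");
            } catch (IOException e) {
                client.close();
                throw e;
            }
        } finally {
            // The listener is closed by now; the connected pair stays usable.
            deleteQuietly(socketPath);
            deleteQuietly(dir);
        }
    }

    static Pair openTcp() throws IOException {
        try (ServerSocketChannel listener = ServerSocketChannel.open()) {
            listener.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            SocketChannel client = SocketChannel.open(listener.getLocalAddress());
            try {
                return new Pair(client, listener.accept(), "tcp");
            } catch (IOException e) {
                client.close();
                throw e;
            }
        }
    }

    /** Best-effort cleanup of the socket file and its temporary directory. */
    static void deleteQuietly(Path p) {
        try { Files.deleteIfExists(p); } catch (IOException ignored) { }
    }

    public static void main(String[] args) throws IOException {
        try (Pair pair = open()) {
            System.out.println("loopback transport: " + pair.transport());
        }
    }
}
```

Running it with -Dexample.loopback.forceTcp=true forces the TCP path,
which is roughly the escape hatch suggested in point 3. On an affected
machine, the printed transport would also show whether the domain
socket attempt failed, which might help with question 1.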

We bundle a lightly forked copy of JDK17, so my plan for now is to
include commit de739ff6 by Azul, which backports the rollback to 17u,
and see if that works. But if anyone has better insight that would let
us unfork again, that would be appreciated.

thanks,
-mike


More information about the nio-dev mailing list