UNIX domain sockets bug on Windows
Alan Bateman
Alan.Bateman at oracle.com
Thu Nov 10 09:53:57 UTC 2022
I haven't seen any recent issues reported in this area. It's possible
this is related to something installed or some configuration on these
systems. If something were fundamentally broken then I would expect we
would have had many bug reports.
Do you know which releases of Windows this is? At one point we had issues
with 3rd party network stacks that allowed creating AF_UNIX sockets on
Windows Server 2016 but didn't work correctly. This is why the code now
checks that the transport protocol is from MS. I guess I would also try
to find out if they have some security or other config on these systems.
If you can do some debugging on their systems then it would help to see
if there is a JDK issue or not.
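
As a starting point for that debugging, here is a minimal sketch of a
standalone check that could be run on an affected machine. It assumes
JDK 16 or later. The class name UnixLoopbackCheck and its method are made
up for illustration, and this is not the PipeImpl code itself: it only
does a similar bind/connect/accept over an AF_UNIX socket and reports
whatever exception comes back.

```java
import java.io.IOException;
import java.net.StandardProtocolFamily;
import java.net.UnixDomainSocketAddress;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical diagnostic, not part of the JDK.
public class UnixLoopbackCheck {

    // Returns a line starting with "OK" or "FAILED" describing the outcome.
    public static String tryUnixLoopback() {
        Path socketFile = null;
        try (ServerSocketChannel server =
                 ServerSocketChannel.open(StandardProtocolFamily.UNIX)) {
            // bind(null) asks the JDK to pick a socket path in the temp directory.
            server.bind(null);
            UnixDomainSocketAddress addr =
                (UnixDomainSocketAddress) server.getLocalAddress();
            socketFile = addr.getPath();

            // Client side: this is the step that fails in the reported stack trace.
            try (SocketChannel client = SocketChannel.open(addr);
                 SocketChannel accepted = server.accept()) {
                return "OK: connected via " + socketFile;
            }
        } catch (IOException | UnsupportedOperationException e) {
            // UnsupportedOperationException means the JDK considers AF_UNIX unavailable.
            return "FAILED: " + e;
        } finally {
            // The socket file is not removed automatically when the channel closes.
            if (socketFile != null) {
                try {
                    Files.deleteIfExists(socketFile);
                } catch (IOException ignored) {
                    // Best-effort cleanup only.
                }
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(tryUnixLoopback());
    }
}
```

If this fails with the same "Invalid argument: connect" on the affected
machines but succeeds elsewhere, that would point at the system
configuration rather than at HttpClient or the selector code.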
-Alan
On 10/11/2022 09:06, Mike Hearn wrote:
> Hi,
>
> A customer reported that our app doesn't work on most but not all of
> their Windows machines. We cannot reproduce it but the offending line
> is simple:
>
> HttpClient.newBuilder().followRedirects(HttpClient.Redirect.NORMAL).build()
>
> The stack trace shows clearly that this is a bug in either Windows or
> Java. It attempts to open a UNIX domain socket as part of initializing
> the sockets subsystem, and this goes wrong:
>
> Caused by: java.io.IOException: Unable to establish loopback connection
> at java.base/sun.nio.ch.PipeImpl$Initializer.run(PipeImpl.java:101)
> at java.base/sun.nio.ch.PipeImpl$Initializer.run(PipeImpl.java:67)
> at java.base/java.security.AccessController.doPrivileged(AccessController.java:569)
> at java.base/sun.nio.ch.PipeImpl.<init>(PipeImpl.java:195)
> at java.base/sun.nio.ch.WEPollSelectorImpl.<init>(WEPollSelectorImpl.java:78)
> at java.base/sun.nio.ch.WEPollSelectorProvider.openSelector(WEPollSelectorProvider.java:33)
> at java.base/java.nio.channels.Selector.open(Selector.java:295)
> ... 11 more
> Caused by: java.net.SocketException: Invalid argument: connect
> at java.base/sun.nio.ch.UnixDomainSockets.connect0(Native Method)
> at java.base/sun.nio.ch.UnixDomainSockets.connect(UnixDomainSockets.java:148)
> at java.base/sun.nio.ch.UnixDomainSockets.connect(UnixDomainSockets.java:144)
> at java.base/sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:851)
> at java.base/java.nio.channels.SocketChannel.open(SocketChannel.java:285)
> at java.base/sun.nio.ch.PipeImpl$Initializer$LoopbackConnector.run(PipeImpl.java:131)
> at java.base/sun.nio.ch.PipeImpl$Initializer.run(PipeImpl.java:83)
> ... 17 more
>
> In July the change to use a UNIX domain socket in PipeImpl was rolled
> back in JDK17 (JDK-8280233) due to the discovery of a race condition
> in winsock that Microsoft apparently aren't going to fix any time soon
> (or if they are, they won't backport it via Windows Update). But then
> it's mentioned that the rollback is temporary and in JDK-8280944 the
> selector wakeup code (which seems to be what's breaking here) is
> switched to use domain sockets again, because the specific race in
> winsock wouldn't hit that codepath.
>
> This bug seems to be slightly different, as the failure isn't a hang
> but rather always an exception. Therefore:
>
> 1. I am curious if anyone knows what might cause this specific error
> with "Invalid argument: connect"? Is it possible that some Windows
> users have UNIX domain sockets disabled by their admins, for example?
> The machines that fail are joined to a managed Windows network.
>
> 2. From reading the code it appears that issues may have been found
> during the initial implementation, because the pre-rollback code has a
> "noUnixDomainSockets" flag that's set if there's a failure to open or
> bind the server side of the socket, such that it will fall back to
> TCP. But the exception I'm seeing comes from trying to open the client
> side of the pipe (sc1 = SocketChannel.open(sa)), and there's no
> fallback code there. Maybe this code needs to be made more robust
> against UNIX domain sockets not working?
>
> 3. Overall it looks plausible that there are more bugs in Microsoft's
> domain socket implementation which only appear on some machines and
> not others. It may be worth leaving this code disabled and
> continuing with the TCP-based wakeup implementation, or at the very
> least, provide a system property that allows the old codepath to be
> used.
>
> We bundle a lightly forked copy of JDK17 so my plan for now is to
> include commit de739ff6 by Azul which backports the rollback to 17u,
> and see if that works. But if anyone has better insight that would let
> us unfork again, that would be appreciated.
>
> thanks,
> -mike