UNIX domain sockets bug on Windows
Michael McMahon
michael.x.mcmahon at oracle.com
Thu Nov 10 14:50:27 UTC 2022
Given that PipeImpl is able to open and bind the server side of the pipe,
and the step that fails is opening a client connection to the server,
security software is the most likely explanation. It is definitely not
the case that UNIX domain sockets are somehow disabled on the system: if
they were, the initial creation of the server side would fail, and the
code would fall back to TCP at that point.
- Michael.
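
The distinction drawn in the reply above (the server side opens and binds, but the client connect fails) could be checked on an affected machine with a small standalone probe. The following is only a sketch using the public java.nio API (Java 16 or later), not the internal PipeImpl code; the class name UnixSocketProbe, the step labels, and the use of a temporary file as the socket path are assumptions made for illustration.

```java
import java.io.IOException;
import java.net.StandardProtocolFamily;
import java.net.UnixDomainSocketAddress;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical diagnostic: reports which step of a UNIX domain loopback
// connection fails. It approximates, but is not, what PipeImpl does.
public class UnixSocketProbe {

    /** Returns "ok", or a message naming the step that failed. */
    public static String probe() {
        Path socketFile = null;
        String step = "create temp path";
        try {
            // Reserve a unique name, then delete it: bind() must create the file itself.
            socketFile = Files.createTempFile("probe", ".sock");
            Files.delete(socketFile);
            UnixDomainSocketAddress address = UnixDomainSocketAddress.of(socketFile);

            step = "open server";
            try (ServerSocketChannel server =
                     ServerSocketChannel.open(StandardProtocolFamily.UNIX)) {
                step = "bind server";
                server.bind(address);

                // This is the step that fails in the reported stack trace.
                step = "connect client";
                try (SocketChannel client = SocketChannel.open(address)) {
                    step = "accept";
                    try (SocketChannel accepted = server.accept()) {
                        step = "close";
                    }
                }
            }
            return "ok";
        } catch (IOException | UnsupportedOperationException e) {
            return "failed at '" + step + "': " + e;
        } finally {
            if (socketFile != null) {
                try {
                    Files.deleteIfExists(socketFile);
                } catch (IOException ignored) {
                    // Best-effort cleanup only.
                }
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(probe());
    }
}
```

A result that names "connect client" while the open and bind steps succeed would match the pattern described above; a failure at "open server" or "bind server" would instead point at UNIX domain sockets being unavailable.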
On 10/11/2022 13:51, Mike Hearn wrote:
> Thanks. Yes, it's possible it's an issue created by e.g. security
> software that's trying to intercept connections and doesn't understand
> domain sockets. They reported that it happens on Windows 11 for many
> people but the one person who used Windows 10 didn't see the issue.
> I'm not sure how to interpret that, as presumably any AV scanner would
> be the same, so I wonder if there's some interaction between Windows
> 11 and something else which isn't showing up more widely because it
> requires some other condition to hold true first.
>
> We can't debug on their systems unfortunately, nor reproduce the issue
> locally. For now the stack trace is all we've got. I'll ask about
> security software. I'm posting here in the hope that someone else
> recognizes the failure, or if not, that this thread is useful to
> future searchers.
>
>
> On Thu, 10 Nov 2022 at 10:54, Alan Bateman <Alan.Bateman at oracle.com> wrote:
>>
>> I haven't seen any recent issues reported in this area. It's possible
>> this is related to something installed or some configuration on these
>> systems. If something were fundamentally broken then I would expect we
>> would have had many bug reports.
>>
>> Do you know which releases of Windows this is? At one point we had issues
>> with 3rd party network stacks that allowed creating AF_UNIX sockets on
>> Windows Server 2016 but didn't work correctly. This is why the code now
>> checks that the transport protocol is from MS. I guess I would also try
>> to find out if they have some security or other config on these systems.
>> If you can do some debugging on their systems then it would help to see
>> if there is a JDK issue or not.
>>
>> -Alan
>>
>> On 10/11/2022 09:06, Mike Hearn wrote:
>>> Hi,
>>>
>>> A customer reported that our app doesn't work on most but not all of
>>> their Windows machines. We cannot reproduce it but the offending line
>>> is simple:
>>>
>>> HttpClient.newBuilder().followRedirects(HttpClient.Redirect.NORMAL).build()
>>>
>>> The stack trace shows clearly that this is a bug in either Windows or
>>> Java. It attempts to open a UNIX domain socket as part of initializing
>>> the sockets subsystem, and this goes wrong:
>>>
>>> Caused by: java.io.IOException: Unable to establish loopback connection
>>> at java.base/sun.nio.ch.PipeImpl$Initializer.run(PipeImpl.java:101)
>>> at java.base/sun.nio.ch.PipeImpl$Initializer.run(PipeImpl.java:67)
>>> at java.base/java.security.AccessController.doPrivileged(AccessController.java:569)
>>> at java.base/sun.nio.ch.PipeImpl.<init>(PipeImpl.java:195)
>>> at java.base/sun.nio.ch.WEPollSelectorImpl.<init>(WEPollSelectorImpl.java:78)
>>> at java.base/sun.nio.ch.WEPollSelectorProvider.openSelector(WEPollSelectorProvider.java:33)
>>> at java.base/java.nio.channels.Selector.open(Selector.java:295)
>>> ... 11 more
>>> Caused by: java.net.SocketException: Invalid argument: connect
>>> at java.base/sun.nio.ch.UnixDomainSockets.connect0(Native Method)
>>> at java.base/sun.nio.ch.UnixDomainSockets.connect(UnixDomainSockets.java:148)
>>> at java.base/sun.nio.ch.UnixDomainSockets.connect(UnixDomainSockets.java:144)
>>> at java.base/sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:851)
>>> at java.base/java.nio.channels.SocketChannel.open(SocketChannel.java:285)
>>> at java.base/sun.nio.ch.PipeImpl$Initializer$LoopbackConnector.run(PipeImpl.java:131)
>>> at java.base/sun.nio.ch.PipeImpl$Initializer.run(PipeImpl.java:83)
>>> ... 17 more
>>>
>>> In July the change to use a UNIX domain socket in PipeImpl was rolled
>>> back in JDK17 (JDK-8280233) due to the discovery of a race condition
>>> in winsock that Microsoft apparently aren't going to fix any time soon
>>> (or if they are, they won't backport it via Windows Update). But then
>>> it's mentioned that the rollback is temporary and in JDK-8280944 the
>>> selector wakeup code (which seems to be what's breaking here) is
>>> switched to use domain sockets again, because the specific race in
>>> winsock wouldn't hit that codepath.
>>>
>>> This bug seems to be slightly different, as the failure isn't a hang
>>> but rather always an exception. Therefore:
>>>
>>> 1. I am curious if anyone knows what might cause this specific error
>>> with "Invalid argument: connect"? Is it possible that some Windows
>>> users have UNIX domain sockets disabled by their admins, for example?
>>> The machines that fail are joined to a managed Windows network.
>>>
>>> 2. From reading the code it appears that issues may have been found
>>> during the initial implementation, because the pre-rollback code has a
>>> "noUnixDomainSockets" flag that's set if there's a failure to open or
>>> bind the server side of the socket, such that it will fall back to
>>> TCP. But the exception I'm seeing comes from trying to open the client
>>> side of the pipe (sc1 = SocketChannel.open(sa)), and there's no
>>> fallback code there. Maybe this code needs to be made more robust
>>> against UNIX domain sockets not working?
>>>
>>> 3. Overall it looks plausible that there are more bugs in Microsoft's
>>> domain socket implementation which only appear on some machines and
>>> not others. It's worth considering not re-enabling this code and
>>> continuing with the TCP based wakeup implementation, or at the very
>>> least, provide a system property that allows the old codepath to be
>>> used.
>>>
>>> We bundle a lightly forked copy of JDK17 so my plan for now is to
>>> include commit de739ff6 by Azul which backports the rollback to 17u,
>>> and see if that works. But if anyone has better insight that would let
>>> us unfork again, that would be appreciated.
>>>
>>> thanks,
>>> -mike
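
Points 2 and 3 of the original report (no fallback when the client side fails, and a property to force the old path) can be illustrated with a sketch like the one below. It is not the JDK's PipeImpl and not a proposed patch: the class LoopbackPair and the system property name example.loopback.forceTcp are hypothetical, and the sketch skips any check that the accepted connection really came from its own client.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.StandardProtocolFamily;
import java.net.UnixDomainSocketAddress;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical connected pair of channels: tries a UNIX domain socket first
// and falls back to TCP loopback if ANY step fails, including the connect.
public class LoopbackPair {
    public final SocketChannel client;
    public final SocketChannel server;
    public final boolean usedUnixDomain;

    private LoopbackPair(SocketChannel client, SocketChannel server, boolean unix) {
        this.client = client;
        this.server = server;
        this.usedUnixDomain = unix;
    }

    public static LoopbackPair open() throws IOException {
        // "example.loopback.forceTcp" is an invented property name, not a JDK one.
        if (!Boolean.getBoolean("example.loopback.forceTcp")) {
            try {
                return connectUnix();
            } catch (IOException | UnsupportedOperationException e) {
                // Open, bind, connect, or accept failed: fall through to TCP.
            }
        }
        return connectTcp();
    }

    private static LoopbackPair connectUnix() throws IOException {
        Path file = Files.createTempFile("loopback", ".sock");
        Files.delete(file); // bind() must create the socket file itself
        try (ServerSocketChannel listener =
                 ServerSocketChannel.open(StandardProtocolFamily.UNIX)) {
            listener.bind(UnixDomainSocketAddress.of(file));
            return connectPair(listener, true);
        } finally {
            // Runs after the listener is closed; the pair stays connected.
            Files.deleteIfExists(file);
        }
    }

    private static LoopbackPair connectTcp() throws IOException {
        try (ServerSocketChannel listener = ServerSocketChannel.open()) {
            // Port 0 asks the OS for any free port on the loopback interface.
            listener.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            return connectPair(listener, false);
        }
    }

    private static LoopbackPair connectPair(ServerSocketChannel listener, boolean unix)
            throws IOException {
        SocketChannel client = SocketChannel.open(listener.getLocalAddress());
        try {
            SocketChannel server = listener.accept();
            return new LoopbackPair(client, server, unix);
        } catch (IOException e) {
            client.close();
            throw e;
        }
    }

    public void close() throws IOException {
        try {
            client.close();
        } finally {
            server.close();
        }
    }
}
```

The design choice here is that the fallback wraps the whole UNIX domain attempt, so a connect failure such as "Invalid argument: connect" would lead to the TCP path rather than an exception reaching the caller. Running with -Dexample.loopback.forceTcp=true skips the UNIX domain attempt entirely.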