<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<p>PipeImpl is able to open and bind the server side of the pipe, and
the step that fails is connecting a client to that server. That makes
security software the most likely explanation. It is definitely not
the case that UNIX domain sockets are somehow disabled on the system:
if they were, the initial creation of the server side would fail, and
it would fall back to TCP at that point.</p>
<p>- Michael.<br>
</p>
<p>On 10/11/2022 13:51, Mike Hearn wrote:<br>
</p>
<blockquote type="cite" cite="mid:CAGv+2bNwg02ZCUVhDd199dDZehgqXO+c1S75BhEY_caTnGTQBQ@mail.gmail.com">
<pre class="moz-quote-pre" wrap="">Thanks. Yes, it's possible it's an issue created by e.g. security
software that's trying to intercept connections and doesn't understand
domain sockets. They reported that it happens on Windows 11 for many
people but the one person who used Windows 10 didn't see the issue.
I'm not sure how to interpret that, as presumably any AV scanner would
be the same, so I wonder if there's some interaction between Windows
11 and something else which isn't showing up more widely because it
requires some other condition to hold true first.
We can't debug on their systems unfortunately, nor reproduce the issue
locally. For now the stack trace is all we've got. I'll ask about
security software. I'm posting here in the hope that someone else
recognizes the failure, or if not, that this thread is useful to
future searchers.
On Thu, 10 Nov 2022 at 10:54, Alan Bateman <a class="moz-txt-link-rfc2396E" href="mailto:Alan.Bateman@oracle.com">&lt;Alan.Bateman@oracle.com&gt;</a> wrote:
</pre>
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">
I haven't seen any recent issues reported in this area. It's possible
this is related to something installed or some configuration on these
systems. If something were fundamentally broken then I would expect we
would have had many bug reports.
Do you know which releases of Windows this is? At one point we had issues
with 3rd party network stacks that allowed creating AF_UNIX sockets on
Windows Server 2016 but didn't work correctly. This is why the code now
checks that the transport protocol is from MS. I guess I would also try
to find out if they have some security or other config on these systems.
If you can do some debugging on their systems then it would help to see
if there is a JDK issue or not.
-Alan
On 10/11/2022 09:06, Mike Hearn wrote:
</pre>
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">Hi,
A customer reported that our app doesn't work on most but not all of
their Windows machines. We cannot reproduce it but the offending line
is simple:
HttpClient.newBuilder().followRedirects(HttpClient.Redirect.NORMAL).build()
The stack trace shows clearly that this is a bug in either Windows or
Java. It attempts to open a UNIX domain socket as part of initializing
the sockets subsystem, and this goes wrong:
Caused by: java.io.IOException: Unable to establish loopback connection
    at java.base/sun.nio.ch.PipeImpl$Initializer.run(PipeImpl.java:101)
    at java.base/sun.nio.ch.PipeImpl$Initializer.run(PipeImpl.java:67)
    at java.base/java.security.AccessController.doPrivileged(AccessController.java:569)
    at java.base/sun.nio.ch.PipeImpl.&lt;init&gt;(PipeImpl.java:195)
    at java.base/sun.nio.ch.WEPollSelectorImpl.&lt;init&gt;(WEPollSelectorImpl.java:78)
    at java.base/sun.nio.ch.WEPollSelectorProvider.openSelector(WEPollSelectorProvider.java:33)
    at java.base/java.nio.channels.Selector.open(Selector.java:295)
    ... 11 more
Caused by: java.net.SocketException: Invalid argument: connect
    at java.base/sun.nio.ch.UnixDomainSockets.connect0(Native Method)
    at java.base/sun.nio.ch.UnixDomainSockets.connect(UnixDomainSockets.java:148)
    at java.base/sun.nio.ch.UnixDomainSockets.connect(UnixDomainSockets.java:144)
    at java.base/sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:851)
    at java.base/java.nio.channels.SocketChannel.open(SocketChannel.java:285)
    at java.base/sun.nio.ch.PipeImpl$Initializer$LoopbackConnector.run(PipeImpl.java:131)
    at java.base/sun.nio.ch.PipeImpl$Initializer.run(PipeImpl.java:83)
    ... 17 more
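A rough way to check an affected machine without our app is to repeat
the bind and connect steps through the public NIO API. This is only a
sketch under assumptions: UnixLoopbackProbe and its method names are
made up for illustration, it needs JDK 16 or later, and it is not the
PipeImpl code itself, which uses JDK internals and its own socket path.

```java
import java.io.IOException;
import java.net.StandardProtocolFamily;
import java.net.UnixDomainSocketAddress;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical diagnostic, not JDK code: bind a UNIX domain server socket,
// then connect a client to it, and report which step fails.
final class UnixLoopbackProbe {

    // Returns "ok", or a short description of the step that failed.
    static String probe() {
        Path dir = null;
        Path file = null;
        try {
            // Assumption: the temp directory path is short enough to fit
            // in a UNIX domain socket address.
            dir = Files.createTempDirectory("probe");
            file = dir.resolve("s.sock");
            UnixDomainSocketAddress addr = UnixDomainSocketAddress.of(file);
            try (ServerSocketChannel server =
                     ServerSocketChannel.open(StandardProtocolFamily.UNIX)) {
                server.bind(addr);
                // Same public call that appears in the second trace above.
                try (SocketChannel client = SocketChannel.open(addr)) {
                    return "ok";
                } catch (IOException e) {
                    return "connect failed: " + e;
                }
            }
        } catch (IOException | UnsupportedOperationException e) {
            return "open/bind failed: " + e;
        } finally {
            // The socket file is not removed automatically on close.
            deleteQuietly(file);
            deleteQuietly(dir);
        }
    }

    // Selector.open() is the public entry point seen in the first trace.
    static String selectorProbe() {
        try (Selector selector = Selector.open()) {
            return "ok";
        } catch (IOException e) {
            return "Selector.open failed: " + e;
        }
    }

    private static void deleteQuietly(Path p) {
        if (p == null) return;
        try {
            Files.deleteIfExists(p);
        } catch (IOException ignored) {
            // best-effort cleanup only
        }
    }

    public static void main(String[] args) {
        System.out.println("UNIX domain loopback: " + probe());
        System.out.println("Selector.open: " + selectorProbe());
    }
}
```

Running it on a failing machine should show whether the bind or the
connect is the step that breaks, independently of HttpClient.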
In July the change to use a UNIX domain socket in PipeImpl was rolled
back in JDK17 (JDK-8280233) due to the discovery of a race condition
in winsock that Microsoft apparently aren't going to fix any time soon
(or if they are, they won't backport it via Windows Update). But then
it's mentioned that the rollback is temporary and in JDK-8280944 the
selector wakeup code (which seems to be what's breaking here) is
switched to use domain sockets again, because the specific race in
winsock wouldn't hit that codepath.
This bug seems to be slightly different, as the failure isn't a hang
but rather always an exception. Therefore:
1. I am curious if anyone knows what might cause this specific error
with "Invalid argument: connect"? Is it possible that some Windows
users have UNIX domain sockets disabled by their admins, for example?
The machines that fail are joined to a managed Windows network.
2. From reading the code it appears that issues may have been found
during the initial implementation, because the pre-rollback code has a
"noUnixDomainSockets" flag that's set if there's a failure to open or
bind the server side of the socket, such that it will fall back to
TCP. But the exception I'm seeing comes from trying to open the client
side of the pipe (sc1 = SocketChannel.open(sa)), and there's no
fallback code there. Maybe this code needs to be made more robust
against UNIX domain sockets not working?
3. Overall it looks plausible that there are more bugs in Microsoft's
domain socket implementation which only appear on some machines and
not others. It may be worth not re-enabling this code and continuing
with the TCP-based wakeup implementation, or at the very least
providing a system property that allows the old codepath to be
used.
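To make point 2 concrete, here is roughly the kind of fallback I have
in mind. It is a sketch only: LoopbackPair and its methods are
hypothetical names, it uses the public NIO API rather than the JDK
internals, and it leaves out any verification of the accepted
connection that a production implementation would want on the TCP path.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.SocketAddress;
import java.net.StandardProtocolFamily;
import java.net.UnixDomainSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch, not the PipeImpl code: build a connected loopback
// pair and fall back to TCP if ANY UNIX domain step fails, connect included.
final class LoopbackPair implements AutoCloseable {
    final SocketChannel source;  // accepted (server side) end
    final SocketChannel sink;    // connecting (client side) end
    final boolean unixDomain;    // true if the UNIX domain attempt succeeded

    private LoopbackPair(SocketChannel source, SocketChannel sink, boolean unixDomain) {
        this.source = source;
        this.sink = sink;
        this.unixDomain = unixDomain;
    }

    static LoopbackPair open() throws IOException {
        try {
            return connect(true);
        } catch (IOException | UnsupportedOperationException e) {
            // Open, bind and connect failures all end up here.
            return connect(false);
        }
    }

    private static LoopbackPair connect(boolean unix) throws IOException {
        Path file = null;
        try (ServerSocketChannel server = unix
                ? ServerSocketChannel.open(StandardProtocolFamily.UNIX)
                : ServerSocketChannel.open()) {
            SocketAddress addr;
            if (unix) {
                file = Files.createTempDirectory("pipe").resolve("s.sock");
                addr = UnixDomainSocketAddress.of(file);
            } else {
                // Port 0 asks the OS for any free loopback port.
                addr = new InetSocketAddress(InetAddress.getLoopbackAddress(), 0);
            }
            server.bind(addr);
            SocketChannel sink = SocketChannel.open(server.getLocalAddress());
            try {
                return new LoopbackPair(server.accept(), sink, unix);
            } catch (IOException e) {
                sink.close();
                throw e;
            }
        } finally {
            if (file != null) {
                try {
                    // The socket file is not removed automatically on close.
                    Files.deleteIfExists(file);
                    Files.deleteIfExists(file.getParent());
                } catch (IOException ignored) {
                    // best-effort cleanup only
                }
            }
        }
    }

    // Sends one byte through a fresh pair; true if it arrives intact.
    static boolean roundTrip() {
        try (LoopbackPair pair = open()) {
            pair.sink.write(ByteBuffer.wrap(new byte[] {42}));
            ByteBuffer in = ByteBuffer.allocate(1);
            pair.source.read(in);
            return in.get(0) == 42;
        } catch (IOException e) {
            return false;
        }
    }

    @Override
    public void close() throws IOException {
        try {
            sink.close();
        } finally {
            source.close();
        }
    }

    public static void main(String[] args) throws IOException {
        try (LoopbackPair pair = open()) {
            System.out.println(pair.unixDomain ? "UNIX domain" : "TCP fallback");
        }
        System.out.println("round trip ok: " + roundTrip());
    }
}
```

The point of the design is that the catch sits around the whole UNIX
domain attempt, so a failing connect is treated the same way as a
failing bind.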
We bundle a lightly forked copy of JDK17 so my plan for now is to
include commit de739ff6 by Azul which backports the rollback to 17u,
and see if that works. But if anyone has better insight that would let
us unfork again, that would be appreciated.
thanks,
-mike
</pre>
</blockquote>
<pre class="moz-quote-pre" wrap="">
</pre>
</blockquote>
</blockquote>
</body>
</html>