UNIX domain sockets bug on Windows

Mike Hearn mike at hydraulic.software
Wed Mar 8 13:01:12 UTC 2023


Thanks Michael, that's been a very useful suggestion. After I mailed,
we discovered a VM that could repro the problem. This let us check the
theory that it's related to file system virtualization, and indeed it
is; disabling it fixes the issue. It is still worth working around it
in the JVM because not everyone will see this thread, and apps
distributed via the MS Store using their standard instructions will
hit it.

To readers from the future! You have two ways to fix this if you're
using a JVM/Windows build with the issue. One way is to add the
following to the appropriate sections of your appx manifest:

<Properties>
  <virtualization:FileSystemWriteVirtualization>
    <virtualization:ExcludedDirectories>
      <virtualization:ExcludedDirectory>$(KnownFolder:LocalAppData)\Temp\JavaSockets</virtualization:ExcludedDirectory>
    </virtualization:ExcludedDirectories>
  </virtualization:FileSystemWriteVirtualization>
</Properties>
<Capabilities>
  <rescap:Capability Name="unvirtualizedResources" />
</Capabilities>

and then, on Windows, set the system property jdk.net.unixdomain.tmpdir
to "%LOCALAPPDATA%/Temp/JavaSockets".
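
One detail worth spelling out: the JVM itself does not expand
%LOCALAPPDATA% inside a system property value, so whatever launches the
JVM has to resolve it first. Below is a minimal sketch of a launcher
helper that does this. The class and method names are hypothetical, and
creating the directory up front is an assumption (this thread does not
say whether the JDK creates it for you).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class SocketTmpDir {
    // Property name taken from this thread.
    static final String PROPERTY = "jdk.net.unixdomain.tmpdir";

    // Resolves <LocalAppData>\Temp\JavaSockets, matching the manifest exclusion above.
    static Path socketDir(String localAppData) {
        return Path.of(localAppData, "Temp", "JavaSockets");
    }

    // Creates the directory (assumption: it may not exist yet) and returns
    // the -D option to pass to the JVM that runs the application.
    static String jvmOption(String localAppData) throws IOException {
        Path dir = socketDir(localAppData);
        Files.createDirectories(dir);
        return "-D" + PROPERTY + "=" + dir;
    }

    public static void main(String[] args) throws IOException {
        String localAppData = System.getenv("LOCALAPPDATA");
        if (localAppData == null) {
            System.err.println("LOCALAPPDATA is not set; not running on Windows?");
            return;
        }
        System.out.println(jvmOption(localAppData));
    }
}
```

Passing the option on the command line is the safe route. Calling
System.setProperty from inside the app is only likely to work if it
happens before any NIO selector or pipe has been created.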

Or, a better way: head over to https://hydraulic.software/ and download
Conveyor, which works around this bug and (sadly) several other Windows
bugs for you.

On Wed, 8 Mar 2023 at 12:35, Michael McMahon
<michael.x.mcmahon at oracle.com> wrote:
>
> If this filter is doing filename remapping, then one thing worth trying is to run with the system property jdk.net.unixdomain.tmpdir set to some alternate location. This specifies the location of the directory where we create the socket files for "automatically" bound sockets for ServerSocketChannel (i.e. when binding with a null address). This is what the NIO PipeImpl does when creating its loopback socket.
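
To check which directory an automatically bound socket actually lands
in (and so whether the property took effect), something like the
following sketch may help. The class and method names are hypothetical;
it only uses the public ServerSocketChannel API and binds with a null
address as described above.

```java
import java.io.IOException;
import java.net.StandardProtocolFamily;
import java.net.UnixDomainSocketAddress;
import java.nio.channels.ServerSocketChannel;
import java.nio.file.Files;
import java.nio.file.Path;

public class AutoBindLocation {
    // Binds a UNIX domain server socket with a null address and returns
    // the path of the socket file the JDK chose for it.
    static Path autoBoundPath() throws IOException {
        Path socketFile;
        try (ServerSocketChannel server =
                 ServerSocketChannel.open(StandardProtocolFamily.UNIX)) {
            server.bind(null); // automatic bind
            socketFile = ((UnixDomainSocketAddress) server.getLocalAddress()).getPath();
        }
        // Closing the channel does not remove the socket file, so clean up.
        Files.deleteIfExists(socketFile);
        return socketFile;
    }

    public static void main(String[] args) throws IOException {
        System.out.println("automatic bind used: " + autoBoundPath());
    }
}
```

Running it once with and once without -Djdk.net.unixdomain.tmpdir=...
should show whether the parent directory changes.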
>
> Alternatively, you could try asking your customer to run a simple test of UNIX domain sockets on an affected machine and see what happens. Eg. the following snippet:
>
> var server = ServerSocketChannel.open(StandardProtocolFamily.UNIX);
> var name = Path.of("foo.sock");
> Files.deleteIfExists(name);
> server.bind(UnixDomainSocketAddress.of(name));
> var client = SocketChannel.open(server.getLocalAddress());
> var s1 = server.accept();
> System.out.printf("client: %s, server connection: %s\n", client.toString(), s1.toString());
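
For handing to a customer, the snippet above can be wrapped into a
standalone program. This is a sketch of the same test with imports,
resource cleanup and an optional path argument added; the class name
and the roundTrip helper are hypothetical. It needs Java 16 or later.

```java
import java.io.IOException;
import java.net.StandardProtocolFamily;
import java.net.UnixDomainSocketAddress;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.file.Files;
import java.nio.file.Path;

public class UnixSocketSmokeTest {
    // Binds a UNIX domain server socket at the given path, connects a
    // client to it, accepts the connection and describes both ends.
    static String roundTrip(Path name) throws IOException {
        Files.deleteIfExists(name);
        try (var server = ServerSocketChannel.open(StandardProtocolFamily.UNIX)) {
            server.bind(UnixDomainSocketAddress.of(name));
            try (var client = SocketChannel.open(server.getLocalAddress());
                 var s1 = server.accept()) {
                return String.format("client: %s, server connection: %s", client, s1);
            }
        } finally {
            // Runs after the channels are closed; removes the socket file.
            Files.deleteIfExists(name);
        }
    }

    public static void main(String[] args) throws IOException {
        Path name = Path.of(args.length > 0 ? args[0] : "foo.sock");
        System.out.println(roundTrip(name));
    }
}
```

Run it from inside the packaged app's environment where possible, since
the thread suggests the same code behaves differently when launched
from an ordinary zip.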
>
> - Michael
>
> On 08/03/2023 10:14, Mike Hearn wrote:
>
> Unfortunately I need to resurrect this thread. Although in November we
> worked around the issue by patching the JVM we ship with our
> applications, now the apps being shipped by one of our customers are
> hitting the exact same thing. Obviously telling them to make their own
> JVM builds with patches is not a workable solution here, nor really
> for anyone else.
>
> Here's what we know:
>
> 1. The problem occurs only on some machines but not others (but the
> machines affected are affected reliably).
>
> 2. For unclear reasons on those machines the problem only occurs if
> the app is installed using an MSIX package (the type used by the
> Windows Store). If the app is run from an ordinary zip, network
> connections function correctly. This is probably why you find it hard
> to reproduce. I can supply such an app if it would help with testing.
>
> 3. We don't think it's security software related because the affected
> machines don't use any beyond the built-in Windows Defender (or put
> another way, if it is related, then that's the same as it being a
> Windows bug).
>
> Here's my guess as to what might be happening. Apps that are installed
> via the Windows Store or MSIX outside the store run using a
> lightweight form of file system virtualization. This is how Windows
> links files created in %APPDATA% to the app that created them,
> ensuring they get cleaned up correctly at uninstall time.
>
> https://learn.microsoft.com/en-us/windows/msix/desktop/desktop-to-uwp-behind-the-scenes
>
> The implementation involves a kernel driver called bindflt. In theory
> this virtualization doesn't affect temporary directories, but the
> combination of UNIX domain sockets+bindflt is probably not very common
> at the moment, and it may be the case that the Windows team have
> shipped a regression in some versions. Now if that theory is correct,
> it doesn't really explain why some machines are affected and others
> aren't when they all appear to be running the same OS version. That is
> mysterious and I can't explain it, which counts against the bindflt
> theory. An alternative possibility is that it's another race in
> winsock; you previously had to do rollbacks due to the discovery of
> such races which implies there might be more. But then we face the
> question of why the packaging type appears to change the outcome on
> the machines that do experience the bug.
>
> Regardless of the cause the result is the same: new Javas can't be
> used for apps distributed in this way. The best workaround would be to
> detect that this codepath is failing with the stack trace in the first
> email in this thread and fall back to the old TCP codepath. Is there
> any chance you'd consider such a change? Also if anyone at Oracle has
> contacts on the Windows team, it'd be great to ask them to investigate
> the issue. I don't have any such contacts :(
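
To make the proposed fallback concrete, here is an application-level
sketch of the idea: probe a UNIX domain round trip and use loopback TCP
if it fails. This is not the JDK's PipeImpl code, the class and method
names are hypothetical, and since the stack trace from the first email
is not reproduced here, the probe simply treats any IOException during
bind, connect or accept as "not usable".

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.StandardProtocolFamily;
import java.net.UnixDomainSocketAddress;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.file.Files;
import java.nio.file.Path;

public class LoopbackServerFallback {
    // Returns true if bind/connect/accept over an automatically bound
    // UNIX domain socket succeeds on this machine.
    static boolean unixSocketsUsable() {
        Path socketFile = null;
        try (ServerSocketChannel server =
                 ServerSocketChannel.open(StandardProtocolFamily.UNIX)) {
            server.bind(null);
            socketFile = ((UnixDomainSocketAddress) server.getLocalAddress()).getPath();
            try (SocketChannel client = SocketChannel.open(server.getLocalAddress());
                 SocketChannel accepted = server.accept()) {
                return client.isConnected() && accepted.isConnected();
            }
        } catch (IOException | UnsupportedOperationException e) {
            return false; // failing or unsupported: caller should use TCP
        } finally {
            if (socketFile != null) {
                try {
                    Files.deleteIfExists(socketFile);
                } catch (IOException ignored) {
                    // best-effort cleanup of the probe's socket file
                }
            }
        }
    }

    // Opens a bound server channel, preferring UNIX domain sockets and
    // falling back to loopback TCP on an ephemeral port. For the UNIX
    // case the caller should delete the socket file after closing.
    static ServerSocketChannel openLoopbackServer() throws IOException {
        if (unixSocketsUsable()) {
            ServerSocketChannel unix = ServerSocketChannel.open(StandardProtocolFamily.UNIX);
            unix.bind(null);
            return unix;
        }
        ServerSocketChannel tcp = ServerSocketChannel.open();
        tcp.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
        return tcp;
    }
}
```

Clients can connect with SocketChannel.open(server.getLocalAddress())
in either case, because that method accepts both address types.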
>
> thanks,
> -mike


More information about the nio-dev mailing list