Potential lost write IO notifications in Windows when using virtual threads
Matthew Swift
matthew.swift at gmail.com
Thu Feb 12 15:22:11 UTC 2026
Thanks for your response, Alan:
> Can you determine if a blocking read is outstanding on the same connection
> on which another thread is blocked on the large write?
Yes, there are two server threads per connection: a reader virtual thread
that loops around reading incoming client requests from the socket using a
blocking Socket.read(); a writer virtual thread which polls responses from
a write queue and writes them to the socket using a blocking
Socket.write(). For additional context, we use a platform thread for
accepting connections. Each connection is handed off to a dedicated reader
which configures the TCP socket and performs any initial blocking
handshaking (e.g. TLS). The reader thread then creates the write queue and
writer thread before waiting for incoming requests. A separate write
timeout thread (also virtual) monitors active connections to see if any
writes have taken too long and aborts the connection if needed. FWIW, I
believe this architecture is very similar to Helidon's.
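For anyone following along, here is a minimal sketch of the threading layout described above. It is not our actual server code: the class and method names, the echo behaviour and the POISON marker are all made up for illustration, and it leaves out the TLS handshake and the write timeout thread. It needs JDK 21 or later for virtual threads.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.Arrays;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical sketch: one reader and one writer virtual thread per connection. */
final class ConnectionSketch {
    // Hypothetical marker telling the writer that no more responses will follow.
    private static final byte[] POISON = new byte[0];

    /** Accept loop, intended to run on a platform thread. */
    static void acceptLoop(ServerSocket server) throws IOException {
        while (!server.isClosed()) {
            Socket socket = server.accept();
            // Hand each connection off to its own reader virtual thread.
            Thread.ofVirtual().name("reader").start(() -> readerLoop(socket));
        }
    }

    /** Reader: creates the write queue and writer thread, then blocks in read(). */
    static void readerLoop(Socket socket) {
        BlockingQueue<byte[]> writeQueue = new LinkedBlockingQueue<>();
        Thread writer = Thread.ofVirtual().name("writer")
                .start(() -> writerLoop(socket, writeQueue));
        try (socket) {
            InputStream in = socket.getInputStream();
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                // Assumption for the sketch: the "response" is just an echo.
                writeQueue.put(Arrays.copyOf(buf, n));
            }
            writeQueue.put(POISON);
            writer.join();
        } catch (IOException | InterruptedException e) {
            writer.interrupt();
        }
    }

    /** Writer: takes responses from the queue and performs blocking writes. */
    static void writerLoop(Socket socket, BlockingQueue<byte[]> writeQueue) {
        try {
            OutputStream out = socket.getOutputStream();
            for (byte[] msg = writeQueue.take(); msg != POISON; msg = writeQueue.take()) {
                // Blocks while the socket send buffer is full; meanwhile the
                // reader may be blocked in read() on the same socket.
                out.write(msg);
            }
            out.flush();
        } catch (IOException | InterruptedException e) {
            // Connection closed or aborted; nothing more to write.
        }
    }
}
```

The point of interest is that a blocking read and a blocking write can be outstanding on the same socket at the same time, each from a different virtual thread.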
> Can you set SO_SNDBUF to a large value to see if that reduces or
> eliminates the sightings?
Yes, it does! As suggested, I set SO_SNDBUF to 2MB on the server and have
been unable to reproduce the problem since. The problem recurs as soon as
I reduce it back down to e.g. 8KB.
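In case it helps anyone else try the same experiment, this is roughly how the send buffer can be requested on an accepted socket. The class and method names are made up for illustration. Note that SO_SNDBUF is only a hint to the OS, so the sketch reads the value back rather than assuming the request was honoured exactly.

```java
import java.io.IOException;
import java.net.Socket;
import java.net.StandardSocketOptions;

/** Hypothetical helper for experimenting with the socket send buffer size. */
final class SendBufferSketch {
    /**
     * Requests a send buffer of the given size and returns the size the
     * platform reports afterwards, which may differ from the request.
     */
    static int requestSendBuffer(Socket socket, int bytes) throws IOException {
        socket.setOption(StandardSocketOptions.SO_SNDBUF, bytes);
        return socket.getOption(StandardSocketOptions.SO_SNDBUF);
    }
}
```

For example, calling requestSendBuffer(socket, 2 * 1024 * 1024) asks for the 2MB setting and requestSendBuffer(socket, 8 * 1024) for the 8KB one.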
The description of JDK-8334574 does indeed look very similar to the
architecture of our server and would explain the behavior we're seeing.
I'll spend some time looking at the reproducer to see if I can understand
how it differs from my futile attempts. :-)
Cheers,
Matt
On Wed, 11 Feb 2026 at 17:27, Alan Bateman <alan.bateman at oracle.com> wrote:
> On 11/02/2026 16:00, Matthew Swift wrote:
> > Hi,
> >
> > I suspect that I've discovered a bug in the IO polling mechanism used
> > by virtual threads on Windows. I've been able to systematically
> > reproduce the problem on Windows 2016, 2019 and 2022 Intel x64 using
> > both JDK 25.0.2 and 25.0.3 EA build 01, and also JDK 27 EA build 8.
> > The problem occurs when the server attempts to send a large response
> > message to the client. The server's write operation becomes blocked
> > and is never notified when the socket becomes writable again:
>
> Can you determine if a blocking read is outstanding on the same connection
> on which another thread is blocked on the large write? Can you set SO_SNDBUF
> to a large value to see if that reduces or eliminates the sightings?
>
> We have been chasing an issue in the Windows Ancillary Function Driver
> for winsock that arises when a Windows SOCKET is used with two different
> completion ports or AFD device handles at the same time. Daniel Jeliński
> has some good analysis here:
> https://github.com/piscisaureus/wepoll/issues/35
>
> We have a prototype poller implementation that works around this issue
> but it has some performance impact (on Windows) so have been slow to
> process it [1].
>
> -Alan
>
> [1] https://bugs.openjdk.org/browse/JDK-8334574