Potential lost write IO notifications in Windows when using virtual threads

Matthew Swift matthew.swift at gmail.com
Thu Feb 12 15:21:37 UTC 2026


Thanks for your quick reply, Robert. I'm quite confident that there's no
race condition here, at least not in the application code (I hope I don't
sound too arrogant). To be honest, I'm not even sure how an
application-layer race condition could explain this behavior. The client
and server are separate processes: the client is blocked in Socket.read()
and the server is blocked in Socket.write(). Logically, at least one of
them should be able to proceed; they cannot both stay blocked reading
from and writing to the exact same TCP connection.
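
To illustrate why, here's a minimal sanity-check sketch of that general
pattern (this is not our server code; the class name and buffer sizes are
mine): a blocking writer and a blocking reader paired over a loopback
connection always make progress as long as the reader keeps draining.

```java
import java.io.InputStream;
import java.io.OutputStream;
import java.net.ServerSocket;
import java.net.Socket;

public class LoopbackProgress {

    // Writes 4 MiB from a "server" thread and drains it with blocking
    // reads on the "client" side; returns the bytes the reader received.
    static long transfer() throws Exception {
        try (ServerSocket listener = new ServerSocket(0)) {
            Thread writer = new Thread(() -> {
                try (Socket s = listener.accept();
                     OutputStream out = s.getOutputStream()) {
                    byte[] chunk = new byte[64 * 1024];
                    for (int i = 0; i < 64; i++) {
                        out.write(chunk); // blocks whenever the socket buffers fill
                    }
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
            });
            writer.start();

            long total = 0;
            try (Socket s = new Socket("127.0.0.1", listener.getLocalPort());
                 InputStream in = s.getInputStream()) {
                byte[] buf = new byte[8192];
                int n;
                while ((n = in.read(buf)) != -1) {
                    total += n; // the reader drains, so the writer can proceed
                }
            }
            writer.join();
            return total;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("received " + transfer() + " bytes");
    }
}
```

The writer deliberately sends far more than the kernel socket buffers can
hold, so its writes block until the reader consumes data; the pair can
never deadlock unless a wakeup is lost underneath them.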

I enabled JFR recording of SocketRead and SocketWrite events in both the
client and server processes, using a threshold of 0 so that every event
is captured. For example:

    java \
      -XX:StartFlightRecording:filename=client.jfr,dumponexit=true,jdk.SocketRead#threshold=0,jdk.SocketWrite#threshold=0 \
      JndiSearch.java

In the case of the JNDI-based client, I can see it receiving the first
30-40 packets and then waiting. Similarly, on the server side, I can see
the equivalent write events before it blocks. The final write fails with
a duration of 2 minutes, once the server's write-timeout thread kicks in
and aborts the connection. I can provide the JFR details if it helps,
but I don't want to bloat this thread unnecessarily.
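
For reference, this is the kind of consumer I used to inspect the
recordings, based on the standard jdk.jfr.consumer API (the default file
name and the output format here are just illustrative):

```java
import java.nio.file.Path;
import jdk.jfr.consumer.RecordedEvent;
import jdk.jfr.consumer.RecordingFile;

public class DumpSocketEvents {
    public static void main(String[] args) throws Exception {
        Path file = Path.of(args.length > 0 ? args[0] : "client.jfr");
        try (RecordingFile recording = new RecordingFile(file)) {
            while (recording.hasMoreEvents()) {
                RecordedEvent e = recording.readEvent();
                String name = e.getEventType().getName();
                // Print only the socket IO events, with their durations
                if (name.equals("jdk.SocketRead") || name.equals("jdk.SocketWrite")) {
                    System.out.printf("%s %s %s:%d dur=%dms%n",
                            e.getStartTime(), name,
                            e.getString("host"), e.getInt("port"),
                            e.getDuration().toMillis());
                }
            }
        }
    }
}
```

Run it as `java DumpSocketEvents client.jfr`; the last event before the
stall stands out by its duration.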

I think Alan's identified the issue in his messages...

On Wed 11 Feb 2026, 17:08 robert engels, <robaho at me.com> wrote:

> I suspect you have a race condition in your server: virtual threads,
> with their lower-cost context switching, often exacerbate this kind of
> problem.
>
> In addition to the Java details I would dump the kernel level queue and
> socket details and add the logging to show exactly which socket is stuck
> along with these details.
>
> I’m not saying that there can’t possibly be a bug, but the inability to
> create a standalone test case seems to say it isn’t a bug.
>
> On Feb 11, 2026, at 10:01 AM, Matthew Swift <matthew.swift at gmail.com>
> wrote:
>
>
> Hi,
>
> I suspect that I've discovered a bug in the IO polling mechanism used by
> virtual threads on Windows. I've been able to reproduce the problem
> systematically on Windows 2016, 2019 and 2022 (Intel x64) using JDK
> 25.0.2, JDK 25.0.3 EA build 01, and JDK 27 EA build 8. The problem
> occurs when the server attempts to send a large response message to the
> client: the server's write operation blocks and is never notified when
> the socket becomes writable again:
>
> #203 "Connection writer LDAPS(connId=13 from=/127.0.0.1:61126 to=/
> 127.0.0.1:1636)" virtual WAITING 2026-02-11T12:15:57.559772800Z
>     at java.base/java.lang.VirtualThread.park(VirtualThread.java:738)
>     at java.base/java.lang.System$1.parkVirtualThread(System.java:2284)
>     at
> java.base/java.util.concurrent.locks.LockSupport.park(LockSupport.java:367)
>     at java.base/sun.nio.ch.Poller.poll(Poller.java:197)
>     at java.base/sun.nio.ch.Poller.poll(Poller.java:144)
>     at java.base/sun.nio.ch.NioSocketImpl.park(NioSocketImpl.java:174)
>     at java.base/sun.nio.ch.NioSocketImpl.park(NioSocketImpl.java:200)
>     at java.base/sun.nio.ch.NioSocketImpl.implWrite(NioSocketImpl.java:420)
>     at java.base/sun.nio.ch.NioSocketImpl.write(NioSocketImpl.java:448)
>     at java.base/sun.nio.ch.NioSocketImpl$2.write(NioSocketImpl.java:821)
>     at
> java.base/java.net.Socket$SocketOutputStream.implWrite(Socket.java:1086)
>     at java.base/java.net.Socket$SocketOutputStream.write(Socket.java:1076)
>     at
> java.base/sun.security.ssl.SSLSocketOutputRecord.deliver(SSLSocketOutputRecord.java:345)
>     at
> java.base/sun.security.ssl.SSLSocketImpl$AppOutputStream.write(SSLSocketImpl.java:1306)
>     ...
>
> The read/write pollers are waiting:
>
> #132 "Read-Poller" RUNNABLE 2026-02-11T12:15:57.558775900Z
>     at java.base/sun.nio.ch.WEPoll.wait(Native Method)
>     at java.base/sun.nio.ch.WEPollPoller.poll(WEPollPoller.java:61)
>     at java.base/sun.nio.ch.Poller.pollerLoop(Poller.java:248)
>     at java.base/java.lang.Thread.run(Thread.java:1474)
>     at
> java.base/jdk.internal.misc.InnocuousThread.run(InnocuousThread.java:148)
>
> #133 "Write-Poller" RUNNABLE 2026-02-11T12:15:57.558775900Z
>     at java.base/sun.nio.ch.WEPoll.wait(Native Method)
>     at java.base/sun.nio.ch.WEPollPoller.poll(WEPollPoller.java:61)
>     at java.base/sun.nio.ch.Poller.pollerLoop(Poller.java:248)
>     at java.base/java.lang.Thread.run(Thread.java:1474)
>     at
> java.base/jdk.internal.misc.InnocuousThread.run(InnocuousThread.java:148)
>
> The jcmd Thread.vthread_pollers command reports a registered read/write
> poller, which is expected AFAIK:
>
>     Read I/O pollers:
>     [0] sun.nio.ch.WEPollPoller@246b483f [registered = 1, owner =
> Thread[#132,Read-Poller,5,InnocuousThreadGroup]]
>
>     Write I/O pollers:
>     [0] sun.nio.ch.WEPollPoller@c523ae0 [registered = 1, owner =
> Thread[#133,Write-Poller,5,InnocuousThreadGroup]]
>
> For completeness, the scheduler's state is:
>
>     java.util.concurrent.ForkJoinPool@484dbf67[Running, parallelism = 4,
> size = 4, active = 0, running = 0, steals = 2164, tasks = 0, submissions =
> 0, delayed = 3]
>
> Meanwhile, the client side is blocked waiting for the remainder of the
> server's response. I have reproduced the problem with several clients:
> some using blocking IO, some using async IO, and one using the JDK's
> JNDI LDAP client. The client always uses platform threads. The problem
> only happens when the server is running on Windows, never on Linux, and
> only when virtual threads are used for the read/write operations. I
> have not been able to reproduce it when the server uses platform
> threads for IO (I have a simple feature flag for switching the thread
> implementation). What's annoying is that, despite several attempts, I
> have been unable to isolate the problem into a standalone reproducible
> test case that I can publish here; it only occurs in our server
> application for some reason. I wonder whether it could be due to the
> server's mixed use of platform and virtual threads for IO in different
> sub-systems (we're in the process of migrating). Could the polling
> mechanism used for virtual threads on Windows be interacting with the
> polling mechanism used for platform threads? I'm just guessing, sadly.
>
> In summary, I strongly suspect that there is a bug in the Windows IO
> poller used for virtual threads, but I have no idea how to debug it
> further. Do you have any suggestions on how to proceed?
>
> Thanks in advance,
> Matt
>
>


More information about the loom-dev mailing list