Performance Issues with Virtual Threads + ThreadLocal Caching in Third-Party Libraries (JDK 25)
Jianbin Chen
jianbin at apache.org
Fri Jan 23 12:47:37 UTC 2026
Hi Robert,
Thank you, but I'm a bit confused. In the example above, I only set the
core pool size to 200 virtual threads, but for the specific test case we're
talking about, the concurrency isn't actually limited by the pool size at
all. Since the maximum thread count is Integer.MAX_VALUE and the pool uses
a SynchronousQueue, tasks are handed off immediately and a new virtual
thread is created to run each of them right away anyway.
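For anyone following along, that handoff behavior is easy to demonstrate in
isolation. Below is a minimal sketch (the class and method names are mine):
with a SynchronousQueue and an effectively unbounded maximum pool size, a
ThreadPoolExecutor creates a new thread for every task once the core threads
are busy, instead of queueing:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class HandoffDemo {
    // Submits 'tasks' blocking tasks to an executor with 'core' core threads
    // and a SynchronousQueue, then reports the peak pool size reached.
    static int peakPoolSize(int core, int tasks) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                core, Integer.MAX_VALUE, 60L, TimeUnit.SECONDS,
                new SynchronousQueue<>());
        CountDownLatch started = new CountDownLatch(tasks);
        CountDownLatch release = new CountDownLatch(1);
        for (int i = 0; i < tasks; i++) {
            pool.execute(() -> {
                started.countDown();
                // Block until released, so every task occupies a thread.
                try { release.await(); }
                catch (InterruptedException e) { Thread.currentThread().interrupt(); }
            });
        }
        try { started.await(); }  // all tasks are now running concurrently
        catch (InterruptedException e) { throw new RuntimeException(e); }
        int peak = pool.getLargestPoolSize();
        release.countDown();
        pool.shutdown();
        return peak;
    }

    public static void main(String[] args) {
        // 2 core threads, 5 concurrent blocking tasks: the SynchronousQueue
        // rejects the handoff while workers are busy, so the pool grows to 5.
        System.out.println(peakPoolSize(2, 5)); // prints 5
    }
}
```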
Best Regards.
Jianbin Chen, github-id: funky-eyes
robert engels <robaho at me.com> wrote on Fri, Jan 23, 2026 at 20:28:
> Try using a semaphore to limit the maximum number of tasks in progress at
> any one time - that is what is causing your memory spike. Think of it this
> way: since virtual threads are so cheap to create, you are essentially
> creating them all at once, making the working set size equal to the
> maximum. So you have N * WSS, whereas in the other case you have
> POOLSIZE * WSS.
>
> On Jan 23, 2026, at 4:14 AM, Jianbin Chen <jianbin at apache.org> wrote:
>
>
> Hi Alan,
>
> Thanks for your reply and for mentioning JEP 444.
> I’ve gone through the guidance in JEP 444 and have some understanding of
> it — which is exactly why I’m feeling a bit puzzled in practice and would
> really like to hear your thoughts.
>
> Background — ThreadLocal example (Aerospike)
> ```java
> private static final ThreadLocal<byte[]> BufferThreadLocal = new ThreadLocal<byte[]>() {
>     @Override
>     protected byte[] initialValue() {
>         return new byte[DefaultBufferSize];
>     }
> };
> ```
> This Aerospike code allocates a default 8KB byte[] whenever a new thread
> is created and stores it in a ThreadLocal for per-thread caching.
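[Inline aside: the per-thread cost is easy to see with a small experiment. This
is a sketch of my own - the class name and the 8192 constant are mine, standing
in for Aerospike's DefaultBufferSize - counting how many times the initializer
runs when each task gets its own virtual thread:]

```java
import java.util.concurrent.atomic.AtomicInteger;

public class ThreadLocalDemo {
    static final AtomicInteger ALLOCATIONS = new AtomicInteger();
    static final ThreadLocal<byte[]> BUFFER = ThreadLocal.withInitial(() -> {
        ALLOCATIONS.incrementAndGet();
        return new byte[8192]; // stand-in for Aerospike's DefaultBufferSize
    });

    // Runs 'tasks' tasks, each on a fresh virtual thread, and returns how
    // many times the ThreadLocal initializer executed.
    static int freshAllocations(int tasks) {
        ALLOCATIONS.set(0);
        for (int i = 0; i < tasks; i++) {
            Thread t = Thread.ofVirtual().start(() -> BUFFER.get());
            try { t.join(); }
            catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        }
        return ALLOCATIONS.get();
    }

    public static void main(String[] args) {
        // Every per-task virtual thread sees an uninitialized ThreadLocal,
        // so every task pays for a fresh 8KB allocation.
        System.out.println(freshAllocations(10)); // prints 10
    }
}
```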
>
> My concern
> - With a traditional platform-thread pool, those ThreadLocal byte[]
> instances are effectively reused because threads are long-lived and pooled.
> - If we switch to creating a brand-new virtual thread per task (no
> pooling), each virtual thread gets its own fresh ThreadLocal byte[], which
> leads to many short-lived 8KB allocations.
> - That raises allocation rate and GC pressure (despite collectors like
> ZGC), because ThreadLocal caches aren’t reused when threads are ephemeral.
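[Aside: the usual alternative is to move the cache out of the thread into an
explicit shared pool, along the lines JEP 444 suggests for costly resources.
A minimal sketch, not Aerospike's actual API - BufferPool and its methods are
hypothetical names:]

```java
import java.util.concurrent.ArrayBlockingQueue;

public class BufferPool {
    private final ArrayBlockingQueue<byte[]> pool;
    private final int bufferSize;

    public BufferPool(int capacity, int bufferSize) {
        this.pool = new ArrayBlockingQueue<>(capacity);
        this.bufferSize = bufferSize;
    }

    // Reuse a pooled buffer if one is available; allocate a fresh one otherwise.
    public byte[] acquire() {
        byte[] b = pool.poll();
        return (b != null) ? b : new byte[bufferSize];
    }

    // Return the buffer for reuse; silently drop it if the pool is full.
    public void release(byte[] b) {
        pool.offer(b);
    }
}
```

A task would bracket its work with acquire()/release() (typically in
try/finally), so the number of live buffers tracks the number of concurrent
tasks rather than the total number of threads ever created. The catch, of
course, is that this requires changing the library itself.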
>
> So my question is: for applications originally designed around
> platform-thread pools, wouldn’t partially pooling virtual threads be
> beneficial? For example, Tomcat’s default max threads is 200 — if I keep a
> pool of 200 virtual threads, then when load exceeds that core size, a
> SynchronousQueue will naturally cause new virtual threads to be created on
> demand. This seems to preserve the behavior that ThreadLocal-based
> libraries expect, without losing the ability to expand under spikes. Since
> virtual threads are very lightweight, pooling a reasonable number (e.g.,
> 200) seems to have negligible memory downside while retaining ThreadLocal
> cache effectiveness.
>
> Empirical test I ran
> (I ran a microbenchmark comparing an unpooled per-task virtual-thread
> executor and a ThreadPoolExecutor that keeps 200 core virtual threads.)
>
> ```java
> import java.util.concurrent.CountDownLatch;
> import java.util.concurrent.Executor;
> import java.util.concurrent.Executors;
> import java.util.concurrent.SynchronousQueue;
> import java.util.concurrent.ThreadPoolExecutor;
> import java.util.concurrent.TimeUnit;
>
> public class PoolBenchmark {
>     public static void main(String[] args) throws InterruptedException {
>         // Unpooled: a brand-new virtual thread per task.
>         Executor executor = Executors.newThreadPerTaskExecutor(
>                 Thread.ofVirtual().name("test-", 1).factory());
>         // Pooled: 200 core virtual threads; the SynchronousQueue hands
>         // excess tasks off to newly created virtual threads.
>         Executor executor2 = new ThreadPoolExecutor(
>                 200,
>                 Integer.MAX_VALUE,
>                 0L,
>                 TimeUnit.SECONDS,
>                 new SynchronousQueue<>(),
>                 Thread.ofVirtual().name("test-threadpool-", 1).factory());
>
>         // Warm-up
>         for (int i = 0; i < 10100; i++) {
>             executor.execute(() -> {
>                 // simulate I/O wait
>                 try { Thread.sleep(100); }
>                 catch (InterruptedException e) { throw new RuntimeException(e); }
>             });
>             executor2.execute(() -> {
>                 // simulate I/O wait
>                 try { Thread.sleep(100); }
>                 catch (InterruptedException e) { throw new RuntimeException(e); }
>             });
>         }
>
>         // Let the warm-up tasks drain and the JIT settle
>         Thread.sleep(5000);
>
>         long start = System.currentTimeMillis();
>         CountDownLatch countDownLatch = new CountDownLatch(50000);
>         for (int i = 0; i < 50000; i++) {
>             executor.execute(() -> {
>                 try { Thread.sleep(100); countDownLatch.countDown(); }
>                 catch (InterruptedException e) { throw new RuntimeException(e); }
>             });
>         }
>         countDownLatch.await();
>         System.out.println("thread time: "
>                 + (System.currentTimeMillis() - start) + " ms");
>
>         start = System.currentTimeMillis();
>         CountDownLatch countDownLatch2 = new CountDownLatch(50000);
>         for (int i = 0; i < 50000; i++) {
>             executor2.execute(() -> {
>                 try { Thread.sleep(100); countDownLatch2.countDown(); }
>                 catch (InterruptedException e) { throw new RuntimeException(e); }
>             });
>         }
>         countDownLatch2.await();
>         System.out.println("thread pool time: "
>                 + (System.currentTimeMillis() - start) + " ms");
>     }
> }
> ```
>
> Result summary
> - In my runs, the pooled virtual-thread executor (executor2) performed
> better than the unpooled per-task virtual-thread executor.
> - Even when I increased load by 10x or 100x, the pooled virtual-thread
> executor still showed better performance.
> - In realistic workloads, it seems pooling some virtual threads reduces
> allocation/GC overhead and improves throughput compared to strictly
> unpooled virtual threads.
>
> Final thought / request for feedback
> - From my perspective, for systems originally tuned for platform-thread
> pools, partially pooling virtual threads seems to have no obvious downside
> and can restore ThreadLocal cache effectiveness used by many third-party
> libraries.
> - If I’ve misunderstood JEP 444 recommendations, virtual-thread semantics,
> or ThreadLocal behavior, please point out what I’m missing. I’d appreciate
> your guidance.
>
> Best Regards.
> Jianbin Chen, github-id: funky-eyes
>
> Alan Bateman <alan.bateman at oracle.com> wrote on Fri, Jan 23, 2026 at 17:27:
>
>> On 23/01/2026 07:30, Jianbin Chen wrote:
>> > :
>> >
>> > So my question is:
>> >
>> > **In scenarios where third-party libraries heavily rely on ThreadLocal
>> > for caching / buffering (and we cannot change those libraries to use
>> > object pools instead), is explicitly pooling virtual threads (using a
>> > ThreadPoolExecutor with virtual thread factory) considered a
>> > recommended / acceptable workaround?**
>> >
>> > Or are there better / more idiomatic ways to handle this kind of
>> > compatibility issue with legacy ThreadLocal-based libraries when
>> > migrating to virtual threads?
>> >
>> > I have already opened a related discussion in the Dubbo project (since
>> > Dubbo is one of the libraries affected in our stack):
>> >
>> > https://github.com/apache/dubbo/issues/16042
>> >
>> > Would love to hear your thoughts — especially from people who have
>> > experience running large-scale virtual-thread-based services with
>> > mixed third-party dependencies.
>> >
>>
>> The guidance that we put in JEP 444 [1] is to not pool virtual threads
>> and to avoid caching costly resources in thread locals. Virtual threads
>> support thread locals, of course, but that is not useful when a library
>> is looking to share a costly resource between tasks that run on the same
>> thread in a thread pool.
>>
>> I don't know anything about Aerospike but working with the maintainers
>> of that library to re-work its buffer management seems like the right
>> course of action here. Your mail says "byte buffers". If this is
>> ByteBuffer it might be that they are caching direct buffers as they are
>> expensive to create (and managed by the GC). Maybe they could look at
>> using MemorySegment (it's easy to get a ByteBuffer view of a memory
>> segment) and allocate from an arena that better matches the lifecycle.
>>
>> Hopefully others will share their experiences with migration, as it is
>> indeed challenging to migrate code developed for thread pools to work
>> efficiently on virtual threads, where there is a 1-1 relationship between
>> the task to execute and the thread.
>>
>> -Alan
>>
>> [1] https://openjdk.org/jeps/444#Thread-local-variables
>>
>