<div dir="auto"><div>Hi Alan,<div dir="auto"><br></div><div dir="auto">Thanks for your reply and for mentioning JEP 444.</div><div dir="auto">I’ve gone through the guidance in JEP 444 and broadly understand it, which is exactly why I’m puzzled in practice and would really like to hear your thoughts.</div><div dir="auto"><br></div><div dir="auto">Background: ThreadLocal example (Aerospike)</div><div dir="auto">```java</div><div dir="auto">private static final ThreadLocal<byte[]> BufferThreadLocal = new ThreadLocal<byte[]>() {</div><div dir="auto"> @Override</div><div dir="auto"> protected byte[] initialValue() {</div><div dir="auto"> return new byte[DefaultBufferSize];</div><div dir="auto"> }</div><div dir="auto">};</div><div dir="auto">```</div><div dir="auto">This Aerospike code lazily allocates a default 8KB byte[] the first time each thread reads the ThreadLocal (via initialValue()) and keeps it as a per-thread buffer cache.</div><div dir="auto"><br></div><div dir="auto">My concern</div><div dir="auto">- With a traditional platform-thread pool, those ThreadLocal byte[] instances are effectively reused because threads are long-lived and pooled.</div><div dir="auto">- If we switch to creating a brand-new virtual thread per task (no pooling), each virtual thread gets its own fresh ThreadLocal byte[], which leads to many short-lived 8KB allocations.</div><div dir="auto">- That raises the allocation rate and GC pressure (even with collectors like ZGC), because ThreadLocal caches aren’t reused when threads are ephemeral.</div><div dir="auto"><br></div><div dir="auto">So my question is: for applications originally designed around platform-thread pools, wouldn’t partially pooling virtual threads be beneficial? For example, Tomcat’s default maximum is 200 threads: if I keep a core pool of 200 virtual threads, then when load exceeds that core size, a SynchronousQueue will naturally cause new virtual threads to be created on demand. 
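Concretely, the hybrid executor I have in mind is something like this sketch (class and method names are mine; the 200 core size just mirrors Tomcat's default):

```java
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicInteger;

public class HybridPool {
    // 200 core virtual threads stay alive and are reused, so ThreadLocal
    // caches keep working across tasks; the SynchronousQueue hand-off means
    // that once all core threads are busy, each extra task gets a fresh
    // on-demand virtual thread instead of waiting in a queue.
    static final ThreadPoolExecutor POOLED = new ThreadPoolExecutor(
            200,                      // core size, mirroring Tomcat's default
            Integer.MAX_VALUE,        // effectively unbounded growth under spikes
            60L, TimeUnit.SECONDS,    // surplus threads exit after 60s idle
            new SynchronousQueue<>(), // hand off or spawn, never queue
            Thread.ofVirtual().factory());

    // Submit n trivial tasks and wait for all of them to finish.
    static int runTasks(int n) throws InterruptedException {
        AtomicInteger done = new AtomicInteger();
        CountDownLatch latch = new CountDownLatch(n);
        for (int i = 0; i < n; i++) {
            POOLED.execute(() -> { done.incrementAndGet(); latch.countDown(); });
        }
        latch.await();
        return done.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("completed: " + runTasks(500));
    }
}
```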
This seems to preserve the behavior that ThreadLocal-based libraries expect, without losing the ability to expand under spikes. Since virtual threads are very lightweight, pooling a reasonable number (e.g., 200) seems to have negligible memory downside while retaining ThreadLocal cache effectiveness.</div><div dir="auto"><br></div><div dir="auto">Empirical test I ran</div><div dir="auto">(I ran a microbenchmark comparing an unpooled per-task virtual-thread executor and a ThreadPoolExecutor that keeps 200 core virtual threads.)</div><div dir="auto"><br></div><div dir="auto">```java</div><div dir="auto">public static void main(String[] args) throws InterruptedException {</div><div dir="auto"> Executor executor = Executors.newThreadPerTaskExecutor(Thread.ofVirtual().name("test-", 1).factory());</div><div dir="auto"> Executor executor2 = new ThreadPoolExecutor(</div><div dir="auto"> 200,</div><div dir="auto"> Integer.MAX_VALUE,</div><div dir="auto"> 0L,</div><div dir="auto"> java.util.concurrent.TimeUnit.SECONDS,</div><div dir="auto"> new SynchronousQueue<>(),</div><div dir="auto"> Thread.ofVirtual().name("test-threadpool-", 1).factory()</div><div dir="auto"> );</div><div dir="auto"><br></div><div dir="auto"> // Warm-up</div><div dir="auto"> for (int i = 0; i < 10100; i++) {</div><div dir="auto"> executor.execute(() -> {</div><div dir="auto"> // simulate I/O wait</div><div dir="auto"> try { Thread.sleep(100); } catch (InterruptedException e) { throw new RuntimeException(e); }</div><div dir="auto"> });</div><div dir="auto"> executor2.execute(() -> {</div><div dir="auto"> // simulate I/O wait</div><div dir="auto"> try { Thread.sleep(100); } catch (InterruptedException e) { throw new RuntimeException(e); }</div><div dir="auto"> });</div><div dir="auto"> }</div><div dir="auto"><br></div><div dir="auto"> // Ensure JIT + warm-up complete</div><div dir="auto"> Thread.sleep(5000);</div><div dir="auto"><br></div><div dir="auto"> long start = 
System.currentTimeMillis();</div><div dir="auto"> CountDownLatch countDownLatch = new CountDownLatch(50000);</div><div dir="auto"> for (int i = 0; i < 50000; i++) {</div><div dir="auto"> executor.execute(() -> {</div><div dir="auto"> try { Thread.sleep(100); countDownLatch.countDown(); } catch (InterruptedException e) { throw new RuntimeException(e); }</div><div dir="auto"> });</div><div dir="auto"> }</div><div dir="auto"> countDownLatch.await();</div><div dir="auto"> System.out.println("thread time: " + (System.currentTimeMillis() - start) + " ms");</div><div dir="auto"><br></div><div dir="auto"> start = System.currentTimeMillis();</div><div dir="auto"> CountDownLatch countDownLatch2 = new CountDownLatch(50000);</div><div dir="auto"> for (int i = 0; i < 50000; i++) {</div><div dir="auto"> executor2.execute(() -> {</div><div dir="auto"> try { Thread.sleep(100); countDownLatch2.countDown(); } catch (InterruptedException e) { throw new RuntimeException(e); }</div><div dir="auto"> });</div><div dir="auto"> }</div><div dir="auto"> countDownLatch2.await();</div><div dir="auto"> System.out.println("thread pool time: " + (System.currentTimeMillis() - start) + " ms");</div><div dir="auto">}</div><div dir="auto">```</div><div dir="auto"><br></div><div dir="auto">Result summary</div><div dir="auto">- In my runs, the pooled virtual-thread executor (executor2) performed better than the unpooled per-task virtual-thread executor.</div><div dir="auto">- Even when I increased load by 10x or 100x, the pooled virtual-thread executor still showed better performance.</div><div dir="auto">- In realistic workloads, it seems pooling some virtual threads reduces allocation/GC overhead and improves throughput compared to strictly unpooled virtual threads.</div><div dir="auto"><br></div><div dir="auto">Final thought / request for feedback</div><div dir="auto">- From my perspective, for systems originally tuned for platform-thread pools, partially pooling virtual threads seems to have no 
obvious downside and can restore ThreadLocal cache effectiveness used by many third-party libraries.</div><div dir="auto">- If I’ve misunderstood JEP 444 recommendations, virtual-thread semantics, or ThreadLocal behavior, please point out what I’m missing. I’d appreciate your guidance.<br><br><div data-smartmail="gmail_signature" dir="auto">Best Regards.<br>Jianbin Chen, github-id: funky-eyes </div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">Alan Bateman <<a href="mailto:alan.bateman@oracle.com">alan.bateman@oracle.com</a>> 于 2026年1月23日周五 17:27写道:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">On 23/01/2026 07:30, Jianbin Chen wrote:<br>
> :<br>
><br>
> So my question is:<br>
><br>
> **In scenarios where third-party libraries heavily rely on ThreadLocal <br>
> for caching / buffering (and we cannot change those libraries to use <br>
> object pools instead), is explicitly pooling virtual threads (using a <br>
> ThreadPoolExecutor with virtual thread factory) considered a <br>
> recommended / acceptable workaround?**<br>
><br>
> Or are there better / more idiomatic ways to handle this kind of <br>
> compatibility issue with legacy ThreadLocal-based libraries when <br>
> migrating to virtual threads?<br>
><br>
> I have already opened a related discussion in the Dubbo project (since <br>
> Dubbo is one of the libraries affected in our stack):<br>
><br>
> <a href="https://github.com/apache/dubbo/issues/16042" rel="noreferrer noreferrer" target="_blank">https://github.com/apache/dubbo/issues/16042</a><br>
><br>
> Would love to hear your thoughts — especially from people who have <br>
> experience running large-scale virtual-thread-based services with <br>
> mixed third-party dependencies.<br>
><br>
<br>
The guidelines we put in JEP 444 [1] are to not pool virtual threads <br>
and to avoid caching costly resources in thread locals. Virtual threads <br>
do support thread locals, of course, but that is not useful when a library <br>
is looking to share a costly resource between tasks that run on the same <br>
thread in a thread pool.<br>
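<br>
For illustration only (a sketch, not code from any library): one alternative <br>
is to share buffers between tasks through a small concurrent pool, which works <br>
the same whether threads are pooled platform threads or per-task virtual threads:<br>

```java
import java.util.concurrent.ConcurrentLinkedQueue;

public class BufferPool {
    private static final int BUFFER_SIZE = 8192;
    // Buffers are shared across all threads rather than pinned to one
    // thread via ThreadLocal, so per-task virtual threads can reuse them.
    private static final ConcurrentLinkedQueue<byte[]> POOL = new ConcurrentLinkedQueue<>();

    static byte[] acquire() {
        byte[] b = POOL.poll();
        return (b != null) ? b : new byte[BUFFER_SIZE]; // allocate only on miss
    }

    static void release(byte[] b) {
        if (POOL.size() < 64) POOL.offer(b); // cap how many buffers are retained
    }

    public static void main(String[] args) {
        byte[] b1 = acquire();   // pool is empty, so this allocates
        release(b1);
        byte[] b2 = acquire();   // reuses the buffer just released
        System.out.println("reused: " + (b1 == b2));
    }
}
```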
<br>
I don't know anything about Aerospike, but working with the maintainers <br>
of that library to re-work its buffer management seems like the right <br>
course of action here. Your mail says "byte buffers". If this is <br>
ByteBuffer, it might be that they are caching direct buffers, as they are <br>
expensive to create (and managed by the GC). Maybe they could look at <br>
using MemorySegment (it's easy to get a ByteBuffer view of a memory <br>
segment) and allocate from an arena that better matches the lifecycle.<br>
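<br>
For example, with the FFM API (JDK 22+; an illustrative sketch, not Aerospike code):<br>

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.nio.ByteBuffer;

public class ArenaBuffer {
    // Allocate an 8 KiB segment from a confined arena scoped to one task,
    // and hand a ByteBuffer view of it to APIs that expect a ByteBuffer.
    static int demo() {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment segment = arena.allocate(8192);
            ByteBuffer buffer = segment.asByteBuffer();
            buffer.putInt(0, 42);          // use the buffer as usual
            return buffer.capacity();
        } // the segment's memory is freed deterministically here
    }

    public static void main(String[] args) {
        System.out.println("capacity: " + demo());
    }
}
```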
<br>
Hopefully others will share their experiences with migration, as it is <br>
indeed challenging to migrate code developed for thread pools to work <br>
efficiently on virtual threads, where there is a 1-1 relationship between <br>
the task to execute and the thread.<br>
<br>
-Alan<br>
<br>
[1] <a href="https://openjdk.org/jeps/444#Thread-local-variables" rel="noreferrer noreferrer" target="_blank">https://openjdk.org/jeps/444#Thread-local-variables</a><br>
</blockquote></div></div></div>