<div dir="auto"><div>Hi Alan,<div dir="auto"><br></div><div dir="auto">Thanks for your reply and for mentioning JEP 444.</div><div dir="auto">I’ve gone through the guidance in JEP 444 and broadly understand it, which is exactly why I’m puzzled in practice and would really like to hear your thoughts.</div><div dir="auto"><br></div><div dir="auto">Background: ThreadLocal example (Aerospike)</div><div dir="auto">```java</div><div dir="auto">private static final ThreadLocal<byte[]> BufferThreadLocal = new ThreadLocal<byte[]>() {</div><div dir="auto"> @Override</div><div dir="auto"> protected byte[] initialValue() {</div><div dir="auto"> return new byte[DefaultBufferSize];</div><div dir="auto"> }</div><div dir="auto">};</div><div dir="auto">```</div><div dir="auto">This Aerospike code lazily allocates a default 8KB byte[] the first time each thread reads the ThreadLocal (via initialValue()) and keeps it as a per-thread buffer cache.</div><div dir="auto"><br></div><div dir="auto">My concern</div><div dir="auto">- With a traditional platform-thread pool, those ThreadLocal byte[] instances are effectively reused because threads are long-lived and pooled.</div><div dir="auto">- If we switch to creating a brand-new virtual thread per task (no pooling), each virtual thread gets its own fresh ThreadLocal byte[], which leads to many short-lived 8KB allocations.</div><div dir="auto">- That raises the allocation rate and GC pressure (even with collectors like ZGC), because ThreadLocal caches aren’t reused when threads are ephemeral.</div><div dir="auto"><br></div><div dir="auto">So my question is: for applications originally designed around platform-thread pools, wouldn’t partially pooling virtual threads be beneficial? For example, Tomcat’s default maximum is 200 threads: if I keep a core pool of 200 virtual threads, then when load exceeds that core size, a SynchronousQueue will naturally cause new virtual threads to be created on demand. 
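Concretely, the hybrid executor I have in mind is something like this sketch (class and method names are mine; the 200 core size just mirrors Tomcat's default):

```java
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicInteger;

public class HybridPool {
    // 200 core virtual threads stay alive and are reused, so ThreadLocal
    // caches keep working across tasks; the SynchronousQueue hand-off means
    // that once all core threads are busy, each extra task gets a fresh
    // on-demand virtual thread instead of waiting in a queue.
    static final ThreadPoolExecutor POOLED = new ThreadPoolExecutor(
            200,                      // core size, mirroring Tomcat's default
            Integer.MAX_VALUE,        // effectively unbounded growth under spikes
            60L, TimeUnit.SECONDS,    // surplus threads exit after 60s idle
            new SynchronousQueue<>(), // hand off or spawn, never queue
            Thread.ofVirtual().factory());

    // Submit n trivial tasks and wait for all of them to finish.
    static int runTasks(int n) throws InterruptedException {
        AtomicInteger done = new AtomicInteger();
        CountDownLatch latch = new CountDownLatch(n);
        for (int i = 0; i < n; i++) {
            POOLED.execute(() -> { done.incrementAndGet(); latch.countDown(); });
        }
        latch.await();
        return done.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("completed: " + runTasks(500));
    }
}
```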
This seems to preserve the behavior that ThreadLocal-based libraries expect, without losing the ability to expand under spikes. Since virtual threads are very lightweight, pooling a reasonable number (e.g., 200) seems to have negligible memory downside while retaining ThreadLocal cache effectiveness.</div><div dir="auto"><br></div><div dir="auto">Empirical test I ran</div><div dir="auto">(I ran a microbenchmark comparing an unpooled per-task virtual-thread executor and a ThreadPoolExecutor that keeps 200 core virtual threads.)</div><div dir="auto"><br></div><div dir="auto">```java</div><div dir="auto">public static void main(String[] args) throws InterruptedException {</div><div dir="auto"> Executor executor = Executors.newThreadPerTaskExecutor(Thread.ofVirtual().name("test-", 1).factory());</div><div dir="auto"> Executor executor2 = new ThreadPoolExecutor(</div><div dir="auto"> 200,</div><div dir="auto"> Integer.MAX_VALUE,</div><div dir="auto"> 0L,</div><div dir="auto"> java.util.concurrent.TimeUnit.SECONDS,</div><div dir="auto"> new SynchronousQueue<>(),</div><div dir="auto"> Thread.ofVirtual().name("test-threadpool-", 1).factory()</div><div dir="auto"> );</div><div dir="auto"><br></div><div dir="auto"> // Warm-up</div><div dir="auto"> for (int i = 0; i < 10100; i++) {</div><div dir="auto"> executor.execute(() -> {</div><div dir="auto"> // simulate I/O wait</div><div dir="auto"> try { Thread.sleep(100); } catch (InterruptedException e) { throw new RuntimeException(e); }</div><div dir="auto"> });</div><div dir="auto"> executor2.execute(() -> {</div><div dir="auto"> // simulate I/O wait</div><div dir="auto"> try { Thread.sleep(100); } catch (InterruptedException e) { throw new RuntimeException(e); }</div><div dir="auto"> });</div><div dir="auto"> }</div><div dir="auto"><br></div><div dir="auto"> // Ensure JIT + warm-up complete</div><div dir="auto"> Thread.sleep(5000);</div><div dir="auto"><br></div><div dir="auto"> long start = 
System.currentTimeMillis();</div><div dir="auto"> CountDownLatch countDownLatch = new CountDownLatch(50000);</div><div dir="auto"> for (int i = 0; i < 50000; i++) {</div><div dir="auto"> executor.execute(() -> {</div><div dir="auto"> try { Thread.sleep(100); countDownLatch.countDown(); } catch (InterruptedException e) { throw new RuntimeException(e); }</div><div dir="auto"> });</div><div dir="auto"> }</div><div dir="auto"> countDownLatch.await();</div><div dir="auto"> System.out.println("thread time: " + (System.currentTimeMillis() - start) + " ms");</div><div dir="auto"><br></div><div dir="auto"> start = System.currentTimeMillis();</div><div dir="auto"> CountDownLatch countDownLatch2 = new CountDownLatch(50000);</div><div dir="auto"> for (int i = 0; i < 50000; i++) {</div><div dir="auto"> executor2.execute(() -> {</div><div dir="auto"> try { Thread.sleep(100); countDownLatch2.countDown(); } catch (InterruptedException e) { throw new RuntimeException(e); }</div><div dir="auto"> });</div><div dir="auto"> }</div><div dir="auto"> countDownLatch2.await();</div><div dir="auto"> System.out.println("thread pool time: " + (System.currentTimeMillis() - start) + " ms");</div><div dir="auto">}</div><div dir="auto">```</div><div dir="auto"><br></div><div dir="auto">Result summary</div><div dir="auto">- In my runs, the pooled virtual-thread executor (executor2) performed better than the unpooled per-task virtual-thread executor.</div><div dir="auto">- Even when I increased load by 10x or 100x, the pooled virtual-thread executor still showed better performance.</div><div dir="auto">- In realistic workloads, it seems pooling some virtual threads reduces allocation/GC overhead and improves throughput compared to strictly unpooled virtual threads.</div><div dir="auto"><br></div><div dir="auto">Final thought / request for feedback</div><div dir="auto">- From my perspective, for systems originally tuned for platform-thread pools, partially pooling virtual threads seems to have no 
obvious downside and can restore ThreadLocal cache effectiveness used by many third-party libraries.</div><div dir="auto">- If I’ve misunderstood JEP 444 recommendations, virtual-thread semantics, or ThreadLocal behavior, please point out what I’m missing. I’d appreciate your guidance.<br><br><div data-smartmail="gmail_signature" dir="auto">Best Regards.<br>Jianbin Chen, github-id: funky-eyes </div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">Alan Bateman <<a href="mailto:alan.bateman@oracle.com">alan.bateman@oracle.com</a>> 于 2026年1月23日周五 17:27写道:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">On 23/01/2026 07:30, Jianbin Chen wrote:<br>
> :<br>
><br>
> So my question is:<br>
><br>
> **In scenarios where third-party libraries heavily rely on ThreadLocal <br>
> for caching / buffering (and we cannot change those libraries to use <br>
> object pools instead), is explicitly pooling virtual threads (using a <br>
> ThreadPoolExecutor with virtual thread factory) considered a <br>
> recommended / acceptable workaround?**<br>
><br>
> Or are there better / more idiomatic ways to handle this kind of <br>
> compatibility issue with legacy ThreadLocal-based libraries when <br>
> migrating to virtual threads?<br>
><br>
> I have already opened a related discussion in the Dubbo project (since <br>
> Dubbo is one of the libraries affected in our stack):<br>
><br>
> <a href="https://github.com/apache/dubbo/issues/16042" rel="noreferrer noreferrer" target="_blank">https://github.com/apache/dubbo/issues/16042</a><br>
><br>
> Would love to hear your thoughts — especially from people who have <br>
> experience running large-scale virtual-thread-based services with <br>
> mixed third-party dependencies.<br>
><br>
<br>
The guidelines we put in JEP 444 [1] are to not pool virtual threads <br>
and to avoid caching costly resources in thread locals. Virtual threads <br>
do support thread locals, of course, but that is not useful when a library <br>
is looking to share a costly resource between tasks that run on the same <br>
thread in a thread pool.<br>
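<br>
For illustration only (a sketch, not code from any library): one alternative <br>
is to share buffers between tasks through a small concurrent pool, which works <br>
the same whether threads are pooled platform threads or per-task virtual threads:<br>

```java
import java.util.concurrent.ConcurrentLinkedQueue;

public class BufferPool {
    private static final int BUFFER_SIZE = 8192;
    // Buffers are shared across all threads rather than pinned to one
    // thread via ThreadLocal, so per-task virtual threads can reuse them.
    private static final ConcurrentLinkedQueue<byte[]> POOL = new ConcurrentLinkedQueue<>();

    static byte[] acquire() {
        byte[] b = POOL.poll();
        return (b != null) ? b : new byte[BUFFER_SIZE]; // allocate only on miss
    }

    static void release(byte[] b) {
        if (POOL.size() < 64) POOL.offer(b); // cap how many buffers are retained
    }

    public static void main(String[] args) {
        byte[] b1 = acquire();   // pool is empty, so this allocates
        release(b1);
        byte[] b2 = acquire();   // reuses the buffer just released
        System.out.println("reused: " + (b1 == b2));
    }
}
```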
<br>
I don't know anything about Aerospike, but working with the maintainers <br>
of that library to re-work its buffer management seems like the right <br>
course of action here. Your mail says "byte buffers". If this is <br>
ByteBuffer, it might be that they are caching direct buffers, as they are <br>
expensive to create (and managed by the GC). Maybe they could look at <br>
using MemorySegment (it's easy to get a ByteBuffer view of a memory <br>
segment) and allocate from an arena that better matches the lifecycle.<br>
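<br>
For example, with the FFM API (JDK 22+; an illustrative sketch, not Aerospike code):<br>

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.nio.ByteBuffer;

public class ArenaBuffer {
    // Allocate an 8 KiB segment from a confined arena scoped to one task,
    // and hand a ByteBuffer view of it to APIs that expect a ByteBuffer.
    static int demo() {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment segment = arena.allocate(8192);
            ByteBuffer buffer = segment.asByteBuffer();
            buffer.putInt(0, 42);          // use the buffer as usual
            return buffer.capacity();
        } // the segment's memory is freed deterministically here
    }

    public static void main(String[] args) {
        System.out.println("capacity: " + demo());
    }
}
```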
<br>
Hopefully others will share their experiences with migration, as it is <br>
indeed challenging to migrate code developed for thread pools to work <br>
efficiently on virtual threads, where there is a 1-1 relationship between <br>
the task to execute and the thread.<br>
<br>
-Alan<br>
<br>
[1] <a href="https://openjdk.org/jeps/444#Thread-local-variables" rel="noreferrer noreferrer" target="_blank">https://openjdk.org/jeps/444#Thread-local-variables</a><br>
</blockquote></div></div></div>