Performance of pooling virtual threads vs. semaphores

Attila Kelemen attila.kelemen85 at
Thu May 30 12:06:51 UTC 2024

The additional work the VT has to do is understandable. However, I
don't see how it explains these measurements: with 10k tasks VT wins
over FJP, but with 1M tasks VT loses to FJP. What is the source of the
scaling difference, when there are still only 128 carriers and 600
concurrent threads in both cases? If this were merely more work, then I
would expect to see the same relative difference between FJP and VT
with 10k tasks and with 1M tasks. Just a wild, naive guess: could the GC
scale worse with that many VTs, or is that a stupid idea?

> If the concurrency for the virtual thread run is limited to the same
> value as the thread count in the thread pool runs then you are unlikely
> to see benefit. The increased CPU time probably isn't too surprising
> either. In the two runs with threads, the N tasks are queued once. In
> the virtual thread run, the tasks for the N virtual threads may be
> queued up to 4 times: once for the initial submit, once waiting for a
> semaphore permit, and twice for the two sleeps. Also, when CPU
> utilization is low (as I assume it is here), the FJP scan does tend
> to show up in profiles.
> Has Chi looked into increasing the concurrency so that it's not limited
> to 600? Concurrency may need to be limited at a finer grain in the
> "real world program", but maybe not the number of threads.
> -Alan
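
For readers without the original benchmark at hand, the pattern Alan
describes can be sketched roughly as below. This is not Chi's actual
code: the class name, the runTasks helper, the task count, the permit
count and the sleep duration are all assumptions made for illustration,
and it assumes Java 21 or later. The comments mark the four points at
which a virtual thread may be queued to the scheduler.

```java
import java.time.Duration;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch, not the benchmark discussed in this thread.
class SemaphoreLimitedVirtualThreads {

    // Runs taskCount tasks, one virtual thread per task, allowing at most
    // maxConcurrency of them past the semaphore at any time.
    // Returns the highest number of tasks observed holding a permit at once.
    static int runTasks(int taskCount, int maxConcurrency, Duration sleep) {
        Semaphore permits = new Semaphore(maxConcurrency);
        AtomicInteger active = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();

        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < taskCount; i++) {
                // Queueing point 1: the initial submit starts a new virtual thread.
                executor.submit(() -> {
                    // Queueing point 2: the thread may park here until a permit is free.
                    permits.acquire();
                    try {
                        int now = active.incrementAndGet();
                        peak.accumulateAndGet(now, Math::max);
                        // Queueing points 3 and 4: each sleep parks the virtual
                        // thread and reschedules it when the timer fires.
                        Thread.sleep(sleep);
                        Thread.sleep(sleep);
                    } finally {
                        active.decrementAndGet();
                        permits.release();
                    }
                    return null; // makes the lambda a Callable, so it may throw InterruptedException
                });
            }
        } // close() waits for all submitted tasks to finish

        return peak.get();
    }

    public static void main(String[] args) {
        // Assumed parameters, loosely mirroring the 10k-task, 600-permit case.
        int peak = runTasks(10_000, 600, Duration.ofMillis(1));
        System.out.println("peak concurrency: " + peak);
    }
}
```

In a fixed thread pool version of the same workload, each task would be
queued once and the semaphore would be unnecessary, since the pool size
itself caps concurrency. That is the difference in scheduling work the
quoted message points at.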

More information about the loom-dev mailing list