Performance of pooling virtual threads vs. semaphores

robert engels rengels at
Thu May 30 01:15:14 UTC 2024

You can see from the flame graph that the system is spending all of its time scheduling VTs and sleeping - the “work” is negligible. This is the carrier thread on my system with the most “work” (execute()) - most of the carrier threads are doing no work at all. Every carrier is spending its time going to sleep and waking back up.

> On May 29, 2024, at 7:10 PM, robert engels <rengels at> wrote:
> Ignore that - the first set of metrics are in ms.
> But the wall time within a numTasks is nearly identical regardless of scenario.
>> On May 29, 2024, at 7:09 PM, robert engels <rengels at> wrote:
>> But looking at the numbers some more, they don’t make sense. The total times go way down moving from 10_000 tasks to 100_000 tasks. Then they go up again (expected) moving from 100_000 to 1_000_000.
>> I don’t think the wall time should ever go down when the number of tasks goes up, so something doesn’t seem right.
>>> On May 29, 2024, at 6:03 PM, Attila Kelemen <attila.kelemen85 at> wrote:
>>> Yeah, just realized that and sent my email pretty much literally a second after your email :)
>>> Anyway, while in theory 1M threads are contending for the semaphore, I don't think that should be a problem. The contention is rather theoretical: those "contending" VTs are just sitting in a queue, and after each release only one of them should be released. Also, I think Liam's comparison is fair, because neither of the other two methods pushes back, so pushing back only in the VT version would be very unfair.
>>> robert engels <rengels at> wrote (on Thu, May 30, 2024, 0:58):
>>> I remember that too, but in this case I don’t think it is the cause.
>>> In the bounded/pooled thread scenario - you are only scheduling 600 threads (either platform or virtual).
>>> In “scenario #2” all 1M virtual threads are created and are contending on a semaphore. This contention on a single resource does not occur in the other scenarios - this will lead to thrashing of the scheduler.
>>> I suspect if it is run under a profiler it will be obvious. With 128 carrier threads, you have increased the contention over a typical machine by an order of magnitude.
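For readers following the thread, the two shapes being compared can be sketched roughly as below. This is not Liam's benchmark, which is not reproduced in this message. The class name, the method names, the trivial work() body, and the task count used in main() are all assumptions for illustration. Only the limit of 600 comes from the discussion above. It needs Java 21 or later.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

public class BoundedVirtualThreads {

    // Hypothetical stand-in for the task body; the real workload is not shown in this thread.
    static void work(AtomicInteger completed) {
        completed.incrementAndGet();
    }

    // Pooled shape: only `limit` virtual threads exist; pending tasks sit in the executor's queue.
    static int runPooled(int numTasks, int limit) {
        AtomicInteger completed = new AtomicInteger();
        try (ExecutorService pool =
                 Executors.newFixedThreadPool(limit, Thread.ofVirtual().factory())) {
            for (int i = 0; i < numTasks; i++) {
                pool.submit(() -> work(completed));
            }
        } // close() shuts the pool down and waits for submitted tasks to finish
        return completed.get();
    }

    // Semaphore shape: one virtual thread per task, all gated by a single shared semaphore.
    static int runSemaphore(int numTasks, int limit) {
        AtomicInteger completed = new AtomicInteger();
        Semaphore permits = new Semaphore(limit);
        try (ExecutorService exec = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < numTasks; i++) {
                exec.submit(() -> {
                    permits.acquireUninterruptibly(); // parks the virtual thread until a permit is free
                    try {
                        work(completed);
                    } finally {
                        permits.release();
                    }
                });
            }
        }
        return completed.get();
    }

    public static void main(String[] args) {
        int numTasks = 100_000; // assumed value for illustration
        int limit = 600;        // concurrency limit mentioned in the thread

        long t0 = System.nanoTime();
        int pooled = runPooled(numTasks, limit);
        long t1 = System.nanoTime();
        int gated = runSemaphore(numTasks, limit);
        long t2 = System.nanoTime();

        // Convert to one unit before comparing, so ns and ms figures are not mixed up.
        System.out.printf("pooled:    %d tasks in %d ms%n", pooled, (t1 - t0) / 1_000_000);
        System.out.printf("semaphore: %d tasks in %d ms%n", gated, (t2 - t1) / 1_000_000);
    }
}
```

With a task body this small, the measured time is mostly thread creation, scheduling, and parking rather than work, so the numbers from this sketch say nothing about a realistic workload. It is meant only to make the structural difference concrete: a bounded set of threads pulling from a queue versus every task owning a thread that waits on one shared semaphore.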

-------------- next part --------------
A non-text attachment was scrubbed...
Name: PastedGraphic-1.png
Type: image/png
Size: 245369 bytes
Desc: not available
URL: <>
