[External] : Re: Experience using virtual threads in EA 23-loom+4-102
Ron Pressler
ron.pressler at oracle.com
Sat Jul 6 20:23:58 UTC 2024
> On 4 Jul 2024, at 19:41, Robert Engels <robaho at icloud.com> wrote:
>
>>
>> Consider 100 tasks, each running for 1 CPU second on a 1 core machine (although it works the same way with any number of cores). With time-sharing, the average (and median) latency would be 100s. Without time sharing the average (and median) latency would be almost half that.
>
>
> Yes, but the tail latency for timesharing is 100s.
>
> Without timesharing, the tail latency is more than 5000 secs - or an order of magnitude more.
It would also be 100s. Time sharing makes things strictly worse in this case, and doesn’t even help tail latencies.
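For concreteness, here is a quick sketch that works the 100-task example out analytically rather than by running anything (the class name is just illustrative):

public class SchedulingLatencyModel {
    public static void main(String[] args) {
        int tasks = 100; // 100 tasks, 1 CPU-second each, one core

        // Run-to-completion (no time sharing): task i finishes at i seconds.
        double fifoSum = 0, fifoMax = 0;
        for (int i = 1; i <= tasks; i++) {
            fifoSum += i;
            fifoMax = Math.max(fifoMax, i);
        }

        // Fine-grained round-robin (time sharing): every task finishes only
        // when all of the work is done, i.e. at about 100 seconds.
        double rr = tasks;

        System.out.printf("run-to-completion: avg=%.1fs  worst=%.1fs%n",
                fifoSum / tasks, fifoMax);
        System.out.printf("round-robin:       avg=%.1fs  worst=%.1fs%n", rr, rr);
    }
}

It prints an average of 50.5s and a worst case of 100s without time sharing, versus 100s for both with time sharing. (The "more than 5000 secs" above is presumably the sum of the completion times, 1 + 2 + ... + 100 = 5050s, which is not the latency of any single task.)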
>
> I don’t have hard numbers, I can only look to why most OS’s use timeslicing, and why Go had to add it, and make an educated guess.
OSes use time sharing for very different reasons than improving server workloads (and, as far as we know, it doesn't improve them), and Go added it because there are situations where time sharing is helpful. But Java offers an even better solution than adding time sharing to the virtual thread scheduler.
In any event, there's no point discussing the hypotheticals of scheduling; they are well known and fairly basic in this field. What isn't well known is how common the server workloads where time sharing helps actually are in practice.
>
> I am not saying this needs to change, but I think it needs to be better communicated to the community and what the ramifications are (e.g. atomic spin loops can deadlock the system).
There are bigger problems than deadlocks if you use spin loops — with or without time sharing — when you have many thousands of threads.
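For example (a minimal sketch; the class name is illustrative, and -Djdk.virtualThreadScheduler.parallelism=1 is only there to make the effect easy to reproduce on a laptop):

import java.util.concurrent.atomic.AtomicBoolean;

// Run with -Djdk.virtualThreadScheduler.parallelism=1 so that a single
// spinning virtual thread is enough to occupy every carrier thread.
public class SpinLoopStall {
    public static void main(String[] args) throws InterruptedException {
        AtomicBoolean flag = new AtomicBoolean(false);

        // Mounts on the only carrier thread and never performs a blocking
        // operation, so it is never unmounted (and never preempted today).
        Thread spinner = Thread.ofVirtual().start(() -> {
            while (!flag.get()) {
                Thread.onSpinWait();
            }
            System.out.println("spinner released");
        });

        // Never gets a carrier thread to run on, so the flag is never set.
        Thread setter = Thread.ofVirtual().start(() -> flag.set(true));

        spinner.join(); // hangs: the two virtual threads are stuck forever
        setter.join();
    }
}

With many thousands of threads you don't need to constrain the parallelism at all: a handful of spinners per core is enough to occupy every carrier, stall everything queued behind them, and burn CPU the whole time.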
The most relevant thing to communicate, and we have, is that virtual threads’ effectiveness stems from there being a very high number of them. Everything else follows.
— Ron