Experience using virtual threads in EA 23-loom+4-102
robert engels
robaho at icloud.com
Sat Jun 29 18:11:35 UTC 2024
Sorry for all the typos. And to be clear, I wasn’t suggesting that VT should have time sharing, since platform threads are available to address a lot of these issues. As a data point, Go was forced to implement time sharing because too many workloads degraded badly without it.
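For what it's worth, a minimal sketch of what "use platform threads for the CPU-bound parts" could look like is below. The names (CpuOffloadSketch, convertImage, fetchFromRemoteService, store) are placeholders I'm making up for illustration, not anything from this thread: I/O fan-out stays on virtual threads, while CPU-bound work goes to a bounded pool of platform threads so a burst of CPU work can't occupy every carrier thread.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class CpuOffloadSketch {

    // Bounded pool of platform threads, sized to the core count.
    // CPU-bound tasks queue here instead of pinning carrier threads.
    static final ExecutorService CPU_POOL =
            Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());

    public static void main(String[] args) throws Exception {
        try (ExecutorService io = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < 10_000; i++) {
                io.submit(() -> {
                    byte[] data = fetchFromRemoteService();    // blocking I/O: fine on a virtual thread
                    Future<byte[]> converted =
                            CPU_POOL.submit(() -> convertImage(data)); // CPU-bound: offloaded to platform threads
                    store(converted.get());                    // virtual thread parks while waiting
                    return null;
                });
            }
        } // close() waits for the submitted tasks to finish
        CPU_POOL.shutdown();
    }

    // Placeholders standing in for real I/O and CPU-bound work.
    static byte[] fetchFromRemoteService() { return new byte[1024]; }
    static byte[] convertImage(byte[] in)  { return in.clone(); }
    static void store(byte[] out)          { }
}

Whether sizing the platform pool to the core count is right obviously depends on the workload; the point is only that the CPU-heavy step is bounded separately from the fan-out.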
> On Jun 29, 2024, at 1:05 PM, robert engels <robaho at icloud.com> wrote:
>
> I think the issue is for CPU-bound workloads - e.g. image conversions, etc. I think not having time sharing is only an issue for machines with low CPU counts. The issue is that tail latency gets out of control: IO tasks get blocked too easily. Every general-purpose OS that I’m aware of has time sharing for just this reason.
>
>> On Jun 29, 2024, at 9:07 AM, Ron Pressler <ron.pressler at oracle.com> wrote:
>>
>>
>>
>>>> On 24 Jun 2024, at 21:12, Robert Engels <robaho at icloud.com> wrote:
>>>
>>>
>>> The problem with using VT for everything is that a VT is not time-sliced, so you could quickly consume all of the carrier threads and then you make no progress on the IO (fan out) requests - which is especially bad if they are simply calling out to other servers (less bad if doing lots of local disk io).
>>
>> The only reason virtual threads do not time share is that we have yet to find a server workload for which time sharing would help. Even the OS starts relying heavily on time sharing only at 100% CPU, and people don’t generally think their servers behave well at 100% CPU. In other words, the only reason virtual threads don’t time share is that we haven’t found a problem that time sharing solves satisfactorily.
>>
>> Even theoretically, time sharing is about as likely to hurt as it is to help — when tasks consume similar amounts of CPU to each other, time sharing can very significantly hurt the average latency; on the other hand, when tasks vary a lot from each other in the amount of CPU they consume, time sharing could significantly improve the average latency but, again, we need to see such workloads before we can add time sharing. After all, we can only solve a problem once we see what it is.
>>
>> The reason why this isn’t obvious is that if you have some portion of tasks that vary greatly in their CPU consumption, that means that some consume a lot of CPU. If that portion is very small, it’s not clear that time sharing helps; if that portion is high, and remember — we’re talking about many thousands of threads at least — then it’s likely that the overall CPU consumption is very high, in which case it is also unclear that time-sharing can help.
>>
>> However, that we have not found realistic cases where time sharing helps does not mean they don’t exist. We would very much like to see realistic server instances where there is a problem that time sharing could solve.
>>
>> — Ron