[External] : Re: Experience using virtual threads in EA 23-loom+4-102
Robert Engels
robaho at icloud.com
Wed Jul 3 16:43:51 UTC 2024
I don't think that is correct - but I could be wrong.
With platform threads, the min and max latency in a completely cpu-bound scenario should be very close to the average under a completely fair scheduler (when all tasks/threads are submitted at the same time).
Without timesharing, the average will be the same, but the min and max latencies will be far from the average - the tasks submitted first complete very quickly, while the tasks submitted last take a very long time, because they must wait for all of the tasks ahead of them to complete.
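As a back-of-envelope illustration of that spread (my toy numbers, nothing measured - 8 tasks each needing 1s of cpu on a single core):

// Toy model, not a benchmark: 8 tasks of 1s cpu each on one core.
public class LatencySpread {
    public static void main(String[] args) {
        int n = 8;
        double t = 1.0; // seconds of cpu per task
        // Run-to-completion (no timesharing): task i finishes at (i+1)*t,
        // so the first task sees 1s of latency and the last sees 8s.
        System.out.printf("no timesharing: min=%.0fs max=%.0fs%n", t, n * t);
        // Ideal fair timesharing: all tasks share the core and finish together.
        System.out.printf("fair sharing:   min=%.0fs max=%.0fs%n", n * t, n * t);
    }
}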
Regarding the “enough cpus” comment, I only meant that if there are enough cpus and a “somewhat” balanced workload, it is unlikely that all of the cpu-bound tasks could consume all of the carrier threads, given a random distribution. If you have more active tasks than cpus and the majority of the tasks are cpu bound, the IO tasks are going to suffer in a non-timesliced scenario - they will be stalled waiting for a carrier thread even though the amount of cpu they need is very small.
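Here is a minimal sketch of that stall (my code, not from the thread; the class name and spin loop are made up, and it assumes a build where virtual threads are not timesliced):

import java.time.Duration;

// Pin every carrier with a cpu-bound virtual thread, then see how long
// a tiny "IO" task takes to complete.
public class CarrierStall {
    static volatile double sink; // keeps the spin loops from being optimized away

    public static void main(String[] args) throws Exception {
        int carriers = Runtime.getRuntime().availableProcessors();
        for (int i = 0; i < carriers; i++) {
            Thread.startVirtualThread(() -> {
                while (true) sink = Math.sqrt(sink + 1); // never blocks, never yields
            });
        }
        long start = System.nanoTime();
        Thread io = Thread.startVirtualThread(() -> {
            try { Thread.sleep(10); } catch (InterruptedException e) { }
        });
        // Without timesharing the 10ms task may never even get mounted,
        // because no carrier ever frees up.
        boolean done = io.join(Duration.ofSeconds(5));
        System.out.printf("10ms task done=%b after %.0f ms%n",
                done, (System.nanoTime() - start) / 1e6);
    }
}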
This page has a lot of info on the subject - https://docs.kernel.org/scheduler/sched-design-CFS.html - including:
On real hardware, we can run only a single task at once, so we have to introduce the concept of “virtual runtime.” The virtual runtime of a task specifies when its next timeslice would start execution on the ideal multi-tasking CPU described above. In practice, the virtual runtime of a task is its actual runtime normalized to the total number of running tasks.
I recommend https://opensource.com/article/19/2/fair-scheduling-linux for an in-depth discussion on how the dynamic timeslices are computed.
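To make the quoted vruntime idea concrete, here is a toy version of that bookkeeping (my sketch, not the kernel's code - the weights roughly follow the kernel's nice-level table):

import java.util.Comparator;
import java.util.PriorityQueue;

// Run the task with the smallest vruntime, then charge it
// slice * NICE_0_LOAD / weight, so higher-weight (higher-priority)
// tasks accumulate vruntime more slowly and get picked more often.
public class VruntimeSketch {
    static final double NICE_0_LOAD = 1024.0;

    static final class Task {
        final String name;
        final double weight;
        double vruntime;
        Task(String name, double weight) { this.name = name; this.weight = weight; }
    }

    public static void main(String[] args) {
        PriorityQueue<Task> runQueue =
                new PriorityQueue<>(Comparator.comparingDouble((Task t) -> t.vruntime));
        runQueue.add(new Task("nice0 ", 1024)); // default priority
        runQueue.add(new Task("nice-5", 3121)); // approximately the kernel's weight for nice -5
        double sliceMs = 4.0;
        for (int i = 0; i < 8; i++) {
            Task t = runQueue.poll();           // smallest vruntime runs next
            t.vruntime += sliceMs * NICE_0_LOAD / t.weight;
            System.out.printf("ran %s vruntime=%6.2f%n", t.name, t.vruntime);
            runQueue.add(t);
        }
    }
}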
The Linux scheduler relies on timeslicing in order to have a “fair” system. I think most Java “server”-type applications strive for fairness as well - i.e. long tail latencies in anything are VERY bad (hence the constant fight against long GC pauses - better to amortize them for consistency).
> On Jul 3, 2024, at 10:51 AM, Ron Pressler <ron.pressler at oracle.com> wrote:
>
>
>
>> On 29 Jun 2024, at 19:04, robert engels <robaho at icloud.com> wrote:
>>
>> I think the issue is for cpu-bound workloads - e.g. image conversions, etc. I think not having time sharing is only an issue for machines with low cpu counts. The issue is that tail latency gets out of control.
>
> Latency is not generally improved by time sharing regardless of the number of CPUs. In some situations time sharing will make it (potentially much) better, and in others it will make it (potentially much) worse. For example, if you start many threads doing the *same* heavy computation on platform threads and compare that to virtual threads you’ll see that the virtual thread latencies will be much better than the platform thread latencies. If the computation done on the different threads is very different in its duration you’ll see the exact opposite occur.
>
>
>> IO tasks get blocked too easily. Every general-workload OS (that I’m aware of) has time sharing just for this reason.
>
> IO preempts threads both in the OS and with the virtual thread scheduler even without time sharing. The reason non-realtime OSes have time sharing is different, and has nothing to do with performance (realtime kernels are a different matter). The main reason for time sharing is to keep the machine responsive when it’s at 100% CPU to allow operator interaction, but this is already the case with virtual threads by virtue of the fact that they’re mapped to OS threads.
>
> — Ron
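(To put rough numbers on Ron's comparison above - my arithmetic, not from the thread: with 8 identical 1s tasks on one core, run-to-completion finishes them at 1s, 2s, ... 8s, while ideal timesharing finishes all of them at about 8s, so every task sees the worst-case latency - that is the first case. Now queue a 100ms task behind the 8 long ones: run-to-completion makes it wait ~8s for the work ahead of it, while timesharing lets it finish in well under a second - the second case.)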