[External] : Re: Experience using virtual threads in EA 23-loom+4-102

Robert Engels robaho at icloud.com
Thu Jul 4 18:41:03 UTC 2024


> 
> Consider 100 tasks, each running for 1 CPU second on a 1 core machine (although it works the same way with any number of cores). With time-sharing, the average (and median) latency would be 100s. Without time sharing the average (and median) latency would be almost half that.

Yes, but the tail latency for timesharing is 100s.

Without timesharing, the tail latency is more than 5000 secs - an order of magnitude more. So much more, in fact, that most requests will be cancelled due to timeout and resubmitted, which further degrades the situation as the number of retry requests explodes.
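To put numbers on the batch example being discussed, here is a toy model (an illustration with the assumed figures above - 100 tasks, 1 CPU-second each, 1 core, all submitted at t=0). It reproduces the halved mean/median for run-to-completion; note it does not model the timeout-and-retry dynamics under continuous arrival described above, which is where the tail blow-up comes from.

```java
import java.util.Arrays;

public class LatencyModel {
    public static void main(String[] args) {
        int n = 100;
        // Run-to-completion (no timesharing): task i finishes at i+1 seconds.
        double[] fifo = new double[n];
        for (int i = 0; i < n; i++) fifo[i] = i + 1;
        // Fine-grained round-robin timesharing: all tasks finish together
        // at ~n seconds (ignoring context-switch cost).
        double[] rr = new double[n];
        Arrays.fill(rr, n);

        System.out.printf("no timesharing: mean=%.1f median=%.1f max=%.1f%n",
                mean(fifo), median(fifo), fifo[n - 1]);
        System.out.printf("timesharing:    mean=%.1f median=%.1f max=%.1f%n",
                mean(rr), median(rr), rr[n - 1]);
    }

    static double mean(double[] xs) {
        return Arrays.stream(xs).average().orElse(0);
    }

    static double median(double[] xs) {
        return (xs[xs.length / 2 - 1] + xs[xs.length / 2]) / 2.0;
    }
}
```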

As to a realistic workload, I imagine nearly all of them qualify. All the services I can think of strive to minimize tail latency - or at least the 90-99th percentile latency - not the average latency.

People can’t really distinguish a 2x difference at small absolute numbers, but a 50x difference is obvious (100 ms vs 200 ms, as opposed to 100 ms vs 5 secs). Human perception and responsiveness expectations come into play.

In some ways the low median actually hurts the non-timesliced case: it trains the user/system to expect a response within X, and when that doesn’t happen (more than 50% of the time) they cancel and resubmit the request, suspecting an error (or out of simple human impatience).

Obviously, the request volume and average task duration play a large part in this, but “show me a workload” doesn’t seem a fair argument - timeslicing for fairness has been accepted for decades, and “fair” systems are the expected systems. If that weren’t the case, most server Linux installations would be configured with timeslicing turned off - which they aren’t. Modern developers rely on timeslicing when designing systems.

All of the above doesn’t even address the case of IO-bound tasks being blocked by non-preemptible CPU-bound tasks - which is often the strongest argument for timeslicing, especially on low-CPU-count systems with varied workloads. This is also why the scheduler in a non-preemptive OS will play certain tricks, like running all ready-to-run IO-bound tasks before scheduling any new (possibly CPU-bound) tasks. Similarly, even with timeslicing, most schedulers will prioritize tasks that the OS estimates will run for only a short time and then give up their slice due to IO.
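The IO-behind-CPU cost is easy to see with back-of-the-envelope arithmetic (the durations below are hypothetical, chosen only to illustrate the shape of the problem):

```java
public class IoBehindCpu {
    public static void main(String[] args) {
        double cpuTask = 10.0;   // seconds of CPU the CPU-bound hog needs
        double ioTask  = 0.010;  // seconds of CPU the IO-bound task needs
        double slice   = 0.010;  // timeslice length when preemption exists

        // Run-to-completion on one core: the IO task waits behind the
        // entire CPU-bound task before it gets to run at all.
        double latencyNoPreempt = cpuTask + ioTask;

        // Round-robin timeslicing: the IO task runs after at most one
        // slice of the hog, then parks on IO and frees the core.
        double latencyPreempt = slice + ioTask;

        System.out.printf("IO-task latency, no preemption: %.3f s%n", latencyNoPreempt);
        System.out.printf("IO-task latency, timeslicing:   %.3f s%n", latencyPreempt);
    }
}
```

With these numbers the IO task’s latency drops from ~10 seconds to ~20 milliseconds - the 500x difference is why cooperative schedulers resort to the tricks described above.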

I don’t have hard numbers; I can only look at why most OSes use timeslicing, and why Go had to add it, and make an educated guess.

Luckily, Java also has platform threads, so even with non-timesliced virtual threads there are workarounds available.
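One such workaround might look like the sketch below (the structure and names are illustrative, not a recommendation from the Loom team): keep virtual threads for IO-heavy request handling, and hand long CPU-bound work to a bounded pool of platform threads so it cannot monopolize the virtual-thread scheduler’s carriers.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class CpuOffload {
    // Bounded platform-thread pool sized to the machine's cores,
    // dedicated to CPU-bound work.
    static final ExecutorService CPU_POOL =
            Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());

    static long expensiveComputation(long n) {
        long acc = 0;
        for (long i = 0; i < n; i++) acc += i;  // stand-in for real CPU work
        return acc;
    }

    public static void main(String[] args) throws Exception {
        try (ExecutorService vts = Executors.newVirtualThreadPerTaskExecutor()) {
            Future<Long> result = vts.submit(() -> {
                // Inside a virtual thread: delegate the CPU-bound part to the
                // platform pool and park (cheaply) on the future, instead of
                // occupying a carrier thread for the whole computation.
                return CPU_POOL.submit(() -> expensiveComputation(1_000_000)).get();
            });
            System.out.println("result = " + result.get());
        }
        CPU_POOL.shutdown();
    }
}
```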

I am not saying this needs to change, but I think it - and its ramifications (e.g. atomic spin loops can deadlock the system) - needs to be better communicated to the community.

> On Jul 4, 2024, at 1:02 PM, Ron Pressler <ron.pressler at oracle.com> wrote:
> 
> 
> 
>> On 3 Jul 2024, at 17:43, Robert Engels <robaho at icloud.com> wrote:
>> 
>> I don't think that is correct - but I could be wrong.
>> 
>> With platform threads the min and max latency in a completely cpu bound scenario should be very close to the average with a completely fair scheduler (when all tasks/threads are submitted at the same time).
>> 
>> Without timesharing, the average will be the same, but the min and max latencies will be far off the average - as the tasks submitted first complete very quickly, and the tasks submitted at the end take a very long time because they need to have all of the tasks before them complete.
> 
> 
> Consider 100 tasks, each running for 1 CPU second on a 1 core machine (although it works the same way with any number of cores). With time-sharing, the average (and median) latency would be 100s. Without time sharing the average (and median) latency would be almost half that.
> 
>> 
>> The linux scheduler relies on timeslicing in order to have a “fair” system. I think most Java “server” type applications strive for fairness as well - i.e. long tail latencies in anything are VERY bad (thus the constant fight against long GC pauses - better to amortize those for consistency).
> 
> We are familiar with the theory and design of time sharing. The one and only reason that the virtual thread scheduler doesn’t employ it is that we’ve yet to find a realistic server workload for which it helps. If someone brings to our attention some realistic server workloads where it helps, then we’ll have an actual problem we can try solving.
> 
> — Ron

