<html>


<head>


<meta http-equiv="Content-Type" content="text/html; charset=utf-8">


</head>


<body style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class="">


<br class="">


<div><br class="">


<blockquote type="cite" class="">


<div class="">On 9 Jan 2023, at 18:34, Robert Engels &lt;<a href="mailto:rengels@ix.netcom.com" class="">rengels@ix.netcom.com</a>&gt; wrote:</div>


<br class="Apple-interchange-newline">


<div class="">


<div dir="auto" class="">


<div dir="ltr" class=""></div>


<div dir="ltr" class="">I think what is not given enough weight in the analysis is that long-running tasks are usually deprioritized by the scheduler rather than given equal time slices, which reduces the latency of short requests and increases the latency of long/batch requests.</div>


</div>


</div>


</blockquote>


<div><br class="">


</div>


Actually, this is covered as a special case. Even assuming perfect time-sharing, its effectiveness for virtual thread use cases is unclear until we obtain more data from the field.</div>


<div><br class="">


<blockquote type="cite" class="">


<div class="">


<div dir="auto" class="">


<div dir="ltr" class=""><br class="">


</div>


<div dir="ltr" class="">This is expected today based on the Linux (and many other) schedulers. The vthread scheduler is breaking from this - which is fine if it has good reasons to do so. </div>


</div>


</div>


</blockquote>


<div><br class="">


</div>


<div>An OS scheduler must be a reasonable compromise for many kinds of threads. Virtual threads are optimised for transaction-processing workloads. I assume it will take some years to gather sufficient information from the field so that we can tweak our decisions


 based on data, but after spending years considering various hypotheses, I don’t see a reason to change what we have now without obtaining more data.</div>


<br class="">


<blockquote type="cite" class="">


<div class="">


<div dir="auto" class="">


<div dir="ltr" class=""><br class="">


</div>


<div dir="ltr" class="">You can read the Go rationale for adding time slicing into the scheduler here: <a href="https://github.com/golang/go/issues/10958" class="">https://github.com/golang/go/issues/10958</a></div>


<div dir="ltr" class=""><br class="">


</div>


<div dir="ltr" class="">There were multiple issues it addressed - I’m not sure all of them apply to Java. </div>


</div>


</div>


</blockquote>


<div><br class="">


</div>


<div>When we studied those motivations some years ago, it appeared they do not apply to Java. Once again, we must only solve problems faced by Java developers in the field.</div>


<div><br class="">


</div>


<div>— Ron</div>


<br class="">


<blockquote type="cite" class="">


<div class="">


<div dir="auto" class="">


<div dir="ltr" class=""><br class="">


<blockquote type="cite" class="">On Jan 9, 2023, at 12:19 PM, Ron Pressler &lt;<a href="mailto:ron.pressler@oracle.com" class="">ron.pressler@oracle.com</a>&gt; wrote:<br class="">


<br class="">


</blockquote>


</div>


<blockquote type="cite" class="">


<div dir="ltr" class="">


<div style="margin: 0px; font-stretch: normal; font-size: 13px; line-height: normal; font-family: "Helvetica Neue";" class="">


I think it would be interesting to explain in more detail the effects of scheduling, and why the question of time-sharing is not obvious and so crucially depends on real-world data. </div>


<div style="margin: 0px; font-stretch: normal; font-size: 13px; line-height: normal; font-family: "Helvetica Neue"; min-height: 15px;" class="">


<br class="">


</div>


<div style="margin: 0px; font-stretch: normal; font-size: 13px; line-height: normal; font-family: "Helvetica Neue";" class="">


Suppose you are given 10 tasks, 5 of them with a processing duration of 0ms and 5 of them with a duration of 100ms. For simplicity, let’s assume we have no parallelism. Both a shortest-task-first scheduler and a longest-task-first scheduler will complete all tasks in 500ms, but their average task latency will be quite different. That of the shortest-task-first scheduler will be 150ms (= (5*0 + 100 + 200 + 300 + 400 + 500)/10), while that of the longest-task-first scheduler will be 400ms (= (100 + 200 + 300 + 400 + 500 + 5*500)/10). A perfect time-sharing scheduler (with zero overhead and infinitesimal granularity) would yield an average latency of 250ms (= (0*5 + 500*5)/10). Those are big differences!</div>
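<div class=""><br class=""></div>
<div class="">The arithmetic above can be reproduced with a minimal Java sketch. It is only an illustration under the stated assumptions (one processor, all ten tasks available at time 0, zero scheduling overhead). The class and method names are hypothetical, and it says nothing about how the virtual thread scheduler is implemented.</div>

```java
import java.util.Arrays;

class LatencySketch {
    // Average completion time when tasks run one at a time, in the given order.
    static double averageSequential(long[] durationsMs) {
        long clock = 0;
        long total = 0;
        for (long d : durationsMs) {
            clock += d;      // a task finishes once it and all earlier tasks have run
            total += clock;
        }
        return (double) total / durationsMs.length;
    }

    // Average completion time under idealised time-sharing: every unfinished
    // task progresses at the same rate, with zero switching overhead.
    static double averageTimeShared(long[] durationsMs) {
        long[] sorted = durationsMs.clone();
        Arrays.sort(sorted);
        int n = sorted.length;
        double clock = 0;
        double total = 0;
        long previous = 0;
        for (int i = 0; i < n; i++) {
            // n - i tasks are still running; each needs (sorted[i] - previous) more service
            clock += (double) (sorted[i] - previous) * (n - i);
            total += clock;
            previous = sorted[i];
        }
        return total / n;
    }

    public static void main(String[] args) {
        long[] shortestFirst = {0, 0, 0, 0, 0, 100, 100, 100, 100, 100};
        long[] longestFirst = {100, 100, 100, 100, 100, 0, 0, 0, 0, 0};
        System.out.println(averageSequential(shortestFirst)); // shortest-task-first
        System.out.println(averageSequential(longestFirst));  // longest-task-first
        System.out.println(averageTimeShared(shortestFirst)); // perfect time-sharing
    }
}
```

<div class="">Run as written, it should print 150.0, 400.0 and 250.0, matching the three averages.</div>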


<div style="margin: 0px; font-stretch: normal; font-size: 13px; line-height: normal; font-family: "Helvetica Neue"; min-height: 15px;" class="">


<br class="">


</div>


<div style="margin: 0px; font-stretch: normal; font-size: 13px; line-height: normal; font-family: "Helvetica Neue";" class="">


But now let’s consider a server where an infinite stream of requests arrives from the outside, half of them with a processing duration of 0ms and half with a duration of 100ms. Regardless of how we schedule the requests in the queue that may form, because the average request duration is 50ms, as long as the request rate is less than or equal to 20 req/s (20 × 50ms is exactly one second of work per second) the server will be stable. If the rate is higher than that, the server will become unstable, with requests piling up in an ever-growing queue, and the latency will climb to infinity, again regardless of scheduling policy.</div>


<div style="margin: 0px; font-stretch: normal; font-size: 13px; line-height: normal; font-family: "Helvetica Neue"; min-height: 15px;" class="">


<br class="">


</div>


<div style="margin: 0px; font-stretch: normal; font-size: 13px; line-height: normal; font-family: "Helvetica Neue";" class="">


What about latency? The average latency will depend on the distribution of requests. Without time-sharing, it can range between 50ms and 100ms; with perfect time-sharing it may be much higher (i.e. worse). Perfect time-sharing will decrease the latency of the


 short tasks (to 0ms!) at the expense of increasing the latency of the long tasks.</div>


<div style="margin: 0px; font-stretch: normal; font-size: 13px; line-height: normal; font-family: "Helvetica Neue"; min-height: 15px;" class="">


<br class="">


</div>


<div style="margin: 0px; font-stretch: normal; font-size: 13px; line-height: normal; font-family: "Helvetica Neue";" class="">


But say we conclude that reducing latencies of short tasks at the expense of long tasks is what everyone always wants; that’s not entirely obvious, but not completely unreasonable, either. Suppose that at the same request rate of 20 per second, the probability of a long task were not 0.5 but 0.55, or that instead of 100ms it took 110ms. Either way the average request duration rises to 55ms, and time sharing can no longer help: the server will destabilise. Alternatively, suppose that the probability of a long task is 0.05 or that its duration is 50ms; time sharing is no longer effective, either. So at 20 req/s, time sharing can help within the band of 50-100ms duration or 0.05-0.5 probability; above or below that band, it doesn’t.</div>
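<div class=""><br class=""></div>
<div class="">To see where each of those scenarios sits relative to the tipping point, here is a rough Java sketch that computes the offered load as request rate times average request duration. It assumes a single processor and short tasks that cost 0ms, as in the example; the class and method names are hypothetical and chosen only for illustration.</div>

```java
class LoadSketch {
    // Offered load on one processor: arrival rate multiplied by mean processing time.
    // A value above 1.0 means work arrives faster than it can be completed.
    static double offeredLoad(double requestsPerSecond, double longTaskProbability, double longTaskMs) {
        double meanMs = longTaskProbability * longTaskMs; // short tasks are assumed to cost 0ms
        return requestsPerSecond * meanMs / 1000.0;
    }

    public static void main(String[] args) {
        double rate = 20.0; // requests per second, as in the example
        double[][] scenarios = {
            {0.50, 100}, // baseline: exactly at the tipping point
            {0.55, 100}, // more long tasks
            {0.50, 110}, // longer long tasks
            {0.05, 100}, // few long tasks
            {0.50, 50},  // shorter long tasks
        };
        for (double[] s : scenarios) {
            double load = offeredLoad(rate, s[0], s[1]);
            System.out.printf("p=%.2f long=%.0fms load=%.2f%n", s[0], s[1], load);
        }
    }
}
```

<div class="">The baseline comes out at a load of 1.0, the 0.55 and 110ms cases at about 1.1 (unstable under any policy), and the 0.05 and 50ms cases at about 0.1 and 0.5, far enough from the edge that queues rarely form.</div>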


<div style="margin: 0px; font-stretch: normal; font-size: 13px; line-height: normal; font-family: "Helvetica Neue"; min-height: 15px;" class="">


<br class="">


</div>


<div style="margin: 0px; font-stretch: normal; font-size: 13px; line-height: normal; font-family: "Helvetica Neue";" class="">


Keeping in mind that time-sharing can’t actually be perfect and that no request can actually have a duration of zero, I hope it is now clearer why we’re so curious to find real-world cases and why simulations provide little insight. It’s easy to construct an artificial simulation at the operational band where time-sharing is effective. But in practice it is most effective when the server is on the edge of stability, and it becomes gradually less effective the further away we are from that tipping point. That is precisely why the most important questions become: how often do servers operate within that operational band, where exactly along that band do they commonly find themselves, and how does that situation arise in real-world scenarios? Only when we get real-world data can we answer those questions and consider the pros and cons, and only then can we either conclude that the work isn’t worth it or be able to satisfactorily justify it.</div>


<div style="margin: 0px; font-stretch: normal; font-size: 13px; line-height: normal; font-family: "Helvetica Neue";" class="">


<br class="">


</div>


<div style="margin: 0px; font-stretch: normal; font-size: 13px; line-height: normal; font-family: "Helvetica Neue";" class="">


(Note that for the classic case where time sharing helps, namely some fixed set of background processing operations, there is no need to add time-sharing to the virtual thread scheduler, as a better solution is already available.)</div>


<div style="margin: 0px; font-stretch: normal; font-size: 13px; line-height: normal; font-family: "Helvetica Neue";" class="">


<br class="">


</div>


<div style="margin: 0px; font-stretch: normal; font-size: 13px; line-height: normal; font-family: "Helvetica Neue";" class="">


— Ron</div>


</div>


</blockquote>


</div>


</div>


</blockquote>


</div>


<br class="">


</body>


</html>