<div dir="auto">Hi Ron,<div dir="auto"><br></div><div dir="auto">The claim in JEP is the same as in this email thread, so that is not much help. But now I don't need help anymore, because I found the explaination how the thread count, response time, request rate and thread-per-request are connected.</div><div dir="auto"><br></div><div dir="auto">Now what makes me bothered about the claims.</div><div dir="auto"><br></div><div dir="auto">Little's law connects throughput to concurrency. We agreed it has no connection to thread count. That's a disconnect between the claim about threads and Little's law dictating it.</div><div dir="auto"><br></div><div dir="auto">There's also the assumption that response time remains constant, but that's a mighty assumption - response time changes with thread count.</div><div dir="auto"><br></div><div dir="auto">There's also the claim of needing more threads. That's also not something that follows from thread-per-request. Essentially, thread-per-request is a constructive proof of having an infinite number of threads. How can one want more? Also, from a different angle - the number of threads in the thread-per-request needed does not depend on throughput at all.</div><div dir="auto"><br></div><div dir="auto">Just consider what the request rate means. It means that if you choose however small time frame, and however large request count, there is a nonzero probability that it will happen. Consequently the number of threads needed is arbitrarily large for any throughput. Which is just another way to say that the number of threads is effectively infinite and there is no point trying to connect it to Little's law. Request rate doesn't change the number of threads that can exist at any given time, it only changes the probability of observing any particular number of them in a fixed period of time.</div><div dir="auto"><br></div><div dir="auto">All this is only criticism of the formal mathematical claims made here and in the JEP. Nothing needs doing, if no one is interested in formal claims being perfect.</div><div dir="auto"><br></div><div dir="auto"><br></div><div dir="auto">Alex</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Tue, 26 Jul 2022, 15:41 Ron Pressler, <<a href="mailto:ron.pressler@oracle.com">ron.pressler@oracle.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div style="word-wrap:break-word;line-break:after-white-space">
<br>
<div><br>
<blockquote type="cite">
<div>On 26 Jul 2022, at 14:33, Alex Otenko <<a href="mailto:oleksandr.otenko@gmail.com" target="_blank" rel="noreferrer">oleksandr.otenko@gmail.com</a>> wrote:</div>
<br>
<div>
<div dir="auto">Hi Ron,
<div dir="auto"><br>
</div>
<div dir="auto">I think I can verbalize what bothered me all along.
<div dir="auto"><br>
</div>
<div dir="auto">I wish someone made a distinction between:</div>
<div dir="auto"><br>
</div>
<div dir="auto">Offered traffic - actual term; determined based on the time one thread spends on request.</div>
<div dir="auto"><br>
</div>
<div dir="auto">Capacity - I don't think this is the actual term. This is the actual thread count. If this is at or below offered traffic, the system is not stable. You can increase capacity until you get to the thread-per-request, which probably corresponds
to +oo.</div>
</div>
</div>
</div>
</blockquote>
<div><br>
</div>
I don’t understand this sentence.</div>
<div><br>
<blockquote type="cite">
<div>
<div dir="auto">
<div dir="auto">
<div dir="auto"><br>
</div>
<div dir="auto">Concurrency as used in Little's law. This is measured in the same units as offered traffic, but is not the same as offered traffic, because the time used here is the actual response time, which includes all sorts of waits.</div>
</div>
</div>
</div>
</blockquote>
<div><br>
</div>
<div>None of that matters. Little’s law is a mathematical theorem about some unit arriving at some processing centre — a customer, a request, whatever — and for *that* unit, the theorem relates the average latency of performing that operation and the average
rate of arrival of those things to the average number of those things existing concurrently in the centre. So, we pick requests as the things we look at, and everything follows. The theorem tells us how many requests, on average, are concurrently being processed,
and since we’re assuming thread-per-request, this tells us how many threads are active, because *by definition* of thread-per-request a concurrent request takes at least one thread.</div>
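<div><br>
</div>
<div>To make the arithmetic concrete (the numbers here are illustrative, not from this thread): Little’s law says L = λ × W, where λ is the average arrival rate and W is the average latency. At λ = 10,000 requests/s and W = 0.1s, L = 10,000 × 0.1 = 1,000 requests are being processed concurrently on average, so a thread-per-request server at that throughput has at least 1,000 live threads on average.</div>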
<br>
<blockquote type="cite">
<div>
<div dir="auto">
<div dir="auto">
<div dir="auto"><br>
</div>
<div dir="auto">The confusing bit then is that we can't be talking of concurrency before capacity exceeds offered traffic, because the system is not stable, and after that adding threads only decreases concurrency.</div>
</div>
</div>
</div>
</blockquote>
<div><br>
</div>
<div><br>
</div>
<div>No one is talking about *adding* threads. The number of threads grows because rising throughput *makes it grow* in a thread-per-request system. Also, we’re not interested in what’s happening in a system in the process of crashing.</div>
<br>
<blockquote type="cite">
<div>
<div dir="auto">
<div dir="auto">
<div dir="auto"><br>
</div>
<div dir="auto"><br>
</div>
<div dir="auto">Then also the pragmatic angle. At which point, or for what systems should I say "yeah, we can't do this without Virtual threads", and at which point should I say "thread-per-request is the way to go".</div>
</div>
</div>
</div>
</blockquote>
<div><br>
</div>
As explained in JEP 425, there is absolutely no such point: Picking thread-per-request is the premise we’re taking as a given, not the conclusion. I.e. we assume thread-per-request, and the conclusion is that we need many threads. Virtual threads are designed
to allow thread-per-request servers to achieve the maximum throughput allowable by the hardware.</div>
<div><br>
</div>
<div>Why do so many people want to pick thread-per-request? Because thread-per-request is the model that allows representing your application’s unit of concurrency with the platform’s unit of concurrency, and the Java platform has only one such unit: the thread.
I.e. it is the only model that the language and the platform fully support. That is why asynchronous APIs are essentially DSLs and do not rely on the language’s basic composition constructs (loops, try/catch, try-with-resources etc.), why JFR yields less-than-informative
profiles for such programs, and why debuggers can’t step through the logical flow of such programs.</div>
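<div><br>
</div>
<div>For concreteness, here is a minimal sketch of such a thread-per-request server on virtual threads (the port and the trivial handler are made up for illustration; Executors.newVirtualThreadPerTaskExecutor() is the JDK API that runs each submitted task on a new virtual thread):</div>
<div><br>
</div>
<pre>
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// A minimal thread-per-request server: one new virtual thread per connection.
public class ThreadPerRequestServer {
    public static void main(String[] args) throws IOException {
        try (ServerSocket server = new ServerSocket(8080);
             ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            while (true) {
                Socket socket = server.accept();
                executor.submit(() -> handle(socket)); // one (virtual) thread per request
            }
        }
    }

    static void handle(Socket socket) {
        try (socket) {
            // Hypothetical handler; blocking I/O is cheap on a virtual thread.
            socket.getOutputStream().write("HTTP/1.1 200 OK\r\n\r\n".getBytes());
        } catch (IOException e) {
            // Ignored for the sketch.
        }
    }
}
</pre>
<div>The point is the shape of the code: ordinary blocking calls, loops and try/catch, with the platform’s unit of concurrency (the thread) standing in for the application’s unit of concurrency (the request).</div>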
<div><br>
</div>
<div>So there is absolutely no point at which you’d say “we must do it like that”. But *if* you choose to do it like that then you’d need virtual threads if your concurrency exceeds ~1000.</div>
<div><br>
</div>
<div>Thread-per-request and async are neither good nor bad; they’re just different aesthetic styles for writing code. But Java only fully supports the former, and *IF* you choose to do it that way, THEN you’ll need virtual threads. In other words, a person who
should be interested in virtual threads is one who thinks it would be nice to write code in the thread-per-request style, but doesn’t want to give up on throughput. I think the JEP is clear on that.</div>
<div><br>
<blockquote type="cite">
<div>
<div dir="auto">
<div dir="auto">
<div dir="auto"><br>
</div>
<div dir="auto">The answer to the first question is: "when your offered traffic is in thousands per CPU". Why CPU specifically? Because otherwise something else is the bottleneck. This means 100ms wait per 100 microseconds of on-cpu time. I don't know
how common this is in the world, but in my practice this never was the case - because 100 microseconds is about as much as a REST endpoint takes to produce a few KB of JSON, and 100ms wait is an eternity in comparison. Why thousands? Because we had 200 threads
per CPU and sync code, and were fine. Maybe it's gross, but Virtual threads is not the killer feature in those cases. Ok, I haven't seen the world, but I reckon the back of the envelope working out is ok.</div>
</div>
</div>
</div>
</blockquote>
<div><br>
</div>
<div>If what you’re claiming is that simple thread-per-request servers using OS threads are satisfactory for virtually all systems, then that has long since been established not to be the case. There’s just no point arguing over this. As I think I already told
you, 100ms of wait is the total of all waits, even if they are done in parallel, and it is quite common because quite a lot of servers make outgoing calls to scores of services. It is very common for a single incoming request to make 20 outgoing I/O requests, if not more.</div>
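<div><br>
</div>
<div>To put illustrative numbers on that (the figures are made up for the example): a request that fans out to 20 downstream services, each contributing an average of 5ms of wait, accumulates 20 × 5ms = 100ms of total wait, even though it may spend well under a millisecond on the CPU.</div>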
<br>
<blockquote type="cite">
<div>
<div dir="auto">
<div dir="auto">
<div dir="auto"><br>
</div>
<div dir="auto">The second question is then not really based on performance, rather on architectural differences that thread-per-request offers. One less thing to tune is good. The reason that this is not a performance question, is that adding threads
gets response time indistinguishably close to minimal possible way before you get to +oo.</div>
</div>
</div>
</div>
</blockquote>
<div><br>
</div>
<div>As long as you’re talking about “adding threads” I can tell you’re not getting this. No one is suggesting adding threads.</div>
<div><br>
</div>
<div>If you pick thread-per-request, then the number of threads grows with throughput, and that’s why you need virtual threads.</div>
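<div><br>
</div>
<div>To spell out the proportionality (again with made-up numbers): if average latency W stays roughly constant, Little’s law makes the number of live threads track throughput linearly. At W = 0.2s, going from 5,000 to 10,000 requests/s takes the server from about 1,000 to about 2,000 concurrently live threads, and nobody “added” anything.</div>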
<br>
<blockquote type="cite">
<div>
<div dir="auto">
<div dir="auto">
<div dir="auto"><br>
</div>
<div dir="auto"><br>
</div>
<div dir="auto">Alex</div>
</div>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Tue, 26 Jul 2022, 10:15 Ron Pressler, <<a href="mailto:ron.pressler@oracle.com" target="_blank" rel="noreferrer">ron.pressler@oracle.com</a>> wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div style="word-wrap:break-word;line-break:after-white-space">Let me make this as simple as I think I can:<br>
<br>
<div>1. We are talking *only* about a server that creates a new thread for every incoming request. That’s how we define “thread-per-request.” If what you have in mind is a server that operates in any other way, you’re misunderstanding the conversation.<br>
</div>
<div><br>
</div>
<div>2. While artificially increasing the number of threads in that server would do nothing, whatever that system’s latency is, whatever its resource utilisation is, a rising rate of requests *will* result in that server having more threads that are
alive concurrently (by virtue of how it operates, as a rising request rate will not cause that server to reduce latency); i.e. it’s the increased throughput that causes the number of threads to rise, not vice-versa. Therefore, to cope with high request rates
that server must have the capacity for many threads.<br>
</div>
<br>
That is all, and that is how we know that a server using virtual threads would normally have a great many of them: because virtual threads are used by thread-per-request servers with high throughputs. Other things will happen too, and other concurrency limits
will eventually come into play, but this — that the number of threads will rise — is necessarily true.<br>
<br>
Now we can get to what I think your actual point is. You believe that the server we’re talking about must be at some kind of disadvantage compared to other kinds of servers. I understand you want me to convince you that is not the case, but the only thing I can
suggest at this point is that you actually write a server in this style, employing virtual threads, and then report what problems and limitations you actually run into, rather than hypothesising about problems you think you might run into. That will help you
understand how virtual threads are used, and will help us find potentially missing APIs.<br>
<br>
<div>— Ron</div>
</div>
</blockquote>
</div>
</div>
</blockquote>
</div>
</div>
</blockquote></div>