[External] : Re: jstack, profilers and other tools

Ron Pressler ron.pressler at oracle.com
Thu Jul 28 23:46:01 UTC 2022



On 28 Jul 2022, at 21:31, Alex Otenko <oleksandr.otenko at gmail.com> wrote:

Hi Ron,

The claim in the JEP is the same as in this email thread, so that is not much help. But now I don't need help anymore, because I found the explanation of how the thread count, response time, request rate and thread-per-request are connected.

Now what makes me bothered about the claims.

Little's law connects throughput to concurrency. We agreed it has no connection to thread count. That's a disconnect between the claim about threads and Little's law dictating it.

Not really, no. In thread-per-request systems, the number of threads is equal to (or perhaps greater than) the concurrency because that’s the definition of thread-per-request. That’s why we agreed that in thread-per-request systems, Little’s law tells us the (average) number of threads.


There's also the assumption that response time remains constant, but that's a mighty assumption - response time changes with thread count.

There is absolutely no such assumption. Unless a rising request rate causes the latency to significantly *drop*, the number of threads will grow. If the latency happens to rise, the number of threads will grow even faster.


There's also the claim of needing more threads. That's also not something that follows from thread-per-request. Essentially, thread-per-request is a constructive proof of having an infinite number of threads. How can one want more? Also, from a different angle - the number of threads needed in thread-per-request does not depend on throughput at all.

The average number of requests being processed concurrently is equal to the rate of requests (i.e. throughput) times the average latency. Again, because a thread-per-request system is defined as one that assigns (at least) one thread for every request, the number of threads is therefore proportional to the throughput (as long as the system is stable).
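To make that concrete with purely illustrative numbers: Little’s law says L = λ × W, where λ is the average request rate (throughput), W the average latency, and L the average number of requests in flight. At 1000 requests/s and an average latency of 50ms, L = 1000 × 0.05 = 50 concurrent requests, hence (at least) 50 live threads on average in a thread-per-request system; double the rate at the same latency and the average doubles to 100.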


Just consider what the request rate means. It means that for however small a time frame and however large a request count you choose, there is a nonzero probability that it will happen. Consequently the number of threads needed is arbitrarily large for any throughput.

Little’s theorem is about *long-term average* concurrency, latency and throughput, and it is interesting precisely because it holds regardless of the distribution of the requests.

Which is just another way to say that the number of threads is effectively infinite and there is no point trying to connect it to Little's law.

I don’t understand what that means. A mathematical theorem about some quantity (again the theorem is about concurrency, but we *define* a thread-per-request system to be one where the number of threads is equal to (or greater than) the concurrency) is true whether you think there’s a point to it or not. The (average) number of threads is obviously not infinite, but equal to the throughput times latency (assuming just one thread per request).

Little’s law has been effectively used to size server systems for decades, so obviously there’s also a very practical point to understanding it.

Request rate doesn't change the number of threads that can exist at any given time

The (average) number of threads in a thread-per-request system rises proportionately with the (average) throughput. We’re not talking about the number of threads that *can* exist, but the number of threads that *do* exist (on average, of course). The number of threads that *can* exist puts a bound on the number of threads that *do* exist, and so on maximum throughput.
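To illustrate that bound with hypothetical numbers: if a server can sustain at most 5000 threads and the average latency is 100ms, then the maximum sustainable throughput is λ = L / W = 5000 / 0.1 = 50,000 requests/s; a higher arrival rate cannot be sustained without lowering the latency or raising the thread limit.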

it only changes the probability of observing any particular number of them in a fixed period of time.

It changes their (average) number. You can start any thread-per-request server, increase the load, and see for yourself (if the server uses a pool, you’ll see an increase not in live threads but in non-idle threads, but it’s the same thing).
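For instance, here is a minimal sketch of such an experiment (assuming a JDK with virtual threads; the port and the trivial echo behaviour are arbitrary choices for illustration). Each accepted connection gets its own virtual thread, so watching the live thread count while raising the load shows the proportionality directly:

    import java.io.IOException;
    import java.net.ServerSocket;
    import java.net.Socket;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class ThreadPerRequestServer {
        public static void main(String[] args) throws IOException {
            try (ServerSocket listener = new ServerSocket(8080);
                 // one new virtual thread per submitted task, i.e. per request
                 ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
                while (true) {
                    Socket socket = listener.accept();
                    executor.submit(() -> handle(socket)); // thread-per-request
                }
            }
        }

        static void handle(Socket socket) {
            try (socket) {
                // trivially echo the input back; a real server would parse a request here
                socket.getInputStream().transferTo(socket.getOutputStream());
            } catch (IOException e) {
                // connection dropped; nothing to do
            }
        }
    }

Swapping the virtual-thread executor for a fixed-size pool turns the same observation into a count of non-idle pool threads, as noted above.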


All this is only criticism of the formal mathematical claims made here and in the JEP. Nothing needs doing, if no one is interested in formal claims being perfect.

The claims, however, were written with care for precision, while your reading of them is not only imprecise and at times incorrect, but may lead people to misunderstand how concurrent systems behave.



Alex

On Tue, 26 Jul 2022, 15:41 Ron Pressler, <ron.pressler at oracle.com> wrote:


On 26 Jul 2022, at 14:33, Alex Otenko <oleksandr.otenko at gmail.com> wrote:

Hi Ron,

I think I can verbalize what bothered me all along.

I wish someone made a distinction between:

Offered traffic - actual term; determined based on the time one thread spends on a request.

Capacity - I don't think this is the actual term. This is the actual thread count. If this is at or below offered traffic, the system is not stable. You can increase capacity until you get to the thread-per-request, which probably corresponds to +oo.

I don’t understand this sentence.


Concurrency as used in Little's law. This is measured in the same units as offered traffic, but is not the same as offered traffic, because the time used here is the actual response time, which includes all sorts of waits.

None of that matters. Little’s law is a mathematical theorem about some unit arriving at some processing centre — a customer, a request, whatever — and for *that* unit, the theorem relates the average latency of performing that operation and the average rate of arrival of those things to the average number of those things existing concurrently in the centre. So, we pick requests as the things we look at, and everything follows. The theorem tells us how many requests, on average, are concurrently being processed, and since we’re assuming thread-per-request, this tells us how many threads are active, because *by definition* of thread-per-request a concurrent request takes at least one thread.


The confusing bit then is that we can't be talking of concurrency before capacity exceeds offered traffic, because the system is not stable, and after that adding threads only decreases concurrency.


No one is talking about *adding* threads. The number of threads grows because rising throughput *makes it grow* in a thread-per-request system. Also, we’re not interested in what’s happening in a system in the process of crashing.



Then also the pragmatic angle. At which point, or for what systems, should I say "yeah, we can't do this without virtual threads", and at which point should I say "thread-per-request is the way to go"?

As explained in JEP 425, there is absolutely no such point: Picking thread-per-request is the premise we’re taking as a given, not the conclusion. I.e. we assume thread-per-request, and the conclusion is that we need many threads. Virtual threads are designed to allow thread-per-request servers to achieve the maximum throughput allowable by the hardware.

Why do so many people want to pick thread-per-request? Because thread-per-request is the model that allows representing your application’s unit of concurrency with the platform’s unit of concurrency, and the Java platform has only one such unit: the thread. I.e. it is the only model that the language and the platform fully support. That is why asynchronous APIs are essentially DSLs and do not rely on the language’s basic composition constructs (loops, try/catch, try-with-resources etc.), why JFR yields less-than-informative profiles for such programs, and why debuggers can’t step through the logical flow of such programs.
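To illustrate with a hypothetical two-step handler (findUser and createOrder are stand-ins simulating blocking calls, not any real API), here is a sketch of the same logic in the two styles; note how the asynchronous version abandons the language’s sequential composition in favour of combinators:

    import java.util.concurrent.CompletableFuture;

    public class StyleContrast {
        // Thread-per-request style: ordinary sequential code; loops, try/catch,
        // debuggers and JFR all see the logical flow on one thread's stack.
        static String handle(String userId) {
            String user = findUser(userId);   // blocks this (cheap, virtual) thread
            return createOrder(user);         // plain sequential composition
        }

        // Asynchronous style: the same logic re-encoded as a pipeline of
        // combinators; the logical flow no longer maps onto a single stack.
        static CompletableFuture<String> handleAsync(String userId) {
            return CompletableFuture.supplyAsync(() -> findUser(userId))
                                    .thenApply(StyleContrast::createOrder);
        }

        // Hypothetical stand-ins for outgoing service calls (simulated waits).
        static String findUser(String userId) { sleep(10); return "user:" + userId; }
        static String createOrder(String user) { sleep(10); return "order-for-" + user; }

        static void sleep(long millis) {
            try { Thread.sleep(millis); }
            catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        }

        public static void main(String[] args) {
            System.out.println(handle("42"));
            System.out.println(handleAsync("42").join());
        }
    }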

So there is absolutely no point at which you’d say “we must do it like that”. But *if* you choose to do it like that then you’d need virtual threads if your concurrency exceeds ~1000.

Thread-per-request and async are neither good nor bad; they’re just different aesthetic styles for writing code. But Java only fully supports the former, and *IF* you choose to do it that way, THEN you’ll need virtual threads. In other words, a person who should be interested in virtual threads is one who thinks it would be nice to write code in the thread-per-request style, but doesn’t want to give up on throughput. I think the JEP is clear on that.


The answer to the first question is: "when your offered traffic is in thousands per CPU". Why CPU specifically? Because otherwise something else is the bottleneck. This means 100ms wait per 100 microseconds of on-CPU time. I don't know how common this is in the world, but in my practice this never was the case - because 100 microseconds is about as much as a REST endpoint takes to produce a few KB of JSON, and 100ms wait is an eternity in comparison. Why thousands? Because we had 200 threads per CPU and sync code, and were fine. Maybe it's gross, but virtual threads are not the killer feature in those cases. Ok, I haven't seen the world, but I reckon the back-of-the-envelope calculation is ok.

If what you’re claiming is that simple thread-per-request servers using OS threads are satisfactory for virtually all systems, then that has long since been established to not be the case. There’s just no point arguing over this. As I think I already told you, 100ms wait is the total of all waits, even if done in parallel, and it is quite common because quite a lot of servers do outgoing calls to scores of services. It is very common for a single incoming request to do 20 outgoing I/O requests if not more.
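Putting illustrative numbers on that: a request that makes 20 outgoing calls of 5ms each accumulates 100ms of wait even if its own CPU time is a few hundred microseconds. At 10,000 requests/s, Little’s law then gives 10,000 × 0.1 = 1000 requests in flight on average, which is exactly the point at which OS-thread counts become the limiting factor.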


The second question is then not really based on performance, rather on the architectural differences that thread-per-request offers. One less thing to tune is good. The reason that this is not a performance question is that adding threads gets response time indistinguishably close to the minimum possible way before you get to +oo.

As long as you’re talking about “adding threads” I can tell you’re not getting this. No one is suggesting adding threads.

If you pick thread-per-request, then the number of threads grows with throughput, and that’s why you need virtual threads.



Alex

On Tue, 26 Jul 2022, 10:15 Ron Pressler, <ron.pressler at oracle.com> wrote:
Let me make this as simple as I think I can:

1. We are talking *only* about a server that creates a new thread for every incoming request. That’s how we define “thread-per-request.” If what you have in mind is a server that operates in any other way, you’re misunderstanding the conversation.

2. While artificially increasing the number of threads in that server would do nothing, whatever that system’s latency is, whatever its resource utilisation is, a rising rate of requests *will* result in that server having more threads that are alive concurrently (by virtue of how it operates, as a rising request rate will not cause that server to reduce latency); i.e. it’s the increased throughput that causes the number of threads to rise, not vice-versa. Therefore, to cope with high request rates that server must have the capacity for many threads.

That is all, and that is how we know that a server using virtual threads would normally have a great many of them: because virtual threads are used by thread-per-request servers with high throughputs. Other things will happen too, and other concurrency limits will eventually come into play, but this — that the number of threads will rise — is necessarily true.

Now we can get to what I think your actual point is. You believe that the server we’re talking about must be at some kind of a disadvantage compared to other kinds of servers. I understand you want me to convince you that is not the case, but the only thing I can suggest at this point is that you actually write a server in this style, employing virtual threads, and then report what problems and limitations you actually run into, rather than hypothesising about problems you think you might run into. That will help you understand how virtual threads are used, and will help us find potentially missing APIs.

— Ron
