[External] : Re: jstack, profilers and other tools

Ron Pressler ron.pressler at oracle.com
Tue Jul 26 14:41:41 UTC 2022



On 26 Jul 2022, at 14:33, Alex Otenko <oleksandr.otenko at gmail.com> wrote:

Hi Ron,

I think I can verbalize what bothered me all along.

I wish someone made a distinction between:

Offered traffic - an actual term; determined from the time one thread spends on a request.

Capacity - I don't think this is the actual term. This is the actual thread count. If this is at or below offered traffic, the system is not stable. You can increase capacity until you get to the thread-per-request, which probably corresponds to +oo.

I don’t understand this sentence.


Concurrency as used in Little's law. This is measured in the same units as offered traffic, but is not the same as offered traffic, because the time used here is the actual response time, which includes all sorts of waits.

None of that matters. Little’s law is a mathematical theorem about some unit arriving at some processing centre — a customer, a request, whatever — and for *that* unit, the theorem relates the average latency of performing that operation and the average rate of arrival of those things to the average number of those things existing concurrently in the centre. So, we pick requests as the things we look at, and everything follows. The theorem tells us how many requests, on average, are concurrently being processed, and since we’re assuming thread-per-request, this tells us how many threads are active, because *by definition* of thread-per-request a concurrent request takes at least one thread.
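
To make the arithmetic concrete, here is a back-of-the-envelope sketch; the rate and latency figures are assumptions picked purely for illustration:

    // Little's law: L = lambda * W, where lambda is the average arrival
    // rate, W the average time a request spends in the system, and L the
    // average number of requests in the system concurrently.
    class LittlesLawSketch {
        public static void main(String[] args) {
            double arrivalRate = 10_000.0; // lambda: requests/second (assumed)
            double avgLatency  = 0.1;      // W: seconds, including all waits (assumed)
            double concurrency = arrivalRate * avgLatency; // L = 1,000 requests in flight
            // In a thread-per-request server, each in-flight request holds at
            // least one thread, so this server averages >= 1,000 live threads.
            System.out.println("Average concurrent requests: " + concurrency);
        }
    }

Note that W here is the full response time, waits included, which is why the in-flight count can be large even when the CPU time per request is tiny.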


The confusing bit then is that we can't be talking of concurrency before capacity exceeds offered traffic, because the system is not stable, and after that adding threads only decreases concurrency.


No one is talking about *adding* threads. The number of threads grows because rising throughput *makes it grow* in a thread-per-request system. Also, we’re not interested in what’s happening in a system in the process of crashing.



Then also the pragmatic angle. At which point, or for what systems, should I say "yeah, we can't do this without virtual threads", and at which point should I say "thread-per-request is the way to go"?

As explained in JEP 425, there is absolutely no such point: Picking thread-per-request is the premise we’re taking as a given, not the conclusion. I.e. we assume thread-per-request, and the conclusion is that we need many threads. Virtual threads are designed to allow thread-per-request servers to achieve the maximum throughput allowable by the hardware.

Why do so many people want to pick thread-per-request? Because thread-per-request is the model that allows representing your application’s unit of concurrency with the platform’s unit of concurrency, and the Java platform has only one such unit: the thread. I.e. it is the only model that the language and the platform fully support. That is why asynchronous APIs are essentially DSLs and do not rely on the language’s basic composition constructs (loops, try/catch, try-with-resources etc.), why JFR yields less-than-informative profiles for such programs, and why debuggers can’t step through the logical flow of such programs.
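
As a concrete illustration of that last point (fetchAsync and fetchBlocking below are hypothetical stand-ins for an outgoing call, not any real API):

    import java.util.List;
    import java.util.concurrent.CompletableFuture;

    class TwoStyles {
        static CompletableFuture<String> fetchAsync(String id) {
            return CompletableFuture.supplyAsync(() -> "data-" + id);
        }

        static String fetchBlocking(String id) {
            return "data-" + id; // imagine this blocks on I/O
        }

        // Async style: sequencing and error handling go through the
        // CompletableFuture DSL; the language's own loop and try/catch
        // cannot span the asynchronous steps.
        static CompletableFuture<String> asyncStyle(List<String> ids) {
            CompletableFuture<String> result = CompletableFuture.completedFuture("");
            for (String id : ids) {
                result = result.thenCompose(prev -> fetchAsync(id)); // "loop" via the DSL
            }
            return result.exceptionally(e -> "fallback");            // "catch" via the DSL
        }

        // Thread-per-request style: the same logic in plain Java constructs.
        static String blockingStyle(List<String> ids) {
            try {
                String last = "";
                for (String id : ids) {       // an ordinary loop
                    last = fetchBlocking(id);
                }
                return last;
            } catch (RuntimeException e) {    // an ordinary try/catch
                return "fallback";
            }
        }
    }

The blocking version shows up in a profiler or debugger as one coherent stack per request; the async version is scattered across whatever pool threads happen to run each stage.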

So there is absolutely no point at which you’d say “we must do it like that”. But *if* you choose to do it like that then you’d need virtual threads if your concurrency exceeds ~1000.

Thread-per-request and async are neither good nor bad; they’re just different aesthetic styles for writing code. But Java only fully supports the former, and *IF* you choose to do it that way, THEN you’ll need virtual threads. In other words, a person who should be interested in virtual threads is one who thinks it would be nice to write code in the thread-per-request style, but doesn’t want to give up on throughput. I think the JEP is clear on that.
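
To make that concrete, here is a minimal sketch of a thread-per-request loop using the JEP 425 API; acceptNext and handle are hypothetical placeholders for a real server’s accept loop and application logic:

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    class ThreadPerRequestSketch {
        record Request() {}

        void serve() {
            // One new (virtual) thread per incoming request; no pool to size or tune.
            try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
                while (true) {
                    Request request = acceptNext();          // block for the next request
                    executor.submit(() -> handle(request));  // a fresh virtual thread
                }
            }
        }

        Request acceptNext() { return new Request(); }        // placeholder
        void handle(Request r) { /* free to block on I/O */ } // placeholder
    }

The number of live threads here simply tracks the number of in-flight requests, which is the point: nothing is added or tuned, the count follows throughput.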


The answer to the first question is: "when your offered traffic is in the thousands per CPU". Why per CPU specifically? Because otherwise something else is the bottleneck. This means 100ms of wait per 100 microseconds of on-CPU time. I don't know how common this is in the world, but in my practice it never was the case - because 100 microseconds is about as much as a REST endpoint takes to produce a few KB of JSON, and 100ms of wait is an eternity in comparison. Why thousands? Because we had 200 threads per CPU and sync code, and were fine. Maybe it's gross, but virtual threads are not the killer feature in those cases. OK, I haven't seen the world, but I reckon the back-of-the-envelope working-out is OK.

If what you’re claiming is that simple thread-per-request servers using OS threads are satisfactory for virtually all systems, then that has long since been established to not be the case. There’s just no point arguing over this. As I think I already told you, 100ms wait is the total of all waits, even if done in parallel, and it is quite common because quite a lot of servers do outgoing calls to scores of services. It is very common for a single incoming request to do 20 outgoing I/O requests if not more.
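
To illustrate (callService below is a hypothetical stand-in for one of those outgoing calls, each of which may wait on the order of 100ms):

    import java.util.List;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    class FanOutSketch {
        // One incoming request fanning out to many services in parallel.
        String handle(List<String> serviceUrls) throws Exception {
            try (ExecutorService ex = Executors.newVirtualThreadPerTaskExecutor()) {
                List<Future<String>> pending = serviceUrls.stream()
                        .map(url -> ex.submit(() -> callService(url)))
                        .toList();
                StringBuilder combined = new StringBuilder();
                for (Future<String> f : pending) {
                    combined.append(f.get()); // each may wait ~100ms, but in parallel
                }
                return combined.toString();
            }
        }

        String callService(String url) { return "response-from-" + url; } // placeholder
    }

Each outgoing call briefly occupies its own virtual thread, so a single incoming request can account for 20 or more concurrent threads; multiply that by the request rate and the thread counts implied by Little’s law follow.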


The second question is then not really based on performance, but rather on the architectural differences that thread-per-request offers. One less thing to tune is good. The reason this is not a performance question is that adding threads gets response time indistinguishably close to the minimum possible well before you get to +oo.

As long as you’re talking about “adding threads” I can tell you’re not getting this. No one is suggesting adding threads.

If you pick thread-per-request, then the number of threads grows with throughput, and that’s why you need virtual threads.



Alex

On Tue, 26 Jul 2022, 10:15 Ron Pressler, <ron.pressler at oracle.com> wrote:
Let me make this as simple as I think I can:

1. We are talking *only* about a server that creates a new thread for every incoming request. That’s how we define “thread-per-request.” If what you have in mind is a server that operates in any other way, you’re misunderstanding the conversation.

2. While artificially increasing the number of threads in that server would do nothing, whatever that system’s latency is, whatever its resource utilisation is, a rising rate of requests *will* result in that server having more threads that are alive concurrently (by virtue of how it operates, as a rising request rate will not cause that server to reduce latency); i.e. it’s the increased throughput that causes the number of threads to rise, not vice-versa. Therefore, to cope with high request rates that server must have the capacity for many threads.

That is all, and that is how we know that a server using virtual threads would normally have a great many of them: because virtual threads are used by thread-per-request servers with high throughputs. Other things will happen too, and other concurrency limits will eventually come into play, but this — that the number of threads will rise — is necessarily true.

Now we can get to what I think your actual point is. You believe that the server we’re talking about must be at some kind of a disadvantage compared to other kinds of servers. I understand you want me to convince you that this is not the case, but the only thing I can suggest at this point is that you actually write a server in this style, employing virtual threads, and then report what problems and limitations you actually run into, rather than hypothesising about what problems you think you might run into. That will help you understand how virtual threads are used, and will help us find potentially missing APIs.

— Ron