[External] : Re: jstack, profilers and other tools

Alex Otenko oleksandr.otenko at gmail.com
Tue Jul 26 13:33:13 UTC 2022


Hi Ron,

I think I can verbalize what bothered me all along.

I wish someone had made a distinction between:

Offered traffic - an established term; computed from the request rate and
the time one thread actively spends servicing a request.

Capacity - I don't think this is an established term. This is the actual
thread count. If it is at or below the offered traffic, the system is not
stable. You can increase capacity all the way up to thread-per-request,
which effectively corresponds to +oo.

Concurrency, as used in Little's law. This is measured in the same units as
offered traffic, but it is not the same quantity, because the time that
enters the formula is the actual response time, which includes all sorts of
waits.

The confusing bit, then, is that we can't talk about concurrency until
capacity exceeds offered traffic (below that the system is not stable), and
beyond that point adding threads only decreases concurrency.
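
A tiny worked example of the distinction, with invented numbers (Little's
law: L = lambda * W):

    // Invented numbers; only the relationship matters.
    public class LittleLawExample {
        public static void main(String[] args) {
            double arrivalRate  = 1000.0; // requests per second
            double serviceTime  = 0.010;  // 10 ms a thread actively works per request
            double responseTime = 0.050;  // 50 ms including all the waits

            // Offered traffic: rate times service time.
            double offeredTraffic = arrivalRate * serviceTime;  // = 10
            // Concurrency in Little's law: rate times response time.
            double concurrency = arrivalRate * responseTime;    // = 50

            System.out.printf("offered traffic = %.0f, concurrency = %.0f%n",
                    offeredTraffic, concurrency);
        }
    }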


Then there is the pragmatic angle: at which point, or for which systems,
should I say "yeah, we can't do this without Virtual threads", and at which
point should I say "thread-per-request is the way to go"?

The answer to the first question is: "when your offered traffic is in the
thousands per CPU". Why per CPU specifically? Because otherwise something
else is the bottleneck. Those thousands mean something like 100ms of wait
per 100 microseconds of on-CPU time. I don't know how common that ratio is
in the world, but in my practice it never came up, because 100 microseconds
is about as long as a REST endpoint takes to produce a few KB of JSON, and
100ms of wait is an eternity in comparison. Why thousands? Because we had
200 threads per CPU and synchronous code, and were fine. Maybe that's
gross, but Virtual threads are not the killer feature in those cases. Ok, I
haven't seen the whole world, but I reckon the back-of-the-envelope working
holds up.
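
Spelling out that back of the envelope (again, invented but representative
numbers):

    // Why "thousands per CPU": 100 us on-CPU + 100 ms wait per request.
    public class BackOfEnvelope {
        public static void main(String[] args) {
            double cpuTime  = 100e-6; // 100 microseconds on-CPU per request
            double waitTime = 100e-3; // 100 ms of waiting per request

            // One CPU saturates at 1/cpuTime requests per second.
            double maxRatePerCpu = 1.0 / cpuTime;               // 10,000 req/s
            // Little's law: threads in flight = rate * response time.
            double threadsPerCpu = maxRatePerCpu * (cpuTime + waitTime);

            System.out.printf("rate/CPU = %.0f req/s, threads/CPU = %.0f%n",
                    maxRatePerCpu, threadsPerCpu);              // ~1001
        }
    }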

The second question, then, is not really about performance, but about the
architectural benefits that thread-per-request offers. One less thing to
tune is good. The reason this is not a performance question is that adding
threads brings the response time indistinguishably close to the minimum
possible well before you get to +oo.
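
To see the diminishing returns, here is a sketch using the textbook M/M/c
queue (my choice of model and of numbers; not measurements of any real
server):

    // Mean response time in an M/M/c queue via Erlang C, for a growing
    // thread count c. Offered load a = lambda/mu = 4, so c must exceed 4.
    public class DiminishingReturns {
        public static void main(String[] args) {
            double lambda  = 400.0;          // requests per second
            double service = 0.010;          // 10 ms of service per request
            double mu = 1.0 / service;       // service rate per thread
            double a  = lambda / mu;         // offered load (Erlangs)

            for (int c = 5; c <= 10; c++) {
                double b = 1.0;              // Erlang B, computed recursively
                for (int k = 1; k <= c; k++) {
                    b = a * b / (k + a * b);
                }
                double rho   = a / c;
                double pWait = b / (1 - rho + rho * b);         // Erlang C
                double w     = service + pWait / (c * mu - lambda);
                System.out.printf("threads=%2d  response=%6.2f ms  L=%5.2f%n",
                        c, w * 1000, lambda * w);
            }
        }
    }

Running it shows the response time collapsing towards the 10ms floor within
a few extra threads, and L = lambda * W shrinking with it.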


Alex

On Tue, 26 Jul 2022, 10:15 Ron Pressler, <ron.pressler at oracle.com> wrote:

> Let me make this as simple as I think I can:
>
> 1. We are talking *only* about a server that creates a new thread for
> every incoming request. That’s how we define “thread-per-request.” If what
> you have in mind is a server that operates in any other way, you’re
> misunderstanding the conversation.
>
> 2. While artificially increasing the number of threads in that server
> would do nothing, whatever that system’s latency is, whatever its resource
> utilisation is, a rising rate of requests *will* result in that server
> having more threads that are alive concurrently (by virtue of how it
> operates, as a rising request rate will not cause that server to reduce
> latency); i.e. it’s the increased throughput that causes the number of
> threads to rise, not vice-versa. Therefore, to cope with high request rates
> that server must have the capacity for many threads.
>
> That is all, and that is how we know that a server using virtual threads
> would normally have a great many of them: because virtual threads are used
> by thread-per-request servers with high throughputs. Other things will
> happen too, and other concurrency limits will eventually come into play,
> but this — that the number of threads will rise — is necessarily true.
>
> Now we can get to what I think your actual point is. You believe that the
> server we’re talking about must be at some kind of a disadvantage compared
> to other kinds of servers. I understand you want me to convince you that
> is not the case, but the only thing I can suggest at this point is that
> you actually write a server in this style, employing virtual threads, and
> then report what problems and limitations you actually run into, rather
> than hypothesise about problems you think you might run into. That will
> help you understand how virtual threads are used, and will help us find
> potentially missing APIs.
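>
> For concreteness, a minimal sketch of such a server (the port, the
> response, and the class name are all illustrative, not prescriptive):
>
>     import java.io.IOException;
>     import java.net.ServerSocket;
>     import java.net.Socket;
>     import java.util.concurrent.ExecutorService;
>     import java.util.concurrent.Executors;
>
>     // Thread-per-request: one new (virtual) thread per accepted connection.
>     public class VirtualThreadServer {
>         public static void main(String[] args) throws IOException {
>             try (ServerSocket listener = new ServerSocket(8080);
>                  ExecutorService perRequest =
>                          Executors.newVirtualThreadPerTaskExecutor()) {
>                 while (true) {
>                     Socket socket = listener.accept();
>                     perRequest.submit(() -> handle(socket));
>                 }
>             }
>         }
>
>         static void handle(Socket socket) {
>             try (socket) {
>                 // Plain blocking I/O; the virtual thread unmounts its
>                 // carrier while it waits.
>                 socket.getOutputStream().write(
>                         "HTTP/1.1 200 OK\r\nContent-Length: 0\r\n\r\n"
>                                 .getBytes());
>             } catch (IOException e) {
>                 // Per-connection failures are ignored in this sketch.
>             }
>         }
>     }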
>
> — Ron
>
> On 26 Jul 2022, at 08:32, Alex Otenko <oleksandr.otenko at gmail.com> wrote:
>
> I am talking about all systems with threads. A thread-per-request system is
> just a system with more threads. I can't understand what fault you find in
> my comparing one with the other. Isn't that what should be done when someone
> wants to be convinced, rather than taking it on faith?
>
> You ask me not to say something, and then you say it yourself. The missing
> bit is that one can also reduce response time, and that's what you might
> see if you tried different thread counts. Well, unless you think there's
> nothing to see and increased concurrency is the only explanation.
>
> The picture is interesting, but without the context of resource utilization
> and the time each thread spends on one request, you can't say that 200 or
> 800 threads is the right capacity to sustain the offered traffic. Once you
> have enough threads to supply that capacity (which in this particular
> workload may require Virtual threads), then compare to even higher thread
> counts and to thread-per-request.
>
> On Mon, 25 Jul 2022, 10:18 Ron Pressler, <ron.pressler at oracle.com> wrote:
>
>> You are talking about systems where a thread processes more than one
>> request. That is the very opposite of what thread-per-request means. Also,
>> as I said over and over and over, there is no claim that adding threads
>> increases concurrency. Please stop repeating that nonsense. Higher
>> throughput in a thread-per-request system means higher concurrency, which,
>> *in a thread-per-request program* means adding threads, but adding threads
>> does not increase the concurrency. Reading a book when it’s dark outside
>> requires turning on the lights, but reading a book with the lights on does
>> not make it dark outside.
>>
>> Here are measurements from an actual server. As the laws of the universe
>> require, they follow Little’s law — the system becomes unstable exactly
>> when the equation breaks — as anything else would be impossible. Every
>> simulation software will show you the same thing. If you do see anything
>> else, then you’re not talking about thread-per-request.
>>
>> Absolutely everything we’ve discussed is stated in JEP 425, so I would
>> ask you to read it carefully, and only respond if you have a question about
>> a particular section you can quote. Ask yourself if you’re talking about
>> programs where a thread can make progress on more than one request; if so,
>> go back and think.
>>
>> — Ron
>>
>>
>>
>> On 25 Jul 2022, at 09:16, Alex Otenko <oleksandr.otenko at gmail.com> wrote:
>>
>> Well, there are a few things I said several times too, so we are in the
>> same boat. :)
>>
>> Ok, just open your favourite modelling software and see:
>>
>> Given a request rate and a request processing time, there is a minimum
>> number of threads needed to process them. That's the capacity needed to do
>> the work (i.e. for the system to remain stable).
>>
>> Thread-per-request is simply the maximum number of threads you can have
>> to process that work.
>>
>> Then you can see what your favourite modelling software says about
>> concurrency. It says that as you add threads, concurrency in the sense used
>> in Little's law decreases.
>>
>> Since this is also a mathematical fact, something in the claim that
>> adding threads increases concurrency needs reconciling.
>>
>>
>> Alex
>>
>> On Sun, 24 Jul 2022, 20:45 Ron Pressler, <ron.pressler at oracle.com> wrote:
>>
>>>
>>>
>>> > On 24 Jul 2022, at 19:26, Alex Otenko <oleksandr.otenko at gmail.com>
>>> wrote:
>>> >
>>> > The "other laws" don't contradict Little's law, they only explain that
>>> you can't have an equals sign between thread count and throughput.
>>>
>>> That there is an equals sign when the system is stable is a mathematical
>>> theorem, so there cannot exist a correct explanation for its falsehood.
>>> Your feelings about it are irrelevant to its correctness. It is a theorem.
>>>
>>> >
>>> > Let me remind you what I mean.
>>> >
>>> > 1 thread, 10ms per request. At a request rate of 66.667 per second the
>>> concurrency is 2, and at a request rate of 99 it is 99. Etc. All of this
>>> is because response time gets worse, as the "other laws" predict. But we
>>> already see that thread count is not a cap on concurrency, as was claimed
>>> earlier in this thread.
>>> >
>>> > If we increase the thread count we can improve response times. But at a
>>> thread count of 5 or 6 you are only 1 microsecond away from the "optimal"
>>> 10ms response time. While arithmetically the situation keeps improving (by
>>> an ever smaller fraction of a microsecond), the mathematics alone cannot
>>> capture the notion of diminishing returns.
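>>>
>>> For reference, those figures are the standard M/M/1 values for a single
>>> thread with a service rate of 100 requests per second (implied by the
>>> 10ms figure); a jshell-ready check:
>>>
>>>     double mu = 100.0;                    // 10 ms per request
>>>     for (double lambda : new double[] {66.667, 99.0}) {
>>>         double rho = lambda / mu;         // utilisation
>>>         System.out.printf("lambda=%.3f -> L=%.1f%n",
>>>                 lambda, rho / (1 - rho)); // prints 2.0 and 99.0
>>>     }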
>>>
>>> First, as I must have repeated three times in this discussion, we’re
>>> talking about thread-per-request (please read JEP 425, as all this is
>>> explained there), so by definition, we’re talking about cases where the
>>> number of threads is equal to or greater than the concurrency, i.e. the
>>> number of requests in flight.
>>>
>>> Second, as I must have repeated at least three times in this discussion,
>>> increasing the number of threads does nothing. A mathematical theorem tells
>>> us what the concurrency *is equal to* in a stable system. It cannot
>>> possibly be any lower or any higher. So your number of requests that are
>>> being processed is equal to some number if your system is stable, and in
>>> the case of thread-per-request programs, which are our topic, the number
>>> of threads processing requests is exactly equal to the number of concurrent
>>> requests times the number of threads per request, which is at least one by
>>> definition. If you add any more threads they cannot be processing requests,
>>> and if you have fewer threads then your system isn’t stable.
>>>
>>> Finally, if your latency starts going up, then so does your concurrency,
>>> up to the point where one of your software or hardware components reaches
>>> its peak concurrency and your server destabilises. While Little’s law tells
>>> you what the concurrency is equal to (and so, in a thread-per-request
>>> program what the number of request-processing threads is equal to), the
>>> number of threads is not the only limit on the maximum capacity. We know
>>> that in a thread-per-request server, every request consumes at least one
>>> thread, but it consumes other resources as well, and they, too, place
>>> limitations on concurrency. All this is factored into the bounds on the
>>> concurrency level. It’s just that we empirically know that the limitation
>>> on threads is hit *first* by many servers, which is why async APIs and
>>> lightweight user-mode threads were invented.
>>>
>>> Note that Little’s law, being a mathematical theorem, applies to every
>>> component separately, too. I.e., you can treat your CPU as the server, the
>>> requests would be the processing bursts, and the maximum concurrency would
>>> be the number of cores.
>>>
>>> >
>>> > So that answers why we are typically fine with a small thread count.
>>> >
>>>
>>> That we are not typically fine writing scalable thread-per-request
>>> programs with few threads is the reason why async I/O and user-mode threads
>>> were created. It is possible some people are fine, but clearly many are
>>> not. If your thread-per-request program needs to handle only a small number
>>> of requests concurrently, and so needs only a few threads, then there’s no
>>> need for you to use virtual threads. That is exactly why, when this
>>> discussion started what feels like a year ago, I said that when there are
>>> virtual threads, there must be many of them (or else they’re not needed).
>>>
>>> — Ron
>>>
>>>
>>>
>> <Screenshot 2022-07-25 at 10.05.53.png>
>
>
>

