[External] : Re: jstack, profilers and other tools

Alex Otenko oleksandr.otenko at gmail.com
Tue Aug 2 19:53:26 UTC 2022


I think you have two different meanings of thread-per-request. What most
people write is a pool of threads with a queue. But you also use a
different notion, where thread-per-request really creates a new thread for
every request as and when requests arrive. It's hard to follow which one
you mean at any given point.

I understand that when you propose to draw a line on the other side of the
queue, you can describe what happens to requests. But because you
vigorously objected to including the queue wait, I want to see how the
description of the system changes if I add more threads. If there is no way
to distinguish the two (one thread vs two threads), I'll be inclined to use
a description where I can see the difference - for example, a description
of the system where queue wait is included. I see no harm in that, only
clarity.

As for sizing - there are formulas that appear very similar to Little's
law, but they make explicit which time is used, so there really is no
confusion. For example, arrival_rate / departure_rate_per_thread tells you
how many threads you need "without" Little's law. (In quotes, because
1/departure_rate_per_thread is really the time it takes one thread to
process one request, with queue wait conveniently left out.)
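
To make that concrete with the numbers used elsewhere in this thread: at
66.6667 req/s, with each thread able to complete 100 requests per second
(10ms per request), that formula gives 66.6667 / 100 = 0.67, i.e. a single
thread suffices - precisely because queue wait is not part of the time
being used.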


Alex

On Tue, 2 Aug 2022, 20:20 Ron Pressler, <ron.pressler at oracle.com> wrote:

> I’m not sure I understand the question. If you mean to ask what the
> theorem says about threads, then it says nothing directly, just as it says
> nothing about the number of clerks or the square footage of a shop even
> though those are sized using it.
>
> The maths tells you how many units will be concurrently inside your system
> on average: L. You then use the knowledge about the units and the resources
> they consume to draw conclusions. E.g. you’d need to know how much area
> each customer requires in order to know how big the shop should be. In the
> case of servers, you’d need to know how much CPU, RAM and network bandwidth
> each request requires on average, as well as how many OS resources, such as
> file descriptors and threads.
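>
> As a rough sketch of that reasoning (all of the per-request costs below
> are made-up numbers for illustration, not measurements):
>
>     class Sizing {
>         public static void main(String[] args) {
>             double lambda = 1_000;   // average arrival rate, req/s (assumed)
>             double w = 0.050;        // average time inside the system, s (assumed)
>             double L = lambda * w;   // Little's law: ~50 concurrent requests
>             // Scale each assumed per-request cost by L:
>             System.out.printf("threads  >= %.0f (thread-per-request)%n", L);
>             System.out.printf("RAM      ~  %.0f MB (at ~2 MB/request)%n", L * 2);
>             System.out.printf("file fds ~  %.0f (at ~1/request)%n", L);
>         }
>     }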
>
> If you know that the server you’re making will be thread-per-request, then
> you know that whatever other resources a request might consume, every
> request will consume at least one thread, so you know you’d need to be able
> to have at least L concurrent threads. If you choose to write your server
> in some other way, where the relationship between threads and requests is
> not as simple, you can do the same reasoning on whatever construct it is
> that you use to represent an async request. If it’s, say, a
> CompletableFuture, you know you’ll need to have the capacity for at least L
> CompletableFutures.
>
> Now, many people want to write thread-per-request servers — for aesthetic
> reasons as well as because that’s the only design that is in harmony with
> the way the Java platform is designed — and it’s been empirically
> established that the capacity of the CPU, RAM, network bandwidth and file
> descriptors often allows many real-world servers to easily contain tens or
> even hundreds of thousands of concurrent requests. In asynchronous servers,
> the capacity for CompletableFutures easily matches that of the other
> resources, and so such servers can make optimal use of available resources,
> at the cost of using a design that isn’t very friendly to Java. But to make
> equally optimal use of all the other resources in thread-per-request
> servers, the number of threads that you can support needs to be just as
> high, and that is why virtual threads are needed in thread-per-request
> servers.
>
> In fact, in many thread-per-request servers a request doesn’t consume just
> one thread, but several times that (it could be as high as 50 or more), so
> the number of threads might need to be as high as 50L.
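>
> As an aside, here is what the thread-per-request shape looks like with
> the virtual-thread API previewed in JDK 19. This is a minimal sketch;
> handle() is a hypothetical request handler, not a real API:
>
>     import java.io.IOException;
>     import java.net.ServerSocket;
>     import java.net.Socket;
>     import java.util.concurrent.ExecutorService;
>     import java.util.concurrent.Executors;
>
>     class Server {
>         static void serve(ServerSocket listener) throws IOException {
>             // One cheap virtual thread per request; the "pool" needs no sizing.
>             try (ExecutorService exec = Executors.newVirtualThreadPerTaskExecutor()) {
>                 while (true) {
>                     Socket conn = listener.accept();
>                     exec.submit(() -> handle(conn));
>                 }
>             }
>         }
>
>         static void handle(Socket conn) { /* hypothetical handler */ }
>     }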
>
> — Ron
>
> On 2 Aug 2022, at 19:58, Alex Otenko <oleksandr.otenko at gmail.com> wrote:
>
> How does this view allow one to distinguish the behaviour of the system
> with 1 thread vs 2 threads? (Same conditions: 66.6667 req/s, 1/100 s for
> each request)
>
>
> Alex
>
> On Tue, 2 Aug 2022, 17:29 Ron Pressler, <ron.pressler at oracle.com> wrote:
>
>> P.S.
>>
>> To complete the story, let’s get back to thread-per-request servers with
>> a fixed thread pool — like the majority of Java servers today.
>>
>> We consider a request to be inside the system once it’s been assigned its
>> own thread, and W to be the amount of time it hangs on to that thread. If
>> the size of the pool is equal to or greater than L, then the queue formed
>> “outside” will not grow indefinitely. That queue is the amalgamation of any
>> queue you have in the software, network buffers, and requests that are
>> denied and retried. However big *or small* you choose to make the capacity
>> of your software queue — it can be as large as a billion elements or as
>> small as zero — the size of that queue amalgam will not grow without bounds
>> and the system will remain stable, i.e. the number of people waiting to be
>> served will not grow indefinitely.
>>
>> Of course, if we increase the capacity of the software queue, the user
>> experience could be better, as more of that queue amalgam will be handled
>> automatically and fairly (in a scheduling sense), rather than by retries.
>>
>> So, quite the opposite of needing a software queue of unbounded capacity
>> in front of thread assignment: *any* capacity of software queue would do
>> if the system is stable, even of size zero, although a very small queue
>> might increase the number of rejected requests and retries (though not
>> without bound, because the system is stable), which is less pleasant.
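>>
>> A minimal sketch of that configuration (the sizes are assumed, not
>> prescribed): a pool of L threads behind a small bounded queue, rejecting
>> whatever doesn't fit rather than buffering it indefinitely:
>>
>>     import java.util.concurrent.ArrayBlockingQueue;
>>     import java.util.concurrent.ThreadPoolExecutor;
>>     import java.util.concurrent.TimeUnit;
>>
>>     class BoundedPool {
>>         static final int L = 10; // from Little's law: lambda * W (assumed)
>>
>>         static final ThreadPoolExecutor POOL = new ThreadPoolExecutor(
>>                 L, L, 0, TimeUnit.SECONDS,
>>                 new ArrayBlockingQueue<>(16),           // any finite capacity works
>>                 new ThreadPoolExecutor.AbortPolicy());  // when full: reject, clients retry
>>     }
>>
>> (For a software queue of "size zero", a SynchronousQueue plays the same
>> role: a task is either handed to a free thread immediately or rejected.)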
>>
>> — Ron
>>
>> On 2 Aug 2022, at 17:05, Ron Pressler <ron.pressler at oracle.com> wrote:
>>
>> You choose to draw the boundary of your system wherever you like. W is
>> equal to the average amount of time a request spends *inside* the boundary,
>> L is the average number of requests being concurrently *inside*, and the
>> queue (and there is always one, whether it exists as some software or
>> hardware construct or not) that is implicit in the definition of stability
>> is always *outside*.
>>
>> If you choose to draw the boundaries of the system as those of a bank
>> branch, then L will be equal to the average number of customers inside the
>> branch, and stability will refer to the queue that might form *outside*. It
>> doesn’t care if there are also queues inside the branch or not. If, on the
>> other hand, you choose to draw the boundaries such that the clerk’s counter
>> is the “inside” then the implicit queue will, as always, be the one
>> “outside”, even though it’s inside the bank.
>>
>> So whether the elements in a certain software queue are included in L or
>> not depends on where you choose to draw the boundary. If you have 67 req/s
>> (I don’t see how the number of threads is relevant, as you haven’t
>> specified how threads are assigned to requests) and each request spends
>> 1/100 s inside your system’s chosen boundaries, then there will be an
>> average of 0.67 requests inside your system concurrently. So, if your
>> system is capable of handling 1 request at a time, no indefinitely growing
>> queue will form outside it.
>>
>> So the stability of your system always refers to a queue that’s outside
>> it, and it is agnostic to any queues that are inside it and part of its
>> internal working, although the theorem could be applied to them as well. W
>> and L always refer to the behaviour inside, and don't include the number
>> of requests in, or the time spent waiting in, the queue outside.
>>
>> In short, the implicit queue whose stability the theorem talks about is
>> always outside the system to which the formula refers, whereas the question
>> of which “physical” queues you want to put inside the boundaries of the
>> system is completely irrelevant to the theorem, which says nothing about
>> them, and that is why your statement is incorrect.
>>
>> — Ron
>>
>> On 2 Aug 2022, at 15:47, Alex Otenko <oleksandr.otenko at gmail.com> wrote:
>>
>> I don't know what you find incorrect about the statement that queue wait
>> is included in Little's law.
>>
>> 66.6667 requests per second processed by a single thread, spending 10ms
>> on each request. Little's law says that in this case concurrency is 2.
>> Surely it means that the requests in the queue are included in concurrency?
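>>
>> (To spell out where the 2 comes from under one common set of assumptions:
>> in an M/M/1 queue with service rate mu = 100/s and arrival rate
>> lambda = 66.6667/s, the average time in the system including queue wait is
>> W = 1/(mu - lambda) = 30ms, and L = lambda * W = 2.)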
>>
>>
>> Alex
>>
>> On Tue, 2 Aug 2022, 12:04 Ron Pressler, <ron.pressler at oracle.com> wrote:
>>
>>> I’m afraid that is incorrect and, indeed, quite harmful to understanding
>>> how Little’s law has been used to size systems for decades.
>>>
>>> Little’s law refers to some unit, like “request” or “customer”, and also
>>> some boundary of a “store” or “server” where the request can be either
>>> inside or outside. The definitions of what’s inside and outside are
>>> arbitrary, and the law obviously works for any such definition.  W is the
>>> latency that the “request” spends “inside.”
>>>
>>> So, for example, suppose the system is a shop and customers are the
>>> unit. If a customer spends an average of 6 minutes *inside* the shop (so W
>>> = 0.1 hr) and customers arrive at an average rate of 100 per hour (= λ),
>>> then L = 100*0.1 = 10 is the average concurrency, namely the number of
>>> customers, on average, inside the shop.
>>>
>>> If the shop has a capacity for fewer than 10 customers, then the system
>>> will be “unstable”, meaning that a queue will form outside that will keep
>>> growing indefinitely. If, on the other hand, the shop’s capacity is 10
>>> customers, then the system will be stable. That means that no matter what
>>> the distribution of customer arrival is, queues that form outside will not
>>> grow without bound. A capacity of 10 is enough.
>>>
>>> Now, inside the shop there could be clerks, and each of them could be
>>> treated as another system serving customers (with a concurrent capacity of
>>> 1), and we could then apply the theorem to each of them separately, where
>>> “inside” means being served, and “outside” is the queue forming to be
>>> served by the clerk inside the shop.
>>>
>>> Wherever you draw your boundaries between inside and outside, and
>>> whatever the unit of work is, the law tells us which system or subsystem is
>>> stable and which isn’t.
>>>
>>> And now back to software. If our server is thread-per-request, that
>>> means that a request (our unit) consumes a thread for its entire duration
>>> of being served by our system, and so “inside” means from the moment a
>>> request gets its thread, and “outside” means anything prior to that. It
>>> could be some queue in the software, it could be some queue in the network
>>> stack, memory in the network card, or even a virtual queue of users whose
>>> requests might be rejected and keep retrying — it doesn’t matter. If the
>>> capacity of “inside” the server — which, in this case, means the maximum
>>> number of threads, because the number of requests “inside” such a system
>>> cannot be any greater than the number of threads (it may be lower, due to
>>> other limits) — is at or above L, then the system will be stable, meaning
>>> that all queues, real or virtual, forming anywhere in software, hardware,
>>> or “in the world”, will not grow without bounds.
>>>
>>> Indeed, this is how and why most Java servers work today. Most of them
>>> are thread-per-request, and most of them use a fixed thread pool of size X,
>>> meaning that no more than X requests can be “inside” at any one time. But
>>> if X is equal to or greater than L, then the server will not become
>>> indefinitely overloaded and will manage to serve all requests. That does
>>> not mean that no requests will ever be rejected, but it does mean that the
>>> server will carry out its work satisfactorily for an indefinite duration.
>>>
>>> Whether or not a server is thread-per-request, wherever you define the
>>> boundary of “inside”, there can only be some bounded capacity for
>>> concurrent requests “inside”, and some real or virtual queue forming
>>> outside.
>>>
>>> The theorem absolutely does not mean that in order to perform in this
>>> manner the network buffers or internal queue must be of unlimited capacity
>>> (which they can never be). In fact, even if there are no software/hardware
>>> buffers at all and a request is rejected if all threads are busy, we know
>>> that the number of people waiting and retrying because their requests are
>>> rejected will not grow without bound. Implementing buffers in software or
>>> hardware means that some of the queue will be automatically managed by a
>>> digital machine, and could improve momentary user experience, but whatever
>>> its size limit is, the number of units waiting “outside” in all queues —
>>> virtual or digital — is unaffected by where the queue is: it will grow
>>> indefinitely if the capacity “inside” is below L, and it will not grow
>>> indefinitely if the capacity inside is L or above.
>>>
>>> That is why servers work, and that is why they work exactly as
>>> mathematical theorems tell us they must.
>>>
>>> — Ron
>>>
>>>
>>>
>>>
>>>
>>> On 2 Aug 2022, at 08:34, Alex Otenko <oleksandr.otenko at gmail.com> wrote:
>>>
>>> Hi Ron,
>>>
>>> The "maximum capacity for concurrent requests" statement uses
>>> "concurrent" in the sense of Java threads. Little's law counts queued
>>> requests as "concurrent", too. Then, assuming the queue can be arbitrarily
>>> large, we can't have a finite max capacity for concurrent requests.
>>>
>>>
>>> Alex
>>>
>>> On Sun, 31 Jul 2022, 17:38 Ron Pressler, <ron.pressler at oracle.com>
>>> wrote:
>>>
>>>> A thread-per-request server under an average load of 100 req/s and an
>>>> average request processing duration of 10ms will not destabilise if it can
>>>> have no more than 10 threads, where “destabilise” means that the queue
>>>> grows indefinitely. I.e. a stable system is defined as one where queues do
>>>> not grow indefinitely.
>>>>
>>>> While it is true that for any queue size there is some shrinking, though
>>>> non-zero, probability that it might be reached, that is absolutely
>>>> unrelated to the server being thread-per-request. However you choose to
>>>> represent a request, if your maximum capacity for concurrent requests is at
>>>> or above L, the system will be stable.
>>>>
>>>> — Ron
>>>>
>>>> On 31 Jul 2022, at 13:27, Alex Otenko <oleksandr.otenko at gmail.com>
>>>> wrote:
>>>>
>>>> Hi Ron,
>>>>
>>>> I am glad you mentioned this visualisation experiment. That's the sort
>>>> of experiment I proposed (and performed) what seems like a week ago. You
>>>> may find some unintuitive things coming from this experiment, since you
>>>> called them untrue.
>>>>
>>>>
>>>> Alex
>>>>
>>>> On Fri, 29 Jul 2022, 14:53 Ron Pressler, <ron.pressler at oracle.com>
>>>> wrote:
>>>>
>>>>>
>>>>> BTW, the easiest way to visualise the temporary discrepancies between
>>>>> the averages in Little’s law and instantaneous behaviour is to think of the
>>>>> queue forming at the entry to the system.
>>>>>
>>>>> If the system is unstable (i.e. the equation doesn’t hold), the queue
>>>>> will grow without bounds. If it is stable, it can momentarily grow but will
>>>>> then shrink as we regress to the mean. So suppose λ = 100 and W = 1/20, and
>>>>> therefore the average concurrency is 5. If the *maximum* capacity for
>>>>> concurrent operations is also 5 (e.g. we have a thread-per-request server
>>>>> that uses just one thread per request and the maximum number of threads we
>>>>> can support is 5), then if the rate of requests momentarily rises above 100
>>>>> the queue will grow, but it will eventually shrink when the rate drops
>>>>> below 100 (as it must).
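>>>>>
>>>>> To get a feel for those numbers, here is a toy simulation (a sketch,
>>>>> with fixed-rate rather than random arrivals): ~100 arrivals/s, each
>>>>> holding a worker for ~50ms, measured over 30s; the measured throughput
>>>>> times W should come out near 5:
>>>>>
>>>>>     import java.util.concurrent.*;
>>>>>     import java.util.concurrent.atomic.LongAdder;
>>>>>
>>>>>     public class LittlesLawDemo {
>>>>>         public static void main(String[] args) throws Exception {
>>>>>             ExecutorService workers = Executors.newFixedThreadPool(5);
>>>>>             ScheduledExecutorService arrivals =
>>>>>                     Executors.newSingleThreadScheduledExecutor();
>>>>>             LongAdder done = new LongAdder();
>>>>>             // One arrival every 10 ms = 100 req/s; each holds a worker ~50 ms.
>>>>>             arrivals.scheduleAtFixedRate(() -> workers.submit(() -> {
>>>>>                 try { Thread.sleep(50); } catch (InterruptedException e) { }
>>>>>                 done.increment();
>>>>>             }), 0, 10, TimeUnit.MILLISECONDS);
>>>>>             Thread.sleep(30_000);
>>>>>             arrivals.shutdownNow();
>>>>>             workers.shutdown();
>>>>>             double lambda = done.sum() / 30.0;  // measured throughput, req/s
>>>>>             System.out.printf("lambda=%.1f req/s, L=lambda*W=%.2f%n",
>>>>>                     lambda, lambda * 0.050);
>>>>>         }
>>>>>     }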
>>>>>
>>>>> So if our throughput/rate-of-requests is expected not to exceed 100,
>>>>> we should be fine supporting just 5 threads. To get a feel that this
>>>>> actually works, consider that the world’s most popular thread-per-request
>>>>> servers already work in exactly this way. Rather than spawning a brand new
>>>>> thread for every request, they borrow one from a pool. The pool is normally
>>>>> fixed and set to something like a few hundred threads by default. They work
>>>>> fine as long as their throughput doesn’t exceed the maximum expected
>>>>> throughput, the point at which the threads in the pool (or perhaps some
>>>>> other resource) are exhausted. I.e. they do *not* work by allowing for an
>>>>> unbounded number of threads, but a bounded one; they are very much
>>>>> thread-per-request, yet their threads are capped. This does place an upper
>>>>> limit on their throughput, but they work fine until they reach it.
>>>>>
>>>>> Of course, if other resources are exhausted before the pool is
>>>>> depleted, then (fanout aside) lightweight threads aren’t needed. But
>>>>> because there are so many systems where the threads are the resource that’s
>>>>> exhausted first, people invented non-thread-per-request (i.e. asynchronous)
>>>>> servers as well as lightweight threads.
>>>>>
>>>>> — Ron
>>>>>
>>>>> On 29 Jul 2022, at 00:46, Ron Pressler <ron.pressler at oracle.com>
>>>>> wrote:
>>>>>
>>>>>
>>>>>
>>>>> On 28 Jul 2022, at 21:31, Alex Otenko <oleksandr.otenko at gmail.com>
>>>>> wrote:
>>>>>
>>>>> Hi Ron,
>>>>>
>>>>> The claim in the JEP is the same as in this email thread, so that is not
>>>>> much help. But now I don't need help anymore, because I found the
>>>>> explanation of how the thread count, response time, request rate and
>>>>> thread-per-request are connected.
>>>>>
>>>>> Now what makes me bothered about the claims.
>>>>>
>>>>> Little's law connects throughput to concurrency. We agreed it has no
>>>>> connection to thread count. That's a disconnect between the claim about
>>>>> threads and Little's law dictating it.
>>>>>
>>>>>
>>>>> Not really, no. In thread-per-request systems, the number of threads
>>>>> is equal to (or perhaps greater than) the concurrency because that’s the
>>>>> definition of thread-per-request. That’s why we agreed that in
>>>>> thread-per-request systems, Little’s law tells us the (average) number of
>>>>> threads.
>>>>>
>>>>>
>>>>> There's also the assumption that response time remains constant, but
>>>>> that's a mighty assumption - response time changes with thread count.
>>>>>
>>>>>
>>>>> There is absolutely no such assumption. Unless a rising request
>>>>> rate causes the latency to significantly *drop*, the number of threads will
>>>>> grow. If the latency happens to rise, the number of threads will grow even
>>>>> faster.
>>>>>
>>>>>
>>>>> There's also the claim of needing more threads. That's also not
>>>>> something that follows from thread-per-request. Essentially,
>>>>> thread-per-request is a constructive proof of having an infinite number of
>>>>> threads. How can one want more? Also, from a different angle - the number
>>>>> of threads needed in thread-per-request does not depend on throughput
>>>>> at all.
>>>>>
>>>>>
>>>>> The average number of requests being processed concurrently is equal
>>>>> to the rate of requests (i.e. throughput) times the average latency. Again,
>>>>> because a thread-per-request system is defined as one that assigns (at
>>>>> least) one thread for every request, the number of threads is therefore
>>>>> proportional to the throughput (as long as the system is stable).
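>>>>>
>>>>> Concretely, with the numbers used earlier in this thread: at 100 req/s
>>>>> and W = 10ms, the average is 100 * 0.01 = 1 busy thread; at 10,000 req/s
>>>>> with the same W it is 100. The thread count scales with the throughput.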
>>>>>
>>>>>
>>>>> Just consider what the request rate means. It means that if you choose
>>>>> however small a time frame and however large a request count, there is a
>>>>> nonzero probability that it will happen. Consequently the number of threads
>>>>> needed is arbitrarily large for any throughput.
>>>>>
>>>>>
>>>>> Little’s theorem is about *long-term average* concurrency, latency and
>>>>> throughput, and it is interesting precisely because it holds regardless of
>>>>> the distribution of the requests.
>>>>>
>>>>> Which is just another way to say that the number of threads is
>>>>> effectively infinite and there is no point trying to connect it to Little's
>>>>> law.
>>>>>
>>>>>
>>>>> I don’t understand what that means. A mathematical theorem about some
>>>>> quantity (again, the theorem is about concurrency, but we *define* a
>>>>> thread-per-request system to be one where the number of threads is equal to
>>>>> (or greater than) the concurrency) is true whether you think there’s a
>>>>> point to it or not. The (average) number of threads is obviously not
>>>>> infinite, but equal to the throughput times the latency (assuming just one
>>>>> thread per request).
>>>>>
>>>>> Little’s law has been effectively used to size server systems for
>>>>> decades, so obviously there’s also a very practical point to
>>>>> understanding it.
>>>>>
>>>>> Request rate doesn't change the number of threads that can exist at
>>>>> any given time
>>>>>
>>>>>
>>>>> The (average) number of threads in a thread-per-request system rises
>>>>> proportionately with the (average) throughput. We’re not talking about the
>>>>> number of threads that *can* exist, but the number of threads that *do*
>>>>> exist (on average, of course). The number of threads that *can* exist puts
>>>>> a bound on the number of threads that *do* exist, and so on maximum
>>>>> throughput.
>>>>>
>>>>> it only changes the probability of observing any particular number of
>>>>> them in a fixed period of time.
>>>>>
>>>>>
>>>>> It changes their (average) number. You can start any
>>>>> thread-per-request server, increase the load, and see for yourself (if the
>>>>> server uses a pool, you’ll see an increase not in live threads but in
>>>>> non-idle threads, but it’s the same thing).
>>>>>
>>>>>
>>>>> All this is only criticism of the formal mathematical claims made here
>>>>> and in the JEP. Nothing needs doing, if no one is interested in formal
>>>>> claims being perfect.
>>>>>
>>>>>
>>>>> The claims, however, were written with care for precision, while your
>>>>> reading of them is not only imprecise and at times incorrect, but may lead
>>>>> people to misunderstand how concurrent systems behave.
>>>>>
>>>>>
>>>>>
>>>>> Alex
>>>>>
>>>>>
>>>>
>>>
>>
>>
>