[External] : Re: jstack, profilers and other tools
Alex Otenko
oleksandr.otenko at gmail.com
Tue Aug 2 18:58:22 UTC 2022
How does this view allow one to distinguish the behaviour of the system with 1
thread vs 2 threads? (Same conditions: 66.6667 req/s, 1/100 s for each
request)
Alex
On Tue, 2 Aug 2022, 17:29 Ron Pressler, <ron.pressler at oracle.com> wrote:
> P.S.
>
> To complete the story, let’s get back to thread-per-request servers with
> fixed thread pools — like the majority of Java servers today.
>
> We consider a request to be inside the system once it’s been assigned its
> own thread, and W to be the amount of time it hangs on to that thread. If
> the size of the pool is equal to or greater than L, then the queue formed
> “outside” will not grow indefinitely. That queue is the amalgamation of any
> queue you have in the software, network buffers, and requests that are
> denied and retried. However big *or small* you choose to make the capacity
> of your software queue — it can be as large as a billion elements and as
> small as zero — the size of that queue amalgam will not grow without bounds
> and the system will remain stable, i.e. the number of people waiting to be
> served will not grow indefinitely.
>
> Of course, if we increase the capacity of the software queue, the user
> experience could be better as more of that queue amalgam will be handled
> automatically and fairly (in a scheduling sense), rather than by retries.
>
> So, quite the opposite of needing a software queue of unbounded capacity
> before assigning the thread: *any* capacity of software queue would do if
> the system is stable, even a capacity of zero, although a very small queue
> might increase the number of rejected requests and retries (though not
> without bound, because the system is stable), which is less pleasant.
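>
> To make that concrete, here is a minimal sketch (sizes and names are
> illustrative only) of such a pool: the worker count provides the “inside”
> capacity, while the software queue can be given any capacity at all —
> including zero — with anything rejected left to the clients’ retries:
>
>     import java.util.concurrent.*;
>
>     class BoundedQueueSketch {
>         public static void main(String[] args) {
>             int poolSize = 10; // the “inside” capacity; must be >= L
>             // The software queue’s capacity is a free choice: a billion,
>             // 1000, or zero (new SynchronousQueue<>() hands a request
>             // directly to a free thread or rejects it).
>             BlockingQueue<Runnable> queue = new ArrayBlockingQueue<>(1000);
>             ExecutorService pool = new ThreadPoolExecutor(
>                     poolSize, poolSize, 0L, TimeUnit.MILLISECONDS, queue,
>                     new ThreadPoolExecutor.AbortPolicy()); // full => reject;
>             // rejected requests join the implicit queue “outside” (retries)
>         }
>     }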
>
> — Ron
>
> On 2 Aug 2022, at 17:05, Ron Pressler <ron.pressler at oracle.com> wrote:
>
> You choose to draw the boundary of your system wherever you like. W is
> equal to the average amount of time a request spends *inside* the boundary,
> L is the average number of requests concurrently *inside*, and the
> queue (and there is always one, whether it exists as some software or
> hardware construct or not) that is implicit in the definition of stability
> is always *outside*.
>
> If you choose to draw the boundaries of the system as those of a bank
> branch, then L will be equal to the average number of customers inside the
> branch, and stability will refer to the queue that might form *outside*. It
> doesn’t care if there are also queues inside the branch or not. If, on the
> other hand, you choose to draw the boundaries such that the clerk’s counter
> is the “inside” then the implicit queue will, as always, be the one
> “outside”, even though it’s inside the bank.
>
> So whether the elements in a certain software queue are included in L
> depends on where you choose to draw the boundary. If you have 67 req/s (I
> don’t see how the number of threads is relevant, as you haven’t specified
> how threads are assigned to requests) and each request spends 1/100 s
> inside your system’s chosen boundaries, then there will be an average of
> 0.67 requests inside your system concurrently. So, if your system is
> capable of handling 1 request at a time, the queue outside it will not
> grow indefinitely.
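>
> In code, that arithmetic is just (e.g. in jshell):
>
>     double lambda = 67.0;        // requests per second arriving
>     double w = 1.0 / 100;        // seconds each request spends inside
>     double l = lambda * w;       // Little’s law: L = λW = 0.67
>     int capacity = (int) Math.ceil(l); // 1 concurrent request suffices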
>
> So the stability of your system always refers to a queue that’s outside
> it, and is agnostic to any queues that are inside it and part of its
> internal workings, although the theorem could be applied to them as well. W
> and L always refer to the behaviour inside, and don't include the number of
> requests in, or the time spent in, the queue outside.
>
> In short, the implicit queue whose stability the theorem talks about is
> always outside the system to which the formula refers, whereas the question
> of which “physical” queues you want to put inside the boundaries of the
> system is completely irrelevant to the theorem, which says nothing about
> them, and that is why your statement is incorrect.
>
> — Ron
>
> On 2 Aug 2022, at 15:47, Alex Otenko <oleksandr.otenko at gmail.com> wrote:
>
> I don't know what you find incorrect about the statement that queue wait
> is included in Little's law.
>
> 66.6667 requests per second processed by a single thread, spending 10ms on
> each request. Little's law says that in this case concurrency is 2. Surely
> it means that the requests in the queue are included in concurrency?
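>
> (Little's law alone does not produce the 2; it needs a queueing model to
> supply W. Assuming Poisson arrivals and exponential service times — an
> M/M/1 queue — the arithmetic is, e.g. in jshell:
>
>     double lambda = 66.6667;         // arrival rate, req/s
>     double mu = 100.0;               // service rate: 1 per 10ms
>     double rho = lambda / mu;        // utilisation = 2/3
>     double L = rho / (1 - rho);      // M/M/1 mean number in system = 2.0
>     double W = L / lambda;           // Little's law: 30ms = 10ms service
>                                      // + 20ms average queue wait
>
> so the 2 would be the request in service plus those waiting.)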
>
>
> Alex
>
> On Tue, 2 Aug 2022, 12:04 Ron Pressler, <ron.pressler at oracle.com> wrote:
>
>> I’m afraid that is incorrect and, indeed, quite harmful to understanding
>> how Little’s law has been used to size systems for decades.
>>
>> Little’s law refers to some unit, like “request” or “customer”, and also
>> some boundary of a “store” or “server” where the request can be either
>> inside or outside. The definitions of what’s inside and outside are
>> arbitrary, and the law obviously works for any such definition. W is the
>> latency that the “request” spends “inside.”
>>
>> So, for example, suppose the system is a shop and customers are the unit.
>> If a customer spends an average of 6 minutes *inside* the shop (so W = 0.1
>> hr) and customers arrive at an average rate of 100 per hour (= λ), then L =
>> λW = 100 × 0.1 = 10 is the average concurrency, namely the number of
>> customers, on average, inside the shop.
>>
>> If the shop has a capacity for fewer than 10 customers, then the system
>> will be “unstable”, meaning that a queue will form outside that will keep
>> growing indefinitely. If, on the other hand, the shop’s capacity is 10
>> customers, then the system will be stable. That means that no matter what
>> the distribution of customer arrival is, queues that form outside will not
>> grow without bound. A capacity of 10 is enough.
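>>
>> A toy simulation makes this tangible (a sketch with deterministic
>> arrivals; the numbers are the shop’s): with capacity 10 the wait outside
>> stays bounded, while with capacity 9 it grows without bound:
>>
>>     class ShopStability {
>>         public static void main(String[] args) {
>>             System.out.println("capacity 9:  " + lastWait(9, 100_000));
>>             System.out.println("capacity 10: " + lastWait(10, 100_000));
>>         }
>>         // Customers arrive every 1/λ = 0.01 hr; each needs w = 0.1 hr.
>>         static double lastWait(int capacity, int customers) {
>>             double lambda = 100.0, w = 0.1;
>>             double[] freeAt = new double[capacity]; // when each spot frees
>>             double wait = 0;
>>             for (int n = 0; n < customers; n++) {
>>                 double arrival = n / lambda;
>>                 int spot = 0;                       // earliest-free spot
>>                 for (int i = 1; i < capacity; i++)
>>                     if (freeAt[i] < freeAt[spot]) spot = i;
>>                 double start = Math.max(arrival, freeAt[spot]);
>>                 wait = start - arrival;   // hours queued outside the shop
>>                 freeAt[spot] = start + w;
>>             }
>>             return wait; // keeps growing iff capacity < λw = 10
>>         }
>>     }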
>>
>> Now, inside the shop there could be clerks, and each of them could be
>> treated as another system serving customers (with a concurrent capacity of
>> 1), and we could then apply the theorem to each of them separately, where
>> “inside” means being served, and “outside” is the queue forming to be
>> served by the clerk inside the shop.
>>
>> Wherever you draw your boundaries between inside and outside, and
>> whatever the unit of work is, the law tells us which system or subsystem is
>> stable and which isn’t.
>>
>> And now back to software. If our server is thread-per-request, that means
>> that a request (our unit) consumes a thread for its entire duration of
>> being served by our system, and so “inside” means from the moment a request
>> gets its thread, and “outside” means anything prior to that. It could be
>> some queue in the software, it could be some queue in the network stack,
>> memory in the network card, and even a virtual queue of users whose
>> requests might be rejected and keep retrying — it doesn’t matter. If the
>> capacity of “inside” the server — which, in this case means the maximum
>> number of threads because the number of requests “inside” such a system
>> cannot be any greater than the number of threads (or possibly lower due to
>> other limits) — is at or above L, then the system will be stable, meaning
>> that all queues, real or virtual, forming anywhere in software, hardware, or
>> “in the world”, will not grow without bounds.
>>
>> Indeed, this is how and why most Java servers work today. Most of them
>> are thread-per-request, and most of them use a fixed thread-pool of size X,
>> meaning that no more than X requests can be “inside” at any one time. But
>> if X is equal to or greater than L, then the server will not become
>> overloaded indefinitely and will manage to serve all requests. It does not
>> mean that no requests will ever be rejected, but it does mean that the
>> server will carry out its work satisfactorily for an indefinite duration.
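>>
>> Reduced to a skeleton, that design looks something like this (a sketch,
>> not production code; the port and pool size are arbitrary):
>>
>>     import java.io.IOException;
>>     import java.net.ServerSocket;
>>     import java.net.Socket;
>>     import java.util.concurrent.ExecutorService;
>>     import java.util.concurrent.Executors;
>>
>>     class ThreadPerRequestServer {
>>         public static void main(String[] args) throws IOException {
>>             int x = 200; // at most X requests “inside” at any one time
>>             ExecutorService pool = Executors.newFixedThreadPool(x);
>>             try (ServerSocket listener = new ServerSocket(8080)) {
>>                 while (true) {
>>                     Socket conn = listener.accept(); // still “outside”
>>                     pool.submit(() -> handle(conn)); // “inside” once a
>>                 }                                    // worker picks it up
>>             }
>>         }
>>         static void handle(Socket conn) {
>>             try (conn) { /* read request, do work, write response */ }
>>             catch (IOException e) { /* log and drop */ }
>>         }
>>     }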
>>
>> Whether or not a server is thread-per-request, wherever you define the
>> boundary of “inside”, there can only be some bounded capacity for
>> concurrent requests “inside”, and some real or virtual queue forming
>> outside.
>>
>> The theorem absolutely does not mean that in order to perform in this
>> manner the network buffers or internal queue must be of unlimited capacity
>> (which they can never be). In fact, even if there are no software/hardware
>> buffers at all and a request is rejected if all threads are busy, we know
>> that the number of people waiting and retrying because their requests are
>> rejected will not grow without bound. Implementing buffers in software or
>> hardware means that some of the queue will be automatically managed by a
>> digital machine, and could improve the momentary user experience, but whatever
>> its size limit is, the number of units waiting “outside” in all queues —
>> virtual or digital — is unaffected by where the queue is, will grow
>> indefinitely if the capacity “inside” is below L, and will not grow
>> indefinitely if the capacity inside is L or above.
>>
>> That is why servers work, and that is why they work exactly as
>> mathematical theorems tell us they must.
>>
>> — Ron
>>
>>
>>
>>
>>
>> On 2 Aug 2022, at 08:34, Alex Otenko <oleksandr.otenko at gmail.com> wrote:
>>
>> Hi Ron,
>>
>> The "maximum capacity for concurrent requests" statement uses
>> "concurrent" in the sense of Java threads. Little's law counts queued
>> requests as "concurrent", too. Then, assuming the queue can be arbitrarily
>> large, we can't have a finite max capacity for concurrent requests.
>>
>>
>> Alex
>>
>> On Sun, 31 Jul 2022, 17:38 Ron Pressler, <ron.pressler at oracle.com> wrote:
>>
>>> A thread-per-request server under an average load of 100 req/s and an
>>> average request processing duration of 10ms will not destabilise if it can
>>> have no more than 10 threads, where “destabilise” means that the queue
>>> grows indefinitely. I.e. a stable system is defined as one where queues
>>> do not grow indefinitely.
>>>
>>> While it is true that for any queue size there is some shrinking, though
>>> non-zero, probability that it might be reached, this is absolutely unrelated
>>> to the server being thread-per-request. However it is that you choose to
>>> represent a request, if your maximum capacity for concurrent requests is at
>>> or above L, the system will be stable.
>>>
>>> — Ron
>>>
>>> On 31 Jul 2022, at 13:27, Alex Otenko <oleksandr.otenko at gmail.com>
>>> wrote:
>>>
>>> Hi Ron,
>>>
>>> I am glad you mentioned this visualisation experiment. That's the sort
>>> of experiment I proposed (and did) what seems like a week ago. You may
>>> find some unintuitive things coming out of this experiment, since you
>>> called them untrue.
>>>
>>>
>>> Alex
>>>
>>> On Fri, 29 Jul 2022, 14:53 Ron Pressler, <ron.pressler at oracle.com>
>>> wrote:
>>>
>>>>
>>>> BTW, the easiest way to visualise the temporary discrepancies between
>>>> the averages in Little’s law and instantaneous behaviour is to think of the
>>>> queue forming at the entry to the system.
>>>>
>>>> If the system is unstable (i.e. the equation doesn’t hold), the queue
>>>> will grow without bounds. If it is stable, it can momentarily grow but will
>>>> then shrink as we regress to the mean. So suppose λ = 100 and W = 1/20,
>>>> and therefore the average concurrency is 5. If the *maximum* capacity for
>>>> concurrent operations is also 5 (e.g. we have a thread-per-request server
>>>> that uses just one thread per request and the maximum number of threads we
>>>> can support is 5), then if the rate of requests momentarily rises above
>>>> 100, the queue will grow, but it will eventually shrink when the rate
>>>> drops below 100 (as it must).
>>>>
>>>> So if our throughput/rate-of-requests is expected to not exceed 100, we
>>>> should be fine supporting just 5 threads. To get a feel that this actually
>>>> works, consider that the world’s most popular thread-per-request servers
>>>> already work in exactly this way. Rather than spawning a brand new thread
>>>> for every request, they borrow one from a pool. The pool is normally fixed
>>>> and set to something like a few hundred threads by default. They work fine
>>>> as long as their throughput doesn’t exceed the point at which the threads
>>>> in the pool (or perhaps some other resource) are exhausted. I.e. they do
>>>> *not* work by allowing for an unbounded number of
>>>> threads, but a bounded one; they are very much thread-per-request, yet
>>>> their threads are capped. This does place an upper limit on their
>>>> throughput, but they work fine until they reach it.
>>>>
>>>> Of course, if other resources are exhausted before the pool is
>>>> depleted, then (fanout aside) lightweight threads aren’t needed. But
>>>> because there are so many systems where the threads are the resource that’s
>>>> exhausted first, people invented non-thread-per-request (i.e. asynchronous)
>>>> servers as well as lightweight threads.
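>>>>
>>>> With lightweight threads the cap moves: here is a sketch of the same
>>>> thread-per-request shape using this project’s virtual-thread API, where
>>>> the live thread count simply tracks L = λW rather than a pool size:
>>>>
>>>>     import java.util.concurrent.ExecutorService;
>>>>     import java.util.concurrent.Executors;
>>>>
>>>>     class VirtualThreadPerRequest {
>>>>         public static void main(String[] args) {
>>>>             // One cheap virtual thread per request: no pool-size cap.
>>>>             try (ExecutorService exec =
>>>>                     Executors.newVirtualThreadPerTaskExecutor()) {
>>>>                 for (int i = 0; i < 1_000; i++)
>>>>                     exec.submit(() -> { /* handle one request */ });
>>>>             } // close() waits for submitted tasks to finish
>>>>         }
>>>>     }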
>>>>
>>>> — Ron
>>>>
>>>> On 29 Jul 2022, at 00:46, Ron Pressler <ron.pressler at oracle.com> wrote:
>>>>
>>>>
>>>>
>>>> On 28 Jul 2022, at 21:31, Alex Otenko <oleksandr.otenko at gmail.com>
>>>> wrote:
>>>>
>>>> Hi Ron,
>>>>
>>>> The claim in JEP is the same as in this email thread, so that is not
>>>> much help. But now I don't need help anymore, because I found the
>>>> explanation of how the thread count, response time, request rate and
>>>> thread-per-request are connected.
>>>>
>>>> Now what makes me bothered about the claims.
>>>>
>>>> Little's law connects throughput to concurrency. We agreed it has no
>>>> connection to thread count. That's a disconnect between the claim about
>>>> threads and Little's law dictating it.
>>>>
>>>>
>>>> Not really, no. In thread-per-request systems, the number of threads is
>>>> equal to (or perhaps greater than) the concurrency because that’s the
>>>> definition of thread-per-request. That’s why we agreed that in
>>>> thread-per-request systems, Little’s law tells us the (average) number of
>>>> threads.
>>>>
>>>>
>>>> There's also the assumption that response time remains constant, but
>>>> that's a mighty assumption - response time changes with thread count.
>>>>
>>>>
>>>> There is absolutely no such assumption. Unless a rising request
>>>> rate causes the latency to significantly *drop*, the number of threads will
>>>> grow. If the latency happens to rise, the number of threads will grow even
>>>> faster.
>>>>
>>>>
>>>> There's also the claim of needing more threads. That's also not
>>>> something that follows from thread-per-request. Essentially,
>>>> thread-per-request is a constructive proof of having an infinite number of
>>>> threads. How can one want more? Also, from a different angle: the number
>>>> of threads needed in thread-per-request does not depend on throughput
>>>> at all.
>>>>
>>>>
>>>> The average number of requests being processed concurrently is equal to
>>>> the rate of requests (i.e. throughput) times the average latency. Again,
>>>> because a thread-per-request system is defined as one that assigns (at
>>>> least) one thread for every request, the number of threads is therefore
>>>> proportional to the throughput (as long as the system is stable).
>>>>
>>>>
>>>> Just consider what the request rate means. It means that if you choose
>>>> however small a time frame and however large a request count, there is a
>>>> nonzero probability that it will happen. Consequently the number of threads
>>>> needed is arbitrarily large for any throughput.
>>>>
>>>>
>>>> Little’s theorem is about *long-term average* concurrency, latency and
>>>> throughput, and it is interesting precisely because it holds regardless of
>>>> the distribution of the requests.
>>>>
>>>> Which is just another way to say that the number of threads is
>>>> effectively infinite and there is no point trying to connect it to Little's
>>>> law.
>>>>
>>>>
>>>> I don’t understand what that means. A mathematical theorem about some
>>>> quantity (again the theorem is about concurrency, but we *define* a
>>>> thread-per-request system to be one where the number of threads is equal to
>>>> (or greater than) the concurrency) is true whether you think there’s a
>>>> point to it or not. The (average) number of threads is obviously not
>>>> infinite, but equal to the throughput times latency (assuming just one
>>>> thread per request).
>>>>
>>>> Little’s law has been effectively used to size server systems for
>>>> decades, so obviously there’s also a very practical point to
>>>> understanding it.
>>>>
>>>> Request rate doesn't change the number of threads that can exist at any
>>>> given time
>>>>
>>>>
>>>> The (average) number of threads in a thread-per-request system rises
>>>> proportionately with the (average) throughput. We’re not talking about the
>>>> number of threads that *can* exist, but the number of threads that *do*
>>>> exist (on average, of course). The number of threads that *can* exist puts
>>>> a bound on the number of threads that *do* exist, and so on maximum
>>>> throughput.
>>>>
>>>> it only changes the probability of observing any particular number of
>>>> them in a fixed period of time.
>>>>
>>>>
>>>> It changes their (average) number. You can start any thread-per-request
>>>> server, increase the load, and see for yourself (if the server uses a pool,
>>>> you’ll see an increase not in live threads but in non-idle threads, but
>>>> it’s the same thing).
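>>>>
>>>> A quick way to see it (a sketch; the rates and times are made up):
>>>> submit requests at two different rates and sample the pool’s non-idle
>>>> threads — they hover around λ × W:
>>>>
>>>>     import java.util.concurrent.*;
>>>>
>>>>     class LoadVsThreads {
>>>>         public static void main(String[] args) throws Exception {
>>>>             ThreadPoolExecutor pool =
>>>>                     (ThreadPoolExecutor) Executors.newFixedThreadPool(100);
>>>>             for (int rate : new int[]{100, 500}) {   // λ, req/s
>>>>                 for (int i = 0; i < 2 * rate; i++) { // 2s of load
>>>>                     pool.submit(() -> sleep(50));    // W = 50 ms
>>>>                     sleep(1000 / rate);
>>>>                 }
>>>>                 System.out.println(rate + " req/s -> ~"
>>>>                         + pool.getActiveCount()      // ≈ λW: 5, then 25
>>>>                         + " non-idle threads");
>>>>             }
>>>>             pool.shutdown();
>>>>         }
>>>>         static void sleep(long ms) {
>>>>             try { Thread.sleep(ms); } catch (InterruptedException e) { }
>>>>         }
>>>>     }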
>>>>
>>>>
>>>> All this is only criticism of the formal mathematical claims made here
>>>> and in the JEP. Nothing needs doing, if no one is interested in formal
>>>> claims being perfect.
>>>>
>>>>
>>>> The claims, however, were written with care for precision, while your
>>>> reading of them is not only imprecise and at times incorrect, but may lead
>>>> people to misunderstand how concurrent systems behave.
>>>>
>>>>
>>>>
>>>> Alex
>>>>
>>>>
>>>
>>
>
>