[External] : Re: jstack, profilers and other tools

Alex Otenko oleksandr.otenko at gmail.com
Tue Aug 2 14:47:02 UTC 2022


I don't know what you find incorrect about the statement that queue wait is
included in Little's law.

66.6667 requests per second processed by a single thread, spending 10ms of
service time on each request. Little's law says that in this case the
concurrency is 2. Surely that means that the requests in the queue are
included in concurrency?
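
(A sketch of where the 2 comes from, assuming an M/M/1 queue; that model is
an assumption, not something stated above. With service rate \mu = 100/s:

\[
\rho = \lambda/\mu = 66.6667/100 = 2/3, \qquad
L = \frac{\rho}{1-\rho} = 2, \qquad
W = L/\lambda = 30\,\mathrm{ms},
\]

i.e. 10ms in service plus 20ms waiting in the queue, so queued requests are
counted in L.)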


Alex

On Tue, 2 Aug 2022, 12:04 Ron Pressler, <ron.pressler at oracle.com> wrote:

> I’m afraid that is incorrect and, indeed, quite harmful to understanding
> how Little’s law has been used to size systems for decades.
>
> Little’s law refers to some unit, like “request” or “customer”, and also
> some boundary of a “store” or “server” where the request can be either
> inside or outside. The definitions of what’s inside and outside are
> arbitrary, and the law obviously works for any such definition. W is the
> latency that the “request” spends “inside.”
>
> So, for example, suppose the system is a shop and customers are the unit.
> If a customer spends an average of 6 minutes *inside* the shop (so W = 0.1
> hr) and customers arrive at an average rate of 100 per hour (= λ), then L =
> 100*0.1 = 10 is the average concurrency, namely the number of customers, on
> average, inside the shop.
>
> If the shop has a capacity for fewer than 10 customers, then the system
> will be “unstable”, meaning that a queue will form outside that will keep
> growing indefinitely. If, on the other hand, the shop’s capacity is 10
> customers, then the system will be stable. That means that no matter what
> the distribution of customer arrival is, queues that form outside will not
> grow without bound. A capacity of 10 is enough.
>
> Now, inside the shop there could be clerks, and each of them could be
> treated as another system serving customers (with a concurrent capacity of
> 1), and we could then apply the theorem to each of them separately, where
> “inside” means being served, and “outside” is the queue forming to be
> served by the clerk inside the shop.
>
> Wherever you draw your boundaries between inside and outside, and whatever
> the unit of work is, the law tells us which system or subsystem is stable
> and which isn’t.
>
> And now back to software. If our server is thread-per-request, that means
> that a request (our unit) consumes a thread for its entire duration of
> being served by our system, and so “inside” means from the moment a request
> gets its thread, and “outside” means anything prior to that. It could be
> some queue in the software, it could be some queue in the network stack,
> memory in the network card, and even a virtual queue of users whose
> requests might be rejected and keep retrying — it doesn’t matter. If the
> capacity of “inside” the server — which in this case means the maximum
> number of threads, because the number of requests “inside” such a system
> cannot be any greater than the number of threads (though it may be lower
> due to other limits) — is at or above L, then the system will be stable,
> meaning that all queues, real or virtual, forming anywhere in software,
> hardware, or “in the world”, will not grow without bound.
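>
> To make the boundary concrete, here is a minimal sketch (an illustration,
> not any particular server's code): a Semaphore caps how many requests can
> be “inside” at once, and everything blocked on acquire() is “outside”,
> whatever physical form that queue takes.
>
> import java.util.concurrent.Semaphore;
>
> // Sketch: “inside” = holding a permit; “outside” = waiting for one.
> // MAX_INSIDE plays the role of the server's concurrent capacity.
> public class BoundedServer {
>     static final int MAX_INSIDE = 10;
>     static final Semaphore capacity = new Semaphore(MAX_INSIDE);
>
>     static void handle(Runnable request) throws InterruptedException {
>         capacity.acquire();     // crossing the boundary: now “inside”
>         try {
>             request.run();      // one thread per request while “inside”
>         } finally {
>             capacity.release(); // back “outside”
>         }
>     }
> }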
>
> Indeed, this is how and why most Java servers work today. Most of them are
> thread-per-request, and most of them use a fixed thread-pool of size X,
> meaning that no more than X requests can be “inside” at any one time. But
> if X is equal to or greater than L, then the server will not become
> overloaded indefinitely and will manage to serve all requests. It does not
> mean that no requests will ever be rejected, but it does mean that the
> server will carry out its work satisfactorily for an indefinite duration.
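>
> As a back-of-the-envelope sketch of that sizing (the numbers are
> illustrative, not taken from any particular server):
>
> import java.util.concurrent.ExecutorService;
> import java.util.concurrent.Executors;
>
> public class PoolSizing {
>     public static void main(String[] args) {
>         double lambda = 100.0; // average arrival rate, requests/second
>         double w      = 0.1;   // average time a request spends inside, s
>         // Little's law: L = lambda * W is the average concurrency, so a
>         // fixed pool of at least ceil(L) threads keeps the system stable.
>         int x = (int) Math.ceil(lambda * w); // X = 10
>         ExecutorService pool = Executors.newFixedThreadPool(x);
>         System.out.println("pool size X = " + x);
>         pool.shutdown();
>     }
> }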
>
> Whether or not a server is thread-per-request, wherever you define the
> boundary of “inside”, there can only be some bounded capacity for
> concurrent requests “inside”, and some real or virtual queue forming
> outside.
>
> The theorem absolutely does not mean that in order to perform in this
> manner the network buffers or internal queue must be of unlimited capacity
> (which they can never be). In fact, even if there are no software/hardware
> buffers at all and a request is rejected if all threads are busy, we know
> that the number of people waiting and retrying because their requests were
> rejected will not grow without bound. Implementing buffers in software or
> hardware means that some of the queue will be automatically managed by a
> digital machine, and could improve momentary user experience; but whatever
> its size limit is, the number of units waiting “outside” in all queues —
> virtual or digital — is unaffected by where the queue is: it will grow
> indefinitely if the capacity “inside” is below L, and will not grow
> indefinitely if the capacity inside is L or above.
>
> That is why servers work, and that is why they work exactly as
> mathematical theorems tell us they must.
>
> — Ron
>
>
>
>
>
> On 2 Aug 2022, at 08:34, Alex Otenko <oleksandr.otenko at gmail.com> wrote:
>
> Hi Ron,
>
> The "maximum capacity for concurrent requests" statement uses "concurrent"
> in the sense of Java threads. Little's law counts queued requests as
> "concurrent", too. Then assuming queue can be arbitrarily large, we can't
> have a finite max capacity for concurrent requests.
>
>
> Alex
>
> On Sun, 31 Jul 2022, 17:38 Ron Pressler, <ron.pressler at oracle.com> wrote:
>
>> A thread-per-request server under an average load of 100 req/s and an
>> average request processing duration of 10ms will not destabilise if it can
>> have no more than 10 threads, where “destabilise” means that the queue
>> grows indefinitely. I.e. a stable system is defined as one where queues do
>> not grow indefinitely.
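>>
>> (Worked: L = λW = 100/s × 0.01s = 1 concurrent request on average, so a
>> cap of 10 threads is comfortably at or above L.)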
>>
>> While it is true that for any queue size there is some shrinking, though
>> non-zero, probability that it might be reached, that is entirely unrelated
>> to the server being thread-per-request. However you choose to represent a
>> request, if your maximum capacity for concurrent requests is at or above
>> L, the system will be stable.
>>
>> — Ron
>>
>> On 31 Jul 2022, at 13:27, Alex Otenko <oleksandr.otenko at gmail.com> wrote:
>>
>> Hi Ron,
>>
>> I am glad you mentioned this visualisation experiment. That's the sort of
>> experiment I proposed (and performed) what seems like a week ago. You may
>> find some unintuitive things coming out of this experiment, since you
>> called them untrue.
>>
>>
>> Alex
>>
>> On Fri, 29 Jul 2022, 14:53 Ron Pressler, <ron.pressler at oracle.com> wrote:
>>
>>>
>>> BTW, the easiest way to visualise the temporary discrepancies between
>>> the averages in Little’s law and instantaneous behaviour is to think of the
>>> queue forming at the entry to the system.
>>>
>>> If the system is unstable (i.e. the equation doesn’t hold), the queue
>>> will grow without bound. If it is stable, it can momentarily grow but
>>> will then shrink as we regress to the mean. So suppose λ = 100 and W =
>>> 1/20, and therefore the average concurrency is 5. If the *maximum*
>>> capacity for concurrent operations is also 5 (e.g. we have a
>>> thread-per-request server that uses just one thread per request and the
>>> maximum number of threads we can support is 5), then if the rate of
>>> requests momentarily rises above 100 the queue will grow, but it will
>>> eventually shrink when the rate drops below 100 (as it must).
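>>>
>>> To see those numbers in action, here is a toy experiment (a sketch, not
>>> production code): a fixed pool of 5 threads, 50ms of work per request,
>>> one arrival every 10ms (i.e. 100/s). The queue hovers near zero; shorten
>>> the arrival period to, say, 9ms and it grows without bound.
>>>
>>> import java.util.concurrent.*;
>>>
>>> public class LittleDemo {
>>>     public static void main(String[] args) throws Exception {
>>>         // λ = 100 req/s, W = 50ms  =>  L = λW = 5
>>>         ThreadPoolExecutor pool =
>>>                 (ThreadPoolExecutor) Executors.newFixedThreadPool(5);
>>>         ScheduledExecutorService arrivals =
>>>                 Executors.newSingleThreadScheduledExecutor();
>>>         arrivals.scheduleAtFixedRate(() -> pool.execute(() -> {
>>>             try { Thread.sleep(50); } // W: 50ms of work per request
>>>             catch (InterruptedException e) {
>>>                 Thread.currentThread().interrupt();
>>>             }
>>>         }), 0, 10, TimeUnit.MILLISECONDS); // one request every 10ms
>>>         for (int i = 0; i < 10; i++) {     // watch the entry queue
>>>             Thread.sleep(1000);
>>>             System.out.println("queued: " + pool.getQueue().size());
>>>         }
>>>         arrivals.shutdownNow();
>>>         pool.shutdownNow();
>>>     }
>>> }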
>>>
>>> So if our throughput/rate-of-requests is expected to not exceed 100, we
>>> should be fine supporting just 5 threads. To get a feel that this actually
>>> works, consider that the world’s most popular thread-per-request servers
>>> already work in exactly this way. Rather than spawning a brand new thread
>>> for every request, they borrow one from a pool. The pool is normally fixed
>>> and set to something like a few hundred threads by default. They work fine
>>> as long as their throughput doesn’t exceed the maximum expected
>>> throughput, beyond which the threads in the pool (or perhaps some other
>>> resource) are exhausted. I.e. they do *not* work by allowing for an unbounded number of
>>> threads, but a bounded one; they are very much thread-per-request, yet
>>> their threads are capped. This does place an upper limit on their
>>> throughput, but they work fine until they reach it.
>>>
>>> Of course, if other resources are exhausted before the pool is depleted,
>>> then (fanout aside) lightweight threads aren’t needed. But because there
>>> are so many systems where the threads are the resource that’s exhausted
>>> first, people invented non-thread-per-request (i.e. asynchronous) servers
>>> as well as lightweight threads.
>>>
>>> — Ron
>>>
>>> On 29 Jul 2022, at 00:46, Ron Pressler <ron.pressler at oracle.com> wrote:
>>>
>>>
>>>
>>> On 28 Jul 2022, at 21:31, Alex Otenko <oleksandr.otenko at gmail.com>
>>> wrote:
>>>
>>> Hi Ron,
>>>
>>> The claim in the JEP is the same as in this email thread, so that is not
>>> much help. But now I don't need help anymore, because I found the
>>> explanation of how the thread count, response time, request rate and
>>> thread-per-request are connected.
>>>
>>> Now, to what bothers me about the claims.
>>>
>>> Little's law connects throughput to concurrency. We agreed it has no
>>> connection to thread count. That's a disconnect between the claim about
>>> threads and Little's law dictating it.
>>>
>>>
>>> Not really, no. In thread-per-request systems, the number of threads is
>>> equal to (or perhaps greater than) the concurrency because that’s the
>>> definition of thread-per-request. That’s why we agreed that in
>>> thread-per-request systems, Little’s law tells us the (average) number of
>>> threads.
>>>
>>>
>>> There's also the assumption that response time remains constant, but
>>> that's a mighty assumption - response time changes with thread count.
>>>
>>>
>>> There is absolutely no such assumption. Unless a rising request rate
>>> causes the latency to significantly *drop*, the number of threads will
>>> grow. If the latency happens to rise, the number of threads will grow even
>>> faster.
>>>
>>>
>>> There's also the claim of needing more threads. That's also not
>>> something that follows from thread-per-request. Essentially,
>>> thread-per-request is a constructive proof of having an infinite number of
>>> threads. How can one want more? Also, from a different angle: the number
>>> of threads needed in thread-per-request does not depend on throughput
>>> at all.
>>>
>>>
>>> The average number of requests being processed concurrently is equal to
>>> the rate of requests (i.e. throughput) times the average latency. Again,
>>> because a thread-per-request system is defined as one that assigns (at
>>> least) one thread for every request, the number of threads is therefore
>>> proportional to the throughput (as long as the system is stable).
>>>
>>>
>>> Just consider what the request rate means. It means that if you choose
>>> however small a time frame, and however large a request count, there is a
>>> nonzero probability that it will happen. Consequently, the number of
>>> threads needed is arbitrarily large for any throughput.
>>>
>>>
>>> Little’s theorem is about *long-term average* concurrency, latency and
>>> throughput, and it is interesting precisely because it holds regardless of
>>> the distribution of the requests.
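>>>
>>> Stated precisely, in its standard form:
>>>
>>> \[
>>> L = \lambda W
>>> \]
>>>
>>> where L is the long-run average number of requests in the system, λ the
>>> long-run average arrival rate, and W the average time a request spends
>>> in the system, with no assumptions about the arrival or service
>>> distributions.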
>>>
>>> Which is just another way to say that the number of threads is
>>> effectively infinite and there is no point trying to connect it to Little's
>>> law.
>>>
>>>
>>> I don’t understand what that means. A mathematical theorem about some
>>> quantity (again the theorem is about concurrency, but we *define* a
>>> thread-per-request system to be one where the number of threads is equal to
>>> (or greater than) the concurrency) is true whether you think there’s a
>>> point to it or not. The (average) number of threads is obviously not
>>> infinite, but equal to the throughput times latency (assuming just one
>>> thread per request).
>>>
>>> Little’s law has been used effectively to size server systems for
>>> decades, so obviously there’s also a very practical point to
>>> understanding it.
>>>
>>> Request rate doesn't change the number of threads that can exist at any
>>> given time
>>>
>>>
>>> The (average) number of threads in a thread-per-request system rises
>>> proportionately with the (average) throughput. We’re not talking about the
>>> number of threads that *can* exist, but the number of threads that *do*
>>> exist (on average, of course). The number of threads that *can* exist
>>> puts a bound on the number of threads that *do* exist, and hence on the
>>> maximum throughput.
>>>
>>> it only changes the probability of observing any particular number of
>>> them in a fixed period of time.
>>>
>>>
>>> It changes their (average) number. You can start any thread-per-request
>>> server, increase the load, and see for yourself (if the server uses a
>>> pool, you’ll see an increase not in live threads but in non-idle threads,
>>> but it’s the same thing).
>>>
>>>
>>> All this is only criticism of the formal mathematical claims made here
>>> and in the JEP. Nothing needs doing, if no one is interested in formal
>>> claims being perfect.
>>>
>>>
>>> The claims, however, were written with care for precision, while your
>>> reading of them is not only imprecise and at times incorrect, but may lead
>>> people to misunderstand how concurrent systems behave.
>>>
>>>
>>>
>>> Alex
>>>
>>>
>>
>