effectiveness of jdk.virtualThreadScheduler.maxPoolSize
Sam Pullara
spullara at gmail.com
Mon Jan 9 08:48:58 UTC 2023
Ron, I think you are being purposefully obtuse by not recognizing that some
folks are going to run high-CPU jobs in virtual threads. The proof is that
folks using Go already encountered this and fixed it.
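The scheduler knobs named in the subject line can be exercised with a small sketch. The sketch below is illustrative and not from the thread (the class name, loop shape, and timeout are my assumptions); the system properties `jdk.virtualThreadScheduler.parallelism` and `jdk.virtualThreadScheduler.maxPoolSize` are the unsupported JDK knobs under discussion (JDK 21 assumed):

```java
// Illustrative sketch of the scenario in this thread: a CPU-bound virtual
// thread monopolising its carrier thread. To reproduce the starvation, pin
// the scheduler to one carrier thread:
//   java -Djdk.virtualThreadScheduler.parallelism=1 \
//        -Djdk.virtualThreadScheduler.maxPoolSize=1 StarvationSketch
import java.util.concurrent.atomic.AtomicBoolean;

public class StarvationSketch {
    // Returns true if the short task managed to run within the deadline.
    static boolean shortTaskRan() throws InterruptedException {
        AtomicBoolean stop = new AtomicBoolean(false);

        // CPU-bound virtual thread: it never blocks, so it never yields its
        // carrier thread voluntarily.
        Thread hog = Thread.ofVirtual().start(() -> {
            long x = 1;
            while (!stop.get()) x = x * 31 + 1; // busy loop, no blocking call
        });

        // Short virtual thread that only needs a moment of CPU. With
        // parallelism=1 it can sit in the run queue behind the hog.
        Thread quick = Thread.ofVirtual().start(
                () -> System.out.println("quick task ran"));

        quick.join(2_000);            // wait up to 2s for the short task
        boolean ran = !quick.isAlive();
        stop.set(true);
        hog.join();
        return ran;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("short task ran in time: " + shortTaskRan());
    }
}
```

With default settings the short task runs almost immediately; with the properties pinned to 1, `shortTaskRan()` will typically return false, which is the starvation being pointed at.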
On Mon, Jan 9, 2023 at 12:46 AM Arnaud Masson <arnaud.masson at fr.ibm.com>
wrote:
> Side note : it seems “more” preemptive time sharing was added for
> goroutines in Go 1.14 to avoid the kind of scheduling starvation we
> discussed:
>
>
>
>
> https://medium.com/a-journey-with-go/go-asynchronous-preemption-b5194227371c
>
>
>
> Thanks
>
> Arnaud
>
>
>
> I don’t know how to evaluate and compare solutions before knowing what the
> problem they supposedly solve is, so I have no way of knowing whether
> time-sharing for virtual threads or adding scheduler workers (if either) is
> the better solution to a problem that hasn’t been reported.
>
>
>
> If servers employing virtual threads tend to reach conditions where
> time-sharing could help, then when the problem is reported we would be more
> than happy to fix it with that solution. What I’m trying to convey is not
> that I think your hypothesis must be wrong, but that it is not necessarily
> correct, either, and so we are simply unable to fix hypothetical bugs. It
> really doesn’t matter how strongly anyone feels that some problem *could*
> arise. We cannot fix bugs before they are reported; only once they are
> reported can we assess their severity, prioritise them, and consider
> solutions. If you do find a
> problem with virtual threads, please report it to this mailing list.
>
>
>
> — Ron
>
>
>
>
>
>
>
> On 8 Jan 2023, at 23:19, Robert Engels <rengels at ix.netcom.com> wrote:
>
>
>
> We’ll have to agree to disagree. I think servers routinely hit 100% cpu
> and rely on the scheduler to deprioritize tasks to be fair - maybe not
> “forever” but for extended periods.
>
> This is not dissimilar to the various background tools that search the
> cosmos using spare cycles - as long as nothing else needs
> them.
>
> I am guessing a lot of long running simulations work similarly.
>
> As I’ve said though I don’t think it’s a huge deal - move things that have
> to run to their own native thread pool. Maybe that’s a better and simpler
> solution than trying to add time slicing to vthreads anyway.
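Robert's suggestion of a dedicated native thread pool can be sketched as follows. The pool size and task shapes are illustrative assumptions, not code from the thread:

```java
// Sketch: keep long-running CPU-bound work off the virtual-thread scheduler
// by giving it its own fixed pool of platform threads, which the OS
// scheduler time-slices fairly. Short blocking tasks stay on virtual threads.
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SplitExecutors {
    // Platform threads for CPU-bound jobs (sized to the machine).
    static final ExecutorService cpuPool =
            Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());

    // Virtual threads for the many short, mostly-blocking tasks.
    static final ExecutorService ioPool =
            Executors.newVirtualThreadPerTaskExecutor();

    public static void main(String[] args) throws Exception {
        Future<Long> simulation = cpuPool.submit(() -> {
            long acc = 0;
            for (long i = 0; i < 50_000_000L; i++) acc += i; // CPU-bound work
            return acc;
        });
        Future<String> request = ioPool.submit(() -> "handled"); // short task

        // The short task is not stuck behind the simulation.
        System.out.println(request.get());
        System.out.println(simulation.get());
        cpuPool.shutdown();
        ioPool.shutdown();
    }
}
```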
>
>
> On Jan 8, 2023, at 5:05 PM, Ron Pressler <ron.pressler at oracle.com> wrote:
>
>
>
>
> On 8 Jan 2023, at 18:58, robert engels <rengels at ix.netcom.com> wrote:
>
> But even if not using spin locks - with fair scheduling you expect shorter
> runtime tasks to be completed before long-running cpu bound tasks. The
> linux scheduler will lower the priority of threads that hog the cpu too
> much to facilitate this even further (or use a scheduler type of
> ‘batch/idle’ - i.e. only run when nothing else needs to run).
>
>
> If you use spin locks and have significantly more threads than cores, you
> may well experience an orders-of-magnitude slow-down. I.e., you cannot use
> spin locks and rely on time-sharing to make them work well: there won’t be
> a deadlock, but you won’t get acceptable behaviour, either.
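To make the spin-lock point concrete, here is a minimal spin lock, not taken from the thread. With more runnable threads than cores, a thread spinning in `lock()` burns its entire time slice waiting for a holder that may itself be descheduled, which is the orders-of-magnitude slow-down described above; a parking lock such as `ReentrantLock` releases the CPU instead:

```java
// Minimal illustrative spin lock. Busy-waiting is only sensible when the
// lock holder is running concurrently on another core and critical sections
// are very short; oversubscribe the cores and the waiter just wastes cycles.
import java.util.concurrent.atomic.AtomicBoolean;

public class SpinLock {
    private final AtomicBoolean locked = new AtomicBoolean(false);

    public void lock() {
        while (!locked.compareAndSet(false, true)) {
            Thread.onSpinWait(); // hint to the CPU, but still consumes cycles
        }
    }

    public void unlock() {
        locked.set(false);
    }

    public static void main(String[] args) throws InterruptedException {
        SpinLock lock = new SpinLock();
        long[] counter = {0};
        Runnable work = () -> {
            for (int i = 0; i < 100_000; i++) {
                lock.lock();
                try { counter[0]++; } finally { lock.unlock(); }
            }
        };
        Thread a = new Thread(work), b = new Thread(work);
        a.start(); b.start(); a.join(); b.join();
        System.out.println(counter[0]); // 200000: mutual exclusion held
    }
}
```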
>
>
>
> So if very short tasks get stuck behind long-running CPU-bound tasks, this
> is unexpected behavior - though not necessarily incorrect. If you spawned
> more carrier threads (i.e. when the scheduler feels the tasks are not
> making “progress”), you would give the OS scheduler a better chance to give
> CPU time to the short-lived tasks. I think that is what the OP was trying
> to say.
>
>
> This may seem obvious at first, but the easiest way to explain why it’s
> hard to find an actual problem caused by this (and why you don’t actually
> rely on the “expected behaviour”) is to remember that the OS will do this
> (pretty-much) only when your CPU is at 100%. But at that point, there’s
> very little the OS can do to make your server behave well. When you’re at
> 100% CPU for any significant duration, requests keep coming and pile up,
> and your server is now unstable. In other words, time sharing
> only kicks in when it can’t do much good.
>
> In non-realtime kernels, time-sharing is mostly used to run a small number
> of background tasks or to keep the machine responsive to an operator in
> cases of emergency. Clever scheduling is not enough to compensate for a
> lack of resources. Time sharing can also be useful for smoothing over the
> latency of CPU-bound batch-processing tasks, but since it only makes sense
> to run a few of them in parallel, virtual threads can’t give you any
> benefit there anyway.
>
> However, as I’ve said multiple times already, we don’t rule out the
> possibility of a workload where time sharing could solve an actual problem.
> So if you encounter a problem, please report it.
>
> — Ron
>
>