effectiveness of jdk.virtualThreadScheduler.maxPoolSize

Sun Jan 8 23:05:29 UTC 2023

> On 8 Jan 2023, at 18:58, robert engels <rengels at ix.netcom.com> wrote:
> 
> But even if not using spin locks - with fair scheduling you expect shorter runtime tasks to be completed before long-running cpu bound tasks. The linux scheduler will lower the priority of threads that hog the cpu too much to facilitate this even further (or use a scheduler type of ‘batch/idle’ - i.e. only run when nothing else needs to run).
> 

If you use spin locks and have significantly more threads than cores, you may well experience an orders-of-magnitude slowdown. In other words, you shouldn’t use spin locks and count on time sharing to make them perform well: there won’t be a deadlock, but you won’t get acceptable behaviour either.
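
To make the spin-lock point concrete, here is a minimal test-and-set spin lock in Java. It is a sketch for illustration only; `SpinLock` and `SpinLockDemo` are hypothetical names, not JDK APIs. A waiter that loops on `compareAndSet` keeps its core busy, so if the holder has been descheduled because there are more runnable threads than cores, the waiters mostly burn their time slices until the OS happens to run the holder again. The result stays correct (no deadlock); only the speed suffers.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical illustration class, not a JDK API: a minimal test-and-set spin lock.
final class SpinLock {
    private final AtomicBoolean held = new AtomicBoolean(false);

    void lock() {
        // Busy-wait until we atomically flip 'held' from false to true.
        while (!held.compareAndSet(false, true)) {
            // A CPU hint only: the waiter keeps its core and does not hand
            // it to the thread that currently holds the lock.
            Thread.onSpinWait();
        }
    }

    void unlock() {
        held.set(false); // volatile write publishes the critical section
    }
}

// Hypothetical driver: 'threads' threads each increment a shared counter
// 'iterations' times under the spin lock.
final class SpinLockDemo {
    private static long counter;

    static long run(int threads, int iterations) {
        SpinLock lock = new SpinLock();
        counter = 0;
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < iterations; j++) {
                    lock.lock();
                    try {
                        counter++;
                    } finally {
                        lock.unlock();
                    }
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join(); // join also makes the workers' writes visible here
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter;
    }
}
```

Raising `threads` well above `Runtime.getRuntime().availableProcessors()` still yields the right count, but wall-clock time can degrade sharply, since a waiter may spin through its whole slice while the holder is off-CPU.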

> So if very short tasks get stuck behind long-running cpu bound tasks this is unexpected behavior - it is not necessarily incorrect. If you spawned more carrier threads (i.e. when the scheduler feels the tasks are not making “progress”) you give more of a chance for the OS scheduler to give cpu time to the short lived tasks. I think that is what the OP was trying to say.

This may seem obvious at first, but the easiest way to explain why it’s hard to find an actual problem caused by this (and why you don’t actually rely on the “expected behaviour”) is to remember that the OS does this (pretty much) only when your CPU is at 100%. But at that point, there’s very little the OS can do to make your server behave well. When you’re at 100% CPU for any significant duration, requests keep arriving faster than you can serve them, so they pile up and your server becomes unstable. In other words, time sharing only kicks in when it can’t do much good.
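
A rough fluid-model calculation illustrates why sustained 100% CPU is already a failure state, whatever the scheduler does. The rates are made-up illustrative numbers and `Backlog` is a hypothetical helper: if requests arrive 10% faster than the server can complete them, the backlog grows linearly for as long as the overload lasts. Dividing the CPU differently among queued tasks changes which ones finish first, not how many finish per second.

```java
// Hypothetical back-of-the-envelope model: constant arrival rate, constant
// service capacity once the CPU is saturated, no burstiness.
final class Backlog {
    // Requests left waiting after 'seconds' at the given rates.
    static double afterSeconds(double arrivalsPerSec, double capacityPerSec, double seconds) {
        // At or below capacity this fluid model predicts no backlog; above it,
        // the excess accumulates linearly for as long as the overload lasts.
        return Math.max(0.0, (arrivalsPerSec - capacityPerSec) * seconds);
    }

    public static void main(String[] args) {
        // 10% overload for one minute: 100 extra requests per second pile up.
        System.out.println(afterSeconds(1100, 1000, 60)); // prints 6000.0
    }
}
```

Real traffic is bursty, so short queues form even below capacity; the model only captures the sustained-overload case discussed here.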

In non-realtime kernels, time sharing is mostly used to run a small number of background tasks or to keep the machine responsive to an operator in an emergency. Clever scheduling is not enough to compensate for a lack of resources. Time sharing can also be useful to smooth over the latency of CPU-bound batch-processing tasks, but since it only makes sense to run a few of them in parallel, virtual threads can’t give you any benefit there anyway.
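
For the CPU-bound batch case, one common pattern (a sketch under stated assumptions, not something prescribed in this thread) is to bound parallelism by the core count with an ordinary fixed pool of platform threads, since extra threads, virtual or not, add no CPU time. The workload and the `BatchRunner` name are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class BatchRunner {
    // Deliberately CPU-bound stand-in for a batch job (hypothetical workload).
    static long sumOfSquares(long from, long to) {
        long acc = 0;
        for (long i = from; i < to; i++) acc += i * i;
        return acc;
    }

    // Splits [0, n) into 'chunks' pieces and runs them on a pool no larger
    // than the number of cores: more threads would only time-share the same CPUs.
    static long parallelSumOfSquares(long n, int chunks) {
        int cores = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(cores);
        try {
            List<Future<Long>> parts = new ArrayList<>();
            long step = (n + chunks - 1) / chunks; // ceiling division
            for (long start = 0; start < n; start += step) {
                long from = start;
                long to = Math.min(n, start + step);
                parts.add(pool.submit(() -> sumOfSquares(from, to)));
            }
            long total = 0;
            for (Future<Long> f : parts) total += f.get();
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown(); // let the pool's non-daemon threads exit
        }
    }
}
```

Submitting more chunks than cores is fine; the queue simply holds the surplus until a worker frees up, rather than having the OS time-slice many runnable threads.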

However, as I’ve said multiple times already, we don’t rule out the possibility of a workload where time sharing could solve an actual problem. So if you encounter a problem, please report it.

— Ron