Cache topology aware scheduling

Danny Thomas dannyt at netflix.com
Fri Sep 13 03:55:50 UTC 2024


Even with tens of thousands of tasks queued, getQueuedSubmissionCount looks
more than fast enough as a heuristic. I'm now doing a choice of two, with the
current processor's pool as the preferred choice. For the simple external
submit benchmark, which starts a virtual thread that spawns another sharing
its data, I see up to a 25% improvement in throughput. Pleasingly, the
default scheduler occasionally lands workers close to each other by accident,
and then comes within a few percent.
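
For anyone following along, a rough sketch of the choice-of-two idea, not
the code from the repository: compare the preferred pool against one other
pool picked at random, and only move when the other is less loaded by more
than some margin. The class name ChoiceOfTwo, the select methods and the
stickiness parameter are all made up for illustration. The only real API
used is ForkJoinPool::getQueuedSubmissionCount as the load measure.

```java
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.ThreadLocalRandom;
import java.util.function.ToIntFunction;

/** Hypothetical helper for illustration; not taken from the linked repository. */
final class ChoiceOfTwo {
    private ChoiceOfTwo() {}

    /**
     * Chooses between the preferred pool (for example, the one for the current
     * processor's cluster) and one other pool picked uniformly at random.
     * The preferred pool is kept unless the other pool's load is lower by more
     * than {@code stickiness} (an assumed tuning knob).
     */
    static <P> P select(List<P> pools, int preferred,
                        ToIntFunction<P> load, int stickiness) {
        P home = pools.get(preferred);
        if (pools.size() == 1) {
            return home; // nothing to compare against
        }
        // Pick a random index from the pools other than the preferred one.
        int other = ThreadLocalRandom.current().nextInt(pools.size() - 1);
        if (other >= preferred) {
            other++; // skip over the preferred index
        }
        P candidate = pools.get(other);
        int difference = load.applyAsInt(home) - load.applyAsInt(candidate);
        return difference > stickiness ? candidate : home;
    }

    /** Same choice, using the queued submission count as the load measure. */
    static ForkJoinPool select(List<ForkJoinPool> pools, int preferred,
                               int stickiness) {
        return select(pools, preferred,
                ForkJoinPool::getQueuedSubmissionCount, stickiness);
    }
}
```

A larger stickiness keeps work on the current cluster for longer, at the
cost of tolerating more imbalance between pools before moving.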

I think we want to be as sticky as we can to the current worker/cluster, so
ForkJoinWorkerThread.hasKnownQueuedWork is probably too conservative as a
heuristic, but thanks for the heads up.

Have you gotten as far as thinking about how yielding and compensation will
be exposed?

On Thu, Sep 12, 2024 at 3:53 AM Alan Bateman <alan.bateman at oracle.com>
wrote:

> On 10/09/2024 05:10, Danny Thomas wrote:
>
> I've switched to foreign functions for the native calls, using the current
> CPU for external submissions, and a queuing threshold to decide when to
> select the least loaded pool. Significantly improved CPU utilization versus
> the default scheduler with a slight throughput bump:
>
>
> https://github.com/DanielThomas/virtual-threads-cluster-aware/commit/c0e7b6141a84eb77e6848fa84014e7a98ddfc75b
>
> I'll improve the benchmark to be lumpier with more submission pressure to
> make work stealing more of a factor, and then look at balancing with
> pollSubmission.
>
>
> Good use of FFM.
>
> You probably know this already: ForkJoinPool::getQueuedSubmissionCount is
> an O(n) scan, so it will be interesting to see how this performs as a
> heuristic.
>
> Related is that ForkJoinWorkerThread has a method that tests two queues
> (local and "current source") as a cheap way to test if it could execute
> something immediately. This is currently used by Exchanger and
> LinkedTransferQueue to influence whether to spin. Doug Lea has been
> thinking about whether to expose it. Your experiments may be a case that
> could use it.
>
> -Alan
>

