Cache topology aware scheduling

Alan Bateman alan.bateman at oracle.com
Fri Sep 13 16:35:32 UTC 2024


On 13/09/2024 04:55, Danny Thomas wrote:
> Even with 10s of thousands of tasks queued, it looks like it's more 
> than fast enough as a heuristic. I'm now doing a choice of two, with 
> the current processor's pool being the preferred choice. For the 
> simple external submit, starting a virtual thread that spawns another 
> sharing data, I see up to a 25% improvement in throughput (pleasingly, 
> the default scheduler occasionally accidentally lands workers close to 
> each other and comes within a few percent).
>
> I think we want to be as sticky as we can to the current 
> worker/cluster, so ForkJoinWorkerThread.hasKnownQueuedWork is probably 
> too conservative as a heuristic, but thanks for the heads up.
>
> Have you gotten as far as thinking about how yielding and compensation 
> will be exposed?
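
(For reference, a rough sketch of the choice-of-two idea described above,
using only public ForkJoinPool APIs; the per-cluster pool list, the
preference for the current worker's pool, and queued submission count as
the load signal are illustrative assumptions, not the actual prototype:)

import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.ForkJoinWorkerThread;
import java.util.concurrent.ThreadLocalRandom;

class ChoiceOfTwoSubmitter {
    private final List<ForkJoinPool> pools;   // one pool per cache cluster (assumption)

    ChoiceOfTwoSubmitter(List<ForkJoinPool> pools) {
        this.pools = List.copyOf(pools);
    }

    void submit(Runnable task) {
        // Prefer the current worker's pool so related tasks stay close.
        if (Thread.currentThread() instanceof ForkJoinWorkerThread wt
                && pools.contains(wt.getPool())) {
            wt.getPool().submit(task);
            return;
        }
        // Otherwise pick two pools at random and take the less loaded one.
        var rnd = ThreadLocalRandom.current();
        ForkJoinPool a = pools.get(rnd.nextInt(pools.size()));
        ForkJoinPool b = pools.get(rnd.nextInt(pools.size()));
        ForkJoinPool target =
            a.getQueuedSubmissionCount() <= b.getQueuedSubmissionCount() ? a : b;
        target.submit(task);
    }
}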

I think Francesco's experiment may be based on the prototype 
VirtualThreadTask interface that we had temporarily exposed in EA builds 
some time ago. That gave the mapping of task to virtual Thread, and thus 
the thread state and park blocker when yielding.
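
(Hypothetical sketch only: the shape of that prototype interface is
reconstructed from memory and may not match what the EA builds actually
exposed; the point is that a custom scheduler could go from the task to
the virtual Thread and inspect its state and park blocker:)

import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.locks.LockSupport;

// Hypothetical reconstruction of the prototype's task interface (assumption).
interface VirtualThreadTask extends Runnable {
    Thread thread();   // the virtual thread that this task continues
}

// A custom scheduler that uses the task-to-thread mapping to inspect why
// the virtual thread yielded before deciding where to run it next.
class InspectingScheduler implements Executor {
    private final ExecutorService pool = Executors.newFixedThreadPool(4);

    @Override
    public void execute(Runnable command) {
        if (command instanceof VirtualThreadTask task) {
            Thread vthread = task.thread();
            Thread.State state = vthread.getState();          // e.g. WAITING
            Object blocker = LockSupport.getBlocker(vthread);  // what it parked on
            // ... feed state/blocker into a placement heuristic ...
        }
        pool.execute(command);
    }
}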

There isn't much need for compensation right now, at least not since the 
changes to Object.wait to preempt when waiting. There is still a need to 
support reverse DNS lookups, but that has an SPI now [1] so a different 
resolver can be deployed if needed.
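
(A minimal sketch of plugging a resolver in through that SPI; delegating
to the built-in resolver here is just a placeholder for wherever the
reverse lookup would actually be done:)

import java.net.InetAddress;
import java.net.UnknownHostException;
import java.net.spi.InetAddressResolver;
import java.net.spi.InetAddressResolverProvider;
import java.util.stream.Stream;

public class CustomResolverProvider extends InetAddressResolverProvider {
    @Override
    public InetAddressResolver get(Configuration configuration) {
        InetAddressResolver builtin = configuration.builtinResolver();
        return new InetAddressResolver() {
            @Override
            public Stream<InetAddress> lookupByName(String host,
                    InetAddressResolver.LookupPolicy policy) throws UnknownHostException {
                return builtin.lookupByName(host, policy);   // delegate forward lookups
            }
            @Override
            public String lookupByAddress(byte[] addr) throws UnknownHostException {
                // e.g. do the reverse lookup with a non-blocking DNS client here
                return builtin.lookupByAddress(addr);
            }
        };
    }

    @Override
    public String name() {
        return "custom";
    }
}

The provider is registered through
META-INF/services/java.net.spi.InetAddressResolverProvider.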

As to your question, this project hasn't decided whether to expose 
anything. There are at least three exploration efforts going on right now, 
two with implClass and the other (I think) with a prototype API, and we 
want to see what we can learn from these experiments.

-Alan

[1] https://openjdk.org/jeps/418

