Cache topology aware scheduling

Danny Thomas dannyt at netflix.com
Fri Oct 4 07:31:24 UTC 2024


I’d wondered about the external submit changes; I’ll take your word for it
:)

Does seem that if you do expose an API, you’d want to differentiate things
such as unpark/yield from other submissions.

I guess there are also pathological cases where a VT is primarily driving
creation of virtual threads in an application. I’ve tried a few heuristics
for deciding when to externally submit instead of locally submitting to a
FJP, and I found it difficult to avoid prematurely and ineffectively moving
work; but I am working with purposefully contrived benchmarks right now,
rather than something that looks like a real application.

On Fri, 4 Oct 2024 at 4:34 pm, Alan Bateman <alan.bateman at oracle.com> wrote:

> On 04/10/2024 07:13, Danny Thomas wrote:
>
> After quite a bit of experimentation, I can at least say that last level
> cache aware task placement on 4th Generation EPYC (Genoa) is a real boon. I
> generalised my original approach, because it doesn't involve customizing
> the nodes-per-socket setting (which we can't do on AWS anyway, NPS = 1),
> introducing the risks/complexity of processor isolation and per-thread
> affinity, or making the scheduler's life too difficult:
>
>
> https://github.com/DanielThomas/virtual-threads-cluster-aware/blob/main/src/main/java/com/netflix/sandbox/ClusteredExecutors.java
>
> With virtual-to-virtual-thread submissions and particularly structured
> concurrency providing a heuristic for locality, I'm convinced there's a
> significant opportunity here. I've still got some more real world
> experiments to run, but will get a TechBlog post up when I have something
> to share.
>
> Thanks for the update, it's nice to get periodic "status reports" on
> experiments in this area.
>
> One issue that I assume will be problematic without API support is when a
> virtual thread T unparks another virtual thread. The execute method will be
> called in the context of currentThread=T, not T's carrier. Until recently
> it was called in the context of T's carrier (for reasons that are too
> complicated to get into here) which has a bunch of implications that have
> now been smoothed out. For your experiments I suspect this means it will
> fall back to round robin or choose a random pool. It's a topic that needs
> more thought, as some custom schedulers will need the carrier's identity or
> need to maintain a mapping of virtual thread to "place"; the "place" in this
> case is the cluster.
>
> -Alan
>
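
To make the thread-to-"place" mapping from the quoted message a bit more
concrete, here is a minimal sketch in Java. It is not taken from
ClusteredExecutors.java and it does not use any virtual thread scheduler API;
it only uses plain platform-thread pools from java.util.concurrent. The names
PlaceAwareExecutor, placeOf and currentPlace are hypothetical, and the sketch
assumes one pool per cluster without actually pinning threads to cores. A
submission from a thread with a known place stays in that place; anything else
falls back to round robin, which is the behaviour described above for
submitters whose place cannot be determined.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch: routes tasks to one pool per "place" (e.g. a cache cluster). */
final class PlaceAwareExecutor implements Executor {
    private final ExecutorService[] pools;
    // Thread -> place index. Only pool worker threads are registered here.
    private final Map<Thread, Integer> placeOf = new ConcurrentHashMap<>();
    // Round-robin cursor used when the submitter's place is unknown.
    private final AtomicInteger next = new AtomicInteger();

    PlaceAwareExecutor(int places, int threadsPerPlace) {
        pools = new ExecutorService[places];
        for (int i = 0; i < places; i++) {
            final int place = i;
            ThreadFactory factory = r -> {
                Thread t = new Thread(r, "place-" + place);
                t.setDaemon(true);
                placeOf.put(t, place); // remember which place this worker belongs to
                return t;
            };
            pools[i] = Executors.newFixedThreadPool(threadsPerPlace, factory);
        }
    }

    /** Returns the place of the calling thread, or -1 if it is not a pool worker. */
    int currentPlace() {
        Integer place = placeOf.get(Thread.currentThread());
        return place == null ? -1 : place;
    }

    @Override
    public void execute(Runnable task) {
        int place = currentPlace();
        if (place < 0) {
            // Unknown submitter: fall back to round robin across places.
            place = Math.floorMod(next.getAndIncrement(), pools.length);
        }
        pools[place].execute(task);
    }

    void shutdown() {
        for (ExecutorService pool : pools) {
            pool.shutdown();
        }
    }
}
```

The lookup is keyed on Thread.currentThread(), so it only helps when the
submitting thread is itself registered. That is the crux of the unpark case in
the quoted message: if execute runs in the context of a virtual thread rather
than its carrier, a map keyed on carriers finds nothing and the round-robin
branch is taken.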


More information about the loom-dev mailing list