<div dir="auto">I’d wondered about the external submit changes; I’ll take your word for it :)</div><div dir="auto"><br></div><div dir="auto">It does seem that if you do expose an API, you’d want to differentiate things such as unpark/yield from other submissions. <br></div><div dir="auto"><br></div><div dir="auto">I guess there are also pathological cases where a virtual thread is primarily driving the creation of virtual threads in an application. <span style="font-family:-apple-system,helveticaneue;background-color:rgba(0,0,0,0);border-color:rgb(0,0,0);color:rgb(0,0,0)">I’ve tried a few heuristics for deciding when to externally submit instead of locally submitting to a ForkJoinPool, and I found it difficult to avoid moving work prematurely and ineffectively; but I am working with purposefully contrived benchmarks right now, rather than something that looks like a real application.</span></div><div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, 4 Oct 2024 at 4:34 pm, Alan Bateman &lt;<a href="mailto:alan.bateman@oracle.com">alan.bateman@oracle.com</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;padding-left:1ex;border-left-color:rgb(204,204,204)"><u></u>
<div>
On 04/10/2024 07:13, Danny Thomas wrote:<br>
<blockquote type="cite">
<div dir="ltr">After quite a bit of experimentation, I can at
least say that last level cache aware task placement on 4th
Generation EPYC (Genoa) is a real boon. I generalised my
original approach, because it doesn't involve customizing the
nodes-per-socket setting (which we can't do on AWS anyway, NPS =
1), introduce the risks/complexity of processor isolation and
per-thread affinity, or make the scheduler's life too difficult:<br>
<br>
<a href="https://github.com/DanielThomas/virtual-threads-cluster-aware/blob/main/src/main/java/com/netflix/sandbox/ClusteredExecutors.java" target="_blank">https://github.com/DanielThomas/virtual-threads-cluster-aware/blob/main/src/main/java/com/netflix/sandbox/ClusteredExecutors.java</a><br>
<br>
With virtual-to-virtual-thread submissions and particularly
structured concurrency providing a heuristic for locality, I'm
convinced there's a significant opportunity here. I've still got
some more real world experiments to run, but will get a TechBlog
post up when I have something to share.</div>
<br>
</blockquote>
Thanks for the update, it's nice to get periodic "status reports" on
experiments in this area.<br>
<br>
One issue that I assume will be problematic without API support is
when a virtual thread T unparks another virtual thread. The execute
method will be called in the context of currentThread=T, not T's
carrier. Until recently it was called in the context of T's carrier
(for reasons that are too complicated to get into here) which has a
bunch of implications that have now been smoothed out. For your
experiments I suspect this means it will fall back to round robin or
choose a random pool. It's a topic that needs more thought, as some
custom schedulers will need the carrier's identity or will need to
maintain a mapping of virtual thread to "place", where "place" in this
case is the cluster. <br></div><div>
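<br>
To make the virtual-thread-to-"place" mapping concrete, here is a minimal Java sketch. The class and method names (PlaceRegistry, assign, placeFor, forget) are hypothetical: they are not part of any JDK API, nor taken from ClusteredExecutors. The round-robin fallback only mirrors the behaviour suspected above when the submitting thread has no known place.<br>
<br>
```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical sketch: remembers which cluster ("place") a thread was
 * assigned to, so a later submission made by that thread can stay local.
 */
final class PlaceRegistry {
    private final int clusters;
    // Thread uses identity equals/hashCode, so it is safe as a map key.
    private final Map<Thread, Integer> placeOf = new ConcurrentHashMap<>();
    private final AtomicInteger next = new AtomicInteger();

    PlaceRegistry(int clusters) {
        if (clusters <= 0) {
            throw new IllegalArgumentException("clusters must be positive");
        }
        this.clusters = clusters;
    }

    /** Record the place chosen for a thread, e.g. when it is first scheduled. */
    void assign(Thread thread, int place) {
        placeOf.put(thread, place);
    }

    /** Place for a task submitted while {@code submitter} is the current thread. */
    int placeFor(Thread submitter) {
        Integer known = placeOf.get(submitter);
        if (known != null) {
            return known;
        }
        // Unknown submitter: fall back to round robin across the clusters.
        return Math.floorMod(next.getAndIncrement(), clusters);
    }

    /** Drop the entry once the thread terminates; the map holds strong references. */
    void forget(Thread thread) {
        placeOf.remove(thread);
    }
}
```
<br>
Under these assumptions, a custom scheduler's execute method could call placeFor(Thread.currentThread()) to choose a pool. That only helps if the virtual thread doing the unpark was registered with assign beforehand; otherwise every such submission takes the round-robin path.<br>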
<br>
-Alan<br>
</div>
</blockquote></div></div>