<div dir="auto">I’d wondered about the external submit changes; I’ll take your word for it :)</div><div dir="auto"><br></div><div dir="auto">It does seem that if you do expose an API, you’d want to differentiate things such as unpark/yield from other submissions. <br></div><div dir="auto"><br></div><div dir="auto">I guess there are also pathological cases where a single virtual thread is primarily driving the creation of virtual threads in an application. I’ve tried a few heuristics for deciding when to externally submit instead of locally submitting to a FJP, and I found it difficult to avoid prematurely and ineffectively moving work. That said, I am working with purposefully contrived benchmarks right now, rather than something that looks like a real application.</div><div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, 4 Oct 2024 at 4:34 pm, Alan Bateman &lt;<a href="mailto:alan.bateman@oracle.com">alan.bateman@oracle.com</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;padding-left:1ex;border-left-color:rgb(204,204,204)">


  <div>

    On 04/10/2024 07:13, Danny Thomas wrote:<br>

    <blockquote type="cite">

      
      <div dir="ltr">After quite a bit of experimentation, I can at

        least say that last level cache aware task placement on 4th

        Generation EPYC (Genoa) is a real boon. I generalised my

        original approach, because it doesn't involve customizing the

        nodes-per-socket setting (which we can't do on AWS anyway, NPS =

        1), introduce the risks/complexity of processor isolation and

        per-thread affinity, or make the scheduler's life too difficult:<br>

        <br>

        <a href="https://urldefense.com/v3/__https://github.com/DanielThomas/virtual-threads-cluster-aware/blob/main/src/main/java/com/netflix/sandbox/ClusteredExecutors.java__;!!ACWV5N9M2RV99hQ!NSA8NGBYdS4YOkzpmpUb-x03qPs0pd5lMgqqL7SjRr1z53q7xWTCHwpaUxPCDX-TTx3NcuqqRnQtvCsTfw$" target="_blank">https://github.com/DanielThomas/virtual-threads-cluster-aware/blob/main/src/main/java/com/netflix/sandbox/ClusteredExecutors.java</a><br>

        <br>

        With virtual-to-virtual-thread submissions and particularly

        structured concurrency providing a heuristic for locality, I'm

        convinced there's a significant opportunity here. I've still got

        some more real world experiments to run, but will get a TechBlog

        post up when I have something to share.</div>

      <br>

    </blockquote>

    Thanks for the update, it's nice to get periodic "status reports" on

    experiments in this area.<br>

    <br>

    One issue that I assume will be problematic without API support is

    when a virtual thread T unparks another virtual thread. The execute

    method will be called in the context of currentThread=T, not T's

    carrier. Until recently it was called in the context of T's carrier

    (for reasons that are too complicated to get into here) which has a

    bunch of implications that have now been smoothed out. For your

    experiments I suspect this means it will fall back to round robin or

    choose a random pool. It's a topic that needs more thought as some

    custom schedulers will need the carrier's identity or will need to maintain a

    mapping of virtual thread to "place", where "place" in this case is the

    cluster. <br></div><div>

    <br>

    -Alan<br>

  </div>


</blockquote></div></div>