<html><head><meta http-equiv="content-type" content="text/html; charset=utf-8"></head><body dir="auto"><div dir="ltr"></div><div dir="ltr">Btw - if the “periodically yield” is implemented implicitly by using a fair semaphore whose number of permits equals the number of CPU cores, then you have to make sure you bracket the “non-yielding code”. </div><div dir="ltr"><br></div><div dir="ltr">I think it is safer to use a maybeYield() call that uses a thread local and tests the last yield time.</div><div dir="ltr"><br></div><div dir="ltr">To Ron: being able to override yield() in VirtualThread would allow easy user-implemented scheduling without the complexity of a full scheduler. It would only need access to the thread's last context-switch time to implement some fairly capable scheduling models. </div><div dir="ltr"><br><blockquote type="cite">On Apr 16, 2023, at 3:17 PM, Robert Engels &lt;rengels@ix.netcom.com&gt; wrote:<br><br></blockquote></div><blockquote type="cite"><div dir="ltr"><meta http-equiv="content-type" content="text/html; charset=utf-8"><div dir="ltr"></div><div dir="ltr">I agree. With virtual threads I believe the design of Trino could be simplified. Simply start tasks to handle requests, maybe pass/block to other IO handlers. </div><div dir="ltr"><br></div><div dir="ltr">Just let them loose! The context switching is negligible. The only time this breaks down is with long-running, CPU-only tasks - then you need to periodically yield. </div><div dir="ltr"><br><blockquote type="cite">On Apr 16, 2023, at 3:10 PM, Patrick Bolden &lt;boldenpm@gmail.com&gt; wrote:<br><br></blockquote></div><blockquote type="cite"><div dir="ltr"><div dir="ltr">Forgive my naivety, but isn't the whole point of virtual threads to abstract away the number of physical cores?  
I'm just trying to understand the reasoning behind the use case.</div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Sun, Apr 16, 2023 at 2:07 PM Martin Traverso <<a href="mailto:mtraverso@gmail.com">mtraverso@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">Hi Robert and Ron,<div><br></div><div>Thanks for your replies.<br><br><div>> Why would you expect or want this to be balanced? If the threads do IO it should naturally balance. If they are all CPU-bound, balancing is needed for fairness of request handling.<br><br>I would expect it to be balanced because:<br>* There's a fair semaphore that limits concurrency<br>* There are more virtual threads than semaphore permits<br>* Some of the threads are "presumably" blocked in the call to acquire() and would take turns to unblock, fairly<br><br>From what Ron described above, my understanding is that some threads don't even get a chance to start, since the first N threads keep looping with no contention (acquiring and releasing the semaphore), thereby preventing the second N threads from ever being scheduled.<br><br>If I add a call to Thread.yield() or Thread.sleep() for 1 ns just *once* before the loop while holding the semaphore, then it works as I would expect. My hypothesis is that this lets the scheduler run one of the threads that has been waiting to start, which then reaches the acquire() call, and at some point the acquire becomes contended. 
The fair semaphore does its job from then on.<br><br><font face="monospace">    Thread.ofVirtual().start(() -> {<br>        semaphore.acquireUninterruptibly();<br>        Thread.yield();<br>        semaphore.release();<br><br>        while (true) {<br>            semaphore.acquireUninterruptibly();<br>            counter.incrementAndGet();<br>            semaphore.release();<br>        }<br>    });</font><br><br>Ron, let me describe our use case. I hope it helps inform future directions for this feature.<br><br>Trino (<a href="https://trino.io" target="_blank">https://trino.io</a>) is a distributed SQL engine. Queries are decomposed into tasks that run in a cluster of workers. Each task performs a series of transformations (filtering, computing new columns, aggregations, etc.). In an ideal world, we would model this as a series of nested loops, similar to how you'd implement the equivalent of a Java Stream pipeline using traditional imperative code. The problem with this approach is that these tasks can take a long time to complete. We need to be able to handle more tasks than there are available processors and share time among them.<br><br>To do this, we implemented a cooperative multitasking framework, where each of the tasks does a bit of work and then relinquishes control. A scheduler within each of the workers then decides which task to run next based on a prioritization scheme. This is all very unnatural and complex, and it prevents certain optimizations by forcing the actions within the tasks to have explicit boundaries, materialize intermediate data structures before giving up control, etc.<br><br>We're hoping that virtual threads will allow us to simplify all of this. 
We're also hoping that someday we'll be able to control the scheduling policy so that we can implement our own prioritization scheme -- although we have some ideas on how to work around this limitation for now.<br><div><div dir="ltr"><div dir="ltr"><div><div><br></div><div>- Martin</div><div><br></div></div></div></div></div><br></div></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Sun, Apr 16, 2023 at 7:55 AM Robert Engels <<a href="mailto:rengels@ix.netcom.com" target="_blank">rengels@ix.netcom.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="auto"><div dir="ltr"></div><div dir="ltr">Hi Martin,</div><div dir="ltr"><br></div><div dir="ltr">Why would you expect or want this to be balanced? If the threads do IO it should naturally balance. If they are all CPU-bound, balancing is needed for fairness of request handling. </div><div dir="ltr"><br></div><div dir="ltr">This has been brought up a few times. Small tasks can be blocked for a long time behind long CPU-bound tasks. The only solution is to have those tasks call yield() periodically. </div><div dir="ltr"><br><blockquote type="cite">On Apr 16, 2023, at 9:40 AM, Ron Pressler <<a href="mailto:ron.pressler@oracle.com" target="_blank">ron.pressler@oracle.com</a>> wrote:<br><br></blockquote></div><blockquote type="cite"><div dir="ltr">

Hi.

<div><br>

</div>

<div>What you’re seeing is the result of the virtual thread scheduler not employing time sharing. That is because we have yet to identify workloads, especially those that are best served by virtual threads — namely, servers — that can benefit from

 it. Once we find such workloads we’ll be able to utilise time sharing.</div>

<div><br>

</div>

<div>In your example, the scheduler is able to keep all of its carrier threads busy by running just some of the virtual threads, and none of those ever blocks on the semaphore.</div>
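<div>A minimal sketch of why nothing ever blocks here, assuming a permit is always free when acquire is called. It uses a single plain thread rather than virtual threads, and the class and method names (UncontendedAcquire, loopAndCountQueued) are made up for illustration. When acquire finds a free permit it returns immediately, so no waiter is queued and the running thread never reaches a point where it would be descheduled:</div>

```java
import java.util.concurrent.Semaphore;

public class UncontendedAcquire {
    // Runs the acquire/increment/release pattern and counts how often
    // another thread was observed waiting in the semaphore's queue.
    public static int loopAndCountQueued(Semaphore semaphore, int iterations) {
        int queuedObservations = 0;
        for (int i = 0; i < iterations; i++) {
            // A permit is free, so this returns without parking the thread.
            semaphore.acquireUninterruptibly();
            if (semaphore.hasQueuedThreads()) {
                queuedObservations++;
            }
            semaphore.release();
        }
        return queuedObservations;
    }

    public static void main(String[] args) {
        // Fair semaphore, as in the example; 2 permits is an arbitrary choice.
        Semaphore semaphore = new Semaphore(2, true);
        int queued = loopAndCountQueued(semaphore, 1_000_000);
        System.out.println("queued observations: " + queued);
        System.out.println("available permits: " + semaphore.availablePermits());
    }
}
```

<div>Fairness only orders threads that are already waiting in the semaphore's queue. A virtual thread that has not yet been given a carrier has not called acquire, so the fair semaphore has nothing to arbitrate.</div>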

<div><br>

</div>

<div>— Ron<br>

<div><br>

<blockquote type="cite">

<div>On 16 Apr 2023, at 06:30, Martin Traverso <<a href="mailto:mtraverso@gmail.com" target="_blank">mtraverso@gmail.com</a>> wrote:</div>

<br>

<div>

<div dir="ltr">Hi,<br>

<br>

First of all, I'd like to thank you for this feature! We've been eagerly awaiting it in the Trino project and we believe it will help us dramatically simplify many parts of the codebase.<br>

<br>

I've been playing around with virtual threads and I've noticed some odd behaviors. Given the following code:<br>

<br>

<font face="monospace">    import java.util.ArrayList;<br>

    import java.util.List;<br>

    import java.util.concurrent.ExecutionException;<br>

    import java.util.concurrent.Semaphore;<br>

    import java.util.concurrent.atomic.AtomicLong;<br>

<br>

    public class Test<br>

    {<br>

        public static void main(String[] args)<br>

                throws InterruptedException<br>

        {<br>

            int processors = Runtime.getRuntime().availableProcessors();<br>

<br>

            Semaphore semaphore = new Semaphore(processors, true);<br>

            List<AtomicLong> counters = new ArrayList<>();<br>

            for (int i = 0; i < 2 * processors; i++) {<br>

                AtomicLong counter = new AtomicLong();<br>

                counters.add(counter);<br>

                Thread.ofVirtual().start(() -> {<br>

                    while (true) {<br>

                        semaphore.acquireUninterruptibly();<br>

                        counter.incrementAndGet();<br>

                        semaphore.release();<br>

                    }<br>

                });<br>

            }<br>

<br>

            Thread.sleep(10_000);<br>

<br>

            counters.stream()<br>

                    .map(AtomicLong::get)<br>

                    .sorted()<br>

                    .forEach(System.out::println);<br>

        }<br>

    }</font><br>

<br>

I would expect the counts to be approximately equal, but I'm getting the following result:<br>

<br>

    0<br>

    0<br>

    0<br>

    0<br>

    0<br>

    0<br>

    0<br>

    0<br>

    0<br>

    0<br>

    2435341<br>

    2448274<br>

    2466202<br>

    2497258<br>

    2539030<br>

    2572744<br>

    2592871<br>

    2611658<br>

    2651392<br>

    2657913<br>

<br>

If I change the number of permits for the semaphore to a value smaller than the number of processors, then the results come out as expected. It also works as expected if I change the core loop to make a call to Thread.yield() on the first iteration:<br>

<br>

<font face="monospace">    while (true) {<br>

        semaphore.acquireUninterruptibly();<br>

        if (counter.incrementAndGet() == 1) {<br>

            Thread.yield();<br>

        }<br>

        semaphore.release();<br>

    }<br>

</font><br>
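A time-based variant of the same idea could look roughly like the sketch below: keep the time of the last yield in a thread local and call Thread.yield() only once a time slice has elapsed. The class name CooperativeYield, the maybeYield signature and the 10 ms slice are all assumptions made for illustration, not an existing JDK API:<br>

```java
import java.util.concurrent.TimeUnit;

public class CooperativeYield {
    // Hypothetical time slice; pick a value that suits the workload.
    public static final long DEFAULT_INTERVAL_NANOS = TimeUnit.MILLISECONDS.toNanos(10);

    // Per-thread System.nanoTime() reading taken at the last yield
    // (initialised on the thread's first call to maybeYield).
    private static final ThreadLocal<long[]> LAST_YIELD =
            ThreadLocal.withInitial(() -> new long[] {System.nanoTime()});

    // Yields only if at least intervalNanos have passed since this thread
    // last yielded. Returns true when a yield was actually performed.
    public static boolean maybeYield(long intervalNanos) {
        long[] last = LAST_YIELD.get();
        long now = System.nanoTime();
        if (now - last[0] < intervalNanos) {
            return false;
        }
        Thread.yield();
        last[0] = System.nanoTime(); // start a new slice once we run again
        return true;
    }

    public static void main(String[] args) {
        long yields = 0;
        long sum = 0;
        for (long i = 0; i < 50_000_000L; i++) {
            sum += i; // stand-in for CPU-bound work
            if (maybeYield(DEFAULT_INTERVAL_NANOS)) {
                yields++;
            }
        }
        System.out.println("sum=" + sum + ", yields=" + yields);
    }
}
```

In the loop above this would be called between semaphore.release() and the next acquireUninterruptibly(). Reading the clock on every iteration has a cost of its own, so a real task would probably check only every few thousand iterations.<br>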

<br>

If I place a call to Thread.yield() after the semaphore.release() call, then all the threads make some progress, but the values are still unbalanced:<br>

<br>

<font face="monospace">    while (true) {<br>

        semaphore.acquireUninterruptibly();<br>

        counter.incrementAndGet();<br>

        semaphore.release();<br>

        Thread.yield();<br>

    }</font><br>

<br>

    196257<br>

    196257<br>

    196258<br>

    196260<br>

    196260<br>

    196260<br>

    196261<br>

    196261<br>

    401737<br>

    401740<br>

    401744<br>

    401757<br>

    1644985<br>

    1651301<br>

    1677466<br>

    1683009<br>

    1694577<br>

    1702710<br>

    1710970<br>

    1843037<br>

<br>

I'm running the following version of the JDK on a MacBook Pro with an M1 Max CPU:<br>

<br>

openjdk version "20" 2023-03-21<br>

OpenJDK Runtime Environment Zulu20.28+85-CA (build 20+36)<br>

OpenJDK 64-Bit Server VM Zulu20.28+85-CA (build 20+36, mixed mode, sharing)<br>

<br>

I'm not sure if this is a bug or if I'm misunderstanding how virtual threads are supposed to work. Any help or clarification would be greatly appreciated!<br>

<br>

Thanks!<br>

- Martin<br>

<div>

<div dir="ltr">

<div dir="ltr">

<div>

<div><br>

</div>

<div>----</div>

</div>

<div>

<div>Martin Traverso</div>

<div>Co-founder @ Trino Software Foundation, Co-creator of Presto and Trino (<a href="https://trino.io/" target="_blank">https://trino.io</a>)</div>

</div>

<div><br>

</div>

</div>

</div>

</div>

</div>

</div>

</blockquote>

</div>

<br>

</div>

</div></blockquote></div></blockquote></div>

</blockquote></div>

</div></blockquote></div></blockquote></body></html>