Thread scheduling imbalance / starvation

Robert Engels rengels at ix.netcom.com
Sun Apr 16 20:16:56 UTC 2023


I agree. With virtual threads I believe the design of Truno could be simplified. Simply start tasks to handle requests, maybe pass/block to other IO handlers. 

Just let them loose! The context switching is negligible. The only time this breaks down is with long running cpu only tasks - then you need to periodically yield. 

> On Apr 16, 2023, at 3:10 PM, Patrick Bolden <boldenpm at gmail.com> wrote:
> 
> 
> Forgive my naivety, but isn't the whole point of virtual threads to abstract the number of physical cores?  I'm just trying to understand the reasoning behind the use case.
> 
>> On Sun, Apr 16, 2023 at 2:07 PM Martin Traverso <mtraverso at gmail.com> wrote:
>> Hi Robert and Ron,
>> 
>> Thanks for your replies.
>> 
>> > Why would you expect or want this to be balanced? If the threads do IO it should naturally balance. If they are all cpu bound balancing is needed for fairness of requesting handling.
>> 
>> I would expect it to be balanced because:
>> * There's a fair semaphore that limits concurrency
>> * There are more virtual threads than semaphore permits
>> * Some of the threads are "presumably" blocked in the call to acquire() and would take turns to unblock, fairly
>> 
>> From what Ron described above, my understanding is that some threads don't even get a chance to start since the first N threads keep looping with no contention (acquiring and releasing the semaphore) and thereby preventing the second N threads from ever being scheduled.
>> 
>> If I add a call to Thread.yield() or Thread.sleep() for 1 ns just *once* before the loop while holding the semaphore, then it works as I would expect. My hypothesis is that that causes the scheduler to pick one of the threads that has been waiting to start to run, which subsequently makes it to the acquire() call and at some point it becomes a contended acquire. The fair semaphore does its job from then on.
>> 
>>     Thread.ofVirtual().start(() -> {
>>         semaphore.acquireUninterruptibly();
>>         Thread.yield();
>>         semaphore.release();
>> 
>>         while (true) {
>>             semaphore.acquireUninterruptibly();
>>             counter.incrementAndGet();
>>             semaphore.release();
>>         }
>>     });
>> 
>> Ron, let me describe our use case. I hope it helps inform future directions for this feature.
>> 
>> Trino (https://trino.io) is a distributed SQL engine. Queries are decomposed into tasks that run in a cluster of workers. Each task performs a series of transformations (filtering, computing new columns, aggregations, etc). In an ideal world, we would model this as a series of nested loops, similar to how you'd implement the equivalent of a Java Stream pipeline using traditional imperative code. The problem with this approach is that these tasks can take a long time to complete. We need to be able to handle more tasks than there are available processors and share time among them.
>> 
>> To do this, we implemented a cooperative multitasking framework, where each of the tasks does a bit of work and then relinquishes control. A scheduler within each of the workers the decides which task to run next based on a prioritization scheme. This is all very unnatural and complex, and it prevents certain optimizations by forcing the actions within the tasks to have explicit boundaries, materialize intermediate data structures before giving up control, etc.
>> 
>> We're hoping that virtual threads will allow us to simplify all of this. We're also hoping that someday we'll be able to control the scheduling policies to be able to implement our own prioritization scheme -- although we have some ideas on how to work around this limitation for now.
>> 
>> - Martin
>> 
>> 
>> 
>>> On Sun, Apr 16, 2023 at 7:55 AM Robert Engels <rengels at ix.netcom.com> wrote:
>>> Hi Martin,
>>> 
>>> Why would you expect or want this to be balanced? If the threads do IO it should naturally balance. If they are all cpu bound balancing is needed for fairness of requesting handling. 
>>> 
>>> This has been brought up a few times. Small tasks can be blocked for a long time behind long cpu bound tasks. The only solution is to periodically yield() those tasks. 
>>> 
>>>>> On Apr 16, 2023, at 9:40 AM, Ron Pressler <ron.pressler at oracle.com> wrote:
>>>>> 
>>>>  Hi.
>>>> 
>>>> What you’re seeing is the result of the virtual thread scheduler not employing time sharing. That is because we have yet to identify workloads, especially those that are best served by virtual threads — namely, servers — that can benefit from it. Once we find such workloads we’ll be able to utilise time sharing.
>>>> 
>>>> In your example, the scheduler is able to keep all threads busy with work without blocking on the semaphore by just running some threads.
>>>> 
>>>> — Ron
>>>> 
>>>>> On 16 Apr 2023, at 06:30, Martin Traverso <mtraverso at gmail.com> wrote:
>>>>> 
>>>>> Hi,
>>>>> 
>>>>> First of all, I'd like to thank you for this feature! We've been eagerly awaiting it in the Trino project and we believe it will help us dramatically simplify many parts of the codebase.
>>>>> 
>>>>> I've been playing around with virtual threads and I've noticed some odd behaviors. Given the following code:
>>>>> 
>>>>>     import java.util.ArrayList;
>>>>>     import java.util.List;
>>>>>     import java.util.concurrent.ExecutionException;
>>>>>     import java.util.concurrent.Semaphore;
>>>>>     import java.util.concurrent.atomic.AtomicLong;
>>>>> 
>>>>>     public class Test
>>>>>     {
>>>>>         public static void main(String[] args)
>>>>>                 throws InterruptedException
>>>>>         {
>>>>>             int processors = Runtime.getRuntime().availableProcessors();
>>>>> 
>>>>>             Semaphore semaphore = new Semaphore(processors, true);
>>>>>             List<AtomicLong> counters = new ArrayList<>();
>>>>>             for (int i = 0; i < 2 * processors; i++) {
>>>>>                 AtomicLong counter = new AtomicLong();
>>>>>                 counters.add(counter);
>>>>>                 Thread.ofVirtual().start(() -> {
>>>>>                     while (true) {
>>>>>                         semaphore.acquireUninterruptibly();
>>>>>                         counter.incrementAndGet();
>>>>>                         semaphore.release();
>>>>>                     }
>>>>>                 });
>>>>>             }
>>>>> 
>>>>>             Thread.sleep(10_000);
>>>>> 
>>>>>             counters.stream()
>>>>>                     .map(AtomicLong::get)
>>>>>                     .sorted()
>>>>>                     .forEach(System.out::println);
>>>>>         }
>>>>>     }
>>>>> 
>>>>> I would expect the counts to be approximately equal, but I'm getting the following result:
>>>>> 
>>>>>     0
>>>>>     0
>>>>>     0
>>>>>     0
>>>>>     0
>>>>>     0
>>>>>     0
>>>>>     0
>>>>>     0
>>>>>     0
>>>>>     2435341
>>>>>     2448274
>>>>>     2466202
>>>>>     2497258
>>>>>     2539030
>>>>>     2572744
>>>>>     2592871
>>>>>     2611658
>>>>>     2651392
>>>>>     2657913
>>>>> 
>>>>> If I change the number of permits for the semaphore to a value smaller than the number of processors, then the results come out as expected. It also works as expected if I change the core loop to make a call to Thread.yield() on the first iteration:
>>>>> 
>>>>>     while (true) {
>>>>>         semaphore.acquireUninterruptibly();
>>>>>         if (counter.incrementAndGet() == 1) {
>>>>>             Thread.yield();
>>>>>         }
>>>>>         semaphore.release();
>>>>>     }
>>>>> 
>>>>> 
>>>>> If I place a call to Thread.yield() after the semaphore.release() call, then all the threads make some progress, but the values are still unbalanced:
>>>>> 
>>>>>     while (true) {
>>>>>         semaphore.acquireUninterruptibly();
>>>>>         counter.incrementAndGet();
>>>>>         semaphore.release();
>>>>>         Thread.yield();
>>>>>     }
>>>>> 
>>>>>     196257
>>>>>     196257
>>>>>     196258
>>>>>     196260
>>>>>     196260
>>>>>     196260
>>>>>     196261
>>>>>     196261
>>>>>     401737
>>>>>     401740
>>>>>     401744
>>>>>     401757
>>>>>     1644985
>>>>>     1651301
>>>>>     1677466
>>>>>     1683009
>>>>>     1694577
>>>>>     1702710
>>>>>     1710970
>>>>>     1843037
>>>>> 
>>>>> I'm running the following version of the JDK on an Macbook Pro with an M1 Max CPU:
>>>>> 
>>>>> openjdk version "20" 2023-03-21
>>>>> OpenJDK Runtime Environment Zulu20.28+85-CA (build 20+36)
>>>>> OpenJDK 64-Bit Server VM Zulu20.28+85-CA (build 20+36, mixed mode, sharing)
>>>>> 
>>>>> I'm not sure if this is a bug or if I'm misunderstanding how virtual threads are supposed to work. Any help or clarification would be greatly appreciated!
>>>>> 
>>>>> Thanks!
>>>>> - Martin
>>>>> 
>>>>> ----
>>>>> Martin Traverso
>>>>> Co-founder @ Trino Software Foundation, Co-creator of Presto and Trino (https://trino.io)
>>>>> 
>>>> 
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.org/pipermail/loom-dev/attachments/20230416/f25bcd7e/attachment.htm>


More information about the loom-dev mailing list