Project Loom VirtualThreads hang

robert engels rengels at ix.netcom.com
Wed Jan 4 04:59:35 UTC 2023


Some further data points, using VthreadTest here <https://github.com/robaho/vthread_test> (essentially a message-passing system):

With 32 producers, 32 consumers, and 500k messages on a 4/8-core machine:

1a. native threads w ABQ: 60-65% system cpu, 20% user cpu, 15-20% idle, total time 189 seconds
1b. vthreads w ABQ: 5-10% system cpu, 75% user cpu, 15% idle, total time 63 seconds
2a. native threads w RingBuffer, spin=1: 70% system cpu, 30% user cpu, 0% idle, total time 174 seconds
2b. vthreads w RingBuffer, spin=1: 13% system cpu, 85% user cpu, 2% idle, total time 37 seconds
3a. native threads w RingBuffer, spin=32: 68% system cpu, 30% user cpu, 2% idle, total time 164 seconds
3b. vthreads w RingBuffer, spin=32: 13% system cpu, 85% user cpu, 3% idle, total time 40 seconds

(ABQ is stdlib ArrayBlockingQueue)
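For readers who don't want to open the repo, here is a minimal sketch of the shape of the ABQ variant of the test (this is not the actual VthreadTest code; the class name and the queue capacity of 1024 are assumptions): 32 producer and 32 consumer virtual threads moving 500k messages through one shared ArrayBlockingQueue.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicLong;

public class AbqSketch {
    static final int PRODUCERS = 32, CONSUMERS = 32, TOTAL = 500_000;

    // Runs the benchmark once and returns the number of messages consumed.
    static long run() throws InterruptedException {
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(1024);
        AtomicLong consumed = new AtomicLong();
        // Each submitted task gets its own virtual thread (Java 21+).
        try (ExecutorService exec = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int p = 0; p < PRODUCERS; p++) {
                exec.submit(() -> {
                    for (int i = 0; i < TOTAL / PRODUCERS; i++) queue.put(i);
                    return null;
                });
            }
            for (int c = 0; c < CONSUMERS; c++) {
                exec.submit(() -> {
                    for (int i = 0; i < TOTAL / CONSUMERS; i++) {
                        queue.take();
                        consumed.incrementAndGet();
                    }
                    return null;
                });
            }
        } // close() waits for every producer and consumer to finish
        return consumed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        long start = System.nanoTime();
        long n = run();
        System.out.printf("moved %d messages in %d ms%n", n,
                (System.nanoTime() - start) / 1_000_000);
    }
}
```

Every put/take on a full or empty queue parks the calling thread, so this workload is dominated by exactly the (un)parking and scheduling costs being measured.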

The above times show a lot of variance that is not fully accounted for, but the interesting point is how large a difference the RingBuffer makes between cases 1 and 2.

Even in 2b, 13% of the cpu is taken up by the OS - I assume due to thread switching, since there is no IO in the test - which means the scheduling can probably be improved.

I would expect a green-thread system to approach 0% idle and 0% system utilization in this type of test. I am “fairly certain” the code should be able to use all carrier threads at 100%. Maybe the system % is going to something else? (You can run the SpinTest - comment out the println - and see that a 100% cpu-bound “do nothing” test that allocates no objects still uses more than 25% system cpu - which seems odd.)
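SpinTest itself lives in the linked repo; as a rough sketch of its shape (the iteration count and names here are made up, not taken from the repo), it is just one cpu-bound virtual thread per core doing allocation-free arithmetic, with the println commented out:

```java
public class SpinSketch {
    // Allocation-free busy work; returns 0 + 1 + ... + (iters - 1).
    static long spin(long iters) {
        long x = 0;
        for (long n = 0; n < iters; n++) x += n;
        return x;
    }

    public static void main(String[] args) throws InterruptedException {
        int cores = Runtime.getRuntime().availableProcessors();
        Thread[] threads = new Thread[cores];
        for (int i = 0; i < cores; i++) {
            // One cpu-bound virtual thread per core; nothing here should
            // block, so ideally no system time is needed at all.
            threads[i] = Thread.ofVirtual().start(() -> {
                long x = spin(1_000_000_000L);
                // System.out.println(x); // commented out, as described above
            });
        }
        for (Thread t : threads) t.join();
    }
}
```

Since none of these threads ever block or yield, the scheduler should have nothing to do once they are mounted, which is what makes the observed 25% system cpu surprising.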

Here is an async-profiler capture of 3b:

[flame graph image scrubbed; see the PastedGraphic-3.png attachment URL at the end of this message]

Notice that the vast majority of the time is spent in internal context switching.

I can broadly agree with the project’s stated bias towards server systems with 1000’s of threads (I do think 1000’s of threads is enough, vs. millions of threads), but I hope this can be addressed moving forward. I think the CSP (communicating sequential processes) model (close to the Actor model) simplifies a lot of concurrent-programming concerns, but it requires highly efficient context switching and queues to work well.
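To make the CSP point concrete, here is a toy example (mine, not from the project or the repo): a virtual thread handing values to its caller over an unbuffered SynchronousQueue acting as a rendezvous channel. Every put/take pair forces exactly the kind of context switch whose cost is at issue.

```java
import java.util.concurrent.SynchronousQueue;

public class CspSketch {
    // Sends 1..n from a producer virtual thread to the caller over a
    // rendezvous channel and returns the sum of the values received.
    static long pingPong(int n) throws InterruptedException {
        SynchronousQueue<Integer> channel = new SynchronousQueue<>();
        Thread producer = Thread.ofVirtual().start(() -> {
            try {
                for (int i = 1; i <= n; i++) channel.put(i); // parks until taken
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        long sum = 0;
        for (int i = 0; i < n; i++) sum += channel.take(); // parks until offered
        producer.join();
        return sum;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(pingPong(3)); // prints 6
    }
}
```

With no buffering, each message transfer requires parking one side and unparking the other, so the channel's throughput is an almost direct measure of the scheduler's context-switch cost.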


> On Jan 3, 2023, at 4:16 PM, Ron Pressler <ron.pressler at oracle.com <mailto:ron.pressler at oracle.com>> wrote:
> 
> 
> 
>> On 3 Jan 2023, at 19:24, thurston N <thurston.nomagicsoftware at gmail.com <mailto:thurston.nomagicsoftware at gmail.com>> wrote:
>> 
>> Hello Ron,
>> 
>> Quoting:
>> "Virtual threads replace short individual *tasks* in your application, not platform threads."
>> 
>>  Why short?
>> 
>> Take a TCP server's accept loop.  I would have thought that's a natural target for execution in a virtual thread. 
>> 
>> And it's more or less intended to run forever.
> 
> A small number of virtual threads may run for a long time, but the vast majority would be short lived. They would naturally form a hierarchy: you’d have one or a couple of threads accepting connections that live forever, several thousand request-handling threads that live as long as a single request, and then tens of thousands or more each servicing a single outgoing service request in a fanout, and those would be even shorter-lived.
>> 
>> "They are best thought of as a business logic entity representing a task rather than an “execution resource.”"
>> 
>> That's a pretty good definition of an "actor" (as in actors-model).  But there's no (even implicit) restriction on the duration of 
>> an actor's lifetime.
>> Is it your intent to proscribe virtual threads as a possible implementation for actor-model type designs?
> 
> A thread is a sequential task that is done concurrently with others. What protocols are used to communicate among threads (and the actor model is a communication protocol) is up to the programmer. An actor model is a bit too high-level for the JDK, but libraries may certainly offer actors based on virtual threads, just as others may build other high-level constructs.
>>   
>> I'm thinking of simulations, e.g. modelling a migrating herd, bees at a honeycomb, a very busy traffic intersection, et al.
>> It seems natural (even elegant) to represent each of those entities (deer, bee, car) as a virtual thread, and executing them as
>> platform threads isn't an option (because of their sheer number, the essence of the problem that virtual threads are the solution for).
>> And I'm sure there are innumerable other circumstances that could benefit from LWP (as Erlang terms them)
>> IMO, it's awfully reductive to restrict practical uses of virtual threads to writing sequential single request/response scenarios 
>> 
> 
> My intent is not to restrict so much as to focus. Simulations are absolutely a great use-case for lightweight threads, but there are implementation choices that may need to balance some desirable properties with others and could result in virtual threads being a better fit for some use-cases than others. Because there are significantly more Java servers than Java simulations, and because servers are normally more complex (in terms of failure modes and architectural graph) and their developers are more mainstream and could benefit from more help, *if* there’s some prioritisation to be made, the server use-case is the one we prioritise over others by addressing it *first*. More people are interested in that use-case, that’s where virtual threads can contribute the most value, and that’s why it’s our *initial* focus.
> 
> — Ron

-------------- next part --------------
A non-text attachment was scrubbed...
Name: PastedGraphic-3.png
Type: image/png
Size: 289575 bytes
Desc: not available
URL: <https://mail.openjdk.org/pipermail/loom-dev/attachments/20230103/a9ccd334/PastedGraphic-3-0001.png>

