Project Loom technical questions
Ron Pressler
ron.pressler at oracle.com
Sat Jul 31 13:30:36 UTC 2021
P.S.
I think my answer to your last question wasn’t clear, so let me try and clarify.
Virtual threads are intended for cases where replacing them with OS threads is infeasible due to
the limitations of OS threads, so the question of how a particular metric of virtual threads
compares to that of OS threads in situations where OS threads do not work has no answer. I cannot
know whether, and by how much, the jitter of a timed sleep differs between virtual and OS threads
when running a hundred thousand threads in a normal, real-world server application, because I
cannot run that many OS threads in a normal server application.
There might be an answer to that question when running fifty or a hundred threads, but that is not
a primary use case for virtual threads at this point in time, so we haven’t tried measuring it
just yet.
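For what it’s worth, the experiment itself is at least easy to express with virtual threads. A minimal sketch, assuming JDK 21+ (the thread count and sleep duration are arbitrary, and the measured overshoot will of course vary by machine and load):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

public class SleepJitter {
    public static void main(String[] args) throws InterruptedException {
        final int nThreads = 100_000;
        final long sleepMillis = 100;
        AtomicLong maxOvershootNanos = new AtomicLong();
        List<Thread> threads = new ArrayList<>(nThreads);
        for (int i = 0; i < nThreads; i++) {
            threads.add(Thread.ofVirtual().start(() -> {
                long start = System.nanoTime();
                try {
                    Thread.sleep(sleepMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                // How far past the requested wake-up time did we actually wake?
                long overshoot = System.nanoTime() - start - sleepMillis * 1_000_000;
                maxOvershootNanos.accumulateAndGet(overshoot, Math::max);
            }));
        }
        for (Thread t : threads) t.join();
        System.out.printf("worst-case wake-up overshoot: %.2f ms%n",
                maxOvershootNanos.get() / 1_000_000.0);
    }
}
```

The same program with `Thread.ofPlatform()` is exactly what the paragraph above says you cannot run: a hundred thousand OS threads in an ordinary server process.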
— Ron
> On 31 Jul 2021, at 14:19, Ron Pressler <ron.pressler at oracle.com> wrote:
>
> Hi.
>
>> On 31 Jul 2021, at 13:35, Ignaz Birnstingl <ignazb at gmail.com> wrote:
>>
>> Hello,
>>
>> I have some questions regarding the motivation and some implementation specifics. I would appreciate it if someone could find the time to answer.
>>
>> 1. The Proposal document [1] says that Fibers are more light-weight than kernel threads. I assume this means with regards to both memory and CPU footprint. What costs so much memory about a kernel thread? I assume the kernel needs some data structures to manage it (a couple of KB?), the stack and on the JVM side also some data structures, and the thread-local allocation buffers (TLABs).
>> With fibers you don't need the kernel data structures and the TLABs. Fibers still need the stack and JVM data structures.
>> Assuming the stacks could be managed similar to TLABs in that they start out really small and grow dynamically and the TLABs would/could be core-local allocation buffers instead, is there any significant memory overhead of kernel threads left?
>
> The default stack size for platform threads in Java is 1 MB on Linux and macOS. Kernel threads cannot resize
> their stacks because they do not know how the language uses them; user-mode threads can. Loom’s virtual
> threads automatically grow and shrink their stacks depending on how much stack is currently in use. TLABs are
> unrelated, and are associated with the OS threads internally used by the Java runtime rather than with virtual threads.
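This difference is visible in the API itself: a platform thread’s stack reservation can be requested up front, while virtual threads have no stack-size knob at all, because their stacks live on the heap and resize on demand. A small sketch, assuming JDK 21+:

```java
public class StackDemo {
    public static void main(String[] args) throws InterruptedException {
        // Platform threads reserve their full stack up front; the size can be
        // requested explicitly (here 512 KB instead of the ~1 MB default).
        Thread platform = Thread.ofPlatform()
                .stackSize(512 * 1024)
                .start(() -> System.out.println("platform thread running"));

        // Virtual threads expose no stack-size option: their stacks are heap
        // segments that grow and shrink as frames are pushed and popped.
        Thread virtual = Thread.ofVirtual()
                .start(() -> System.out.println("virtual thread running"));

        platform.join();
        virtual.join();
        System.out.println("virtual? " + virtual.isVirtual());
    }
}
```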
>
>
>>
>> 2. Regarding scheduling: According to the user-level threads video [2] kernel scheduling is mainly costly because of the kernel scheduler doing its thing as opposed to context switches (which also occur when you do user-level scheduling).
>> I assume you are confident that the ForkJoinPool performs better than the kernel scheduler?
>
>
> The cost of scheduling itself matters less for throughput than footprint does. See here
> for an explanation: https://inside.java/2020/08/07/loom-performance/
>
> However, in addition to the context switch through the kernel, the kernel scheduler needs to balance
> many different kinds of thread behaviour, while virtual threads allow choosing different scheduling algorithms
> for different workloads. For server-side transaction-processing workloads, the primary use case for virtual
> threads, a work-stealing scheduler is a good fit, which is why it is the default scheduler. Its particular
> details are likely to change as it is tuned for virtual threads.
>
>>
>> 3. Regarding timed sleeps: Since the ForkJoinPool uses FIFO scheduling - would that imply some changes regarding sleeps, for example that under high-load scenarios sleeping fibers could wake up "later" on average than when using kernel threads?
>>
>
>
> ForkJoinPool uses either FIFO or LIFO scheduling, depending on its setting. The default virtual thread
> scheduler uses a ForkJoinPool in FIFO mode.
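The FIFO/LIFO distinction here is the pool’s `asyncMode` flag. A standalone sketch of the two configurations (just a plain `ForkJoinPool`, not the virtual-thread scheduler itself):

```java
import java.util.concurrent.ForkJoinPool;

public class AsyncModeDemo {
    public static void main(String[] args) {
        // asyncMode = false (the default): workers process their own queued
        // tasks in LIFO order, which suits fork/join divide-and-conquer.
        ForkJoinPool lifo = new ForkJoinPool(
                4, ForkJoinPool.defaultForkJoinWorkerThreadFactory, null, false);

        // asyncMode = true: workers take locally queued tasks in FIFO order,
        // which suits event-style tasks that are submitted and never joined.
        ForkJoinPool fifo = new ForkJoinPool(
                4, ForkJoinPool.defaultForkJoinWorkerThreadFactory, null, true);

        System.out.println("lifo pool asyncMode: " + lifo.getAsyncMode());
        System.out.println("fifo pool asyncMode: " + fifo.getAsyncMode());
        lifo.shutdown();
        fifo.shutdown();
    }
}
```

Note that stolen tasks are always taken FIFO from the victim’s queue; `asyncMode` only affects the order in which a worker consumes its own queue.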
>
> The jitter for virtual threads is hard to predict, but unless you’re using a realtime kernel, it’s not
> very stable for kernel threads either, depending on the load. OpenJDK might also add jitter of its own
> on top of that of the OS.
>
> Virtual threads are optimised for throughput when you have between a few thousand and a few million threads.
> If your application performs fewer than, say, 500 concurrent operations, virtual threads are unlikely
> to help, and at this stage the question of whether or not they might hurt some metric is of little interest,
> as that is not their intended use.
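The intended shape of that high-concurrency use is one cheap virtual thread per concurrent operation, e.g. via the per-task executor. A sketch, assuming JDK 21+ (the sleep stands in for a blocking call such as I/O):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class ThroughputDemo {
    public static void main(String[] args) {
        AtomicInteger completed = new AtomicInteger();
        // One virtual thread per submitted task; blocking inside a task
        // unmounts the virtual thread rather than tying up an OS thread.
        try (ExecutorService exec = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < 10_000; i++) {
                exec.submit(() -> {
                    try {
                        Thread.sleep(10); // stands in for a blocking I/O call
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                    completed.incrementAndGet();
                });
            }
        } // close() waits for all submitted tasks to finish
        System.out.println("completed: " + completed.get());
    }
}
```

Ten thousand 10 ms blocking operations complete in well under a second here, because they all block concurrently; a pool of OS threads would need ten thousand threads to do the same.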
>
> For very latency-critical and jitter-sensitive applications, consider using a very small number of OS threads,
> a realtime kernel, and a realtime implementation of Java or a language like C. Those optimise for worst-case
> latency at the expense of throughput. In some cases, a well-tuned OS running a well-tuned OpenJDK JVM and a
> carefully crafted Java application might be sufficient, but these cases also involve relatively low concurrency
> and are not the focus of Loom.
>
>> --
>> Many thanks,
>> Ignaz
>>
>> [1] https://cr.openjdk.java.net/~rpressler/loom/Loom-Proposal.html
>> [2] https://www.youtube.com/watch?v=KXuZi9aeGTw
>
> — Ron
>