[External] : Re: Project Loom technical questions

Ron Pressler ron.pressler at oracle.com
Sun Aug 1 11:36:40 UTC 2021


Relying on virtual memory alone is insufficient. Once memory is committed, it won’t be uncommitted, 
so usage grows but never shrinks, and all of that happens at page granularity. Once you add guard pages 
to prevent stack overflows, you’ll get close to 10GB of *committed* memory for 1M threads before they 
do anything, and the memory would only grow monotonically from there.
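A rough back-of-the-envelope sketch of where a figure of that order comes from (the per-thread page counts below are illustrative assumptions for the sake of the arithmetic, not HotSpot’s actual numbers):

```java
// Illustrative arithmetic for committed stack memory at page granularity.
// Assumptions (not measured values): 4 KB pages, and roughly two pages
// committed per thread once its stack is first touched; guard pages are
// reserved on top of that.
public class StackCommitEstimate {
    public static void main(String[] args) {
        long threads = 1_000_000L;
        long pageSize = 4 * 1024;     // typical Linux page size
        long pagesPerThread = 2;      // assumed initially touched pages
        long committedBytes = threads * pagesPerThread * pageSize;
        System.out.printf("~%.1f GB committed%n", committedBytes / 1e9);
        // prints: ~8.2 GB committed
    }
}
```

Even under these conservative assumptions the committed memory is in the multi-gigabyte range before any thread has done real work, and at page granularity it can only ratchet upward.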

Using virtual memory to give each virtual thread a contiguous, fixed address space could be doable 
if that memory were managed by the runtime rather than the OS, which could then uncommit pages promptly, 
albeit at the cost of many interactions with the OS. That’s something we briefly considered, but we 
concluded it isn’t a high priority, so we haven’t explored it further.
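For context, the model we do ship, where each virtual thread’s stack is managed by the runtime and grows and shrinks on demand, can be exercised with the standard API (a minimal sketch; requires a JDK with virtual threads, i.e. JDK 21+, or 19/20 with --enable-preview):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicLong;

// Minimal sketch: one million virtual threads, each with a small,
// runtime-managed stack rather than a 1 MB kernel stack reservation.
public class ManyVirtualThreads {
    public static void main(String[] args) {
        AtomicLong counter = new AtomicLong();
        try (ExecutorService exec = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < 1_000_000; i++) {
                exec.submit(counter::incrementAndGet);
            }
        } // close() waits for all submitted tasks to finish
        System.out.println(counter.get()); // prints 1000000
    }
}
```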

— Ron


> On 31 Jul 2021, at 16:25, Ignaz Birnstingl <ignazb at gmail.com> wrote:
> 
> Hi Ron,
> 
> Thanks for replying!
> Questions 2. and 3. are answered.
> 
>>> The default stack size for platform threads in Java is 1 MB on Linux and Mac. Kernel threads cannot resize
>>> their stack because they do not know how it’s used by the language; user-mode threads can. Loom’s virtual
>>> threads automatically grow and shrink depending on how much stack is currently used. TLABs are unrelated,
>>> and are associated with the OS threads internally used by the Java runtime rather than with virtual threads.
> If your process starts a million threads, then 1 MB of stack would be reserved in the address space for each thread. Since the address space of a 64-bit application is large enough, that should not be a problem.
> But since the memory would initially be unused, it would not contribute to the process's RSS, or at least it should not. So it should not count toward the "memory usage" that memory limits in container environments are based on.
> Therefore I would argue that the memory usage for stacks should be roughly the same for kernel threads and virtual threads.
> 
> Having one million TLABs would certainly have more memory overhead than, say, 8. That is where I see the biggest benefit of using virtual threads.
> But this problem could theoretically be mitigated with core-local allocation buffers: instead of one allocation buffer per kernel thread, there would be one per CPU core. Of course, the JVM would then have to take special care if/when a thread is moved to a different CPU core.
> 
> -- 
> Ignaz



More information about the loom-dev mailing list