Jetty and Loom
Ron Pressler
ron.pressler at oracle.com
Tue Jan 5 15:48:36 UTC 2021
A stack is only allocated when the thread blocks, and it is not really
one object but multiple ones, each containing a portion
of stack frames. Whether and when they are reused depends on some careful
interaction with the GC and whether or not the particular stack chunk object
currently resides in a GC region that requires barriers. The size and
reuse of those objects can also depend on the behaviour of the particular
thread — the way it loops over some operation, for example, and there
are some clever optimisations that we’re planning to do concerning that.
The particular representation, as well as reuse characteristics, also
depends on whether or not the stack contains interpreted frames or
only compiled ones.
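[A minimal sketch (mine, not from the thread) of the behaviour described above, assuming a Loom-enabled JDK; the class and variable names are illustrative:]

```java
// Illustrative sketch: a virtual thread's frames live on the carrier's native
// stack while it runs; only when it blocks and is unmounted are they copied
// into heap-allocated stack-chunk objects, and copied back when it resumes.
public class StackChunkSketch {
    public static void main(String[] args) throws InterruptedException {
        // This virtual thread never blocks, so it can run to completion
        // without its frames ever being copied to the heap.
        Thread busy = Thread.ofVirtual().start(() -> {
            long sum = 0;
            for (int i = 0; i < 1_000_000; i++) sum += i;
        });

        // This one blocks (sleep), so it is unmounted from its carrier and
        // its frames live in stack-chunk objects until it resumes.
        Thread blocking = Thread.ofVirtual().start(() -> {
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        busy.join();
        blocking.join();
        System.out.println("both virtual threads completed");
    }
}
```

[Whether the heap copy happens at all, and into how many chunk objects, is the implementation detail described above, so nothing here should be relied on across builds.]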
I hope to give a talk about the implementation at some point, but
the implementation now is quite different from what it was a year
ago, and it is certainly possible it will change again, either before
or after release (or maybe both).
— Ron
On 5 January 2021 at 15:18:47, Mike Rettig (mike.rettig at gmail.com) wrote:
> > And no, it is not only not obvious but downright wrong that moving stacks from
> > C stacks to the Java heap increases GC work assuming there is actual real
> > work in the system, too.
>
> Are stacks for virtual threads allocated at virtual thread creation or only if a virtual thread is blocked and needs to be unmounted from the carrier thread? Is the virtual thread "stack" not really a stack, but just a temporary storage location for a copy of the native stack?
>
> Mike
>
> On Tue, Jan 5, 2021 at 7:47 AM Ron Pressler wrote:
> > Both the 4% CPU increase and GC pauses (I’ll get to what I mean later) are
> > bugs that we’ll try to fix. Especially the GC interaction uses code that is
> > currently in constant flux and is known to be suboptimal. I’m genuinely happy
> > that you’ve reported those bugs, but bugs are not limitations of the model.
> >
> > Having said that, the interesting thing I see in the GC behaviour may not be
> > what you think is interesting. I don’t think the deep-stack test actually
> > exposed a problem of any kind, because when two things have slightly different
> > kinds of overhead, you can easily reduce the actual work to zero, and
> > make the impact of overhead as high as you like, but that’s not interesting
> > for real work. I could be wrong, but I’ll give it another look.
> >
> > The one thing in the posts — and thank you for them! — that immediately flashed
> > in blinking red to me as some serious issue is the following:
> >
> > Platform:
> >
> > Elapsed Time: 10568 ms
> > Time in Young GC: 5 ms (2 collections)
> > Time in Old GC: 0 ms (0 collections)
> >
> > Virtual:
> >
> > Elapsed Time: 10560 ms
> > Time in Young GC: 23 ms (8 collections)
> > Time in Old GC: 0 ms (0 collections)
> >
> > See that increase in young collection time? That is the one thing that
> > actually touches on some core issue re virtual-threads’ design (they interact
> > with the GC and GC barriers in a particular way that could change the young
> > collection), and might signify a potentially serious bug.
> >
> > And no, it is not only not obvious but downright wrong that moving stacks from
> > C stacks to the Java heap increases GC work assuming there is actual real
> > work in the system, too. GCs generally don’t work like people imagine they do.
> > The reason I said that GC work might be reduced is some internal
> > details: the virtual thread stack is mutated in a special way and at a special
> > time so that it doesn’t require GC barriers; this is not true for Java objects
> > in general.
> >
> > I’m reminded that about a year ago, I think, I saw a post about some product
> > written in Java. The product appears to be very good, but the post said
> > something specific that induced a face-palm. They said that their product is
> > GC “transparent” because they do all their allocations upfront. I suggested
> > that instead of just using Parallel GC they try with G1, and immediately
> > they came back, surprised, that they’d seen a 15% performance hit. The reason
> > is that allocations and mutation (and even reads) cause different work at
> > different times in different GCs, and mutating one object at one specific time
> > might be more or less costly than allocating a new one, depending on the
> > GC and depending on the precise usage and timing of that particular object.
> >
> > The lesson is that trying to reverse-engineer and out-think the VM is not
> > only futile — not just because there are too many variables but also because
> > the implementation is constantly changing — but can result in downright bad
> > advice, overfitted to very particular circumstances.
> >
> > Instead, it’s important to focus on generalities. The goal of project Loom
> > is to make resource management around scheduling easy and efficient. When it
> > doesn’t do that, it’s a bug. I don’t agree at all with your characterisation
> > of what’s a limitation and what isn’t, but I don’t care: think of them however
> > you like. If you find bugs, we all win! But try harder, because I think you’ve
> > just scratched the surface.
> >
> > — Ron
> >
> >
> > On 5 January 2021 at 13:58:40, Greg Wilkins (gregw at webtide.com) wrote:
> >
> > >
> > > Ron,
> > >
> > > On Tue, 5 Jan 2021 at 13:19, Ron Pressler wrote:
> > > > If the listener might think it means that virtual
> > > > threads somehow *harm* the execution of CPU bound tasks, then it’s misleading.
> > > I've demonstrated (https://github.com/webtide/loom-trial/blob/main/src/main/java/org/webtide/loom/CPUBound.java) that using virtual threads can defer CPU bound tasks.
> > > I've demonstrated (https://webtide.com/do-looms-claims-stack-up-part-2/) that using virtual threads can double the CPU usage over pooled kernel threads. Even their best usage in my tests has a 4% CPU usage increase.
> > >
> > > > The “additional load on GC” statement is not, I believe, demonstrated.
> > >
> > > I've demonstrated (https://webtide.com/do-looms-claims-stack-up-part-1/) 1.5s GC pauses when using virtual threads at levels that kernel threads handle without pause.
> > >
> > > Besides, isn't it self-evident that moving stacks from static kernel memory to the dynamic heap is going to add GC load? You've even described how recycling virtual threads will not help reduce that additional load on the GC, as a reason not to pool them!
> > >
> > > > It is tautologically true that if your use case does not benefit from virtual
> > > > threads then it does not benefit from virtual threads.
> > >
> > > Indeed, but half the time it is not clear that you acknowledge that there are use cases that are not suitable for virtual threads. Just paragraphs above you were implying that there is "no *harm*" in using virtual threads for CPU-bound tasks!
> > >
> > > > > Totally confused by the messaging from this project.
> > > > I’m confused by what you find confusing.
> > >
> > > This is not just a hive mind issue, as the messaging just from you is inconsistent. One moment you are happy to describe limitations of virtual threads and agree that there are use cases that do not benefit. Then the next moment we are back to "what limitations", "no harm", "not demonstrated", etc.
> > >
> > > None of my demonstrations are fatal flaws. Some may well be fixable, whilst others are just things to note when making a thread choice. But to deny them just encourages dwelling on the negatives rather than the positives!
> > >
> > > cheers
> > >
> > > --
> > > Greg Wilkins CTO http://webtide.com