[External] Re: Late cleanup of stack objects

Ron Pressler ron.pressler at oracle.com
Fri Nov 11 17:36:20 UTC 2022


As JEP 425 explains, it is hard to compare, in general, the number of objects or the amount of memory that virtual threads require with what callbacks require, with or without compiler optimisation taken into account. Also, as it always has, compiler behaviour can significantly affect heap consumption, with or without virtual threads. While the JIT doesn’t know which code retains more memory, code that runs more is more likely to allocate more, and that code is also more likely to be compiled.

The assumption that the most impactful code is compiled is pretty common in the design of OpenJDK features. Because virtual threads are assumed to be numerous, the assumption that almost all of them will be running compiled code almost all of the time seems more than reasonable, but if you can find a realistic workload where this doesn’t happen we’ll certainly be interested.

— Ron

On 11 Nov 2022, at 15:55, Arnaud Masson <arnaud.masson at fr.ibm.com<mailto:arnaud.masson at fr.ibm.com>> wrote:

What Virtual Threads change is that blocking a thread is now considered cheap.
If a blocked thread’s stack retains more unused objects than the async approach would (at least until the JIT threshold is reached), then blocking is not as cheap as it could be.

Of course, if we assume that memory-sensitive code will always be quickly JIT-compiled (and that the JIT is aggressive about stack cleanup), there is no problem. (Is this assumption true?...)

I don’t think stack-based allocation matters much here, because typically these would be small objects that get flattened onto the stack, no? So no big surprise.
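
A minimal sketch of that flattening, assuming a hypothetical Point class that never escapes the method (the class and method names are illustrative, not from the thread or from HotSpot):

```java
// Sketch of the scalar-replacement case: Point is a hypothetical class
// used only to illustrate escape analysis; names are not from the thread.
public class ScalarReplacementSketch {
    static final class Point {
        final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    // The Point never escapes this method, so once the loop is hot and
    // compiled, the JIT's escape analysis may scalar-replace it: no heap
    // allocation at all. Interpreted, every iteration allocates on the heap.
    static long sumCoordinates(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            Point p = new Point(i, i + 1);
            sum += p.x + p.y;
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumCoordinates(1_000)); // prints 1000000
    }
}
```

Whether scalar replacement actually happens depends on the JIT’s escape-analysis and inlining decisions; the sketch only shows the shape of code that qualifies.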

Thanks
Arnaud

>But that’s just how HotSpot works, and virtual threads didn’t change that. There are other memory-related optimisations in the JIT, too. For example, every `new Foo`, when running in the interpreter, will allocate memory in the heap. But when compiled, the object could be allocated on the stack (scalar-replaced) or even not at all (which means there can be cases where JITted code consumes zero heap memory, while the same code, when interpreted, consumes an unbounded amount, i.e. will OOME at any heap size). So the memory profile of Java applications running on OpenJDK can be very different depending on JIT behaviour. This might become even more pronounced with Valhalla.
>
>In this case, the behaviour has nothing to do with virtual threads or with “capture.” The optimising compiler simply analyses the method and sees that the bigBuffer variable is unused after the call to slowIO so it optimises it away (and with the local gone, the GC will not find the buffer and it will be collected).
>
>If the interpreter were to perform this or other similar optimisations based on analysing the method before interpreting it, then it wouldn’t be doing its job as the mechanism that quickly starts execution or the mechanism used when debugging. Even when some optimisations are possible in the interpreter, it’s rarely worth it to invest much time in that because the interpreter runs little code over the program’s overall execution.
>
>It does, however, mean that microbenchmarks need to be done carefully.
>
>— Ron
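
The bigBuffer/slowIO situation described above can be sketched as follows; slowIO here is a stand-in for any blocking call, and the sleep is illustrative only:

```java
import java.util.concurrent.TimeUnit;

// Sketch of the liveness case: bigBuffer is dead after its last use, so
// compiled code need not keep a reference to it, and the GC can collect
// the array even while the thread is still blocked in slowIO. Interpreted
// frames, by contrast, keep the local reachable for the whole call.
public class LivenessSketch {
    static long useBuffer() {
        byte[] bigBuffer = new byte[1 << 20]; // 1 MiB scratch buffer
        bigBuffer[0] = 42;
        long checksum = bigBuffer[0];         // last use of bigBuffer
        slowIO();  // while blocked here, compiled code no longer references
                   // bigBuffer, so it is eligible for collection
        return checksum;
    }

    // Placeholder for real blocking I/O.
    static void slowIO() {
        try {
            TimeUnit.MILLISECONDS.sleep(10);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println(useBuffer()); // prints 42
    }
}
```

Whether the buffer is actually collected during the blocking call depends on the tier the method is running at and on when the GC runs; the sketch only shows the code shape the quoted analysis applies to.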
