Re: Project Loom VirtualThreads hang

Arnaud Masson arnaud.masson at fr.ibm.com
Fri Jan 6 20:03:07 UTC 2023


Sure.
Servers with both IO-bound and CPU-bound requests are what I see in the real world. 😊

Thanks
Arnaud

Over the years we’ve been working on virtual threads, we’ve made lots of simulations. But the features we have are not features that address artificial simulations, but only those of them that correspond to real problems our real users face in the real world. What we need to improve things aren’t more simulations, but reports from real systems (or simulations that try to mimic behaviour observed in a real system).

-- Ron


On 6 Jan 2023, at 19:45, Arnaud Masson <arnaud.masson at fr.ibm.com> wrote:

I can’t see how a stop-the-world effect can be avoided once all your carriers are busy with non-switchable CPU-bound tasks. Maybe I’m missing something 😊
This is not very different from other pinning problems (JNI...), except for the argument that 100% CPU usage should never occur and so it is not a problem.

I will try to write a test to simulate this and post the results here.
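[Editor's illustration, not part of the original message.] The scenario above can be pictured with a rough analogy. The sketch below does not use the Loom scheduler at all: it assumes a plain fixed thread pool as a stand-in for the carrier pool, and spinning tasks as a stand-in for CPU-bound work that never reaches a yield point. The class name `CarrierSaturationSketch` and the method `run` are hypothetical. Whether real virtual threads behave the same way is exactly what the thread is debating, so this only shows the queueing effect being hypothesised, not a measured result.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class CarrierSaturationSketch {

    /**
     * Saturates a pool of {@code carriers} threads with spinning tasks, then
     * submits one short task. Returns {ranWhileSaturated, ranAfterRelease}.
     * The fixed pool is only an analogy for the carrier pool (assumption).
     */
    static boolean[] run(int carriers) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(carriers);
        CountDownLatch allStarted = new CountDownLatch(carriers);
        CountDownLatch release = new CountDownLatch(1);
        try {
            for (int i = 0; i < carriers; i++) {
                pool.submit(() -> {
                    allStarted.countDown();
                    // Stand-in for a long CPU-bound computation with no blocking call.
                    while (release.getCount() > 0) {
                        Thread.onSpinWait();
                    }
                });
            }
            allStarted.await(); // every pool thread is now occupied

            // A short task queued behind the busy ones.
            Future<String> shortTask = pool.submit(() -> "done");

            boolean ranWhileSaturated;
            try {
                shortTask.get(200, TimeUnit.MILLISECONDS);
                ranWhileSaturated = true;
            } catch (TimeoutException e) {
                ranWhileSaturated = false; // no free thread, so it never started
            }

            release.countDown(); // let the busy tasks finish
            boolean ranAfterRelease = "done".equals(shortTask.get(5, TimeUnit.SECONDS));
            return new boolean[] {ranWhileSaturated, ranAfterRelease};
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        boolean[] r = run(4);
        System.out.println("short task ran while pool was saturated: " + r[0]);
        System.out.println("short task ran after release:            " + r[1]);
    }
}
```

In this analogy the short task only runs once a busy task gives up its thread, which is the behaviour being worried about; a time-sharing scheduler would instead let it run earlier at the cost of switching overhead.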

Thanks
Arnaud


I don’t think that increasing the scheduler’s parallelism would help, nor do I think you’d see a “stop-the-world”, but again, these hypotheses are just not actionable. There’s nothing we can do to address them. When you find a problem, please report it and we’ll investigate what can be done.



-- Ron



On 6 Jan 2023, at 19:11, Arnaud Masson <arnaud.masson at fr.ibm.com> wrote:

I don’t think having 100% CPU usage on a pod is enough to justify a “stop-the-world” effect on Loom scheduling for the other tasks.
Also 100% is the extreme case, but there can be 75% CPU usage, meaning only 1 carrier left for all other tasks in my example.

Again, not a blocker I guess; one just has to increase the carrier count to mitigate, but it’s good old native thread sizing again, where it should not really be needed.
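[Editor's illustration, not part of the original message.] The carrier arithmetic above can be sketched as follows. The example referred to is not included in this message, so the figure of 4 carriers is an assumption inferred from "75% ... only 1 carrier left". The system property `jdk.virtualThreadScheduler.parallelism` is the documented knob for the carrier count, with the number of available processors as its documented default; the class `CarrierCountSketch` and its methods are hypothetical helpers that merely mirror that default and are not the JDK's own code.

```java
public class CarrierCountSketch {

    /**
     * Mirrors the documented default: the value of the system property
     * jdk.virtualThreadScheduler.parallelism if set to an integer,
     * otherwise the number of available processors.
     */
    static int effectiveParallelism() {
        return Integer.getInteger(
                "jdk.virtualThreadScheduler.parallelism",
                Runtime.getRuntime().availableProcessors());
    }

    /** Carriers still free when some are held by CPU-bound tasks that never yield. */
    static int carriersLeft(int parallelism, int busyCpuBoundTasks) {
        return Math.max(0, parallelism - busyCpuBoundTasks);
    }

    public static void main(String[] args) {
        // Assumed example: 4 carriers, 3 of them busy (75% CPU usage).
        System.out.println("carriers left (4 carriers, 3 busy): " + carriersLeft(4, 3));
        // Mitigation discussed in the thread: raise the carrier count, e.g. to 8.
        System.out.println("carriers left (8 carriers, 3 busy): " + carriersLeft(8, 3));
        System.out.println("parallelism on this machine: " + effectiveParallelism());
    }
}
```

The property is read by the JVM at startup, so in practice it would be passed on the command line as `-Djdk.virtualThreadScheduler.parallelism=N` rather than set from application code.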

“Time-sharing would make those expensive tasks complete in a lot more than 10 seconds”:
I understand there would be switching overhead (so it’s slower), but I don’t understand why it would be much slower if there are few of them like in my example.

thanks
Arnaud



Unless otherwise stated above:

Compagnie IBM France
Siège Social : 17, avenue de l'Europe, 92275 Bois-Colombes Cedex
RCS Nanterre 552 118 465
Forme Sociale : S.A.S.
Capital Social : 664 069 390,60 €
SIRET : 552 118 465 03644 - Code NAF 6203Z



