<div dir="ltr"><div class="gmail_default" style="font-size:small"><span id="gmail-docs-internal-guid-b48f9181-7fff-5f32-e9ba-7b6ecaa18d4a"><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">Hello,</span></p><br><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">This email reports our observations around Loom, mainly in the context of Quarkus. It discusses the current approach and our plans.  </span><span style="font-size:10.5pt;font-family:Roboto,sans-serif;color:rgb(14,16,26);font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">We are sharing this information on our current success and challenges with Loom. Please let us know your thoughts and questions on our approach(es).</span></p><br><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-weight:700;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">Context</span></p><br><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">Since the early days of the Loom project, we have been looking at various approaches to integrate Loom (mostly virtual threads) into Quarkus. Our goal was (and still is) to dispatch </span><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-style:italic;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">processing </span><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">(HTTP requests, Kafka messages, gRPC calls) on virtual threads. Thus, the user would not have to think about blocking or not blocking (more on that later as it relates to the Quarkus architecture) and can write synchronous code without limiting the application's concurrency.</span></p><br><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">To achieve this, we need to dispatch the processing on virtual threads but also have compliant </span><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-style:italic;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">clients</span><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap"> to invoke remote services (HTTP, gRPC…), send messages (Kafka, AMQP), or interact with a data store (SQL or NoSQL).</span></p><br><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-weight:700;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">Quarkus Architecture</span></p><br><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">Before going further, we need to explain how Quarkus is structured. Quarkus is based on a reactive engine (Netty + Eclipse Vert.x), so under the hood, Quarkus uses event loops to schedule the workloads and non-blocking I/O. There is also the possibility of using Netty Native Transport (epoll, kqueue, io_uring). </span></p><br><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">The </span><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-style:italic;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">processing </span><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">can be either directly dispatched to the event loop or on a worker thread (OS thread). In the first case, the code must be written in an asynchronous and non-blocking manner. Quarkus proposes a programming model and safety guards to write such a code. In the latter case, the code can be blocking. </span></p><br><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">Quarkus decides which dispatching strategy it uses for each </span><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-style:italic;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">processing job</span><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">. The decision is based on the method signatures and annotations (for example, the user can force it to be called on an event loop or a worker thread). </span></p><br><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">When using a worker thread, the request is received on an event loop and dispatched to the worker thread, and when the response is ready to be written (when it fits in memory), Quarkus switches back to the event loop.</span></p><br><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-weight:700;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">The current approach</span></p><br><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">The integration of Loom's virtual threads is currently based[1] on a new annotation (@RunOnVirtualThread). It introduces a third dispatching strategy, and methods annotated with this annotation are called on a virtual thread. So, we now have three possibilities:</span></p><br><ul style="margin-top:0px;margin-bottom:0px"><li dir="ltr" style="list-style-type:disc;font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre"><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt" role="presentation"><span style="font-size:11pt;background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">Execute the processing on an event loop thread - the code must be non-blocking </span></p></li><li dir="ltr" style="list-style-type:disc;font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre"><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt" role="presentation"><span style="font-size:11pt;background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">Execute the processing on an OS (worker) thread - with the thread cost and concurrency limit</span></p></li><li dir="ltr" style="list-style-type:disc;font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre"><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt" role="presentation"><span style="font-size:11pt;background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">Execute the processing on a virtual thread</span></p></li></ul><br><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">The following snippet shows an elementary example:</span></p><br><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">@GET</span></p><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">@Path("/loom")</span></p><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-weight:700;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">@RunOnVirtualThread</span></p><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">Fortune example() {</span></p><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">     var list = repo.findAll();</span></p><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">     return pickOne(list);</span></p><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">}</span></p><br><br><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">This support is already experimentally available in Quarkus 2.10.</span></p><br><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-weight:700;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">Previous attempts</span></p><br><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">The current approach is not our first attempt. We had two other approaches that we discarded,  while the second one is something we want to reconsider.</span></p><br><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-style:italic;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">First Approach - All workers are virtual threads</span></p><br><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">The first approach was straightforward. The idea was to replace the worker (OS) threads with Virtual Threads. However, we quickly realized some limitations. Long-running (purely CPU-bound) processing would block the carrier thread as there is no preemption. While the user should be aware that long-running processing should not be executed on virtual threads, in this model, it was not explicit. We also started capturing carrier thread pinning situation (our current approach still has this issue, we will explain our bandaid later). </span></p><br><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-style:italic;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">Second Approach - Marry event loops and carrier threads</span></p><br><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">Quarkus is designed to reduce the memory usage of the application. We are obsessed with RSS usage, especially when running in a container where resources are scarce. It has driven lots of our architecture choices, including the dimensioning strategies (number of event loops, number of worker threads…). </span></p><br><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">Thus, we investigated the possibility of avoiding having a second carrier thread pool and reducing the number of switches between the event loops and the carrier threads. We tried to use Netty event loops as carrier threads to achieve this. We had to use private APIs (which used to be public at some point in early access builds) to implement such an approach [3]. </span></p><br><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">Unfortunately, we quickly ran into issues (explaining why our method is not part of the public API). Typically we had deadlock situations when a carrier thread shared locks with virtual threads. This made it impossible to use event-loops as carriers considering the probability of lock sharing.</span></p><br><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">That custom scheduling strategy also prevents work stealing (Netty event loops do not handle work stealing) and must keep a strict ordering between I/O tasks.</span></p><br><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-weight:700;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">Pros and Cons of the current approach</span></p><br><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">Our current approach (based on @RunOnVirtualThread) integrates smoothly with the rest of Quarkus (even if the integration is limited to the HTTP part at that moment, as the integration with Kafka and gRPC are slightly more complicated but not impossible). </span></p><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">The user's code is written synchronously, and the users are aware of the dispatching strategy. Due to the limitation mentioned before, we still believe it's a good trade-off, even if not ideal. </span></p><br><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">However, the chances of pinning the carrier threads are still very high (</span><span style="font-size:10.5pt;font-family:Roboto,sans-serif;color:rgb(32,33,36);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">caused by pervasive usage in the ecosystem of certain common JDK features - synchronized, JNI, etc.)</span><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">. Because we would like to reduce the number of carrier threads to the bare minimum (to limit the RSS usage), we can end up with an underperforming application, which would have a concurrency level lower than the </span><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-style:italic;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">classic </span><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">worker thread approach with pretty lousy response times. </span></p><br><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-weight:700;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">The Netty / Loom dance</span></p><br><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">We implemented a bandaid to reduce the chance of pinning while not limiting the users to a small set of Quarkus APIs. Remember, Quarkus is based on a reactive core, and most of our APIs are available in two forms:</span></p><ul style="margin-top:0px;margin-bottom:0px"><li dir="ltr" style="list-style-type:disc;font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre"><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt" role="presentation"><span style="font-size:11pt;background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">An imperative form blocking the caller thread when dealing with I/O</span></p></li><li dir="ltr" style="list-style-type:disc;font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre"><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt" role="presentation"><span style="font-size:11pt;background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">A reactive form that is non-blocking (reactive programming)</span></p></li></ul><br><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">To avoid thread pinning when running on a virtual thread, we offer the possibility to use the reactive form of our APIs but block the virtual thread while the result is still being computed. These </span><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-style:italic;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">awaits</span><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap"> do not block the carrier thread and can be used with API returning 0 or one result, but also with streams (like Kafka topics) where you receive an iterator. </span></p><br><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">As said above, this is a band-aid until we have non-pinning clients/APIs.</span></p><br><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">Under the hood, there is a dance between the virtual thread and the netty event loop (used by the reactive API). It introduces a few unfortunate switches but workaround the pinning issue.</span></p><br><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-weight:700;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">Observations</span></p><br><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">Over the past year, we ran many tests to design and implement our integration. The current approach is far from ideal, but it works fine. </span></p><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">We have excellent results when we compared with a full reactive approach and a worker approach (Quarkus can have the three variants in the same app).</span></p><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">The response time under load is close enough to the reactive approach. It is far better than the classic worker thread approach [1][2].</span></p><br><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">However (remember we are obsessed with RSS), the RSS usage is very high. Even higher than the worker thread approach. At that moment, we are investigating where these objects come from. We hope to have a better understanding after the summer. Our observations show that the performance penalty is likely due to memory consumption (and GC cycles). However, as said, we are still investigating.</span></p><br><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-weight:700;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">Ideally</span></p><br><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">For us (Quarkus) and probably several other Java frameworks based on Netty, it would be terrific if we could find a way to reconcile the two scheduling strategies (in a sense, we would use the event loops as carrier thread). Of course, there will be trade-offs and limitations. Our initial attempt didn't end well, but that does not mean it's a dead end. </span></p><br><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">An </span><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-style:italic;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">event-loop carrier thread</span><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap"> would greatly benefit the underlying reactive engine (Netty/Vert.x in the case of Quarkus). It retains some event-loop execution semantics: code is multithreaded (in the virtual thread meaning) yet executed with a single carrier thread that respects the event-loop principles and shall have decent mechanical sympathy. In addition, it should enable using classic blocking constructs (e.g., </span><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-style:italic;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">java.util.lock.Lock</span><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">), whereas currently, it can only block on Vert.x (e.g., a Vert.x futures but not </span><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-style:italic;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">java.util.lock.Lock</span><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">) as Vert.x needs to be aware of the thread suspension to schedule event dispatching in a race-free / deadlock-free manner.</span></p><br><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">With such integration, virtual threads would be executed on the event loop. When they "block", they would be unmounted, and I/O or another virtual thread would be processed. That would reduce the number of switches between threads, reduce RSS usage, and allow lots of Java frameworks to leverage Loom virtual threads quickly. Of course, this approach can only be validated empirically. Typically, it adds latency to every virtual thread dispatch. In addition, watchdogs would need to be implemented to prevent (or at least warn the user) the execution of long CPU-intensive actions that do not yield in an acceptable time.</span></p><br><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-weight:700;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">Conclusion</span></p><br><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">Our integration of Loom virtual threads in Quarkus is already available to our users, and we will be collecting feedback.</span></p><br><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">As explained in this email, we have thus identified </span><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-weight:700;font-style:italic;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">two issues</span><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">.</span></p><br><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">The first one is purely about performance, and we were able to measure it empirically: the interaction between Loom and the Netty/Vert.x reactive stack seems to create an abundance of data structures that put pressure on the GC and degrade the overall performance of the application. As said above, we are investigating.</span></p><br><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">The second one is more general and also impacts </span><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-style:italic;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">programming with Quarkus/Vert.x Loom. </span><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">The goal is to reconcile the scheduling strategies of Loom and Netty/Vert.x. This could improve performance by decreasing the number of context switches (Loom-Netty dance) and the RSS of an application. Moreover, it would enable the use of classic blocking constructs in Vert.x </span><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-style:italic;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">directly</span><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap"> -i.e., without wrapping them in Vert.x own abstractions). We could not validate and/or characterize the performance improvement of such a model yet. The result is unclear as we don’t know if the decrease in context switches would be outweighed by the additional latency in virtual threads dispatch.</span></p><br><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:10.5pt;font-family:Roboto,sans-serif;color:rgb(14,16,26);font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">We are sharing this information on our current success and challenges with Loom.</span></p><br><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:10.5pt;font-family:Roboto,sans-serif;color:rgb(14,16,26);font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap"> Please let us know your thoughts and concerns on our approach(es). Thanks!</span></p><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:10.5pt;font-family:Roboto,sans-serif;color:rgb(14,16,26);font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap"><br></span></p><p style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:10.5pt;font-family:Roboto,sans-serif;color:rgb(14,16,26);font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">Clement</span></p><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:10.5pt;font-family:Roboto,sans-serif;color:rgb(14,16,26);font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap"><br></span></p><br><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">[1] - <a href="https://developers.redhat.com/devnation/tech-talks/integrate-loom-quarkus">https://developers.redhat.com/devnation/tech-talks/integrate-loom-quarkus</a></span></p><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">[2] - <a href="https://github.com/anavarr/fortunes_benchmark">https://github.com/anavarr/fortunes_benchmark</a></span></p><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">[3] - <a href="https://github.com/openjdk/loom/commit/cad26ce74c98e28854f02106117fe03741f69ba0">https://github.com/openjdk/loom/commit/cad26ce74c98e28854f02106117fe03741f69ba0</a></span></p></span><br class="gmail-Apple-interchange-newline"></div></div>