<html>

<head>

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

</head>

<body style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class="">

Hi and thank you very much for your report!

<div class=""><br class="">

</div>

<div class="">It has been our experience as well that trying to marry an asynchronous engine with virtual threads is cumbersome and often wasteful. Writing the entire pipeline with simple blocking in mind gave us not only superior performance, but a much smaller

 and simpler codebase, and that would be the approach I’d recommend. I expect that there will soon be HTTP servers demonstrating that simple approach. However, if you wish to use an existing async engine, I think the approach you’ve taken — spawning/unblocking

 a virtual thread running in the virtual thread scheduler — is probably the best one.</div>
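<div class="">To make the “simple blocking” recommendation concrete, here is a minimal sketch of a thread-per-connection server. Each connection runs on its own virtual thread, and all I/O is ordinary blocking I/O. It assumes JDK 21 or later (virtual threads are a preview feature in JDK 19) and uses a line-based echo protocol rather than HTTP to stay short. The names <code>BlockingEchoServer</code> and <code>roundTrip</code> are hypothetical, and the sketch does not reflect how any particular HTTP server is implemented.</div>

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Thread-per-connection server: each accepted socket is handled on its own virtual
// thread, and every read and write below is ordinary blocking I/O.
final class BlockingEchoServer implements AutoCloseable {
    private final ExecutorService handlers = Executors.newVirtualThreadPerTaskExecutor();
    private final ServerSocket serverSocket;

    BlockingEchoServer(int port) throws IOException {
        // port 0 picks a free ephemeral port on the loopback interface
        serverSocket = new ServerSocket(port, 50, InetAddress.getLoopbackAddress());
        Thread.ofVirtual().name("acceptor").start(this::acceptLoop);
    }

    int port() {
        return serverSocket.getLocalPort();
    }

    private void acceptLoop() {
        try {
            while (true) {
                Socket socket = serverSocket.accept();   // blocks, parking only this virtual thread
                handlers.submit(() -> handle(socket));
            }
        } catch (IOException e) {
            // accept() fails once the server socket is closed: stop accepting
        }
    }

    private static void handle(Socket socket) {
        try (socket;
             var in = new BufferedReader(new InputStreamReader(socket.getInputStream(), StandardCharsets.UTF_8));
             var out = new PrintWriter(socket.getOutputStream(), true, StandardCharsets.UTF_8)) {
            String line;
            while ((line = in.readLine()) != null) {   // blocking read
                out.println("echo: " + line);          // blocking write (autoflush)
            }
        } catch (IOException e) {
            // the client went away; nothing else to clean up
        }
    }

    @Override
    public void close() throws IOException {
        serverSocket.close();
        handlers.shutdownNow();
    }

    // Starts a server on a free port, sends one line, and returns the reply.
    static String roundTrip(String message) {
        try (var server = new BlockingEchoServer(0);
             var client = new Socket(InetAddress.getLoopbackAddress(), server.port());
             var in = new BufferedReader(new InputStreamReader(client.getInputStream(), StandardCharsets.UTF_8));
             var out = new PrintWriter(client.getOutputStream(), true, StandardCharsets.UTF_8)) {
            out.println(message);
            return in.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("hello"));   // echo: hello
    }
}
```

<div class="">There is no event loop and no callback here. The acceptor and the handlers simply block, and whenever a socket operation cannot proceed, the JDK parks the virtual thread and frees its carrier.</div>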

<div class=""><br class="">

</div>

<div class="">Integrating explicit scheduler loops with virtual threads via custom schedulers is on the roadmap. However, encouraged by the performance of servers that take the “full simple” approach, this might not be a top priority and might take some time [1]. The API was removed for the simple reason that it’s just not ready, as you noticed.</div>

<div class=""><br class="">

</div>

<div class="">As for memory footprint, this might not be the cause of your issue, but it might interest you to know that we’re now working on dramatically reducing the footprint of virtual thread stacks. That work also wasn’t ready for JDK 19, but it is a higher priority than custom schedulers. So I’m interested to know how much of that excess footprint is due to virtual thread stacks (those would appear as jdk.internal.vm.StackChunk objects in your heap).</div>
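<div class="">One way to check how much of the footprint comes from virtual thread stacks is to park many virtual threads at once and take a class histogram while they are parked. The sketch below (JDK 21+, hypothetical class and method names) does that. The thread count, recursion depth, and hold time are arbitrary, and the <code>jcmd</code> command in the comment is just one way to look for <code>jdk.internal.vm.StackChunk</code> instances.</div>

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

// Parks many virtual threads at once so their stacks stay on the heap (as
// jdk.internal.vm.StackChunk objects) long enough to be inspected.
final class ParkedVirtualThreads {

    // Starts `count` virtual threads that each park a few frames deep, keeps them
    // parked for `holdMillis`, then releases them; returns how many finished.
    static int run(int count, long holdMillis) {
        var started = new CountDownLatch(count);
        var release = new CountDownLatch(1);
        var finished = new AtomicInteger();
        try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
            try {
                for (int i = 0; i < count; i++) {
                    executor.submit(() -> {
                        started.countDown();
                        parkAtDepth(16, release);
                        finished.incrementAndGet();
                        return null;
                    });
                }
                started.await();
                // While the threads are parked, inspect the heap from another terminal, e.g.:
                //   jcmd PID GC.class_histogram | grep StackChunk
                Thread.sleep(holdMillis);
            } finally {
                release.countDown();   // always release, or executor.close() would wait forever
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } // executor.close() waits for every submitted task to finish
        return finished.get();
    }

    // Recurses to build a deeper stack, then parks until released.
    private static void parkAtDepth(int depth, CountDownLatch release) throws InterruptedException {
        if (depth == 0) {
            release.await();   // the virtual thread unmounts here; its frames are copied to the heap
        } else {
            parkAtDepth(depth - 1, release);
        }
    }

    public static void main(String[] args) {
        System.out.println(run(100_000, 30_000) + " virtual threads finished");
    }
}
```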

<div class=""><br class="">

</div>

<div class="">What I’d like to hear more about is pinning, and what common causes of it you see. I would also be interested to hear your thoughts about how much of it is due to ecosystem readiness (e.g. some JDBC drivers don’t pin while others still do, although

 that’s expected to change).</div>

<div class=""><br class="">

</div>

<div class="">— Ron</div>

<div class=""><br class="">

</div>

<div class="">[1]: The “mechanical sympathy” effects you alluded to are real but too small in comparison to the throughput increase of thread-per-request code for them to be an immediate focus, especially as a work-stealing scheduler has pretty decent mechanical

 sympathy already. On the other hand, there are other reasons to support custom schedulers (e.g. UI event threads) that might shift the priority balance.</div>

<div class=""><br class="">

<div><br class="">

<blockquote type="cite" class="">

<div class="">On 26 Jul 2022, at 15:13, Clement Escoffier &lt;<a href="mailto:clement.escoffier@redhat.com" class="">clement.escoffier@redhat.com</a>&gt; wrote:</div>

<br class="Apple-interchange-newline">

<div class="">

<div dir="ltr" class="">

<div class="gmail_default" style="font-size:small"><span id="gmail-docs-internal-guid-b48f9181-7fff-5f32-e9ba-7b6ecaa18d4a" class="">

<div style="line-height: 1.38; margin-top: 0pt; margin-bottom: 0pt;" class=""><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap" class="">Hello,</span></div>

<br class="">

<div style="line-height: 1.38; margin-top: 0pt; margin-bottom: 0pt;" class=""><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap" class="">This email reports our observations around Loom, mainly in the context of Quarkus. It discusses the current approach and our plans. </span><span style="font-size:10.5pt;font-family:Roboto,sans-serif;color:rgb(14,16,26);font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap" class="">We are sharing this information about our current successes and challenges with Loom. Please let us know your thoughts and questions on our approach(es).</span></div>

<br class="">

<div style="line-height: 1.38; margin-top: 0pt; margin-bottom: 0pt;" class=""><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-weight:700;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap" class="">Context</span></div>

<br class="">

<div style="line-height: 1.38; margin-top: 0pt; margin-bottom: 0pt;" class=""><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap" class="">Since the early days of the Loom project, we have been looking at various approaches to integrate Loom (mostly virtual threads) into Quarkus. Our goal was (and still is) to dispatch </span><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-style:italic;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap" class="">processing</span><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap" class=""> (HTTP requests, Kafka messages, gRPC calls) on virtual threads. That way, the user would not have to think about blocking or not blocking (more on that later, as it relates to the Quarkus architecture) and could write synchronous code without limiting the application's concurrency.</span></div>

<br class="">

<div style="line-height: 1.38; margin-top: 0pt; margin-bottom: 0pt;" class=""><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap" class="">To achieve this, we need not only to dispatch the processing on virtual threads but also to have compliant </span><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-style:italic;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap" class="">clients</span><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap" class=""> to invoke remote services (HTTP, gRPC…), send messages (Kafka, AMQP), or interact with a data store (SQL or NoSQL).</span></div>

<br class="">

<div style="line-height: 1.38; margin-top: 0pt; margin-bottom: 0pt;" class=""><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-weight:700;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap" class="">Quarkus Architecture</span></div>

<br class="">

<div style="line-height: 1.38; margin-top: 0pt; margin-bottom: 0pt;" class=""><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap" class="">Before going further, we need to explain how Quarkus is structured. Quarkus is based on a reactive engine (Netty + Eclipse Vert.x), so under the hood, Quarkus uses event loops to schedule workloads and performs non-blocking I/O. It can also use Netty's native transports (epoll, kqueue, io_uring).</span></div>

<br class="">

<div style="line-height: 1.38; margin-top: 0pt; margin-bottom: 0pt;" class=""><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap" class="">The </span><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-style:italic;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap" class="">processing</span><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap" class=""> can be dispatched either directly to the event loop or to a worker thread (OS thread). In the first case, the code must be written in an asynchronous and non-blocking manner. Quarkus provides a programming model and safety guards for writing such code. In the latter case, the code can be blocking.</span></div>

<br class="">

<div style="line-height: 1.38; margin-top: 0pt; margin-bottom: 0pt;" class=""><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap" class="">Quarkus decides which dispatching strategy to use for each </span><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-style:italic;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap" class="">processing job</span><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap" class="">. The decision is based on the method signatures and annotations (for example, the user can force a method to be called on an event loop or on a worker thread).</span></div>
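<div class="">The decision can be pictured as a small reflection-based rule. In this sketch, several elements are hypothetical stand-ins: the annotations <code>OnEventLoop</code>, <code>OnWorker</code>, and <code>OnVirtualThread</code>, the <code>DispatchStrategy</code> enum, and the rule that a <code>CompletionStage</code> return type means “run on the event loop”. They are not the actual Quarkus annotations or its exact rules. The code runs on any recent JDK.</div>

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;

// Hypothetical annotations, declared here only to keep the sketch self-contained.
@Retention(RetentionPolicy.RUNTIME) @Target(ElementType.METHOD) @interface OnEventLoop {}
@Retention(RetentionPolicy.RUNTIME) @Target(ElementType.METHOD) @interface OnWorker {}
@Retention(RetentionPolicy.RUNTIME) @Target(ElementType.METHOD) @interface OnVirtualThread {}

enum DispatchStrategy { EVENT_LOOP, WORKER, VIRTUAL_THREAD }

final class DispatchDecision {

    // Explicit annotations win; otherwise the return type decides.
    static DispatchStrategy decide(Method method) {
        if (method.isAnnotationPresent(OnVirtualThread.class)) return DispatchStrategy.VIRTUAL_THREAD;
        if (method.isAnnotationPresent(OnWorker.class)) return DispatchStrategy.WORKER;
        if (method.isAnnotationPresent(OnEventLoop.class)) return DispatchStrategy.EVENT_LOOP;
        // An asynchronous return type suggests non-blocking code: run it on the event loop.
        // A plain return type suggests (possibly) blocking code: run it on a worker thread.
        return CompletionStage.class.isAssignableFrom(method.getReturnType())
                ? DispatchStrategy.EVENT_LOOP
                : DispatchStrategy.WORKER;
    }

    // Example endpoint methods used to exercise the rules.
    static final class Endpoints {
        String plain() { return "ok"; }
        CompletionStage<String> async() { return CompletableFuture.completedFuture("ok"); }
        @OnWorker CompletionStage<String> asyncButBlocking() { return async(); }
        @OnVirtualThread String onVirtualThread() { return "ok"; }
    }

    // Looks up an example endpoint by name and applies the rules to it.
    static DispatchStrategy decide(String endpointName) {
        try {
            return decide(Endpoints.class.getDeclaredMethod(endpointName));
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException("no such endpoint: " + endpointName, e);
        }
    }
}
```

<div class="">Annotations take precedence over the signature. This mirrors the idea that the user can force a method onto an event loop or a worker thread regardless of its return type.</div>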

<br class="">

<div style="line-height: 1.38; margin-top: 0pt; margin-bottom: 0pt;" class=""><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap" class="">When using a worker thread, the request is received on an event loop and dispatched to the worker thread, and when the response is ready to be written (when it fits in memory), Quarkus switches back to the event loop.</span></div>
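<div class="">The receive, process, and write hops can be simulated with plain JDK executors. In this sketch (JDK 21+, hypothetical names), a single-threaded executor named <code>event-loop</code> stands in for a Netty/Vert.x event loop. The worker is either a small platform-thread pool or, when <code>useVirtualThreads</code> is true, an executor that starts one virtual thread per task.</div>

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Simulates the hop: receive on the event loop, run (possibly blocking) code on a
// worker, then switch back to the event loop to write the response.
final class WorkerHop {

    private static String current() {
        return Thread.currentThread().getName();
    }

    // Records which thread runs each step and returns that trace.
    static List<String> handle(ExecutorService eventLoop, ExecutorService worker, String request) {
        List<String> trace = Collections.synchronizedList(new ArrayList<>());
        CompletableFuture
                .supplyAsync(() -> {
                    trace.add("receive on " + current());
                    return request;
                }, eventLoop)
                .thenApplyAsync(req -> {
                    trace.add("process on " + current());   // blocking code is allowed here
                    return req.toUpperCase();
                }, worker)
                .thenAcceptAsync(resp -> trace.add("write " + resp + " on " + current()), eventLoop)
                .join();
        return trace;
    }

    // Builds fresh executors, runs one request, and shuts everything down.
    static List<String> demo(String request, boolean useVirtualThreads) {
        ExecutorService eventLoop = Executors.newSingleThreadExecutor(
                Thread.ofPlatform().name("event-loop").factory());
        ExecutorService worker = useVirtualThreads
                ? Executors.newThreadPerTaskExecutor(Thread.ofVirtual().name("virtual-", 0).factory())
                : Executors.newFixedThreadPool(2, Thread.ofPlatform().name("worker-", 0).factory());
        try {
            return handle(eventLoop, worker, request);
        } finally {
            worker.shutdown();
            eventLoop.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(demo("hello", false));
        System.out.println(demo("hello", true));
    }
}
```

<div class="">Each <code>...Async</code> stage is one thread switch. <code>demo("hello", false)</code> returns <code>[receive on event-loop, process on worker-0, write HELLO on event-loop]</code>; with <code>true</code>, the middle step runs on <code>virtual-0</code> instead.</div>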

<br class="">

<div style="line-height: 1.38; margin-top: 0pt; margin-bottom: 0pt;" class=""><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-weight:700;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap" class="">The current approach</span></div>

<br class="">

<div style="line-height: 1.38; margin-top: 0pt; margin-bottom: 0pt;" class=""><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap" class="">The integration of Loom's virtual threads is currently based [1] on a new annotation (@RunOnVirtualThread). It introduces a third dispatching strategy: methods annotated with it are called on a virtual thread. So, we now have three possibilities:</span></div>

<br class="">

<ul style="margin-top:0px;margin-bottom:0px" class="">

<li dir="ltr" style="list-style-type:disc;font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre" class="">

<div style="line-height: 1.38; margin-top: 0pt; margin-bottom: 0pt;" class=""><span style="font-size:11pt;background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap" class="">Execute the processing on an event loop thread: the code must be non-blocking</span></div>

</li><li dir="ltr" style="list-style-type:disc;font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre" class="">

<div style="line-height: 1.38; margin-top: 0pt; margin-bottom: 0pt;" class=""><span style="font-size:11pt;background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap" class="">Execute the processing on an OS (worker) thread: subject to the cost of OS threads and their concurrency limit</span></div>

</li><li dir="ltr" style="list-style-type:disc;font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre" class="">

<div style="line-height: 1.38; margin-top: 0pt; margin-bottom: 0pt;" class=""><span style="font-size:11pt;background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap" class="">Execute the processing on a virtual thread</span></div>

</li></ul>

<br class="">

<div style="line-height: 1.38; margin-top: 0pt; margin-bottom: 0pt;" class=""><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap" class="">The following snippet shows an elementary example:</span></div>

<br class="">

<div style="line-height: 1.38; margin-top: 0pt; margin-bottom: 0pt;" class=""><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap" class="">@GET</span></div>

<div style="line-height: 1.38; margin-top: 0pt; margin-bottom: 0pt;" class=""><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap" class="">@Path("/loom")</span></div>

<div style="line-height: 1.38; margin-top: 0pt; margin-bottom: 0pt;" class=""><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-weight:700;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap" class="">@RunOnVirtualThread</span></div>

<div style="line-height: 1.38; margin-top: 0pt; margin-bottom: 0pt;" class=""><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap" class="">Fortune example() {</span></div>

<div style="line-height: 1.38; margin-top: 0pt; margin-bottom: 0pt;" class=""><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap" class="">    var list = repo.findAll();</span></div>

<div style="line-height: 1.38; margin-top: 0pt; margin-bottom: 0pt;" class=""><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap" class="">    return pickOne(list);</span></div>

<div style="line-height: 1.38; margin-top: 0pt; margin-bottom: 0pt;" class=""><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap" class="">}</span></div>

<br class="">

<br class="">

<div style="line-height: 1.38; margin-top: 0pt; margin-bottom: 0pt;" class=""><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap" class="">This support is already experimentally available in Quarkus 2.10.</span></div>

<br class="">

<div style="line-height: 1.38; margin-top: 0pt; margin-bottom: 0pt;" class=""><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-weight:700;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap" class="">Previous attempts</span></div>

<br class="">

<div style="line-height: 1.38; margin-top: 0pt; margin-bottom: 0pt;" class=""><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap" class="">The current approach is not our first attempt. We tried two other approaches and discarded both, although we would like to reconsider the second one.</span></div>

<br class="">

<div style="line-height: 1.38; margin-top: 0pt; margin-bottom: 0pt;" class=""><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-style:italic;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap" class="">First Approach: All workers are virtual threads</span></div>

<br class="">

<div style="line-height: 1.38; margin-top: 0pt; margin-bottom: 0pt;" class=""><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap" class="">The first approach was straightforward: replace the worker (OS) threads with virtual threads. However, we quickly ran into limitations. Long-running (purely CPU-bound) processing would block the carrier thread, as there is no preemption. Users should know that long-running processing must not be executed on virtual threads, but nothing in this model made that explicit. We also started observing carrier thread pinning (our current approach still has this issue; we explain our band-aid later).</span></div>
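<div class="">Because virtual threads are not time-sliced, one common mitigation is to hand long CPU-bound work to a small dedicated pool of platform threads and block on the result. Waiting on the <code>Future</code> parks the virtual thread and frees its carrier. This is a sketch under stated assumptions (JDK 21+, hypothetical names, an arbitrary pool size, and a toy prime-counting workload), not a Quarkus mechanism.</div>

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Keeps long CPU-bound work off the virtual-thread carriers: the caller hands the
// computation to a small platform-thread pool and blocks on the Future, which parks
// the calling virtual thread instead of occupying a carrier for the whole computation.
final class CpuOffload {

    // One platform thread per core is an arbitrary choice for the sketch.
    private static final ExecutorService CPU_POOL = Executors.newFixedThreadPool(
            Runtime.getRuntime().availableProcessors(),
            Thread.ofPlatform().name("cpu-", 0).daemon(true).factory());

    // Deliberately CPU-heavy: counts the primes below `limit` by trial division.
    static long countPrimes(int limit) {
        long count = 0;
        for (int n = 2; n < limit; n++) {
            boolean prime = true;
            for (int d = 2; (long) d * d <= n; d++) {
                if (n % d == 0) {
                    prime = false;
                    break;
                }
            }
            if (prime) count++;
        }
        return count;
    }

    // Meant to be called from a virtual thread.
    static long countPrimesOffloaded(int limit) throws InterruptedException, ExecutionException {
        Future<Long> result = CPU_POOL.submit(() -> countPrimes(limit));
        return result.get();   // parks the calling virtual thread until the pool is done
    }

    // Runs the offloaded computation from a freshly started virtual thread.
    static long runFromVirtualThread(int limit) {
        long[] answer = new long[1];
        Thread vt = Thread.ofVirtual().start(() -> {
            try {
                answer[0] = countPrimesOffloaded(limit);
            } catch (InterruptedException | ExecutionException e) {
                throw new IllegalStateException(e);
            }
        });
        try {
            vt.join();   // join() makes answer[0] visible to this thread
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return answer[0];
    }
}
```

<div class="">This does not make the computation any cheaper. It only keeps the carriers available for I/O-bound virtual threads, and it makes the CPU-bound path explicit in the code.</div>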

<br class="">

<div style="line-height: 1.38; margin-top: 0pt; margin-bottom: 0pt;" class=""><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-style:italic;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap" class="">Second Approach: Marry event loops and carrier threads</span></div>

<br class="">

<div style="line-height: 1.38; margin-top: 0pt; margin-bottom: 0pt;" class=""><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap" class="">Quarkus is designed to reduce the memory usage of the application. We are obsessed with RSS usage, especially when running in a container where resources are scarce. This has driven many of our architecture choices, including the sizing strategies (number of event loops, number of worker threads…).</span></div>

<br class="">

<div style="line-height: 1.38; margin-top: 0pt; margin-bottom: 0pt;" class=""><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap" class="">Thus, we investigated whether we could avoid a second (carrier) thread pool and reduce the number of switches between the event loops and the carrier threads. To achieve this, we tried to use Netty event loops as carrier threads. Implementing this approach required private APIs (which were public at some point in early-access builds) [3].</span></div>

<br class="">

<div style="line-height: 1.38; margin-top: 0pt; margin-bottom: 0pt;" class=""><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap" class="">Unfortunately, we quickly ran into issues (which explains why this mechanism is not part of the public API). Typically, we hit deadlocks when a carrier thread shared locks with virtual threads. Given how likely such lock sharing is, this made it impossible to use event loops as carriers.</span></div>

<br class="">

<div style="line-height: 1.38; margin-top: 0pt; margin-bottom: 0pt;" class=""><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap" class="">That custom scheduling strategy also rules out work stealing (Netty event loops do not support it) and must preserve a strict ordering between I/O tasks.</span></div>

<br class="">

<div style="line-height: 1.38; margin-top: 0pt; margin-bottom: 0pt;" class=""><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-weight:700;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap" class="">Pros and Cons of the current approach</span></div>

<br class="">

<div style="line-height: 1.38; margin-top: 0pt; margin-bottom: 0pt;" class=""><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap" class="">Our current approach (based on @RunOnVirtualThread) integrates smoothly with the rest of Quarkus, even if the integration is limited to the HTTP part at the moment: integrating with Kafka and gRPC is slightly more complicated, though not impossible.</span></div>

<div style="line-height: 1.38; margin-top: 0pt; margin-bottom: 0pt;" class=""><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap" class="">The user's code is written synchronously, and users are aware of the dispatching strategy. Given the limitations mentioned before, we believe it's a good trade-off, even if not ideal.</span></div>

<br class="">

<div style="line-height: 1.38; margin-top: 0pt; margin-bottom: 0pt;" class=""><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap" class="">However, the chances of pinning the carrier threads are still very high (</span><span style="font-size:10.5pt;font-family:Roboto,sans-serif;color:rgb(32,33,36);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap" class="">caused by the pervasive use of certain common JDK features in the ecosystem: synchronized, JNI, etc.)</span><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap" class="">. Because we would like to reduce the number of carrier threads to the bare minimum (to limit RSS usage), we can end up with an underperforming application whose concurrency level is lower than that of the </span><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-style:italic;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap" class="">classic</span><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap" class=""> worker-thread approach, with pretty lousy response times.</span></div>
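<div class="">The most common pinning case is easy to reproduce. In this sketch (JDK 21+, hypothetical names), <code>withMonitor</code> blocks while holding a monitor, which pins the carrier for the whole sleep. <code>withLock</code> does the same under a <code>ReentrantLock</code>, which lets the virtual thread unmount. The sleep stands in for blocking I/O. On JDK 21, running with <code>-Djdk.tracePinnedThreads=full</code> prints a stack trace when a virtual thread blocks while pinned, and JFR records <code>jdk.VirtualThreadPinned</code> events.</div>

```java
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

// Two versions of the same blocking critical section (the sleep stands in for I/O).
final class PinningDemo {
    private final Object monitor = new Object();
    private final ReentrantLock lock = new ReentrantLock();
    private final AtomicInteger completed = new AtomicInteger();

    // Blocking while holding a monitor: up to JDK 23 the virtual thread cannot unmount
    // inside synchronized, so its carrier stays pinned for the whole sleep.
    void withMonitor(long sleepMillis) throws InterruptedException {
        synchronized (monitor) {
            TimeUnit.MILLISECONDS.sleep(sleepMillis);
            completed.incrementAndGet();
        }
    }

    // Same section guarded by a java.util.concurrent lock: the virtual thread unmounts
    // while it sleeps (or waits for the lock), leaving the carrier free for others.
    void withLock(long sleepMillis) throws InterruptedException {
        lock.lock();
        try {
            TimeUnit.MILLISECONDS.sleep(sleepMillis);
            completed.incrementAndGet();
        } finally {
            lock.unlock();
        }
    }

    // Runs `tasks` virtual threads through one of the two variants.
    static int run(int tasks, boolean useMonitor) {
        var demo = new PinningDemo();
        try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < tasks; i++) {
                executor.submit(() -> {
                    if (useMonitor) {
                        demo.withMonitor(1);
                    } else {
                        demo.withLock(1);
                    }
                    return null;
                });
            }
        } // close() waits for all tasks to finish
        return demo.completed.get();
    }
}
```

<div class="">Both variants do the same work; the difference is how many carriers stay occupied while it happens. Since JDK 24 (JEP 491), blocking inside <code>synchronized</code> no longer pins, but native (JNI) frames still do.</div>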

<br class="">

<div style="line-height: 1.38; margin-top: 0pt; margin-bottom: 0pt;" class=""><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-weight:700;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap" class="">The Netty / Loom dance</span></div>

<br class="">

<div style="line-height: 1.38; margin-top: 0pt; margin-bottom: 0pt;" class=""><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap" class="">We implemented a band-aid that reduces the chance of pinning without limiting users to a small set of Quarkus APIs. Remember, Quarkus is based on a reactive core, and most of our APIs are available in two forms:</span></div>

<ul style="margin-top:0px;margin-bottom:0px" class="">

<li dir="ltr" style="list-style-type:disc;font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre" class="">

<div style="line-height: 1.38; margin-top: 0pt; margin-bottom: 0pt;" class=""><span style="font-size:11pt;background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap" class="">An imperative form, blocking the caller thread when dealing with I/O</span></div>

</li><li dir="ltr" style="list-style-type:disc;font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre" class="">

<div style="line-height: 1.38; margin-top: 0pt; margin-bottom: 0pt;" class=""><span style="font-size:11pt;background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap" class="">A reactive form that is non-blocking (reactive programming)</span></div>

</li></ul>

<br class="">

<div style="line-height: 1.38; margin-top: 0pt; margin-bottom: 0pt;" class=""><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap" class="">To avoid thread pinning when running on a virtual thread, we let users call the reactive form of our APIs while blocking the virtual thread until the result has been computed. These </span><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-style:italic;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap" class="">awaits</span><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap" class=""> do not block the carrier thread. They can be used with APIs returning zero or one result, but also with streams (like Kafka topics), where you receive an iterator.</span></div>
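<div class="">The await pattern can be sketched with JDK types standing in for the reactive API. This is an assumption: Quarkus’s reactive APIs are built on Mutiny rather than on <code>CompletionStage</code> and <code>Flow</code>. Here, <code>fetchAsync</code> completes on a single-threaded “event loop”. <code>await</code> blocks with <code>join()</code>, which parks the calling virtual thread rather than its carrier. <code>toIterator</code> exposes a <code>Flow.Publisher</code> stream as a blocking <code>Iterator</code>. All names are hypothetical, and the sketch assumes JDK 21+.</div>

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Flow;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.SubmissionPublisher;

final class AwaitOnVirtualThread {

    // Queued once the stream ends; carries the failure, if any.
    private record End(Throwable error) {}

    // Stand-in for the reactive form of an API: the value is produced on the "event loop".
    static CompletionStage<String> fetchAsync(ExecutorService eventLoop, String key) {
        return CompletableFuture.supplyAsync(() -> "value-of-" + key, eventLoop);
    }

    // Zero-or-one result: join() parks the calling virtual thread, not its carrier.
    static <T> T await(CompletionStage<T> stage) {
        return stage.toCompletableFuture().join();
    }

    // Stream: exposes a Flow.Publisher as a blocking Iterator; hasNext() parks the
    // caller until the publisher pushes the next item or ends the stream.
    static <T> Iterator<T> toIterator(Flow.Publisher<T> publisher) {
        BlockingQueue<Object> queue = new LinkedBlockingQueue<>();
        publisher.subscribe(new Flow.Subscriber<T>() {
            @Override public void onSubscribe(Flow.Subscription s) { s.request(Long.MAX_VALUE); }
            @Override public void onNext(T item) { queue.add(item); }
            @Override public void onError(Throwable t) { queue.add(new End(t)); }
            @Override public void onComplete() { queue.add(new End(null)); }
        });
        return new Iterator<>() {
            private Object next;   // buffered element, or null when nothing is buffered

            @Override public boolean hasNext() {
                if (next == null) {
                    try {
                        next = queue.take();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        throw new IllegalStateException(e);
                    }
                }
                if (next instanceof End end && end.error() != null) {
                    throw new IllegalStateException(end.error());
                }
                return !(next instanceof End);
            }

            @SuppressWarnings("unchecked")
            @Override public T next() {
                if (!hasNext()) throw new NoSuchElementException();
                T item = (T) next;
                next = null;
                return item;
            }
        };
    }

    // Both patterns on one virtual thread while an "event loop" thread produces the data.
    static List<String> demo() {
        ExecutorService eventLoop = Executors.newSingleThreadExecutor();
        try (var virtualThreads = Executors.newVirtualThreadPerTaskExecutor();
             var publisher = new SubmissionPublisher<String>()) {
            Iterator<String> stream = toIterator(publisher);   // subscribe before publishing
            eventLoop.submit(() -> {
                for (String key : List.of("k1", "k2", "k3")) publisher.submit(key);
                publisher.close();
            });
            return virtualThreads.submit(() -> {
                List<String> results = new ArrayList<>();
                results.add(await(fetchAsync(eventLoop, "a")));
                while (stream.hasNext()) results.add(stream.next());
                return results;
            }).get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            eventLoop.shutdown();
        }
    }
}
```

<div class="">For brevity, the subscriber requests an unbounded number of items, so there is no backpressure. A real adapter would request items in batches as the iterator drains them.</div>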

<br class="">

<div style="line-height: 1.38; margin-top: 0pt; margin-bottom: 0pt;" class=""><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap" class="">As said above, this is a band-aid until we have non-pinning clients/APIs.</span></div>

<br class="">

<div style="line-height: 1.38; margin-top: 0pt; margin-bottom: 0pt;" class=""><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap" class="">Under the hood, there is a dance between the virtual thread and the Netty event loop (used by the reactive API). It introduces a few unfortunate thread switches but works around the pinning issue.</span></div>

<br class="">

<div style="line-height: 1.38; margin-top: 0pt; margin-bottom: 0pt;" class=""><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-weight:700;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap" class="">Observations</span></div>

<br class="">

<div style="line-height: 1.38; margin-top: 0pt; margin-bottom: 0pt;" class=""><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap" class="">Over the past year, we ran many tests to design and implement our integration. The current approach is far from ideal, but it works fine.</span></div>

<div style="line-height: 1.38; margin-top: 0pt; margin-bottom: 0pt;" class=""><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap" class="">We obtained excellent results when comparing it with a fully reactive approach and a worker-thread approach (Quarkus can have all three variants in the same app).</span></div>

<div style="line-height: 1.38; margin-top: 0pt; margin-bottom: 0pt;" class=""><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap" class="">The response time under load is close to that of the reactive approach and far better than that of the classic worker-thread approach [1][2].</span></div>

<br class="">

<div style="line-height: 1.38; margin-top: 0pt; margin-bottom: 0pt;" class=""><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap" class="">However (remember, we are obsessed with RSS), RSS usage is very high, even higher than with the worker-thread approach. At the moment, we are investigating where these objects come from, and we hope to have a better understanding after the summer. Our observations suggest that the performance penalty is likely due to memory consumption (and GC cycles), but, as said, we are still investigating.</span></div>

<br class="">

<div style="line-height: 1.38; margin-top: 0pt; margin-bottom: 0pt;" class=""><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-weight:700;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap" class="">Ideally</span></div>

<br class="">

<div style="line-height: 1.38; margin-top: 0pt; margin-bottom: 0pt;" class=""><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap" class="">For us (Quarkus), and probably for several other Java frameworks based on Netty, it would be terrific if we could find a way to reconcile the two scheduling strategies (in a sense, we would use the event loops as carrier threads). Of course, there will be trade-offs and limitations. Our initial attempt didn't end well, but that does not mean it's a dead end.</span></div>

<br class="">

<div style="line-height: 1.38; margin-top: 0pt; margin-bottom: 0pt;" class=""><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap" class="">An
</span><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-style:italic;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap" class="">event-loop carrier thread</span><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap" class="">
 would greatly benefit the underlying reactive engine (Netty/Vert.x in the case of Quarkus). It would retain some event-loop execution semantics: code is multithreaded (in the virtual-thread sense) yet executed on a single carrier thread that respects the event-loop
 principles and should have decent mechanical sympathy. In addition, it should enable the use of classic blocking constructs (e.g.,
</span><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-style:italic;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap" class="">java.util.concurrent.locks.Lock</span><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap" class="">).
 Currently, code running on the event loop can only block on Vert.x constructs (e.g., Vert.x futures, but not </span>
<span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-style:italic;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap" class="">java.util.concurrent.locks.Lock</span><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap" class="">),
 because Vert.x needs to be aware of the thread suspension to schedule event dispatching in a race-free and deadlock-free manner.</span></div>
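
<br class="">

<div style="line-height: 1.38; margin-top: 0pt; margin-bottom: 0pt;" class=""><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap" class="">For reference, here is roughly what blocking on a classic java.util.concurrent lock looks like with plain virtual threads, outside any event loop. It is a minimal sketch assuming JDK 21+; LockOnVirtualThreads and incrementConcurrently are illustrative names, not Quarkus or Vert.x code. A contended lock() parks the virtual thread, which is then unmounted so that its carrier can run other virtual threads (unlike blocking inside a synchronized block, which pins the carrier on JDK 21). Getting the same behaviour when the carrier is a Vert.x event loop is what the paragraph above asks for; this sketch does not attempt that.</span></div>

<pre class="">
```java
import java.util.concurrent.Executors;
import java.util.concurrent.locks.ReentrantLock;
import java.util.stream.IntStream;

// Sketch assuming JDK 21+ (final virtual threads); plain JDK, no Vert.x involved.
public class LockOnVirtualThreads {
    private static final ReentrantLock LOCK = new ReentrantLock();
    private static long counter; // guarded by LOCK

    // Starts one virtual thread per task; each one blocks on a classic
    // java.util.concurrent lock. A contended lock() parks the virtual thread,
    // which is unmounted so that its carrier can run other virtual threads.
    public static long incrementConcurrently(int tasks) {
        counter = 0; // safe: this write happens-before every task via submit()
        try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
            IntStream.range(0, tasks).forEach(i -> executor.submit(() -> {
                LOCK.lock();
                try {
                    counter++;
                } finally {
                    LOCK.unlock();
                }
            }));
        } // close() waits until every submitted task has finished
        LOCK.lock(); // read under the lock for a clean happens-before edge
        try {
            return counter;
        } finally {
            LOCK.unlock();
        }
    }

    public static void main(String[] args) {
        System.out.println(incrementConcurrently(10_000)); // prints 10000
    }
}
```
</pre>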

<br class="">

<div style="line-height: 1.38; margin-top: 0pt; margin-bottom: 0pt;" class=""><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap" class="">With
 such integration, virtual threads would be executed on the event loop. When they "block", they would be unmounted, and the event loop would process I/O or another virtual thread instead. That would reduce the number of switches between threads, reduce RSS usage, and allow many
 Java frameworks to leverage Loom virtual threads quickly. Of course, this approach can only be validated empirically. In particular, it would add latency to every virtual thread dispatch. In addition, watchdogs would need to be implemented to prevent the execution
 of long CPU-intensive actions that do not yield within an acceptable time, or at least to warn the user about them.</span></div>
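
<br class="">

<div style="line-height: 1.38; margin-top: 0pt; margin-bottom: 0pt;" class=""><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap" class="">One possible shape for such a watchdog: record when the current task took over the event-loop thread, and let a separate thread warn when that task exceeds a time budget. The sketch below is hypothetical and assumes JDK 21+. A single-thread ExecutorService stands in for the event loop, and EventLoopWatchdog, watched, check, busyFor and runDemo are illustrative names, not Loom, Netty or Vert.x APIs.</span></div>

<pre class="">
```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: EventLoopWatchdog is not a Loom, Netty or Vert.x API.
// A single-thread executor stands in for the event loop; a second thread
// periodically checks how long the current task has been holding it.
public class EventLoopWatchdog {
    private static final long IDLE = Long.MIN_VALUE; // sentinel: no task running
    private volatile long taskStartNanos = IDLE;
    private final AtomicInteger warnings = new AtomicInteger();
    private final long budgetNanos;

    EventLoopWatchdog(long budgetMillis) {
        this.budgetNanos = TimeUnit.MILLISECONDS.toNanos(budgetMillis);
    }

    // Wraps a task so the watchdog knows when it starts and finishes.
    Runnable watched(Runnable task) {
        return () -> {
            taskStartNanos = System.nanoTime();
            try {
                task.run();
            } finally {
                taskStartNanos = IDLE;
            }
        };
    }

    // Runs on the checker thread; warns on every check while a task is over
    // budget, so one long task can produce several warnings.
    void check() {
        long start = taskStartNanos;
        if (start != IDLE && System.nanoTime() - start > budgetNanos) {
            warnings.incrementAndGet();
            System.err.println("WARN: event-loop task has not yielded within its budget");
        }
    }

    // Simulates CPU-intensive work that never yields the thread.
    static void busyFor(long millis) {
        long end = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(millis);
        while (end - System.nanoTime() > 0) {
            Thread.onSpinWait();
        }
    }

    // Runs one task of the given duration on the "event loop" and returns how
    // many warnings the watchdog emitted in the meantime.
    public static int runDemo(long taskMillis, long budgetMillis) {
        var watchdog = new EventLoopWatchdog(budgetMillis);
        ScheduledExecutorService checker = Executors.newSingleThreadScheduledExecutor();
        checker.scheduleAtFixedRate(watchdog::check, 10, 10, TimeUnit.MILLISECONDS);
        try (var eventLoop = Executors.newSingleThreadExecutor()) {
            eventLoop.submit(watchdog.watched(() -> busyFor(taskMillis)));
        } finally { // close() above has waited for the task to finish
            checker.shutdownNow();
        }
        return watchdog.warnings.get();
    }

    public static void main(String[] args) {
        System.out.println("warnings: " + runDemo(300, 50));
    }
}
```
</pre>

<div style="line-height: 1.38; margin-top: 0pt; margin-bottom: 0pt;" class=""><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap" class="">The check deliberately runs outside the event loop: a task that never yields would also starve any check scheduled on the loop itself.</span></div>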

<br class="">

<div style="line-height: 1.38; margin-top: 0pt; margin-bottom: 0pt;" class=""><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-weight:700;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap" class="">Conclusion</span></div>

<br class="">

<div style="line-height: 1.38; margin-top: 0pt; margin-bottom: 0pt;" class=""><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap" class="">Our

 integration of Loom virtual threads in Quarkus is already available to our users, and we will be collecting feedback.</span></div>

<br class="">

<div style="line-height: 1.38; margin-top: 0pt; margin-bottom: 0pt;" class=""><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap" class="">As

 explained in this email, we have thus identified </span><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-weight:700;font-style:italic;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap" class="">two

 issues</span><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap" class="">.</span></div>

<br class="">

<div style="line-height: 1.38; margin-top: 0pt; margin-bottom: 0pt;" class=""><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap" class="">The

 first one is purely about performance, and we were able to measure it empirically: the interaction between Loom and the Netty/Vert.x reactive stack seems to create an abundance of data structures that put pressure on the GC and degrade the overall performance

 of the application. As said above, we are investigating.</span></div>

<br class="">

<div style="line-height: 1.38; margin-top: 0pt; margin-bottom: 0pt;" class=""><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap" class="">The
 second one is more general and also impacts </span><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-style:italic;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap" class="">programming
 with Quarkus/Vert.x and Loom</span><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap" class="">. The goal is to
 reconcile the scheduling strategies of Loom and Netty/Vert.x. This could improve performance by decreasing the number of context switches (the Loom-Netty dance) and the RSS of an application. Moreover, it would enable the use of classic blocking constructs in
 Vert.x </span><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-style:italic;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap" class="">directly</span><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap" class="">
 (i.e., without wrapping them in Vert.x's own abstractions). We have not yet been able to validate or characterize the performance improvement of such a model. The outcome is unclear, as we do not know whether the decrease in context switches would be outweighed
 by the additional latency in virtual thread dispatch.</span></div>
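
<br class="">

<div style="line-height: 1.38; margin-top: 0pt; margin-bottom: 0pt;" class=""><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap" class="">The "Loom-Netty dance" can be pictured with plain JDK executors: in the current approach, each request crosses from the event loop to a virtual thread and back, which means two thread hand-offs per request that an event-loop carrier thread could, in principle, avoid. This is a minimal sketch assuming JDK 21+, with a single-thread executor standing in for a Netty/Vert.x event loop; EventLoopHandOff, handleOnce, where and blockingHandler are hypothetical names, not Quarkus code.</span></div>

<pre class="">
```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;

// Hypothetical sketch (JDK 21+): a single-thread executor stands in for a
// Netty/Vert.x event loop. Not Quarkus code.
public class EventLoopHandOff {

    // Handles one request: the event loop receives it, hands it to a virtual
    // thread that may block, and the response is handed back to the event
    // loop. Returns the response followed by the threads it visited.
    public static String handleOnce(String request) {
        ThreadFactory loopFactory = r -> new Thread(r, "event-loop");
        try (var eventLoop = Executors.newSingleThreadExecutor(loopFactory);
             var virtualThreads = Executors.newVirtualThreadPerTaskExecutor()) {
            var trace = new StringBuilder(); // stages run strictly one after another
            var response = CompletableFuture
                    // 1. the request arrives on the event loop (I/O callback)
                    .supplyAsync(() -> {
                        trace.append(where());
                        return request;
                    }, eventLoop)
                    // 2. first switch: the handler runs on a virtual thread
                    .thenApplyAsync(req -> {
                        trace.append(" -> ").append(where());
                        return blockingHandler(req);
                    }, virtualThreads)
                    // 3. second switch: the response goes back to the event loop
                    .thenApplyAsync(resp -> {
                        trace.append(" -> ").append(where());
                        return resp;
                    }, eventLoop)
                    .join();
            return response + " [" + trace + "]";
        }
    }

    // Names the current thread, collapsing all virtual threads into one label.
    static String where() {
        Thread t = Thread.currentThread();
        return t.isVirtual() ? "virtual thread" : t.getName();
    }

    // Stands in for a blocking call such as a JDBC query.
    static String blockingHandler(String req) {
        try {
            Thread.sleep(10);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "response to " + req;
    }

    public static void main(String[] args) {
        // prints: response to GET /fortunes [event-loop -> virtual thread -> event-loop]
        System.out.println(handleOnce("GET /fortunes"));
    }
}
```
</pre>

<div style="line-height: 1.38; margin-top: 0pt; margin-bottom: 0pt;" class=""><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap" class="">Each arrow in the trace is a hand-off between threads. With an event-loop carrier thread, step 2 would instead run as a virtual thread mounted on the event loop itself, trading those switches for the dispatch latency discussed above.</span></div>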

<br class="">

<div style="line-height: 1.38; margin-top: 0pt; margin-bottom: 0pt;" class=""><span style="font-size:10.5pt;font-family:Roboto,sans-serif;color:rgb(14,16,26);font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap" class="">We
 are sharing this information about our current successes and challenges with Loom.</span></div>

<br class="">

<div style="line-height: 1.38; margin-top: 0pt; margin-bottom: 0pt;" class=""><span style="font-size:10.5pt;font-family:Roboto,sans-serif;color:rgb(14,16,26);font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap" class="">Please
 let us know your thoughts and concerns about our approach(es). Thanks!</span></div>

<div style="line-height: 1.38; margin-top: 0pt; margin-bottom: 0pt;" class=""><span style="font-size:10.5pt;font-family:Roboto,sans-serif;color:rgb(14,16,26);font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap" class=""><br class="">

</span></div>

<div style="line-height: 1.38; margin-top: 0pt; margin-bottom: 0pt;" class=""><span style="font-size:10.5pt;font-family:Roboto,sans-serif;color:rgb(14,16,26);font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap" class="">Clement</span></div>

<div style="line-height: 1.38; margin-top: 0pt; margin-bottom: 0pt;" class=""><span style="font-size:10.5pt;font-family:Roboto,sans-serif;color:rgb(14,16,26);font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap" class=""><br class="">

</span></div>

<br class="">

<div style="line-height: 1.38; margin-top: 0pt; margin-bottom: 0pt;" class=""><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap" class="">[1]

 - <a href="https://developers.redhat.com/devnation/tech-talks/integrate-loom-quarkus" class="">

https://developers.redhat.com/devnation/tech-talks/integrate-loom-quarkus</a></span></div>

<div style="line-height: 1.38; margin-top: 0pt; margin-bottom: 0pt;" class=""><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap" class="">[2]

 - <a href="https://github.com/anavarr/fortunes_benchmark" class="">https://github.com/anavarr/fortunes_benchmark</a></span></div>

<div style="line-height: 1.38; margin-top: 0pt; margin-bottom: 0pt;" class=""><span style="font-size:11pt;font-family:Arial;color:rgb(14,16,26);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap" class="">[3]

 - <a href="https://github.com/openjdk/loom/commit/cad26ce74c98e28854f02106117fe03741f69ba0" class="">

https://github.com/openjdk/loom/commit/cad26ce74c98e28854f02106117fe03741f69ba0</a></span></div>

</span><br class="gmail-Apple-interchange-newline">

</div>

</div>

</div>

</blockquote>

</div>

<br class="">

</div>

</body>

</html>