<div dir="auto">Thanks Ilya, I sadly agree with your observation: nowadays it is difficult to find anything better for an apples-to-apples comparison...<div dir="auto">In Quarkus, my mate Eric De Andrea is going to provide a "quarkus super heroes" benchmark (in collaboration with my team, the Performance MW team), but it is still Quarkus-only, although very realistic in terms of the mix of technologies used (and it will have a reactive vs blocking part too). OT finished, I swear.</div><div dir="auto"><br></div><div dir="auto">Returning to the lack of cache-friendly behaviour while interacting directly with sockets, I can give my 2c...</div><div dir="auto"><br></div><div dir="auto">I have very mixed feelings about thread-per-core architectures (with shared-nothing approaches) vs what Loom offers, and these are my points, related to your fair observation:</div><div dir="auto">- a thread-per-core approach (a la Netty, let's say, but Hazelcast is the same, or, in the C++ world, the Seastar framework) allows every local access, including accesses to socket file descriptors, to be NUMA-friendly - by setting the affinity of a specific event loop thread to a specific NUMA node & core - which is awesome 👍 and tremendously effective for HFT or wherever tail latency really matters</div><div dir="auto">- BUT a thread-per-core approach requires, to work at its best, a fair distribution of resources/load, and "rebalancing" isn't something automatic: Netty, for example, doesn't allow file descriptors to be "moved" across event loops if some other event loop has free cycles to spare and could pick up that work :"/ Additionally, the thread-confined lifecycle of sockets means that connection pooling with reactive database drivers cannot guarantee every event loop access to all the available database connections (which are scattered and statically partitioned among event loops) with the same performance: if an event loop serving a client request needs a database connection but has exhausted the ones belonging to it, it can use the ones owned by another event loop, but that costs 2 thread handoffs, back and forth - a penalty which Loom doesn't have: every virtual thread can hit a cache miss, but there is no thread handoff while interacting with any borrowed db connection.</div><div dir="auto"><br></div><div dir="auto">The last point is very specific to Netty; indeed Hazelcast afaik allows migration of file descriptors across event loops, but that is still a periodic operation and never as natural as just acquiring a resource and using it.</div><div dir="auto">If I have to choose between 2 thread handoffs and a cache-unfriendly "cold" access to a socket, I will probably pick the second one.</div><div dir="auto">What's interesting is that with the 1-2 full cores available in this new container world, both effects probably just fade away, or at least matter less.</div><div dir="auto"><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed 1 Nov 2023, 21:22 Ilya Starchenko <<a href="mailto:st.ilya.101@gmail.com">st.ilya.101@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div><div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On 1 Nov 2023 at 01:51:44, Francesco Nigro <<a href="mailto:nigro.fra@gmail.com" target="_blank" rel="noreferrer">nigro.fra@gmail.com</a>> wrote:<br></div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(155,155,155)!important;padding-left:1ex;color:rgb(155,155,155)!important" type="cite">
Techempower plaintext is highly pipelined (in the worst way, because it is HTTP 1.1 and NOT HTTP 2, which is designed for that) and CPU bound, due to HTTP encoding/decoding, especially if the framework is a "proper" one (see my rant at <a href="https://github.com/TechEmpower/FrameworkBenchmarks/discussions/7984" target="_blank" rel="noreferrer">https://github.com/TechEmpower/FrameworkBenchmarks/discussions/7984</a>) and materializes the headers properly; which means that an improvement in that part can be responsible for achieving better numbers in Techempower
</blockquote>
</div>
<br>
<div dir="ltr">
<p style="margin:0px;font-style:normal;font-variant-caps:normal;font-stretch:normal;font-size:12px;line-height:normal;font-family:"Helvetica Neue";font-size-adjust:none;font-kerning:auto;font-variant-alternates:normal;font-variant-ligatures:normal;font-variant-numeric:normal;font-variant-east-asian:normal;font-feature-settings:normal">Franz,</p>
<p style="margin:0px;font-style:normal;font-variant-caps:normal;font-stretch:normal;font-size:12px;line-height:normal;font-family:"Helvetica Neue";font-size-adjust:none;font-kerning:auto;font-variant-alternates:normal;font-variant-ligatures:normal;font-variant-numeric:normal;font-variant-east-asian:normal;font-feature-settings:normal;min-height:14px"><br></p>
<p style="margin:0px;font-style:normal;font-variant-caps:normal;font-stretch:normal;font-size:12px;line-height:normal;font-family:"Helvetica Neue";font-size-adjust:none;font-kerning:auto;font-variant-alternates:normal;font-variant-ligatures:normal;font-variant-numeric:normal;font-variant-east-asian:normal;font-feature-settings:normal">Thank you for the clarification. I've already noticed that some of the Techempower benchmarks don't accurately represent real-world scenarios, but I haven't found another benchmark that would be more representative. I'll try profiling and perhaps look for alternative benchmarks (I've heard that the Quarkus team is working on some benchmarks).</p>
</div></div>
</blockquote></div>
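The static assignment problem I mentioned above can be sketched with plain JDK executors standing in for event loops (no Netty dependency; the class name and loop naming are made up for illustration). A Netty-style round-robin chooser binds each "channel" to one loop at registration time, and nothing ever migrates it afterwards, even if the other loop sits idle:

```java
import java.util.List;
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicInteger;

public class RoundRobinLoops {
    public static void main(String[] args) throws Exception {
        // Two single-thread executors stand in for Netty event loops.
        List<ExecutorService> loops = List.of(
            Executors.newSingleThreadExecutor(r -> new Thread(r, "loop-0")),
            Executors.newSingleThreadExecutor(r -> new Thread(r, "loop-1")));

        // Netty-style round-robin chooser: a channel is bound to one loop
        // when it is registered and never moves afterwards, regardless of
        // how busy or idle the other loops are.
        AtomicInteger next = new AtomicInteger();
        for (int channel = 0; channel < 4; channel++) {
            ExecutorService owner = loops.get(next.getAndIncrement() % loops.size());
            int id = channel;
            owner.submit(() -> System.out.println(
                "channel-" + id + " handled by " + Thread.currentThread().getName()));
        }
        for (ExecutorService loop : loops) {
            loop.shutdown();
            loop.awaitTermination(1, TimeUnit.SECONDS);
        }
    }
}
```

Every task for channel-0 and channel-2 runs on loop-0 forever, and loop-1 cannot steal them: that is the "no automatic rebalancing" cost of the shared-nothing design.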
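The two-handoff penalty vs Loom point can also be sketched with the stdlib alone (requires Java 21 for virtual threads; the class name, executor "loops" and the Semaphore-as-connection-pool are illustrative stand-ins, not Netty or a real driver). Borrowing a connection confined to another event loop costs a hop to that loop and a hop back, while a virtual thread simply blocks on a shared pool with no handoff:

```java
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicInteger;

public class HandoffDemo {
    public static void main(String[] args) throws Exception {
        // Two single-thread executors stand in for two event loops.
        ExecutorService loopA = Executors.newSingleThreadExecutor(r -> new Thread(r, "loop-A"));
        ExecutorService loopB = Executors.newSingleThreadExecutor(r -> new Thread(r, "loop-B"));
        AtomicInteger handoffs = new AtomicInteger();

        // A "connection" confined to loop B: code running on loop A can only
        // use it by hopping to loop B and back -- two thread handoffs per borrow.
        CompletableFuture<String> borrowed = CompletableFuture
            .supplyAsync(() -> {                 // handoff 1: A -> B (use the connection)
                handoffs.incrementAndGet();
                return "result from " + Thread.currentThread().getName();
            }, loopB)
            .thenApplyAsync(r -> {               // handoff 2: B -> A (resume the request)
                handoffs.incrementAndGet();
                return r;
            }, loopA);
        System.out.println(borrowed.get() + ", handoffs=" + handoffs.get());

        // A virtual thread just blocks on a shared pool: the borrow may cost a
        // cache miss, but no handoff to another thread is needed.
        Semaphore pool = new Semaphore(1);       // 1-connection shared pool
        Thread vt = Thread.ofVirtual().start(() -> {
            try {
                pool.acquire();                  // may park the virtual thread, no handoff
                System.out.println("virtual thread borrowed directly");
                pool.release();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        vt.join();
        loopA.shutdown();
        loopB.shutdown();
    }
}
```

The first path always pays two scheduling hops; the second path pays at most a park/unpark on contention, which is the asymmetry described above.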