Performance Questions and Poller Implementation in Project Loom

Tue Oct 31 18:44:26 UTC 2023

On 31/10/2023 17:59, Ilya Starchenko wrote:
>
> Hello loom-dev team,
>
>
> I would like to express my gratitude for the work being done on 
> Project Loom. It's an exciting project with a lot of potential. I have 
> some questions related to its performance that I would appreciate some 
> clarification on.
>
>
> I recently came across a benchmark presentation by Alan Bateman at 
> Devoxx<https://youtu.be/XF4XZlPZc_c?si=-Qp2PampTbNGj3a5>, where the 
> Helidon Nima framework demonstrated better performance results 
> compared to a reactive framework. However, when I examined the 
> Plaintext benchmark (specifically focusing on Netty and Undertow, 
> which benchmark only plaintext), I noticed that Nima, which operates 
> entirely on virtual threads, failed to outperform even the blocking 
> Undertow. Additionally, I conducted tests with Tomcat and Jetty using 
> Loom's executor, and they also did not exhibit significant 
> improvements compared to a reactive stack. Perhaps I should ask the 
> Helidon team, but my question is, is this the expected performance 
> level for Project Loom, or can we anticipate better performance in the 
> future?
>

Slide 11 in that presentation is where I showed Helidon 3 vs. Helidon 4 
results. The slide has a link to the results published on 
www.techempower.com/benchmarks. There may be some Helidon maintainers on 
this mailing list who can say more about this. The slides were dulled in 
the recording (it happened to a few others too) but you can find the 
original slides here if you want:

https://cr.openjdk.org/~alanb/devoxx2023/ProjectLoomDevoxx2023.pdf

>
> Furthermore, I took a closer look at the Poller 
> implementation<https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/sun/nio/ch/Poller.java#L436>, 
> and I noticed that it utilizes only one thread (by default) for both 
> read and write polling. I'm curious why there's only one thread, and 
> wouldn't it be more efficient to have pollers matching the number of 
> CPU cores for optimal performance?
>
>
This code has been substantially rewritten in the fibers branch of the 
loom repo. We have a draft PR to bring this into the main line for JDK 22. 
The main difference is that the number of epoll instances is based on 
the number of hardware threads, actually the closest power of 2, with a 
virtual thread per instance. The effect of this is to integrate I/O 
event handling with the virtual thread scheduler, which works surprisingly 
well under load. I don't know if you build the JDK from source; if you 
do then you can add that to the list to test.
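
As a rough illustration of the sizing described above, here is a minimal Java sketch (JDK 21 or later). It is not the code from the fibers branch or the draft PR: the class and method names are hypothetical, "closest power of 2" is assumed to mean rounding down, the mask-based mapping from file descriptor to poller is an assumption, and the poll loop is a placeholder rather than a real epoll wait.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

public class PollerSketch {
    // Largest power of two <= hardwareThreads.
    // Assumption: "closest power of 2" is read here as rounding down.
    static int pollerCount(int hardwareThreads) {
        return Integer.highestOneBit(Math.max(hardwareThreads, 1));
    }

    // Assumption: with a power-of-two count, a file descriptor can be
    // mapped to a poller instance by masking its low bits.
    static int pollerIndex(int fd, int pollerCount) {
        return fd & (pollerCount - 1);
    }

    // Start one virtual thread per (hypothetical) poller instance.
    // pollLoop stands in for the loop that would wait on an epoll instance.
    static List<Thread> startPollers(int count, Runnable pollLoop) {
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            threads.add(Thread.ofVirtual().name("poller-" + i).start(pollLoop));
        }
        return threads;
    }

    public static void main(String[] args) throws InterruptedException {
        int n = pollerCount(Runtime.getRuntime().availableProcessors());
        CountDownLatch started = new CountDownLatch(n);
        // Placeholder loop body: each poller thread just signals that it ran.
        List<Thread> pollers = startPollers(n, started::countDown);
        started.await();
        for (Thread t : pollers) {
            t.join();
        }
        System.out.println(n + " poller threads ran");
    }
}
```

Because the poller threads in this sketch are virtual, they are scheduled by the same scheduler as the application's virtual threads, which is the integration the paragraph above refers to.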

-Alan
