<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class="">Hi,<div class=""><br class=""></div><div class="">Just an FYI: until you get to the order of 1k, 10k, etc. concurrent clients, I would expect platform threads to outperform virtual threads by quite a bit (at best they perform the same). Modern OSes routinely handle thousands of active threads (my OSX desktop with 4 true cores has nearly 5k threads running).</div><div class=""><br class=""></div><div class="">Also, if you can saturate your CPUs or local IO bus, adding more threads isn’t going to help. Virtual threads shine when the request handler is fanning out to multiple remote services.</div><div class=""><br class=""></div><div class="">Regards,</div><div class="">Robert</div><div class=""><br class=""><div><br class=""><blockquote type="cite" class=""><div class="">On Jun 21, 2024, at 12:05 PM, Matthew Swift <<a href="mailto:matthew.swift@gmail.com" class="">matthew.swift@gmail.com</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><div dir="ltr" class="">Hello again,<div class=""><br class=""></div><div class="">As promised, here is my second (shorter I hope!) email sharing feedback on the recent Loom EA build (23-loom+4-102). It follows up on my previous email <a href="https://mail.openjdk.org/pipermail/loom-dev/2024-June/006788.html" class="">https://mail.openjdk.org/pipermail/loom-dev/2024-June/006788.html</a>.</div><div class=""><br class=""></div><div class="">I performed some experiments using the same application described in my previous email. However, in order to properly test the improvements to Object monitors (synchronized blocks and Object.wait()), I reverted all of the thread-pinning related changes that I had made in order to support virtual threads with JDK21. 
Specifically, I reverted the changes converting uses of monitors to ReentrantLock.</div><div class=""><br class=""></div><div class="">I'm pleased to say that this EA build looks extremely promising! :-) </div><div class=""><br class=""></div><div class="">### Experiment #1: read stress test</div><div class=""><br class=""></div><div class="">* platform threads: 215K/s throughput, CPU 14% idle</div><div class="">* virtual threads: 235K/s throughput, CPU 5% idle</div><div class=""><br class=""></div><div class="">Comment: there's a ~9% throughput improvement, but CPU utilization is higher too. Presumably this is due to the number of carrier threads being closely matched to the number of CPUs (I noticed significantly less context switching with virtual threads).</div><div class=""><br class=""></div><div class="">### Experiment #2: heavily indexed write stress test, with 40 clients</div><div class=""><br class=""></div><div class=""><div class="">* platform threads: 9300/s throughput, CPU 27% idle</div><div class="">* virtual threads: 8800/s throughput, CPU 24% idle</div><div class=""><br class=""></div></div><div class="">Comment: there is a ~5% performance degradation using virtual threads. This is better than the degradation I observed in my previous email after switching to ReentrantLock, though.</div><div class=""><br class=""></div><div class=""><div class="">### Experiment #3: extremely heavily indexed write stress test, with 120 clients</div><div class=""><br class=""></div><div class=""><div class="">* platform threads: 1450/s throughput</div><div class="">* virtual threads: 1450/s throughput (i.e. 
about the same).</div></div></div><div class=""><br class=""></div><div class="">Comment:</div><div class=""><br class=""></div><div class="">This test is intended to stress the internal locking mechanisms as much as possible and expose any pinning problems.</div><div class="">With JDK21 virtual threads the test would sometimes deadlock and thread dumps would show 100+ fork join carrier threads.</div><div class="">This is no longer the case with the EA build. It looks really solid.</div><div class=""><br class=""></div><div class="">This test does expose one important difference between platform threads and virtual threads though. Let's take a look at the response times:</div><div class=""><br class=""></div><div class="">Platform threads:</div><div class=""><br class=""></div><div class=""><pre class="">
------------------------------------------------------------------------------
|    Throughput     |                Response Time                 |         |
|   (ops/second)    |                (milliseconds)                |         |
|   recent  average |   recent  average    99.9%   99.99%  99.999% | err/sec |
------------------------------------------------------------------------------
...
|   1442.6   1606.6 |   83.097   74.683   448.79   599.79   721.42 |     0.0 |
|   1480.8   1594.0 |   81.125   75.282   442.50   599.79   721.42 |     0.0 |
</pre></div><div class=""><br class=""></div><div class="">Virtual threads:</div><div class=""><br class=""></div><div class=""><pre class="">
------------------------------------------------------------------------------
|    Throughput     |                Response Time                 |         |
|   (ops/second)    |                (milliseconds)                |         |
|   recent  average |   recent  average    99.9%   99.99%  99.999% | err/sec |
------------------------------------------------------------------------------
...
|   1445.4   1645.3 |   81.375   72.623  3170.89  4798.28  8925.48 |     0.0 |
|   1442.2   1625.0 |   81.047   73.371  3154.12  4798.28  6106.91 |     0.0 |
</pre></div><div class=""><br class=""></div><div 
class="">The outliers (99.9% and above) with virtual threads are much higher, roughly 7x to 12x those of platform threads. Could this be due to potential starvation when rescheduling virtual threads in the fork join pool?</div><div class=""><br class=""></div><div class="">Cheers,</div><div class="">Matt</div><div class=""><br class=""></div></div>
</div></blockquote></div><br class=""></div></body></html>