<html><head><meta http-equiv="content-type" content="text/html; charset=utf-8"></head><body dir="auto"><div dir="ltr"></div><div dir="ltr">To Rob’s point, check out the latest TechEmpower preliminary benchmarks and compare httpserver to httpserver-robaho.</div><div dir="ltr"><br></div><div dir="ltr">A server that is fully async or uses virtual threads (VT) is a better test. </div><div dir="ltr"><br></div><div dir="ltr">I suggest testing with a server based on github.com/robaho/httpserver.</div><div dir="ltr"><br><blockquote type="cite">On Jul 25, 2024, at 5:08 AM, Rob Bygrave &lt;robin.bygrave@gmail.com&gt; wrote:<br><br></blockquote></div><blockquote type="cite"><div dir="ltr"><div dir="ltr"><div><br></div><div>Some thoughts in case they're useful (I did a similar test quite a long time ago).<div><br></div><div>To me, it looks like the VT version will try to submit roughly 10K requests <i>almost</i> concurrently, whereas with platform threads the number of concurrent requests is limited to the number of available processors. So there are far fewer concurrent requests in the platform thread case, and there is a good chance the server is much happier at the lower concurrency.</div><div><br></div><div>First, note that the VT test can look like a <i>denial-of-service attack</i>. As I see it, for this test to be useful the server needs to be able to handle the level of concurrency we want to throw at it. When I ran this test I needed my HTTP server to run on VT in order to handle the high level of concurrent requests (otherwise the HTTP server hits its max concurrency and then effectively queues requests). If you know the max concurrency that the HTTP server can handle, then you can run interesting tests around this max. 
The question for this test is whether <a href="http://192.168.1.17:8080/v1/crawl/delay/">http://192.168.1.17:8080/v1/crawl/delay/</a> is happy with the level of concurrency being thrown at it in the VT case.</div><div><br></div><div>When I ran a similar test (quite some time ago) I also found empirically that I needed to slow down the submission of jobs in the VT case. That is, it wasn't happy submitting 10K requests in a really tight loop, and I believe I actually added some LockSupport.parkNanos() calls inside that tight 10K loop in order to get what I thought was "better" behaviour on the test.</div></div><div><br></div><div>Hopefully that gives you some ideas.</div><div><br></div><div>Cheers, Rob.</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, 25 Jul 2024 at 21:12, David &lt;<a href="mailto:david.vlijmincx@gmail.com">david.vlijmincx@gmail.com</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><div dir="ltr">Hi,<br><br>I'm looking for some insight into some benchmark results [1]. I'm comparing the performance of virtual threads and platform threads by submitting 10_000 tasks that perform 1 or 2 API requests to either a newVirtualThreadPerTaskExecutor or an Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors()). <br><br>I expected the virtual threads to perform better because the tasks don't look CPU-bound (response times around 4ms), but platform threads outperformed the virtual threads when a response returned within 9ms. 
The results using JDK 24-loom+1-17 (2024/6/22) were the following:<br><br><pre>
Benchmark                                   (delay)  (numberOfCalls)  Mode  Cnt  Score    Error  Units
Improv.LoomBenchmark.FixedPlatformPool            0                1  avgt   10  0.808 ±  0.001   s/op
Improv.LoomBenchmark.FixedPlatformPool            0                2  avgt   10  1.614 ±  0.001   s/op
Improv.LoomBenchmark.FixedPlatformPool            1                1  avgt   10  1.239 ±  0.002   s/op
Improv.LoomBenchmark.FixedPlatformPool            1                2  avgt   10  2.473 ±  0.004   s/op
Improv.LoomBenchmark.FixedPlatformPool            2                1  avgt   10  1.461 ±  0.003   s/op
Improv.LoomBenchmark.FixedPlatformPool            2                2  avgt   10  2.918 ±  0.004   s/op
Improv.LoomBenchmark.FixedPlatformPool            3                1  avgt   10  1.730 ±  0.002   s/op
Improv.LoomBenchmark.FixedPlatformPool            3                2  avgt   10  3.461 ±  0.008   s/op
Improv.LoomBenchmark.FixedPlatformPool            4                1  avgt   10  2.017 ±  0.003   s/op
Improv.LoomBenchmark.FixedPlatformPool            4                2  avgt   10  4.032 ±  0.007   s/op
Improv.LoomBenchmark.FixedPlatformPool            5                1  avgt   10  2.317 ±  0.003   s/op
Improv.LoomBenchmark.FixedPlatformPool            5                2  avgt   10  4.633 ±  0.006   s/op

Improv.LoomBenchmark.virtualThreadExecutor        0                1  avgt   10  2.976 ±  0.262   s/op
Improv.LoomBenchmark.virtualThreadExecutor        0                2  avgt   10  2.909 ±  0.093   s/op
Improv.LoomBenchmark.virtualThreadExecutor        1                1  avgt   10  2.853 ±  0.181   s/op
Improv.LoomBenchmark.virtualThreadExecutor        1                2  avgt   10  2.913 ±  0.146   s/op
Improv.LoomBenchmark.virtualThreadExecutor        2                1  avgt   10  2.875 ±  0.254   s/op
Improv.LoomBenchmark.virtualThreadExecutor        2                2  avgt   10  2.876 ±  0.112   s/op
Improv.LoomBenchmark.virtualThreadExecutor        3                1  avgt   10  2.772 ±  0.126   s/op
Improv.LoomBenchmark.virtualThreadExecutor        3                2  avgt   10  2.856 ±  0.196   s/op
Improv.LoomBenchmark.virtualThreadExecutor        4                1  avgt   10  2.731 ±  0.166   s/op
Improv.LoomBenchmark.virtualThreadExecutor        4                2  avgt   10  2.888 ±  0.155   s/op
Improv.LoomBenchmark.virtualThreadExecutor        5                1  avgt   10  2.855 ±  0.213   s/op
Improv.LoomBenchmark.virtualThreadExecutor        5                2  avgt   10  2.906 ±  0.172   s/op
</pre><br>The delay column shows how much delay in milliseconds was added by the endpoint. Without any added delay the response time is about 4ms. I attached two visualizations of the results to this mail.<br><br>I'd be grateful if someone could shed some light on potential reasons behind virtual threads underperforming for these short-lived API calls with response times below 9ms. Are there any best practices for benchmarking virtual threads that I might be missing?<br><br>Kind regards,<br>David<br><br>[1]: <a href="https://github.com/davidtos/VirtualThreadPerformanceShortLivedTasks/blob/master/src/main/java/Improv/LoomBenchmark.java" target="_blank">https://github.com/davidtos/VirtualThreadPerformanceShortLivedTasks/blob/master/src/main/java/Improv/LoomBenchmark.java</a><br></div>

</blockquote></div>
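<div><br></div><div>A minimal sketch of the pacing idea mentioned above (a LockSupport.parkNanos() call inside the submit loop), assuming JDK 21 or later. This is not the code from the original test: the class PacedSubmission, the methods fakeRequest and runPaced, and the pause length are hypothetical, and the HTTP call is replaced by a short park so the example runs without a server.</div>
<pre>
```java
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.LockSupport;

public class PacedSubmission {

    // Hypothetical stand-in for the HTTP request: parks for about 4 ms,
    // roughly the response time quoted in the thread, then counts completion.
    static void fakeRequest(AtomicInteger completed) {
        LockSupport.parkNanos(4_000_000L);
        completed.incrementAndGet();
    }

    // Submits the given number of tasks to a virtual-thread-per-task executor,
    // parking the submitting thread briefly after each submit so the tasks
    // do not all start at almost the same instant.
    static int runPaced(int tasks, long pauseNanos) {
        AtomicInteger completed = new AtomicInteger();
        // close() at the end of try-with-resources waits for submitted tasks.
        try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i != tasks; i++) {
                executor.submit(() -> fakeRequest(completed));
                LockSupport.parkNanos(pauseNanos); // assumed pause, tune per server
            }
        }
        return completed.get();
    }

    public static void main(String[] args) {
        // Prints the number of completed tasks.
        System.out.println(runPaced(1_000, 10_000L));
    }
}
```
</pre>
<div>The pause trades submission speed for a lower peak of in-flight requests. If the server's max concurrency is known, a fixed cap on in-flight requests may be easier to reason about than a time-based pause.</div>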

</div></blockquote></body></html>