<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class="">Using VT pollers is about the same performance and still 100% CPU utilization. Using VT pollers with reduced parallelism is even slower, but it improved the worst case performance, and lowered the variance considerably:<div class=""><br class=""></div><div class="">VT pollers:</div><div class=""><br class=""></div><div class=""><div style="margin: 0px; font-stretch: normal; line-height: normal; font-family: Menlo; color: rgb(0, 0, 0); font-size: 12px;" class=""><div style="margin: 0px; font-stretch: normal; line-height: normal; font-size: 11px;" class=""><span style="font-variant-ligatures: no-common-ligatures" class="">robertengels@macmini go-wrk % ./go-wrk -c 1000 -d 30 -T 10000 <a href="http://imac:8080/plaintext" class="">http://imac:8080/plaintext</a></span></div><div style="margin: 0px; font-stretch: normal; line-height: normal; font-size: 11px;" class=""><span style="font-variant-ligatures: no-common-ligatures" class="">Running 30s test @ <a href="http://imac:8080/plaintext" class="">http://imac:8080/plaintext</a></span></div><div style="margin: 0px; font-stretch: normal; line-height: normal; font-size: 11px;" class=""><span style="font-variant-ligatures: no-common-ligatures" class="">  1000 goroutine(s) running concurrently</span></div><div style="margin: 0px; font-stretch: normal; line-height: normal; font-size: 11px;" class=""><span style="font-variant-ligatures: no-common-ligatures" class="">3545553 requests in 29.463345004s, 422.66MB read</span></div><div style="margin: 0px; font-stretch: normal; line-height: normal; font-size: 11px;" class=""><span style="font-variant-ligatures: no-common-ligatures" class="">Requests/sec:<span class="Apple-tab-span" style="white-space:pre">           </span>120337.76</span></div><div style="margin: 0px; font-stretch: normal; 
line-height: normal; font-size: 11px;" class=""><span style="font-variant-ligatures: no-common-ligatures" class="">Transfer/sec:<span class="Apple-tab-span" style="white-space:pre">               </span>14.35MB</span></div><div style="margin: 0px; font-stretch: normal; line-height: normal; font-size: 11px;" class=""><span style="font-variant-ligatures: no-common-ligatures" class="">Overall Requests/sec:<span class="Apple-tab-span" style="white-space:pre"> </span>115846.04</span></div><div style="margin: 0px; font-stretch: normal; line-height: normal; font-size: 11px;" class=""><span style="font-variant-ligatures: no-common-ligatures" class="">Overall Transfer/sec:<span class="Apple-tab-span" style="white-space:pre">       </span>13.81MB</span></div><div style="margin: 0px; font-stretch: normal; line-height: normal; font-size: 11px;" class=""><span style="font-variant-ligatures: no-common-ligatures" class="">Fastest Request:<span class="Apple-tab-span" style="white-space:pre">      </span>144µs</span></div><div style="margin: 0px; font-stretch: normal; line-height: normal; font-size: 11px;" class=""><span style="font-variant-ligatures: no-common-ligatures" class="">Avg Req Time:<span class="Apple-tab-span" style="white-space:pre">          </span>8.309ms</span></div><div style="margin: 0px; font-stretch: normal; line-height: normal; font-size: 11px;" class=""><span style="font-variant-ligatures: no-common-ligatures" class="">Slowest Request:<span class="Apple-tab-span" style="white-space:pre">      </span>1.418687s</span></div><div style="margin: 0px; font-stretch: normal; line-height: normal; font-size: 11px;" class=""><span style="font-variant-ligatures: no-common-ligatures" class="">Number of Errors:<span class="Apple-tab-span" style="white-space:pre">   </span>2200</span></div><div style="margin: 0px; font-stretch: normal; line-height: normal; font-size: 11px;" class=""><span style="font-variant-ligatures: no-common-ligatures" class="">Error Counts:<span 
class="Apple-tab-span" style="white-space:pre">            </span>connection reset by peer=2174,net/http: timeout awaiting response headers=26</span></div><div style="margin: 0px; font-stretch: normal; line-height: normal; font-size: 11px;" class=""><span style="font-variant-ligatures: no-common-ligatures" class="">10%:<span class="Apple-tab-span" style="white-space:pre">                     </span>2.33ms</span></div><div style="margin: 0px; font-stretch: normal; line-height: normal; font-size: 11px;" class=""><span style="font-variant-ligatures: no-common-ligatures" class="">50%:<span class="Apple-tab-span" style="white-space:pre">                   </span>2.869ms</span></div><div style="margin: 0px; font-stretch: normal; line-height: normal; font-size: 11px;" class=""><span style="font-variant-ligatures: no-common-ligatures" class="">75%:<span class="Apple-tab-span" style="white-space:pre">                  </span>3.004ms</span></div><div style="margin: 0px; font-stretch: normal; line-height: normal; font-size: 11px;" class=""><span style="font-variant-ligatures: no-common-ligatures" class="">99%:<span class="Apple-tab-span" style="white-space:pre">                  </span>3.093ms</span></div><div style="margin: 0px; font-stretch: normal; line-height: normal; font-size: 11px;" class=""><span style="font-variant-ligatures: no-common-ligatures" class="">99.9%:<span class="Apple-tab-span" style="white-space:pre">                        </span>3.096ms</span></div><div style="margin: 0px; font-stretch: normal; line-height: normal; font-size: 11px;" class=""><span style="font-variant-ligatures: no-common-ligatures" class="">99.9999%:<span class="Apple-tab-span" style="white-space:pre">             </span>3.097ms</span></div><div style="margin: 0px; font-stretch: normal; line-height: normal; font-size: 11px;" class=""><span style="font-variant-ligatures: no-common-ligatures" class="">99.99999%:<span class="Apple-tab-span" style="white-space:pre">            
</span>3.097ms</span></div><div style="margin: 0px; font-stretch: normal; line-height: normal; font-size: 11px;" class=""><span style="font-variant-ligatures: no-common-ligatures" class="">stddev:<span class="Apple-tab-span" style="white-space:pre">                       </span>34.259ms</span></div></div></div><div class=""><br class=""><div>VT pollers with parallelism=3:</div><div><br class=""></div><div><div style="margin: 0px; font-stretch: normal; line-height: normal; font-family: Menlo; color: rgb(0, 0, 0); font-size: 11px;" class=""><span style="font-variant-ligatures: no-common-ligatures" class="">robertengels@macmini go-wrk % ./go-wrk -c 1000 -d 30 -T 10000 <a href="http://imac:8080/plaintext" class="">http://imac:8080/plaintext</a></span></div><div style="margin: 0px; font-stretch: normal; line-height: normal; font-family: Menlo; color: rgb(0, 0, 0); font-size: 11px;" class=""><span style="font-variant-ligatures: no-common-ligatures" class="">Running 30s test @ <a href="http://imac:8080/plaintext" class="">http://imac:8080/plaintext</a></span></div><div style="margin: 0px; font-stretch: normal; line-height: normal; font-family: Menlo; color: rgb(0, 0, 0); font-size: 11px;" class=""><span style="font-variant-ligatures: no-common-ligatures" class="">  1000 goroutine(s) running concurrently</span></div><div style="margin: 0px; font-stretch: normal; line-height: normal; font-family: Menlo; color: rgb(0, 0, 0); font-size: 11px;" class=""><span style="font-variant-ligatures: no-common-ligatures" class="">3351695 requests in 29.724461752s, 399.55MB read</span></div><div style="margin: 0px; font-stretch: normal; line-height: normal; font-family: Menlo; color: rgb(0, 0, 0); font-size: 11px;" class=""><span style="font-variant-ligatures: no-common-ligatures" class="">Requests/sec:<span class="Apple-tab-span" style="white-space:pre">          </span>112758.81</span></div><div style="margin: 0px; font-stretch: normal; line-height: normal; font-family: Menlo; color: 
rgb(0, 0, 0); font-size: 11px;" class=""><span style="font-variant-ligatures: no-common-ligatures" class="">Transfer/sec:<span class="Apple-tab-span" style="white-space:pre">              </span>13.44MB</span></div><div style="margin: 0px; font-stretch: normal; line-height: normal; font-family: Menlo; color: rgb(0, 0, 0); font-size: 11px;" class=""><span style="font-variant-ligatures: no-common-ligatures" class="">Overall Requests/sec:<span class="Apple-tab-span" style="white-space:pre">        </span>110119.93</span></div><div style="margin: 0px; font-stretch: normal; line-height: normal; font-family: Menlo; color: rgb(0, 0, 0); font-size: 11px;" class=""><span style="font-variant-ligatures: no-common-ligatures" class="">Overall Transfer/sec:<span class="Apple-tab-span" style="white-space:pre">      </span>13.13MB</span></div><div style="margin: 0px; font-stretch: normal; line-height: normal; font-family: Menlo; color: rgb(0, 0, 0); font-size: 11px;" class=""><span style="font-variant-ligatures: no-common-ligatures" class="">Fastest Request:<span class="Apple-tab-span" style="white-space:pre">     </span>145µs</span></div><div style="margin: 0px; font-stretch: normal; line-height: normal; font-family: Menlo; color: rgb(0, 0, 0); font-size: 11px;" class=""><span style="font-variant-ligatures: no-common-ligatures" class="">Avg Req Time:<span class="Apple-tab-span" style="white-space:pre">         </span>8.868ms</span></div><div style="margin: 0px; font-stretch: normal; line-height: normal; font-family: Menlo; color: rgb(0, 0, 0); font-size: 11px;" class=""><span style="font-variant-ligatures: no-common-ligatures" class="">Slowest Request:<span class="Apple-tab-span" style="white-space:pre">     </span>411.775ms</span></div><div style="margin: 0px; font-stretch: normal; line-height: normal; font-family: Menlo; color: rgb(0, 0, 0); font-size: 11px;" class=""><span style="font-variant-ligatures: no-common-ligatures" class="">Number of Errors:<span 
class="Apple-tab-span" style="white-space:pre">  </span>2308</span></div><div style="margin: 0px; font-stretch: normal; line-height: normal; font-family: Menlo; color: rgb(0, 0, 0); font-size: 11px;" class=""><span style="font-variant-ligatures: no-common-ligatures" class="">Error Counts:<span class="Apple-tab-span" style="white-space:pre">           </span>connection reset by peer=2308</span></div><div style="margin: 0px; font-stretch: normal; line-height: normal; font-family: Menlo; color: rgb(0, 0, 0); font-size: 11px;" class=""><span style="font-variant-ligatures: no-common-ligatures" class="">10%:<span class="Apple-tab-span" style="white-space:pre">                   </span>3.429ms</span></div><div style="margin: 0px; font-stretch: normal; line-height: normal; font-family: Menlo; color: rgb(0, 0, 0); font-size: 11px;" class=""><span style="font-variant-ligatures: no-common-ligatures" class="">50%:<span class="Apple-tab-span" style="white-space:pre">                 </span>3.867ms</span></div><div style="margin: 0px; font-stretch: normal; line-height: normal; font-family: Menlo; color: rgb(0, 0, 0); font-size: 11px;" class=""><span style="font-variant-ligatures: no-common-ligatures" class="">75%:<span class="Apple-tab-span" style="white-space:pre">                 </span>3.975ms</span></div><div style="margin: 0px; font-stretch: normal; line-height: normal; font-family: Menlo; color: rgb(0, 0, 0); font-size: 11px;" class=""><span style="font-variant-ligatures: no-common-ligatures" class="">99%:<span class="Apple-tab-span" style="white-space:pre">                 </span>4.076ms</span></div><div style="margin: 0px; font-stretch: normal; line-height: normal; font-family: Menlo; color: rgb(0, 0, 0); font-size: 11px;" class=""><span style="font-variant-ligatures: no-common-ligatures" class="">99.9%:<span class="Apple-tab-span" style="white-space:pre">                       </span>4.08ms</span></div><div style="margin: 0px; font-stretch: normal; line-height: normal; 
font-family: Menlo; color: rgb(0, 0, 0); font-size: 11px;" class=""><span style="font-variant-ligatures: no-common-ligatures" class="">99.9999%:<span class="Apple-tab-span" style="white-space:pre">             </span>4.081ms</span></div><div style="margin: 0px; font-stretch: normal; line-height: normal; font-family: Menlo; color: rgb(0, 0, 0); font-size: 11px;" class=""><span style="font-variant-ligatures: no-common-ligatures" class="">99.99999%:<span class="Apple-tab-span" style="white-space:pre">           </span>4.081ms</span></div><div style="margin: 0px; font-stretch: normal; line-height: normal; font-family: Menlo; color: rgb(0, 0, 0); font-size: 11px;" class=""><span style="font-variant-ligatures: no-common-ligatures" class="">stddev:<span class="Apple-tab-span" style="white-space:pre">                      </span>5.346ms</span></div></div><div><br class=""><blockquote type="cite" class=""><div class="">On Aug 12, 2024, at 2:34 PM, Robert Engels <<a href="mailto:robaho@icloud.com" class="">robaho@icloud.com</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><div class="">I tested previously using VT pollers and saw no difference in performance - I expect it would be worse - as it would take even longer for the poller to run as it would be behind all of the other VT tasks. 
I think reducing the parallelism with VT pollers would be even worse still.<br class=""><br class=""><blockquote type="cite" class="">On Aug 12, 2024, at 2:32 PM, Robert Engels <<a href="mailto:robaho@icloud.com" class="">robaho@icloud.com</a>> wrote:<br class=""><br class="">Also, the JDK poller behavior may be specific to my platform, as I am on OSX, but when I look at the default poller mode code, it is system threads: <a href="https://github.com/openjdk/loom/blob/cfcc05f57f0b5f212493da3a2abbac6abed2a48e/src/java.base/share/classes/sun/nio/ch/PollerProvider.java#L48" class="">https://github.com/openjdk/loom/blob/cfcc05f57f0b5f212493da3a2abbac6abed2a48e/src/java.base/share/classes/sun/nio/ch/PollerProvider.java#L48</a><br class=""><br class="">I think for Linux it is changed to be virtual threads: <a href="https://github.com/openjdk/loom/blob/cfcc05f57f0b5f212493da3a2abbac6abed2a48e/src/java.base/linux/classes/sun/nio/ch/DefaultPollerProvider.java#L38" class="">https://github.com/openjdk/loom/blob/cfcc05f57f0b5f212493da3a2abbac6abed2a48e/src/java.base/linux/classes/sun/nio/ch/DefaultPollerProvider.java#L38</a><br class=""><br class=""><blockquote type="cite" class="">On Aug 12, 2024, at 2:29 PM, Robert Engels <<a href="mailto:robaho@icloud.com" class="">robaho@icloud.com</a>> wrote:<br class=""><br class="">Sorry, I should have included that. I am using build 24-loom+3-33, which is still using pollers.<br class=""><br class="">As to sizing the pool - it would need to be based on the current connection count and the connections' activity (and the configured number of pollers, etc.) - which I think for many (most?) 
systems would be hard to predict?<br class=""><br class="">This is why I contend that it would be better to lower the native priority of the carrier threads - I think it solves the sizing problem nicely (pardon the pun).<br class=""><br class="">If the available CPU is being exhausted by platform threads, then most likely the virtual threads shouldn’t run at all (especially since they are not time-sliced) - as the system is already in an overloaded state - and this would accomplish that.<br class=""><br class="">In this particular case it is a very high load, so unless I am misunderstanding you, I don’t think the scheduler is prioritizing correctly - since lowering the parallelism improves the situation. <br class=""><br class=""><blockquote type="cite" class="">On Aug 12, 2024, at 2:17 PM, Ron Pressler <<a href="mailto:ron.pressler@oracle.com" class="">ron.pressler@oracle.com</a>> wrote:<br class=""><br class="">What JDK are you using? I believe that as of JDK 22 there are no longer poller threads (polling is done by virtual threads running under the same scheduler). If you haven’t tried with JDK 22, try it; you may get better results.<br class=""><br class="">There is an inherent tension between scheduling under high loads and scheduling under lower loads. The problem is that to make a scheduler highly efficient at high loads you need to minimise the coordination among threads, which means that under-utilisation cannot be easily detected (it’s easy for a thread to unilaterally detect it’s under heavy load when its submission queue grows, but detecting that fewer threads are needed requires coordination).<br class=""><br class="">The virtual thread scheduler is a work-stealing scheduler that prioritises higher loads over lower loads. In the future we may allow plugging in schedulers that are more adaptive to changing loads in exchange for being less efficient under high loads. Note that sizing the virtual thread scheduler is no harder than sizing a thread pool. 
The difference is that people are more used to more adaptive but less efficient thread pools.<br class=""><br class="">— Ron<br class=""><br class=""><br class=""><blockquote type="cite" class="">On 12 Aug 2024, at 20:03, robert engels <<a href="mailto:robaho@icloud.com" class="">robaho@icloud.com</a>> wrote:<br class=""><br class="">Hi.  <br class=""><br class="">I believe I have discovered what is essentially a priority inversion problem in the Loom network stack.<br class=""><br class="">I have been comparing a NIO-based HTTP server framework with one using VT.<br class=""><br class="">The VT framework, when using the Loom defaults, consistently uses significantly more CPU and achieves less throughput and overall performance.<br class=""><br class="">By lowering the parallelism, the VT framework exceeds the NIO framework in throughput and overall performance.<br class=""><br class="">My hypothesis, based on my research, is that if you create as many carrier threads as CPUs, then they compete with the network poller threads. By reducing the number of carrier threads, the poller can run. If the poller can’t run, then many/all of the carrier threads will eventually park waiting for a runnable task, but there won’t be one until the poller runs - which is why the poller must have priority over the carrier threads.<br class=""><br class="">I believe the best solution would be to lower the native priority of carrier threads as explicitly configuring the number of carrier threads accurately will be nearly impossible for varied workloads. 
I suspect this would also help GC be more deterministic as well.<br class=""><br class="">Using default parallelism:<br class=""><br class="">robertengels@macmini go-wrk % ./go-wrk -c 1000 -d 30 -T 10000 <a href="http://imac:8080/plaintext" class="">http://imac:8080/plaintext</a><br class="">Running 30s test @ <a href="http://imac:8080/plaintext" class="">http://imac:8080/plaintext</a><br class="">1000 goroutine(s) running concurrently<br class="">3557027 requests in 29.525311463s, 424.03MB read<br class="">Requests/sec: 120473.82<br class="">Transfer/sec: 14.36MB<br class="">Overall Requests/sec: 116021.08<br class="">Overall Transfer/sec: 13.83MB<br class="">Fastest Request: 112µs<br class="">Avg Req Time: 8.3ms<br class="">Slowest Request: 839.967ms<br class="">Number of Errors: 2465<br class="">Error Counts: broken pipe=2,connection reset by peer=2451,net/http: timeout awaiting response headers=12<br class="">10%: 2.102ms<br class="">50%: 2.958ms<br class="">75%: 3.108ms<br class="">99%: 3.198ms<br class="">99.9%: 3.201ms<br class="">99.9999%: 3.201ms<br class="">99.99999%: 3.201ms<br class="">stddev: 32.52ms<br class=""><br class="">and using reduced parallelism of 3:<br class=""><br class="">robertengels@macmini go-wrk % ./go-wrk -c 1000 -d 30 -T 10000 <a href="http://imac:8080/plaintext" class="">http://imac:8080/plaintext</a><br class="">Running 30s test @ <a href="http://imac:8080/plaintext" class="">http://imac:8080/plaintext</a><br class="">1000 goroutine(s) running concurrently<br class="">4059418 requests in 29.092649689s, 483.92MB read<br class="">Requests/sec: 139534.14<br class="">Transfer/sec: 16.63MB<br class="">Overall Requests/sec: 132608.44<br class="">Overall Transfer/sec: 15.81MB<br class="">Fastest Request: 115µs<br class="">Avg Req Time: 7.166ms<br class="">Slowest Request: 811.999ms<br class="">Number of Errors: 2361<br class="">Error Counts: net/http: timeout awaiting response headers=51,connection reset by peer=2310<br class="">10%: 
1.899ms<br class="">50%: 2.383ms<br class="">75%: 2.478ms<br class="">99%: 2.541ms<br class="">99.9%: 2.543ms<br class="">99.9999%: 2.544ms<br class="">99.99999%: 2.544ms<br class="">stddev: 32.88ms<br class=""><br class="">More importantly, the reduced parallelism has a CPU idle percentage of 30% (which matches the NIO framework) whereas the default parallelism has an idle percentage of near 0% (due to scheduler thrashing).<br class=""><br class="">The attached JFR screenshot (I have also attached the JFR captures) tells the story. #2 is the VT with default parallelism, #3 is the NIO-based framework, and #4 is VT with reduced parallelism. #2 clearly shows the thrashing that is occurring with threads parking and unparking and the scheduler waiting for work.<br class=""><br class="">&lt;PastedGraphic-1.png&gt; <br class=""><br class=""><br class="">&lt;profile2.jfr&gt;&lt;profile3.jfr&gt;&lt;profile4.jfr&gt;<br class=""></blockquote><br class=""></blockquote><br class=""></blockquote><br class=""></blockquote><br class=""></div></div></blockquote></div><br class=""></div></body></html>