Loom and high performance networking
Robert Engels
robaho at icloud.com
Tue Aug 13 11:25:20 UTC 2024
I noticed the errors too, but I have tested with two different client implementations and two different server implementations, and all of them report similar error rates (about 0.25%). I believe it is the OSX networking layer closing connections due to lack of buffer space, or because it detects what it thinks is a DoS attack.
Anyway, with a different client (no HTTP pipelining) I get similar results:
With default parallelism:
robertengels at macmini go-wrk % wrk -H 'Host: imac' -H 'Accept: text/plain,text/html;q=0.9,application/xhtml+xml;q=0.9,application/xml;q=0.8,*/*;q=0.7' -H 'Connection: keep-alive' --latency -d 20 -c 1000 --timeout 8 -t 8 http://imac:8080/plaintext
Running 20s test @ http://imac:8080/plaintext
  8 threads and 1000 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    60.53ms  137.57ms   2.41s    88.69%
    Req/Sec    14.71k     2.13k   22.55k    77.25%
  Latency Distribution
     50%    4.98ms
     75%   34.80ms
     90%  229.44ms
     99%  658.57ms
  2342553 requests in 20.02s, 317.23MB read
  Socket errors: connect 0, read 1926, write 4, timeout 0
Requests/sec: 116990.94
Transfer/sec:     15.84MB
and with parallelism=3:
robertengels at macmini go-wrk % wrk -H 'Host: imac' -H 'Accept: text/plain,text/html;q=0.9,application/xhtml+xml;q=0.9,application/xml;q=0.8,*/*;q=0.7' -H 'Connection: keep-alive' --latency -d 20 -c 1000 --timeout 8 -t 8 http://imac:8080/plaintext
Running 20s test @ http://imac:8080/plaintext
  8 threads and 1000 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    39.91ms   90.53ms 822.66ms   89.29%
    Req/Sec    14.99k     2.84k   23.83k    75.06%
  Latency Distribution
     50%    6.62ms
     75%    9.54ms
     90%  148.83ms
     99%  392.24ms
  2386247 requests in 20.02s, 323.15MB read
  Socket errors: connect 0, read 2833, write 5, timeout 0
Requests/sec: 119201.21
Transfer/sec:     16.14MB
And, similarly, the default parallelism shows much higher CPU usage: 8% idle vs. 25% idle for the parallelism=3 run.
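For context, "parallelism" here means the size of the virtual thread scheduler's carrier pool. A sketch of how it can be set on the server JVM, assuming the documented jdk.virtualThreadScheduler.parallelism system property (the jar name is only a placeholder):

  java -Djdk.virtualThreadScheduler.parallelism=3 -jar server.jar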
Using VT pollers (pollerMode=2) and default parallelism:
robertengels at macmini go-wrk % wrk -H 'Host: imac' -H 'Accept: text/plain,text/html;q=0.9,application/xhtml+xml;q=0.9,application/xml;q=0.8,*/*;q=0.7' -H 'Connection: keep-alive' --latency -d 20 -c 1000 --timeout 8 -t 8 http://imac:8080/plaintext
Running 20s test @ http://imac:8080/plaintext
  8 threads and 1000 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    66.76ms  137.25ms   1.70s    87.06%
    Req/Sec    14.38k     2.36k   21.94k    75.88%
  Latency Distribution
     50%    4.74ms
     75%   51.68ms
     90%  263.08ms
     99%  664.99ms
  2289858 requests in 20.02s, 310.10MB read
  Socket errors: connect 0, read 2135, write 7, timeout 0
Requests/sec: 114360.91
Transfer/sec:     15.49MB
and the same 8% idle as with the default parallelism.
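The VT poller run was launched with flags along the lines of the following, per Alan's note below about also sizing the read pollers to a power of 2 (the exact values here are illustrative):

  java -Djdk.pollerMode=2 -Djdk.readPollers=4 -jar server.jar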
So I am pretty sure my hypothesis is correct. I may try to build Loom myself, or use a library, to lower the priority of the carrier threads. I suspect I will see performance similar to the reduced-parallelism case.
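To make the carrier-priority idea concrete, here is a rough sketch, assuming a plain ForkJoinPool as a stand-in for the default scheduler's carrier pool (the class name is made up, and actually wiring such a pool in as the virtual thread scheduler would require a patched Loom build or reflection into JDK internals):

import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.ForkJoinWorkerThread;

// Sketch only: a ForkJoinPool whose workers (stand-ins for the carrier threads
// of the default virtual-thread scheduler) run at minimum priority. Thread
// priorities are only a hint to the OS scheduler, so the effect is platform
// dependent.
public class LowPriorityCarriers {
    public static void main(String[] args) {
        ForkJoinPool.ForkJoinWorkerThreadFactory factory = pool -> {
            ForkJoinWorkerThread t =
                    ForkJoinPool.defaultForkJoinWorkerThreadFactory.newThread(pool);
            t.setPriority(Thread.MIN_PRIORITY); // de-prioritize relative to kernel/network work
            return t;
        };
        try (ForkJoinPool carriers = new ForkJoinPool(
                Runtime.getRuntime().availableProcessors(), factory, null, true)) {
            carriers.submit(() -> System.out.println(
                    Thread.currentThread() + " priority=" + Thread.currentThread().getPriority()))
                    .join();
        }
    }
}

The idea is that the kernel's networking work would then win contention against the carriers, which is what reducing the parallelism appears to achieve indirectly.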
> On Aug 13, 2024, at 2:25 AM, Alan Bateman <alan.bateman at oracle.com> wrote:
>
>
>
> On 12/08/2024 20:29, Robert Engels wrote:
>> Sorry, I should have included that. I am using (build 24-loom+3-33) - which is still using Pollers.
>>
> That's based on jdk-24+8 so up to date.
>
> There are two "poller modes". On macOS it uses the "system thread" poller mode by default but you can experiment with -Djdk.pollerMode=2 to have the poller threads be virtual threads, same as the default on Linux. To do this will also mean setting -Djdk.readPollers=4 or some power of 2 value. Our testing on macOS some time ago didn't converge on the best default.
>
> There seem to be a lot of errors (broken pipe, connection reset, and other timeouts) in your results. Do you have some TCP/networking tuning to remove these from the runs? I assume the timeouts will skew some of the results.
>
> -Alan