Loom and high performance networking

robert engels robaho at icloud.com
Mon Aug 12 19:03:53 UTC 2024


Hi. 

I believe I have discovered what is essentially a priority inversion problem in the Loom network stack.

I have been comparing an NIO-based HTTP server framework with one based on virtual threads (VT).

With the Loom defaults, the VT framework consistently uses significantly more CPU while achieving lower throughput and worse overall performance.

By lowering the scheduler parallelism, the VT framework exceeds the NIO framework in throughput and overall performance.

My hypothesis, based on this testing, is that when there are as many carrier threads as CPUs, they compete with the network poller threads for CPU time. Reducing the number of carrier threads leaves room for the poller to run. If the poller can't run, many or all of the carrier threads eventually park waiting for a runnable task, but no task becomes runnable until the poller runs - which is why the poller must effectively have priority over the carrier threads.
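
To make that dependency concrete, here is a minimal sketch (not the framework under test, just an illustration) of the virtual-thread-per-connection style being benchmarked: plain blocking socket I/O on virtual threads. A handler that blocks in a read is only rescheduled after the JDK's internal I/O poller sees the socket become readable, so if the carrier threads saturate the CPUs the parked handlers starve:

    import java.io.*;
    import java.net.*;
    import java.nio.charset.StandardCharsets;
    import java.util.concurrent.*;

    // Minimal sketch: one virtual thread per connection, blocking I/O.
    public class VtEchoServer {
        public static void main(String[] args) throws IOException {
            try (ServerSocket server = new ServerSocket(8080);
                 ExecutorService exec = Executors.newVirtualThreadPerTaskExecutor()) {
                while (true) {
                    Socket socket = server.accept();
                    exec.submit(() -> handle(socket));
                }
            }
        }

        static void handle(Socket socket) {
            try (socket;
                 BufferedReader in = new BufferedReader(new InputStreamReader(
                         socket.getInputStream(), StandardCharsets.US_ASCII));
                 OutputStream out = socket.getOutputStream()) {
                // Blocking reads: the virtual thread parks here whenever no bytes
                // are available, and is only unparked after the I/O poller reports
                // the socket readable.
                String line;
                while ((line = in.readLine()) != null && !line.isEmpty()) {
                    // skip request headers
                }
                byte[] body = "Hello, World!".getBytes(StandardCharsets.US_ASCII);
                out.write(("HTTP/1.1 200 OK\r\nContent-Type: text/plain\r\nContent-Length: "
                        + body.length + "\r\n\r\n").getBytes(StandardCharsets.US_ASCII));
                out.write(body);
                out.flush();
            } catch (IOException ignored) {
            }
        }
    }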

I believe the best solution would be to lower the native priority of the carrier threads, since explicitly configuring an accurate carrier-thread count is nearly impossible for varied workloads. I suspect this would also make GC more deterministic.
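
(For reference, the "reduced parallelism" numbers below were produced by overriding the default scheduler's parallelism at startup, e.g. via the jdk.virtualThreadScheduler.parallelism system property:

    java -Djdk.virtualThreadScheduler.parallelism=3 -jar vt-server.jar

where vt-server.jar is just a placeholder for the VT framework under test.)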

Using default parallelism:

robertengels at macmini go-wrk % ./go-wrk -c 1000 -d 30 -T 10000 http://imac:8080/plaintext
Running 30s test @ http://imac:8080/plaintext
  1000 goroutine(s) running concurrently
3557027 requests in 29.525311463s, 424.03MB read
Requests/sec:		120473.82
Transfer/sec:		14.36MB
Overall Requests/sec:	116021.08
Overall Transfer/sec:	13.83MB
Fastest Request:	112µs
Avg Req Time:		8.3ms
Slowest Request:	839.967ms
Number of Errors:	2465
Error Counts:		broken pipe=2,connection reset by peer=2451,net/http: timeout awaiting response headers=12
10%:			2.102ms
50%:			2.958ms
75%:			3.108ms
99%:			3.198ms
99.9%:			3.201ms
99.9999%:		3.201ms
99.99999%:		3.201ms
stddev:			32.52ms

and using a reduced parallelism of 3:

robertengels at macmini go-wrk % ./go-wrk -c 1000 -d 30 -T 10000 http://imac:8080/plaintext
Running 30s test @ http://imac:8080/plaintext
  1000 goroutine(s) running concurrently
4059418 requests in 29.092649689s, 483.92MB read
Requests/sec:		139534.14
Transfer/sec:		16.63MB
Overall Requests/sec:	132608.44
Overall Transfer/sec:	15.81MB
Fastest Request:	115µs
Avg Req Time:		7.166ms
Slowest Request:	811.999ms
Number of Errors:	2361
Error Counts:		net/http: timeout awaiting response headers=51,connection reset by peer=2310
10%:			1.899ms
50%:			2.383ms
75%:			2.478ms
99%:			2.541ms
99.9%:			2.543ms
99.9999%:		2.544ms
99.99999%:		2.544ms
stddev:			32.88ms

More importantly, the reduced-parallelism run has a CPU idle percentage of 30% (matching the NIO framework), whereas the default-parallelism run is near 0% idle (due to scheduler thrashing).

The attached JFR screenshot (I have also attached the JFR captures) tells the story. #2 is VT with default parallelism, #3 is the NIO-based framework, and #4 is VT with reduced parallelism. #2 clearly shows the thrashing: threads parking and unparking while the scheduler waits for work.
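
If anyone wants to poke at the captures programmatically rather than in JMC, a small sketch like the following (assuming the attached file names, and that thread-park events were enabled above their threshold in the recording) counts the park events that make up that churn:

    import java.nio.file.Path;
    import jdk.jfr.consumer.RecordedEvent;
    import jdk.jfr.consumer.RecordingFile;

    // Sketch: count jdk.ThreadPark events in one of the attached captures.
    public class CountParks {
        public static void main(String[] args) throws Exception {
            long parks = 0;
            for (RecordedEvent e : RecordingFile.readAllEvents(Path.of("profile2.jfr"))) {
                if (e.getEventType().getName().equals("jdk.ThreadPark")) {
                    parks++;
                }
            }
            System.out.println("jdk.ThreadPark events: " + parks);
        }
    }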






Attachments:
  PastedGraphic-1.png (image/png, 1006974 bytes):
    <https://mail.openjdk.org/pipermail/loom-dev/attachments/20240812/3ed2f917/PastedGraphic-1-0001.png>
  profile2.jfr (application/octet-stream, 586573 bytes):
    <https://mail.openjdk.org/pipermail/loom-dev/attachments/20240812/3ed2f917/profile2-0001.jfr>
  profile3.jfr (application/octet-stream, 451624 bytes):
    <https://mail.openjdk.org/pipermail/loom-dev/attachments/20240812/3ed2f917/profile3-0001.jfr>
  profile4.jfr (application/octet-stream, 466975 bytes):
    <https://mail.openjdk.org/pipermail/loom-dev/attachments/20240812/3ed2f917/profile4-0001.jfr>

