Remove corePoolSize from ForkJoinPool<init>, and a hope to improve FJP's scheduling algorithm
唐佳未(佳未)
tjw378335 at alibaba-inc.com
Tue Feb 18 09:18:10 UTC 2025
Previously, I intended to propose adding more parameters for configuring virtual thread scheduling <https://bugs.openjdk.org/browse/JDK-8349763>. However, after reading the code carefully, I realized that the corePoolSize parameter is not actually used during FJP (ForkJoinPool) initialization. I think it would be best either to remove this parameter or at least to update its description in the documentation to reduce confusion for readers.
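For reference, here is a minimal sketch of the ten-argument constructor in question; the parallelism, pool sizes, and keep-alive values are arbitrary and chosen only for illustration:

    import java.util.concurrent.ForkJoinPool;
    import java.util.concurrent.TimeUnit;

    public class CorePoolSizeDemo {
        public static void main(String[] args) {
            // The corePoolSize argument (8 here, deliberately different from
            // the parallelism of 4) is accepted but has no effect on the
            // pool's behavior in current JDKs, which is the source of the
            // confusion.
            ForkJoinPool pool = new ForkJoinPool(
                    4,                                               // parallelism
                    ForkJoinPool.defaultForkJoinWorkerThreadFactory, // factory
                    null,                                            // handler
                    false,                                           // asyncMode
                    8,                                               // corePoolSize (unused)
                    256,                                             // maximumPoolSize
                    1,                                               // minimumRunnable
                    null,                                            // saturate
                    60, TimeUnit.SECONDS);                           // keepAliveTime
            pool.shutdown();
        }
    }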
In addition, I conducted some experiments and found that creating threads in the FJP does not actually cause significant fluctuations (roughly 1 ms). The phenomenon I encountered was instead caused by unfair scheduling of the unpark operation. When a virtual thread is suspended and more tasks arrive, more tasks may be waiting to be awakened, and the virtual thread that yielded first is not necessarily given priority when carrier threads become available. This increases the latency of a request, especially under high CPU utilization, and I hope this part can be improved. I provide some data below; the tasks simulate a yield scenario via sleep(100), and the whole execution takes about 102 ms on an ordinary platform thread. A minimal sketch of the measurement follows, and then the data.
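The sketch below is not the exact benchmark: the task count is an assumption, and for the "high CPU" runs additional CPU-bound load was present on the machine.

    import java.time.Duration;
    import java.util.ArrayList;
    import java.util.List;

    public class VtLatency {
        static final int TASKS = 300; // assumed task count

        public static void main(String[] args) throws InterruptedException {
            List<Thread> threads = new ArrayList<>();
            long[] nanos = new long[TASKS];
            for (int i = 0; i < TASKS; i++) {
                final int id = i;
                threads.add(Thread.ofVirtual().start(() -> {
                    long start = System.nanoTime();
                    try {
                        // Unmount for 100 ms, then wait to be unparked and
                        // rescheduled onto a carrier thread.
                        Thread.sleep(Duration.ofMillis(100));
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                    nanos[id] = System.nanoTime() - start; // ~102 ms on an idle machine
                }));
            }
            for (Thread t : threads) t.join();
            for (int i = 0; i < TASKS; i++)
                System.out.printf("task %d time(ms): %.3f%n", i, nanos[i] / 1e6);
        }
    }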
high CPU, run 1:
[Thread-name taskid:calc-result time(ms): used-time]
VirtualThread[#25]/runnable at ForkJoinPool-1-worker-4 1:495000 time(ms): 177.633408
VirtualThread[#26]/runnable at ForkJoinPool-1-worker-6 2:495000 time(ms): 183.056113
VirtualThread[#27]/runnable at ForkJoinPool-1-worker-6 3:495000 time(ms): 191.993923
vtTimes Statistics:
Mean: 132.19690011333338
P50: 123.7981245
P90: 164.3613794
P99: 170.36093036999995
Max: 191.993923
high CPU, run 2:
VirtualThread[#25]/runnable at ForkJoinPool-1-worker-6 1:495000 time(ms): 185.086663
VirtualThread[#26]/runnable at ForkJoinPool-1-worker-4 2:495000 time(ms): 194.068939
VirtualThread[#27]/runnable at ForkJoinPool-1-worker-4 3:495000 time(ms): 203.031132
vtTimes Statistics:
Mean: 116.19142947333329
P50: 104.2598835
P90: 161.2748723
P99: 185.14847699999993
Max: 203.031132
low CPU:
vtTimes Statistics:
Mean: 119.62772587666666
P50: 120.145596
P90: 123.3764946
P99: 125.27045233
Max: 126.963864
From the data, we can see that the tasks that yielded earliest are sometimes awakened and resumed only at the very end, resulting in long latencies.
Besides this, I wonder whether we could reduce latency under high pressure by increasing the number of threads available to execute tasks. This is a commonly used approach when combining ThreadPoolExecutor with a SynchronousQueue, as sketched below.
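A minimal sketch of that pattern, with illustrative pool sizes:

    import java.util.concurrent.SynchronousQueue;
    import java.util.concurrent.ThreadPoolExecutor;
    import java.util.concurrent.TimeUnit;

    public class DirectHandoffPool {
        public static void main(String[] args) {
            // A SynchronousQueue holds no tasks: a submission that finds no
            // idle worker makes the pool create a new thread (up to
            // maximumPoolSize) instead of queueing, trading extra threads
            // for lower latency under bursts.
            ThreadPoolExecutor pool = new ThreadPoolExecutor(
                    8,                    // corePoolSize
                    256,                  // maximumPoolSize: burst headroom
                    60, TimeUnit.SECONDS, // idle non-core threads are reclaimed
                    new SynchronousQueue<>());
            pool.execute(() -> System.out.println("ran on " + Thread.currentThread()));
            pool.shutdown();
        }
    }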