Re: Re: remove corePoolSize in ForkJoinPool<init> and hope to improve FJP's algorithm

唐佳未(佳未) tjw378335 at alibaba-inc.com
Thu Feb 20 01:40:23 UTC 2025


------------------------------------------------------------------
From: 唐佳未(佳未) <tjw378335 at alibaba-inc.com>
Sent: Wednesday, February 19, 2025 19:07
To: Alan Bateman <alan.bateman at oracle.com>
Subject: Re: remove corePoolSize in ForkJoinPool<init> and hope to improve FJP's algorithm
I switched to a "24-ea" build and broke the time down into two parts. The "run" part is the time taken to execute the run function, while "wait-run" is the time elapsed between when the task is submitted and when execution of the run function actually begins. I found that during execution, both the tasks that were switched out and the tasks entering the waiting queue may not be scheduled fairly.
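For reference, a minimal sketch of how the two intervals could be measured; the executor setup, task count, and calc() body are illustrative assumptions rather than the exact benchmark code:

    import java.util.concurrent.Executors;

    public class WaitRunTiming {
        public static void main(String[] args) throws Exception {
            try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
                for (int i = 0; i < 1_000; i++) {
                    long submitNanos = System.nanoTime();            // task submitted
                    executor.submit(() -> {
                        long startNanos = System.nanoTime();         // run() actually begins
                        calc();                                      // CPU-bound work
                        long endNanos = System.nanoTime();
                        double waitRunMs = (startNanos - submitNanos) / 1_000_000.0;
                        double runMs = (endNanos - startNanos) / 1_000_000.0;
                        System.out.printf("%s wait-run(ms): %.3f run(ms): %.3f%n",
                                Thread.currentThread(), waitRunMs, runMs);
                    });
                }
            }
        }

        // placeholder for the CPU-bound work measured as "run"
        private static void calc() {
            long sum = 0;
            for (int i = 0; i < 5_000_000; i++) sum += i;
            if (sum == 0) System.out.println(sum);                   // keep the loop from being optimized away
        }
    }
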
Sample output data:
vtTimes(run) Statistics:
Mean: 115.06817072600037
P50: 115.65260599999999
P90: 129.6860885
P99: 138.11789536999999
Max: 143.395163
vtTimes(wait-run) Statistics:
Mean: 10.94150815699996
P50: 10.7257605
P90: 24.259161200000005
P99: 26.810217639999998
Max: 69.963936
vtTimes(wait-run + run) Statistics:
Mean: 126.0096788829999
P50: 126.104803
P90: 139.717534
P99: 154.83003226000002
Max: 193.316744
Some printed information for the "Max" entries:
Max: 143.395163
[Thread-name  taskid run(ms) wait-run(ms) total(ms) ]
VirtualThread[#1382]/runnable at ForkJoinPool-1-worker-1 1345-calc-result:495000 run(ms): 122.404413 wait-run(ms): 4.9543 total(ms): 127.358713
VirtualThread[#1069]/runnable at ForkJoinPool-1-worker-4 1033-calc-result:495000 run(ms): 137.422592 wait-run(ms): 6.725234 total(ms): 144.147826
VirtualThread[#661]/runnable at ForkJoinPool-1-worker-6 625-calc-result:495000 run(ms): 143.395163 wait-run(ms): 24.694745 total(ms): 168.089908
VirtualThread[#1071]/runnable at ForkJoinPool-1-worker-5 1035-calc-result:495000 run(ms): 137.435417 wait-run(ms): 6.710765 total(ms): 144.146182
VirtualThread[#1384]/runnable at ForkJoinPool-1-worker-1 1347-calc-result:495000 run(ms): 120.137852 wait-run(ms): 0.040419 total(ms): 120.178271
Max: 69.963936
Max: 193.316744
[Thread-name  taskid run(ms) wait-run(ms) total(ms) ]
VirtualThread[#1176]/runnable at ForkJoinPool-1-worker-1 1140-calc-result:495000 run(ms): 138.525482 wait-run(ms): 6.27873 total(ms): 144.804212
VirtualThread[#1185]/runnable at ForkJoinPool-1-worker-5 1149-calc-result:495000 run(ms): 123.328816 wait-run(ms): 21.354414 total(ms): 144.68323
VirtualThread[#34]/runnable at ForkJoinPool-1-worker-6 4-calc-result:495000 run(ms): 123.352808 wait-run(ms): 69.963936 total(ms): 193.316744
VirtualThread[#1383]/runnable at ForkJoinPool-1-worker-7 1346-calc-result:495000 run(ms): 122.493635 wait-run(ms): 0.085193 total(ms): 122.578828
VirtualThread[#1387]/runnable at ForkJoinPool-1-worker-1 1350-calc-result:495000 run(ms): 122.274342 wait-run(ms): 0.045768 total(ms): 122.32011
Four outputs of jcmd:
66684:
java.util.concurrent.ForkJoinPool at 678ad349[Running, parallelism = 8, size = 8, active = 7, running = 0, steals = 3931, tasks = 0, submissions = 1426]
Delayed task schedulers:
[0] java.util.concurrent.ScheduledThreadPoolExecutor at 1540e19d[Running, pool size = 1, active threads = 0, queued tasks = 1184, completed tasks = 517]
[1] java.util.concurrent.ScheduledThreadPoolExecutor at 14ae5a5[Running, pool size = 1, active threads = 1, queued tasks = 1640, completed tasks = 991]
66684:
java.util.concurrent.ForkJoinPool at 678ad349[Running, parallelism = 8, size = 8, active = 8, running = 0, steals = 5908, tasks = 0, submissions = 48]
Delayed task schedulers:
[0] java.util.concurrent.ScheduledThreadPoolExecutor at 1540e19d[Running, pool size = 1, active threads = 0, queued tasks = 1143, completed tasks = 556]
[1] java.util.concurrent.ScheduledThreadPoolExecutor at 14ae5a5[Running, pool size = 1, active threads = 1, queued tasks = 1618, completed tasks = 1012]
66684:
java.util.concurrent.ForkJoinPool at 678ad349[Running, parallelism = 8, size = 8, active = 8, running = 0, steals = 7731, tasks = 0, submissions = 236]
Delayed task schedulers:
[0] java.util.concurrent.ScheduledThreadPoolExecutor at 1540e19d[Running, pool size = 1, active threads = 0, queued tasks = 1062, completed tasks = 846]
[1] java.util.concurrent.ScheduledThreadPoolExecutor at 14ae5a5[Running, pool size = 1, active threads = 1, queued tasks = 1094, completed tasks = 1842]
66684:
java.util.concurrent.ForkJoinPool at 678ad349[Running, parallelism = 8, size = 8, active = 2, running = 0, steals = 10370, tasks = 0, submissions = 0]
Delayed task schedulers:
[0] java.util.concurrent.ScheduledThreadPoolExecutor at 1540e19d[Running, pool size = 1, active threads = 0, queued tasks = 235, completed tasks = 1729]
[1] java.util.concurrent.ScheduledThreadPoolExecutor at 14ae5a5[Running, pool size = 1, active threads = 0, queued tasks = 373, completed tasks = 2663]
JDK:
openjdk version "24-ea" 2025-03-18
OpenJDK Runtime Environment (build 24-ea+29-3578)
OpenJDK 64-Bit Server VM (build 24-ea+29-3578, mixed mode, sharing)
------------------------------------------------------------------
From: Alan Bateman <alan.bateman at oracle.com>
Sent: Wednesday, February 19, 2025 00:23
To: "唐佳未(佳未)" <tjw378335 at alibaba-inc.com>; "loom-dev" <loom-dev at openjdk.org>
Subject: Re: remove corePoolSize in ForkJoinPool<init> and hope to improve FJP's algorithm
On 18/02/2025 09:18, 唐佳未(佳未) wrote:
From the data, we can see that the tasks which yielded earliest are only awakened and resumed at the very end, resulting in long latencies.
Besides, I'm wondering whether we could reduce latency under high load by increasing the number of threads available to execute tasks. This is a commonly used approach when combining a ThreadPoolExecutor with a SynchronousQueue.
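As an illustration of that pattern, a minimal sketch (pool sizes and workload are illustrative assumptions): with a SynchronousQueue there is no internal task queue, so a submitted task either hands off to an idle worker or forces a new thread to be created, up to maximumPoolSize:

    import java.util.concurrent.SynchronousQueue;
    import java.util.concurrent.ThreadPoolExecutor;
    import java.util.concurrent.TimeUnit;

    public class GrowingPoolSketch {
        public static void main(String[] args) throws Exception {
            ThreadPoolExecutor pool = new ThreadPoolExecutor(
                    8,                      // corePoolSize
                    64,                     // maximumPoolSize: extra threads under pressure
                    60, TimeUnit.SECONDS,   // idle threads above core retire after 60s
                    new SynchronousQueue<>());

            // A burst of tasks: while all workers are busy, offer() to the
            // SynchronousQueue fails and the pool grows instead of queueing,
            // keeping hand-off latency low.
            for (int i = 0; i < 32; i++) {
                pool.execute(() -> {
                    try { Thread.sleep(100); } catch (InterruptedException e) { }
                });
            }
            System.out.println("pool size after burst: " + pool.getPoolSize()); // typically > 8
            pool.shutdown();
        }
    }
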
 In FJP, each worker thread owns a local queue. A worker thread executes the tasks in its local queue before scanning other queues for work.
There are unowned submission queues that are used for tasks submitted by (mostly) platform threads. If a platform thread unparks a virtual thread (for example, a platform thread in the TPE uses a SQ to rendezvous with a virtual thread in your scenario), then the task for the virtual thread will be pushed to one of these unowned submission queues. The same happens on a timeout: the task to continue the virtual thread will be pushed to an unowned submission queue.
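A minimal sketch of that rendezvous (names are illustrative): a virtual thread parks in SynchronousQueue.take(); when a platform thread calls put(), the virtual thread is unparked, and because the unpark comes from a non-worker thread, its continuation task is pushed to one of these unowned submission queues rather than a worker's local queue:

    import java.util.concurrent.SynchronousQueue;

    public class RendezvousSketch {
        public static void main(String[] args) throws Exception {
            SynchronousQueue<String> queue = new SynchronousQueue<>();

            Thread vthread = Thread.ofVirtual().start(() -> {
                try {
                    String msg = queue.take();   // virtual thread parks here
                    System.out.println(Thread.currentThread() + " resumed with: " + msg);
                } catch (InterruptedException e) { }
            });

            Thread.sleep(100);                   // give the virtual thread time to park
            queue.put("hello");                  // platform (main) thread unparks it
            vthread.join();
        }
    }
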
My reading of your mail is that a virtual thread is calling Thread.sleep and you are measuring the time until it continues. In the "high CPU" case it may be that FJP workers only execute tasks in their local queues, so they don't scan the unowned submission queues very often; is this what you are seeing? With the JDK 24 EA builds it would be useful to execute `jcmd <pid> Thread.vthread_scheduler` a few times to get some stats, as I think this would help the discussion.
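A minimal sketch of that scenario (thread count, sleep time, and workload are illustrative assumptions): many virtual threads sleep briefly while others burn CPU, and each reports how late it actually resumed; running `jcmd <pid> Thread.vthread_scheduler` while this runs shows the scheduler stats:

    import java.time.Duration;
    import java.util.ArrayList;
    import java.util.List;

    public class SleepLatencySketch {
        public static void main(String[] args) throws Exception {
            List<Thread> threads = new ArrayList<>();
            for (int i = 0; i < 200; i++) {
                threads.add(Thread.ofVirtual().start(() -> {
                    long before = System.nanoTime();
                    try {
                        Thread.sleep(Duration.ofMillis(10));
                    } catch (InterruptedException e) { }
                    long lateMs = (System.nanoTime() - before) / 1_000_000 - 10;
                    if (lateMs > 20) {
                        System.out.println(Thread.currentThread() + " resumed " + lateMs + "ms late");
                    }
                    // CPU-bound work keeps all carrier threads busy, so tasks that
                    // continue sleeping threads wait in the submission queues.
                    long sum = 0;
                    for (long j = 0; j < 100_000_000L; j++) sum += j;
                    if (sum == 0) System.out.println(sum);
                }));
            }
            for (Thread t : threads) t.join();
        }
    }
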
 -Alan