remove corePoolSize in ForkJoinPool<init> and hope to improve FJP's algorithm
Ron Pressler
ron.pressler at oracle.com
Mon Feb 24 12:15:25 UTC 2025
The general problem with benchmarks is that their results only apply to the behaviour *in the benchmark*, but we then tend to extrapolate the result to other cases that may not behave like the benchmark at all.
In this case, there are two important details that affect behaviour:
* The “yields” are timed sleeps
* Tasks are submitted from a platform thread
Timed sleeps are scheduled differently from untimed ones such as reading from a socket (this is true, BTW, not only for our virtual thread scheduler but also for some OS thread schedulers), and submissions from platform threads are scheduled differently from submissions from a virtual thread.
The assumption, which is reasonable I think, is that in the vast majority of real workloads, most thread yields will not be timed sleeps and most submissions will be from virtual threads. We also assume that the scheduler’s worker-pool is sized (which you can control, even dynamically) according to the expected load, so that all workers are relatively busy most of the time.
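For example, here is a minimal sketch of checking the scheduler's configured size. The system property names below are the documented `jdk.virtualThreadScheduler.parallelism` and `jdk.virtualThreadScheduler.maxPoolSize` switches; the fallback to the processor count mirrors the default:

```java
public class SchedulerSizing {
    static int configuredParallelism() {
        // By default the virtual thread scheduler's parallelism equals the
        // number of available processors. It can be overridden at launch, e.g.:
        //   java -Djdk.virtualThreadScheduler.parallelism=16 \
        //        -Djdk.virtualThreadScheduler.maxPoolSize=256 SchedulerSizing
        String p = System.getProperty("jdk.virtualThreadScheduler.parallelism");
        return p != null ? Integer.parseInt(p)
                         : Runtime.getRuntime().availableProcessors();
    }

    public static void main(String[] args) {
        System.out.println("scheduler parallelism: " + configuredParallelism());
    }
}
```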
You, therefore, cannot extrapolate anything about latencies from this benchmark to what are assumed to be common realistic workloads.
We would, therefore, like to know whether you've encountered an issue in a real workload and, if so, what that real workload looks like.
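As a sketch of the submission point only (the task body and the count here are placeholders, not the original benchmark): driving the submissions from a virtual thread rather than the platform main thread exercises the scheduler's internal submission path instead of the external one.

```java
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicInteger;

public class VThreadSubmit {
    static int runTasks(int n) throws InterruptedException {
        AtomicInteger completed = new AtomicInteger();
        try (ExecutorService es = Executors.newVirtualThreadPerTaskExecutor()) {
            // Submit from a virtual thread, not the platform main thread.
            Thread submitter = Thread.ofVirtual().start(() -> {
                for (int i = 0; i < n; i++) {
                    es.execute(completed::incrementAndGet);
                }
            });
            submitter.join();
        } // close() waits for all submitted tasks to finish
        return completed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("completed: " + runTasks(100));
    }
}
```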
— Ron
> On 24 Feb 2025, at 02:10, 唐佳未(佳未) <tjw378335 at alibaba-inc.com> wrote:
>
> All tasks are submitted by the platform thread (main).
> ```
> ThreadFactory factory = Thread.ofVirtual().factory();
> ExecutorService es = Executors.newThreadPerTaskExecutor(factory);
> for (int i = 0; i < 5000; i++) {
>     System.out.println("execute: " + i);
>     es.execute(new Task(i));
> }
> es.shutdown();
> try {
>     while (!es.awaitTermination(10, TimeUnit.SECONDS)) {
>         System.out.println("still waiting...");
>     }
> } catch (Exception e) {
>     e.printStackTrace();
> }
>
> class Task implements Runnable {
>     long start, run_start, end;
>     int num;
>
>     Task(int i) {
>         start = System.nanoTime();
>         num = i;
>     }
>
>     @Override
>     public void run() {
>         run_start = System.nanoTime();
>         Integer[] largeInt = new Integer[100];
>         for (int j = 0; j < largeInt.length; j++) {
>             largeInt[j] = j * 100;
>         }
>         try {
>             Thread.sleep(100); // yield this vthread
>         } catch (Exception e) {
>             e.printStackTrace();
>         }
>         int sum = 0;
>         for (int j = 0; j < largeInt.length; j++) {
>             sum += largeInt[j];
>         }
>         end = System.nanoTime();
>         double passTime = (end - run_start) / 1000000.0;
>         double waitRunTime = (run_start - start) / 1000000.0;
>         double totalTime = (end - start) / 1000000.0;
>         System.out.println(Thread.currentThread() + "\t" + num + "-calc-result:" + sum
>                 + "\trun(ms): " + passTime + "\twait-run(ms): " + waitRunTime
>                 + "\ttotal(ms): " + totalTime);
>
>         TestFJPParam.vtTimes[num] = passTime;
>         TestFJPParam.vtWaitTimes[num] = waitRunTime;
>         TestFJPParam.vtWholeTimes[num] = totalTime;
>     }
> }
> ```
>
> "ThreadPoolExecutor and SynchronousQueue" I mentioned before is not used in test. It just a idea. If we emulate this approach to add worker threads to complete tasks, would it be possible to reduce latency?
>
> ------------------------------------------------------------------
> From: Alan Bateman <alan.bateman at oracle.com>
> Sent: Thursday, 20 February 2025, 18:27
> To: "唐佳未(佳未)" <tjw378335 at alibaba-inc.com>; "loom-dev" <loom-dev at openjdk.org>
> Subject: Re: re:remove corePoolSize in ForkJoinPool<init> and hope to improve FJP's algorithm
>
> On 20/02/2025 01:40, 唐佳未(佳未) wrote:
> :
>
> java.util.concurrent.ForkJoinPool at 678ad349[Running, parallelism = 8, size = 8, active = 7, running = 0, steals = 3931, tasks = 0, submissions = 1426]
> java.util.concurrent.ForkJoinPool at 678ad349[Running, parallelism = 8, size = 8, active = 8, running = 0, steals = 5908, tasks = 0, submissions = 48]
> java.util.concurrent.ForkJoinPool at 678ad349[Running, parallelism = 8, size = 8, active = 8, running = 0, steals = 7731, tasks = 0, submissions = 236]
> java.util.concurrent.ForkJoinPool at 678ad349[Running, parallelism = 8, size = 8, active = 2, running = 0, steals = 10370, tasks = 0, submissions = 0]
> Thanks for the jcmd output. It shows that there are no queued tasks in the worker queues (tasks = 0) but many tasks are in the external submission queues. Tasks for virtual threads are pushed to an external submission queue when a virtual thread is initially started, unparked by a platform thread, unblocked by another thread exiting a monitor that the virtual thread was blocked on, or awoken after sleep/timed-park.
>
> Your first mail speaks of a usage with ThreadPoolExecutor and SynchronousQueue, so I will guess there is some hand-off from a platform thread to a virtual thread that would result in the task for the virtual thread being pushed to an external queue.
>
> Can you tell us a bit about the "run function"? I can't tell from the mails so far if this function is mostly compute bound or whether these virtual threads are blocking regularly to allow carriers be released to do other work. One of the mails mentions "tasks switched out but I wasn't sure how to read that. Even without this then you are correct that the scheduling is not fair.
>
> -Alan
>