Performance of pooling virtual threads vs. semaphores

Liam Miller-Cushon cushon at google.com
Tue May 28 22:43:21 UTC 2024


Hello,

JEP 444 [1] and the docs in [2] mention that virtual threads should not be
pooled, and suggest semaphores as one possible alternative.

My colleague Chi Wang has been investigating virtual thread performance,
and has found that creating one virtual thread per task and synchronizing
on a semaphore can result in worse performance on machines with large
numbers of cores.

A benchmark run on a 128 core machine is included below. It submits
numTasks tasks to the executor determined by the strategy. The task itself
is mixed with CPU and I/O work (simulated by fibonacci and sleep). The
parallelism is set to 600 for all strategies.

* Strategy 1 is the baseline where it just submits all the tasks to a
ForkJoinPool whose pool size is 600.
* Strategy 2 uses the method suggested by JEP 444.
* Strategy 3 uses a fixed thread pool of 600 backed by virtual threads.

Note that, with 100K and 1M tasks, strategy 2 has a CPU time regression
that seems to increase with the number of cores. This result can be
reproduced on the real-world program that is being migrated to a
virtual-thread-per-task model.

Diffing the cpu profile between strategy 1 and strategy 2 showed that most
of the CPU regression comes from method
`java.util.concurrent.ForkJoinPool.scan(java.util.concurrent.ForkJoinPool$WorkQueue,
long, int)`.

Are there any ideas for why the semaphore strategy uses more CPU than
pooling virtual threads on machines with a large number of cores?

Thanks,
Liam

[1] https://openjdk.org/jeps/444#Do-not-pool-virtual-threads
[2]
https://docs.oracle.com/en/java/javase/22/core/virtual-threads.html#GUID-2BCFC2DD-7D84-4B0C-9222-97F9C7C6C521

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.Semaphore;

public class Main {

  private static Semaphore semaphore = null;

  public static void main(String[] args) {
    int strategy = 0;
    int parallelism = 600;
    int numTasks = 10000;

    if (args.length > 1) {
      strategy = Integer.parseInt(args[1]);
    }

    if (args.length > 2) {
      numTasks = Integer.parseInt(args[2]);
    }

    ExecutorService executor;
    switch (strategy) {
      case 1 -> {
        executor = new ForkJoinPool(parallelism);
      }
      case 2 -> {
        executor = Executors.newVirtualThreadPerTaskExecutor();
        semaphore = new Semaphore(parallelism);
      }
      case 3 -> {
        executor = Executors.newFixedThreadPool(parallelism,
Thread.ofVirtual().factory());
      }
      default -> {
        throw new IllegalArgumentException();
      }
    }

    try (executor) {
      for (var i = 0; i < numTasks; ++i) {
        executor.execute(Main::task);
      }
    }
  }

  private static void task() {
    if (semaphore != null) {
      try {
        semaphore.acquire();
      } catch (InterruptedException e) {
        throw new IllegalStateException();
      }
    }

    try {
      fibonacci(20);
      try {
        Thread.sleep(10);
      } catch (InterruptedException e) {
      }
      fibonacci(20);
      try {
        Thread.sleep(10);
      } catch (InterruptedException e) {
      }
      fibonacci(20);
    } finally {
      if (semaphore != null) {
        semaphore.release();
      }
    }
  }

  private static int fibonacci(int n) {
    if (n == 0) {
      return 0;
    } else if (n == 1) {
      return 1;
    } else {
      return fibonacci(n - 1) + fibonacci(n - 2);
    }
  }
}

# openjdk full version "23-ea+24-1995"
# AMD Ryzen Threadripper PRO 3995WX, hyperthreading enabled

$ hyperfine --parameter-scan strategy 1 3 --parameter-list numTasks
10000,100000,1000000 './jdk-23/bin/java Main -- {strategy} {numTasks}'

Benchmark 1: ./jdk-23/bin/java Main -- 1 10000
  Time (mean ± σ):     658.3 ms ±  24.4 ms    [User: 26808.8 ms, System:
493.7 ms]
  Range (min … max):   613.1 ms … 702.0 ms    10 runs

Benchmark 2: ./jdk-23/bin/java Main -- 2 10000
  Time (mean ± σ):     486.9 ms ±  28.5 ms    [User: 14804.4 ms, System:
501.5 ms]
  Range (min … max):   451.0 ms … 533.4 ms    10 runs

Benchmark 3: ./jdk-23/bin/java Main -- 3 10000
  Time (mean ± σ):     452.0 ms ±  10.8 ms    [User: 6598.1 ms, System:
335.8 ms]
  Range (min … max):   435.6 ms … 470.6 ms    10 runs

Benchmark 4: ./jdk-23/bin/java Main -- 1 100000
  Time (mean ± σ):      3.668 s ±  0.028 s    [User: 38.469 s, System:
1.188 s]
  Range (min … max):    3.628 s …  3.704 s    10 runs

Benchmark 5: ./jdk-23/bin/java Main -- 2 100000
  Time (mean ± σ):      3.612 s ±  0.042 s    [User: 65.924 s, System:
2.072 s]
  Range (min … max):    3.563 s …  3.687 s    10 runs

Benchmark 6: ./jdk-23/bin/java Main -- 3 100000
  Time (mean ± σ):      3.503 s ±  0.008 s    [User: 27.791 s, System:
1.211 s]
  Range (min … max):    3.492 s …  3.515 s    10 runs

Benchmark 7: ./jdk-23/bin/java Main -- 1 1000000
  Time (mean ± σ):     34.093 s ±  0.031 s    [User: 206.235 s, System:
14.313 s]
  Range (min … max):   34.015 s … 34.120 s    10 runs

Benchmark 8: ./jdk-23/bin/java Main -- 2 1000000
  Time (mean ± σ):     34.354 s ±  0.063 s    [User: 330.215 s, System:
17.501 s]
  Range (min … max):   34.267 s … 34.479 s    10 runs

Benchmark 9: ./jdk-23/bin/java Main -- 3 1000000
  Time (mean ± σ):     34.551 s ±  1.018 s    [User: 238.050 s, System:
10.258 s]
  Range (min … max):   34.124 s … 37.420 s    10 runs

Summary
  ./jdk-23/bin/java Main -- 3 10000 ran
    1.08 ± 0.07 times faster than ./jdk-23/bin/java Main -- 2 10000
    1.46 ± 0.06 times faster than ./jdk-23/bin/java Main -- 1 10000
    7.75 ± 0.19 times faster than ./jdk-23/bin/java Main -- 3 100000
    7.99 ± 0.21 times faster than ./jdk-23/bin/java Main -- 2 100000
    8.12 ± 0.20 times faster than ./jdk-23/bin/java Main -- 1 100000
   75.43 ± 1.80 times faster than ./jdk-23/bin/java Main -- 1 1000000
   76.01 ± 1.82 times faster than ./jdk-23/bin/java Main -- 2 1000000
   76.44 ± 2.90 times faster than ./jdk-23/bin/java Main -- 3 1000000
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.org/pipermail/loom-dev/attachments/20240528/0b7af6fd/attachment.htm>


More information about the loom-dev mailing list