effectiveness of jdk.virtualThreadScheduler.maxPoolSize

Arnaud Masson arnaud.masson at fr.ibm.com
Mon Jan 9 12:04:30 UTC 2023


I doubt it will suddenly convince anyone, but I wrote a minimal example to illustrate the concrete problem.
With the classic thread executor, the normal requests (fastIORequest) complete quickly even while concurrent CPU requests (slowCPURequest) are running, whereas with the default Loom executor they are stuck for some time.

Arnaud


package org.example;

import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;


public class MixedServer {

   static final Instant t0 = Instant.now();

   static long slowCPURequest(String name, Instant posted) throws Exception {
      long t = 0;
      var n = 1_000_000_000;

      // slow CPU-only stuff: e.g. image codec, XML manip, collection iteration, ...
      for (int i = 0; i < n; i++) {
         t = t + 2;
      }

      var completed = Instant.now();
      System.out.println(Duration.between(t0, Instant.now()) + " - " + name
            +  " completed (duration: " + Duration.between(posted, completed) + ")");
      return t;
   }

   static Object fastIORequest(Instant posted) throws Exception {
      // IO stuff, e.g. JDBC call
      // (but sleep would have the same effect for this test, i.e. block the thread)
      Thread.sleep(2);

      var completed = Instant.now();
      System.out.println(Duration.between(t0, Instant.now()) + " - fast IO task"
            +  " completed (duration: " + Duration.between(posted, completed) + ")");
      return null;
   }


   public static void main(String[] args) throws Exception {
      var executor =
            Executors.newVirtualThreadPerTaskExecutor(); // unfair
            //Executors.newCachedThreadPool(); // fair

      var timer = new Timer();
      timer.schedule(new TimerTask() {
         @Override
         public void run() {
            var posted = Instant.now();
            executor.submit(() -> fastIORequest(posted));
         }
      }, 100, 100);

      var futures = new ArrayList<Future<?>>();
      for (int i = 0; i < 60; i++) {
         final var name = "Task_" + i;
         var posted = Instant.now();
         futures.add(executor.submit(() -> slowCPURequest(name, posted)));
      }

      for (Future<?> f : futures)
         f.get();

      System.out.println("--- done ---");
      executor.shutdownNow();
      timer.cancel();
   }
}
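
To try it, assuming a JDK 19 or 20 build (where virtual threads are still a preview feature), something like the following should work. The second run is a sketch of the scheduler's tuning knobs; as far as I understand, maxPoolSize above parallelism only lets the scheduler add carrier threads to compensate for pinned or blocked carriers, so the CPU-bound tasks here still occupy the whole pool.

javac --release 19 --enable-preview -d out MixedServer.java
java  --enable-preview -cp out org.example.MixedServer

# same test, with the carrier pool sized explicitly
java  --enable-preview -cp out \
      -Djdk.virtualThreadScheduler.parallelism=8 \
      -Djdk.virtualThreadScheduler.maxPoolSize=64 \
      org.example.MixedServer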





Great, then once someone who runs CPU-heavy jobs that are appropriate for virtual threads presents a problem they’ve encountered that could be solved by some form of time-sharing, we’ll take a look. For the time being, we have a hypothesis that one or more problems could occur, and a further hypothesis that some class of solutions might fix or mitigate them. Unfortunately, these hypotheses are not presently actionable.

I don’t know what problem Go’s maintainers solve with time-sharing. Maybe they just want to compensate for Go’s lack of convenient access to a scheduler with time-sharing even for background processing tasks (a problem that Java doesn’t have), or perhaps they’ve managed to identify common server workloads that could be helped by time-sharing, in which case we’d love to see those workloads so that we can make sure our fix works.
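
To make that last point concrete, here is a minimal sketch (the class and the Request type are just illustrative placeholders) of what that access looks like in Java today: CPU-heavy work goes to a small, bounded pool of platform threads, which the OS scheduler time-shares preemptively, while I/O-bound requests stay on cheap virtual threads.

package org.example;

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class MixedExecutors {

   // One virtual thread per request: cheap, and fine for blocking I/O.
   static final ExecutorService ioExecutor =
         Executors.newVirtualThreadPerTaskExecutor();

   // A small, bounded pool of platform threads for CPU-bound work;
   // the OS scheduler preempts and time-shares these.
   static final ExecutorService cpuExecutor =
         Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());

   // Placeholder for whatever a real server would use to represent work.
   interface Request {
      boolean isCpuHeavy();
      void run();
   }

   static void handle(Request req) {
      (req.isCpuHeavy() ? cpuExecutor : ioExecutor).submit(req::run);
   }
}

Whether that split is acceptable is, of course, workload-dependent; the point is only that a time-sharing scheduler is readily available when a task is known to be CPU-bound.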

I am not at all resistant to adding time-sharing to the virtual thread scheduler. I am resistant to fixing bugs that have not been reported. I have said many times, and not just on the mailing lists, that the best and often only way to get some change made to Java is to report a problem. You might have noticed that all JEPs start with a motivation section that attempts to clearly present a problem that’s encountered by Java’s users (and sometimes maintainers) and analyse its relative severity to justify the proposed solution. That is usually the JEP’s most important section (and the section we typically spend the most time writing) because the most important thing is to understand the problem you’re trying to tackle. Every change has a cost. A feature might add overhead that harms workloads that don’t benefit from it, and it certainly has an opportunity cost. Neither of these is disqualifying, but we simply cannot judge the pros and cons of doing something until we can weigh one problem against another, and we can’t even get started on this process until we have a problem in front of us.

We *have* been presented with a problem that some specific kind of time-sharing can solve (postponing a batch job that’s consuming resources to run at a much later time), and it is one of the reasons we’ve added custom schedulers, which will be able to employ it, to our roadmap. It is certainly possible that the lack of time-sharing causes problems that need addressing in the default (currently only) virtual-thread scheduler (I am not at all dismissing that possibility), but there’s not much we can do until those problems are actually reported to us, which would allow us to know more about when those problems occur, how frequently, and what kind of time-sharing can assist in which circumstances and to what degree. There’s no point in trying to convince us of something we are already convinced of, namely that the possibility exists that some virtual thread workloads could be significantly helped by time-sharing. In fact, I’ve mentioned that possibility on this mailing list a few times already.
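
As an aside, the crudest form of time-sharing a workload can adopt today is cooperative: a CPU-bound virtual thread can call Thread.yield() from time to time, giving the scheduler a chance to run other virtual threads on that carrier. Purely as an illustration (the yield interval is arbitrary), the hot loop from the example earlier in this thread could be written as:

static long slowCPURequestCooperative() {
   long t = 0;
   for (int i = 0; i < 1_000_000_000; i++) {
      t = t + 2;
      if ((i & 0xFFFFF) == 0) {   // roughly every million iterations
         Thread.yield();          // let other virtual threads run on this carrier
      }
   }
   return t;
}

That obviously only helps when the hot loop is yours to modify, which is part of why scheduler-level time-sharing keeps coming up.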

If the problems are common, now that more people are giving virtual threads a try, I expect they will be reported by someone soon enough and we could start the process of tackling them.

— Ron


On 9 Jan 2023, at 08:48, Sam Pullara <spullara at gmail.com> wrote:

Ron, I think you are being purposefully obtuse by not recognizing that some folks are going to run high-CPU jobs in vthreads. The proof is with the folks using Go, who already encountered it and fixed it.

On Mon, Jan 9, 2023 at 12:46 AM Arnaud Masson <arnaud.masson at fr.ibm.com> wrote:
Side note: it seems “more” preemptive time sharing was added for goroutines in Go 1.14 to avoid the kind of scheduling starvation we discussed:

https://medium.com/a-journey-with-go/go-asynchronous-preemption-b5194227371c

Thanks
Arnaud

