Custom Schedulers use-case

Fri Oct 17 23:13:14 UTC 2025

Thanks for the detailed response. We hope OpenJDK will commit to supporting
custom schedulers at some point. For "self deadlock" cases, the API could
provide some guidelines on what users should not do inside custom
schedulers.

Our colleague David Gay also provided more details on the multi-tenancy use
case:

Firestore is a cloud-based database, implemented with a multi-tenant (i.e.,
a single job serves many customers) architecture. Multi-tenancy allows us
to serve small-scale customers very cheaply, but brings isolation
challenges: traffic to a single Firestore database can potentially affect
the performance and availability of other databases by consuming all or
most of the resources in one or more components. Thus providing isolation
between customers sending traffic to the same task in a job is critical.

Specifically for Java: Firestore backends are implemented in Java,
currently using a custom asynchronous programming framework which basically
provides:
- all the usual Java control structures (try-catch, loops, etc.)
- "automatic" suspension at (manually identified) blocking points
- scheduling of 'slices' of these asynchronous computations via a fair
scheduler (we're using a stride scheduler)
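
For readers unfamiliar with stride scheduling, here is a minimal sketch of
the general technique in Java. It is not Firestore's implementation: the
class name, the per-tenant "tickets" parameter, and the STRIDE1 constant are
all assumptions made for illustration, and the sketch is single-threaded
(no synchronization). Each tenant has a "pass" value that advances by its
stride every time one of its slices runs, and the tenant with the lowest
pass always runs next.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.Queue;

/** Hypothetical, single-threaded sketch of a per-tenant stride scheduler. */
final class StrideScheduler {
    // Large constant divided by a tenant's tickets to get its stride (assumed value).
    private static final long STRIDE1 = 1L << 20;

    private static final class Tenant {
        final String id;
        final long stride;   // smaller stride = larger share of slices
        long pass;           // virtual time at which this tenant runs next
        final Queue<Runnable> slices = new ArrayDeque<>();

        Tenant(String id, int tickets) {
            this.id = id;
            this.stride = STRIDE1 / tickets;
        }
    }

    private final Map<String, Tenant> tenants = new HashMap<>();
    // Tenants with pending slices, ordered by pass; ties broken by id for determinism.
    private final PriorityQueue<Tenant> runnable = new PriorityQueue<>(
        (a, b) -> a.pass != b.pass ? Long.compare(a.pass, b.pass) : a.id.compareTo(b.id));
    private long globalPass; // pass of the most recently selected tenant

    void addTenant(String id, int tickets) {
        if (tickets <= 0) throw new IllegalArgumentException("tickets must be positive");
        tenants.put(id, new Tenant(id, tickets));
    }

    void submit(String tenantId, Runnable slice) {
        Tenant t = tenants.get(tenantId);
        if (t == null) throw new IllegalArgumentException("unknown tenant " + tenantId);
        if (t.slices.isEmpty()) {
            // A tenant that was idle must not bank credit for the time it was away.
            t.pass = Math.max(t.pass, globalPass);
            runnable.add(t);
        }
        t.slices.add(slice);
    }

    /** Runs one slice of the tenant with the lowest pass; returns false if idle. */
    boolean runNext() {
        Tenant t = runnable.poll();
        if (t == null) return false;
        Runnable slice = t.slices.remove();
        globalPass = t.pass;
        t.pass += t.stride;
        // Re-queue before running so the slice itself may safely call submit().
        if (!t.slices.isEmpty()) runnable.add(t);
        slice.run();
        return true;
    }

    public static void main(String[] args) {
        StrideScheduler s = new StrideScheduler();
        List<String> order = new ArrayList<>();
        s.addTenant("A", 1); // antagonistic tenant
        s.addTenant("B", 1); // 'innocent' tenant
        for (int i = 0; i < 100; i++) s.submit("A", () -> order.add("A"));
        for (int i = 0; i < 2; i++) s.submit("B", () -> order.add("B"));
        while (s.runNext()) { }
        System.out.println(order.subList(0, 6));
    }
}
```

In this sketch, B's two slices are interleaved with A's first slices rather
than waiting behind all 100 of them, which is the isolation property we
need. A FIFO queue would run B's slices last.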

Replacing our custom asynchronous programming framework with virtual
threads is obviously highly desirable - much more readable and efficient
code (and I can stop getting confused by continuations), but the fair
scheduling of slices is an absolute requirement. We did experiments
comparing the performance impact of an antagonistic workload from customer
A on an 'innocent' workload from customer B:
- without fair scheduling, B sees two orders of magnitude worse latency
(p50 and p99)
- with fair scheduling, B sees essentially no p50 latency impact and
tolerable p99 impact
The 'without fair scheduling' measurements are effectively measuring how
the Linux kernel schedules our threads - I would expect broadly similar
results from the default virtual thread scheduler as neither has any
information on which customer owns which traffic to appropriately
prioritise scheduling.

The above is partly summarised from
https://research.google/pubs/firestore-the-nosql-serverless-database-for-the-application-developer/
Specifically, see:
- section IV.C for the overview of our isolation approach
- section V.C and Figure 11 for the isolation benchmark

-Man

On Fri, Oct 10, 2025 at 1:01 AM Alan Bateman <alan.bateman at oracle.com>
wrote:

> On 09/10/2025 22:11, Man Cao wrote:
> > Hi loom developers,
> >
> > Official support for custom schedulers is highly valuable to some of
> > our Java applications such as our colleague David Gay's use case.
> >
> > Are there any major concerns or obstacles to official support for
> > custom schedulers?
> >
>
> There are some workloads that are not suited to a work stealing
> scheduler. We've seen this with workloads that have low concurrency, not
> a lot going on, and the scanning to "find work" consuming additional CPU
> cycles that nobody wants to pay for. There may be merit in having the
> JDK provide a different scheduler for such cases; more experimentation
> is required.
>
> There are folks that want to do things like using the AWT event thread,
> or the JavaFX application thread, as the carrier. They've seen
> coroutines used on UI threads in other systems and want to experiment
> doing something similar. Early explorations into this did not go very far.
>
> There are other folks that are interested in thread affinity, binding
> virtual threads to specific carriers, and carriers to specific cores in
> NUMA nodes. Some of this exploration is about integration with existing
> systems that use event loops. We are looking forward to a write-up of
> these explorations and any findings.
>
> Beyond this there are folks doing fun things with simulation and other
> experimentation.
>
> I'm not familiar with David Gay's work except for Liam's mail to say
> that they are doing something in the area of multi-tenancy. If a
> write-up or a summary of the explorations and findings could be sent to
> loom-dev then it would be useful.
>
> To your question, the topic of custom schedulers is an
> exploration/research topic. The JDK has to be cautious. Calling out to a
> custom scheduler (= arbitrary code) from core/sensitive parts of the
> runtime is very scary. It's very easy to "self deadlock" - we've seen
> folks trying to use locks to coordinate between mounted virtual threads
> and their carrier. We are also concerned that the API surface for
> schedulers will grow.
>
> There are two prototypes in the loom repo at this time; this is what
> Liam linked to. We are hoping that folks that are interested in this
> topic will try one or both and come back with their findings. More data,
> esp. from real-world usage, will help inform this project on whether
> there is merit in going further with either direction or whether there
> are other directions that might be more fruitful.
>
> -Alan
>
>