ScopedValue performance declines as number of dynamic bindings increases
Johan Sjolen
johan.sjolen at oracle.com
Wed Jun 25 11:39:12 UTC 2025
Hi Andrew,
Thank you for the extensive reply.
> However, we don't cache failure. One reason for this is that the scoped
> value cache is created lazily, and we don't want to create it if all a
> thread is doing is using a scoped value to detect recursion. Also, we
> don't want to kick a value out of the cache for the sake of a failure.
> We can review this decision if the negative case turns out to be worth
> caching.
Sounds good to me. I'd expect the most ergonomic way of providing a default value to be to fail the lookup and use orElse to supply the default. Otherwise, supplying a default value would require you
to install it at the entry point of the program, before any threads have been created or any stacks have grown, and even that is a simplified view of the issue at hand. My bet is on the negative case being important to cache,
but we don't have to bet, so let's await data instead.
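For concreteness, the pattern I have in mind looks something like this (TIMEOUT and the thirty-second default are invented for the example):

    import java.time.Duration;

    class TimeoutExample {
        // Hypothetical scoped value with no binding installed anywhere at startup.
        static final ScopedValue<Duration> TIMEOUT = ScopedValue.newInstance();

        static Duration effectiveTimeout() {
            // The unbound lookup "fails" and orElse supplies the default at the
            // use site, so nothing needs to be bound at the program's entry point.
            return TIMEOUT.orElse(Duration.ofSeconds(30));
        }
    }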
> We want to be able to share a set of scoped values between parent and
> child threads with (near-)zero time and space overhead, so whatever
> structure contains them should be immutable. We want to be able to bind
> (and unbind) a scoped value in constant (hopefully very low) time, so
> rebinding can't, say, update a hash map. We want repeated lookups to be
> very fast.
Right, so was using ThreadLocal discarded as an idea because of its use of a hash table?
Otherwise, ScopedValue could be modelled as:

    class ScopedValue<V> {
        // One immutable stack of bindings per thread, per ScopedValue instance.
        // (A nested record is implicitly static, so it needs its own type parameter.)
        private record SV<V>(V value, SV<V> next) {}
        private final ThreadLocal<SV<V>> stack = new ThreadLocal<>();
    }
Of course, this also discards any caching. I guess it's important that copying ScopedValue bindings takes as little time as possible, because of virtual threads.
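To spell the model out a little further, binding and lookup inside that sketch would amount to a push, a pop and a read of the top of the per-value stack, roughly like this (runWhere is just my shorthand for what where(...).run(...) would do in this model; these methods would live inside the class above):

    <R> R runWhere(V value, java.util.function.Supplier<R> op) {
        SV<V> prev = stack.get();               // may be null when unbound
        stack.set(new SV<>(value, prev));       // bind: push
        try {
            return op.get();
        } finally {
            if (prev == null) stack.remove();   // unbind: pop
            else stack.set(prev);
        }
    }

    V get() {
        SV<V> top = stack.get();
        if (top == null) throw new java.util.NoSuchElementException("not bound");
        return top.value;
    }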
> Finally, we don't expect to see huge numbers of scoped values in use. I
> guess there may be some edge cases where it might be appropriate, but I
> don't think this will be a common pattern.
I'm not sure how it is imagined that ScopedValues will be used. If ScopedValue becomes a popular feature, then what's interesting is what happens when you start using many libraries in your program.
Each library may have only a few ScopedValue instances, but in the worst case an execution thread will carry as many ScopedValue bindings as all of those libraries use combined.
This leads to a peculiar situation where the performance of your program may degrade in an unexpected manner. Perhaps I only have a lively imagination.
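To make the scenario concrete (the library names are invented):

    class ManyLibrariesExample {
        // LIB_A/LIB_B/LIB_C stand in for scoped values owned by three different libraries.
        static final ScopedValue<String> LIB_A = ScopedValue.newInstance();
        static final ScopedValue<String> LIB_B = ScopedValue.newInstance();
        static final ScopedValue<String> LIB_C = ScopedValue.newInstance();

        static void handleRequest() {
            ScopedValue.where(LIB_A, "a").run(() ->
                ScopedValue.where(LIB_B, "b").run(() ->
                    ScopedValue.where(LIB_C, "c").run(() ->
                        // The first LIB_A.get() walks past LIB_C's and LIB_B's
                        // bindings before finding its own; only then is it cached.
                        System.out.println(LIB_A.get()))));
        }
    }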
> Finally, if you're aware of any application need for large numbers of
> scoped values, I'm interested to hear about it.
I do not. Fans of functional programming will, however, recognise that ScopedValue (dynamic binding for the installation of handlers) and Continuation (a one-shot delimited continuation to provide to the handler) together form the semantic basis for an algebraic effect system.
If the Continuation API is exposed publicly (or coroutines for Java are implemented) in the future, then we'll probably see some heavy use of ScopedValues in some libraries and codebases. That might be a niche area of use, however.
Here's a paper on exactly what I'm talking about: https://www.logic.cs.tsukuba.ac.jp/~sat/pdf/tfp2020.pdf
All the best,
Johan Sjölén
PS. Thanks for the ScopedValueExecutorService example link.
________________________________________
From: loom-dev <loom-dev-retn at openjdk.org> on behalf of Andrew Haley <aph-open at littlepinkcloud.com>
Sent: Wednesday, June 25, 2025 12:28
To: loom-dev at openjdk.org
Subject: Re: ScopedValue performance declines as number of dynamic bindings increases
On 24/06/2025 16:12, Johan Sjolen wrote:
> I've been working on implementing a feature and the ScopedValue preview feature was an excellent fit to a particular design problem I faced.
> Reading the source code, I noticed that it seems as though the linked list of value bindings is shared across all ScopedValue instances. This was a bit surprising to me, as this means that the performance of accessing
> a ScopedValue binding declines linearly with the number of bindings before it on the stack.
Only the first time you use it: thereafter, the value is cached, and if
you use it repeatedly in a method, hoisted into a register.
However, we don't cache failure. One reason for this is that the scoped
value cache is created lazily, and we don't want to create it if all a
thread is doing is using a scoped value to detect recursion. Also, we
don't want to kick a value out of the cache for the sake of a failure.
We can review this decision if the negative case turns out to be worth
caching.
Finally, we don't expect to see huge numbers of scoped values in use. I
guess there may be some edge cases where it might be appropriate, but I
don't think this will be a common pattern.
> I'm wondering about the reasoning and trade-offs here, what do we get
> with our implementation that the Schemers do not? Maybe we should
> consider changing our implementation?
We want to be able to share a set of scoped values between parent and
child threads with (near-)zero time and space overhead, so whatever
structure contains them should be immutable. We want to be able to bind
(and unbind) a scoped value in constant (hopefully very low) time, so
rebinding can't, say, update a hash map. We want repeated lookups to be
very fast.
We expect scoped values to be created as static finals, and much
optimization depends on that. We expect to see small numbers of scoped
values active at any time.
Finally, if you're aware of any application need for large numbers of
scoped values, I'm interested to hear about it.
--
Andrew Haley (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671