Scoped values
Andrew Haley
aph at redhat.com
Mon Sep 16 18:05:48 UTC 2019
Here's a proof-of-concept implementation of scoped values to play
with.
Download:
The patch and the benchmark are in http://cr.openjdk.java.net/~aph/scoped/
The API is very simple. Create a Scoped object with, e.g.
static final Scoped<Integer> myVar = Scoped.forType(Integer.class);
Bind your scoped object to a scope with, e.g.
myVar.bind(33, () -> {
    System.out.println(myVar.get());
});
... which should print 33. The lifetime of the binding is the lifetime
of the enclosed scope, and if you try to get() a value outside its
scope you'll get an exception. There is no put(): you can re-bind a
value if you want, but that creates a new binding scope which hides
the old value. That value will become visible again when the inner
scope ends.
And that is, more or less, the extent of the API.
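To make the re-binding rule concrete, here is a minimal sketch. It does not use the patch: ToyScoped is a hypothetical stand-in that keeps a per-thread stack of values for each variable, and both that representation and the exception type thrown by get() are assumptions made only so the example runs on a stock JDK.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class RebindDemo {
    // Toy stand-in for the patch's Scoped class (an assumption, not the real code).
    static final class ToyScoped<T> {
        // Per-thread stack of bindings for this variable; innermost binding on top.
        private final ThreadLocal<Deque<T>> bindings =
            ThreadLocal.withInitial(ArrayDeque::new);

        static <T> ToyScoped<T> forType(Class<T> type) {
            return new ToyScoped<>();
        }

        // Runs body with value bound; the binding ends when body returns or throws.
        void bind(T value, Runnable body) {
            Deque<T> stack = bindings.get();
            stack.push(value);          // ArrayDeque rejects null values
            try {
                body.run();
            } finally {
                stack.pop();            // the outer binding, if any, is visible again
            }
        }

        // Returns the innermost binding. The exception type is a guess.
        T get() {
            T value = bindings.get().peek();
            if (value == null) {
                throw new IllegalStateException("not bound in this scope");
            }
            return value;
        }
    }

    static final ToyScoped<Integer> myVar = ToyScoped.forType(Integer.class);

    // Records what get() returns before, during and after an inner re-binding.
    static List<Integer> trace() {
        List<Integer> seen = new ArrayList<>();
        myVar.bind(33, () -> {
            seen.add(myVar.get());                        // outer binding
            myVar.bind(44, () -> seen.add(myVar.get()));  // inner binding hides it
            seen.add(myVar.get());                        // outer binding again
        });
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(trace());
    }
}
```

trace() returns [33, 44, 33]: the inner bind() hides 33 only for the duration of its own lambda, and a get() issued after the outermost bind() has returned throws.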
Here's a full example:
public class Hello
{
    static final Scoped<String> myScopedValue = Scoped.forType(String.class);
    public static void main(String[] args) {
        myScopedValue.bind("Hello, World!", () -> {
            hello();
        });
    }
    static void hello() {
        System.out.println(myScopedValue.get());
    }
}
How it works:
When a value is bound, an instance of ScopedBinding is created. The
bind() method adds the bound value to a ScopedMap, which is a hash
table which maps from Scoped<T> to T. This table expands as required.
[ In this PoC implementation the ScopedMap is local to a Thread, but I
guess that in Project Loom each Continuation will maintain a ScopedMap
of the bindings which exist in its stack. When a hierarchy of
continuations is mounted on the native stack, we can scan them one by
one. There are other ways to do the search: we could even walk the
stack, or we could maintain a tree of bindings. Which is most
performant depends on the relative frequency of bind() to get(). ]
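The map-based lookup described above can be sketched roughly as follows. This is not the code from the patch: Key is a hypothetical stand-in for Scoped<T>, a plain java.util.HashMap stands in for ScopedMap, and the save-and-restore strategy in bind() is an assumption about how an outer binding could become visible again.

```java
import java.util.HashMap;
import java.util.Map;

class ScopedMapSketch {
    // One map per thread, standing in for the PoC's thread-local ScopedMap.
    private static final ThreadLocal<Map<Key<?>, Object>> MAP =
        ThreadLocal.withInitial(HashMap::new);

    // Hypothetical stand-in for Scoped<T>; keys are compared by identity.
    static final class Key<T> {
        private final Class<T> type;

        Key(Class<T> type) {
            this.type = type;
        }

        // Adds the binding to the map, runs body, then restores the previous state.
        void bind(T value, Runnable body) {
            Map<Key<?>, Object> map = MAP.get();
            boolean hadOld = map.containsKey(this);
            Object old = map.put(this, value);   // HashMap grows as required
            try {
                body.run();
            } finally {
                if (hadOld) {
                    map.put(this, old);          // outer binding visible again
                } else {
                    map.remove(this);            // no binding outside this scope
                }
            }
        }

        // Looks the key up in the current thread's map.
        T get() {
            Map<Key<?>, Object> map = MAP.get();
            if (!map.containsKey(this)) {
                throw new IllegalStateException("not bound in this scope");
            }
            return type.cast(map.get(this));
        }
    }

    // Number of live bindings in the current thread's map.
    static int liveBindings() {
        return MAP.get().size();
    }

    // Binds two independent keys in nested scopes and reads both back.
    static String demo() {
        Key<String> user = new Key<>(String.class);
        Key<Integer> depth = new Key<>(Integer.class);
        StringBuilder out = new StringBuilder();
        user.bind("alice", () ->
            depth.bind(1, () -> out.append(user.get()).append(':').append(depth.get())));
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(demo() + " live=" + liveBindings());
    }
}
```

Every get() here pays for a thread-local access plus a hash lookup, which is the cost a cache in front of the map would aim to avoid. The alternatives mentioned above (walking the stack, or a tree of bindings) would change only where bind() and get() look, not the visible behaviour.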
Also, for performance reasons, there is a small thread-local cache of
bindings. Successful lookups in this cache are considerably faster
than searching ScopedMaps, and the cost of maintaining it is
slight. (How fast is the cache? It depends. If everything gets
inlined, the code is small enough, and values are repeatedly accessed,
C2 will hoist cache entries into *registers*.)
If we're rarely using a scoped value the cost of maintaining
this cache will be very small, but the performance increase when we
frequently use a value will be great. As with all caches, if your
working set of values is large enough to thrash the cache then
performance will suffer.
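As a rough illustration of the caching idea, the sketch below puts a tiny direct-mapped cache in front of a per-thread map. Everything in it is an assumption: the names Key and State, the cache size of 4, the slot hashing and the invalidate-on-bind policy are invented for the example and are not taken from the patch. The toy also reaches its state through a ThreadLocal, so it shows the structure only and says nothing about speed.

```java
import java.util.HashMap;
import java.util.Map;

class ScopedCacheSketch {
    static final int CACHE_SIZE = 4; // assumed size; must be a power of two

    // Per-thread state: the full map plus a tiny direct-mapped cache in front of it.
    static final class State {
        final Map<Key<?>, Object> map = new HashMap<>();
        final Key<?>[] cacheKeys = new Key<?>[CACHE_SIZE];
        final Object[] cacheValues = new Object[CACHE_SIZE];
        int hits;
        int misses;

        // Drops the cached entry for key, if that key currently owns its slot.
        void invalidate(Key<?> key) {
            if (cacheKeys[key.slot] == key) {
                cacheKeys[key.slot] = null;
                cacheValues[key.slot] = null;
            }
        }
    }

    static final ThreadLocal<State> STATE = ThreadLocal.withInitial(State::new);

    // Hypothetical stand-in for Scoped<T>.
    static final class Key<T> {
        private final Class<T> type;
        final int slot = System.identityHashCode(this) & (CACHE_SIZE - 1);

        Key(Class<T> type) {
            this.type = type;
        }

        void bind(T value, Runnable body) {
            State s = STATE.get();
            boolean hadOld = s.map.containsKey(this);
            Object old = s.map.put(this, value);
            s.invalidate(this);               // a cached outer value is now stale
            try {
                body.run();
            } finally {
                if (hadOld) s.map.put(this, old); else s.map.remove(this);
                s.invalidate(this);           // a cached inner value is now stale
            }
        }

        T get() {
            State s = STATE.get();
            if (s.cacheKeys[slot] == this) {  // fast path: no map lookup
                s.hits++;
                return type.cast(s.cacheValues[slot]);
            }
            s.misses++;
            if (!s.map.containsKey(this)) {
                throw new IllegalStateException("not bound in this scope");
            }
            Object value = s.map.get(this);
            s.cacheKeys[slot] = this;         // may evict another key's entry
            s.cacheValues[slot] = value;
            return type.cast(value);
        }
    }

    // Returns {sum, hits, misses} for 1000 reads of a single binding.
    static int[] demo() {
        Key<Integer> k = new Key<>(Integer.class);
        State s = STATE.get();
        int hits0 = s.hits, misses0 = s.misses;
        int[] sum = {0};
        k.bind(7, () -> {
            for (int i = 0; i < 1000; i++) sum[0] += k.get();
        });
        return new int[] { sum[0], s.hits - hits0, s.misses - misses0 };
    }

    public static void main(String[] args) {
        int[] r = demo();
        System.out.println("sum=" + r[0] + " hits=" + r[1] + " misses=" + r[2]);
    }
}
```

With a single hot key, demo() sees one miss followed by 999 hits. Two keys that land in the same slot would evict each other on every alternating get(), which is the thrashing case: each read then falls through to the map.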
The performance advantage of scoped values over ThreadLocals varies
from around a factor of 2 to more than a million in the benchmarks
below: C2 can analyse scoped accesses much more readily than
ThreadLocals. With C1 and the interpreter the speedup is less dramatic
but still substantial. There is scope for yet more optimization, but I
had to stop somewhere.
Benchmark results, "SC" for Scoped, "TL" for ThreadLocal:
(C2 compiler)
Benchmark                            Mode  Cnt         Score        Error  Units
ThreadLocalTest.counterSC            avgt    3         7.742 ±      0.033  ns/op
ThreadLocalTest.counterTL            avgt    3  13562023.214 ± 357446.858  ns/op
ThreadLocalTest.getSC                avgt    3         5.447 ±      0.010  ns/op
ThreadLocalTest.getTL                avgt    3        10.894 ±      0.043  ns/op
ThreadLocalTest.summationSC          avgt    3    289371.352 ±  10586.007  ns/op
ThreadLocalTest.summationTL          avgt    3   5936803.887 ± 585718.637  ns/op
ThreadLocalTest.thousandGetsMultiSC  avgt    3      3184.632 ±    152.276  ns/op
ThreadLocalTest.thousandGetsMultiTL  avgt    3      8023.575 ±    147.210  ns/op
ThreadLocalTest.thousandGetsSC       avgt    3      3497.657 ±    152.637  ns/op
ThreadLocalTest.thousandGetsTL       avgt    3      8515.073 ±   1380.751  ns/op
(C1 compiler only)
Benchmark                            Mode  Cnt         Score        Error  Units
ThreadLocalTest.counterSC            avgt    3   2729177.283 ± 139922.430  ns/op
ThreadLocalTest.counterTL            avgt    3  27265732.333 ± 164437.245  ns/op
ThreadLocalTest.getSC                avgt    3         6.310 ±      0.003  ns/op
ThreadLocalTest.getTL                avgt    3        18.065 ±      0.022  ns/op
ThreadLocalTest.summationSC          avgt    3   2737301.824 ±  92434.595  ns/op
ThreadLocalTest.summationTL          avgt    3  12904368.400 ±  19691.899  ns/op
ThreadLocalTest.thousandGetsMultiSC  avgt    3      9537.571 ±     28.315  ns/op
ThreadLocalTest.thousandGetsMultiTL  avgt    3     16085.950 ±    131.427  ns/op
ThreadLocalTest.thousandGetsSC       avgt    3      8563.665 ±    904.076  ns/op
ThreadLocalTest.thousandGetsTL       avgt    3     16384.927 ±     22.803  ns/op
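
For readers who want the speedups as ratios, the small program below divides each TL score by the matching SC score. The scores are copied from the two tables above; the class and method names are made up for the example, and the error bars are ignored.

```java
class SpeedupRatios {
    // Speedup of Scoped over ThreadLocal: how many times longer the TL variant takes.
    static double speedup(double tlNsPerOp, double scNsPerOp) {
        return tlNsPerOp / scNsPerOp;
    }

    public static void main(String[] args) {
        String[] names = { "counter", "get", "summation", "thousandGetsMulti", "thousandGets" };
        // { SC score, TL score } in ns/op, copied from the C2 table.
        double[][] c2 = {
            { 7.742, 13562023.214 },
            { 5.447, 10.894 },
            { 289371.352, 5936803.887 },
            { 3184.632, 8023.575 },
            { 3497.657, 8515.073 },
        };
        // Same benchmarks, copied from the C1-only table.
        double[][] c1 = {
            { 2729177.283, 27265732.333 },
            { 6.310, 18.065 },
            { 2737301.824, 12904368.400 },
            { 9537.571, 16085.950 },
            { 8563.665, 16384.927 },
        };
        for (int i = 0; i < names.length; i++) {
            System.out.printf("%-18s C2 %12.1fx   C1 %6.1fx%n", names[i],
                speedup(c2[i][1], c2[i][0]), speedup(c1[i][1], c1[i][0]));
        }
    }
}
```

On these figures the C2 ratios run from 2.0x (get) to roughly 1.75 million (counter), while the C1 ratios run from about 1.7x (thousandGetsMulti) to about 10x (counter).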
--
Andrew Haley (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671