Scope locals

Andrew Haley aph at redhat.com
Fri May 7 16:39:11 UTC 2021


On 5/5/21 11:54 PM, Paul Sandoz wrote:

> - I might be missing something subtle here, but I wonder if it may
> be possible to unify Carrier and Snapshot. AFAICT it is possible to
> pass an instance of Carrier around, and hand off much like an
> instance of Snapshot can be. What if ScopeLocal.snapshot() returns
> an instance of Carrier, that only refers to locals for inter-thread
> inheritance in the context of the current thread? Then there is no
> need to special case scoped execution with a snapshot. That would be
> pleasing collapse of the API to a lower energy state.

So I tried this one, and there are some problems, but it's still an
interesting idea.

The action of inheriting scope locals is different from the action of
running with a set of bindings, in that the bindings inherited from a
parent replace those in the child rather than supplementing them. I
think that's what we want, because we want code running in the child
task to run as though it were in the parent, without seeing any extra
values that happen to be bound already in the child.
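
To make the replace-versus-supplement distinction concrete, here is a
toy model. It is only a sketch: it represents a set of bindings as an
immutable Map, and the names BindingModel, supplement and inherit are
made up for illustration, they are not part of the ScopeLocal API.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: a set of bindings is an immutable Map from name to value.
// None of these names come from the real ScopeLocal API.
final class BindingModel {

    // "Running with bindings": the new bindings supplement what is already
    // bound, shadowing any existing binding with the same name.
    static Map<String, String> supplement(Map<String, String> current,
                                          Map<String, String> added) {
        Map<String, String> result = new HashMap<>(current);
        result.putAll(added);
        return Map.copyOf(result);
    }

    // "Inheriting": the parent's snapshot replaces the child's bindings
    // wholesale, so childCurrent is deliberately ignored.
    static Map<String, String> inherit(Map<String, String> childCurrent,
                                       Map<String, String> parentSnapshot) {
        return parentSnapshot;
    }

    public static void main(String[] args) {
        Map<String, String> parent = Map.of("USER", "alice");
        Map<String, String> child = Map.of("TRACE_ID", "42");

        // After inheriting, the child's own binding is no longer visible.
        System.out.println(inherit(child, parent).containsKey("TRACE_ID"));    // false
        // After supplementing, it still is.
        System.out.println(supplement(child, parent).containsKey("TRACE_ID")); // true
    }
}
```

With inherit() the child sees exactly what the parent saw; with
supplement() a value the child happened to have bound leaks through.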

I might be mistaken about that. If we're happy for inheritance to be
treated in exactly the same way as adding more bindings, I think we
can unify Carrier and Snapshot. However, it might be rather fiddly,
which I'd rather avoid if possible.

There is some efficiency cost: adding a list of bindings to the
current thread requires an object to be created, whereas replacing
all of the inheritable bindings merely requires a pointer to be
updated (so no space is allocated). This matters because some uses of
inheritable scope locals are extremely performance sensitive. For
example, inheritance in a CountedCompleter looks something like this:

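    // Bindings captured when this task is constructed.
    private final ScopeLocal.Carrier snapshot = ScopeLocal.snapshot();

    protected final boolean exec() {
        // Only switch bindings if they differ from the current thread's.
        if (snapshot != ScopeLocal.snapshot()) {
            snapshot.run(this::compute);
        } else {
            compute();
        }
        return false;
    }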
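
The difference in cost between the two operations can be sketched with
a toy model. This is not how the Loom code is written: ToyBindings and
ToyContext are hypothetical names, and a ThreadLocal stands in for the
per-thread pointer. It only shows that replacing the bindings is a
save, a store and a restore, whereas adding a binding has to create a
new node.

```java
// Toy model only; all names here are hypothetical, not the Loom API.

// An immutable linked list of bindings, newest first.
final class ToyBindings {
    final String key;
    final Object value;
    final ToyBindings prev;

    ToyBindings(String key, Object value, ToyBindings prev) {
        this.key = key;
        this.value = value;
        this.prev = prev;
    }
}

final class ToyContext {
    // Stands in for the per-thread pointer to the current bindings.
    private static final ThreadLocal<ToyBindings> CURRENT = new ThreadLocal<>();

    static ToyBindings snapshot() {
        return CURRENT.get();
    }

    // Replace: no new binding node is created, the reference is swapped
    // and then restored.
    static void runWithSnapshot(ToyBindings snap, Runnable op) {
        ToyBindings saved = CURRENT.get();
        CURRENT.set(snap);
        try {
            op.run();
        } finally {
            CURRENT.set(saved);
        }
    }

    // Add: a new node has to be allocated in front of the existing list.
    static void runWithBinding(String key, Object value, Runnable op) {
        ToyBindings saved = CURRENT.get();
        CURRENT.set(new ToyBindings(key, value, saved));
        try {
            op.run();
        } finally {
            CURRENT.set(saved);
        }
    }

    // Look a key up by walking the list; returns null if it is unbound.
    static Object get(String key) {
        for (ToyBindings b = CURRENT.get(); b != null; b = b.prev) {
            if (b.key.equals(key)) {
                return b.value;
            }
        }
        return null;
    }
}
```

Because snapshots in this model are shared immutable lists, a reference
comparison like the one in exec() above is enough to detect the common
case where there is nothing to switch.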

There's a benchmark in the JSR 166 test suite that calculates a
Fibonacci number using an extremely inefficient (but parallel)
algorithm. This is used to measure the overhead of parallel
constructs, and is pretty standard in the parallel programming
literature.
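
For anyone who hasn't seen it, a benchmark in this style looks roughly
like the following. This is my own minimal sketch using
CountedCompleter, not the CCFib source from the test suite; the class
name FibTask is made up, and it has none of the timing or steal
counting.

```java
import java.util.concurrent.CountedCompleter;
import java.util.concurrent.ForkJoinPool;

// Deliberately naive parallel Fibonacci: every call below the root
// becomes its own task, so the work per task is a single addition.
final class FibTask extends CountedCompleter<Long> {
    private final int n;
    private long result;
    private FibTask left, right;

    FibTask(CountedCompleter<?> parent, int n) {
        super(parent);
        this.n = n;
    }

    @Override
    public void compute() {
        if (n <= 1) {
            result = n;
            tryComplete();          // leaf: finish and notify the parent
        } else {
            setPendingCount(1);     // two children, one pending count
            left = new FibTask(this, n - 1);
            right = new FibTask(this, n - 2);
            right.fork();           // run one child asynchronously
            left.compute();         // and the other in this thread
        }
    }

    @Override
    public void onCompletion(CountedCompleter<?> caller) {
        // Called once both children have completed (or at once for a leaf).
        if (left != null) {
            result = left.result + right.result;
        }
    }

    @Override
    public Long getRawResult() {
        return result;
    }

    public static void main(String[] args) {
        long fib = ForkJoinPool.commonPool().invoke(new FibTask(null, 20));
        System.out.println(fib);    // 6765
    }
}
```

Since each task does so little, almost all of the measured time is the
overhead of creating, scheduling and completing tasks, which is why any
extra work in exec() shows up so clearly.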

On JDK head, fib(45) takes this long on 16 cores (time is in seconds):

CCFib 45 = 1134903170	Time:     0.965 Steals/t:   102 Workers:       15

On the Loom codebase, it's much the same:

CCFib 45 = 1134903170	Time:     1.093 Steals/t:   181 Workers:       15

But if you actually bind (but don't use) a scope local, so there is
some inheriting to be done:

CCFib 45 = 1134903170	Time:     3.472 Steals/t:   140 Workers:       15

Ouch. Simply inheriting scope locals makes the benchmark take more
than three times as long (3.472 s against 1.093 s), even if those
scope locals are never used! (True, the CCFib benchmark is
extreme in that each node in the computation does almost nothing --
just an addition -- but there are realistic graph algorithms that do
stuff almost like that.)

This slowdown is, I think, due to two things. Partly, the inlining
decisions taken are very different because of the more complex code
paths, but there's also the cost of the lambda in

            snapshot.run(this::compute);

and the saving and restoring of the scope local pointers. We can just
about stand this slowdown, I think, but it'd be nice to fix it. And I
certainly don't want to make it worse, which treating inheritance and
adding bindings the same would do.

-- 
Andrew Haley  (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671


