Scoped variables

Wed Dec 5 00:33:20 UTC 2018

On Dec 3, 2018, at 1:25 PM, dean.long at oracle.com wrote:
> 
> My prototype does not allow arbitrary methods to bind a frame/scope local.
> Instead, only one method is allowed to do that:
> 
> public class FrameLocal<T> {
> 
>     public void doEnclosed(Runnable r) { ... }
> 
> then inside the Runnable you can call set() to assign a value.  If we only wanted
> to allow an initial value, we could do something like:
> 
>     public void doEnclosed(T newValue, Runnable r) { ... }
> 
> For simplicity each variable needs to do its own doEnclosed call, but I can
> imagine also having a static method that takes a list of variables.
> 
> My prototype has both deep and shallow "binding".  The deep binding stack walk
> looks for special StackWalkCookieHolder.doEnclosed frames that
> FrameLocal.doEnclosed uses.

I like the thought here:  There are several kinds of "cookies" you might need
to look for in a stack walk. The JVM should unify them; the reward will be
simpler code and better return on investment in JIT optimizations.
Stack-based access control is today's example of such a cookie.

> The shallow binding uses both a stack and a cache.
> The per-variable stack keeps track of the most recent binding in the current
> Continuation or Thread (so it is associated with a call stack).

That makes sense.  One thing to consider is using the defining stack
frame as a place to allocate storage for this stack (for extending it).
As a precedent for this, note that the result of a monitorenter instruction
is also allocated in the stack frame which issues the instruction, in
both the interpreter and compiler.  It's more straightforward at first
to put such things in a thread-linked side array, but eventually we
probably want to consolidate all the state into one block of stack,
where the JIT can manage it intensively.  This may lead (as I'm sure
you see) to display blocks floating inside of stack frames that define
them, with younger displays pointing (perhaps) to older displays in
ancestor frames.
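
For reference, the "thread-linked side array" variant could look roughly
like the sketch below.  It is a guess at the shape, not the prototype's
code: the class name FrameLocalSketch, get() and isBound() are made up
for illustration, and a ThreadLocal stands in for whatever per-thread or
per-continuation storage the prototype really uses.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.NoSuchElementException;
import java.util.Objects;

// Hypothetical sketch of shallow binding with a per-variable side stack.
final class FrameLocalSketch<T> {
    // One binding stack per variable per thread (assumption: ThreadLocal
    // stands in for thread- or continuation-linked storage).
    private final ThreadLocal<Deque<T>> bindings =
            ThreadLocal.withInitial(ArrayDeque::new);

    // Binds newValue for the dynamic extent of r, then restores the
    // previous binding even if r throws.
    public void doEnclosed(T newValue, Runnable r) {
        Deque<T> stack = bindings.get();
        stack.push(Objects.requireNonNull(newValue)); // null is not supported here
        try {
            r.run();
        } finally {
            stack.pop();
        }
    }

    // Returns the most recent binding on the current thread.
    public T get() {
        T value = bindings.get().peek();
        if (value == null) {
            throw new NoSuchElementException("frame local is unbound");
        }
        return value;
    }

    public boolean isBound() {
        return !bindings.get().isEmpty();
    }
}
```

Moving this stack into the defining frames would replace the Deque with
storage the JIT can see, which a library-level sketch cannot express.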

> Currently the stack
> is kept when the continuation is unmounted.  The cache keeps track of the stack
> that has the most recent binding.  This is because the most recent binding could
> be in a parent continuation or parent thread.

Good that you are covering that case.  These things need to span all the
way up the callee-to-caller relation, not stop at continuation boundaries.

It also adds complexity:  If a parent frame returns, perhaps during concurrent
execution in a different fiber, but a child frame still needs access to a frame
local, the frame local binding must be preserved somehow for the child's use.
Suggestion:  This is one reason *constant* bindings are preferable to *variable*
bindings.  It's easier to "fork" a constant to distinct clients than share a link to
a variable, and there's no loss of generality.
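
A small illustration of the "fork a constant" point, under my own
assumptions: Binding, bindAndFork and demo are hypothetical names, and a
plain Thread stands in for a fiber.  Because the binding is immutable,
the child only needs its own reference to it, and the parent frame can
return without coordinating with the child.

```java
// Hypothetical sketch: a constant binding outlives the frame that made it.
final class ConstantBindingSketch {
    // Immutable binding: the value is fixed when the binding is created.
    static final class Binding<T> {
        private final T value;
        Binding(T value) { this.value = value; }
        T get() { return value; }
    }

    // The "parent frame": creates the binding, hands it to a child, returns.
    static Thread bindAndFork(String value, StringBuilder out) {
        Binding<String> b = new Binding<>(value);
        return new Thread(() -> out.append(b.get()));
    }

    static String demo() {
        StringBuilder out = new StringBuilder();
        Thread child = bindAndFork("request-42", out);
        // The parent frame has already returned; the child still sees the value.
        child.start();
        try {
            child.join(); // join also makes the child's write visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return out.toString();
    }
}
```

With a mutable binding the child would instead need a live link back to
the parent's storage, which is exactly what disappears when the parent
returns.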

> My prototype does not support a
> global binding as the parent to a thread binding.

That was Lisp's problem, because they started with global variables, then
made them "SPECIAL" by default, and only then discovered that lexical
scoping was preferable.  This means that every dynamically scoped variable
must (potentially) alias to a mutable global variable, if not bound by an
intervening frame.  Sure glad that's not us.

> Without fibers, continuations, or
> global bindings, then the most recent binding reduces to the thread binding, and
> there shouldn't be any need for a cache.

(The model can be thread-agnostic, right?  You just have to find the frame that
defined the local; doesn't matter who owns it or how.)

> The cache is a map and is reset every time the continuation is mounted/unmounted.

Yep.  We could also try to salvage the cache at unmount and reuse it at remount.
This would be pretty easy if the structure of the cache is (a) rooted in the fiber and
(b) allocated in the frames of the fiber.  Major events like JIT or deopt could reset the
cache by zeroing out the root variables in the fiber.
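
A minimal sketch of the reset-by-zeroing-the-root idea, assuming a
map-shaped cache as in your description.  BindingCacheSketch, lookup,
onMount and onUnmount are hypothetical names, and the slow path is passed
in as a function because the real lookup (stack walk or side stack) is
outside what a sketch can show.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: a lookup cache that is reset by clearing its root.
final class BindingCacheSketch {
    // The "root variable"; null means the cache is currently empty.
    private Map<Object, Object> root;

    // Counts slow-path lookups, for illustration only.
    int slowLookups;

    // Returns the cached binding for the variable, or computes and caches it.
    Object lookup(Object variable, Function<Object, Object> slowPath) {
        if (root == null) {
            root = new HashMap<>();
        }
        return root.computeIfAbsent(variable, key -> {
            slowLookups++;
            return slowPath.apply(key);
        });
    }

    // Mount and unmount drop the whole cache by zeroing the root.
    void onMount() { root = null; }
    void onUnmount() { root = null; }
}
```

Salvaging the cache across unmount and remount would amount to keeping
the root instead of nulling it, plus some way to tell that its entries
are still valid.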

> The stack is looked up in a WeakHashMap, making it basically equivalent to a
> ThreadLocal.  If we had long-lived frame locals, then perhaps we could simply look
> them up in an array, but then why not use a long-lived thread local instead that
> could also be looked up in an array?

That smells like footprint to me.  There's a WHM per active fiber?  I suppose it's
OK if the WHM is discarded when the fiber unmounts.  (Or keep it in a weak
reference field on the fiber?)

> To reduce Continuation storage requirements, we could choose not to retain the
> stacks after an unmount, but instead lazily rebuild using stack walks.  Right now
> I'm only using the deep stack walk to verify the results of the cache + stack
> lookup.

Yep.  That's the sort of trade-off that faces us.  Another point about allocating
stacks inside of actual stack frames:  Reconstructing them simply requires
walking the stack and noticing where they were.
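
At the library level, "noticing where they were" can be approximated
with the Java 9 StackWalker API, as in the sketch below.  The marker
method here is a stand-in I made up for StackWalkCookieHolder.doEnclosed;
StackWalker reports frame positions only, not frame contents, so this
finds the binding frames but cannot read values out of them.

```java
// Hypothetical sketch: locating marker frames with a stack walk.
final class MarkerWalkSketch {
    // Stand-in for the special doEnclosed frame the deep walk looks for.
    static void doEnclosed(Runnable r) {
        r.run();
    }

    // Counts marker frames currently on the calling thread's stack.
    static long markerDepth() {
        String markerClass = MarkerWalkSketch.class.getName();
        return StackWalker.getInstance().walk(frames -> frames
                .filter(f -> f.getClassName().equals(markerClass)
                        && f.getMethodName().equals("doEnclosed"))
                .count());
    }
}
```

A rebuilt side stack would need one entry per marker frame found, in the
order the walk visits them (youngest frame first).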

— John