PM design question: Scopes

Wed Nov 22 21:45:51 UTC 2017

On Mon, Nov 20, 2017 at 10:17 AM, Brian Goetz <brian.goetz at oracle.com> wrote:

>
> We had a long meeting regarding scoping and shadowing of pattern
> variables.  We ended up in a good place, and we were all a bit surprised at
> where it seems to be pointing.
>
> We started with two use cases that we thought were important:
>
> Re-use of binding variables:
>
>     switch (x) {
>         case Foo(var a): ...  break;
>         case Bar(var a): ...
>     }
>
> Short-circuiting tests:
>
>     if (!(x matches Foo(var a)))
>         throw new NotFooException();
>     // use a here
>
> We had a few nice-to-haves:
>  - that binding variables be ordinary variables, not something new;
>  - that binding variables, once assigned, be final.
>
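For reference, both target use cases can be written in the syntax that eventually shipped, where `matches` became `instanceof` (Java 16) and deconstruction patterns became record patterns usable in `switch` and `instanceof` (Java 21). This is a sketch assuming Java 21; the records `Foo` and `Bar`, the class name, and the choice of exception are hypothetical stand-ins.

```java
// Sketch assuming Java 21 (record patterns in switch and instanceof).
// Foo, Bar, and TargetUseCases are hypothetical names for illustration.
public class TargetUseCases {
    record Foo(int a) {}
    record Bar(int a) {}

    // Use case 1: the binding name `a` is reused across cases.
    static int reuse(Object x) {
        switch (x) {
            case Foo(var a):
                return a;      // `a` bound by the Foo pattern
            case Bar(var a):
                return -a;     // a different `a`, bound by the Bar pattern
            default:
                return 0;      // pattern switches must be exhaustive
        }
    }

    // Use case 2: short-circuiting test. `a` stays in scope after the if
    // because the if body cannot complete normally.
    static int requireFoo(Object x) {
        if (!(x instanceof Foo(var a)))
            throw new IllegalArgumentException("not a Foo: " + x);
        return a;
    }

    public static void main(String[] args) {
        System.out.println(reuse(new Foo(3)));      // 3
        System.out.println(reuse(new Bar(4)));      // -4
        System.out.println(requireFoo(new Foo(7))); // 7
    }
}
```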

I would advocate *against* binding variables being implicitly final, for two reasons:

   1. Existing code employing the compiler tree API might need to be
   updated to account for the new way that a variable can be implicitly
   final.  For a related example, I recently submitted a Pull Request to
   Apache NetBeans (https://github.com/apache/incubator-netbeans/pull/258 )
   to handle the fact that fields declared on an interface are implicitly
   static, so that the built-in "unused variable assignment" hint would not
   issue a spurious warning in certain cases involving field definitions on
   interfaces.
   2. It would be more difficult to later change the Java language to allow
   bound variables to be reassigned (by dropping the "implicitly final"
   rule), because existing code that uses the compiler tree API might
   hard-code the assumption that a bound variable is final.  In the same PR,
   for example, I have hard-coded the assumption that interface fields are
   always static (if node.getKind() == Kind.INTERFACE, then the field is
   always added to staticInitializers); in fact, the reason I added the
   clunky assertion before staticInitializers.add(member); is to test this
   assumption, so that if for some reason an interface field is not actually
   static, the assertion will fail and the code can be updated.  As another
   example, in my SLF4J Helper <http://plugins.netbeans.org/plugin/72557>
   plugin for NetBeans, I hard-code the assumption that the exception
   parameter of a multi-catch block is implicitly final: the code that
   attempts to statically determine whether a variable will be null at
   runtime short-circuits when the variable is the exception parameter of a
   multi-catch block, reporting that it will not be null because it assumes
   the variable cannot be overwritten.
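The implicit modifiers these two points rely on can be observed directly. The sketch below is not code from the PR or the plugin: it uses reflection to confirm that a field declared in an interface is implicitly `public static final`, shows a multi-catch parameter (implicitly final per the JLS), and, for comparison, reassigns a type-pattern variable. That last part is legal because pattern matching for `instanceof` was finalized in Java 16 (JEP 394) without the implicit-final restriction. `Config`, `LIMIT`, and the method names are hypothetical.

```java
// Sketch assuming Java 16+. Config, LIMIT, and the method names are hypothetical.
import java.lang.reflect.Modifier;

public class ImplicitModifiers {
    interface Config {
        int LIMIT = 10; // no modifiers written, yet implicitly public static final
    }

    static boolean limitIsPublicStaticFinal() {
        try {
            int mods = Config.class.getDeclaredField("LIMIT").getModifiers();
            return Modifier.isPublic(mods) && Modifier.isStatic(mods) && Modifier.isFinal(mods);
        } catch (NoSuchFieldException e) {
            return false;
        }
    }

    static String classify(String s) {
        try {
            return "int " + Integer.parseInt(s.trim());
        } catch (NumberFormatException | NullPointerException e) {
            // e is implicitly final here: "e = null;" would not compile
            return e.getClass().getSimpleName();
        }
    }

    static String shout(Object o) {
        if (o instanceof String s) {
            s = s.toUpperCase(); // legal: pattern variables are not implicitly final
            return s;
        }
        return "";
    }

    public static void main(String[] args) {
        System.out.println(limitIsPublicStaticFinal()); // true
        System.out.println(classify(" 42 "));            // int 42
        System.out.println(classify(null));              // NullPointerException
        System.out.println(shout("hi"));                 // HI
    }
}
```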

> Where we expected to land was something like:
>  - binding variables are treated as blank finals
>  - binding variables are hoisted into a synthetic block, which starts
> right before the statement containing the expression defining the binding
>  - it is permitted for locals to shadow other locals that are DU at the
> point of shadowing.  (This, as a bonus, would rescue the existing
> unfortunate scoping of local variables defined in switch blocks.)
>
> We thought this was a sensible place to land because it built on the
> existing notion of scoping and local variables.  The remaining question, it
> seemed, was: "where does this synthetic scope end."
>
> First, a note about where the scope starts.  Consider:
>
>     if (e1 && x matches Foo(var a)) {
>         ...
>     }
>
> Logically, we'd like to start the scope for `a` right where it is first
> declared; this is how locals work.  But, if we want to maintain the
> existing concept of local variable scope, it has to start earlier.  The
> latest candidate is right before the if starts; we act as if there is an
> invisible { ... } containing the entirety of the if statement, and declare
> `a` there.
>
> This means, though, that the scope of `a` includes `e1`, even though `a`
> is declared later.  This is confusing, but maybe we can ignore this, and
> provide a clear diagnostic if the user stumbles across it.
>
> So, where does the scope end?  The obvious candidate is right after the if
> statement.  This means `a` is in scope for the entire if-else, but, because
> it is DU in the else-blocks, can be reused if we adopt the "shadowing OK if
> DU" rule.
>
> FWIW, the "shadowing ok if DU" rule is clever, and gives us the behavior
> we want for switch / if-else chains with patterns, but has some collateral
> damage.  For example, the following would become valid code:
>
>     int x;  // declared but never used
>     float x = 1.0f;  // acceptable shadowing of int x
>

Local variable shadowing would be very nice, in my opinion, especially if
it were possible to make a variable final "from then on":

int x;
// x is initialized somehow
final int x = x;

Currently, one common way to handle this scenario is to declare final int x
before a plain block statement that sets up a mutable "xTemp" variable and
ends by assigning xTemp to x.
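The workaround just described looks roughly like this. It is a minimal sketch: the summing loop is a hypothetical stand-in for whatever setup `x` actually needs.

```java
// Sketch of the "xTemp" idiom; the summing logic is a hypothetical placeholder.
public class FinalFromThenOn {
    static int sumOfPositives(int[] data) {
        final int x;          // blank final: must be assigned exactly once
        {
            int xTemp = 0;    // mutable scratch variable, scoped to this block
            for (int d : data) {
                if (d > 0) {
                    xTemp += d;
                }
            }
            x = xTemp;        // x becomes definitely assigned here
        }
        // xTemp is out of scope here; only the final x remains visible
        return x;
    }

    public static void main(String[] args) {
        System.out.println(sumOfPositives(new int[] {1, -2, 3})); // 4
    }
}
```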

>
> Again, maybe we can ignore this.  But where things really blew up was
> attempting to handle the short-circuiting if case:
>
>     if (!(x matches Foo(var a)))
>         throw new NotFooException();
>     // use a here
>
> For this to work, we'd have to extend the scope to the end of the block
> containing the if statement.  Now, given our "shadowing is OK if DU" rule,
> this is fine, right?  Not so fast.  In this simpler case:
>
>     if (x matches Foo(var b)) { }
>     // try to reuse b here, I dare you
>
> we find that
>  - b is neither DU nor DA after the if, so we can't shadow it;
>  - b is final and not DU, so we can't write to it;
>  - b is not DA, so we can't use it.
>
> In other words, b is a permanent toxic waste zone: we can neither use,
> nor redeclare, nor assign it.  Urk.
>
> Note too that our scoping rule is not really about unbalanced ifs; it's
> about abrupt completion.  This is reasonable too:
>
>     if (x matches Foo(var a)) {
>         println("Matched!");
>     }
>     else
>         throw new NotFooException();
>     // reasonable to use a here too!
>
> Taking stock: our goal here was to try and use normal scopes and blank
> final semantics to describe binding variables, out of a desire to not
> introduce new concepts.  But it's a bad fit; the scope may be unnaturally
> large on the beginning side, and wherever we set the end of the scope, we
> end up choosing between bad situations (either something we want in scope is
> not, or something we don't want in scope is).  So traditional scopes are
> just a bad approximation, and what we gain in "reusing familiar concepts",
> we lose in the mismatch.
>
>
> STEPPING BACK
>
> What we realized at this point is that the essence of binding variables is
> their _conditionality_.  There is not a single logical old-style scope that
> describes the right set of places for a binding to be in scope, but there
> is a well-defined control-flow analysis that tells us exactly where we can
> use the binding, and where we can't.  This is the flow-scoping construct we
> initially worried was too "new and different."  But, after some further
> thought, and a few tweaks, this seems exactly what we want, and I think can
> be made understandable.
>
> The basic idea behind flow-scoping is: a binding variable is in scope
> where it is well-defined, and not in scope when it is not. We'll provide a
> complete calculus, but the key thing to understand is that the rules of
> flow scoping are just plain old DA/DU; if a binding is DA, then it is
> well-defined.
>
> In particular, flow-scoping can handle abrupt termination naturally; for a
> statement:
>
>     if (x matches Foo(var a)) { A }
>     else { B }
>     C
>
> the scope of `a` includes A, and also includes C iff B completes
> abruptly.  We can easily explain this as:
>  - if x matches Foo(var a), we execute the A block, and in this case `a`
> is clearly well-defined (as we'd not execute A if the match failed);
>  - the only way to reach C, if B completes abruptly, is if the match
> succeeds, so `a` is well-defined during C in this case too.
>
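In Java 16 and later, type patterns in `instanceof` follow this flow-scoping model. The sketch below (hypothetical class and method names; assumes Java 16+) shows both sides: a binding is in scope after an if/else whose else branch always throws, and a binding from an if whose body completes normally is not in scope afterwards, so its name is free to be declared again rather than becoming the "toxic waste" case with `b`.

```java
// Sketch assuming Java 16+ (pattern matching for instanceof).
// Class and method names are hypothetical.
public class FlowScoping {
    // `s` is in scope after the if/else: the else branch cannot complete
    // normally, so reaching the return implies the match succeeded.
    static int lengthOrThrow(Object x) {
        if (x instanceof String s) {
            System.out.println("Matched!");
        } else {
            throw new IllegalArgumentException("not a String");
        }
        return s.length();
    }

    // `b` is only in scope inside the (empty) if body, so the name is free
    // again afterwards and can be declared as an ordinary local.
    static String reuseName(Object x) {
        if (x instanceof String b) { }
        String b = "fresh local";
        return b;
    }

    public static void main(String[] args) {
        System.out.println(lengthOrThrow("abc")); // prints "Matched!" then 3
        System.out.println(reuseName(42));        // fresh local
    }
}
```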
> Because the scope of a binding variable is precisely the cases in which it
> is well defined, there is no need to tinker with shadowing.
>
> Conditional variables can now always be final, because they will never be
> in scope and not DA.
>
> Similarly, folding reachability into scoping for conditional variables
> also means that fallthrough has a well-defined meaning. If we have:
>
>     case Foo(int x): ... break;
>     case Bar(int x): ...
>
> then the Bar case is not reachable from where x would be initialized, so
> the first x is not in scope when the second x is declared, and everything
> is great.  On the other hand:
>
>     case Foo(int x): ... no break ...
>     case Bar(int x): ... A ...
>
> now x is well-defined in A, no matter how we got there.  (The merging of
> the two xs is the same merging we have to do anyway for "if (x matches
> Foo(int a) || x matches Bar(int a))".)
>
>
> People had originally expressed concern that flow-scoping leaves a scope
> "with holes", and allows puzzlers with shadowing of fields. (This is the
> "swiss cheese" problem.) For example:
>
>     // Field
>     String s;
>
>     if (!(x matches String s)) {
>         a(s);
>     }
>     else {
>         b(s);
>     }
>
> This would be confusing because the `s` passed to a() is the field, but
> the `s` passed to b() is the binding.  But, there's a really simple way to
> prevent this: do not allow conditional variables to shadow fields or
> locals.  Now, there is no chance of this confusion, and this is not a big
> constraint, because the names of conditional variables are strictly local.
> (Further, we can disallow shadowing of in-scope conditional variables by
> locals or by other conditional variables.)
>

Excellent idea, in my opinion.
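For comparison, the rule that eventually shipped went the other way on fields: as of Java 16, a pattern variable may shadow a field (though not a local variable or parameter), so the swiss-cheese example above compiles. The sketch below assumes Java 16+ and uses hypothetical names; it makes the two meanings of `s` observable.

```java
// Sketch assuming Java 16+. SwissCheese and describe are hypothetical names.
public class SwissCheese {
    String s = "field";

    String describe(Object x) {
        if (!(x instanceof String s)) {
            return s;      // binding not in scope here: this is the field
        } else {
            return s;      // binding in scope here: this is the matched String
        }
    }

    public static void main(String[] args) {
        SwissCheese c = new SwissCheese();
        System.out.println(c.describe(42));        // field
        System.out.println(c.describe("binding")); // binding
    }
}
```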

>
>
> Scorecard:
>  - Relatively straightforward to spec, as we have a clean calculus for
> flow-scoped conditional variables;
>  - Relatively straightforward to implement (our prototype already does
> this);
>  - One new concept: conditional variables;
>  - Conditional vars are in scope where they make sense, and not in scope
> where they do not; they cannot be assigned to (they are always DA and final
> when in scope), and are never in scope when not DA;
>  - No changes to shadowing;
>  - Meets all the target use cases.
>
>

I like the idea of conditional variables and "where well-defined" scoping.

Informally, this is my understanding:  a bound variable would be in scope
at a point if the work of testing the pattern match was guaranteed to have
been performed and to have succeeded by that point.  For example:

   - In something like if (e0 && (e1 matches Foo f || e2)) { A }, f would
   not be well-defined within e0, e2, or A (assuming e2 is not a matches
   expression). To make f available in A, e2 would have to be a
   matches SomeType f expression.
   - In something like if (e0 && (e1 matches Foo f && e2)) { A }, f would
   not be well-defined within e0, but would be well-defined within e2 and A.
   - In something like if (e0 || (e1 matches Foo f && e2)) { A }, f would
   not be well-defined within e0 or A (assuming e0 is not a matches
   expression), but would be well-defined within e2.
   - In something like if (!(e1 matches Foo f) || e2) { A }, f would not be
   well-defined within A (regardless of e2), but would be well-defined
   within e2.

Is my understanding correct in these four examples?

It is the last example that I was specifically thinking about for the
aforementioned Pull Request; I would love to be able to write the assertion
on one line as:

assert !(info.getTrees().getElement(TreePath.getPath(getCurrentPath(), member)) matches Element e)
        || e.getModifiers().contains(Modifier.STATIC);

instead of:

{
    final Element e;
    assert (e = info.getTrees().getElement(TreePath.getPath(getCurrentPath(), member))) == null
            || e.getModifiers().contains(Modifier.STATIC);
}
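In the syntax that shipped in Java 16, `matches Element e` is written `instanceof Element e`, and examples 2 through 4 behave as described. The sketch below is a hypothetical stand-in so it runs without the NetBeans or compiler APIs: `CharSequence` plays the role of `Element`, and a non-empty check plays the role of `e.getModifiers().contains(Modifier.STATIC)`. Because `instanceof` is false for `null`, the pattern form also covers the `== null` case of the longer version.

```java
// Sketch assuming Java 16+. All names are hypothetical stand-ins:
// CharSequence replaces Element, and "non-empty" replaces the STATIC check.
public class ConditionalScopes {
    // Example 2: e0 && (e1 instanceof T f && e2) -> f usable in e2 and in A.
    static int example2(boolean e0, Object e1) {
        if (e0 && (e1 instanceof String f && !f.isEmpty())) {
            return f.length();   // f is definitely matched here
        }
        return -1;
    }

    // Example 3: e0 || (e1 instanceof T f && e2) -> f usable in e2 only.
    static boolean example3(boolean e0, Object e1) {
        if (e0 || (e1 instanceof String f && f.startsWith("x"))) {
            return true;         // f is NOT in scope here
        }
        return false;
    }

    // Example 4, in the shape of the one-line assertion:
    // !(e1 instanceof T f) || e2 -> f usable in e2 only.
    static boolean check(Object member) {
        return !(member instanceof CharSequence cs) || cs.length() > 0;
    }

    public static void main(String[] args) {
        System.out.println(example2(true, "abc"));  // 3
        System.out.println(example3(false, "xyz")); // true
        System.out.println(check(null));            // true
        System.out.println(check(""));              // false
    }
}
```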