PM design question: Scopes

Mon Nov 20 18:17:56 UTC 2017

We had a long meeting regarding scoping and shadowing of pattern 
variables.  We ended up in a good place, and we were all a bit surprised 
at where it seems to be pointing.

We started with two use cases that we thought were important:

Re-use of binding variables:

     switch (x) {
         case Foo(var a): ...  break;
         case Bar(var a): ...
     }

Short-circuiting tests:

     if (!(x matches Foo(var a)))
         throw new NotFooException();
     // use a here

We had a few nice-to-haves:
  - that binding variables should be ordinary variables, not something new;
  - that bindings, once assigned, be final.

Where we expected to land was something like:
  - binding variables are treated as blank finals
  - binding variables are hoisted into a synthetic block, which starts 
right before the statement containing the expression defining the binding
  - it is permitted for locals to shadow other locals that are DU at the 
point of shadowing.  (This, as a bonus, would rescue the existing 
unfortunate scoping of local variables defined in switch blocks.)
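
For readers who have not run into the switch-block quirk mentioned in the last bullet, here is a minimal sketch in ordinary Java. The class and method names are made up for illustration. A local declared under one case label stays in scope under later labels, where it can neither be redeclared nor read until it is assigned:

```java
final class SwitchScopeDemo {
    // Hypothetical helper; only the scoping of `y` matters here.
    static int pick(int k) {
        switch (k) {
            case 1:
                int y = 10;      // y is declared in the switch block
                return y;
            case 2:
                // int y = 20;   // would not compile: y is already defined in this scope
                y = 20;          // y is in scope but not DA here, so assign before use
                return y;
            default:
                return -1;
        }
    }

    public static void main(String[] args) {
        System.out.println(pick(1) + " " + pick(2) + " " + pick(3));
    }
}
```

Under the "shadowing OK if DU" idea, the commented-out redeclaration in case 2 would become legal, since `y` is DU at that point.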

We thought this was a sensible place to land because it built on the 
existing notion of scoping and local variables.  The remaining question, 
it seemed, was: "where does this synthetic scope end?"

First, a note about where the scope starts.  Consider:

     if (e1 && x matches Foo(var a)) {
         ...
     }

Logically, we'd like to start the scope for `a` right where it is first 
declared; this is how locals work.  But, if we want to maintain the 
existing concept of local variable scope, it has to start earlier.  The 
latest candidate is right before the if starts; we act as if there is an 
invisible { ... } containing the entirety of the if statement, and 
declare `a` there.

This means, though, that the scope of `a` includes `e1`, even though `a` 
is declared later.  This is confusing, but maybe we can ignore this, and 
provide a clear diagnostic if the user stumbles across it.

So, where does the scope end?  The obvious candidate is right after the 
if statement.  This means `a` is in scope for the entire if-else, but, 
because it is DU in the else-blocks, can be reused if we adopt the 
"shadowing OK if DU" rule.

FWIW, the "shadowing ok if DU" rule is clever, and gives us the behavior 
we want for switch / if-else chains with patterns, but has some 
collateral damage.  For example, the following would become valid code:

     int x;  // declared but never assigned, so DU
     float x = 1.0f;  // acceptable shadowing of int x

Again, maybe we can ignore this.  But where things really blew up was 
attempting to handle the short-circuiting if case:

     if (!(x matches Foo(var a)))
         throw new NotFooException();
     // use a here

For this to work, we'd have to extend the scope to the end of the block 
containing the if statement.  Now, given our "shadowing is OK if DU" 
rule, this is fine, right?  Not so fast.  In this simpler case:

     if (x matches Foo(var b)) { }
     // try to reuse b here, I dare you

we find that
  - `b` is neither DU nor DA after the if, so we can't shadow it;
  - `b` is final and not DU, so we can't write to it;
  - `b` is not DA, so we can't use it.

In other words, `b` is a permanent toxic waste zone: we can neither use, 
nor redeclare, nor assign it.  Urk.
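
The three bullets can be checked against today's definite-assignment rules with a hand-written desugaring. This is only a sketch: `instanceof` plus a cast stands in for the hypothetical `matches`, `String` stands in for `Foo`, and the class and method names are invented. The commented-out lines are the ones javac rejects.

```java
final class HoistedBindingDemo {
    // Hypothetical desugaring of "if (x matches String b) { }" under the
    // hoisted-blank-final design: b is declared for the whole block.
    static boolean matched(Object x) {
        final String b;
        boolean ok = x instanceof String;
        if (ok) {
            b = (String) x;          // b is DA only inside this branch
        }
        // After the if, b is neither DA nor DU, so none of these compile:
        // System.out.println(b);    // b might not have been initialized
        // b = "again";              // b might already have been assigned
        // String b = "shadow";      // b is already defined (and it is not DU,
        //                           // so "shadowing OK if DU" would not help)
        return ok;
    }

    public static void main(String[] args) {
        System.out.println(matched("text") + " " + matched(42));
    }
}
```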

Note too that our scoping rule is not really about unbalanced ifs; it's 
about abrupt completion.  This is reasonable too:

     if (x matches Foo(var a)) {
         println("Matched!");
     }
     else
         throw new NotFooException();
     // reasonable to use a here too!

Taking stock: our goal here was to try and use normal scopes and blank 
final semantics to describe binding variables, out of a desire to not 
introduce new concepts.  But it's a bad fit; the scope may be 
unnaturally large on the beginning side, and wherever we set the end of 
the scope, we are left choosing between bad situations (either something we 
want in scope is not, or something we don't want in scope is).  So 
traditional scopes are just a bad approximation, and what we gain in 
"reusing familiar concepts", we lose in the mismatch.

STEPPING BACK

What we realized at this point is that the essence of binding variables 
is their _conditionality_.  There is not a single logical old-style 
scope that describes the right set of places for a binding to be in 
scope, but there is a well-defined control-flow analysis that tells us 
exactly where we can use the binding, and where we can't.  This is the 
flow-scoping construct we initially worried was too "new and 
different."  But, after some further thought, and a few tweaks, this 
seems exactly what we want, and I think can be made understandable.

The basic idea behind flow-scoping is: a binding variable is in scope 
where it is well-defined, and not in scope when it is not. We'll provide 
a complete calculus, but the key thing to understand is that the rules 
of flow scoping are just plain old DA/DU; if a binding is DA, then it is 
well-defined.

In particular, flow-scoping can handle abrupt termination naturally; for 
a statement:

     if (x matches Foo(var a)) { A }
     else { B }
     C

the scope of `a` includes A, and also includes C iff B completes 
abruptly.  We can easily explain this as:
  - if x matches Foo(var a), we execute the A block, and in this case 
`a` is clearly well-defined (as we'd not execute A if the match failed);
  - if B completes abruptly, the only way to reach C is for the match to 
succeed, so `a` is well-defined in C as well.
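
The `matches` syntax in this note is a proposal, so it cannot be compiled as written. As a runnable sketch of the same shape, the following uses `instanceof` type patterns and assumes a Java 16 or later compiler, where pattern variables are scoped by this kind of flow analysis. `String` stands in for `Foo`, and the class and method names are invented for illustration.

```java
final class FlowScopeDemo {
    // if / else / C, where the else block (B) completes abruptly.
    static int lengthOrThrow(Object x) {
        if (x instanceof String a) {
            System.out.println("Matched!");                     // A: a is in scope
        } else {
            throw new IllegalArgumentException("not a String"); // B: completes abruptly
        }
        return a.length();   // C: only reachable after a successful match
    }

    // The short-circuiting form from the start of this note.
    static int lengthOrZero(Object x) {
        if (!(x instanceof String a)) {
            return 0;        // a is not in scope here
        }
        return a.length();   // a is in scope for the rest of the block
    }

    public static void main(String[] args) {
        System.out.println(lengthOrThrow("abc") + " " + lengthOrZero(42));
    }
}
```

If the `throw` in `lengthOrThrow` were replaced by a statement that completes normally, the final `return a.length()` would no longer compile, because C could then be reached without a match.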

Because the scope of a binding variable is precisely the cases in which 
it is well defined, there is no need to tinker with shadowing.

Conditional variables can now always be final, because they will never 
be in scope and not DA.

Similarly, folding reachability into scoping for conditional variables 
also means that fallthrough has a well-defined meaning. If we have:

     case Foo(int x): ... break;
     case Bar(int x): ...

then the Bar case is not reachable from where x would be initialized, so 
the first x is not in scope when the second x is declared, and 
everything is great.  On the other hand:

     case Foo(int x): ... no break ...
     case Bar(int x): ... A ...

now x is well-defined in A, no matter how we got there.  (The merging of 
the two xs is the same merging we have to do anyway for "if (x matches 
Foo(int a) || x matches Bar(int a))".)

People had originally expressed concern that flow-scoping leaves a scope 
"with holes", and allows puzzlers with shadowing of fields. (This is the 
"swiss cheese" problem.) For example:

     // Field
     String s;

     if (!(x matches String s)) {
         a(s);
     }
     else {
         b(s);
     }

This would be confusing because the `s` passed to a() is the field, but 
the `s` passed to b() is the binding.  But, there's a really simple way 
to prevent this: do not allow conditional variables to shadow fields or 
locals.  Now, there is no chance of this confusion, and this is not a 
big constraint, because the names of conditional variables are strictly 
local.  (Further, we can disallow shadowing of in-scope conditional 
variables by locals or by other conditional variables.)
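
The puzzler itself can be reproduced as a sketch with `instanceof` type patterns, assuming a Java 16 or later compiler. That language version does let a pattern variable shadow a field, so this program compiles there; the rule proposed above would reject it. The class and method names are invented, and the strings returned stand in for the calls to a() and b().

```java
final class SwissCheeseDemo {
    String s = "field";                // field with the same name as the binding

    String which(Object x) {
        if (!(x instanceof String s)) {
            return "a(" + s + ")";     // match failed: this s is the field
        } else {
            return "b(" + s + ")";     // match succeeded: this s is the binding
        }
    }

    public static void main(String[] args) {
        SwissCheeseDemo d = new SwissCheeseDemo();
        System.out.println(d.which(42));       // prints a(field)
        System.out.println(d.which("hello"));  // prints b(hello)
    }
}
```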

Scorecard:
  - Relatively straightforward to spec, as we have a clean calculus for 
flow-scoped conditional variables;
  - Relatively straightforward to implement (our prototype already does 
this);
  - One new concept: conditional variables;
  - Conditional vars are in scope where they make sense and not in scope 
where they do not; they cannot be assigned to (always DA and final when 
in scope) and are never in scope when not DA;
  - No changes to shadowing;
  - Meets all the target use cases.

On 11/3/2017 6:44 AM, Gavin Bierman wrote:
>
>
>     Scopes
>
> Java has five constructs that introduce fresh variables into scope: 
> the local variable declaration statement, the for statement, the 
> try-with-resources statement, the catch block, and lambda expressions. 
> The first, the local variable declaration statement, introduces 
> variables that are in scope for the rest of the block in which they are 
> declared. The others introduce variables that are limited in their scope.
>
> The addition of pattern matching brings a new expression, |matches|, 
> and extends the |switch| statement. Both these constructs can now 
> introduce fresh (and, if the pattern match succeeds, definitely 
> assigned (DA)) variables. But the question is /what is the scope of 
> these ‘pattern’ variables/?
>
> Let us consider the pattern matching constructs in turn. First the 
> |switch| statement:
>
> |switch (o) { case int i: ... case .. }|
>
> What is the scope of the pattern variable |i|? There are a range of 
> options.
>
> 1.
>
>     The scope of the pattern variable is from the start of the switch
>     statement until the end of the enclosing block.
>
>     In this case the pattern variable is in scope but would be
>     definitely unassigned (DU) immediately after the switch statement.
>
>     switch (o) {
>         case int i : ... // DA
>                      ... // DA
>         case T t :       // i is in scope
>     }
>     ... // i is still in scope and DU
>
>   * *+ve* Simple
>   * *-ve* Can’t simply reuse a pattern variable in the same switch
>     statement (without some form of shadowing)
>   * *-ve* Pattern variable poisons the rest of the block
>
> 2.
>
>     The scope of the pattern variable extends only to the end of the
>     switch block.
>
>     In this case the pattern variable would be considered DA only for
>     the statements between the current case label and the subsequent
>     case labeled statement. For example:
>
>     switch (o) {
>         case int i : ... // DA
>                      ... // DA
>         case T t :       // i is in scope but not DA
>     }
>     ... // i not in scope
>
>   * *+ve* Simple
>   * *+ve* Pattern variables not poisoned in subsequent statements in
>     the rest of the block
>   * *+ve* Similar technique to |for| identifiers (not a new idea)
>   * *-ve* Can’t simply reuse a pattern variable in the same switch
>     statement (without some form of shadowing)
>
> 3.
>
>     The scope of the pattern variable extends only to the next case label.
>
>     switch (o) {
>         case int i : ... // in scope and DA
>                      ... // in scope and DA
>         case T i :       // int i not in scope, so can re-use
>     }
>     ... // i not in scope
>
>   * *+ve* Simple syntactic rule
>   * *+ve* Allows reuse of pattern variable in the same switch statement.
>   * *-ve* Doesn’t make sense for fallthrough
>
> *NOTE* This final point is important - supporting fallthrough impacts 
> on what solution we might choose for scoping of pattern variables. (We 
> could not support fallthrough and instead support OR patterns - a 
> further design dimension.)
>
> *ASIDE* Should we support a |switch| /expression/; it seems clear that 
> scoping should be treated in the same way as it is for lambda expressions.
>
> The |matches| expression is unusual in that it is an /expression/ that 
> introduces a fresh variable. What is the scope of this variable? We 
> want it to be more than the expression itself, as we want the 
> following example code to be correct:
>
> |if (e matches String s) { System.out.println("It's a string - " + s); }|
>
> In other words, the variable introduced by the pattern needs to be in 
> scope for an enclosing IfThen statement.
>
> However, a |matches| expression could be nested within another 
> expression. It seems reasonable that the pattern variables are in 
> scope for at least the rest of the expression. For example:
>
> |(e matches String s || s.length() > 0) |
>
> Here the |s| should be in scope for the subexpression 
> |s.length| (although it is not DA). In contrast:
>
> |(e matches String s && s.length() > 0)|
>
> Here the |s| is both in scope and DA for the subexpression |s.length|.
>
> However, what about the following:
>
> |if (s.length() > 0 && e matches String s) { System.out.println(s); }|
>
> Given the idea that a pattern variable flows from the inside-out to 
> the enclosing statement, it would appear that |s| is in scope for the 
> subexpression |s.length|; although it is not DA. Unless we want scopes 
> to be non-contiguous, we will have to accept this rather odd situation 
> (consider where |s| shadows a field). [This appears to be what happens 
> in the current C# compiler.]
>
> Now let’s consider how far a pattern variable flows wrt its enclosing 
> statement. We have a range of options:
>
> 1.
>
>     The scope is both the statement that the match expression occurs
>     in and the rest of the block. In this scenario,
>
>     |if (o matches T t) { ... } else { ... }|
>
>     is treated as equivalent to the following pseudo-code (where
>     |match-and-bind| is a fictional pattern matching construct that
>     pattern-matches and binds to a variable that has already been
>     declared)
>
>     T t;
>     if (o match-and-bind t) {
>         // t in scope and DA
>     } else {
>         // t in scope and DU
>     }
>     // t in scope and DU
>
>     This is how the current C# compiler works (although the spec
>     describes the next option; so perhaps this is a bug).
>
> 2.
>
>     The scope is just the statement that the match expression occurs
>     in. In this scenario,
>
>     |if (o matches T t) { ... } else { } ...|
>
>     is treated as equivalent to the pseudo-code
>
>     {
>         T t;
>         if (o match-and-bind t) {
>             // t in scope and DA
>         } else {
>             // t in scope and DU
>             // thus declaration int t = 42; is not allowed.
>         }
>     }
>     // t not in scope
>     ...
>
> This restricted scope allows reuse of pattern variables, e.g.
>
> |if (o matches T x) { ... } if (o matches S x) { ... }|
>
> 3.
>
>     The scope of the pattern variable is determined by a flow analysis
>     of the enclosing statement. (It could be thought of as a
>     refinement of option 2.) This is currently implemented in the
>     prototype compiler. For example:
>
>     if (!!(o matches T t)) {
>         // t in scope
>     } else {
>         // t not in scope
>     }
>
>   * *+ve* Code will work in the presence of most refactorings
>   * *+ve* We have this code working already :-)
>   * *-ve* This is a break with the existing notion of scope as a
>     contiguous program fragment. A scope can now have holes in it.
>     Will users ever understand this? (Although the rules are /very/
>     similar to the flow-based rules for DA/DU.)
>
> *ASIDE* Regardless of whether we opt for (2) or (3) we may consider a 
> further extension where we allow the scope to extend beyond the 
> current statement for the case of an unbalanced |if| statement. For 
> example:
>
>     if (!(o matches T t)) {
>         return;
>     }
>     // t in scope
>     ...
>     return;
>
>   * *+ve* Supports a common idiom where else blocks are not needed
>   * *-ve* Yet further complication of notion of scope.
>
>
