Pattern features for next iteration

Thu Jan 21 15:44:46 UTC 2021

----- Mail original -----
> De: "Brian Goetz" <brian.goetz at oracle.com>
> À: "amber-spec-experts" <amber-spec-experts at openjdk.java.net>
> Envoyé: Mardi 19 Janvier 2021 20:59:00
> Objet: Pattern features for next iteration

> We're zeroing in on a feature list for the next iteration of pattern
> matching, which includes:
> 
>  - New kinds of patterns
>    - Array patterns
>    - Record deconstruction patterns, including varargs records
>    - maybe: guard patterns (true(e), false(e))
>    - maybe: AND patterns (P & Q), syntax TBD
> 
>  - New place for patterns: switch
>    - Extend switch to support more types (all types?)
>    - Extend switch on non-legacy types to support patterns as case labels
>    - New nullity behavior (including case null)
>    - New scoping + fallthrough
>    - Refined exhaustiveness analysis
> 
> This document is an overview; the immediate goal is coming to consensus
> that this is the set of features we should be targeting next.  (Please,
> no deep-dives into details of, say, translation until this is locked down.)
> 

For me: extending switch to support any types + type pattern + case null seems a reasonable goal.

About retconing switch, you talk about old switch vs new switch,
i think there is another possible decomposition, old "case" vs new pattern, i.e. "case constant" can be retrofitted as a pattern.

So a switch that mix type patterns and case can be written that way:
  switch(o) {
    case "foo" ->    // equivalent to Object$$equals("foo") or type String && String$$equals("foo")
    type String s -> 
    type Integer i ->
  } 

I've used "type" as keyword, it can be any other keyword ("instanceof" ?) or no keyword at all.

I believe there is more freedom in term of grammar even if there is no keyword and no false dichotomy between old vs new. 

> 
> #### Array patterns
> 
> An array pattern has the form
>     T[] { CommaSeparatedListOfPatternsWithOptionalTrailingDots }
> OptionalIdentifier
> 
> The target type of an array pattern is T[], and it matches if:
>  - the target is non-null
>  - the target is an instance of T[]
>  - the target has has an length equal to the arity of the nested
> pattern list, or, if the nested pattern list ends with a `...`, length
> greater than the arity of the nested pattern list
>  - for each nested pattern P_i, the i'th element of the list matches P_i
> 
> It binds the union of the bindings of the nested patterns.  Like
> deconstruction patterns, it may optional bind a name to the target cast
> to the right type, if a name is provided:
> 
>     case T[] { P, Q } ts:
> 
> For multi-dimensional arrays, we interpret the pattern
> 
>     T[][] { P, Q }
> 
> as matching P and Q to the first and second elements of the outermost array:
> 
>     case T[][] { T[] a, T[] b }
> or
>     case T[][] { T[] { var a, var b }, T[] { var c, ... } }
> 
> Note that `T[] ts` is an ordinary type pattern, not an array pattern.
> 
> 
> #### Record patterns
> 
> A record pattern has the form
> 
>     TypeName( CommaSeparatedListOfPatternsWithOptionalTrailingDots )
> OptionalIdentifier
> 
> The type name must be a record class.  It matches if:
> 
>  - the target is non-null
>  - the target is an instance of the specified record type
>  - each of the nested patterns matches the corresponding component of
> the record, as served by the accessor method
> 
> The binding list is the union of the binding lists of the nested
> patterns; if the optional identifier is specified, it is bound to the
> target, cast to TypeName.
> 
> The treatment of null components is as per the matching rules for the
> nested pattern; for type patterns, if the type is total on (a supertype
> of) the type of the component, it is presumed to match, otherwise a
> dynamic (instanceof) check is performed.
> 
> 
> #### Guard patterns
> 
> A guard pattern has the form true(BooleanExpr) or false(BooleanExpr).
> It matches if the boolean expression is true or false, respectively.  It
> matches any target type and binds no bindings.
> 
> 
> #### AND patterns
> 
> Two patterns P and Q can be combined into an AND pattern denoted as P&Q
> (syntax provisional.)  P and Q must both be applicable to the target
> type for P&Q to be; the binding list is the union of the binding lists.
> If P does not match the target, no attempt is made to match Q.
> 
> 
> #### Translation
> 
> I'll write VARS(P) to describe the binding variables (name and type) of
> a pattern P, and TR(target, P) to describe the compiler-tree desugaring
> of a pattern match expression.  When translating a pattern, its bindings
> are hoisted out to a suitable scope.  Javac has as "LetExpr" tree type,
> `LetExpr{S*} in e`, which means we execute statements S* and evaluate
> expression `e`, and its result is the result of evaluating `e`.  union_i
> is the union over a set indexed by i; product_i is the &&-product over
> the set.  |S| is the cardinality of a set.
> 
> Type patterns:
>     VARS(T t) = { T t }
>     TR(target, T t) = true   // for statically total type patterns,
> otherwise:
>     TR(target, T t) = LetExpr {
>                         boolean b = target instanceof T;
>                         if (b) t = (T) target;
>                       } in b
> 
> Record patterns:
>     VARS(R(P*)) = union_i VARS(P_i)
>     TR(target, R(P*)) = LetExpr {
>                     boolean b = target instanceof T;
>                     R res = b ? (R) target : null;
>                 } in (b && product_i TR(res.comp_i(), P_i))
> 
> Array patterns:
>     VARS(T[] { P* }) = union_i VARS(P_i)
>     TR(target, T[] { P* }) = LetExpr {
>                         boolean b = target instanceof T;
>                         T[] res = b ? (T[]) target : null;
>                     } in (b && res.length == |P*| && product_i
> TR(res[i], P_i))
>     TR(target, T[] { P*, ... }) = LetExpr {
>                         boolean b = target instanceof T;
>                         T[] res = b ? (T[]) target : null;
>                     } in (b && res.length >= |P*| && product_i
> TR(res[i], P_i))
> 
> Guard patterns:
>     VARS(true(e)) = { }
>     VARS(false(e)) = { }
>     TR(true(e)) = e
>     TR(false(e)) = !e
> 
> AND patterns:
>     VARS(P&Q) = VARS(P) union VARS(Q)
>     TR(P&Q) = TR(P) && TR(Q)
> 
> The above represents an unrolling of nested patterns, where t~P(Q) is
> unrolled into t~P(a) && a~Q (where t~P means "P matches t").
> 
> #### Binary compatibility
> 
> In the future, it will be possible to explicitly declare deconstruction
> patterns in records.  In order to make adding an explicit canonical
> deconstruction pattern a binary-compatible operation, by the time we
> exit preview, some of this logic should be funneled through a
> condy-based pattern runtime so that old binaries that ask for the
> canonical deconstruction pattern for a record will get a reasonable
> default if no explicit one exists, or the one in the class file if one
> does.  This entails replacing `res.comp_i()` in the translation with
> calls to the pattern runtime.
> 
> 
> ## Patterns in switch
> 
> Introducing patterns into case labels creates another round of
> adjustments for switch, and probably some new seams between old and new
> functionality.
> 
> #### More types
> 
> Switch is currently limited to integral primitive types and their boxes,
> strings, and enums.  We'll these types "legacy switch types", and call
> switches on legacy switch types, for which all of whose case labels are
> "old-style" case labels, "legacy switches". (I'll develop the list of
> candidate restrictions on legacy switches as I go; ideally this list
> should be empty, but it probably won't be.)
> 
> Pattern matching surely lets us switch over any reference type. It is an
> open question whether we should fill out the three remaining primitive
> types (boolean, float, double.)  On the one hand, it seems oddly
> incomplete to be able to switch over all but three types, while on the
> other, switching over floats seems pretty niche, and we already have a
> statement for switch over booleans (`if`).
> 
> Assuming we decide for completeness, there are two interpretations of
> switching on float: whether this is a legacy or pattern switch.
> 
> Note that for nested patterns, we need to have type patterns for all the
> primitive types, so we can match against `Foo(boolean b)` or `Bar(float
> f)`, even if we don't allow switching on these types.
> 
> #### Patterns as case labels
> 
> The main change here, of course, is allowing patterns as case labels.
> The specified pattern must be applicable to the static type of the
> switch operand.
> 
> One question we need to answer is how to integrate legacy case labels
> into the story.  There are basically three options here:
> 
>  - Retcon them.  In this approach, we define a constant literal to be a
> constant pattern.  Then, all switches can be interpreted as pattern
> switches.
> 
>  - Tolerated legacy.  With `instanceof`, rather than defining a bare
> type to be a pattern, we defined two forms of the `instanceof`
> production, a legacy one (`instanceof T`) and a pattern one.  We can do
> the same with case labels, and possibly then say that legacy case labels
> are only allowed in legacy switches.
> 
>  - Leave them to rot.  In this approach, legacy case labels are allowed
> on legacy switches only.
> 
> The difference between the first (retcon them as constant patterns) is
> whether literals are valid as patterns in contexts other than `case`
> labels (instanceof, nested patterns.)  I am steadily converging on the
> conclusion that constant patterns are an "obvious but wrong" idea, more
> "cute and clever" than sound language design.
> 
> Note that with a sensible guard feature, constant patterns become pure
> syntactic shortcut; `Foo(0)` can be expressed as `Foo(var x) & true(x ==
> 0)` in any pattern context.  That's not to say that the benefit of the
> shortcut is zero, but there's also a cost; further user confusion as to
> whether `id(0)` is an invocation of a method with a constant, or
> matching a pattern with a nested constant pattern.  (If we decide we
> really want constant patterns because the nestability leads to better
> readability, I prefer we find another way to denote them, such as `== 0`
> or `const(0)` or such.)
> 
> So, let's assume that we're going to be somewhere on the spectrum
> between "tolerate" and "rot".  There are two questions to answer here:
> 
>  - In switches on legacy types, do we allow a mix of constant cases and
> pattern cases?  (Note that this only makes a difference when we have
> more interesting patterns that can apply to these types, which mostly
> means waiting for declared patterns.)
> 
>  - In non-legacy-switches, do we allow constant cases at all?
> 
> If we say "yes" to either of these, then essentially we treat any
> pattern with a mix of these as a pattern switch, and lower `case C` to
> `case TargetType t & true(t.equals(C))` or similar.
> 
> #### Nullity
> 
> We've gone through an extensive discussion (actually, many extensive
> discussions) on nullity in switch.  It took several rounds to get over
> the "why can't we just always throw on null", but I hope we're there
> now.  It seems that the sensible strategy is:
> 
>  - At the top level of a switch, total type patterns, and the new `case
> null`, accept null, and no others.  (Default does not accept null, but
> you can fall out of a `case null` into a default.)  If a switch has no
> null-accepting patterns, then the switch throws NPE preemptively
> (simulating current behavior), as if there is an invisible trailing
> (possibly dead) `case null: NPE` in all switches.
> 
> This strategy is fully backward compatible; anything that throws today,
> will throw tomorrow.  Total patterns must come last, so only the last
> pattern, or an explicit `case null`, will see a null.
> 
> For nested patterns, we've already defined the semantics of pattern
> matching such that pattern matching never NPEs, and never introduces any
> failures to match null that don't derive from an underlying dynamic type
> test.  The problem is solely with expectations having to do with the top
> level of switch, based on historical preemptive null checks.
> 
> #### Scoping and fallthrough
> 
> Having decided to backtrack on merging binding variables in instanceof
> (`x instanceof Foo(var a) || x instanceof Bar(var a))`), the
> corresponding backtracking is to restrict fallthrough such that you
> cannot fall into, or out of, a case clause that has bindings.
> 
> This may require some adjustment in the scoping of binding variables,
> which is probably not yet fully defined for switches, to ensure that
> their scope ends at the next case declaration, so that we don't have a
> problem with reusing names:
> 
>     case Foo(var x): break;
>     case Bar(var x): // OK to use `x` here
> 
> It is OK to fall out out of a case without bindings into a case without
> bindings, such as falling out of `case null` into `default`.
> 
> #### Exhaustiveness
> 
> There's a definition of a set of patterns being total on a target type,
> outlined here:
> https://github.com/openjdk/amber-docs/blob/master/site/design-notes/type-patterns-in-switch.md.
> 
> 
> The main new feature here is allowing pattern sets that are
> (optimistically) total on the target to be considered total in a switch
> expression, and insert the appropriate throwing catch-call to handle the
> remainder.
> 
> Secondarily, we still have a question to confront about how to make
> statement switches total like expression switches, so that users can opt
> into stricter type checking.
> 
> #### Translation considerations
> 
> Legacy switches on primitives can continue to get translated as they are
> now.
> 
> Pattern switches are lowered to densely numbered int switches, and we
> use an indy-based classifier to select the candidate case.  The
> bootstrap takes as its static argument list a list of Class objects (or
> null), and as its dynamic arguments, the switch target and the case
> number to start at.  For example, we translate:
> 
>     switch (o) {
>         case Telegram t: A
>         case Integer i && true(i > 0): B
>         case Integer i: C
>         case Foo(Bar(var x)): D
>         case "moose": ...
>     }
> 
> as (pseudocode):
> 
>     int index = -1;
>     LOOP:
>     while (true) {
>         index = indy[bootstrap=SwitchBootstrap[Telegram.class,
> Integer.class, Integer.class, Foo.class, String.class]](o, index);
>         switch (idx) {
>             case 0: A
>             case 1:
>                 if (!(i > 0))
>                     continue LOOP;
>                 B;
>             case 2: C;
>             case 3:
>                 if (!(fooBinding1 instanceof Bar(var x))
>                     continue LOOP;
>                 D
>             case 4:
>                 if (!(o.equals("moose"))
>                     continue LOOP;
>         }
>         break LOOP;
>     }
> 
> The classifier builds a decision tree based on types, but does not
> include nested patterns or guards -- these are added by the desugaring.
> If there is anything that causes us to enter a case and then later
> decide "whoops", we break out of the switch, and re-run the classifier,
> starting at the last index we tried. (Earlier experiments suggested this
> was the sweet spot; while we can accurately model patterns and guards as
> constant bundles of method handles, the complexity is high and the
> benefit is not so high.
> 
> For record/array patterns, we use the classifier to select the type, and
> then unroll any nested patterns into the switch body.  For ANDed
> patterns, we classify on the type of the first pattern and unroll the
> rest into the switch body.
> 
> For legacy switches on enums and strings, we have the option of
> replacing the current desugaring with an indy-based classification as we
> do with pattern switches, to move complex desugaring code from the
> compiler to simpler runtime code.  We can do this now, later, or never.
> 
> 
> #### Open issues
> 
> We have a few open issues to sync on, though almost all can probably be
> kicked down the road past first preview:
> 
>  - Boundaries between legacy switches and pattern switches
>    - allow mixing of constant and pattern cases?
>    - allow switch on float/double/boolean with constant labels?
>   - Allow statement switches to opt into totality?  How?

Rémi