Pattern features for next iteration
Remi Forax
forax at univ-mlv.fr
Thu Jan 21 15:44:46 UTC 2021
----- Mail original -----
> De: "Brian Goetz" <brian.goetz at oracle.com>
> À: "amber-spec-experts" <amber-spec-experts at openjdk.java.net>
> Envoyé: Mardi 19 Janvier 2021 20:59:00
> Objet: Pattern features for next iteration
> We're zeroing in on a feature list for the next iteration of pattern
> matching, which includes:
>
> - New kinds of patterns
> - Array patterns
> - Record deconstruction patterns, including varargs records
> - maybe: guard patterns (true(e), false(e))
> - maybe: AND patterns (P & Q), syntax TBD
>
> - New place for patterns: switch
> - Extend switch to support more types (all types?)
> - Extend switch on non-legacy types to support patterns as case labels
> - New nullity behavior (including case null)
> - New scoping + fallthrough
> - Refined exhaustiveness analysis
>
> This document is an overview; the immediate goal is coming to consensus
> that this is the set of features we should be targeting next. (Please,
> no deep-dives into details of, say, translation until this is locked down.)
>
For me: extending switch to support any types + type pattern + case null seems a reasonable goal.
About retconing switch, you talk about old switch vs new switch,
i think there is another possible decomposition, old "case" vs new pattern, i.e. "case constant" can be retrofitted as a pattern.
So a switch that mix type patterns and case can be written that way:
switch(o) {
case "foo" -> // equivalent to Object$$equals("foo") or type String && String$$equals("foo")
type String s ->
type Integer i ->
}
I've used "type" as keyword, it can be any other keyword ("instanceof" ?) or no keyword at all.
I believe there is more freedom in term of grammar even if there is no keyword and no false dichotomy between old vs new.
>
> #### Array patterns
>
> An array pattern has the form
> T[] { CommaSeparatedListOfPatternsWithOptionalTrailingDots }
> OptionalIdentifier
>
> The target type of an array pattern is T[], and it matches if:
> - the target is non-null
> - the target is an instance of T[]
> - the target has has an length equal to the arity of the nested
> pattern list, or, if the nested pattern list ends with a `...`, length
> greater than the arity of the nested pattern list
> - for each nested pattern P_i, the i'th element of the list matches P_i
>
> It binds the union of the bindings of the nested patterns. Like
> deconstruction patterns, it may optional bind a name to the target cast
> to the right type, if a name is provided:
>
> case T[] { P, Q } ts:
>
> For multi-dimensional arrays, we interpret the pattern
>
> T[][] { P, Q }
>
> as matching P and Q to the first and second elements of the outermost array:
>
> case T[][] { T[] a, T[] b }
> or
> case T[][] { T[] { var a, var b }, T[] { var c, ... } }
>
> Note that `T[] ts` is an ordinary type pattern, not an array pattern.
>
>
> #### Record patterns
>
> A record pattern has the form
>
> TypeName( CommaSeparatedListOfPatternsWithOptionalTrailingDots )
> OptionalIdentifier
>
> The type name must be a record class. It matches if:
>
> - the target is non-null
> - the target is an instance of the specified record type
> - each of the nested patterns matches the corresponding component of
> the record, as served by the accessor method
>
> The binding list is the union of the binding lists of the nested
> patterns; if the optional identifier is specified, it is bound to the
> target, cast to TypeName.
>
> The treatment of null components is as per the matching rules for the
> nested pattern; for type patterns, if the type is total on (a supertype
> of) the type of the component, it is presumed to match, otherwise a
> dynamic (instanceof) check is performed.
>
>
> #### Guard patterns
>
> A guard pattern has the form true(BooleanExpr) or false(BooleanExpr).
> It matches if the boolean expression is true or false, respectively. It
> matches any target type and binds no bindings.
>
>
> #### AND patterns
>
> Two patterns P and Q can be combined into an AND pattern denoted as P&Q
> (syntax provisional.) P and Q must both be applicable to the target
> type for P&Q to be; the binding list is the union of the binding lists.
> If P does not match the target, no attempt is made to match Q.
>
>
> #### Translation
>
> I'll write VARS(P) to describe the binding variables (name and type) of
> a pattern P, and TR(target, P) to describe the compiler-tree desugaring
> of a pattern match expression. When translating a pattern, its bindings
> are hoisted out to a suitable scope. Javac has as "LetExpr" tree type,
> `LetExpr{S*} in e`, which means we execute statements S* and evaluate
> expression `e`, and its result is the result of evaluating `e`. union_i
> is the union over a set indexed by i; product_i is the &&-product over
> the set. |S| is the cardinality of a set.
>
> Type patterns:
> VARS(T t) = { T t }
> TR(target, T t) = true // for statically total type patterns,
> otherwise:
> TR(target, T t) = LetExpr {
> boolean b = target instanceof T;
> if (b) t = (T) target;
> } in b
>
> Record patterns:
> VARS(R(P*)) = union_i VARS(P_i)
> TR(target, R(P*)) = LetExpr {
> boolean b = target instanceof T;
> R res = b ? (R) target : null;
> } in (b && product_i TR(res.comp_i(), P_i))
>
> Array patterns:
> VARS(T[] { P* }) = union_i VARS(P_i)
> TR(target, T[] { P* }) = LetExpr {
> boolean b = target instanceof T;
> T[] res = b ? (T[]) target : null;
> } in (b && res.length == |P*| && product_i
> TR(res[i], P_i))
> TR(target, T[] { P*, ... }) = LetExpr {
> boolean b = target instanceof T;
> T[] res = b ? (T[]) target : null;
> } in (b && res.length >= |P*| && product_i
> TR(res[i], P_i))
>
> Guard patterns:
> VARS(true(e)) = { }
> VARS(false(e)) = { }
> TR(true(e)) = e
> TR(false(e)) = !e
>
> AND patterns:
> VARS(P&Q) = VARS(P) union VARS(Q)
> TR(P&Q) = TR(P) && TR(Q)
>
> The above represents an unrolling of nested patterns, where t~P(Q) is
> unrolled into t~P(a) && a~Q (where t~P means "P matches t").
>
> #### Binary compatibility
>
> In the future, it will be possible to explicitly declare deconstruction
> patterns in records. In order to make adding an explicit canonical
> deconstruction pattern a binary-compatible operation, by the time we
> exit preview, some of this logic should be funneled through a
> condy-based pattern runtime so that old binaries that ask for the
> canonical deconstruction pattern for a record will get a reasonable
> default if no explicit one exists, or the one in the class file if one
> does. This entails replacing `res.comp_i()` in the translation with
> calls to the pattern runtime.
>
>
> ## Patterns in switch
>
> Introducing patterns into case labels creates another round of
> adjustments for switch, and probably some new seams between old and new
> functionality.
>
> #### More types
>
> Switch is currently limited to integral primitive types and their boxes,
> strings, and enums. We'll these types "legacy switch types", and call
> switches on legacy switch types, for which all of whose case labels are
> "old-style" case labels, "legacy switches". (I'll develop the list of
> candidate restrictions on legacy switches as I go; ideally this list
> should be empty, but it probably won't be.)
>
> Pattern matching surely lets us switch over any reference type. It is an
> open question whether we should fill out the three remaining primitive
> types (boolean, float, double.) On the one hand, it seems oddly
> incomplete to be able to switch over all but three types, while on the
> other, switching over floats seems pretty niche, and we already have a
> statement for switch over booleans (`if`).
>
> Assuming we decide for completeness, there are two interpretations of
> switching on float: whether this is a legacy or pattern switch.
>
> Note that for nested patterns, we need to have type patterns for all the
> primitive types, so we can match against `Foo(boolean b)` or `Bar(float
> f)`, even if we don't allow switching on these types.
>
> #### Patterns as case labels
>
> The main change here, of course, is allowing patterns as case labels.
> The specified pattern must be applicable to the static type of the
> switch operand.
>
> One question we need to answer is how to integrate legacy case labels
> into the story. There are basically three options here:
>
> - Retcon them. In this approach, we define a constant literal to be a
> constant pattern. Then, all switches can be interpreted as pattern
> switches.
>
> - Tolerated legacy. With `instanceof`, rather than defining a bare
> type to be a pattern, we defined two forms of the `instanceof`
> production, a legacy one (`instanceof T`) and a pattern one. We can do
> the same with case labels, and possibly then say that legacy case labels
> are only allowed in legacy switches.
>
> - Leave them to rot. In this approach, legacy case labels are allowed
> on legacy switches only.
>
> The difference between the first (retcon them as constant patterns) is
> whether literals are valid as patterns in contexts other than `case`
> labels (instanceof, nested patterns.) I am steadily converging on the
> conclusion that constant patterns are an "obvious but wrong" idea, more
> "cute and clever" than sound language design.
>
> Note that with a sensible guard feature, constant patterns become pure
> syntactic shortcut; `Foo(0)` can be expressed as `Foo(var x) & true(x ==
> 0)` in any pattern context. That's not to say that the benefit of the
> shortcut is zero, but there's also a cost; further user confusion as to
> whether `id(0)` is an invocation of a method with a constant, or
> matching a pattern with a nested constant pattern. (If we decide we
> really want constant patterns because the nestability leads to better
> readability, I prefer we find another way to denote them, such as `== 0`
> or `const(0)` or such.)
>
> So, let's assume that we're going to be somewhere on the spectrum
> between "tolerate" and "rot". There are two questions to answer here:
>
> - In switches on legacy types, do we allow a mix of constant cases and
> pattern cases? (Note that this only makes a difference when we have
> more interesting patterns that can apply to these types, which mostly
> means waiting for declared patterns.)
>
> - In non-legacy-switches, do we allow constant cases at all?
>
> If we say "yes" to either of these, then essentially we treat any
> pattern with a mix of these as a pattern switch, and lower `case C` to
> `case TargetType t & true(t.equals(C))` or similar.
>
> #### Nullity
>
> We've gone through an extensive discussion (actually, many extensive
> discussions) on nullity in switch. It took several rounds to get over
> the "why can't we just always throw on null", but I hope we're there
> now. It seems that the sensible strategy is:
>
> - At the top level of a switch, total type patterns, and the new `case
> null`, accept null, and no others. (Default does not accept null, but
> you can fall out of a `case null` into a default.) If a switch has no
> null-accepting patterns, then the switch throws NPE preemptively
> (simulating current behavior), as if there is an invisible trailing
> (possibly dead) `case null: NPE` in all switches.
>
> This strategy is fully backward compatible; anything that throws today,
> will throw tomorrow. Total patterns must come last, so only the last
> pattern, or an explicit `case null`, will see a null.
>
> For nested patterns, we've already defined the semantics of pattern
> matching such that pattern matching never NPEs, and never introduces any
> failures to match null that don't derive from an underlying dynamic type
> test. The problem is solely with expectations having to do with the top
> level of switch, based on historical preemptive null checks.
>
> #### Scoping and fallthrough
>
> Having decided to backtrack on merging binding variables in instanceof
> (`x instanceof Foo(var a) || x instanceof Bar(var a))`), the
> corresponding backtracking is to restrict fallthrough such that you
> cannot fall into, or out of, a case clause that has bindings.
>
> This may require some adjustment in the scoping of binding variables,
> which is probably not yet fully defined for switches, to ensure that
> their scope ends at the next case declaration, so that we don't have a
> problem with reusing names:
>
> case Foo(var x): break;
> case Bar(var x): // OK to use `x` here
>
> It is OK to fall out out of a case without bindings into a case without
> bindings, such as falling out of `case null` into `default`.
>
> #### Exhaustiveness
>
> There's a definition of a set of patterns being total on a target type,
> outlined here:
> https://github.com/openjdk/amber-docs/blob/master/site/design-notes/type-patterns-in-switch.md.
>
>
> The main new feature here is allowing pattern sets that are
> (optimistically) total on the target to be considered total in a switch
> expression, and insert the appropriate throwing catch-call to handle the
> remainder.
>
> Secondarily, we still have a question to confront about how to make
> statement switches total like expression switches, so that users can opt
> into stricter type checking.
>
> #### Translation considerations
>
> Legacy switches on primitives can continue to get translated as they are
> now.
>
> Pattern switches are lowered to densely numbered int switches, and we
> use an indy-based classifier to select the candidate case. The
> bootstrap takes as its static argument list a list of Class objects (or
> null), and as its dynamic arguments, the switch target and the case
> number to start at. For example, we translate:
>
> switch (o) {
> case Telegram t: A
> case Integer i && true(i > 0): B
> case Integer i: C
> case Foo(Bar(var x)): D
> case "moose": ...
> }
>
> as (pseudocode):
>
> int index = -1;
> LOOP:
> while (true) {
> index = indy[bootstrap=SwitchBootstrap[Telegram.class,
> Integer.class, Integer.class, Foo.class, String.class]](o, index);
> switch (idx) {
> case 0: A
> case 1:
> if (!(i > 0))
> continue LOOP;
> B;
> case 2: C;
> case 3:
> if (!(fooBinding1 instanceof Bar(var x))
> continue LOOP;
> D
> case 4:
> if (!(o.equals("moose"))
> continue LOOP;
> }
> break LOOP;
> }
>
> The classifier builds a decision tree based on types, but does not
> include nested patterns or guards -- these are added by the desugaring.
> If there is anything that causes us to enter a case and then later
> decide "whoops", we break out of the switch, and re-run the classifier,
> starting at the last index we tried. (Earlier experiments suggested this
> was the sweet spot; while we can accurately model patterns and guards as
> constant bundles of method handles, the complexity is high and the
> benefit is not so high.
>
> For record/array patterns, we use the classifier to select the type, and
> then unroll any nested patterns into the switch body. For ANDed
> patterns, we classify on the type of the first pattern and unroll the
> rest into the switch body.
>
> For legacy switches on enums and strings, we have the option of
> replacing the current desugaring with an indy-based classification as we
> do with pattern switches, to move complex desugaring code from the
> compiler to simpler runtime code. We can do this now, later, or never.
>
>
> #### Open issues
>
> We have a few open issues to sync on, though almost all can probably be
> kicked down the road past first preview:
>
> - Boundaries between legacy switches and pattern switches
> - allow mixing of constant and pattern cases?
> - allow switch on float/double/boolean with constant labels?
> - Allow statement switches to opt into totality? How?
Rémi
More information about the amber-spec-experts
mailing list