[External] : Re: Pattern features for next iteration
forax at univ-mlv.fr
Fri Jan 22 13:16:13 UTC 2021
----- Original Message -----
> From: "Brian Goetz" <brian.goetz at oracle.com>
> To: "Remi Forax" <forax at univ-mlv.fr>
> Cc: "amber-spec-experts" <amber-spec-experts at openjdk.java.net>
> Sent: Thursday, January 21, 2021 17:01:09
> Subject: Re: [External] : Re: Pattern features for next iteration
>> For me: extending switch to support any type + type patterns + case null seems a
>> reasonable goal.
>>
>> About retconning switch: you talk about old switch vs new switch;
>> I think there is another possible decomposition, old "case" vs new pattern, i.e.
>> "case constant" can be retrofitted as a pattern.
>
> As you suggest, the keyword "type" doesn't scale, since there will be
> patterns that are not simply type patterns. But another keyword, such
> as perhaps "pattern", wouldn't have this problem.
>
> Fundamentally, this is a lump-vs-split choice, and there's room for
> reasonable disagreement about whether it is better to lump or split in
> any given situation. (For those keeping track at home, if we did all
> the "split" moves people have suggested on switch -- statement vs
> expression, nullable vs not, takes patterns vs not, total vs not, etc,
> we'd have something like 2^6 variants of switch. Obviously no one is
> directly suggesting that, but my point is, there is a very limited
> budget for splitting, even when there is something worth splitting
> over.) I agree that lumping -- which has been our default move here --
> can create seams and cracks, but I am not sure splitting isn't worse,
> either in general, or in this case.
>
> For the record, I am deeply suspicious of the "old switch vs new"
> split. It came up when we did expression switch, but I was skeptical
> then (and still am), because there will surely be a stream of
> improvements that we may eventually want to apply to switch. Splitting
> into "old and new" plays into the fantasy that "this is the last time
> we'll ever do this, I swear." But that seems unlikely. Lumping
> statement and expression switches did create some rough edges -- notably
> the totality difference -- but I think the alternative would have been
> worse, and, even had we split then, it's not clear we wouldn't still be
> talking about splitting again now.
>
> So, I think it would be better to focus on _why_ it seems appealing to
> split here. The "seams" I mentioned come from the fact that the old
> switch behavior doesn't scale to more general switches, so needs to be
> corralled somewhere. But the existence of seams isn't intrinsically a
> problem; sure, they add surface area to the language. The real risk to
> be managed here is that of user confusion. And, it's easy to manufacture
> examples of "a user could make this mistake", but what we should really
> be worried about is mistakes that users risk making over and over.
>
> So before we jump into syntax proposals, which are "solutions", can we
> be more clear about what problems you are worried about?
I'm not worried. You list three ways to introduce patterns into switch; I'm just saying there is a fourth: lump + split.
It's a lump because you can mix old "case"s and new patterns, and it's a split because you have a prefix that indicates whether it's a case or a pattern.
Currently we can have
case constant
case Class.constantName
case enumConstantName
and we want to add
case Class bindingName
case Class()
case Class.patternName()
case patternName()
but the following are not valid
case Class(constant)
case Class(Class.constantName)
case Class(enumConstantName)
case Class.patternName(constant)
case Class.patternName(Class.constantName)
case Class.patternName(enumConstantName)
case patternName(constant)
case patternName(Class.constantName)
case patternName(enumConstantName)
Roughly, "case" can be followed by an expression (old case) or a pattern (new syntax) but for nested values (in between parenthesis) only patterns are valid apart true() or false() that acts as bridge.
So semantically, there is a split in between expressions and patterns, so it may be interesting to have that same split reflected into the grammar.
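
For example (provisional syntax, with a hypothetical Point record), mixing is fine at the top level but not in nested positions:

  switch (o) {
    case "foo" -> ...                // old "case": an expression
    case Point(var x, var y) -> ...  // a pattern
    // case Point(0) -> ...          // invalid: a constant is not a pattern,
    //                               // so it cannot appear in a nested position
  }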
Rémi
>
>
>>
>> So a switch that mixes type patterns and cases can be written this way:
>> switch(o) {
>>   case "foo" ->    // equivalent to Object$$equals("foo"),
>>                    // or type String && String$$equals("foo")
>>   type String s ->
>>   type Integer i ->
>> }
>>
>> I've used "type" as keyword, it can be any other keyword ("instanceof" ?) or no
>> keyword at all.
>>
>> I believe there is more freedom in term of grammar even if there is no keyword
>> and no false dichotomy between old vs new.
>>
>>> #### Array patterns
>>>
>>> An array pattern has the form
>>>   T[] { CommaSeparatedListOfPatternsWithOptionalTrailingDots } OptionalIdentifier
>>>
>>> The target type of an array pattern is T[], and it matches if:
>>> - the target is non-null
>>> - the target is an instance of T[]
>>> - the target has a length equal to the arity of the nested
>>> pattern list, or, if the nested pattern list ends with a `...`, a length
>>> greater than or equal to the arity of the nested pattern list
>>> - for each nested pattern P_i, the i'th element of the list matches P_i
>>>
>>> It binds the union of the bindings of the nested patterns. Like
>>> deconstruction patterns, it may optionally bind a name to the target, cast
>>> to the right type, if a name is provided:
>>>
>>> case T[] { P, Q } ts:
>>>
>>> For multi-dimensional arrays, we interpret the pattern
>>>
>>> T[][] { P, Q }
>>>
>>> as matching P and Q to the first and second elements of the outermost array:
>>>
>>> case T[][] { T[] a, T[] b }
>>> or
>>> case T[][] { T[] { var a, var b }, T[] { var c, ... } }
>>>
>>> Note that `T[] ts` is an ordinary type pattern, not an array pattern.
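>>>
>>> For concreteness, here is a rough hand-desugaring of matching
>>> `String[] { var a, var b }` against `o`, in today's Java (illustration
>>> only, not the proposed translation):
>>>
>>> if (o instanceof String[] arr && arr.length == 2) {
>>>     String a = arr[0];    // nested patterns match against the elements
>>>     String b = arr[1];
>>>     // ... use a and b
>>> }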
>>>
>>>
>>> #### Record patterns
>>>
>>> A record pattern has the form
>>>
>>>   TypeName( CommaSeparatedListOfPatternsWithOptionalTrailingDots ) OptionalIdentifier
>>>
>>> The type name must be a record class. It matches if:
>>>
>>> - the target is non-null
>>> - the target is an instance of the specified record type
>>> - each of the nested patterns matches the corresponding component of
>>> the record, as served by the accessor method
>>>
>>> The binding list is the union of the binding lists of the nested
>>> patterns; if the optional identifier is specified, it is bound to the
>>> target, cast to TypeName.
>>>
>>> The treatment of null components is as per the matching rules for the
>>> nested pattern; for type patterns, if the type is total on (a supertype
>>> of) the type of the component, it is presumed to match, otherwise a
>>> dynamic (instanceof) check is performed.
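>>>
>>> As a rough illustration in today's Java, matching `Point(var x, var y)`
>>> against `o`, for a hypothetical `record Point(int x, int y) {}`,
>>> corresponds to:
>>>
>>> if (o instanceof Point p) {
>>>     var x = p.x();    // nested patterns match against the accessors
>>>     var y = p.y();
>>>     // ... use x and y
>>> }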
>>>
>>>
>>> #### Guard patterns
>>>
>>> A guard pattern has the form true(BooleanExpr) or false(BooleanExpr).
>>> It matches if the boolean expression is true or false, respectively. It
>>> matches any target type and binds no bindings.
>>>
>>>
>>> #### AND patterns
>>>
>>> Two patterns P and Q can be combined into an AND pattern denoted as P&Q
>>> (syntax provisional.) P and Q must both be applicable to the target
>>> type for P&Q to be applicable; the binding list is the union of the binding lists.
>>> If P does not match the target, no attempt is made to match Q.
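>>>
>>> For example (provisional syntax), combining a type pattern with a guard:
>>>
>>> case Integer i & true(i > 0):   // matches positive Integers only
>>> case Integer i & false(i > 0):  // matches zero and negative Integers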
>>>
>>>
>>> #### Translation
>>>
>>> I'll write VARS(P) to describe the binding variables (name and type) of
>>> a pattern P, and TR(target, P) to describe the compiler-tree desugaring
>>> of a pattern match expression. When translating a pattern, its bindings
>>> are hoisted out to a suitable scope. Javac has a "LetExpr" tree type,
>>> `LetExpr{S*} in e`, which means we execute statements S* and evaluate
>>> expression `e`, and its result is the result of evaluating `e`. union_i
>>> is the union over a set indexed by i; product_i is the &&-product over
>>> the set. |S| is the cardinality of a set.
>>>
>>> Type patterns:
>>>   VARS(T t) = { T t }
>>>   TR(target, T t) = true   // for statically total type patterns; otherwise:
>>>   TR(target, T t) = LetExpr {
>>>       boolean b = target instanceof T;
>>>       if (b) t = (T) target;
>>>   } in b
>>>
>>> Record patterns:
>>>   VARS(R(P*)) = union_i VARS(P_i)
>>>   TR(target, R(P*)) = LetExpr {
>>>       boolean b = target instanceof R;
>>>       R res = b ? (R) target : null;
>>>   } in (b && product_i TR(res.comp_i(), P_i))
>>>
>>> Array patterns:
>>>   VARS(T[] { P* }) = union_i VARS(P_i)
>>>   TR(target, T[] { P* }) = LetExpr {
>>>       boolean b = target instanceof T[];
>>>       T[] res = b ? (T[]) target : null;
>>>   } in (b && res.length == |P*| && product_i TR(res[i], P_i))
>>>   TR(target, T[] { P*, ... }) = LetExpr {
>>>       boolean b = target instanceof T[];
>>>       T[] res = b ? (T[]) target : null;
>>>   } in (b && res.length >= |P*| && product_i TR(res[i], P_i))
>>>
>>> Guard patterns:
>>>   VARS(true(e)) = { }
>>>   VARS(false(e)) = { }
>>>   TR(target, true(e)) = e
>>>   TR(target, false(e)) = !e
>>>
>>> AND patterns:
>>>   VARS(P&Q) = VARS(P) union VARS(Q)
>>>   TR(target, P&Q) = TR(target, P) && TR(target, Q)
>>>
>>> The above represents an unrolling of nested patterns, where t~P(Q) is
>>> unrolled into t~P(a) && a~Q (where t~P means "P matches t").
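>>>
>>> As an illustration in today's Java (assuming hypothetical records
>>> `record Foo(Object bar) {}` and `record Bar(int x) {}`), the unrolling of
>>> target~Foo(Bar(var x)) is roughly:
>>>
>>> boolean matched = false;
>>> if (target instanceof Foo f && f.bar() instanceof Bar b) {
>>>     int x = b.x();    // the nested pattern's binding
>>>     matched = true;
>>> }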
>>>
>>> #### Binary compatibility
>>>
>>> In the future, it will be possible to explicitly declare deconstruction
>>> patterns in records. In order to make adding an explicit canonical
>>> deconstruction pattern a binary-compatible operation, by the time we
>>> exit preview, some of this logic should be funneled through a
>>> condy-based pattern runtime so that old binaries that ask for the
>>> canonical deconstruction pattern for a record will get a reasonable
>>> default if no explicit one exists, or the one in the class file if one
>>> does. This entails replacing `res.comp_i()` in the translation with
>>> calls to the pattern runtime.
>>>
>>>
>>> ## Patterns in switch
>>>
>>> Introducing patterns into case labels creates another round of
>>> adjustments for switch, and probably some new seams between old and new
>>> functionality.
>>>
>>> #### More types
>>>
>>> Switch is currently limited to integral primitive types and their boxes,
>>> strings, and enums. We'll call these types "legacy switch types", and call
>>> switches on legacy switch types whose case labels are all "old-style" case
>>> labels "legacy switches". (I'll develop the list of
>>> candidate restrictions on legacy switches as I go; ideally this list
>>> should be empty, but it probably won't be.)
>>>
>>> Pattern matching surely lets us switch over any reference type. It is an
>>> open question whether we should fill out the three remaining primitive
>>> types (boolean, float, double.) On the one hand, it seems oddly
>>> incomplete to be able to switch over all but three types, while on the
>>> other, switching over floats seems pretty niche, and we already have a
>>> statement for switch over booleans (`if`).
>>>
>>> Assuming we decide for completeness, there are two interpretations of
>>> switching on float: whether this is a legacy or pattern switch.
>>>
>>> Note that for nested patterns, we need to have type patterns for all the
>>> primitive types, so we can match against `Foo(boolean b)` or `Bar(float
>>> f)`, even if we don't allow switching on these types.
>>>
>>> #### Patterns as case labels
>>>
>>> The main change here, of course, is allowing patterns as case labels.
>>> The specified pattern must be applicable to the static type of the
>>> switch operand.
>>>
>>> One question we need to answer is how to integrate legacy case labels
>>> into the story. There are basically three options here:
>>>
>>> - Retcon them. In this approach, we define a constant literal to be a
>>> constant pattern. Then, all switches can be interpreted as pattern
>>> switches.
>>>
>>> - Tolerated legacy. With `instanceof`, rather than defining a bare
>>> type to be a pattern, we defined two forms of the `instanceof`
>>> production, a legacy one (`instanceof T`) and a pattern one. We can do
>>> the same with case labels, and possibly then say that legacy case labels
>>> are only allowed in legacy switches.
>>>
>>> - Leave them to rot. In this approach, legacy case labels are allowed
>>> on legacy switches only.
>>>
>>> The difference between the first option (retconning them as constant
>>> patterns) and the others is whether literals are valid as patterns in
>>> contexts other than `case` labels (instanceof, nested patterns.) I am
>>> steadily converging on the
>>> conclusion that constant patterns are an "obvious but wrong" idea, more
>>> "cute and clever" than sound language design.
>>>
>>> Note that with a sensible guard feature, constant patterns become a pure
>>> syntactic shortcut; `Foo(0)` can be expressed as `Foo(var x) & true(x ==
>>> 0)` in any pattern context. That's not to say that the benefit of the
>>> shortcut is zero, but there's also a cost: further user confusion as to
>>> whether `id(0)` is an invocation of a method with a constant, or
>>> matching a pattern with a nested constant pattern. (If we decide we
>>> really want constant patterns because the nestability leads to better
>>> readability, I prefer we find another way to denote them, such as `== 0`
>>> or `const(0)` or such.)
>>>
>>> So, let's assume that we're going to be somewhere on the spectrum
>>> between "tolerate" and "rot". There are two questions to answer here:
>>>
>>> - In switches on legacy types, do we allow a mix of constant cases and
>>> pattern cases? (Note that this only makes a difference when we have
>>> more interesting patterns that can apply to these types, which mostly
>>> means waiting for declared patterns.)
>>>
>>> - In non-legacy-switches, do we allow constant cases at all?
>>>
>>> If we say "yes" to either of these, then essentially we treat any
>>> pattern with a mix of these as a pattern switch, and lower `case C` to
>>> `case TargetType t & true(t.equals(C))` or similar.
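>>>
>>> For example, in a switch on String, `case "moose"` would be lowered as if
>>> the user had written (provisional syntax):
>>>
>>> case String s & true(s.equals("moose")):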
>>>
>>> #### Nullity
>>>
>>> We've gone through an extensive discussion (actually, many extensive
>>> discussions) on nullity in switch. It took several rounds to get over
>>> the "why can't we just always throw on null", but I hope we're there
>>> now. It seems that the sensible strategy is:
>>>
>>> - At the top level of a switch, total type patterns, and the new `case
>>> null`, accept null, and no others. (Default does not accept null, but
>>> you can fall out of a `case null` into a default.) If a switch has no
>>> null-accepting patterns, then the switch throws NPE preemptively
>>> (simulating current behavior), as if there is an invisible trailing
>>> (possibly dead) `case null: NPE` in all switches.
>>>
>>> This strategy is fully backward compatible; anything that throws today,
>>> will throw tomorrow. Total patterns must come last, so only the last
>>> pattern, or an explicit `case null`, will see a null.
>>>
>>> For nested patterns, we've already defined the semantics of pattern
>>> matching such that pattern matching never NPEs, and never introduces any
>>> failures to match null that don't derive from an underlying dynamic type
>>> test. The problem is solely with expectations having to do with the top
>>> level of switch, based on historical preemptive null checks.
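>>>
>>> To illustrate (provisional syntax):
>>>
>>> switch (o) {
>>>     case null: ...       // the only label here that sees null
>>>     case String s: ...
>>>     default: ...         // default does not accept null
>>> }
>>>
>>> switch (o) {             // no null-accepting label, so a null operand
>>>     case String s: ...   // throws NPE preemptively, as today
>>>     default: ...
>>> }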
>>>
>>> #### Scoping and fallthrough
>>>
>>> Having decided to backtrack on merging binding variables in instanceof
>>> (`x instanceof Foo(var a) || x instanceof Bar(var a)`), the
>>> corresponding backtracking is to restrict fallthrough such that you
>>> cannot fall into, or out of, a case clause that has bindings.
>>>
>>> This may require some adjustment in the scoping of binding variables,
>>> which is probably not yet fully defined for switches, to ensure that
>>> their scope ends at the next case declaration, so that we don't have a
>>> problem with reusing names:
>>>
>>> case Foo(var x): break;
>>> case Bar(var x): // OK to use `x` here
>>>
>>> It is OK to fall out of a case without bindings into a case without
>>> bindings, such as falling out of `case null` into `default`.
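>>>
>>> For example (provisional syntax):
>>>
>>> case Foo(var x):         // error: cannot fall out of a case with bindings,
>>> case Bar(var y): ...     // nor into one; y would be unbound on the Foo path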
>>>
>>> #### Exhaustiveness
>>>
>>> There's a definition of a set of patterns being total on a target type,
>>> outlined here:
>>> https://github.com/openjdk/amber-docs/blob/master/site/design-notes/type-patterns-in-switch.md
>>>
>>>
>>> The main new feature here is allowing pattern sets that are
>>> (optimistically) total on the target to be considered total in a switch
>>> expression, and insert the appropriate throwing catch-all to handle the
>>> remainder.
>>>
>>> Secondarily, we still have a question to confront about how to make
>>> statement switches total like expression switches, so that users can opt
>>> into stricter type checking.
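>>>
>>> For example, given a hypothetical `sealed interface Shape permits Circle,
>>> Square {}` with record implementations, this switch expression is
>>> (optimistically) total without a `default` (provisional syntax):
>>>
>>> double area = switch (shape) {
>>>     case Circle c -> Math.PI * c.radius() * c.radius();
>>>     case Square s -> s.side() * s.side();
>>>     // the compiler inserts a throwing catch-all to cover the remainder,
>>>     // e.g. a novel Shape implementation showing up at runtime
>>> };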
>>>
>>> #### Translation considerations
>>>
>>> Legacy switches on primitives can continue to get translated as they are
>>> now.
>>>
>>> Pattern switches are lowered to densely numbered int switches, and we
>>> use an indy-based classifier to select the candidate case. The
>>> bootstrap takes as its static argument list a list of Class objects (or
>>> null), and as its dynamic arguments, the switch target and the case
>>> number to start at. For example, we translate:
>>>
>>> switch (o) {
>>>     case Telegram t: A
>>>     case Integer i & true(i > 0): B
>>>     case Integer i: C
>>>     case Foo(Bar(var x)): D
>>>     case "moose": ...
>>> }
>>>
>>> as (pseudocode):
>>>
>>> int index = -1;
>>> LOOP:
>>> while (true) {
>>>     index = indy[bootstrap=SwitchBootstrap[Telegram.class,
>>>             Integer.class, Integer.class, Foo.class, String.class]](o, index);
>>>     switch (index) {
>>>         case 0: A
>>>         case 1:
>>>             if (!(i > 0))
>>>                 continue LOOP;
>>>             B;
>>>         case 2: C;
>>>         case 3:
>>>             if (!(fooBinding1 instanceof Bar(var x)))
>>>                 continue LOOP;
>>>             D
>>>         case 4:
>>>             if (!(o.equals("moose")))
>>>                 continue LOOP;
>>>     }
>>>     break LOOP;
>>> }
>>>
>>> The classifier builds a decision tree based on types, but does not
>>> include nested patterns or guards -- these are added by the desugaring.
>>> If there is anything that causes us to enter a case and then later
>>> decide "whoops", we break out of the switch, and re-run the classifier,
>>> starting at the last index we tried. (Earlier experiments suggested this
>>> was the sweet spot; while we can accurately model patterns and guards as
>>> constant bundles of method handles, the complexity is high and the
>>> benefit is not so high.)
>>>
>>> For record/array patterns, we use the classifier to select the type, and
>>> then unroll any nested patterns into the switch body. For ANDed
>>> patterns, we classify on the type of the first pattern and unroll the
>>> rest into the switch body.
>>>
>>> For legacy switches on enums and strings, we have the option of
>>> replacing the current desugaring with an indy-based classification as we
>>> do with pattern switches, to move complex desugaring code from the
>>> compiler to simpler runtime code. We can do this now, later, or never.
>>>
>>>
>>> #### Open issues
>>>
>>> We have a few open issues to sync on, though almost all can probably be
>>> kicked down the road past first preview:
>>>
>>> - Boundaries between legacy switches and pattern switches
>>> - allow mixing of constant and pattern cases?
>>> - allow switch on float/double/boolean with constant labels?
>>> - Allow statement switches to opt into totality? How?
>>
>> Rémi