Next up for patterns: type patterns in switch

Fri Aug 14 01:34:01 UTC 2020

OK, I’m going to make some *concise* responses to what appear to be
the choice points in your Email.  I’m going to leave aside the JEP document[1]
for now.

[1]: https://cr.openjdk.java.net/~briangoetz/amber/pattern-match.html

On Jun 24, 2020, at 7:44 AM, Brian Goetz <brian.goetz at oracle.com> wrote:
> ...
> ## Patterns in switch
> **Typing.**  Currently, the operand of a `switch` may only be one of the
> integral primitive types, the box type of an integral primitive, `String`, or an
> `enum` type.  (Further, if the `switch` operand is an `enum` type, the `case`
> labels must be _unqualified_ enum constant names.)  Clearly we can relax this
> restriction to allow other types, and constrain the case labels to only be
> patterns that are applicable to that type, but it may leave a seam of "legacy"
> vs "pattern" switch, especially if we do not adopt bare constant literals as
> the denotation of constant patterns.  (We have confronted this issue before with
> expression switch, and concluded that it was better to rehabilitate the `switch`
> we have rather than create a new construct, and we will make the same choice
> here, but the cost of this is often a visible seam.)

Yes, let’s do this.  The in-place enhancement is worth the seam.

So (modulo syntax ambiguities) if `x instanceof P` is valid, then
`switch (x) { … case P … }` is valid, right?  (Is there a more formal
draft spec of this syntax somewhere, yet, or is that TBW after this
discussion?)

> **Parsing.**  The grammar currently specifies that the operand of a `case` label
> is a `CaseConstant`, which casts a wide syntactic net, later narrowed with
> post-checks after attribution.  This means that, since parsing is done before we
> know the type of the operand, we must be watchful for ambiguities between
> patterns and expressions (and possibly refine the production for `case` labels.)

(OK, so something like CaseConstant | Pattern is needed, with rules
for dealing with ambiguity.)

> **Nullity.**  The `switch` construct is currently hostile to `null`, but some
> patterns do match `null`, and it may be desirable if nulls can be handled
> within a suitably crafted `switch`.  

Is it desirable?  Yes; let those nulls flow.  We’ve discussed that one to
exhaustion and beyond, and it seems the next step is a formal spec.

> 
> **Exhaustiveness.**  For switches over the permitted subtypes of sealed types,
> we will want to be able to do exhaustiveness analysis -- including for nested
> patterns (i.e., if `Shape`  is `Circle` or `Rect`, then `Box(Circle c)` and
> `Box(Rect r)` are exhaustive on `Box<Shape>`.)

Ref: https://cr.openjdk.java.net/~briangoetz/amber/pattern-match.html#exhaustiveness

A solution for this should fix enums, I think.   The proposal on this list
of saying “default P:” instead of “default:” where “P” had better cover
the rest of the switch values, looks like a winner to me.  Surely there
are other ways to do it.

Failing that, just requiring the user to write “default:”, for now, as with
enums, is adequate.  So, it’s a nice-to-have, but could be left for later.

> **Fallthrough.**  Fallthrough is everyone's least favorite feature of `switch`,
> but it exists for a reason.  (The mistake was making fallthrough the default
> behavior, but that ship has sailed.)  In the absence of an OR pattern
> combinator, one might find fallthrough in switch useful in conjunction with
> patterns:
> 
> ```
> case Box(int x):
> case Bag(int x):
>     // use x
> ```
> 
> However, it is likely that we will, at least initially, disallow falling out
> of, or into, a case label with binding variables.

This is good.  We can revisit.  OR-patterns with bindings are very
hard, perhaps too hard.

Side thought:  Range patterns can be done with libraries.  So maybe
use cases for other OR-patterns can be done similarly, with either libraries
or ad hoc destructor expressions.  (See my discussion elsewhere of
external transactional blocks.)  Exploring those ideas should not
hinder or delay the release of basic pattern-in-switch.

> #### Translation
> 
> Switches on primitives and their wrapper types are translated using the
> `tableswitch` or `lookupswitch` bytecodes; switches on strings and enums are
> lowered in the compiler to switches involving hash codes (for strings) or
> ordinals (for enums.)
> 
> For switches on patterns, we would need a new strategy, one likely built on
> `invokedynamic`, where we lower the cases to a densely numbered `int` switch,
> and then invoke a classifier function with the operand which tells us the first
> case number it matches.

Yep.  I note that the classifier function could be given a “continue
after case #N” entry point which would allow any form of guard
to be issued as in-line code, if you wrap a suitable loop around
the desugared switch.  Also, if we ever add fall-through back in,
there’s probably a natural way to map it to fall-through in the
desugared switch.  Basically the switch structure stays but the
head and cases are lowered to compactly assigned case numbers.

> ...
> 
> #### Guards
> 
> Worse, the semantics of
> `switch` mean that once a `case` label is selected, there is no way to say
> "oops, forget it, keep trying from the next label”.  

As with null hostility, we can blame this on the very limited portfolio
of “switch” to date.  Since case labels are *always* singleton values,
they *never* overlaps and so there’s never a significant need to retry
a switch operation.

The exception to this observation is that “default” overlaps *all* labels.
And this leads me to observe that C and Java switch allows you to pic
*any one* case, and position it above the default, so that *just that one*
case has the ability to fall through into the default logic.  That is so
very annoying when you have *two or more* cases that need to continue
on to the default logic.  It’s a sharp edge that stems from using fall-through
as a weaker substitute for (a) or-patterns and (b) continue-to-default.

Now that case labels can overlap almost arbitrarily (they cannot shadow
later ones), we need to question the axiom of “switch logic” that you
can’t restart the selection process once you have picked a case.

If we allow a restarting branch, we won’t need a guard syntax per se.

> It is common in languages with pattern matching to support some form of "guard"
> expression, which is a boolean expression that conditions whether the case
> matches, such as:
> 
> ```
> case Point(var x, var y)
>     __where x == y: ...
> ```

Imperative guard:
case Point(var x, var y):
   if (x != y)  continue switch;

> 
> Bindings from the pattern would have to be available in guard expressions.

Comes for free with imperative guards.

> ...the more complex guards are, the harder it is
> to tell where the case label ends and the "body" begins.  (And worse if we allow
> switch expressions inside guards.)

An imperative guards is just control flow.  That’s pretty hard to miss.
(It’s also uglier than a declarative guard; maybe we can add them later as sugar.)

> An alternate to guards is to allow an imperative `continue` statement in
> `switch`, which would mean "keep trying to match from the next label."  Given
> the existing semantics of `continue`, this is a natural extension, but since
> `continue` does not currently have meaning for switch, some work would have to
> be done to disambiguate continue statements in switches enclosed in loops.

Indeed, that’s really the only bikeshed that needs painting.

At an absolute minimum allow “continue L”, where “L:” precedes
a (statement) switch, to mean “restart matching after current case”.
As a bonus this fixes the sharp edge I mentioned above, of wanting
to fall through to default from *two or more* cases in a classic switch.

I also suggest “continue switch”.  Maybe while we are at it “continue for”,
etc.  (I.e., every breakable or continuable construct accepts its own
head keyword in place of a label.)  This covers expression switches,
which do not admit labels.

> The
> imperative version is strictly more expressive than most reasonable forms of the
> declarative version,

Indeed.  It also translates really easily.

> but users are likely to prefer the declarative version.

Then offer them sugar, but later, for dessert.  Not yet.
```
case P __OnlyIf E:
//desugars to:
case P: {if (!(E)) continue switch;}
```

> ## Nulls
> 
> Almost no language design exercise is complete without some degree of wrestling
> with `null`.  As we define more complex patterns than simple type patterns, and
> extend constructs such as `switch` (which have existing opinions about nullity)
> to support patterns, we need to have a clear understanding of which patterns
> are nullable, and separate the nullity behaviors of patterns from the nullity
> behaviors of those constructs which use patterns.  

Let ‘em flow through the patterns.  Arrange to catch ‘em
with `case null` (only at the top) and a total case (only at
the bottom).  Keep the existing NPE for `default:` if there
wasn’t a `case null` (and there must not have been a previous
total case).

I also like `default P:` as meaning “P had better be total.” 
Which in turn means “yeah, nulls flow here”.  But that’s
a separable feature, which can be carved off for later.

> ## Nullity and patterns
> 
> This topic has a number of easily-tangled concerns:
> 
>  - **Construct nullability.**  Constructs to which we want to add pattern
>    awareness (`instanceof`, `switch`) already have their own opinion about
>    nulls.  Currently, `instanceof` always says false when presented with a
>    `null`, and `switch` always NPEs.  We may, or may not, wish to refine these
>    rules in some cases.

For now keep instanceof null-free by disallowing expressions
which, in a switch, would have allowed nulls.  That means total
patterns (except those mandated by legacy) and `null`.  Revisit
later.  Would that meet the various requirements we’ve discussed?
(It’s hard to keep track.)

>  - **Pattern nullability.**  Some patterns clearly would never match `null`
>    (such as deconstruction patterns), whereas others (an "any" pattern, and
>    surely the `null` constant pattern) might make sense to match null.

null and T where T <: X (X the target type) can match null
Nothing else can at this point.

(Future discussion:  I think there’s an argument still to be had about
*static* matchers matching null.  My take on that is currently “let ‘em
flow.”)

>  - **Refactoring friendliness.**  There are a number of cases that we would like
>    to freely refactor back and forth, such as certain chains of `if ... else if`
>    with switches.

Yes, refactoring is a deep test of good design.  (Sorry about case Object vs.
instanceof Object, though.  I think that’s a seam we live with.)

>  - **Nesting vs top-level.**  The "obvious" thing to do at the top level of a
>    construct is not always the "obvious" thing to do in a nested construct.

I think we are in uneasy agreement that a pattern does not intrinsically
issue NPEs, but either lets nulls through or rejects them.  Whether or not
the programmer approves of or is conscious of nulls (we can’t guess)

Meanwhile, the construct using a pattern has a null policy:
1. instanceof : return false
2. switch : issue NPE if there is no statically nullable pattern (no chance of matching)
3. nested patterns : let ‘em flow (FUTURE WORK)

>  - **Totality vs partiality.**  When a pattern is partial on the operand type
>    (e.g., `case String` when the operand of switch is `Object`), it is almost
>    never the case we want to match null (except in the case of the `null`
>    constant pattern), whereas when a pattern is total on the operand type (e.g.,
>    `case Object` in the same example), it is more justifiable to match null.

Again, uneasy agreement.  Let’s do it this way.

Optionally,  consider a notation which lets the programmer say,
“this last pattern here should be total, now and forever”.
Reward the extra information by allowing “default:” to be dropped.
Support this for enum switches also, maybe even primitive switches.
Specify an exception to be thrown if execution ever runs past
the “default” point.

>  - **Inference.**  It would be nice if a `var` pattern were simply inference for
>    a type pattern, rather than some possibly-non-denotable union.

Yes please.  “var” should always be refactorable with a denotable type pattern.

> #### Construct and pattern nullability
> 
> Currently, `instanceof` always says `false` on `null`, and `switch` always
> throws on `null`.  Whatever null opinions a construct has, these are applied
> before we even test any patterns.  

I’m OK with this, although I prefer to think about it as something
that happens after all the patterns fail to match the null.
In fact, I’d hope that my preferred frame of mind is a legitimate
alternative account; i.e., you can’t tell the difference, except
maybe from a line number in the debugger.

> A similar sharp corner is the decomposition of a nested pattern `P(Q)` into
> `P(alpha) & alpha instanceof Q`; while this is intended to be a universally
> valid transformation, if P's 1st component might be null and Q is total,  this
> transformation would not be valid because of the existing (mild) null-hostility
> of `instanceof`.  Again, we may be able to address this by adjusting the rules
> surrounding `instanceof` slightly.

I hope so.  We don’t need to decide this yet, until we add in deconstructors…?

There was some discussion of `default P(Q,R)` vs. `case P(default Q,default R)`.
I think this needs more thought.

> ...
> Given that the `case null` appears so close to the `switch`, it does not seem
> confusing that this switch would match `null`; the existence of `case null` at
> the top of the switch makes it pretty clear that this is intended behavior.  (We
> could further restrict the null pattern to being the first pattern in a switch,
> to make this clearer.)

Granted.

> Now, let's look at the other end of the switch -- the last case.  What if the
> last pattern is a total pattern?  (Note that if any `case` has a total pattern,
> it _must_ be the last one, otherwise the cases after that would be dead, which
> would be an error.)  Is it also reasonable for that to match null?  After all,
> we're saying "everything”:

Granted; the last case is inevitably special.

(Though maybe we need that extra marker to unlock the
no-default-needed reward.)
> 
> So far, we're suggesting:
> 
>  - A switch with a constant `null` case  will accept nulls;
>  - If present, a constant `null` case must go first;
>  - A switch with a total (any) case matches also accepts nulls;
>  - If present, a total (any) case must go last.

Yes.

> 
> #### Relocating the problem
> 
> It might be more helpful to view these changes as not changing the behavior of
> `switch`, but of the `default` case of `switch`.  

(See above.  If it’s just perspective I don’t much care.)

> 
> ...
> The main casualty here is that the `default` case does not mean the same
> thing as `case var x` or `case Object o`.  We can't deprecate `default`, but
> for pattern switches, it becomes much less useful.  

(Subsequent discussion has maybe revived default as a useful syntax.)

> #### What about method (declared) patterns?
> ...

(More later, I hope.)