Next up for patterns: type patterns in switch
Brian Goetz
brian.goetz at oracle.com
Thu Jul 23 16:20:01 UTC 2020
In case I wasn't clear, this was a proposal to proceed on _right now_.
There's been no comments so far; should I take that as "perfect, ship it?"
On 6/24/2020 10:44 AM, Brian Goetz wrote:
> There are a lot of directions we could take next for pattern
> matching. The one that builds most on what we've already done, and
> offers significant incremental expressiveness, is extending the type
> patterns we already have to a new context: switch. (There is still
> plenty of work to do on deconstruction patterns, pattern assignment,
> etc, but these require more design work.)
>
> Here's an overview of where I think we are here.
>
> [JEP 305][jep305] introduced the first phase of [pattern
> matching][patternmatch]
> into the Java language. It was deliberately limited, focusing on only
> one kind
> of pattern (type test patterns) and one linguistic context (`instanceof`).
> Having introduced the concept to Java developers, we can now extend
> both the
> kinds of patterns and the linguistic context where patterns are used.
>
> ## Patterns in switch
>
> The obvious next context in which to introduce pattern matching is
> `switch`; a
> switch using patterns as `case` labels can replace `if .. else if`
> chains with
> a more direct way of expressing a multi-way conditional.
>
> Unfortunately, `switch` is one of the most complex, irregular
> constructs we have
> in Java, so we must teach it some new tricks while avoiding some
> existing traps.
> Such tricks and traps may include:
>
> **Typing.** Currently, the operand of a `switch` may only be one of the
> integral primitive types, the box type of an integral primitive,
> `String`, or an
> `enum` type. (Further, if the `switch` operand is an `enum` type, the
> `case`
> labels must be _unqualified_ enum constant names.) Clearly we can
> relax this
> restriction to allow other types, and constrain the case labels to only be
> patterns that are applicable to that type, but it may leave a seam of
> "legacy"
> vs "pattern" switch, especially if we do not adopt bare constant
> literals as
> the denotation of constant patterns. (We have confronted this issue
> before with
> expression switch, and concluded that it was better to rehabilitate
> the `switch`
> we have rather than create a new construct, and we will make the same
> choice
> here, but the cost of this is often a visible seam.)
>
> **Parsing.** The grammar currently specifies that the operand of a
> `case` label
> is a `CaseConstant`, which casts a wide syntactic net, later narrowed with
> post-checks after attribution. This means that, since parsing is done
> before we
> know the type of the operand, we must be watchful for ambiguities between
> patterns and expressions (and possibly refine the production for
> `case` labels.)
>
> **Nullity.** The `switch` construct is currently hostile to `null`,
> but some
> patterns do match `null`, and it may be desirable if nulls can be handled
> within a suitably crafted `switch`.
>
> **Exhaustiveness.** For switches over the permitted subtypes of
> sealed types,
> we will want to be able to do exhaustiveness analysis -- including for
> nested
> patterns (i.e., if `Shape` is `Circle` or `Rect`, then `Box(Circle
> c)` and
> `Box(Rect r)` are exhaustive on `Box<Shape>`.)
>
> **Fallthrough.** Fallthrough is everyone's least favorite feature of
> `switch`,
> but it exists for a reason. (The mistake was making fallthrough the
> default
> behavior, but that ship has sailed.) In the absence of an OR pattern
> combinator, one might find fallthrough in switch useful in conjunction
> with
> patterns:
>
> ```
> case Box(int x):
> case Bag(int x):
> // use x
> ```
>
> However, it is likely that we will, at least initially, disallow
> falling out
> of, or into, a case label with binding variables.
>
> #### Translation
>
> Switches on primitives and their wrapper types are translated using the
> `tableswitch` or `lookupswitch` bytecodes; switches on strings and
> enums are
> lowered in the compiler to switches involving hash codes (for strings) or
> ordinals (for enums.)
>
> For switches on patterns, we would need a new strategy, one likely
> built on
> `invokedynamic`, where we lower the cases to a densely numbered `int`
> switch,
> and then invoke a classifier function with the operand which tells us
> the first
> case number it matches. So a switch like:
>
> ```
> switch (o) {
> case P: A
> case Q: B
> }
> ```
>
> is lowered to:
>
> ```
> int target = indy[BSM=PatternSwitch, args=[P,Q]](o)
> switch (target) {
> case 0: A
> case 1: B
> }
> ```
>
> A symbolic description of the patterns is provided as the bootstrap
> argument
> list, which builds a decision tree based on analysis of the patterns
> and their
> target types.
>
> #### Guards
>
> No matter how rich our patterns are, it is often the case that we will
> want
> to provide additional filtering on the results of a pattern:
>
> ```
> if (shape instanceof Cylinder c && c.color() == RED) { ... }
> ```
>
> Because we use `instanceof` as part of a boolean expression, it is easy to
> narrow the results by conjoining additional checks with `&&`. But in
> a `case`
> label, we do not necessarily have this opportunity. Worse, the
> semantics of
> `switch` mean that once a `case` label is selected, there is no way to say
> "oops, forget it, keep trying from the next label".
>
> It is common in languages with pattern matching to support some form
> of "guard"
> expression, which is a boolean expression that conditions whether the case
> matches, such as:
>
> ```
> case Point(var x, var y)
> __where x == y: ...
> ```
>
> Bindings from the pattern would have to be available in guard expressions.
>
> Syntactic options (and hazards) for guards abound; users would
> probably find it
> natural to reuse `&&` to attach guards to patterns; C# has chosen
> `when` for
> introducing guards; we could use `case P if (e)`, etc. Whatever we do
> here,
> there is a readability risk, as the more complex guards are, the
> harder it is
> to tell where the case label ends and the "body" begins. (And worse
> if we allow
> switch expressions inside guards.)
>
> An alternate to guards is to allow an imperative `continue` statement in
> `switch`, which would mean "keep trying to match from the next
> label." Given
> the existing semantics of `continue`, this is a natural extension, but
> since
> `continue` does not currently have meaning for switch, some work would
> have to
> be done to disambiguate continue statements in switches enclosed in
> loops. The
> imperative version is strictly more expressive than most reasonable
> forms of the
> declarative version, but users are likely to prefer the declarative
> version.
>
> ## Nulls
>
> Almost no language design exercise is complete without some degree of
> wrestling
> with `null`. As we define more complex patterns than simple type
> patterns, and
> extend constructs such as `switch` (which have existing opinions about
> nullity)
> to support patterns, we need to have a clear understanding of which
> patterns
> are nullable, and separate the nullity behaviors of patterns from the
> nullity
> behaviors of those constructs which use patterns.
>
> ## Nullity and patterns
>
> This topic has a number of easily-tangled concerns:
>
> - **Construct nullability.** Constructs to which we want to add pattern
> awareness (`instanceof`, `switch`) already have their own opinion about
> nulls. Currently, `instanceof` always says false when presented with a
> `null`, and `switch` always NPEs. We may, or may not, wish to
> refine these
> rules in some cases.
> - **Pattern nullability.** Some patterns clearly would never match
> `null`
> (such as deconstruction patterns), whereas others (an "any"
> pattern, and
> surely the `null` constant pattern) might make sense to match null.
> - **Refactoring friendliness.** There are a number of cases that we
> would like
> to freely refactor back and forth, such as certain chains of `if
> ... else if`
> with switches.
> - **Nesting vs top-level.** The "obvious" thing to do at the top
> level of a
> construct is not always the "obvious" thing to do in a nested
> construct.
> - **Totality vs partiality.** When a pattern is partial on the
> operand type
> (e.g., `case String` when the operand of switch is `Object`), it is
> almost
> never the case we want to match null (except in the case of the `null`
> constant pattern), whereas when a pattern is total on the operand
> type (e.g.,
> `case Object` in the same example), it is more justifiable to match
> null.
> - **Inference.** It would be nice if a `var` pattern were simply
> inference for
> a type pattern, rather than some possibly-non-denotable union.
>
> As a starting example, consider:
>
> ```
> record Box(Object o) { }
>
> Box box = ...
> switch (box) {
> case Box(Chocolate c):
> case Box(Frog f):
> case Box(var o):
> }
> ```
>
> It would be highly confusing and error-prone for either of the first two
> patterns to match `Box(null)` -- given that `Chocolate` and `Frog`
> have no type
> relation, it should be perfectly safe to reorder the two. But, because
> the last
> pattern seems so obviously total on boxes, it is quite likely that
> what the
> author wants is to match all remaining boxes, including those that
> contain null.
> (Further, it would be terrible if there were _no_ way to say "Match
> any `Box`,
> even if it contains `null`. (While one might initially think this
> could be
> repaired with OR patterns, imagine that `Box` had _n_ components --
> we'd need to
> OR together _2^n_ patterns, with complex merging, to express all the
> possible
> combinations of nullity.))
>
> Scala and C# took the approach of saying that "var" patterns are not
> just type
> inference, they are "any" patterns -- so `Box(Object o)` matches boxes
> containing a non-null payload, where `Box(var o)` matches all boxes. This
> means, unfortunately, that `var` is not mere type inference -- which
> complicates
> the role of `var` in the language considerably. Users should not have
> to choose
> between the semantics they want and being explicit about types; these
> should be
> orthogonal choices. The above `switch` should be equivalent to:
>
> ```
> Box box = ...
> switch (box) {
> case Box(Chocolate c):
> case Box(Frog f):
> case Box(Object o):
> }
> ```
>
> and the choice to use `Object` or `var` should be solely one of
> whether the
> manifest types are deemed to improve or impair readability.
>
> #### Construct and pattern nullability
>
> Currently, `instanceof` always says `false` on `null`, and `switch` always
> throws on `null`. Whatever null opinions a construct has, these are
> applied
> before we even test any patterns.
>
> We can formalize the intuition outlined above as: type patterns that
> are _total_
> on their target operand (`var x`, and `T t` on an operand of type `U`,
> where `U
> <: T`) match null, and non-total type patterns do not. (Another way to say
> this is: a `var` pattern is the "any" pattern, and a type pattern that
> is total
> on its operand type is also an "any" pattern.) Additionally, the `null`
> constant pattern matches null. These are the _only_ nullable patterns.
>
> In our `Box` example, this means that the last case (whether written
> as `Box(var
> o)` or `Box(Object o)`) matches all boxes, including those containing null
> (because the nested pattern is total on the nested operand), but the
> first two
> cases do not.
>
> If we retain the current absolute hostility of `switch` to nulls, we can't
> trivially refactor from
>
> ```
> switch (o) {
> case Box(Chocolate c):
> case Box(Frog f):
> case Box(var o):
> }
> ```
> to
>
> ```
> switch (o) {
> case Box(var contents):
> switch (contents) {
> case Chocolate c:
> case Frog f:
> case Object o:
> }
> }
> }
> ```
>
> because the inner `switch(contents)` would NPE before we tried to
> match any of
> the patterns it contains. Instead, the user would explicitly have to
> do an `if
> (contents == null)` test, and, if the intent was to handle `null` in
> the same
> way as the `Object o` case, some duplication of code would be needed.
> We can
> address this sharp corner by slightly relaxing the null-hostility of
> `switch`,
> as described below.
>
> A similar sharp corner is the decomposition of a nested pattern `P(Q)`
> into
> `P(alpha) & alpha instanceof Q`; while this is intended to be a
> universally
> valid transformation, if P's 1st component might be null and Q is
> total, this
> transformation would not be valid because of the existing (mild)
> null-hostility
> of `instanceof`. Again, we may be able to address this by adjusting
> the rules
> surrounding `instanceof` slightly.
>
> ## Generalizing switch
>
> The refactoring example above motivates why we might want to relax the
> null-handling behavior of `switch`. On the other hand, the one thing the
> current behavior has going for it is that at least the current
> behavior is easy
> to reason about; it always throws when confronted with a `null`. Any
> relaxed
> behavior would be more complex; some switches would still have to
> throw (for
> compatibility with existing semantics), and some (which can't be expressed
> today) would accept nulls. This is a tricky balance to achieve, but I
> think we
> have a found a good one.
>
> A starting point is that we don't want to require readers to do an _O(n)_
> analysis of each of the `case` labels just to determine whether a
> given switch
> accepts `null` or not; this should be an _O(1)_ analysis. (We do not
> want to
> introduce a new flavor of `switch`, such as `switch-nullable`; this
> might seem
> to fix the proximate problem but would surely create others. As we've
> done with
> expression switch and patterns, we'd rather rehabilitate `switch` than
> create
> an almost-but-not-quite-the-same variant.)
>
> Let's start with the null pattern, which we'll spell for sake of
> exposition
> `case null`. What if you were allowed to say `case null` in a switch,
> and the
> switch would do the obvious thing?
>
> ```
> switch (o) {
> case null -> System.out.println("Ugh, null");
> case String s -> System.out.println("Yay, non-null: " + s);
> }
> ```
>
> Given that the `case null` appears so close to the `switch`, it does
> not seem
> confusing that this switch would match `null`; the existence of `case
> null` at
> the top of the switch makes it pretty clear that this is intended
> behavior. (We
> could further restrict the null pattern to being the first pattern in
> a switch,
> to make this clearer.)
>
> Now, let's look at the other end of the switch -- the last case. What
> if the
> last pattern is a total pattern? (Note that if any `case` has a total
> pattern,
> it _must_ be the last one, otherwise the cases after that would be
> dead, which
> would be an error.) Is it also reasonable for that to match null?
> After all,
> we're saying "everything":
>
> ```
> switch (o) {
> case String s: ...
> case Object o: ...
> }
> ```
>
> Under this interpretation, the switch-refactoring anomaly above goes away.
>
> The direction we're going here is that if we can localize the
> null-acceptance of
> switches in the first (is it `case null`?) and last (is it total?)
> cases, then
> the incremental complexity of allowing _some_ switches to accept null
> might be
> outweighed by the incremental benefit of treating `null` more
> uniformly (and
> thus eliminating the refactoring anomalies.) Note also that there is
> no actual
> code compatibility issue; this is all mental-model compatibility.
>
> So far, we're suggesting:
>
> - A switch with a constant `null` case will accept nulls;
> - If present, a constant `null` case must go first;
> - A switch with a total (any) case matches also accepts nulls;
> - If present, a total (any) case must go last.
>
> #### Relocating the problem
>
> It might be more helpful to view these changes as not changing the
> behavior of
> `switch`, but of the `default` case of `switch`. We can equally well
> interpret
> the current behavior as:
>
> - `switch` always accepts `null`, but matching the `default` case of
> a `switch`
> throws `NullPointerException`;
> - any `switch` without a `default` case has an implicit do-nothing
> `default`
> case.
>
> If we adopt this change of perspective, then `default`, not `switch`,
> is in
> control of the null rejection behavior -- and we can view these changes as
> adjusting the behavior of `default`. So we can recast the proposed
> changes as:
>
> - Switches accept null;
> - A constant `null` case will match nulls, and must go first;
> - A total switch (a switch with a total `case`) cannot have a
> `default` case;
> - A non-total switch without a `default` case gets an implicit
> do-nothing
> `default` case;
> - Matching the (implicit or explicit) default case with a `null` operand
> always throws NPE.
>
> The main casualty here is that the `default` case does not mean the same
> thing as `case var x` or `case Object o`. We can't deprecate
> `default`, but
> for pattern switches, it becomes much less useful.
>
> #### What about method (declared) patterns?
>
> So far, we've declared all patterns, except the `null` constant
> pattern and the
> total (any) pattern, to not match `null`. What about patterns that are
> explicitly declared in code? It turns out we can rule out these matching
> `null` fairly easily.
>
> We can divide declared patterns into three kinds: deconstruction
> patterns (dual
> to constructors), static patterns (dual to static methods), and instance
> patterns (dual to instance methods.) For both deconstruction and instance
> patterns, the match target becomes the receiver; method bodies are never
> expected to deal with the case where `this == null`.
>
> For static patterns, it is conceivable that they could match `null`,
> but this
> would put a fairly serious burden on writers of static patterns to
> check for
> `null` -- which they would invariably forget, and many more NPEs would
> ensue.
> (Think about writing the pattern for `Optional.of(T t)` -- it would be
> overwhelmingly likely we'd forget to check the target for nullity.)
> SO there
> is a strong argument to simply say "declared patterns never match
> null", to
> not put writers of such patterns in this situation.
>
> So, only the top and bottom patterns in a switch could match null; if
> the top
> pattern is not `case null`, and the bottom pattern is not total, then
> the switch
> throws NPE on null, otherwise it accepts null.
>
> #### Adjusting instanceof
>
> The remaining anomaly we had was about unrolling nested patterns when
> the inner
> pattern is total. We can plug this by simply outlawing total patterns in
> `instanceof`.
>
> This may seem like a cheap trick, but it makes sense on its own. If the
> following statement was allowed:
>
> ```
> if (e instanceof var x) { X }
> ```
>
> it would simply be confusing; on the one hand, it looks like it should
> always
> match, but on the other, `instanceof` is historically null-hostile.
> And, if the
> pattern always matches, then the `if` statement is silly; it should be
> replaced
> with:
>
> ```
> var x = e;
> X
> ```
>
> since there's nothing conditional about it. So by banning "any"
> patterns on the
> RHS of `instanceof`, we both avoid a confusion about what is going to
> happen,
> and we prevent the unrolling anomaly.
>
> For reasons of compatibility, we will have to continue to allow
>
> ```
> if (e instanceof Object) { ... }
> ```
>
> which succeeds on all non-null operands.
>
> We've been a little sloppy with the terminology of "any" vs "total";
> note that
> in
>
> ```
> Point p;
> if (p instanceof Point(var x, var y)) { }
> ```
>
> the pattern `Point(var x, var y)` is total on `Point`, but not an
> "any" pattern
> -- it still doesn't match on p == null.
>
> On the theory that an "any" pattern in `instanceof` is silly, we may
> also want
> to ban other "silly" patterns in `instanceof`, such as constant
> patterns, since
> all of the following have simpler forms:
>
> ```
> if (x instanceof null) { ... }
> if (x instanceof "") { ... }
> if (i instanceof 3) { ... }
> ```
>
> In the first round (type patterns in `instanceof`), we mostly didn't
> confront
> this issue, saying that `instanceof T t` matched in all the cases where
> `instanceof T` would match. But given that the solution for `switch`
> relies
> on "any" patterns matching null, we may wish to adjust the behavior of
> `instanceof` before it exits preview.
>
>
> [jep305]: https://openjdk.java.net/jeps/305
> [patternmatch]: pattern-match.html
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.java.net/pipermail/amber-spec-experts/attachments/20200723/c8618583/attachment-0001.htm>
More information about the amber-spec-experts
mailing list