Next up for patterns: type patterns in switch
Remi Forax
forax at univ-mlv.fr
Thu Jul 23 18:38:12 UTC 2020
Right now: add only the type Pattern and the null Pattern so people can test and report if we go in the right direction or not.
Don't do deconstruction now ! We are not ready :)
In next releases? Add constant Pattern and static Pattern. Ban instance pattern because you don't want to match something that depends on a mutable states (my worst experience with Scala).
On guards, they do not work well with exhaustiveness. Still not sure they are useful.
On nullity, I prefer the total pattern to be explicit instead of implicit (given we have var, the type on the expression taken by the switch can be fuzzy for someone doing a quick look to the code), case any x, case _ x or whatever the syntax is, is more readable IMO.
And I agree on banning total patterns in instanceof.
Remi
On July 23, 2020 4:20:01 PM UTC, Brian Goetz <brian.goetz at oracle.com> wrote:
>In case I wasn't clear, this was a proposal to proceed on _right now_.
>
>There's been no comments so far; should I take that as "perfect, ship
>it?"
>
>
>On 6/24/2020 10:44 AM, Brian Goetz wrote:
>> There are a lot of directions we could take next for pattern
>> matching. The one that builds most on what we've already done, and
>> offers significant incremental expressiveness, is extending the type
>> patterns we already have to a new context: switch. (There is still
>> plenty of work to do on deconstruction patterns, pattern assignment,
>> etc, but these require more design work.)
>>
>> Here's an overview of where I think we are here.
>>
>> [JEP 305][jep305] introduced the first phase of [pattern
>> matching][patternmatch]
>> into the Java language. It was deliberately limited, focusing on
>only
>> one kind
>> of pattern (type test patterns) and one linguistic context
>(`instanceof`).
>> Having introduced the concept to Java developers, we can now extend
>> both the
>> kinds of patterns and the linguistic context where patterns are used.
>>
>> ## Patterns in switch
>>
>> The obvious next context in which to introduce pattern matching is
>> `switch`; a
>> switch using patterns as `case` labels can replace `if .. else if`
>> chains with
>> a more direct way of expressing a multi-way conditional.
>>
>> Unfortunately, `switch` is one of the most complex, irregular
>> constructs we have
>> in Java, so we must teach it some new tricks while avoiding some
>> existing traps.
>> Such tricks and traps may include:
>>
>> **Typing.** Currently, the operand of a `switch` may only be one of
>the
>> integral primitive types, the box type of an integral primitive,
>> `String`, or an
>> `enum` type. (Further, if the `switch` operand is an `enum` type,
>the
>> `case`
>> labels must be _unqualified_ enum constant names.) Clearly we can
>> relax this
>> restriction to allow other types, and constrain the case labels to
>only be
>> patterns that are applicable to that type, but it may leave a seam of
>
>> "legacy"
>> vs "pattern" switch, especially if we do not adopt bare constant
>> literals as
>> the denotation of constant patterns. (We have confronted this issue
>> before with
>> expression switch, and concluded that it was better to rehabilitate
>> the `switch`
>> we have rather than create a new construct, and we will make the same
>
>> choice
>> here, but the cost of this is often a visible seam.)
>>
>> **Parsing.** The grammar currently specifies that the operand of a
>> `case` label
>> is a `CaseConstant`, which casts a wide syntactic net, later narrowed
>with
>> post-checks after attribution. This means that, since parsing is
>done
>> before we
>> know the type of the operand, we must be watchful for ambiguities
>between
>> patterns and expressions (and possibly refine the production for
>> `case` labels.)
>>
>> **Nullity.** The `switch` construct is currently hostile to `null`,
>> but some
>> patterns do match `null`, and it may be desirable if nulls can be
>handled
>> within a suitably crafted `switch`.
>>
>> **Exhaustiveness.** For switches over the permitted subtypes of
>> sealed types,
>> we will want to be able to do exhaustiveness analysis -- including
>for
>> nested
>> patterns (i.e., if `Shape` is `Circle` or `Rect`, then `Box(Circle
>> c)` and
>> `Box(Rect r)` are exhaustive on `Box<Shape>`.)
>>
>> **Fallthrough.** Fallthrough is everyone's least favorite feature of
>
>> `switch`,
>> but it exists for a reason. (The mistake was making fallthrough the
>> default
>> behavior, but that ship has sailed.) In the absence of an OR pattern
>> combinator, one might find fallthrough in switch useful in
>conjunction
>> with
>> patterns:
>>
>> ```
>> case Box(int x):
>> case Bag(int x):
>> // use x
>> ```
>>
>> However, it is likely that we will, at least initially, disallow
>> falling out
>> of, or into, a case label with binding variables.
>>
>> #### Translation
>>
>> Switches on primitives and their wrapper types are translated using
>the
>> `tableswitch` or `lookupswitch` bytecodes; switches on strings and
>> enums are
>> lowered in the compiler to switches involving hash codes (for
>strings) or
>> ordinals (for enums.)
>>
>> For switches on patterns, we would need a new strategy, one likely
>> built on
>> `invokedynamic`, where we lower the cases to a densely numbered `int`
>
>> switch,
>> and then invoke a classifier function with the operand which tells us
>
>> the first
>> case number it matches. So a switch like:
>>
>> ```
>> switch (o) {
>> case P: A
>> case Q: B
>> }
>> ```
>>
>> is lowered to:
>>
>> ```
>> int target = indy[BSM=PatternSwitch, args=[P,Q]](o)
>> switch (target) {
>> case 0: A
>> case 1: B
>> }
>> ```
>>
>> A symbolic description of the patterns is provided as the bootstrap
>> argument
>> list, which builds a decision tree based on analysis of the patterns
>> and their
>> target types.
>>
>> #### Guards
>>
>> No matter how rich our patterns are, it is often the case that we
>will
>> want
>> to provide additional filtering on the results of a pattern:
>>
>> ```
>> if (shape instanceof Cylinder c && c.color() == RED) { ... }
>> ```
>>
>> Because we use `instanceof` as part of a boolean expression, it is
>easy to
>> narrow the results by conjoining additional checks with `&&`. But in
>
>> a `case`
>> label, we do not necessarily have this opportunity. Worse, the
>> semantics of
>> `switch` mean that once a `case` label is selected, there is no way
>to say
>> "oops, forget it, keep trying from the next label".
>>
>> It is common in languages with pattern matching to support some form
>> of "guard"
>> expression, which is a boolean expression that conditions whether the
>case
>> matches, such as:
>>
>> ```
>> case Point(var x, var y)
>> __where x == y: ...
>> ```
>>
>> Bindings from the pattern would have to be available in guard
>expressions.
>>
>> Syntactic options (and hazards) for guards abound; users would
>> probably find it
>> natural to reuse `&&` to attach guards to patterns; C# has chosen
>> `when` for
>> introducing guards; we could use `case P if (e)`, etc. Whatever we do
>
>> here,
>> there is a readability risk, as the more complex guards are, the
>> harder it is
>> to tell where the case label ends and the "body" begins. (And worse
>> if we allow
>> switch expressions inside guards.)
>>
>> An alternate to guards is to allow an imperative `continue` statement
>in
>> `switch`, which would mean "keep trying to match from the next
>> label." Given
>> the existing semantics of `continue`, this is a natural extension,
>but
>> since
>> `continue` does not currently have meaning for switch, some work
>would
>> have to
>> be done to disambiguate continue statements in switches enclosed in
>> loops. The
>> imperative version is strictly more expressive than most reasonable
>> forms of the
>> declarative version, but users are likely to prefer the declarative
>> version.
>>
>> ## Nulls
>>
>> Almost no language design exercise is complete without some degree of
>
>> wrestling
>> with `null`. As we define more complex patterns than simple type
>> patterns, and
>> extend constructs such as `switch` (which have existing opinions
>about
>> nullity)
>> to support patterns, we need to have a clear understanding of which
>> patterns
>> are nullable, and separate the nullity behaviors of patterns from the
>
>> nullity
>> behaviors of those constructs which use patterns.
>>
>> ## Nullity and patterns
>>
>> This topic has a number of easily-tangled concerns:
>>
>> - **Construct nullability.** Constructs to which we want to add
>pattern
>> awareness (`instanceof`, `switch`) already have their own opinion
>about
>> nulls. Currently, `instanceof` always says false when presented
>with a
>> `null`, and `switch` always NPEs. We may, or may not, wish to
>> refine these
>> rules in some cases.
>> - **Pattern nullability.** Some patterns clearly would never match
>> `null`
>> (such as deconstruction patterns), whereas others (an "any"
>> pattern, and
>> surely the `null` constant pattern) might make sense to match
>null.
>> - **Refactoring friendliness.** There are a number of cases that we
>
>> would like
>> to freely refactor back and forth, such as certain chains of `if
>> ... else if`
>> with switches.
>> - **Nesting vs top-level.** The "obvious" thing to do at the top
>> level of a
>> construct is not always the "obvious" thing to do in a nested
>> construct.
>> - **Totality vs partiality.** When a pattern is partial on the
>> operand type
>> (e.g., `case String` when the operand of switch is `Object`), it
>is
>> almost
>> never the case we want to match null (except in the case of the
>`null`
>> constant pattern), whereas when a pattern is total on the operand
>> type (e.g.,
>> `case Object` in the same example), it is more justifiable to
>match
>> null.
>> - **Inference.** It would be nice if a `var` pattern were simply
>> inference for
>> a type pattern, rather than some possibly-non-denotable union.
>>
>> As a starting example, consider:
>>
>> ```
>> record Box(Object o) { }
>>
>> Box box = ...
>> switch (box) {
>> case Box(Chocolate c):
>> case Box(Frog f):
>> case Box(var o):
>> }
>> ```
>>
>> It would be highly confusing and error-prone for either of the first
>two
>> patterns to match `Box(null)` -- given that `Chocolate` and `Frog`
>> have no type
>> relation, it should be perfectly safe to reorder the two. But,
>because
>> the last
>> pattern seems so obviously total on boxes, it is quite likely that
>> what the
>> author wants is to match all remaining boxes, including those that
>> contain null.
>> (Further, it would be terrible if there were _no_ way to say "Match
>> any `Box`,
>> even if it contains `null`. (While one might initially think this
>> could be
>> repaired with OR patterns, imagine that `Box` had _n_ components --
>> we'd need to
>> OR together _2^n_ patterns, with complex merging, to express all the
>> possible
>> combinations of nullity.))
>>
>> Scala and C# took the approach of saying that "var" patterns are not
>> just type
>> inference, they are "any" patterns -- so `Box(Object o)` matches
>boxes
>> containing a non-null payload, where `Box(var o)` matches all boxes.
>This
>> means, unfortunately, that `var` is not mere type inference -- which
>> complicates
>> the role of `var` in the language considerably. Users should not
>have
>> to choose
>> between the semantics they want and being explicit about types; these
>
>> should be
>> orthogonal choices. The above `switch` should be equivalent to:
>>
>> ```
>> Box box = ...
>> switch (box) {
>> case Box(Chocolate c):
>> case Box(Frog f):
>> case Box(Object o):
>> }
>> ```
>>
>> and the choice to use `Object` or `var` should be solely one of
>> whether the
>> manifest types are deemed to improve or impair readability.
>>
>> #### Construct and pattern nullability
>>
>> Currently, `instanceof` always says `false` on `null`, and `switch`
>always
>> throws on `null`. Whatever null opinions a construct has, these are
>> applied
>> before we even test any patterns.
>>
>> We can formalize the intuition outlined above as: type patterns that
>> are _total_
>> on their target operand (`var x`, and `T t` on an operand of type
>`U`,
>> where `U
>> <: T`) match null, and non-total type patterns do not. (Another way
>to say
>> this is: a `var` pattern is the "any" pattern, and a type pattern
>that
>> is total
>> on its operand type is also an "any" pattern.) Additionally, the
>`null`
>> constant pattern matches null. These are the _only_ nullable
>patterns.
>>
>> In our `Box` example, this means that the last case (whether written
>> as `Box(var
>> o)` or `Box(Object o)`) matches all boxes, including those containing
>null
>> (because the nested pattern is total on the nested operand), but the
>> first two
>> cases do not.
>>
>> If we retain the current absolute hostility of `switch` to nulls, we
>can't
>> trivially refactor from
>>
>> ```
>> switch (o) {
>> case Box(Chocolate c):
>> case Box(Frog f):
>> case Box(var o):
>> }
>> ```
>> to
>>
>> ```
>> switch (o) {
>> case Box(var contents):
>> switch (contents) {
>> case Chocolate c:
>> case Frog f:
>> case Object o:
>> }
>> }
>> }
>> ```
>>
>> because the inner `switch(contents)` would NPE before we tried to
>> match any of
>> the patterns it contains. Instead, the user would explicitly have to
>
>> do an `if
>> (contents == null)` test, and, if the intent was to handle `null` in
>> the same
>> way as the `Object o` case, some duplication of code would be
>needed.
>> We can
>> address this sharp corner by slightly relaxing the null-hostility of
>> `switch`,
>> as described below.
>>
>> A similar sharp corner is the decomposition of a nested pattern
>`P(Q)`
>> into
>> `P(alpha) & alpha instanceof Q`; while this is intended to be a
>> universally
>> valid transformation, if P's 1st component might be null and Q is
>> total, this
>> transformation would not be valid because of the existing (mild)
>> null-hostility
>> of `instanceof`. Again, we may be able to address this by adjusting
>> the rules
>> surrounding `instanceof` slightly.
>>
>> ## Generalizing switch
>>
>> The refactoring example above motivates why we might want to relax
>the
>> null-handling behavior of `switch`. On the other hand, the one thing
>the
>> current behavior has going for it is that at least the current
>> behavior is easy
>> to reason about; it always throws when confronted with a `null`. Any
>
>> relaxed
>> behavior would be more complex; some switches would still have to
>> throw (for
>> compatibility with existing semantics), and some (which can't be
>expressed
>> today) would accept nulls. This is a tricky balance to achieve, but
>I
>> think we
>> have a found a good one.
>>
>> A starting point is that we don't want to require readers to do an
>_O(n)_
>> analysis of each of the `case` labels just to determine whether a
>> given switch
>> accepts `null` or not; this should be an _O(1)_ analysis. (We do not
>
>> want to
>> introduce a new flavor of `switch`, such as `switch-nullable`; this
>> might seem
>> to fix the proximate problem but would surely create others. As we've
>
>> done with
>> expression switch and patterns, we'd rather rehabilitate `switch`
>than
>> create
>> an almost-but-not-quite-the-same variant.)
>>
>> Let's start with the null pattern, which we'll spell for sake of
>> exposition
>> `case null`. What if you were allowed to say `case null` in a
>switch,
>> and the
>> switch would do the obvious thing?
>>
>> ```
>> switch (o) {
>> case null -> System.out.println("Ugh, null");
>> case String s -> System.out.println("Yay, non-null: " + s);
>> }
>> ```
>>
>> Given that the `case null` appears so close to the `switch`, it does
>> not seem
>> confusing that this switch would match `null`; the existence of `case
>
>> null` at
>> the top of the switch makes it pretty clear that this is intended
>> behavior. (We
>> could further restrict the null pattern to being the first pattern in
>
>> a switch,
>> to make this clearer.)
>>
>> Now, let's look at the other end of the switch -- the last case.
>What
>> if the
>> last pattern is a total pattern? (Note that if any `case` has a
>total
>> pattern,
>> it _must_ be the last one, otherwise the cases after that would be
>> dead, which
>> would be an error.) Is it also reasonable for that to match null?
>> After all,
>> we're saying "everything":
>>
>> ```
>> switch (o) {
>> case String s: ...
>> case Object o: ...
>> }
>> ```
>>
>> Under this interpretation, the switch-refactoring anomaly above goes
>away.
>>
>> The direction we're going here is that if we can localize the
>> null-acceptance of
>> switches in the first (is it `case null`?) and last (is it total?)
>> cases, then
>> the incremental complexity of allowing _some_ switches to accept null
>
>> might be
>> outweighed by the incremental benefit of treating `null` more
>> uniformly (and
>> thus eliminating the refactoring anomalies.) Note also that there is
>
>> no actual
>> code compatibility issue; this is all mental-model compatibility.
>>
>> So far, we're suggesting:
>>
>> - A switch with a constant `null` case will accept nulls;
>> - If present, a constant `null` case must go first;
>> - A switch with a total (any) case matches also accepts nulls;
>> - If present, a total (any) case must go last.
>>
>> #### Relocating the problem
>>
>> It might be more helpful to view these changes as not changing the
>> behavior of
>> `switch`, but of the `default` case of `switch`. We can equally well
>
>> interpret
>> the current behavior as:
>>
>> - `switch` always accepts `null`, but matching the `default` case of
>
>> a `switch`
>> throws `NullPointerException`;
>> - any `switch` without a `default` case has an implicit do-nothing
>> `default`
>> case.
>>
>> If we adopt this change of perspective, then `default`, not `switch`,
>
>> is in
>> control of the null rejection behavior -- and we can view these
>changes as
>> adjusting the behavior of `default`. So we can recast the proposed
>> changes as:
>>
>> - Switches accept null;
>> - A constant `null` case will match nulls, and must go first;
>> - A total switch (a switch with a total `case`) cannot have a
>> `default` case;
>> - A non-total switch without a `default` case gets an implicit
>> do-nothing
>> `default` case;
>> - Matching the (implicit or explicit) default case with a `null`
>operand
>> always throws NPE.
>>
>> The main casualty here is that the `default` case does not mean the
>same
>> thing as `case var x` or `case Object o`. We can't deprecate
>> `default`, but
>> for pattern switches, it becomes much less useful.
>>
>> #### What about method (declared) patterns?
>>
>> So far, we've declared all patterns, except the `null` constant
>> pattern and the
>> total (any) pattern, to not match `null`. What about patterns that
>are
>> explicitly declared in code? It turns out we can rule out these
>matching
>> `null` fairly easily.
>>
>> We can divide declared patterns into three kinds: deconstruction
>> patterns (dual
>> to constructors), static patterns (dual to static methods), and
>instance
>> patterns (dual to instance methods.) For both deconstruction and
>instance
>> patterns, the match target becomes the receiver; method bodies are
>never
>> expected to deal with the case where `this == null`.
>>
>> For static patterns, it is conceivable that they could match `null`,
>> but this
>> would put a fairly serious burden on writers of static patterns to
>> check for
>> `null` -- which they would invariably forget, and many more NPEs
>would
>> ensue.
>> (Think about writing the pattern for `Optional.of(T t)` -- it would
>be
>> overwhelmingly likely we'd forget to check the target for nullity.)
>> SO there
>> is a strong argument to simply say "declared patterns never match
>> null", to
>> not put writers of such patterns in this situation.
>>
>> So, only the top and bottom patterns in a switch could match null; if
>
>> the top
>> pattern is not `case null`, and the bottom pattern is not total, then
>
>> the switch
>> throws NPE on null, otherwise it accepts null.
>>
>> #### Adjusting instanceof
>>
>> The remaining anomaly we had was about unrolling nested patterns when
>
>> the inner
>> pattern is total. We can plug this by simply outlawing total
>patterns in
>> `instanceof`.
>>
>> This may seem like a cheap trick, but it makes sense on its own. If
>the
>> following statement was allowed:
>>
>> ```
>> if (e instanceof var x) { X }
>> ```
>>
>> it would simply be confusing; on the one hand, it looks like it
>should
>> always
>> match, but on the other, `instanceof` is historically null-hostile.
>> And, if the
>> pattern always matches, then the `if` statement is silly; it should
>be
>> replaced
>> with:
>>
>> ```
>> var x = e;
>> X
>> ```
>>
>> since there's nothing conditional about it. So by banning "any"
>> patterns on the
>> RHS of `instanceof`, we both avoid a confusion about what is going to
>
>> happen,
>> and we prevent the unrolling anomaly.
>>
>> For reasons of compatibility, we will have to continue to allow
>>
>> ```
>> if (e instanceof Object) { ... }
>> ```
>>
>> which succeeds on all non-null operands.
>>
>> We've been a little sloppy with the terminology of "any" vs "total";
>> note that
>> in
>>
>> ```
>> Point p;
>> if (p instanceof Point(var x, var y)) { }
>> ```
>>
>> the pattern `Point(var x, var y)` is total on `Point`, but not an
>> "any" pattern
>> -- it still doesn't match on p == null.
>>
>> On the theory that an "any" pattern in `instanceof` is silly, we may
>> also want
>> to ban other "silly" patterns in `instanceof`, such as constant
>> patterns, since
>> all of the following have simpler forms:
>>
>> ```
>> if (x instanceof null) { ... }
>> if (x instanceof "") { ... }
>> if (i instanceof 3) { ... }
>> ```
>>
>> In the first round (type patterns in `instanceof`), we mostly didn't
>> confront
>> this issue, saying that `instanceof T t` matched in all the cases
>where
>> `instanceof T` would match. But given that the solution for `switch`
>
>> relies
>> on "any" patterns matching null, we may wish to adjust the behavior
>of
>> `instanceof` before it exits preview.
>>
>>
>> [jep305]: https://openjdk.java.net/jeps/305
>> [patternmatch]: pattern-match.html
>>
--
Envoyé de mon appareil Android avec Courriel K-9 Mail. Veuillez excuser ma brièveté.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.java.net/pipermail/amber-spec-experts/attachments/20200723/473cf827/attachment-0001.htm>
More information about the amber-spec-experts
mailing list