[pattern-switch] Exhaustiveness

Remi Forax forax at univ-mlv.fr
Thu Aug 20 20:34:27 UTC 2020


> From: "Brian Goetz" <brian.goetz at oracle.com>
> To: "Tagir Valeev" <amaembo at gmail.com>
> Cc: "amber-spec-experts" <amber-spec-experts at openjdk.java.net>
> Sent: Thursday, August 20, 2020 21:09:00
> Subject: Re: [pattern-switch] Exhaustiveness

> Here's an attempt at a formalism for capturing this.

> There are several categories of patterns we might call total on a type T. We
> could refine the taxonomy as:

> - Strongly total -- matches all values of T.
> - Weakly total -- matches all values of T except perhaps null.

> What we want to do is characterize the aggregate totality on T of a _set_ of
> patterns P*. A set of patterns could in the aggregate be either of the above,
> or also:

> - Optimistically total -- matches all values of subtypes of T _known at compile
> time_, except perhaps null.

> Note that we have an ordering:

> partial < optimistically total < weakly total < strongly total
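The ordering can be made concrete as a small Java sketch (names are illustrative, not from the mail): an enum whose declaration order encodes the lattice, so the `min(X, weak)` operation used later by T-Nested is just `Comparable` min.

```java
// Sketch of the totality lattice: declaration order encodes
// partial < optimistic < weak < strong.
enum Totality {
    PARTIAL, OPTIMISTIC, WEAK, STRONG;

    // min over the ordering, as used by the T-Nested rule below.
    static Totality min(Totality a, Totality b) {
        return a.compareTo(b) <= 0 ? a : b;
    }
}

class LatticeDemo {
    public static void main(String[] args) {
        // T-Nested caps a strongly total nested set at weakly total.
        System.out.println(Totality.min(Totality.STRONG, Totality.WEAK));     // WEAK
        System.out.println(Totality.min(Totality.OPTIMISTIC, Totality.WEAK)); // OPTIMISTIC
    }
}
```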

> Now, some rules about defining the totality of a set of patterns.

> T-Total: The singleton set containing the type pattern `T t` is strongly total
> on U <: T. (This is the rule we've been discussing, but that's not the point of
> this mail -- we just need a base case right now.)

> T-Subset: If a set of patterns P* contains a subset of patterns that is X-total
> on T, then P* is X-total on T.

> T-Sealed: If T is sealed to U* (direct subtypes only), and for each U in U*,
> there is some subset of P* that is optimistically total on U, then P* is
> optimistically total on T.

> T-Nested: Given a deconstructor D(U) and a collection of patterns { P1..Pn }, if
> { P1..Pn } is X-total on U, then { D(P1)..D(Pn) } is min(X,weak)-total on D.

> OK, examples. Let's say

> Container<T> = Box<T> | Bag<T>
> Shape = Round | Rect
> Round = Circle | Ellipse
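These algebraic-style definitions can be sketched with Java's sealed types (class and component names are assumed; the mail only gives the algebraic notation). The `kind` method shows the T-Sealed rule in action: because Container is sealed to exactly Box and Bag, the two cases are accepted as exhaustive with no default.

```java
// Assumed Java encoding of the hierarchies in the example.
sealed interface Container<T> permits Box, Bag { }
record Box<T>(T content) implements Container<T> { }
record Bag<T>(T content) implements Container<T> { }

sealed interface Shape permits Round, Rect { }
sealed interface Round extends Shape permits Circle, Ellipse { }
record Rect() implements Shape { }
record Circle() implements Round { }
record Ellipse() implements Round { }

class SealedDemo {
    // { Box b, Bag b } is exhaustive on Container by T-Sealed:
    // Container is sealed to exactly Box and Bag, so no default is needed.
    static String kind(Container<?> c) {
        return switch (c) {
            case Box<?> b -> "box";
            case Bag<?> b -> "bag";
        };
    }

    public static void main(String[] args) {
        System.out.println(kind(new Box<>(new Circle()))); // box
        System.out.println(kind(new Bag<>(new Rect())));   // bag
    }
}
```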

> { Container c }: total on Container by T-Total.

> { Box b, Bag b }: optimistically total on Container

> - Container sealed to Box and Bag
> - `Box b` total on Box, `Bag b` total on Bag
> - { Box b, Bag b } optimistically total on Container by T-Sealed

> { Box(Round r), Box(Rect r) }: optimistically total on Box<Shape>

> - Shape sealed to Round and Rect
> - { Round r, Rect r } optimistically total on Shape by T-Sealed
> - { Box(Round r), Box(Rect r) } optimistically total on Box<Shape> by T-Nested

> { Box(Object o) } weakly total on Box<Object>

> - Object o total on Object
> - { Object o } total on Object by T-Subset
> - { Box(Object o) } weakly total on Box<Object> by T-Nested

> { Box(Rect r), Box(Circle c), Box(Ellipse e) } optimistically total on
> Box<Shape>

> - Shape sealed to Round and Rect
> - { Rect r } total on Rect
> - { Circle c, Ellipse e } optimistically total on Round
> - { Rect r, Circle c, Ellipse e } is optimistically total on Shape, because for
> each of { Rect, Round }, there is a subset that is optimistically total on that
> type
> - { Box(Rect r), Box(Circle c), Box(Ellipse e) } optimistically total on
> Box<Shape> by T-Nested

> We can enhance this model to construct the residue (a characterization of what
> falls through the cracks and therefore has to be handled by a catch-all in a
> putatively total switch). A grammar for the residue would be:

> R := null | novel | D(R)

> so the residue might include Box(null), Box(novel), Bag(Box(novel)), etc. We would need to
> extend this to support deconstructors with multiple bindings too.

> OK, coming back to reality:

> - The patterns of a switch expression must be at least optimistically total.
> - The translation of a switch expression must include a synthetic case that
> catches all elements of the residue of its patterns, and throws the appropriate
> exceptions:
> - NPE for a null
> - ICCE for a novel value
> - One of the above, or maybe something else, for D(novel), D(null), D(E(novel,
> null)), etc

> We still have not addressed how we might nominate a _statement_ switch as being
> some form of total; that's a separate story.

> Also a separate story: under what conditions switches throw NPE in the new
> world. But this seems like progress.

> Given the weird shape of the residue, it's not clear there's a clean way to
> extrapolate the NPE|ICCE rule, since we might have Foo(null, novel), and would
> arbitrarily have to pick which exception to throw, and neither would really be
> all that great. Perhaps there's a new exception type lurking here.
I disagree here: if we move the null checks up front, before the type checks, an NPE will always be thrown before an ICCE on the same pattern, which is the usual rule for JDK methods
(if you see pattern matching as equivalent to a cascade of instanceof + method calls).
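This "cascade of instanceof + method calls" reading can be sketched as follows. All names are illustrative and this is not the actual compiler translation; the point is only the ordering: with the null check hoisted before each type check, NPE always wins over ICCE for the same pattern.

```java
// Illustrative desugaring of matching `Box(String s)` as a cascade of
// null checks, instanceof tests, and accessor calls.
record Box(Object content) { }

class ResidueOrder {
    static String match(Object o) {
        if (o == null) throw new NullPointerException();        // null checked first
        if (o instanceof Box box) {
            Object s = box.content();
            if (s == null) throw new NullPointerException();    // Box(null) -> NPE
            if (s instanceof String str) return "Box(" + str + ")";
            throw new IncompatibleClassChangeError();           // Box(novel) -> ICCE
        }
        throw new IncompatibleClassChangeError();               // novel top-level type
    }

    public static void main(String[] args) {
        System.out.println(match(new Box("hi"))); // Box(hi)
    }
}
```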

Rémi 

> On 8/20/2020 12:58 PM, Tagir Valeev wrote:

>> Hello!

>>> Are we in agreement on what _should_ happen in these cases?

>> This sounds like a good idea. I see the following alternatives (not
>> saying that they are better, just putting them on the table).

>> 1. Do not perform exhaustiveness analysis for nested patterns, except
>> if all nested components are total. So, in your example, either
>> explicit default, or case Box and case Bag will be necessary. This is
>> somewhat limiting but I'm not sure that the exhaustiveness of complex
>> nested patterns would be very popular. If one really needs this, they
>> could use nested switch expression:
>> return switch(container) {
>>   case Box(var s) -> switch(s) {case Rect r -> ...; case Circle c ->
>> ...; /*optional case null is possible*/};
>>   case Bag(var s) -> switch(s) {case Rect r -> ...; case Circle c ->
>> ...; /*optional case null is possible*/};
>> }
>> Here Box(var s) has total nested component, so it matches everything
>> that Box matches, the same with Bag, thus we perform exhaustiveness
>> analysis using Container declaration.
>> This approach does not close the door for us. We can rethink later and
>> add exhaustiveness analysis for nested patterns when we finally
>> determine how it should look. Note that this still allows making
>> Optional.of(var x) + Optional.empty exhaustive if we provide an
>> appropriate mechanism to declare this kind of patterns.

>> 2. Allow deconstructors and records to specify whether null is a
>> possible value for a given component. Like make deconstructors and
>> records null-hostile by default and provide a syntax (T? or T|null or
>> whatever) to allow nulls. In this case, if the deconstructor is
>> null-friendly, then the exhaustive pattern must handle Box(null).
>> Otherwise, Box(null) is a compilation error. Yes, I know, this may
>> quickly evolve to the point when we will need to add a full-fledged
>> nullability to the type system. But probably it's possible to allow
>> nullability specification for new language features only? Ok, ok,
>> sorry, my imagination drove me too far away from reality. Forget it.

>> With best regards,
>> Tagir Valeev.

>> On Thu, Aug 20, 2020 at 10:57 PM Brian Goetz <brian.goetz at oracle.com> wrote:

>>> Tagir's question about exhaustiveness in switches points to some technical debt
>>> left over from expression switches.

>>> (Note: this entire discussion has nothing to do with whether `case Object o` is
>>> nullable; it has strictly to do with extending the existing treatment of
>>> exhaustive switches over enums to sealed classes, when we can conclude a switch
>>> over such a type is total without a default / total case, and what implicit
>>> cases we have to insert to make up for that.  Please let's not conflate this
>>> thread with that issue.)

>>> When we did expression switch, for an exhaustive switch that covered all the
>>> enums without a default, we inserted an extra catch-all case that throws ICCE,
>>> on the theory that nulls are already checked by the switch and so anything that
>>> hits the synthetic default must be a novel enum value, which merits an ICCE.
>>> This worked for enum switches (where all case labels are discrete values), but
>>> doesn't quite scale to sealed types.  Let's fix that.

>>> As a recap, suppose we have

>>>     enum E { A, B; }

>>> and suppose that, via separate compilation, a novel value C is introduced that
>>> was unknown at the time the switch was compiled.

>>> An "exhaustive" statement switch on E:

>>>     switch (e) {
>>>         case A:
>>>         case B:
>>>     }

>>> throws NPE on null but does nothing on C, because switch statements make no
>>> attempt at being exhaustive.

>>> An _expression_ switch that is deemed exhaustive without a default case:

>>>     var s = switch (e) {
>>>         case A -> ...
>>>         case B -> ...
>>>     }

>>> throws NPE on null and ICCE on C.
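The enum recap is runnable as written: an exhaustive switch expression needs no default, and null is rejected up front. The novel-constant ICCE only arises under separate compilation, so it appears here only as a comment.

```java
enum E { A, B }

class EnumRecap {
    static String label(E e) {
        // Exhaustive over E's known constants; per the mail, the compiler
        // inserts a synthetic default throwing IncompatibleClassChangeError
        // if a novel constant C shows up at run time via separate compilation.
        return switch (e) {
            case A -> "a";
            case B -> "b";
        };
    }

    public static void main(String[] args) {
        System.out.println(label(E.A)); // a
        try {
            label(null);
        } catch (NullPointerException npe) {
            System.out.println("NPE on null"); // the switch is null-hostile
        }
    }
}
```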

>>> At the time, we were concerned about the gap between statement and expression
>>> switches, and talked about having a way to make statement switches exhaustive.
>>> That's still on the table, and we should still address this, but that's not
>>> the subject of this mail.

>>> What I want to focus on in this mail is the interplay between exhaustiveness
>>> analysis and (exhaustive) switch semantics, and what code we have to inject to
>>> make up for gaps.  We've identified two sources of gaps: nulls, and novel enum
>>> values.  When we get to sealed types, we can add novel subtypes to the list of
>>> things we have to detect and implicitly reject; when we get to deconstruction
>>> patterns, we need to address these at nested levels too.

>>> Let's analyze switches on Container<Shape> assuming:

>>>     Container<T> = Box<T> | Bag<T>
>>>     Shape = Rect | Circle

>>> and assume a novel shape Pentagon shows up unexpectedly via separate
>>> compilation.

>>> If we have a switch _statement_ with:

>>>     case Box(Rect r)
>>>     case Box(Circle c)
>>>     case Bag(Rect r)
>>>     case Bag(Circle c)

>>> then the only value we implicitly handle is null; everything else just falls out
>>> of the switch, because statement switches don't try to be exhaustive.

>>> If this is an expression switch, then I think it's safe to say:

>>>  - The switch should be deemed exhaustive; no Box(null) etc. cases needed.
>>>  - We get an NPE on null.

>>> But that leaves Box(null), Bag(null), Box(Pentagon), and Bag(Pentagon).  We have
>>> to do something (the switch has to be total) with these, and again, asking
>>> users to manually handle these is unreasonable.  A reasonable strawman here is:

>>>     ICCE on Box(Pentagon) and Bag(Pentagon)
>>>     NPE on Box(null) and Bag(null)

>>> Essentially, what this means is: we need to explicitly consider null and novel
>>> values/types of enum/sealed classes in our exhaustiveness analysis, and, if
>>> these are not seen to be explicitly covered and the implicit coverage plays
>>> into the conclusion of overall weak totality, then we need to insert implicit
>>> catch-alls for these cases.

>>> If we switch over:

>>>     case Box(Rect r)
>>>     case Box(Circle c)
>>>     case Box b
>>>     case Bag(Rect r)
>>>     case Bag(Circle c)

>>> then Box(Pentagon) and Box(null) are handled by the `Box b` case and don't need
>>> to be handled by a catch-all.

>>> If we have:

>>>     case Box(Rect r)
>>>     case Box(Circle c)
>>>     case Bag(Rect r)
>>>     case Bag(Circle c)
>>>     default

>>> then Box(Pentagon|null) and Bag(Pentagon|null) clearly fall into the default
>>> case, so no special handling is needed there.

>>> Are we in agreement on what _should_ happen in these cases?  If so, I can put a
>>> more formal basis on it.

>>> On 8/14/2020 1:20 PM, Brian Goetz wrote:

>>> - Exhaustiveness and null. (Tagir)  For sealed domains (enums and sealed types),
>>> we kind of cheated with expression switches because we could count on the
>>> switch filtering out the null.  But Tagir raises an excellent point, which is
>>> that we do not yet have a sound definition of exhaustiveness that scales to
>>> nested patterns (do Box(Rect) and Box(Circle) cover Box(Shape)?)  This is an
>>> interaction between sealed types and patterns that needs to be ironed out.
>>>   (Thanks Tagir!)

>>> [ Breaking this out from Tagir's more comprehensive reply ]

>>> It's unclear to me how exhaustiveness on nested patterns plays with
>>> null. case Box(Circle c) and case Box(Rect r) don't cover case
>>> Box(null) which is a valid possibility for Box<Shape> type.

>>> It’s not even clear how exhaustiveness plays with null without nesting, so
>>> let's start there.

>>> Consider this switch:

>>>     switch (trafficLight) {
>>>         case GREEN, YELLOW: driveReallyFast(); break;
>>>         case RED: sigh();
>>>     }

>>> Is it exhaustive?  Well, we want to say yes.  And with the existing
>>> null-hostility of switch, it is.  But even without that, we’d like to say yes,
>>> because a null enum value is almost always an error, and making users deal with
>>> cases that don’t happen in reality is kind of rude.

>>> For a domain sealed to a set of alternatives (enums or sealed classes), let’s
>>> say that a set of patterns is _weakly exhaustive_ if it covers all the
>>> alternatives but not null, and _strongly exhaustive_ if it also covers null.
>>> When we did switch expressions, we said that weakly exhaustive coverings
>>> didn’t need a default in a switch expression.  I think we’re primed to say the
>>> same thing for sealed classes.  But, this “weak is good enough” leans on the
>>> fact that the existing hostility of switch will cover what we miss.  We get no
>>> such cover in nested cases.

>>> I think it’s worth examining further why we are willing to accept the weak
>>> coverage with enums.  Is it really that we’re willing to assume that enums just
>>> should never be null?  If we had type cardinalities in the language, would we
>>> treat `enum X` as declaring a cardinality-1 type called X?  I think we might.
>>> OK, what about sealed classes?  Would the same thing carry over?  Not so sure
>>> there.  And this is a problem, because we ultimately want:

>>>     case Optional.of(var x):
>>>     case Optional.empty():

>>> to be exhaustive on Optional<T>, and said exhaustiveness will likely lean on
>>> some sort of sealing.

>>> This is related to Guy's observation that totality is a "subtree all the way
>>> down" property.  Consider:

>>>     sealed class Container<T> permits Box, Bag { }
>>>     sealed class Shape permits Rect, Circle { }

>>> Ignoring null, Box+Bag should be exhaustive on Container, and Rect+Circle should
>>> be exhaustive on Shape.  So if we are switching over a Container<Shape>, then
>>> what of:

>>>     case Box(Rect r):
>>>     case Box(Circle c):
>>>     case Bag(Rect r):
>>>     case Bag(Circle c):

>>> We have some "nullity holes" in three places: Box(null), Bag(null), and null
>>> itself.   Is this set of cases exhaustive on Bags, Boxes, or Containers?

>>> I think users would like to be able to write the above four cases and treat it
>>> as exhaustive; having to explicitly provide Box(null) / Box b, Bag(null) / Bag
>>> b, or a catch-all to accept null+Box(null)+Bag(null) would all be deemed
>>> unpleasant ceremony.
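These four cases can be written as a record-pattern switch; a runnable sketch, with the hierarchy declarations assumed rather than taken from the mail. The switch is accepted as exhaustive without any Box(null)/Bag(null)/catch-all cases, which is the outcome argued for above.

```java
// Assumed declarations for the Container<Shape> example.
sealed interface Shape permits Rect, Circle { }
record Rect() implements Shape { }
record Circle() implements Shape { }

sealed interface Container<T> permits Box, Bag { }
record Box<T>(T content) implements Container<T> { }
record Bag<T>(T content) implements Container<T> { }

class NullityHoles {
    static String describe(Container<Shape> c) {
        // Exhaustive: Box+Bag cover Container and Rect+Circle cover Shape.
        // The "nullity holes" (null, Box(null), Bag(null)) are residue,
        // handled by compiler-inserted checks rather than by these cases.
        return switch (c) {
            case Box(Rect r)    -> "box of rect";
            case Box(Circle ci) -> "box of circle";
            case Bag(Rect r)    -> "bag of rect";
            case Bag(Circle ci) -> "bag of circle";
        };
    }

    public static void main(String[] args) {
        System.out.println(describe(new Box<>(new Circle()))); // box of circle
        System.out.println(describe(new Bag<>(new Rect())));   // bag of rect
    }
}
```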

>>> Hmm...