[pattern-switch] Exhaustiveness
Brian Goetz
brian.goetz at oracle.com
Thu Aug 20 19:09:00 UTC 2020
Here's an attempt at a formalism for capturing this.
There are several categories of patterns we might call total on a type
T. We could refine the taxonomy as:
- Strongly total -- matches all values of T.
- Weakly total -- matches all values of T except perhaps null.
What we want to do is characterize the aggregate totality on T of a
_set_ of patterns P*. A set of patterns could in the aggregate be
either of the above, or also:
- Optimistically total -- matches all values of subtypes of T _known
at compile time_, except perhaps null.
Note that we have an ordering:
partial < optimistically total < weakly total < strongly total
Now, some rules about defining the totality of a set of patterns.
T-Total: The singleton set containing the type pattern `T t` is strongly
total on U <: T. (This is the rule we've been discussing, but that's
not the point of this mail -- we just need a base case right now.)
T-Subset: If a set of patterns P* contains a subset of patterns that is
X-total on T, then P* is X-total on T.
T-Sealed: If T is sealed to U* (direct subtypes only), and for each U in
U*, there is some subset of P* that is optimistically total on U, then
P* is optimistically total on T.
T-Nested: Given a deconstructor D(U) and a collection of patterns
{ P1..Pn }, if { P1..Pn } is X-total on U, then { D(P1)..D(Pn) } is
min(X, weak)-total on the type that D deconstructs.
OK, examples. Let's say
Container<T> = Box<T> | Bag<T>
Shape = Round | Rect
Round = Circle | Ellipse
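For concreteness, here is one way the hierarchy in these examples might
be declared; this is just a sketch, using sealed interfaces and records,
and the component names are made up for illustration:

    sealed interface Container<T> permits Box, Bag { }
    record Box<T>(T content) implements Container<T> { }
    record Bag<T>(T content) implements Container<T> { }

    sealed interface Shape permits Round, Rect { }
    sealed interface Round extends Shape permits Circle, Ellipse { }
    record Circle() implements Round { }
    record Ellipse() implements Round { }
    record Rect() implements Shape { }

The switch sketches below use deconstruction/record pattern syntax that
is still prospective; read them as illustrations, not as settled syntax.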
{ Container c }: strongly total on Container by T-Total.
{ Box b, Bag b }: optimistically total on Container
- Container sealed to Box and Bag
- `Box b` strongly total on Box, `Bag b` strongly total on Bag (T-Total)
- { Box b, Bag b } optimistically total on Container by T-Sealed
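As a source sketch (the method name is made up), T-Sealed lets this pair
of type patterns stand on its own, with only null and novel Container
subtypes left in the residue:

    static String describe(Container<Shape> c) {
        return switch (c) {
            case Box<Shape> b -> "a box";
            case Bag<Shape> b -> "a bag";
            // residue: null, plus any Container subtype unknown at compile time
        };
    }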
{ Box(Round r), Box(Rect r) }: optimistically total on Box<Shape>
- Shape sealed to Round and Rect
- { Round r, Rect r } optimistically total on Shape by T-Sealed
- { Box(Round r), Box(Rect r) } optimistically total on Box<Shape> by
T-Nested
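In sketch form, this is the case where both known Shape subtypes are
covered by a nested pattern inside Box:

    static String kind(Box<Shape> b) {
        return switch (b) {
            case Box(Round r) -> "round";
            case Box(Rect r)  -> "rect";
            // residue: null, Box(null), and Box of a novel Shape subtype
        };
    }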
{ Box(Object o) } weakly total on Box<Object>
- `Object o` strongly total on Object by T-Total
- { Object o } strongly total on Object by T-Subset
- { Box(Object o) } weakly total on Box<Object> by T-Nested
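Sketching the weakly-total case: because the nested pattern is strongly
total, Box(null) is matched too, and the only residue is the null
reference itself:

    static String contents(Box<Object> box) {
        return switch (box) {
            case Box(Object o) -> String.valueOf(o);  // also matches Box(null)
            // residue: null (the Box reference itself)
        };
    }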
{ Box(Rect r), Box(Circle c), Box(Ellipse e) } optimistically total on
Box<Shape>
- Shape sealed to Round and Rect
- { Rect r } strongly total on Rect by T-Total
- { Circle c, Ellipse e } optimistically total on Round by T-Sealed
- { Rect r, Circle c, Ellipse e } is optimistically total on Shape,
because for each of { Rect, Round }, there is a subset that is at least
optimistically total on that type
- { Box(Rect r), Box(Circle c), Box(Ellipse e) } optimistically total
on Box<Shape> by T-Nested
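And the mixed-depth case as a sketch: Rect is covered directly, Round
only through its known subtypes, so novel Round subtypes join the
residue along with novel Shape subtypes:

    static String shapeKind(Box<Shape> b) {
        return switch (b) {
            case Box(Rect r)    -> "rect";
            case Box(Circle c)  -> "circle";
            case Box(Ellipse e) -> "ellipse";
            // residue: null, Box(null), Box(novel Round), Box(novel Shape)
        };
    }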
We can enhance this model to construct the residue (a characterization
of what falls through the cracks, and therefore has to be handled by a
catch-all in a putatively total switch). A grammar for the residue would be:
R := null | novel | D(R)
so the residue might include Box(null), Box(novel), Bag(Box(novel)), etc. We would
need to extend this to support deconstructors with multiple bindings too.
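Purely to make the characterization concrete, and not as part of any
proposal, the single-binding residue grammar could itself be modeled as
data:

    // R := null | novel | D(R), single-binding deconstructors only
    sealed interface Residue permits Null, Novel, Deconstructed { }
    record Null() implements Residue { }
    record Novel() implements Residue { }
    record Deconstructed(String deconstructor, Residue nested) implements Residue { }

    // new Deconstructed("Box", new Null())  ~ Box(null)
    // new Deconstructed("Bag", new Deconstructed("Box", new Novel()))  ~ Bag(Box(novel))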
OK, coming back to reality:
- The patterns of a switch expression must be at least optimistically
total.
- The translation of a switch expression must include a synthetic case
that catches all elements of the residue of its patterns, and throws the
appropriate exceptions:
- NPE for a null
- ICCE for a novel value
- One of the above, or maybe something else, for D(novel), D(null),
D(E(novel, null)), etc
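A rough, hand-written approximation of the intended effect for a switch
over Container<Shape> covering all the known combinations (this is not
the actual translation strategy, and the choice of exception for the
nested residue is exactly the open question):

    static String label(Container<Shape> c) {
        return switch (c) {               // null -> NPE, as with switches today
            case Box(Rect r)    -> "box/rect";
            case Box(Circle x)  -> "box/circle";
            case Box(Ellipse x) -> "box/ellipse";
            case Bag(Rect r)    -> "bag/rect";
            case Bag(Circle x)  -> "bag/circle";
            case Bag(Ellipse x) -> "bag/ellipse";
            // stand-in for the synthetic catch-all: residue is Box(null),
            // Bag(null), Box(novel), Bag(novel); NPE vs ICCE vs something
            // new is the open question
            default -> throw new IncompatibleClassChangeError();
        };
    }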
We still have not addressed how we might nominate a _statement_ switch
as being some form of total; that's a separate story.
Also a separate story: under what conditions switches throw NPE in the
new world. But this seems like progress.
Given the weird shape of the residue, it's not clear there's a clean way
to extrapolate the NPE|ICCE rule: we might have Foo(null, novel) and
would have to arbitrarily pick which exception to throw, and neither
choice would really be all that great. Perhaps there's a new exception type
lurking here.
On 8/20/2020 12:58 PM, Tagir Valeev wrote:
> Hello!
>
>> Are we in agreement on what _should_ happen in these cases?
> This sounds like a good idea. I see the following alternatives (not
> saying that they are better, just putting them on the table).
>
> 1. Do not perform exhaustiveness analysis for nested patterns, except
> if all nested components are total. So, in your example, either
> explicit default, or case Box and case Bag will be necessary. This is
> somewhat limiting but I'm not sure that the exhaustiveness of complex
> nested patterns would be very popular. If one really needs this, they
> could use nested switch expression:
> return switch(container) {
>     case Box(var s) -> switch(s) {
>         case Rect r -> ...; case Circle c -> ...;
>         /* optional case null is possible */
>     };
>     case Bag(var s) -> switch(s) {
>         case Rect r -> ...; case Circle c -> ...;
>         /* optional case null is possible */
>     };
> }
> Here Box(var s) has total nested component, so it matches everything
> that Box matches, the same with Bag, thus we perform exhaustiveness
> analysis using Container declaration.
> This approach does not close the door for us. We can rethink later and
> add exhaustiveness analysis for nested patterns when we finally
> determine how it should look. Note that this still allows making
> Optional.of(var x) + Optional.empty exhaustive if we provide an
> appropriate mechanism to declare this kind of pattern.
>
> 2. Allow deconstructors and records to specify whether null is a
> possible value for a given component. Like make deconstructors and
> records null-hostile by default and provide a syntax (T? or T|null or
> whatever) to allow nulls. In this case, if the deconstructor is
> null-friendly, then the exhaustive pattern must handle Box(null).
> Otherwise, Box(null) is a compilation error. Yes, I know, this may
> quickly evolve to the point where we need to add full-fledged
> nullability to the type system. But perhaps it's possible to allow
> nullability specification for new language features only? Ok, ok,
> sorry, my imagination drove me too far away from reality. Forget it.
>
> With best regards,
> Tagir Valeev.
>
> On Thu, Aug 20, 2020 at 10:57 PM Brian Goetz <brian.goetz at oracle.com> wrote:
>> Tagir's question about exhaustiveness in switches points to some technical debt left over from expression switches.
>>
>> (Note: this entire discussion has nothing to do with whether `case Object o` is nullable; it has strictly to do with extending the existing treatment of exhaustive switches over enums to sealed classes, when we can conclude a switch over such a type is total without a default / total case, and what implicit cases we have to insert to make up for that. Please let's not conflate this thread with that issue.)
>>
>>
>> When we did expression switch, for an exhaustive switch that covered all the enums without a default, we inserted an extra catch-all case that throws ICCE, on the theory that nulls are already checked by the switch and so anything that hits the synthetic default must be a novel enum value, which merits an ICCE. This worked for enum switches (where all case labels are discrete values), but doesn't quite scale to sealed types. Let's fix that.
>>
>> As a recap, suppose we have
>>
>> enum E { A, B; }
>>
>> and suppose that, via separate compilation, a novel value C is introduced that was unknown at the time the switch was compiled.
>>
>> An "exhaustive" statement switch on E:
>>
>> switch (e) {
>> case A:
>> case B:
>> }
>>
>> throws NPE on null but does nothing on C, because switch statements make no attempt at being exhaustive.
>>
>> An _expression_ switch that is deemed exhaustive without a default case:
>>
>> var s = switch (e) {
>> case A -> ...
>> case B -> ...
>> }
>>
>> throws NPE on null and ICCE on C.
>>
>> At the time, we were concerned about the gap between statement and expression switches, and talked about having a way to make statement switches exhaustive. That's still on the table, and we should still address this, but that's not the subject of this mail.
>>
>> What I want to focus on in this mail is the interplay between exhaustiveness analysis and (exhaustive) switch semantics, and what code we have to inject to make up for gaps. We've identified two sources of gaps: nulls, and novel enum values. When we get to sealed types, we can add novel subtypes to the list of things we have to detect and implicitly reject; when we get to deconstruction patterns, we need to address these at nested levels too.
>>
>> Let's analyze switches on Container<Shape> assuming:
>>
>> Container<T> = Box<T> | Bag<T>
>> Shape = Rect | Circle
>>
>> and assume a novel shape Pentagon shows up unexpectedly via separate compilation.
>>
>> If we have a switch _statement_ with:
>>
>> case Box(Rect r)
>> case Box(Circle c)
>> case Bag(Rect r)
>> case Bag(Circle c)
>>
>> then the only value we implicitly handle is null; everything else just falls out of the switch, because they don't try to be exhaustive.
>>
>> If this is an expression switch, then I think it's safe to say:
>>
>> - The switch should be deemed exhaustive; no Box(null) etc. cases needed.
>> - We get an NPE on null.
>>
>> But that leaves Box(null), Bag(null), Box(Pentagon), and Bag(Pentagon). We have to do something (the switch has to be total) with these, and again, asking users to manually handle these is unreasonable. A reasonable strawman here is:
>>
>> ICCE on Box(Pentagon) and Bag(Pentagon)
>> NPE on Box(null) and Bag(null)
>>
>> Essentially, what this means is: we need to explicitly consider null and novel values/types of enum/sealed classes in our exhaustiveness analysis, and, if these are not seen to be explicitly covered and the implicit coverage plays into the conclusion of overall weak totality, then we need to insert implicit catch-alls for these cases.
>>
>> If we switch over:
>>
>> case Box(Rect r)
>> case Box(Circle c)
>> case Box b
>> case Bag(Rect r)
>> case Bag(Circle c)
>>
>> then Box(Pentagon) and Box(null) are handled by the `Box b` case and don't need to be handled by a catch-all.
>>
>> If we have:
>>
>> case Box(Rect r)
>> case Box(Circle c)
>> case Bag(Rect r)
>> case Bag(Circle c)
>> default
>>
>> then Box(Pentagon|null) and Bag(Pentagon|null) clearly fall into the default case, so no special handling is needed there.
>>
>>
>> Are we in agreement on what _should_ happen in these cases? If so, I can put a more formal basis on it.
>>
>>
>> On 8/14/2020 1:20 PM, Brian Goetz wrote:
>>
>>
>> - Exhaustiveness and null. (Tagir) For sealed domains (enums and sealed types), we kind of cheated with expression switches because we could count on the switch filtering out the null. But Tagir raises an excellent point, which is that we do not yet have a sound definition of exhaustiveness that scales to nested patterns (do Box(Rect) and Box(Circle) cover Box(Shape)?) This is an interaction between sealed types and patterns that needs to be ironed out. (Thanks Tagir!)
>>
>>
>> [ Breaking this out from Tagir's more comprehensive reply ]
>>
>> It's unclear for me how exhaustiveness on nested patterns plays with
>> null. case Box(Circle c) and case Box(Rect r) don't cover case
>> Box(null) which is a valid possibility for Box<Shape> type.
>>
>>
>> It’s not even clear how exhaustiveness plays with null even without nesting, so let's start there.
>>
>> Consider this switch:
>>
>> switch (trafficLight) {
>> case GREEN, YELLOW: driveReallyFast();
>> case RED: sigh();
>> }
>>
>> Is it exhaustive? Well, we want to say yes. And with the existing null-hostility of switch, it is. But even without that, we’d like to say yes, because a null enum value is almost always an error, and making users deal with cases that don’t happen in reality is kind of rude.
>>
>> For a domain sealed to a set of alternatives (enums or sealed classes), let’s say that a set of patterns is _weakly exhaustive_ if it covers all the alternatives but not null, and _strongly exhaustive_ if it also covers null. When we did switch expressions, we said that weakly exhaustive coverings didn’t need a default in a switch expression. I think we’re primed to say the same thing for sealed classes. But, this “weak is good enough” leans on the fact that the existing hostility of switch will cover what we miss. We get no such cover in nested cases.
>>
>> I think it’s worth examining further why we are willing to accept the weak coverage with enums. Is it really that we’re willing to assume that enums just should never be null? If we had type cardinalities in the language, would we treat `enum X` as declaring a cardinality-1 type called X? I think we might. OK, what about sealed classes? Would the same thing carry over? Not so sure there. And this is a problem, because we ultimately want:
>>
>> case Optional.of(var x):
>> case Optional.empty():
>>
>> to be exhaustive on Optional<T>, and said exhaustiveness will likely lean on some sort of sealing.
>>
>> This is related to Guy's observation that totality is a "subtree all the way down" property. Consider:
>>
>> sealed class Container<T> permits Box, Bag { }
>> sealed class Shape permits Rect, Circle { }
>>
>> Ignoring null, Box+Bag should be exhaustive on Container, and Rect+Circle should be exhaustive on Shape. So if we are switching over a Container<Shape>, then what of:
>>
>> case Box(Rect r):
>> case Box(Circle c):
>> case Bag(Rect r):
>> case Bag(Circle c):
>>
>> We have some "nullity holes" in three places: Box(null), Bag(null), and null itself. Is this set of cases exhaustive on Bags, Boxes, or Containers?
>>
>> I think users would like to be able to write the above four cases and treat it as exhaustive; having to explicitly provide Box(null) / Box b, Bag(null) / Bag b, or a catch-all to accept null+Box(null)+Bag(null) would all be deemed unpleasant ceremony.
>>
>> Hmm...
>>
>>