Patterns and nulls

Fri Mar 16 00:01:46 UTC 2018

After going over this in more detail, I have some simplifications we can 
apply to this regarding null, default, and exhaustiveness.

Key observation: we were subtly taking sides on the accept nulls vs 
reject nulls divide, and that left us in a hard-to-{explain,defend} 
place.  Instead we should make it so that both null-lovers and 
null-haters have reasonable idioms for doing their favorite thing.  (And 
we almost started doing the same thing with exhaustiveness for statement 
switches, but we pulled back from that precipice.)

The basic changes to the below story are:
  - We don't need the distinction between constant switches and pattern 
switches, except in some tiny corners of the spec;
  - Default need not be shuffled out to retirement; it means "anything 
else, except null", and can be used in any switch;
  - We define a category of _types_ that are "exhaustible", which means 
its possible to cover all the values without a catch-all. This include 
enums, some primitives (definitely boolean, maybe byte, eventually maybe 
others, especially if we add ranges), and, when we have them, sealed types.
  - A switch is exhaustive if (a) it is over an exhaustive type and all 
cases are explicitly covered, (b) it contains a "default" case, or (c) 
it contains a total pattern case.
  - A switch NPEs on null if it contains neither "case null" nor a total 
type test pattern.
  - An expression switch over an exhaustive type that contains neither a 
default nor total type pattern will throw some exception 
(MatchException, ICCE, etc) when it encounters a target that didn't 
exist at compile time (an enum constant or sealed type member added 
later via separate compilation).

All the rest still holds.

What was missing, and what made me uncomfortable, was not that we were 
biasing "case Object" towards including null, but not having a way to 
finish the switch with "everything, except null" -- which is what is 
natural in some cases.  So now, the null-lovers can use an explicit case 
null, or use "Object" (or var or _ or some other total pattern) at the 
bottom of their switches and the nulls are lumped in there.  The 
null-haters can keep using default, and the switch NPEs just like it 
always did.  And its fairly easy to look at a switch (does it have case 
null, or a null-accepting total pattern (which is always last)?) and 
tell what it does on null.

There's a similar divide on exhaustiveness for statement switches.  Some 
people think biasing towards exhaustiveness is a good idea; some think 
that switch is just like "if", and not all if's need an "else", and any 
push towards exhaustiveness is meddling.  For the exhaustiveness-lovers, 
they can manually enumerate all the cases (enums, sealed type members), 
and have a throwing default which will detect unexpected targets that 
would be impossible at compile time.  (At some point, maybe we'll help 
them by adding support for "default: unreachable", which would provide 
not only runtime detection but would enlist the compiler's flow analysis 
as well.)  For the non-exhaustiveness-fans, just don't use default or 
other total pattern.  Everyone can get what they want.

On 3/14/2018 12:58 PM, Brian Goetz wrote:
> In the message "More on patterns, generics, null, and primitives", 
> Gavin outlines how these constructs will be treated in pattern 
> matching.  This mail is a refinement of that, specifically, to refine 
> how nulls are treated.
>
> Rambling Background Of Why This Is A Problem At All
> ---------------------------------------------------
>
> Nulls will always be a source of corner cases and surprises, so the 
> best we can likely do is move the surprises around to coincide with 
> existing surprise modes.  One of the existing surprise modes is that 
> switches on reference types (boxes, strings, and enums) currently 
> always NPE when passed a null.  You could characterize switch's 
> current treatment of null as "La la la can't hear you la la la."  (I 
> think this decision was mostly made by frog-boiling; in Java 1.0, 
> there were no switches on reference types, so it was not an issue; 
> when switches on boxes was added, it was done by appeal to 
> auto-unboxing, which throws on null, and null enums are rare enough 
> that no one felt it was important enough to do something different for 
> them.  Then when we added string switch in 7, we were already mostly 
> sliding the slippery slope of past precedent.)
>
> The "la la la" approach has gotten us pretty far, but I think finally 
> runs out of gas when we have nested patterns.  It might be OK to NPE 
> when x = null here:
>
>     switch (x) {
>         case String: ...
>         case Integer: ...
>         default: ...
>     }
>
> but it is certainly not OK to NPE when b = new Box(null):
>
>     switch (b) {
>         case Box(String s): ...
>         case Box(Integer i): ...
>         case Box(Object o): ...
>     }
>
> since `Box(null)` is a perfectly reasonable box.  (Which of these 
> patterns matches `Box(null)` is a different story, see below.)  So 
> problem #1 with is that we need a way to match nulls in nested 
> patterns; having nested patterns throw whenever any intermediate 
> binding produces null would be crazy.  So, we have to deal with nulls 
> in this way.  It seems natural, therefore, to be able to confront it 
> directly:
>
>     case Box(null): ...
>
> which is just an ordinary nested pattern, where our target matches 
> `Box(var x)` and further x matches null.  Which means `x matches null` 
> need to be a thing, even if switch is hostile to nulls.
>
> But if you pull on this string a bit more, we'd also like to do the 
> same at the top level, because we'd like to be able to refactor
>
>     switch (b) {
>         case Box(null): ...
>         case Box(Candy): ...
>         case Box(Object): ...
>     }
>
> into
>
>     switch (b) {
>         case Box(var x):
>             switch (x) {
>                 case null: ...
> case Candy: ...
> case Object: ...
>             }
>     }
>
> with no subtle semantics changes.  I think this is what users will 
> expect, and cutting them on sharp edges here wouldn't be doing them 
> favors.
>
>
> Null and Type Patterns
> ----------------------
>
> The previous iteration outlined in Gavin's mail was motivated by a 
> sensible goal, but I think we took it a little too literally. Which is 
> that if I have a `Box(null)`, it should match the following:
>
>     case Box(var x):
>
> because it would be weird if `var x` in a nested context really meant 
> "everything but null."  This led us to the position that
>
>     case Box(Object o):
>
> should also match `Box(null)`, because `var` is just type inference, 
> and the compiler infers `Object` here from the signature of the `Box` 
> deconstructor.  So `var` and the type that gets inferred should be 
> treated the same.  (Note that Scala departs from this, and the results 
> are pretty confusing.)
>
> You might convince yourself that `Box(Object)` not matching 
> `Box(null)` is not a problem, just add a case to handle null, with an 
> OR pattern (aka non-harmful fallthrough):
>
>     case Box(null): // fall through
>     case Box(Object): ...
>
> But, this only works in the simple case.  What if my Box deconstructor 
> had four binding variables:
>
>     case Box(P, Q, R, S):
>
> Now, to capture the same semantics, you need four more cases:
>
>     case Box(null, Q, R, S): // fall through
>     case Box(P, null, R, S):// fall through
>     case Box(P, Q, null, S): // fall through
>     case Box(P, Q, R, null): // fall through
>     case Box(P, Q, R, S):
>
> But wait, it gets worse, since if P and friends have binding 
> variables, and the null pattern does not, the binding variables will 
> not be DA and therefore not be usable.  And if we graft binding 
> variables onto constant patterns, we have a potential typing problem, 
> since the type of merged binding variables in OR patterns should 
> match.  So this is a tire fire, let's back away slowly.
>
> So, we want at least some type patterns to match null, at least in 
> nested contexts.  Got it.
>
> This led us to: a type pattern `T t` should match null.  But clearly, 
> in the switch
>
>     switch (aString) {
>         case String s: ...
>     }
>
> it NPEs (since that's what it does today.)  So we moved the null 
> hostility to `switch`, which involved an analysis of whether `case 
> null` was present.  As Kevin pointed out, that was pretty confusing 
> for the users to keep track of.  So that's not so good.
>
> Also not so good: if type patterns match null, then the dominance 
> order rule says you can't put a `case null` arm after a type pattern 
> arm, because the `case null` will be dead.  (Just like you can't catch 
> `IOException` after catching `Throwable`.)  Which deprived case null 
> of most of its remaining usefulness, which is: lump null in with the 
> default.  If users want to use `case null`, they most likely want this:
>
>     switch (o) {
>         case A: ...
>         case B: ...
>         case null: // fall through
>         default:
>             // deal with unexpected values
>     }
>
> If we can't do that -- which the latest iteration said we can't -- its 
> pretty useless.  So, we got something wrong with type patterns too.  
> Tricky buggers, these nulls!
>
>
> Some Problems With the Current Plan
> -----------------------------------
>
> The current plan, even though it came via a sensible path, has lots of 
> problems.  Including:
>
>  - Its hard to reason about which switches throw on null and which 
> don't.  (This will never be easy, but we can make it less hard.)
>  - We have asymmetries between nested and non-nested patterns; if we 
> unroll a nested pattern to a nested switch, the semantics shift subtly 
> out from under us.
>  - There's no way to say "default including null", which is what 
> people would actually want to do if they had explicit control over 
> nulls.  Having `String s` match null means our ordering rules force 
> the null case too early, depriving us of the ability to lump it in 
> with another case.
>
> Further, while the intent of `Box(var x)` matches `Box(null)` was 
> right, and that led us to `Box(Object)` matches `Box(null)`, we didn't 
> pull this string to the end.  So let's break some assumptions and 
> start over.
>
> Let's assume we have the following declarations:
>
>     record Box(Object);
>     Object o;
>     String s;
>     Box b;
>
> Implicitly, `Box` has a deconstruction pattern whose signature is 
> `Box(out Object o)`.
>
> What will users expect on the following?
>
>     Box b = new Box(null);
>     switch (b) {
>         case Box(Candy x): ...
>         case Box(Frog f): ...
>         case Box(Object o): ...
>     }
>
> There are four non-ridiculous possibilities:
>  - NPE
>  - Match none
>  - Match Box(Candy)
>  - Match Box(Object)
>
> I argued above why NPE is undesirable; I think matching none of them 
> would also be pretty surprising, since `Box(null)` is a perfectly 
> reasonable element of the value set decribed by the pattern 
> `Box(Object)`.  If all type patterns match null, we'd match 
> `Box(Candy)` -- but that's pretty weird and arbitrary, and probably 
> not what the user expects.  It also means -- and this is a serious 
> smell -- that we couldn't freely reorder the independent cases 
> `Box(Candy)` and `Box(Frog)` without subtly altering behavior.  Yuck!
>
> So the only reasonable outcome is that it matches `Box(Object)`. We'll 
> need a credible theory why we bypass the candy and the frog buckets, 
> but I think this is what the user will expect -- `Box(Object)` is our 
> catch-all bucket.
>
> A Credible Theory
> -----------------
>
> Recall that matching a nested pattern `x matches Box(P)` means:
>
>     x matches Box(var alpha) && alpha matches P
>
> The theory by which we can reasonably claim that `Box(Object)` matches 
> `Box(null)` is that the nested pattern `Object` is _total_ on the type 
> of its target (alpha), and therefore can be statically deemed to match 
> without additional dynamic checks.  In
>
>         case Box(Candy x): ...
>         case Box(Frog f): ...
>         case Box(Object o): ...
>
> the first two cases require additional dynamic type tests (instanceof 
> Candy / Frog), but the latter, if the target is a `Box` at all, 
> requires no further dynamic testing.  So we can _define_ `T t` to mean:
>
>     match(T t, e : U) === U <: T ? true : e instanceof U
>
> In other words, a total type pattern matches null, but a partial type 
> pattern does not.  That's great for the type system weenies, but does 
> it help the users?  I claim it does. It means that in:
>
>     Box b = new Box(null);
>     switch (b) {
>         case Box(Candy x): ...
>         case Box(Frog f): ...
>         case Box(Object o): ...
>     }
>
> We match `Box(Object)`, which is the catch-all `Box` handler. We can 
> freely reorder the first two cases, because they're unordered by 
> dominance, but we can't reorder either of them with `Box(Object)`, 
> because that would create a dead case arm. `Box(var x)` and `Box(T x)` 
> mean the same thing when `T` is the type that inference produces.
>
> So `Box(Candy)` selects all boxes known to contain candy; `Box(Frog)` 
> all boxes known to contain frogs; `Box(null)` selects a box containing 
> null, and `Box(_)` or `Box(var x)` or `Box(Object o)` selects all boxes.
>
> Further, we can unroll the above to:
>
>     Box b = new Box(null);
>     switch (b) {
>         case Box(var x):
> switch (x) {
> case Candy c: ...
> case Frog f: ...
> case Object o: ...
>             }
>     }
>
> and it means _the same thing_; the nulls flow into the `Object` catch 
> basin, and I can still freely recorder the Candy/Frog cases. Whew. 
> This feels like we're getting somewhere.
>
> We can also now flow the `case null` down to where it falls through 
> into the "everything else" bucket, because type patterns no longer 
> match nulls.  If specified at all, this is probably where the user 
> most wants to put it.
>
> Note also that the notion of a "total pattern" (one whose 
> applicability, possibly modulo null, can be determined statically) 
> comes up elsewhere too.  We talked about a let-bind statement:
>
>    let Point(var x, var y) = p
>
> In order for the compiler to know that an `else` is not required on a 
> let-bind, the pattern has to be total on the static type of the 
> target.  So this notion of totality is a useful one.
>
> Where totality starts to feel uncomfortable is the fact that while 
> null _matches_ `Object o`, it is not `instanceof Object`.  More on 
> this later.
>
> This addresses all the problems we stated above, so what's the problem?
>
> Default becomes legacy
> ----------------------
>
> The catch is that the irregularity of `default` becomes even more 
> problematic.  The cure is we give `default` a gold watch, thank it for 
> its services, and grant it "Keyword Emeritus" status.
>
> What's wrong with default?  First, it's syntactically irregular. It's 
> not a pattern, so doesn't easily admit nesting or binding variables.  
> And second, its semantically irregular; it means "everything else (but 
> not null!)"  Which makes it a poor catch-all.  We'd like for our 
> catch-all case -- the one that dominates all other possible cases -- 
> to catch everything.  We thought we wanted `default` to be equivalent 
> to a total pattern, but default is insufficiently total.
>
> So, let's define a _constant switch_ as one whose target is the 
> existing constant types (primitives, their boxes, strings, and enums) 
> and whose labels are all constants (the latter condition might not be 
> needed).  In a constant switch, retcon default to mean "all the 
> constants I've not explicitly enumerated, except null."  (If you want 
> to flow nulls into the default bin too, just add an explicit `case 
> null` to fall into default, _or_ replace `default` with a total 
> pattern.)  We act as if that constant switches have an implicit "case 
> null: NPE" _at the bottom_.  If you don't handle null explicitly (a 
> total pattern counts as handling it explicitly), you fall into that 
> bucket.
>
> Then, we _ban_ default in non-constant switches.  So if you want 
> patterns, swap your old deficient `default` for new shiny total 
> patterns, which are a better default, and are truly exhaustive (rather 
> than modulo-null exhaustive).  If we can do a little more to express 
> the intention of exhaustiveness for statement switches (which are not 
> required to be exhaustive), this gives us a path to "switches never 
> throw NPE if you follow XYZ rules."
>
> There's more work to do here to get to this statically-provable 
> null-safe switch future, but I think this is a very positive 
> direction.  (Of course, we can't prevent NPEs from people matching 
> against `Object o` and then dereferencing o.)
>
> Instanceof becomes instanceof
> -----------------------------
>
> The other catch is that we can't use `instanceof` to be the spelling 
> of our `matches` operator, because it conflicts with existing 
> `instanceof` treatment of nulls.  I think that's OK; `instanceof` is a 
> low-level primitive; matching is a high-level construct defined 
> partially in terms of instanceof.
>
>