Patterns and nulls

Tue Aug 21 18:56:21 UTC 2018

Returning to this topic…

As mentioned in the original thread, some of what was in here went too 
far. I think we’re comfortable saying:

  * A /type pattern/, on its own, should coincide with |instanceof|,
    meaning it never matches null
  * A /var pattern/ is just a type pattern with the type supplied by
    inference

It still leaves us with two choices for how to write a pattern that 
matches any |Box|, including |Box(null)|.

 1. Just write |Box(Object? o)| if you want all boxes, or write
    |Box(Object o)| if you mean a box containing a non-null.
 2. Adjust the rules for nested patterns to treat a total
    (type-restating) type pattern specially, so |Box(Frog f)| would only
    match boxes containing non-null frogs, but |Box(Object o)| would
    match all boxes.

The former is more principled, as it lets you say what you mean in a 
straightforward way. The latter is more irregular, but might be more 
inline with user intuition. I still worry that people will repeatedly 
cut themselves on the sharp edge of |Box(Object o)| not matching all boxes.

Under either of these rule sets, we can use |instanceof| as our match 
operator.

So I think it comes down to a simple decision about whether we want to 
distort nested total (type-restating) type patterns to be null-friendly 
or null-hostile.

On 3/14/2018 12:58 PM, Brian Goetz wrote:

> In the message "More on patterns, generics, null, and primitives", 
> Gavin outlines how these constructs will be treated in pattern 
> matching.  This mail is a refinement of that, specifically, to refine 
> how nulls are treated.
>
> Rambling Background Of Why This Is A Problem At All
> ---------------------------------------------------
>
> Nulls will always be a source of corner cases and surprises, so the 
> best we can likely do is move the surprises around to coincide with 
> existing surprise modes.  One of the existing surprise modes is that 
> switches on reference types (boxes, strings, and enums) currently 
> always NPE when passed a null. You could characterize switch's current 
> treatment of null as "La la la can't hear you la la la."  (I think 
> this decision was mostly made by frog-boiling; in Java 1.0, there were 
> no switches on reference types, so it was not an issue; when switches 
> on boxes was added, it was done by appeal to auto-unboxing, which 
> throws on null, and null enums are rare enough that no one felt it was 
> important enough to do something different for them.  Then when we 
> added string switch in 7, we were already mostly sliding the slippery 
> slope of past precedent.)
>
> The "la la la" approach has gotten us pretty far, but I think finally 
> runs out of gas when we have nested patterns.  It might be OK to NPE 
> when x = null here:
>
>     switch (x) {
>         case String: ...
>         case Integer: ...
>         default: ...
>     }
>
> but it is certainly not OK to NPE when b = new Box(null):
>
>     switch (b) {
>         case Box(String s): ...
>         case Box(Integer i): ...
>         case Box(Object o): ...
>     }
>
> since `Box(null)` is a perfectly reasonable box.  (Which of these 
> patterns matches `Box(null)` is a different story, see below.)  So 
> problem #1 with is that we need a way to match nulls in nested 
> patterns; having nested patterns throw whenever any intermediate 
> binding produces null would be crazy.  So, we have to deal with nulls 
> in this way.  It seems natural, therefore, to be able to confront it 
> directly:
>
>     case Box(null): ...
>
> which is just an ordinary nested pattern, where our target matches 
> `Box(var x)` and further x matches null.  Which means `x matches null` 
> need to be a thing, even if switch is hostile to nulls.
>
> But if you pull on this string a bit more, we'd also like to do the 
> same at the top level, because we'd like to be able to refactor
>
>     switch (b) {
>         case Box(null): ...
>         case Box(Candy): ...
>         case Box(Object): ...
>     }
>
> into
>
>     switch (b) {
>         case Box(var x):
>             switch (x) {
>                 case null: ...
> case Candy: ...
> case Object: ...
>             }
>     }
>
> with no subtle semantics changes.  I think this is what users will 
> expect, and cutting them on sharp edges here wouldn't be doing them 
> favors.
>
>
> Null and Type Patterns
> ----------------------
>
> The previous iteration outlined in Gavin's mail was motivated by a 
> sensible goal, but I think we took it a little too literally. Which is 
> that if I have a `Box(null)`, it should match the following:
>
>     case Box(var x):
>
> because it would be weird if `var x` in a nested context really meant 
> "everything but null."  This led us to the position that
>
>     case Box(Object o):
>
> should also match `Box(null)`, because `var` is just type inference, 
> and the compiler infers `Object` here from the signature of the `Box` 
> deconstructor.  So `var` and the type that gets inferred should be 
> treated the same.  (Note that Scala departs from this, and the results 
> are pretty confusing.)
>
> You might convince yourself that `Box(Object)` not matching 
> `Box(null)` is not a problem, just add a case to handle null, with an 
> OR pattern (aka non-harmful fallthrough):
>
>     case Box(null): // fall through
>     case Box(Object): ...
>
> But, this only works in the simple case.  What if my Box deconstructor 
> had four binding variables:
>
>     case Box(P, Q, R, S):
>
> Now, to capture the same semantics, you need four more cases:
>
>     case Box(null, Q, R, S): // fall through
>     case Box(P, null, R, S):// fall through
>     case Box(P, Q, null, S): // fall through
>     case Box(P, Q, R, null): // fall through
>     case Box(P, Q, R, S):
>
> But wait, it gets worse, since if P and friends have binding 
> variables, and the null pattern does not, the binding variables will 
> not be DA and therefore not be usable.  And if we graft binding 
> variables onto constant patterns, we have a potential typing problem, 
> since the type of merged binding variables in OR patterns should 
> match.  So this is a tire fire, let's back away slowly.
>
> So, we want at least some type patterns to match null, at least in 
> nested contexts.  Got it.
>
> This led us to: a type pattern `T t` should match null.  But clearly, 
> in the switch
>
>     switch (aString) {
>         case String s: ...
>     }
>
> it NPEs (since that's what it does today.)  So we moved the null 
> hostility to `switch`, which involved an analysis of whether `case 
> null` was present.  As Kevin pointed out, that was pretty confusing 
> for the users to keep track of.  So that's not so good.
>
> Also not so good: if type patterns match null, then the dominance 
> order rule says you can't put a `case null` arm after a type pattern 
> arm, because the `case null` will be dead.  (Just like you can't catch 
> `IOException` after catching `Throwable`.)  Which deprived case null 
> of most of its remaining usefulness, which is: lump null in with the 
> default.  If users want to use `case null`, they most likely want this:
>
>     switch (o) {
>         case A: ...
>         case B: ...
>         case null: // fall through
>         default:
>             // deal with unexpected values
>     }
>
> If we can't do that -- which the latest iteration said we can't -- its 
> pretty useless.  So, we got something wrong with type patterns too.  
> Tricky buggers, these nulls!
>
>
> Some Problems With the Current Plan
> -----------------------------------
>
> The current plan, even though it came via a sensible path, has lots of 
> problems.  Including:
>
>  - Its hard to reason about which switches throw on null and which 
> don't.  (This will never be easy, but we can make it less hard.)
>  - We have asymmetries between nested and non-nested patterns; if we 
> unroll a nested pattern to a nested switch, the semantics shift subtly 
> out from under us.
>  - There's no way to say "default including null", which is what 
> people would actually want to do if they had explicit control over 
> nulls.  Having `String s` match null means our ordering rules force 
> the null case too early, depriving us of the ability to lump it in 
> with another case.
>
> Further, while the intent of `Box(var x)` matches `Box(null)` was 
> right, and that led us to `Box(Object)` matches `Box(null)`, we didn't 
> pull this string to the end.  So let's break some assumptions and 
> start over.
>
> Let's assume we have the following declarations:
>
>     record Box(Object);
>     Object o;
>     String s;
>     Box b;
>
> Implicitly, `Box` has a deconstruction pattern whose signature is 
> `Box(out Object o)`.
>
> What will users expect on the following?
>
>     Box b = new Box(null);
>     switch (b) {
>         case Box(Candy x): ...
>         case Box(Frog f): ...
>         case Box(Object o): ...
>     }
>
> There are four non-ridiculous possibilities:
>  - NPE
>  - Match none
>  - Match Box(Candy)
>  - Match Box(Object)
>
> I argued above why NPE is undesirable; I think matching none of them 
> would also be pretty surprising, since `Box(null)` is a perfectly 
> reasonable element of the value set decribed by the pattern 
> `Box(Object)`.  If all type patterns match null, we'd match 
> `Box(Candy)` -- but that's pretty weird and arbitrary, and probably 
> not what the user expects.  It also means -- and this is a serious 
> smell -- that we couldn't freely reorder the independent cases 
> `Box(Candy)` and `Box(Frog)` without subtly altering behavior.  Yuck!
>
> So the only reasonable outcome is that it matches `Box(Object)`.  
> We'll need a credible theory why we bypass the candy and the frog 
> buckets, but I think this is what the user will expect -- 
> `Box(Object)` is our catch-all bucket.
>
> A Credible Theory
> -----------------
>
> Recall that matching a nested pattern `x matches Box(P)` means:
>
>     x matches Box(var alpha) && alpha matches P
>
> The theory by which we can reasonably claim that `Box(Object)` matches 
> `Box(null)` is that the nested pattern `Object` is _total_ on the type 
> of its target (alpha), and therefore can be statically deemed to match 
> without additional dynamic checks.  In
>
>         case Box(Candy x): ...
>         case Box(Frog f): ...
>         case Box(Object o): ...
>
> the first two cases require additional dynamic type tests (instanceof 
> Candy / Frog), but the latter, if the target is a `Box` at all, 
> requires no further dynamic testing.  So we can _define_ `T t` to mean:
>
>     match(T t, e : U) === U <: T ? true : e instanceof U
>
> In other words, a total type pattern matches null, but a partial type 
> pattern does not.  That's great for the type system weenies, but does 
> it help the users?  I claim it does. It means that in:
>
>     Box b = new Box(null);
>     switch (b) {
>         case Box(Candy x): ...
>         case Box(Frog f): ...
>         case Box(Object o): ...
>     }
>
> We match `Box(Object)`, which is the catch-all `Box` handler. We can 
> freely reorder the first two cases, because they're unordered by 
> dominance, but we can't reorder either of them with `Box(Object)`, 
> because that would create a dead case arm.  `Box(var x)` and `Box(T 
> x)` mean the same thing when `T` is the type that inference produces.
>
> So `Box(Candy)` selects all boxes known to contain candy; `Box(Frog)` 
> all boxes known to contain frogs; `Box(null)` selects a box containing 
> null, and `Box(_)` or `Box(var x)` or `Box(Object o)` selects all boxes.
>
> Further, we can unroll the above to:
>
>     Box b = new Box(null);
>     switch (b) {
>         case Box(var x):
> switch (x) {
> case Candy c: ...
> case Frog f: ...
> case Object o: ...
>             }
>     }
>
> and it means _the same thing_; the nulls flow into the `Object` catch 
> basin, and I can still freely recorder the Candy/Frog cases. Whew. 
> This feels like we're getting somewhere.
>
> We can also now flow the `case null` down to where it falls through 
> into the "everything else" bucket, because type patterns no longer 
> match nulls.  If specified at all, this is probably where the user 
> most wants to put it.
>
> Note also that the notion of a "total pattern" (one whose 
> applicability, possibly modulo null, can be determined statically) 
> comes up elsewhere too.  We talked about a let-bind statement:
>
>    let Point(var x, var y) = p
>
> In order for the compiler to know that an `else` is not required on a 
> let-bind, the pattern has to be total on the static type of the 
> target.  So this notion of totality is a useful one.
>
> Where totality starts to feel uncomfortable is the fact that while 
> null _matches_ `Object o`, it is not `instanceof Object`.  More on 
> this later.
>
> This addresses all the problems we stated above, so what's the problem?
>
> Default becomes legacy
> ----------------------
>
> The catch is that the irregularity of `default` becomes even more 
> problematic.  The cure is we give `default` a gold watch, thank it for 
> its services, and grant it "Keyword Emeritus" status.
>
> What's wrong with default?  First, it's syntactically irregular.  It's 
> not a pattern, so doesn't easily admit nesting or binding variables.  
> And second, its semantically irregular; it means "everything else (but 
> not null!)"  Which makes it a poor catch-all.  We'd like for our 
> catch-all case -- the one that dominates all other possible cases -- 
> to catch everything.  We thought we wanted `default` to be equivalent 
> to a total pattern, but default is insufficiently total.
>
> So, let's define a _constant switch_ as one whose target is the 
> existing constant types (primitives, their boxes, strings, and enums) 
> and whose labels are all constants (the latter condition might not be 
> needed).  In a constant switch, retcon default to mean "all the 
> constants I've not explicitly enumerated, except null."  (If you want 
> to flow nulls into the default bin too, just add an explicit `case 
> null` to fall into default, _or_ replace `default` with a total 
> pattern.)  We act as if that constant switches have an implicit "case 
> null: NPE" _at the bottom_.  If you don't handle null explicitly (a 
> total pattern counts as handling it explicitly), you fall into that 
> bucket.
>
> Then, we _ban_ default in non-constant switches.  So if you want 
> patterns, swap your old deficient `default` for new shiny total 
> patterns, which are a better default, and are truly exhaustive (rather 
> than modulo-null exhaustive).  If we can do a little more to express 
> the intention of exhaustiveness for statement switches (which are not 
> required to be exhaustive), this gives us a path to "switches never 
> throw NPE if you follow XYZ rules."
>
> There's more work to do here to get to this statically-provable 
> null-safe switch future, but I think this is a very positive 
> direction.  (Of course, we can't prevent NPEs from people matching 
> against `Object o` and then dereferencing o.)
>
> Instanceof becomes instanceof
> -----------------------------
>
> The other catch is that we can't use `instanceof` to be the spelling 
> of our `matches` operator, because it conflicts with existing 
> `instanceof` treatment of nulls.  I think that's OK; `instanceof` is a 
> low-level primitive; matching is a high-level construct defined 
> partially in terms of instanceof.
>
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openjdk.java.net/pipermail/amber-spec-experts/attachments/20180821/5a321866/attachment-0001.html>