[patterns] Nullability in patterns, and pattern-aware constructs (again)

Wed Jan 8 20:27:54 UTC 2020

In the past, we've gone around a few times on nullability and pattern 
matching.  Back when we were enamored of `T?` types over in Valhalla 
land, we tentatively landed on using `T?` also for nullable type 
patterns.  But the bloom came off that rose pretty quickly, and Valhalla 
is moving away from it, and that makes it far less attractive in this 
context.

There are a number of tangled concerns that we've tried a few times to 
unknot:

  - Construct nullability.  Constructs to which we want to add pattern 
awareness (instanceof, switch) already have their own opinion about 
nulls.  Instanceof always says false when presented with a null, and 
switch always NPEs.

  - Pattern nullability.  Some patterns clearly would never match null 
(deconstruction patterns), whereas others (an "any" pattern, and surely 
the `null` constant pattern, if there was one) might make sense to match 
null.

  - Nesting vs top-level.  Most of the time, we don't want to match null 
at the top level, but frequently in a nested position we do. This 
conflicts with...

  - Totality vs partiality.  When a pattern is partial on the operand 
type (e.g., `case String` when the operand of switch is `Object`), it is 
almost never the case we want to match null (well, except for the `null` 
constant pattern), whereas when a pattern is total on the operand type 
(e.g., `case Object` in the same example), it is more justifiable to 
match null.

  - Refactoring friendliness.  There are a number of cases that we would 
like to freely refactor back and forth (e.g., if-instanceof chain vs 
pattern switch).  In particular, refactoring a switch on nested patterns 
to a nested switch (case Foo(T t), case Foo(U u) to a nested switch on T 
and U) is problematic under some of the interpretations of nested patterns.

  - Inference.  It would be nice if a `var` pattern were simply 
inference for a type pattern, rather than some possibly-non-denotable 
union. (Both Scala and C# treat these differently, which means you have 
to choose between type inference and the desired semantics; I don't want 
to put users in the position of making this choice.)
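
For reference, the existing construct behavior described in the first concern can be checked with plain Java, with no pattern features involved. This is only a minimal sketch; the class and method names are made up for illustration.

```java
// Illustrative only: class and method names are hypothetical.
class LegacyNullBehavior {

    // instanceof answers false for null, whatever type is on the right.
    static boolean isObject(Object o) {
        return o instanceof Object;
    }

    // A switch on a reference type throws NullPointerException when the
    // selector is null; it is caught here only to report what happened.
    static String switchOn(String s) {
        try {
            switch (s) {
                case "frog":
                    return "frog";
                default:
                    return "other";
            }
        } catch (NullPointerException e) {
            return "NPE";
        }
    }

    public static void main(String[] args) {
        System.out.println(isObject(null));   // false
        System.out.println(switchOn(null));   // NPE
        System.out.println(switchOn("frog")); // frog
    }
}
```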

Let's try (again) to untangle these.  A compelling example is this one:

     Box box;
     switch (box) {
         case Box(Chocolate c):
         case Box(Frog f):
         case Box(var o):
     }

It would be highly confusing and error-prone for either of the first two 
patterns to match Box(null) -- given that Chocolate and Frog have no 
type relation (ok, maybe they both implement `Edible`), it should be 
perfectly safe to reorder the two.  But, because the last pattern is so 
obviously total on boxes, it is quite likely that what the author wants 
is to match all remaining boxes, including those that contain null. 
(Further, it would be super-bad if there were _no_ way to say "Match any 
Box, even if it contains null."  While one might think this could be 
repaired with OR patterns, imagine that `Box` had N components -- we'd 
need to OR together 2^N patterns, with complex merging, to express all 
the possible combinations of nullity.)

Scala and C# took the path of saying that "var" patterns are not just 
type inference, they are "any" patterns -- so `Box(Object o)` matches 
boxes containing a non-null payload, where `Box(var o)` matches all 
boxes.  I find this choice questionable (the story that `var` is just 
inference is nice), and it also puts users in the position of having to 
choose between the semantics they want and being explicit about types.  
I see the expedience of it, but I do not think this is the right answer 
for Java.

In the previous round, we posited that there were _type patterns_ 
(denoted `T t`) and _nullable type patterns_ (denoted `T? t`), which had 
the advantage that you could be explicit about what you wanted (nulls or 
not), and which was sort of banking on Valhalla plunking for the `T?` 
notation.  But without that, having `T?` only in patterns, and nowhere 
else, will stick out like a sore thumb.

There are many ways to denote "T or null", of course:

  - Union types: `case (T|Null) t`
  - OR patterns: `case (T t) | (Null t)`, or `case (T t) | (null t)` 
(the former is a union with a null TYPE pattern, the latter with a null 
CONSTANT pattern)
  - Merging/fallthrough: `case T t, Null t`
  - Some way to spell "nullable T": `case T? t`, `case nullable T t`, 
`case T|null t`

But, I don't see any of these as being all that attractive in the Box 
case, when the most likely outcome is that the user wants the last case 
to match all boxes.

Here's a scheme that I think is workable, which we hovered near sometime 
in the past, and which I want to go back to.  We'll start with the 
observation that `instanceof` and `switch` are currently hostile to 
nulls (instanceof says false and switch throws; let/bind will probably 
do the same in the future).

  - We accept that some constructs may have legacy hostility to nulls 
(but, see below for a possible relaxation);
  - There are no "nullable type patterns", just type patterns;
  - Type patterns that are _total_ on their target operand (`case T` on 
an operand of type `U`, where `U <: T`) match null, and non-total type 
patterns do not.
  - Var patterns can be considered "just type inference" and will mean 
the same thing as a type pattern for the inferred type.

In this world, the patterns that match null (if the construct allows it 
through) are `case null` and the total patterns -- which could be 
written `var x` (and maybe `_`, or maybe not), or `Object x`, or even a 
narrower type if the operand type is narrower.

In our Box example, this means that the last case (whether written as 
`Box(var o)` or `Box(Object o)`) matches all boxes, including those 
containing null (because the nested pattern is total on the nested 
operand), but the first two cases do not.
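
A sketch of the Box example, assuming Java 21 or later. Record patterns were finalized there, well after this message, and their null handling for nested patterns lines up with the scheme described here. `Chocolate`, `Frog`, and `Box` are stand-in records, with `Box` given a single `Object` component; the arrow-form switch is used only to keep the example short.

```java
// Requires Java 21 or later (record patterns in switch).
// All type names are stand-ins for the ones in the example above.
class BoxDemo {
    record Chocolate() {}
    record Frog() {}
    record Box(Object contents) {}

    static String describe(Box box) {
        return switch (box) {
            // Partial on the Object component: neither matches Box(null).
            case Box(Chocolate c) -> "chocolate";
            case Box(Frog f) -> "frog";
            // Total on the Object component: matches every Box, including Box(null).
            case Box(var o) -> "other: " + o;
        };
    }

    public static void main(String[] args) {
        System.out.println(describe(new Box(new Chocolate()))); // chocolate
        System.out.println(describe(new Box(new Frog())));      // frog
        System.out.println(describe(new Box(null)));            // other: null
    }
}
```

Swapping the first two cases leaves the results unchanged, and writing the last case as `Box(Object o)` behaves the same as `Box(var o)`.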

An objection raised against this scheme earlier is that readers will 
have to look at the declaration site of the pattern to know whether the 
nested pattern is total. This is a valid concern (to be traded off 
against the other valid concerns), but this does not seem so bad in 
practice to me -- it will be common to use var or another broad type, in 
which case it will be obvious.

One problem with this interpretation is that we can't trivially refactor 
from

     switch (box) {
         case Box(Chocolate c):
         case Box(Frog f):
         case Box(var o):
     }

to

     switch (box) {
         case Box(var contents):
             switch (contents) {
                 case Chocolate c:
                 case Frog f:
                 case Object o:
             }
     }

because the inner `switch(contents)` would NPE, because switch is 
null-hostile.  Instead, the user would explicitly have to do an `if 
(contents == null)` test, and, if the intent was to handle null in the 
same way as the bottom case, some duplication of code would be needed.  
This is irritating, but I don't think it is disqualifying -- it is in 
the same category of null irritants that we have throughout the language.
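
The refactoring hazard and the explicit-null-check workaround can be sketched as follows, again assuming Java 21 or later and using hypothetical stand-in types. The helper `other` is invented here to stand for whatever the bottom case does; the guarded version has to call it from two places, which is the duplication mentioned above.

```java
// Requires Java 21 or later. All names are hypothetical stand-ins.
class NestedSwitchDemo {
    record Chocolate() {}
    record Frog() {}
    record Box(Object contents) {}

    // Stands for the action taken by the catch-all case.
    static String other(Object o) {
        return "other: " + o;
    }

    // Naive refactoring: the inner switch throws NullPointerException for Box(null).
    static String describeNaive(Box box) {
        return switch (box) {
            case Box(var contents) -> switch (contents) {
                case Chocolate c -> "chocolate";
                case Frog f -> "frog";
                case Object o -> other(o);
            };
        };
    }

    // Guarded refactoring: test for null first, duplicating the catch-all action.
    static String describeGuarded(Box box) {
        return switch (box) {
            case Box(var contents) -> {
                if (contents == null) {
                    yield other(null);
                }
                yield switch (contents) {
                    case Chocolate c -> "chocolate";
                    case Frog f -> "frog";
                    case Object o -> other(o);
                };
            }
        };
    }

    public static void main(String[] args) {
        System.out.println(describeGuarded(new Box(null))); // other: null
        try {
            describeNaive(new Box(null));
        } catch (NullPointerException e) {
            System.out.println("naive version threw NPE");
        }
    }
}
```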

Similarly, we lose the pleasing decomposition that the nested pattern 
`P(Q)` is the same pattern as `P(alpha) & alpha instanceof Q` when P's 
first component might be null and the pattern Q is total -- because of 
the existing null-hostility of `instanceof`.  (This is not unlike the 
complaint that Optional doesn't follow the monad laws, with a similar 
consequence -- and a similar justification.)

So, summary:
  - the null constant pattern matches null;
  - "any" patterns match null;
  - a total type pattern is an "any" pattern;
  - var is just type inference;
  - no other patterns match null;
  - existing constructs retain their existing null behaviors.

I'll follow up with a separate message about switch null-hostility.
