[patterns] Nullability in patterns, and pattern-aware constructs (again)

Fri Jan 10 20:00:47 UTC 2020

Closing the loop, this raises the question of "what about instanceof and 
total patterns"?  I posit that the following locutions are silly:

     if (e instanceof var x) { ... }    // always true
     if (e instanceof _) { ... }        // always true

and probably should be banned.  If we did, though, what about:

     if (e instanceof Object o) { ... } // always true
     if (e instanceof Object) { ... }   // always true, but currently allowed

I would think we would ban the former as well, but we have to keep the 
latter around for compatibility.  (Which is partially why I discouraged 
calling the latter an "anonymous pattern" in the spec, and instead 
proposed to treat it as a different flavor of `instanceof`.)

SO, proposed: disallow "any" patterns (_, var x, or total T x) in 
instanceof. Instanceof is for partial patterns.

Note that

     Point p;
     if (p instanceof Point(var x, var y)) { }

is total, but we wouldn't want to disallow it, as this pattern could 
still fail if p == null.
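
To make that last point concrete, here is a minimal sketch using only
today's non-pattern instanceof, so it does not depend on any of the
proposed syntax.  The Point class and the helper names are made up for
illustration.  It shows that even a test against the operand's own
static type still says false for null, which is why the deconstruction
form above can fail.

```java
// Plain Java only: the classic (non-pattern) instanceof.
// Point and the helper names are hypothetical, for illustration.
class InstanceofNullDemo {
    static class Point {
        final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    // The tested type equals the operand's static type, so the test is
    // "total" in the sense used above, yet it is still false for null.
    static boolean isPoint(Point p) {
        return p instanceof Point;
    }

    // The legacy form kept for compatibility: in effect a null check.
    static boolean isObject(Object e) {
        return e instanceof Object;
    }

    public static void main(String[] args) {
        System.out.println(isPoint(new Point(1, 2))); // true
        System.out.println(isPoint(null));            // false
        System.out.println(isObject("hello"));        // true
        System.out.println(isObject(null));           // false
    }
}
```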

We might want to go a little further, and ban constant patterns in 
instanceof too, since all of the following have simpler forms:

     if (x instanceof null) { ... }
     if (x instanceof "") { ... }
     if (i instanceof 3) { ... }

Or not -- I suspect not.
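
For reference, the simpler forms I have in mind look roughly like
this.  It is only a sketch: it assumes a constant pattern would compare
strings with equals() and primitives with ==, which nothing above pins
down, and the method names are invented.

```java
// Hypothetical helpers spelling out the "simpler forms" in plain Java.
class ConstantTests {
    // Stands in for: x instanceof null
    static boolean isNull(Object x) {
        return x == null;
    }

    // Stands in for: x instanceof ""  (assumed to mean equals-comparison;
    // calling equals on the literal keeps the test null-safe)
    static boolean isEmptyString(Object x) {
        return "".equals(x);
    }

    // Stands in for: i instanceof 3
    static boolean isThree(int i) {
        return i == 3;
    }

    public static void main(String[] args) {
        System.out.println(isNull(null));         // true
        System.out.println(isEmptyString(""));    // true
        System.out.println(isEmptyString(null));  // false
        System.out.println(isThree(3));           // true
    }
}
```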

On 1/8/2020 3:27 PM, Brian Goetz wrote:
> In the past, we've gone around a few times on nullability and pattern 
> matching.  Back when we were enamored of `T?` types over in Valhalla 
> land, we tentatively landed on using `T?` also for nullable type 
> patterns.  But the bloom came off that rose pretty quickly, and 
> Valhalla is moving away from it, and that makes it far less attractive 
> in this context.
>
> There are a number of tangled concerns that we've tried a few times to 
> unknot:
>
>  - Construct nullability.  Constructs to which we want to add pattern 
> awareness (instanceof, switch) already have their own opinion about 
> nulls.  Instanceof always says false when presented with a null, and 
> switch always NPEs.
>
>  - Pattern nullability.  Some patterns clearly would never match null 
> (deconstruction patterns), whereas others (an "any" pattern, and 
> surely the `null` constant pattern, if there was one) might make sense 
> to match null.
>
>  - Nesting vs top-level.  Most of the time, we don't want to match 
> null at the top level, but frequently in a nested position we do. This 
> conflicts with...
>
>  - Totality vs partiality.  When a pattern is partial on the operand 
> type (e.g., `case String` when the operand of switch is `Object`), it 
> is almost never the case we want to match null (well, except for the 
> `null` constant pattern), whereas when a pattern is total on the 
> operand type (e.g., `case Object` in the same example), it is more 
> justifiable to match null.
>
>  - Refactoring friendliness.  There are a number of cases that we 
> would like to freely refactor back and forth (e.g., if-instanceof 
> chain vs pattern switch).  In particular, refactoring a switch on 
> nested patterns to a nested switch (case Foo(T t), case Foo(U u) to a 
> nested switch on T and U) is problematic under some of the 
> interpretations of nested patterns.
>
>  - Inference.  It would be nice if a `var` pattern were simply 
> inference for a type pattern, rather than some possibly-non-denotable 
> union.  (Both Scala and C# treat these differently, which means you 
> have to choose between type inference and the desired semantics; I 
> don't want to put users in the position of making this choice.)
>
>
> Let's try (again) to untangle these.  A compelling example is this one:
>
>     Box box;
>     switch (box) {
>         case Box(Chocolate c):
>         case Box(Frog f):
>         case Box(var o):
>     }
>
> It would be highly confusing and error-prone for either of the first 
> two patterns to match Box(null) -- given that Chocolate and Frog have 
> no type relation (ok, maybe they both implement `Edible`), it should 
> be perfectly safe to reorder the two.  But, because the last pattern 
> is so obviously total on boxes, it is quite likely that what the 
> author wants is to match all remaining boxes, including those that 
> contain null. (Further, it would be super-bad if there were _no_ way to 
> say "Match any Box, even if it contains null."  While one might think 
> this could be repaired with OR patterns, imagine that `Box` had N 
> components -- we'd need to OR together 2^N patterns, with complex 
> merging, to express all the possible combinations of nullity.)
>
> Scala and C# took the path of saying that "var" patterns are not just 
> type inference, they are "any" patterns -- so `Box(Object o)` matches 
> boxes containing a non-null payload, where `Box(var o)` matches all 
> boxes.  I find this choice to be both questionable (the story that 
> `var` is just inference is nice) and also that it puts users in the 
> position of having to choose between the semantics they want and being 
> explicit about types.  I see the expedience of it, but I do not think 
> this is the right answer for Java.
>
>
> In the previous round, we posited that there were _type patterns_ 
> (denoted `T t`) and _nullable type patterns_ (denoted `T? t`), 
> which had the advantage that you could be explicit about what you 
> wanted (nulls or not), and which was sort of banking on Valhalla 
> plunking for the `T?` notation.  But without that, having `T?` only 
> in patterns, and nowhere else, will stick out like a sore thumb.
>
> There are many ways to denote "T or null", of course:
>
>  - Union types: `case (T|Null) t`
>  - OR patterns: `case (T t) | (Null t)`, or `case (T t) | (null t)` 
> (the former is a union with a null TYPE pattern, the latter with a 
> null CONSTANT pattern)
>  - Merging/fallthrough: `case T t, Null t`
>  - Some way to spell "nullable T": `case T? t`, `case nullable T t`, 
> `case T|null t`
>
> But, I don't see any of these as being all that attractive in the Box 
> case, when the most likely outcome is that the user wants the last 
> case to match all boxes.
>
>
> Here's a scheme that I think is workable, which we hovered near 
> sometime in the past, and which I want to go back to. We'll start with 
> the observation that `instanceof` and `switch` are currently hostile 
> to nulls (instanceof says false, switch throws, and probably in the 
> future, let/bind will do so also.)
>
>  - We accept that some constructs may have legacy hostility to nulls 
> (but, see below for a possible relaxation);
>  - There are no "nullable type patterns", just type patterns;
>  - Type patterns that are _total_ on their target operand (`case T` on 
> an operand of type `U`, where `U <: T`) match null, and non-total type 
> patterns do not.
>  - Var patterns can be considered "just type inference" and will mean 
> the same thing as a type pattern for the inferred type.
>
> In this world, the patterns that match null (if the construct allows 
> it through) are `case null` and the total patterns -- which could be 
> written `var x` (and maybe `_`, or maybe not), or `Object x`, or even 
> a narrower type if the operand type is narrower.
>
> In our Box example, this means that the last case (whether written as 
> `Box(var o)` or `Box(Object o)`) matches all boxes, including those 
> containing null (because the nested pattern is total on the nested 
> operand), but the first two cases do not.
>
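
A rough model of the rule above in plain Java, for illustration only.
The matches() helper and the Chocolate/Frog classes are hypothetical,
not a JDK or spec API, and the sketch assumes the Box component has
static type Object, so that `var o` infers to Object.

```java
// Models "total type patterns match null, partial ones do not".
// Everything here is a hypothetical illustration, not a real API.
class TypePatternModel {
    static class Chocolate {}
    static class Frog {}

    // patternType: the T in `T t`; operandType: static type U of the operand.
    static boolean matches(Class<?> patternType, Class<?> operandType, Object value) {
        if (value == null) {
            // Only a total pattern (U <: T) matches null.
            return patternType.isAssignableFrom(operandType);
        }
        // Non-null values follow the usual dynamic type test.
        return patternType.isInstance(value);
    }

    // Mirrors the three Box cases, for a payload of static type Object.
    static String select(Object contents) {
        if (matches(Chocolate.class, Object.class, contents)) return "Box(Chocolate c)";
        if (matches(Frog.class, Object.class, contents)) return "Box(Frog f)";
        if (matches(Object.class, Object.class, contents)) return "Box(var o)";
        throw new IllegalStateException("unreachable: the last pattern is total");
    }

    public static void main(String[] args) {
        System.out.println(select(new Chocolate())); // Box(Chocolate c)
        System.out.println(select(new Frog()));      // Box(Frog f)
        System.out.println(select("anything else")); // Box(var o)
        System.out.println(select(null));            // Box(var o)
    }
}
```

Under this model the first two cases can be reordered freely, since
neither ever claims a null payload.
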
> An objection raised against this scheme earlier is that readers will 
> have to look at the declaration site of the pattern to know whether 
> the nested pattern is total. This is a valid concern (to be traded off 
> against the other valid concerns), but this does not seem so bad in 
> practice to me -- it will be common to use `var` or another broad 
> type, in which case it will be obvious.
>
> One problem with this interpretation is that we can't trivially 
> refactor from
>
>     switch (box) {
>         case Box(Chocolate c):
>         case Box(Frog f):
>         case Box(var o):
>     }
>
> to
>
>     switch (box) {
>         case Box(var contents):
>             switch (contents) {
>                 case Chocolate c:
>                 case Frog f:
>                 case Object o:
>             }
>     }
>
> because the inner `switch(contents)` would NPE, because switch is 
> null-hostile.  Instead, the user would explicitly have to do an `if 
> (contents == null)` test, and, if the intent was to handle null in the 
> same way as the bottom case, some duplication of code would be 
> needed.  This is irritating, but I don't think it is disqualifying -- 
> it is in the same category of null irritants that we have throughout 
> the language.
>
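
The irritation can be sketched with a legacy switch.  A String switch
stands in here for the type-pattern switch, an assumption made only so
the example compiles today; the method names and strings are invented.

```java
// Legacy switch is null-hostile; handling null means an explicit test
// and, if null should share the fallback, a duplicated body.
class SwitchNullDemo {
    // Throws NullPointerException when contents is null.
    static String describe(String contents) {
        switch (contents) {
            case "chocolate": return "eat it";
            case "frog":      return "release it";
            default:          return "put it back";
        }
    }

    // The workaround: test for null first, duplicating the default body.
    static String describeNullSafe(String contents) {
        if (contents == null) {
            return "put it back"; // duplicated from the default case
        }
        switch (contents) {
            case "chocolate": return "eat it";
            case "frog":      return "release it";
            default:          return "put it back";
        }
    }

    public static void main(String[] args) {
        try {
            describe(null);
        } catch (NullPointerException e) {
            System.out.println("plain switch threw NPE");
        }
        System.out.println(describeNullSafe(null));   // put it back
        System.out.println(describeNullSafe("frog")); // release it
    }
}
```
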
> Similarly, we lose the pleasing decomposition that the nested pattern 
> `P(Q)` is the same pattern as `P(alpha) & alpha instanceof Q` when P's 
> first component might be null and the pattern Q is total -- because of 
> the existing null-hostility of `instanceof`.  (This is not unlike the 
> complaint that Optional doesn't follow the monad laws, with a similar 
> consequence -- and a similar justification.)
>
> So, summary:
>  - the null constant pattern matches null;
>  - "any" patterns match null;
>  - A total type pattern is an "any" pattern;
>  - var is just type inference;
>  - no other patterns match null;
>  - existing constructs retain their existing null behaviors.
>
>
> I'll follow up with a separate message about switch null-hostility.
>
