[patterns] Nullability in patterns, and pattern-aware constructs (again)

Wed Jan 8 20:55:38 UTC 2020

Assuming you are happy with the plan in the previous message, let's move 
on to the ...

SPECIAL BONUS ROUND -- POSSIBLY LESS NULL-HOSTILE SWITCHES

(If you are not happy with the previous mail, please direct your 
comments to that first; this is purely a maybe-add-on to the previous.  
Also, remember that patterns in switch is not coming in the next round, 
but the following one.)

In the past, when we tried to relax the null-hostility to switch, we got 
some pushback along the lines of "Please don't make me grovel through 
all the cases to determine if a switch is null-friendly or 
null-hostile."  Making the user do an O(n) analysis would be bad here, 
but maybe an O(1) analysis is acceptable.  (We could have an alternate 
name for switch (`switch-nullable`), and this would address the problem 
(at the cost of, arguably, creating a new problem), but I'd like 
something a little more general.)

Let's start with `case null`.  What if you were allowed to say `case 
null` in a switch, and the switch would do the obvious thing?

     switch (o) {
         case null -> System.out.println("Ugh, null");
         case Object obj -> System.out.println("Yay, non-null: " + obj);
     }
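
One way to read "the obvious thing" is as a null check hoisted ahead of 
the rest of the dispatch.  A minimal hand-written sketch of that reading 
in today's Java (the names `CaseNullSketch` and `describe` are made up 
for illustration; this is not proposed compiler output):

```java
// Hypothetical desugaring of a switch that starts with `case null`.
final class CaseNullSketch {
    static String describe(Object o) {
        // `case null`: test for null before any other case is tried.
        if (o == null) {
            return "Ugh, null";
        }
        // The `case Object` arm: everything else, already known to be non-null.
        return "Yay, non-null: " + o;
    }

    public static void main(String[] args) {
        System.out.println(describe(null));
        System.out.println(describe("hello"));
    }
}
```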

I don't think anyone would argue that it is all that confusing that this 
switch can deal with null inputs; the proximity of `case null` to the 
`switch` makes it pretty clear that we want to handle nulls.  (We could 
restrict this to the top of the switch; that would be an exception to 
the general rule about pattern domination, but an acceptable one.)

OK, now what about at the other end of the switch?  What if the last 
pattern is total (say, an "any" pattern)?  Is it also reasonable for 
that to match null?

     switch (o) {
         case String s: ...
         case _: ...
     }

Is it reasonable for the last line to match null?  After all, we're 
saying "everything".  (For now, please ignore any latent assumptions 
that the last line means the same thing as "default".)

I realize that there may be mixed opinions here, but what I'm getting at 
is: what if the reader only had to look at the first and last case to 
determine how nulls were handled in a switch? Would the benefits of 
uniform treatment of nulls in pattern matching (eliminating the 
attendant refactoring anomalies) outweigh the fact that users have to 
make a slight shift in their thinking about "switch is always 
null-hostile"?  (Note that there is no actual code compatibility issue; 
this is all mental-model compatibility.)

The key idea here is to shift our orientation from "switch is null 
hostile" to "matching the `default` clause of a switch is null-hostile" 
-- and saying that non-total switches get an implicit default clause 
(just as we insert an extra throwing default clause into total switch 
expressions, just in case the classfiles change in an incompatible way 
between compilation and runtime.)

So, if there is a `case null`, or the last clause is a total pattern, it 
gets the null; otherwise, there is either an explicit or implicit 
default clause, which is null-hostile.
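
The rule above amounts to an O(1) check on the first and last case.  A 
rough model of that check follows; it is only a sketch, not anything a 
compiler does.  The `Kind` enum and `nullTarget` are names invented 
here, and the sketch assumes `case null` is restricted to the top of 
the switch, which is only floated as an option above.

```java
import java.util.Arrays;
import java.util.List;

final class NullRoutingSketch {
    /** Hypothetical classification of a case label, for this sketch only. */
    enum Kind { NULL_CONSTANT, PARTIAL, TOTAL, DEFAULT }

    /**
     * Returns the index of the case that receives a null operand, or -1 if
     * null falls through to the explicit or implicit null-hostile default.
     * Assumes `case null`, if present, is the first case.
     */
    static int nullTarget(List<Kind> cases) {
        if (cases.isEmpty()) {
            return -1;                       // only the implicit default remains
        }
        if (cases.get(0) == Kind.NULL_CONSTANT) {
            return 0;                        // `case null` gets the null
        }
        int last = cases.size() - 1;
        if (cases.get(last) == Kind.TOTAL) {
            return last;                     // a total last pattern gets the null
        }
        return -1;                           // default clause: null-hostile
    }

    public static void main(String[] args) {
        System.out.println(nullTarget(Arrays.asList(Kind.NULL_CONSTANT, Kind.TOTAL)));
        System.out.println(nullTarget(Arrays.asList(Kind.PARTIAL, Kind.TOTAL)));
        System.out.println(nullTarget(Arrays.asList(Kind.PARTIAL, Kind.DEFAULT)));
    }
}
```

Only the first and last entries are inspected, which is the point: the 
reader never has to scan the cases in between.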

The main costs are two:
  - a `default` is not the same as `case _` or `case var x` or `case 
Object x`;
  - the mental model that switches are null-hostile needs to be shifted 
to "switch defaults are null-hostile."
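
To make the first cost concrete, here is a hand-written approximation of 
how the two spellings would differ under this proposal.  The class and 
method names are hypothetical, and the if-chains only stand in for the 
switches named in the comments.

```java
final class DefaultVersusTotalSketch {
    // Models: switch (o) { case String s: ...  default: ... }
    static String withDefault(Object o) {
        if (o instanceof String) {
            return "string";
        }
        // Reaching the default clause with null is where the hostility lives.
        if (o == null) {
            throw new NullPointerException("null reached the default clause");
        }
        return "default";
    }

    // Models: switch (o) { case String s: ...  case Object x: ... }
    static String withTotalCase(Object o) {
        if (o instanceof String) {
            return "string";
        }
        // The total last case takes whatever is left, null included.
        return "total case";
    }

    public static void main(String[] args) {
        System.out.println(withTotalCase(null));
        System.out.println(withDefault(42));
    }
}
```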

These are not trivial costs, but neither are the benefits.  I don't 
think it's a slam-dunk either way; it's really a question of whether we 
want to spend some of our "surprise the user" budget on making the 
language more regular.

Note that whichever way we go here has no effect on the semantics 
outlined in the previous mail.

On 1/8/2020 3:27 PM, Brian Goetz wrote:
> In the past, we've gone around a few times on nullability and pattern 
> matching.  Back when we were enamored of `T?` types over in Valhalla 
> land, we tentatively landed on using `T?` also for nullable type 
> patterns.  But the bloom came off that rose pretty quickly, and 
> Valhalla is moving away from it, and that makes it far less attractive 
> in this context.
>
> There are a number of tangled concerns that we've tried a few times to 
> unknot:
>
>  - Construct nullability.  Constructs to which we want to add pattern 
> awareness (instanceof, switch) already have their own opinion about 
> nulls.  Instanceof always says false when presented with a null, and 
> switch always NPEs.
>
>  - Pattern nullability.  Some patterns clearly would never match null 
> (deconstruction patterns), whereas others (an "any" pattern, and 
> surely the `null` constant pattern, if there was one) might make sense 
> to match null.
>
>  - Nesting vs top-level.  Most of the time, we don't want to match 
> null at the top level, but frequently in a nested position we do. This 
> conflicts with...
>
>  - Totality vs partiality.  When a pattern is partial on the operand 
> type (e.g., `case String` when the operand of switch is `Object`), it 
> is almost never the case we want to match null (well, except for the 
> `null` constant pattern), whereas when a pattern is total on the 
> operand type (e.g., `case Object` in the same example), it is more 
> justifiable to match null.
>
>  - Refactoring friendliness.  There are a number of cases that we 
> would like to freely refactor back and forth (e.g., if-instanceof 
> chain vs pattern switch).  In particular, refactoring a switch on 
> nested patterns to a nested switch (case Foo(T t), case Foo(U u) to a 
> nested switch on T and U) is problematic under some of the 
> interpretations of nested patterns.
>
>  - Inference.  It would be nice if a `var` pattern were simply 
> inference for a type pattern, rather than some possibly-non-denotable 
> union.  (Both Scala and C# treat these differently, which means you 
> have to choose between type inference and the desired semantics; I 
> don't want to put users in the position of making this choice.)
>
>
> Let's try (again) to untangle these.  A compelling example is this one:
>
>     Box box;
>     switch (box) {
>         case Box(Chocolate c):
>         case Box(Frog f):
>         case Box(var o):
>     }
>
> It would be highly confusing and error-prone for either of the first 
> two patterns to match Box(null) -- given that Chocolate and Frog have 
> no type relation (ok, maybe they both implement `Edible`), it should 
> be perfectly safe to reorder the two.  But, because the last pattern 
> is so obviously total on boxes, it is quite likely that what the 
> author wants is to match all remaining boxes, including those that 
> contain null. (Further, it would be super-bad if there were _no_ way to 
> say "Match any Box, even if it contains null."  While one might think 
> this could be repaired with OR patterns, imagine that `Box` had N 
> components -- we'd need to OR together 2^N patterns, with complex 
> merging, to express all the possible combinations of nullity.)
>
> Scala and C# took the path of saying that "var" patterns are not just 
> type inference, they are "any" patterns -- so `Box(Object o)` matches 
> boxes containing a non-null payload, where `Box(var o)` matches all 
> boxes.  I find this choice questionable (the story that `var` is just 
> inference is nice), and it also puts users in the 
> position of having to choose between the semantics they want and being 
> explicit about types.  I see the expedience of it, but I do not think 
> this is the right answer for Java.
>
>
> In the previous round, we posited that there were _type patterns_ 
> (denoted `T t`) and _nullable type patterns_ (denoted `T? t`), 
> which had the advantage that you could be explicit about what you 
> wanted (nulls or not), and which was sort of banking on Valhalla 
> plunking for the `T?` notation.  But without that, only having `T?` 
> in patterns, and nowhere else, will stick out like a sore thumb.
>
> There are many ways to denote "T or null", of course:
>
>  - Union types: `case (T|Null) t`
>  - OR patterns: `case (T t) | (Null t)`, or `case (T t) | (null t)` 
> (the former is a union with a null TYPE pattern, the latter with a 
> null CONSTANT pattern)
>  - Merging/fallthrough: `case T t, Null t`
>  - Some way to spell "nullable T": `case T? t`, `case nullable T t`, 
> `case T|null t`
>
> But, I don't see any of these as being all that attractive in the Box 
> case, when the most likely outcome is that the user wants the last 
> case to match all boxes.
>
>
> Here's a scheme that I think is workable, which we hovered near 
> sometime in the past, and which I want to go back to. We'll start with 
> the observation that `instanceof` and `switch` are currently hostile 
> to nulls (instanceof says false, switch throws, and probably in the 
> future, let/bind will do so also.)
>
>  - We accept that some constructs may have legacy hostility to nulls 
> (but, see below for a possible relaxation);
>  - There are no "nullable type patterns", just type patterns;
>  - Type patterns that are _total_ on their target operand (`case T` on 
> an operand of type `U`, where `U <: T`) match null, and non-total type 
> patterns do not.
>  - Var patterns can be considered "just type inference" and will mean 
> the same thing as a type pattern for the inferred type.
>
> In this world, the patterns that match null (if the construct allows 
> it through) are `case null` and the total patterns -- which could be 
> written `var x` (and maybe `_`, or maybe not), or `Object x`, or even 
> a narrower type if the operand type is narrower.
>
> In our Box example, this means that the last case (whether written as 
> `Box(var o)` or `Box(Object o)`) matches all boxes, including those 
> containing null (because the nested pattern is total on the nested 
> operand), but the first two cases do not.
>
> An objection raised against this scheme earlier is that readers will 
> have to look at the declaration site of the pattern to know whether 
> the nested pattern is total. This is a valid concern (to be traded off 
> against the other valid concerns), but this does not seem so bad in 
> practice to me -- it will be common to use var or another broad type, 
> in which case it will be obvious.
>
> One problem with this interpretation is that we can't trivially 
> refactor from
>
>     switch (box) {
>         case Box(Chocolate c):
>         case Box(Frog f):
>         case Box(var o):
>     }
>
> to
>
>     switch (box) {
>         case Box(var contents):
>             switch (contents) {
>                 case Chocolate c:
>                 case Frog f:
>                 case Object o:
>             }
>     }
>
> because the inner `switch(contents)` would NPE, because switch is 
> null-hostile.  Instead, the user would explicitly have to do an `if 
> (contents == null)` test, and, if the intent was to handle null in the 
> same way as the bottom case, some duplication of code would be 
> needed.  This is irritating, but I don't think it is disqualifying -- 
> it is in the same category of null irritants that we have throughout 
> the language.
>
> Similarly, we lose the pleasing decomposition that the nested pattern 
> `P(Q)` is the same pattern as `P(alpha) & alpha instanceof Q` when P's 
> 1st component might be null and the pattern Q is total -- because of 
> the existing null-hostility of `instanceof`.  (This is not unlike the 
> complaint that Optional doesn't follow the monad law, with a similar 
> consequence -- and a similar justification.)
>
> So, summary:
>  - the null constant pattern matches null;
>  - "any" patterns match null;
>  - A total type pattern is an "any" pattern;
>  - var is just type inference;
>  - no other patterns match null;
>  - existing constructs retain their existing null behaviors.
>
>
> I'll follow up with a separate message about switch null-hostility.
>
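
The Box example and summary in the quoted mail can be approximated by 
hand in today's Java.  This is only a sketch of the intended matching 
behavior: `Box`, `Chocolate`, and `Frog` are stand-in classes written 
for this example, and the if-chain stands in for the pattern switch.

```java
final class BoxNullitySketch {
    // Hypothetical stand-ins for the types in the quoted example.
    static final class Chocolate {}
    static final class Frog {}
    static final class Box {
        final Object contents; // may be null
        Box(Object contents) { this.contents = contents; }
    }

    static String match(Box box) {
        // A null box would NPE here, mirroring switch's existing null-hostility.
        Object contents = box.contents;
        // Box(Chocolate c): partial on the component, so it never matches Box(null).
        if (contents instanceof Chocolate) {
            return "chocolate";
        }
        // Box(Frog f): likewise partial, so the first two cases can be reordered safely.
        if (contents instanceof Frog) {
            return "frog";
        }
        // Box(var o): total on the component, so it takes every remaining Box,
        // including Box(null).
        return "anything else";
    }

    public static void main(String[] args) {
        System.out.println(match(new Box(new Chocolate())));
        System.out.println(match(new Box(new Frog())));
        System.out.println(match(new Box(null)));
    }
}
```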
