Next up for patterns: type patterns in switch

Brian Goetz brian.goetz at oracle.com
Mon Aug 10 17:57:01 UTC 2020


There seems to be an awful lot of confusion about the motivation for the 
nullity proposal, so let me step back and address this from first 
principles.

Let's factor away the null-tolerance of the constructs (switch and 
instanceof) from what patterns mean, and then we can return to how, if 
necessary, to resolve any mismatches.  We'll do this by defining what it 
means for a target to match a pattern, and only then define the 
semantics of the pattern-aware constructs in terms of that.

Let me also observe that some people, in their belief that `null` was a 
mistake, tend to have a latent hostility to null, and therefore tend to 
want new features to be at least as null-hostile as the most 
null-hostile of old features.  (A good example is streams; it was 
suggested (by some of the same people) that it should be an error for 
streams to have null elements.  And we considered this briefly -- and 
concluded this would have been a terrible idea!  The lesson of that 
investigation was that the desire to "fix" the null mistake by patching 
individual holes is futile, and tends to lead to worse results.  
Instead, being null-agnostic was the right move for streams.)

I think we're also being distracted by the fact that, in part because 
we've chosen `instanceof` as our syntax, we want to use `instanceof` as 
our mental model for what matching means.  This is a good guiding 
principle, but we must be wary of following it blindly.

As a modeling simplification, let's assume that all patterns have 
exactly one binding variable, and the type of that binding variable is 
part of the pattern definition.  We could model our match predicate and 
(conditional) binding function as:

     match :: (Pattern t) u -> Maybe t

A pattern represents the fusion of an applicability predicate, zero or 
more conditional extractions, and a binding mechanism.  For the simple 
case of a type pattern `Foo f`, the applicability predicate is "are you 
a Foo", and there are two possible interpretations -- "would 
`instanceof` say you are a `Foo`" (which means non-null), or "could you 
be assigned to a variable of type Foo" (or, equivalently, "are you in 
the value set of Foo").
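As a concrete sketch of the two interpretations (the helper names here 
are illustrative, not proposed API):

```java
public class TwoReadings {
    // Reading 1: "would `instanceof` say you are a String" -- null never matches
    static boolean dynamicTypeReading(Object u) {
        return u instanceof String s;
    }

    // Reading 2: "could you be assigned to a variable of type String" --
    // null is in String's value set, so a String-or-null input always succeeds
    static String valueSetReading(Object u) {
        String s = (String) u;   // a cast, like assignment, admits null
        return s;
    }

    public static void main(String[] args) {
        System.out.println(dynamicTypeReading(null));  // false
        System.out.println(valueSetReading(null));     // null
    }
}
```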

A pattern P is _total_ on U if `match P u` returns `Some t` for every `u 
: U`.  Total patterns are useful because they allow the compiler to 
reason about control flow and provide better error checking (detecting 
dead code, silly pattern matches, totality of expression switches, etc.)
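To make the `match` signature and totality concrete, here is a hedged 
model in Java; `MatchResult` is a hypothetical stand-in for `Maybe t` 
(java.util.Optional won't do here, since it cannot represent "matched, 
and the binding is null"):

```java
public class MatchModel {
    // Hypothetical stand-in for Maybe t: matched flag plus the binding
    record MatchResult<T>(boolean matched, T binding) { }

    // match for the partial pattern `String s`: fails on non-Strings and on null
    static MatchResult<String> matchString(Object u) {
        return (u instanceof String s)
                ? new MatchResult<>(true, s)
                : new MatchResult<>(false, null);
    }

    // match for the pattern `Object o`, total on Object under the value-set
    // reading: returns Some(u) for every u, including null
    static MatchResult<Object> matchObject(Object u) {
        return new MatchResult<>(true, u);
    }

    public static void main(String[] args) {
        System.out.println(matchString(42).matched());    // false
        System.out.println(matchObject(null).matched());  // true: total, binds null
    }
}
```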

Let's go back to our trusty Box example.  We can think of the `Box` 
constructor as a mapping:

     enBox :: t -> Box t

and the Box deconstructor as

     unBox :: Box t -> t

Now, what algebraic relationship do we want between enBox and unBox?  
The whole point is that a Box is a structure containing some properties, 
and that patterns let us destructure Boxes to recover those properties.  
enBox and unBox should form a projection-embedding pair, which means 
that enBox is allowed to be picky about what `t` values it accepts 
(think of the Rational constructor as throwing on denom==0), but, once 
boxed, we should be able to recover whatever is in the box.  (The Box 
code gets to mediate access in both directions, but the _language_ 
shouldn't make guesses about what this code is going to do.)

From the perspective of Box, is `null` a valid value of T?  The answer 
is: "That's the Box author's business.  The constructor accepts a T, and 
`null` is a valid member of T's value set.  So if the imperative body of 
the constructor doesn't do anything special to reject it, then it's part 
of the domain."  And if it's part of the domain, then `unBox` should hand 
back what we handed to `enBox`.  T in, T out.
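A hedged sketch of "T in, T out" using a record and a record pattern 
(Java 21 syntax, which postdates this mail; the `Box` record is 
illustrative):

```java
public class BoxDemo {
    record Box<T>(T t) { }                 // enBox is the canonical constructor

    // The deconstruction pattern hands back whatever enBox was given: T in, T out.
    static Object unBox(Box<Object> b) {
        if (b instanceof Box(var t)) {     // record-pattern sketch
            return t;
        }
        throw new AssertionError("unreachable: b is a non-null Box");
    }

    public static void main(String[] args) {
        System.out.println(unBox(new Box<>("payload")));  // payload
        System.out.println(unBox(new Box<>(null)));       // null flows back out
    }
}
```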

It has been a driving goal throughout the pattern matching exploration 
to exploit these dualities, because (among other things) this minimizes 
sharp edges and makes composition do what you expect it to.  If I do:

     Box<T> b = new Box<>(t);

and this succeeds, then our `match` function applied to `Box(T)` and `b` 
should yield what we started with -- `t`.  Singling out `null` for 
special treatment here as an illegal binding result is unwarranted; it 
creates a sharp edge where you can put things into boxes but you can 
only get them out on Tuesdays.  The language has no business telling Box 
it can't contain nulls, or punishing null-happy boxes by making them 
harder to deconstruct. Null-hostility is for the Box author to choose or 
not.  I should be able to compose construction and deconstruction 
without surprises.

Remember, we're not yet talking about language syntax here -- we're 
talking about the semantics of matching (and what we let class authors 
model).  At this level, there is simply no other reasonable set of 
semantics here -- the `Box(T)` deconstructor, when applied to a valid 
Box<T>, should be able to recover whatever was passed to the `new 
Box(T)` constructor.  Nulls should be rejected by pattern matching at 
the point where they would be dereferenced, not preemptively.

There's also only one reasonable definition of the semantics of nested 
matching.  If `P : Pattern t`, then the nested pattern P(Q) matches u iff

     u matches P(T alpha) && alpha matches Q

It follows that if `Box(Object o)` is going to be total on all boxes, 
then Object o is total on all objects.

(There's also only one reasonable definition of the `var` pattern; it is 
type inference where we infer the type pattern for whatever type is the 
target of the match.  So if `P : Pattern T`, then `P(var x)` infers `T 
x` for the nested pattern.)

Doing anything else is an impediment to composition (and composition is 
the only tool we have, as language designers, that separates us from the 
apes.)  I can compose constructors:

     Box<Flox<Pox<T>>> b  = new Box<>(new Flox<>(new Pox<>(t)));

and I should be able to take this apart exactly the same way:

     if (b matches Box(Flox(Pox(var t))))

The reason `Flox(Pox p)` doesn't match null floxes is not because 
patterns shouldn't match null, but because a _deconstruction pattern_ 
that takes apart a Flox is intrinsically going to look inside the Flox 
-- which means dereferencing it.  But an ordinary type pattern is not 
necessarily going to.
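The mail's hypothetical `matches` syntax can be approximated with Java 21 
record patterns (the record names here are invented for illustration).  
Note how the nested `Flox(...)` pattern fails on a null flox precisely 
because it must look inside it:

```java
public class NestedDemo {
    record Pox<T>(T t) { }
    record Flox<T>(T t) { }
    record Box<T>(T t) { }

    static String probe(Box<Flox<Pox<String>>> b) {
        // Deconstruct exactly the way we constructed; `matches` is written
        // as `instanceof` with a record pattern here.
        if (b instanceof Box(Flox(Pox(var t)))) {
            return t;
        }
        return "no match: null flox";  // the Flox pattern had to dereference the Flox
    }

    public static void main(String[] args) {
        System.out.println(probe(new Box<>(new Flox<>(new Pox<>("hi")))));  // hi
        System.out.println(probe(new Box<>(null)));  // no match: null flox
    }
}
```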

Looking at it from another angle, there is a natural interpretation of 
applying a total pattern as a generalization of assignment.  It's not an 
accident that `T t` (or `var x`) looks both like a pattern and like a 
local variable declaration.  We know that this:

     T t = e
or
     var t = e

is a local variable declaration with initializer, but we can also 
reasonably (and profitably) interpret it as a pattern match -- take the 
(total on T) pattern `T t`, and match `e : T` to it.  And the compiler 
already knows that this is going to succeed if `e : T`.  To gratuitously 
reject null here makes no sense.  (Totality is important here; if the 
pattern were not total, then `t` would not be DA after the assignment, 
and therefore the declaration either has to throw a runtime error, or 
the compiler has to reject it.)

## Back to switch and instanceof

The above discussion argues why there is only one reasonable null 
behavior for patterns _in the abstract_.   But, I hear you cry, the 
semantics for switch and instanceof today are entirely reasonable and 
intuitive, so how could they be so wrong?

And the answer is: we have only been able to use `switch` and 
`instanceof` so far for pretty trivial things!  When we add patterns to 
the language, we're raising the expressive ability of these constructs 
to some power.  And extrapolating from our existing intuitions about 
these constructs is like extrapolating the behavior of polynomials from their 
zeroth-order Taylor expansion.

(Now, at this point, the split-over-lump crowd says "Then you should 
define new constructs, if they're so much more powerful." But I still 
claim it is far better to refine our intuitions about what switch means, 
even with some discomfort, than to try to keep track of the subtle 
differences between switch and snitch.)

So, why do we have the current null behavior for `instanceof` and 
`switch`?  Well, right now, `instanceof` only lets you ask a very very 
simple question -- "is the dynamic type of the target X".  And, the 
designers judged (reasonably) that, since 99.999% of the time, what 
you're about to do is cast the target and then dereference it, saying "no" 
is less error-prone than saying OK and then having the subsequent 
dereference fail.

But now, `instanceof` can answer far more sophisticated questions, and 
that 99.999% becomes a complete unknown.  With what confidence can you 
say that the body of:

     if (b instanceof Box(var t)) { ... }

is going to dereference t?  If you say more than 50%, you're lying. It 
would be totally reasonable to just take that t and assign it somewhere 
else, rebox it into another box, pass it to some T-consuming method, 
etc.  And who are we to say that Box-consuming protocols are somehow 
"bad" if they like to truck in null contents? That's not our business!  
So the conditions under which "always says no" was reasonable for Java 
1.0 are no longer applicable.

The same is true for switch, because of the very limited reference types 
which switch permits (and which were only added in Java 5) -- boxed 
primitives, strings, and enums.  In all of these cases, we are asking 
very simple questions ("are you 3"), and these are domains where nulls 
have historically been denigrated -- so it seemed reasonable for switch 
to be hostile to them.  But once we introduce patterns, the set of 
questions you can ask gets enormously larger, and the set of types you 
can switch over does too.  The old conditions don't apply.  In:

     switch (o) {
         case Box(var t): ...
         case Bag(var t): ...
     }

we care about the contents, not the wrapping; the switch is there to do 
the unwrapping for us.  Who are we to say "sorry, no one should ever be 
allowed to put a null in a Bag?"  That's not our business!
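A runnable approximation of this switch, using Java 21 sealed types and 
record patterns (the `Wrapper` hierarchy is invented for illustration); a 
null *content* flows through the binding unimpeded:

```java
public class SwitchDemo {
    sealed interface Wrapper permits Box, Bag { }
    record Box(Object t) implements Wrapper { }
    record Bag(Object t) implements Wrapper { }

    static String contents(Wrapper o) {
        // The switch does the unwrapping for us; we care about the contents,
        // not the wrapping, and the contents may well be null.
        return switch (o) {
            case Box(var t) -> "box holding " + t;
            case Bag(var t) -> "bag holding " + t;
        };
    }

    public static void main(String[] args) {
        System.out.println(contents(new Box("x")));   // box holding x
        System.out.println(contents(new Bag(null)));  // bag holding null
    }
}
```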

At this point, I suspect Remi says "I'm not saying you can't put a null 
in a Box, but there should be a different way to unpack it." But unless 
you can say with 99.99% certainty that nulls are always errors, it is 
better to be agnostic to nulls in the plumbing and let users filter them 
at the ultimate point of consumption, than to make the plumbing 
null-hostile and make users jump through hoops to get the nulls to 
flow.  The same was true for streams; we made the (absolutely correct) 
choice to let the nulls flow through the stream, and, if you are using a 
maybe-null-containing source, and doing null-incompatible things on the 
elements, it's on you to filter them.  It is easier to filter nulls than 
to add back a special encoding for nulls.  (And, the result of that 
experiment was pretty conclusive: of the hundreds of stack overflow 
questions I have seen on streams, not one centered around unexpected 
nulls.)

If we have guards, and you want to express "no Boxes with nulls", that's 
easy:

     case Box(var t) when t != null: ...
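With the `when` guards that eventually shipped (Java 21 syntax, used here 
as an illustrative sketch with an invented `Box` record), that filtering 
sits exactly where it belongs, at the use site:

```java
public class GuardDemo {
    record Box(Object t) { }

    static String classify(Object o) {
        return switch (o) {
            // "no Boxes with nulls", expressed as a guard
            case Box(var t) when t != null -> "non-null box: " + t;
            case Box(var t)                -> "box of null";
            default                        -> "not a box";
        };
    }

    public static void main(String[] args) {
        System.out.println(classify(new Box("x")));   // non-null box: x
        System.out.println(classify(new Box(null)));  // box of null
    }
}
```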

And again, as with `instanceof`, we have no reason to believe that 
there's a 99.99% chance that the next thing the user is going to do is 
dereference it.  So the justification that null-hostility is the 
"obvious" semantics here doesn't translate to the new, more powerful 
language feature.

And it gets worse: the people who really want the nulls now have to do 
additional error-prone work, either using some ad-hoc epicyclical syntax 
at each use site (and, if the deconstruction pattern has five bindings, 
saying it five times), or duplicating blocks of code to avoid the 
switch anomaly.

The conclusion of this section is that while the existing null behavior 
for instanceof and switch is justified relative to their _current_ 
limitations, once we remove those limitations, those behaviors are much 
more arbitrary (and kind of mean: "nulls are so bad, that if you are a 
null-using person, we will make it harder for you, 'for your own good'.")

#### Split the baby?

Now, there is room to make a reasonable argument that we'd rather keep 
the existing switch behavior, but accept the null-friendly matching 
behavior.  My take is that this is a bad trade, but let's look at it 
more carefully.

Gain: I don't have to learn a new set of rules about what 
switch/instanceof do with null.

Loss: code duplication.  If I want my fallback to handle nulls, I have 
to duplicate code; instead of

     switch (o) {
         case String s: A
         case Long l: B
         case Object o: C
     }

I have to do

     if (o == null) { C }
     else switch (o) {
         case String s: A
         case Long l: B
         case Object o: C
     }

resulting in duplicating C.  (We have this problem today, but because of 
the limitations of switch today, it is rarely a problem. When our case 
labels are more powerful, we'll be using switch for more stuff, and it 
will surely come up more often.)
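A runnable sketch of the duplication, assuming null-hostile switch 
semantics (this matches the pattern switch that later shipped, where a 
null selector throws unless handled explicitly):

```java
public class DupDemo {
    static String handle(Object o) {
        if (o == null) return "C";   // fallback hoisted out of the switch...
        return switch (o) {
            case String s -> "A";
            case Long l   -> "B";
            default       -> "C";    // ...and duplicated here
        };
    }

    public static void main(String[] args) {
        System.out.println(handle(null));  // C
        System.out.println(handle("hi"));  // A
    }
}
```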

Loss: refactoring anomaly.  Refactoring a nested switch with:

      case P(Q):
      case P(R):
      case P(S):

to

     case P(var x):
         switch (x) {
             case Q: ...
             case R: ...
             case S: ...
         }

doesn't work in the obvious way.  Yes, there's a way to refactor it, and 
the IDE will do it correctly.  But it becomes a sharp edge that users 
will trip over.  The reason the above refactoring is desirable is 
because users will reasonably assume it works, and rather than cut them 
with a sharp edge, we can just make it work the way they reasonably 
think it should.

So, we could make this trade, and it would be more "minimal" -- but I 
think it would result in a less useful switch in the long run.  I think 
we would regret it.

#### Conclusion

If we were designing pattern matching and switch together from scratch, 
we would never even consider the current nullity behavior; the "wait 
until someone actually dereferences before we throw" is the obvious and 
only reasonable choice.  We're being biased based on our existing 
assumptions about instanceof and switch.  This is a reasonable starting 
point, but we have to admit that these biases in turn come from the fact 
that the current interpretations of those constructs are dramatically 
limited compared to supporting patterns.

It is easy to trot out anecdotes where any of the possible schemes would 
cause a particular user to be confused.  But this is just a way to 
justify our biases.   The reality is that, as switch and instanceof get 
more powerful, we don't get to make as many assumptions about the 
likelihood of whether `null` is an error or not.  And, the more likely it 
is not an error, the less justification we have for giving it special 
semantics.

Let the nulls flow.




More information about the amber-spec-experts mailing list