Feedback on nulls in switch

Brian Goetz brian.goetz at oracle.com
Wed Aug 12 14:06:27 UTC 2020



> But I don't think developers
> want to think too much about totality - it should just be natural.

Exactly!  Just like, `case Box(var o)` -- it's total on boxes, 
naturally, without thinking about it.

Obviously, you don't agree about the naturality part, so let's start 
there -- until you can see why this is natural, there's no point in 
throwing syntax at the problem, because it wouldn't be addressing the 
right problem.

I think part of the problem here is you only have a partial 
understanding of what pattern matching is for, and you're extrapolating 
from what you have.  Like the elephant-observers, "an elephant is like a 
snake" is only a partial understanding, one that is locally valid if 
you're only dealing with that part of the elephant, but leads you to a 
very wrong place if you are trying to build an elephant habitat.

One aspect of the elephant is conditionality / partiality; since not all 
patterns are total, we want pattern matching to interact gracefully with 
the conditional features of the language, which in Java include 
instanceof and switch.  Patterns let us take these existing constructs 
and ask more sophisticated questions than we can today; not just "are 
you a box", but "are you a box containing a bag containing an apple."  
And we build up these more sophisticated questions through the best 
trick we have -- composition.  When I ask "are you a box containing a 
bag containing an apple", I am composing three simple patterns into a 
compound pattern, just like when I call `Box.of(Bag.of(apple))`.  Boxes 
and bags don't have to know about each other, or about apples.  If the 
tail of one arrow matches the head of the previous one, I can compose them.
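
A minimal sketch of this composition, assuming Java 21 record patterns 
(syntax that postdates this message) and hypothetical Box, Bag and Apple 
records invented for illustration.  The nested pattern mirrors the 
nested constructor call:

```java
// Hypothetical records; none of them knows about the others.
record Apple() {}
record Bag(Object contents) {}
record Box(Object contents) {}

final class ComposeDemo {
    // "Are you a box containing a bag containing an apple?"
    static boolean isBoxedBaggedApple(Object o) {
        return o instanceof Box(Bag(Apple a));
    }

    public static void main(String[] args) {
        Object built = new Box(new Bag(new Apple()));   // composition on the way in
        System.out.println(isBoxedBaggedApple(built));                    // true
        System.out.println(isBoxedBaggedApple(new Box(new Apple())));     // false
    }
}
```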

But another aspect of the elephant is _destructuring_.  This is taking a 
composite structure and breaking it down into its constituent parts, 
with or without conditionality.  Languages with built-in structural 
types (sequences, tuples, etc) usually embed pattern matching deeply 
into the syntax of the language, because if the language gives you a way 
to put an apple and a pear together into a tuple (apple, pear), it 
should give you a way to recover the initial fruit from the pair.

In languages with side-effects (like Java), not all aggregation 
operations are reversible; if I bake a pie, I can't later recover the 
apples and the sugar.  But many are, and we like abstractions like these 
(collections, Optional, stream, etc) because they are very useful and 
easily reasoned about.  So those that are, should commit to the 
principle.  It would be OK for a list implementation to behave like this:

     Listy list = new Listy();
     list.add(null); // throws NPE

because a List is free to express constraints on its domain.  But it 
would be exceedingly bizarre for a list implementation to behave like this:

     Listy list = new Listy();
     list.add(3);     // ok, I like ints
     list.add(null); // ok, I like nulls too
     assertTrue(list.size() == 2);   // ok
     assertTrue(list.get(0) == 3); // ok
     assertTrue(list.get(1) == null);  // NPE!

If the list takes in nulls, it should give them back.
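
Both acceptable behaviors exist in the JDK today.  A small sketch using 
only java.util (the class and method names are made up for this 
example): `List.of` rejects null at the door, while `ArrayList` accepts 
a null and gives the same null back.

```java
import java.util.ArrayList;
import java.util.List;

final class ListNullDemo {
    // True if the factory refuses null up front with an NPE.
    static boolean rejectsNullAtTheDoor() {
        try {
            List.of((Object) null);   // this list's domain excludes null
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    // A list that takes a null in hands the null back out.
    static Object roundTripNull() {
        List<Object> list = new ArrayList<>();
        list.add(3);
        list.add(null);
        return list.get(1);
    }

    public static void main(String[] args) {
        System.out.println(rejectsNullAtTheDoor());   // true
        System.out.println(roundTripNull());          // null
    }
}
```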

Destructuring and conditionality must interact naturally, since the 
pattern `Bag.of(anything)` is neither intrinsically total (it's not 
total on Crate) nor intrinsically partial (it is total on Bag.) Only 
when it gets connected with a context do we know whether we are asking a 
conditional question with optional destructuring to follow, or engaging 
in pure destructuring.

And, if we are engaging in pure destructuring, we should do so! Just 
like our weird List that lets you put nulls in but doesn't let you take 
them out, it would be a weird Box that lets you put nulls in but not 
treat `Box(null)` as being in the set of values described by 
`Box(anything)`.

I think what is confusing you here is that the notion of "total pattern 
for destructuring" is not a concept you've engaged with substantially, 
and you're trying to bring your "an elephant is about conditionality" 
model, and getting confused.   And this is understandable.  But it's OK 
if your language gets more powerful by dealing with situations you've 
not seen before -- even if it means a little stretching.

So, returning to the language design problem: patterns are useful _both_ 
for describing conditionality _and_ for describing destructuring without 
conditionality.  And sometimes the conditionality is in the outer layer, 
and sometimes it is in the inner layer.  We want a mechanism that works 
naturally with all of these.  And in real-world code, a very common 
pattern is that the last case in a switch (or, the last case in a 
sequence of related cases in a switch) ends up being a catch-all for 
some portion of the space:

     switch (o) {
         case Box(Frog f): ...
         case Box(Chocolate c): ...
         case Box(var x): ...  // "all the rest of the boxes"

         case Bag(Candy c): ...
         case Bag(Groceries g): ...
         case Bag(var x): ... // "all the rest of the bags"
     }

In these situations, the last case is overwhelmingly intended as a 
catch-all for "any kind of bag."  This is by far the most natural 
default interpretation; "bag of anything but null, because we hate nulls 
and don't want them in our bags" is not.
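
A sketch of such a switch, assuming Java 21 syntax (which postdates this 
message) and a hypothetical sealed Container hierarchy so that the 
switch is exhaustive.  All type names are invented for illustration.  
The `var x` cases here also pick up a container holding null:

```java
// Hypothetical domain types, invented for this sketch.
record Frog() {}
record Chocolate() {}
record Candy() {}

sealed interface Container permits Box, Bag {}
record Box(Object contents) implements Container {}
record Bag(Object contents) implements Container {}

final class CatchAllDemo {
    static String describe(Container c) {
        return switch (c) {
            case Box(Frog f)       -> "box of frog";
            case Box(Chocolate ch) -> "box of chocolate";
            case Box(var x)        -> "some other box: " + x;  // all the rest of the boxes
            case Bag(Candy cd)     -> "bag of candy";
            case Bag(var x)        -> "some other bag: " + x;  // all the rest of the bags
        };
    }

    public static void main(String[] args) {
        System.out.println(describe(new Box(new Frog())));   // box of frog
        System.out.println(describe(new Box(null)));         // some other box: null
        System.out.println(describe(new Bag(null)));         // some other bag: null
    }
}
```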

Further, there's nothing magic about this particular expression of a 
catch-all.  We could alternately write a catch-all for Box by matching 
`Box b` (which will match all boxes, including Box(null)) and then 
asking for `b.getContents()` (and if the box contains null, then 
getContents() will return the null.)
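
The two spellings can be put side by side.  This is only a sketch: it 
assumes Java 21 syntax and a hypothetical Box record, whose generated 
accessor `contents()` stands in for `getContents()`.

```java
record Box(Object contents) {}

final class AccessorDemo {
    // Catch-all written as a deconstruction pattern.
    static Object viaPattern(Object o) {
        return switch (o) {
            case Box(var x) -> x;
            default         -> "not a box";
        };
    }

    // The same catch-all written as a type pattern plus an accessor call.
    static Object viaAccessor(Object o) {
        return switch (o) {
            case Box b -> b.contents();
            default    -> "not a box";
        };
    }

    public static void main(String[] args) {
        System.out.println(viaPattern(new Box(null)));    // null
        System.out.println(viaAccessor(new Box(null)));   // null
    }
}
```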

Pattern matching is about destructuring; destructuring should be 
transparent and non-judgmental.  If the box contains a null, or a frog, 
or the keys to the kingdom, destructuring should be in the business of 
opening the box, removing the contents, and handing them to you politely.

Now, you might think I've pulled a trick here, by writing Box(anything) 
instead of Box(var x).  And yes, there is a discussion to be had about 
the meaning of `var x` as a pattern.  But, the key point of this 
explanation is to highlight the fact that "Box containing anything" 
_must be_ the base case, and conditionality should be injected via 
composition with a more restrictive pattern, not the other way around.


Further, I think the focus on NPEs is misplaced here.  There seems to be 
a widespread fear that somehow there will be an epidemic of NPEs if we 
let any of our switch cases match null.  For example:

     switch (o) {
         case Box(Frog f): ...
         case Object oo: ...  // if o is null, we'll NPE!  The horror!
     }

People seem very worried about this "new" NPE, where something matched a 
case and got bound to a variable and then we were suckered into 
dereferencing it with dramatic results.  But ... what would happen 
today?  If `o` were null, the switch would ALREADY be NPEing before we 
even try to match any cases!  We haven't created a "new" NPE risk, we've 
taken an _existing_ NPE risk and moved it to where it _might not_ NPE.  
This seems like an improvement, and in any case, this sort of NPE does 
not seem to be in the top hundred problems that Java developers have 
every day.
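
The existing behavior is easy to observe with a classic string switch.  
A small sketch (class and method names are invented for the example): 
the NPE is thrown by the switch itself, before any case label is 
considered.

```java
final class LegacySwitchDemo {
    static String classify(String s) {
        switch (s) {   // a null s throws here, before any case is tried
            case "frog": return "amphibian";
            default:     return "something else";
        }
    }

    // True if switching on null throws, even though a default is present.
    static boolean throwsOnNull() {
        try {
            classify(null);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(classify("frog"));   // amphibian
        System.out.println(throwsOnNull());     // true
    }
}
```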

As I explained already, there are two cases:

  - Your domain already excludes null (whether by convention or 
enforcement); there's no Box(null) running around in the wild. (Like 
Optional.)  Then it makes no difference whether Box(P) matches Box(null) 
for any P, because there are no Box(null) to be matched to, so the 
question will never be asked.

  - Your domain uses null.  Then cases like `Box(Object o)` should 
already be dealing with the possibility that the box contains null! 
Trying to shield you from the nulls is where we get the kind of "treat 
everything like its own special case" activity that you warned about in 
your first message.

The bottom line here is that pattern matching is something bigger than 
you are imagining.  So let's try to see the whole elephant before we try 
to build its habitat.


Some comments inline.

> Developers can manually write `case null` or `case Box(null)`, and

This argument represents a sort of Kubler-Ross "bargaining" stage in the 
process of accepting the elephant, and I know because I went through 
this stage too.  "Sure, how much work is it to write an extra Box(null) 
case if you want to handle all boxes?"

Let me explain why this is a terrible idea (and I say this without 
judgment, because I went through it too.)

First, it interacts poorly with binding.  Suppose I want to represent 
"all boxes", but there's no "match all boxes" pattern for whatever 
reason, only a Box(null) and a Box(everything else) pattern.  OK, fine:

     case Box(null), Box(var x): ....

But, no, x is not in scope here, because not all the patterns bind an 
x.  That sucks, because patterns are supposed to be about 
destructuring!   Fine, how about:

     case Box(null x), Box(var x): ....

This is looking kind of stupid, because declaring a variable that can 
only hold null is weird, but whatever.  Now, what's the type of x?  The 
natural type of x here is not Object or T or whatever the bound on Box's 
content is, it's the (non-denotable) null type, because I gave it a 
super restrictive pattern.  (In `case Box(Frog f)`, the type of f is 
`Frog`, not `Object`, and that's a feature, not a bug (or a frog.))  So 
even if I merge the two x's into one variable (complex, but doable), its 
type is probably not what I want.

OK, let's pretend all of these problems are solved at some complexity 
cost.  But, our Box example is merely a canonical example.  In the real 
world, our pattern might have two bindings:

     case Pair(var a, var b): ...

To follow the "just say Box(null) if you want all boxes" trail, I would 
have to say:

     case Pair(null, null), Pair(null, var b), Pair(var a, null),
          Pair(var a, var b): ...

Or it might have 3 variables, in which case there are eight clauses.  
Writing 2^n cases to express totality just doesn't scale. This would 
only be workable when a total match was a when-the-planets-align 
situation, but it is actually a pretty common situation.  Box(null) has 
a role, but its proper role is when you want to treat that case 
_differently_ than all the other cases.
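
For contrast, a sketch of the single total pattern, assuming Java 21 
record patterns (syntax that postdates this message) and a hypothetical 
Pair record.  One clause covers all four null / non-null combinations:

```java
record Pair(Object first, Object second) {}

final class PairDemo {
    static String show(Object o) {
        // One pattern, total on Pair; no 2^n enumeration of null cases.
        if (o instanceof Pair(var a, var b)) {
            return "(" + a + ", " + b + ")";
        }
        return "not a pair";
    }

    public static void main(String[] args) {
        System.out.println(show(new Pair(null, null)));   // (null, null)
        System.out.println(show(new Pair(null, "b")));    // (null, b)
        System.out.println(show(new Pair("a", null)));    // (a, null)
        System.out.println(show(new Pair("a", "b")));     // (a, b)
    }
}
```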

But, if you're in the bargaining stage, you're almost there.  Just one 
to go!

> As I argued before, I think most developers writing logic don't want
> the null, even in nested patterns.

This argument is from the "denial" stage.  I hope we're past that :)

> With my semantics, Bag(null) and Box(null) throw NPE. Or the developer
> can use `default` multiple times to accept nulls:
>   switch (container) {
>          case Box(Frog f): ...
>          case Box(Chocolate c): ...
>          default Box(var x): ....

This approach is attacking it from the wrong direction.  If you want a 
way to express totality, it has to be in the pattern, not in the 
enclosing construct, otherwise you get patterns that can express certain 
things in switch but not instanceof (or vice versa), and then you can't 
refactor one to the other.  That's more complexity with less 
expressiveness.
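
A sketch of the refactoring this keeps possible, assuming Java 21 syntax 
and a hypothetical Box record.  The same pattern `Box(var x)` is used in 
both constructs and gives the same answer for any non-null `o`, 
including a box that contains null:

```java
record Box(Object contents) {}

final class RefactorDemo {
    static String withInstanceof(Object o) {
        if (o instanceof Box(var x)) {
            return "box: " + x;
        }
        return "other";
    }

    // Same pattern, moved into a switch without changing its meaning.
    static String withSwitch(Object o) {
        return switch (o) {
            case Box(var x) -> "box: " + x;
            default         -> "other";
        };
    }

    public static void main(String[] args) {
        System.out.println(withInstanceof(new Box(null)));   // box: null
        System.out.println(withSwitch(new Box(null)));       // box: null
    }
}
```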

You're doing a lot of mental gymnastics to work around something that 
isn't the problem you think it is.  So I recommend instead that you 
spend some of that energy trying to understand why Box(anything) is so 
important -- and it is -- and then work back from there before trying to 
"fix" the problem.



More information about the amber-dev mailing list