Letting the nulls flow (Was: Exhaustiveness)

Sun Aug 23 15:43:03 UTC 2020

Thanks, Tagir -- this is a perfect example of what I meant yesterday by 
how the "blow early, blow often" approach is a false promise.  It just 
means that responsible programmers who need to deal with null as a 
fact-of-life have to do *extra* work (which is therefore more 
duplicative or error-prone) to deal with it.
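
For concreteness, the first pattern Tagir reports below looks roughly
like this sketch.  The `Color` enum and the method names are made up
for illustration (this is not code from IDEA); the only point is that
the null test has to live outside the switch, because a classic switch
on a null enum or string throws NullPointerException.

```java
final class NullGuardDemo {
    // Hypothetical enum, for illustration only.
    enum Color { RED, GREEN }

    // Guard-then-switch: null is handled out of band, before the switch.
    static String describe(Color c) {
        if (c == null) {
            return "unknown";   // often the same as the default branch
        }
        switch (c) {
            case RED:   return "red";
            case GREEN: return "green";
            default:    return "unknown";
        }
    }

    // Without the guard, a classic switch throws NPE on a null selector.
    static boolean unguardedSwitchThrows(Color c) {
        try {
            switch (c) {
                case RED: break;
                default:  break;
            }
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(null));
        System.out.println(describe(Color.RED));
        System.out.println(unguardedSwitchThrows(null));
    }
}
```

Note that the null branch and the default branch do the same thing
here, which matches Tagir's observation that they often coincide.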

On 8/22/2020 11:46 PM, Tagir Valeev wrote:
> Hello!
>
> Some data from the current IntelliJ IDEA codebase
>
> We have 64 occurrences of this code pattern
> if($x$ == null) {...} // presumably completes abruptly
> switch($x$) {...}
> Roughly half of them are enum switches and the other half is string switches
>
> Also, we have 29 occurrences of this code pattern:
> if($x$ != null) {
>    switch($x$) { ... }
>    ...
> }
>
> Also, we have one occurrence of this code pattern:
> if($x$ == null) {...
> } else {
>    switch($x$) {...}
> }
>
> All of them could benefit from null-friendly switch. Btw often null
> branch is the same as default branch (or some other non-null branch).
>
> With best regards,
> Tagir Valeev
>
> On Sun, Aug 23, 2020 at 12:14 AM Brian Goetz <brian.goetz at oracle.com> wrote:
>> Breaking into a separate thread.   I hope we can put this one to bed
>> once and for all.
>>
>>> I'm not hostile to that view, but may I ask an honest question: why
>>> is this semantics better?
>>> Do you have examples where it makes sense to let the null slip
>>> through the statement switch? Because I can see why being null
>>> hostile is a good default: it follows the mottos "blow early, blow
>>> often" or "in case of doubt, throw".
>> Charitably, I think this approach is born of a belief that, if we keep
>> the nulls out by posting sentries at the door, we can live an interior
>> life unfettered by stray nulls.  But I think it is also time to
>> recognize that this approach to "block the nulls at the door" (a)
>> doesn't actually work, (b) creates sharp edges when the doors move
>> (which they do, through refactoring), and (c) pushes the problems elsewhere.
>>
>> (To illustrate (c), just look at the conversation about nulls in
>> patterns and switch we are having right now!  We all came to this
>> exercise thinking "switch is null-hostile, that's how it's always been,
>> that's how it must be", and are contorting ourselves to try to come up
>> with a consistent explanation.   But, if we look deeper, we see that
>> switch is *only accidentally* null-hostile, based on some highly
>> contextual decisions that were made when adding enum and autoboxing in
>> Java 5.  I'll talk more about that decision in a moment, but my point
>> right now is that we are doing a _lot_ of work to try to be consistent
>> with an arbitrary decision that was made in the past, in a specific and
>> limited context, and probably not with the greatest care.  Truly today's
>> problems come from yesterday's "solutions."  If we weren't careful, an
>> accidental decision about nulls in enum switch almost polluted the
>> semantics of pattern matching!  That would be terrible!  So let's stop
>> doing that, and let's stop creating new ways for our tomorrow's selves
>> to be painted into a corner.)
>>
>>
>> As background, I'll observe that every time a new context comes up,
>> someone suggests "we should make it null-hostile."  (Closely related: we
>> should make that new kind of variable immutable.)  And, nearly every
>> time, this ends up being the wrong choice.  This happened with Streams;
>> when we first wrestled with nulls in streams, someone pushed for "Just
>> have streams throw on null elements."  But this would have been
>> terrible; it would have meant that calculations on null-friendly
>> domains, that were prepared to engage null directly, simply could not
>> use streams in the obvious way; calculations like:
>>
>>       Stream.of(arrayOfStuff)
>>                   .map(Stuff::methodThatMightReturnNull)
>>                   .filter(x -> x != null)
>>                   .map(Stuff::doSomething)
>>                   .collect(toList())
>>
>> would not be directly expressible, because we would have already NPEed.
>> Sure, there are workarounds, but for what?  Out of a naive hope that, if
>> we inject enough null checks, no one will ever have to deal with null?
>> Out of irrational hatred for nulls?  Nothing good comes from either of
>> these motivations.
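
A runnable version of the pipeline quoted above, as a minimal sketch.
`Stuff`, `methodThatMightReturnNull` and `doSomething` are left
unspecified in the original, so plain String functions stand in for
them here; those stand-ins are assumptions.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class NullsInStreamsDemo {
    // Stand-in for Stuff::methodThatMightReturnNull: empty input maps to null.
    static String methodThatMightReturnNull(String s) {
        return s.isEmpty() ? null : s;
    }

    // Stand-in for Stuff::doSomething.
    static String doSomething(String s) {
        return s.toUpperCase();
    }

    static List<String> process(String[] arrayOfStuff) {
        return Stream.of(arrayOfStuff)
                .map(NullsInStreamsDemo::methodThatMightReturnNull) // may yield nulls
                .filter(x -> x != null)   // the pipeline deals with them itself
                .map(NullsInStreamsDemo::doSomething)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(process(new String[] {"a", "", "b"}));
    }
}
```

Had the first map() thrown on a null result, the filter() stage would
never have had the chance to handle it.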
>>
>> But, this episode wasn't over.  It was then suggested "OK, we can't NPE,
>> but how about we filter the nulls?"  Which would have been worse.  It
>> would mean that, for example, doing a map+toArray on an array might not
>> have the same size as the initial array -- which would violate what
>> should be a pretty rock-solid intuition.  It would kill all the
>> pre-sized-array optimizations.  It would mean `zip` would have no useful
>> semantics.  Etc etc.
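
The size intuition mentioned above can be checked directly.  This is a
small sketch with a made-up mapping function (an assumption, not
anything from the original discussion) that returns null for some
inputs.

```java
import java.util.Arrays;

final class SameSizeDemo {
    // Hypothetical mapping: empty strings map to null, others to upper case.
    static String[] mapAll(String[] in) {
        return Arrays.stream(in)
                .map(s -> s.isEmpty() ? null : s.toUpperCase())
                .toArray(String[]::new);   // nulls are kept, not dropped
    }

    public static void main(String[] args) {
        String[] in = {"a", "", "c"};
        String[] out = mapAll(in);
        System.out.println(in.length == out.length);
        System.out.println(Arrays.toString(out));
    }
}
```

Because the nulls flow through, the output array has one slot per input
element; silently filtering them would break that correspondence.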
>>
>> In the end, we came to the right answer for streams, which is "let the
>> nulls flow".   And this was the right choice because Streams is
>> general-purpose plumbing.  The "blow early" bias is about guarding the
>> gates, and thereby hopefully keeping the nulls from getting into the
>> house and having wild null parties at our expense. And this works when
>> the gates are few, fixed, and well marked.  But if your language
>> exhibits any compositional mechanisms (which is our best tool), then
>> what was the front door soon becomes the middle of the hallway after a
>> trivial refactoring -- which means that no refactorings are really
>> trivial.  Oof.
>>
>> We already went through a good example recently where it would be
>> foolish to try to exclude null (and yet we tried anyway) --
>> deconstruction patterns.  If a constructor
>>
>>       new Foo(x)
>>
>> can accept null, then a deconstructor
>>
>>       case Foo(var x)
>>
>> should dutifully serve up that null.  The guard-the-gates brigade tried
>> valiantly to put up new gates at each deconstructor, but that would have
>> been a foolish place to put such a boundary.  I offered an analogy to
>> having deconstruction reject null over on amber-dev:
>>
>>> In languages with side-effects (like Java), not all aggregation
>>> operations are reversible; if I bake a pie, I can't later recover the
>>> apples and the sugar.  But many are, and we like abstractions like
>>> these (collections, Optional, stream, etc) because they are very
>>> useful and easily reasoned about.  So those that are, should commit to
>>> the principle.  It would be OK for a list implementation to behave
>>> like this:
>>>
>>>      Listy list = new Listy();
>>>      list.add(null); // throws NPE
>>>
>>> because a List is free to express constraints on its domain.  But it
>>> would be exceedingly bizarre for a list implementation to behave like
>>> this:
>>>
>>>      Listy list = new Listy();
>>>      list.add(3);     // ok, I like ints
>>>      list.add(null); // ok, I like nulls too
>>>      assertTrue(list.size() == 2);   // ok
>>>      assertTrue(list.get(0) == 3); // ok
>>>      assertTrue(list.get(1) == null);  // NPE!
>>>
>>> If the list takes in nulls, it should give them back.
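
For comparison, java.util.ArrayList is a list that commits to the
principle: it accepts null and hands it back.  A minimal sketch (the
class and method names are only for illustration):

```java
import java.util.ArrayList;
import java.util.List;

final class ListRoundTripDemo {
    static List<Integer> build() {
        List<Integer> list = new ArrayList<>();
        list.add(3);      // ok
        list.add(null);   // also ok: ArrayList permits null elements
        return list;
    }

    public static void main(String[] args) {
        List<Integer> list = build();
        System.out.println(list.size());
        System.out.println(list.get(0));
        System.out.println(list.get(1) == null);   // no NPE on the way out
    }
}
```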
>> Now, this is like the first suggested form of null-hostility in streams,
>> and to everyone's credit, no one suggested exactly that, but what was
>> suggested was the second, silent form of hostility -- just pretend you
>> don't see the nulls.  And, like with streams, that would have been
>> silly.  So, OK, we dodged the bullet of infecting patterns with special
>> nullity rules.  Whew.
>>
>> Now, switch.  As I mentioned, I think we're here mostly because we are
>> perpetuating the null biases of the past.  In Java 1.0, switches were
>> only over primitives, so there was no question about nulls.  In Java 5,
>> we added two new reference-typed switch targets: enums and boxes.  I
>> wasn't in the room when that decision was made, but I can imagine how it
>> went: Java 5 was a *very* full release, and under dramatic pressure to
>> get out the door.  The discussion came up about nulls, maybe someone
>> even suggested `case null` back then.  And I'm sure the answer was some
>> form of "null enums and primitive boxes are almost always bugs, let's
>> not bend over backwards and add new complexity to the language (case
>> null) just to accommodate this bug, let's just throw NPE."
>>
>> And, given how limited switch was, and the special characteristics of
>> enums and boxes, this was probably a pragmatic decision, but I think we
>> lost sight of the subtleties of the context.  It is almost certainly
>> right that 99.999% of the time, a null enum or box is a bug.  But this
>> is emphatically not true when we broaden the type to Object.  Since the
>> context and conditions change, the decision should be revisited before
>> copying it to other contexts.
>>
>> In Java 7, when we added switching on strings, I do remember the
>> discussion about nulls; it was mostly about "well, there's a precedent,
>> and it's not worth breaking the precedent even if null strings are more
>> common than null Integers, and besides, the mandate of Project Coin is
>> very limited, and `case null` would probably be out of scope."  While
>> this may have again been a pragmatic choice at the time given the
>> constraints, it further set us down a slippery slope where the
>> assumption that "switches always throw on null" is set in concrete.  But
>> this assumption is not founded on solid ground.
>>
>> So, the better way to approach this is to imagine Java had no switch,
>> and we were adding a general switch today.  Would we really be
>> advocating so hard for "Oooh, another door we can guard, let's stick it
>> to the nulls there too"?  (And, even if we were tempted to, should we?)
>>
>> The plain fact is that we got away with null-hostility in the first
>> three forms of reference types in switch because switch (at the time)
>> was such a weak and non-compositional mechanism, and there were darn few
>> things it could actually do well.  But, if we were designing a
>> general-purpose switch, with rich labels and enhanced control flow
>> (e.g., guards) as we are today, where we envisioned refactoring between
>> switches on nested patterns and patterns with nested switches, this
>> would be more like a general plumbing mechanism, like streams, and when
>> plumbing has an opinion about the nulls, frantic calls to the plumber
>> are not far behind.  The nulls must flow unimpeded, because otherwise,
>> we create new anomalies and blockages like the streams examples I gave
>> earlier and refactoring surprises. And having these anomalies doesn't
>> really make life any better for the users -- it actually makes
>> everything just less predictable, because it means simple refactorings
>> are not simple -- and in a way that is very easy to forget about.
>>
>> If we really could keep the nulls out at the front gate, and thus define
>> a clear null-free domain to work in, then I would be far more
>> sympathetic to the calls of "new gates, new guards!"  But the gates
>> approach just doesn't work, and we have ample evidence of this.  And the
>> richer and more compositional we make the language, the more sharp edges
>> this creates, because old interiors become new gates.
>>
>> So, back to the case at hand (though we should bring the specifics back
>> to the case-at-hand thread): what's happening here is our baby switch is
>> growing up into a general purpose mechanism.  And, we should expect it
>> to take on responsibilities suited to its new abilities.
>>
>>
>> Now, for the backlash.  Whenever we make an argument for
>> what-appears-to-be relaxing an existing null-hostility, there is much
>> concern about how the nulls will run free and wreak havoc. But, let's
>> examine that more closely.
>>
>> The concern seems to be that, if we let the null through the gate,
>> we'll just get more NPEs, at worse places.  Well, we can't get more
>> NPEs; at most, we can get exactly the same number.  But in reality, we
>> will likely get fewer.  There are three cases.
>>
>> 1.  The domain is already null-free.  In this case, it doesn't make a
>> difference; no NPEs before, none after.
>>
>> 2.  The domain is mostly null-free, but nulls do creep in, we see them
>> as bugs, and we are happy to get notified.  This is the case today with
>> enums, where a null enum is almost always a bug.  Yes, in cases like
>> this, not guarding the gates means that the bug will get further before
>> it is detected, or might go undetected.  This isn't fantastic, but this
>> also isn't a disaster, because it is rare and it is still likely to
>> be detected eventually.
>>
>> 3.  The domain is at least partially null tolerant.  Here, we are moving
>> an always-throw at the gates to a
>> might-throw-in-the-guts-if-you-forget.  But also, there are plenty of
>> things you can do with a null binding that don't NPE, such as pass it to
>> a method that deals sensibly with nulls, add it to an ArrayList, print
>> it, etc.  This is a huge improvement, from "must treat null in a
>> special, out of band way" to "treat null uniformly."  At worst, it is no
>> worse, and often better.
>>
>> And, when it comes to general purpose domains, #3 is much bigger than
>> #2.  So I think we have to optimize for #3.
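
To make case 3 concrete, here is a small sketch of things one can do
with a possibly-null binding without any NPE.  The `render` method and
the "<none>" placeholder are invented for illustration; the library
calls are standard java.util and java.lang ones.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

final class NullTolerantUsesDemo {
    // Treats a null binding uniformly with any other value.
    static String render(Object binding) {
        List<Object> seen = new ArrayList<>();
        seen.add(binding);                                  // ArrayList accepts null
        String printed = String.valueOf(binding);           // "null" for a null reference
        String labeled = Objects.toString(binding, "<none>"); // null-aware helper
        return printed + "/" + labeled + "/" + seen.size();
    }

    public static void main(String[] args) {
        System.out.println(render(null));
        System.out.println(render("hi"));
    }
}
```

None of these operations needs a special out-of-band branch for null,
which is the uniform treatment the third case describes.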
>>
>>
>> Finally, there are those who argue we should "just" have nullable types
>> (T? and T!), and then all of this goes away.  I would love to get there,
>> but it would be a very long road.  But let's imagine we do get there.
>> OMG how terrible it would be when constructs like lambdas, switches, or
>> patterns willfully try to save us from the nulls, thus doing the job
>> (badly) of the type system!  We'd have explicitly nullable types for
>> which some constructs NPE anyway. Or, we'd have to redefine the
>> semantics of everything in complex ways based on whether the underlying
>> input types are nullable or not.  We would feel pretty stupid for having
>> created new corners to paint ourselves into.
>>
>> Our fears of untamed nulls wantonly running through the streets are
>> overblown.  Our attempts to contain the nulls through ad-hoc
>> gate-guarding have all been failures.  Let the nulls flow.
>>
