Letting the nulls flow (Was: Exhaustiveness)
Brian Goetz
brian.goetz at oracle.com
Sun Aug 23 15:43:03 UTC 2020
Thanks, Tagir -- this is a perfect example of what I meant yesterday by
how the "blow early, blow often" approach is a false promise. It just
means that responsible programmers who need to deal with null as a
fact-of-life have to do *extra* work (which is therefore more
duplicative or error-prone) to deal with it.
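
A minimal sketch of the "extra work" in question, using a classic enum switch. The names here (NullGuardSketch, Color, describeUnguarded, describeGuarded) are hypothetical and chosen only for illustration; they are not taken from the IDEA codebase.

```java
// Hypothetical example: all names are illustrative assumptions.
class NullGuardSketch {
    enum Color { RED, GREEN }

    // A classic switch throws NullPointerException when the selector is null.
    static String describeUnguarded(Color c) {
        switch (c) {
            case RED:   return "red";
            case GREEN: return "green";
            default:    return "unknown";
        }
    }

    // The caller-side workaround: a separate null check placed in front of
    // the switch, here duplicating the result of the default branch.
    static String describeGuarded(Color c) {
        if (c == null) {
            return "unknown";
        }
        switch (c) {
            case RED:   return "red";
            case GREEN: return "green";
            default:    return "unknown";
        }
    }
}
```

The guard carries the same result as the default branch, which is the duplication Tagir's numbers point at.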
On 8/22/2020 11:46 PM, Tagir Valeev wrote:
> Hello!
>
> Some data from the current IntelliJ IDEA codebase
>
> We have 64 occurrences of this code pattern
> if($x$ == null) {...} // presumably completes abruptly
> switch($x$) {...}
> Roughly half of them are enum switches and the other half is string switches
>
> Also, we have 29 occurrences of this code pattern:
> if($x$ != null) {
> switch($x$) { ... }
> ...
> }
>
> Also, we have one occurrence of this code pattern:
> if($x$ == null) {...
> } else {
> switch($x$) {...}
> }
>
> All of them could benefit from null-friendly switch. Btw often null
> branch is the same as default branch (or some other non-null branch).
>
> With best regards,
> Tagir Valeev
>
> On Sun, Aug 23, 2020 at 12:14 AM Brian Goetz <brian.goetz at oracle.com> wrote:
>> Breaking into a separate thread. I hope we can put this one to bed
>> once and for all.
>>
>>> I'm not hostile to that view, but may I ask an honest question: why
>>> is this semantics better?
>>> Do you have examples where it makes sense to let the null slip
>>> through the statement switch? Because I can see why being null
>>> hostile is a good default; it follows the mottos "blow early, blow
>>> often" or "in case of doubt, throw".
>> Charitably, I think this approach is born of a belief that, if we keep
>> the nulls out by posting sentries at the door, we can live an interior
>> life unfettered by stray nulls. But I think it is also time to
>> recognize that this approach to "block the nulls at the door" (a)
>> doesn't actually work, (b) creates sharp edges when the doors move
>> (which they do, through refactoring), and (c) pushes the problems elsewhere.
>>
>> (To illustrate (c), just look at the conversation about nulls in
>> patterns and switch we are having right now! We all came to this
>> exercise thinking "switch is null-hostile, that's how it's always been,
>> that's how it must be", and are contorting ourselves to try to come up
>> with a consistent explanation. But, if we look deeper, we see that
>> switch is *only accidentally* null-hostile, based on some highly
>> contextual decisions that were made when adding enum and autoboxing in
>> Java 5. I'll talk more about that decision in a moment, but my point
>> right now is that we are doing a _lot_ of work to try to be consistent
>> with an arbitrary decision that was made in the past, in a specific and
>> limited context, and probably not with the greatest care. Truly today's
>> problems come from yesterday's "solutions." If we weren't careful, an
>> accidental decision about nulls in enum switch almost polluted the
>> semantics of pattern matching! That would be terrible! So let's stop
>> doing that, and let's stop creating new ways for our tomorrow's selves
>> to be painted into a corner.)
>>
>>
>> As background, I'll observe that every time a new context comes up,
>> someone suggests "we should make it null-hostile." (Closely related: we
>> should make that new kind of variable immutable.) And, nearly every
>> time, this ends up being the wrong choice. This happened with Streams;
>> when we first wrestled with nulls in streams, someone pushed for "Just
>> have streams throw on null elements." But this would have been
>> terrible; it would have meant that calculations on null-friendly
>> domains, that were prepared to engage null directly, simply could not
>> use streams in the obvious way; calculations like:
>>
>> Stream.of(arrayOfStuff)
>> .map(Stuff::methodThatMightReturnNull)
>> .filter(x -> x != null)
>> .map(Stuff::doSomething)
>> .collect(toList())
>>
>> would not be directly expressible, because we would have already NPEed.
>> Sure, there are workarounds, but for what? Out of a naive hope that, if
>> we inject enough null checks, no one will ever have to deal with null?
>> Out of irrational hatred for nulls? Nothing good comes from either of
>> these motivations.
>>
>> But, this episode wasn't over. It was then suggested "OK, we can't NPE,
>> but how about we filter the nulls?" Which would have been worse. It
>> would mean that, for example, doing a map+toArray on an array might not
>> have the same size as the initial array -- which would violate what
>> should be a pretty rock-solid intuition. It would kill all the
>> pre-sized-array optimizations. It would mean `zip` would have no useful
>> semantics. Etc etc.
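
A small sketch of both points, assuming a hypothetical lookup method that returns null for some inputs (NullsInStreamsSketch, lookup, resolved and mapped are illustrative names only). Nulls pass through map, the pipeline decides for itself where to filter them, and a plain map+toArray keeps the size of its input.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical example: all names are illustrative assumptions.
class NullsInStreamsSketch {
    // Stand-in for a method that might return null.
    static String lookup(String key) {
        return key.startsWith("k") ? key.toUpperCase() : null;
    }

    // The stream lets nulls flow out of map(); the user filters them explicitly.
    static List<String> resolved(String... keys) {
        return Arrays.stream(keys)
                .map(NullsInStreamsSketch::lookup)  // may yield null elements
                .filter(x -> x != null)             // the pipeline's own decision
                .map(s -> s + "!")
                .collect(Collectors.toList());
    }

    // Without an explicit filter, the result has one slot per input element,
    // with null in the slots where lookup returned null.
    static String[] mapped(String... keys) {
        return Arrays.stream(keys)
                .map(NullsInStreamsSketch::lookup)
                .toArray(String[]::new);
    }
}
```

With these definitions, resolved("k1", "x", "k2") keeps two elements, while mapped("k1", "x", "k2") still has length three.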
>>
>> In the end, we came to the right answer for streams, which is "let the
>> nulls flow". And this was the right choice because Streams is
>> general-purpose plumbing. The "blow early" bias is about guarding the
>> gates, and thereby hopefully keeping the nulls from getting into the
>> house and having wild null parties at our expense. And this works when
>> the gates are few, fixed, and well marked. But if your language
>> exhibits any compositional mechanisms (which is our best tool), then
>> what was the front door soon becomes the middle of the hallway after a
>> trivial refactoring -- which means that no refactorings are really
>> trivial. Oof.
>>
>> We already went through a good example recently where it would be
>> foolish to try to exclude null (and yet we tried anyway) --
>> deconstruction patterns. If a constructor
>>
>> new Foo(x)
>>
>> can accept null, then a deconstructor
>>
>> case Foo(var x)
>>
>> should dutifully serve up that null. The guard-the-gates brigade tried
>> valiantly to put up new gates at each deconstructor, but that would have
>> been a foolish place to put such a boundary. I offered an analogy to
>> having deconstruction reject null over on amber-dev:
>>
>>> In languages with side-effects (like Java), not all aggregation
>>> operations are reversible; if I bake a pie, I can't later recover the
>>> apples and the sugar. But many are, and we like abstractions like
>>> these (collections, Optional, stream, etc) because they are very
>>> useful and easily reasoned about. So those that are, should commit to
>>> the principle. It would be OK for a list implementation to behave
>>> like this:
>>>
>>> Listy list = new Listy();
>>> list.add(null) // throws NPE
>>>
>>> because a List is free to express constraints on its domain. But it
>>> would be exceedingly bizarre for a list implementation to behave like
>>> this:
>>>
>>> Listy list = new Listy();
>>> list.add(3); // ok, I like ints
>>> list.add(null); // ok, I like nulls too
>>> assertTrue(list.size() == 2); // ok
>>> assertTrue(list.get(0) == 3); // ok
>>> assertTrue(list.get(1) == null); // NPE!
>>>
>>> If the list takes in nulls, it should give them back.
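
The two acceptable behaviors can be seen in the standard library, sketched here with an illustrative wrapper class (ListRoundTripSketch and its method names are assumptions; List.of requires Java 9 or later). ArrayList accepts null and gives it back; List.of rejects null at the door and never pretends otherwise.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example: the class and method names are illustrative assumptions.
class ListRoundTripSketch {
    // ArrayList takes nulls in, so it hands them back unchanged.
    static List<Integer> roundTrip() {
        List<Integer> list = new ArrayList<>();
        list.add(3);
        list.add(null);
        return list;
    }

    // List.of constrains its domain up front: a null element fails at creation.
    static boolean rejectsNullUpFront() {
        try {
            List.of(1, null);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }
}
```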
>> Now, this is like the first suggested form of null-hostility in streams,
>> and to everyone's credit, no one suggested exactly that, but what was
>> suggested was the second, silent form of hostility -- just pretend you
>> don't see the nulls. And, like with streams, that would have been
>> silly. So, OK, we dodged the bullet of infecting patterns with special
>> nullity rules. Whew.
>>
>> Now, switch. As I mentioned, I think we're here mostly because we are
>> perpetuating the null biases of the past. In Java 1.0, switches were
>> only over primitives, so there was no question about nulls. In Java 5,
>> we added two new reference-typed switch targets: enums and boxes. I
>> wasn't in the room when that decision was made, but I can imagine how it
>> went: Java 5 was a *very* full release, and under dramatic pressure to
>> get out the door. The discussion came up about nulls, maybe someone
>> even suggested `case null` back then. And I'm sure the answer was some
>> form of "null enums and primitive boxes are almost always bugs, let's
>> not bend over backwards and add new complexity to the language (case
>> null) just to accommodate this bug, let's just throw NPE."
>>
>> And, given how limited switch was, and the special characteristics of
>> enums and boxes, this was probably a pragmatic decision, but I think we
>> lost sight of the subtleties of the context. It is almost certainly
>> right that 99.999% of the time, a null enum or box is a bug. But this
>> is emphatically not true when we broaden the type to Object. Since the
>> context and conditions change, the decision should be revisited before
>> copying it to other contexts.
>>
>> In Java 7, when we added switching on strings, I do remember the
>> discussion about nulls; it was mostly about "well, there's a precedent,
>> and it's not worth breaking the precedent even if null strings are more
>> common than null Integers, and besides, the mandate of Project Coin is
>> very limited, and `case null` would probably be out of scope." While
>> this may have again been a pragmatic choice at the time given the
>> constraints, it further set us down a slippery slope where the
>> assumption that "switches always throw on null" is set in concrete. But
>> this assumption is not founded on solid ground.
>>
>> So, the better way to approach this is to imagine Java had no switch,
>> and we were adding a general switch today. Would we really be
>> advocating so hard for "Oooh, another door we can guard, let's stick it
>> to the nulls there too"? (And, even if we were tempted to, should we?)
>>
>> The plain fact is that we got away with null-hostility in the first
>> three forms of reference types in switch because switch (at the time)
>> was such a weak and non-compositional mechanism, and there are darn few
>> things it can actually do well. But, if we were designing a
>> general-purpose switch, with rich labels and enhanced control flow
>> (e.g., guards) as we are today, where we envisioned refactoring between
>> switches on nested patterns and patterns with nested switches, this
>> would be more like a general plumbing mechanism, like streams, and when
>> plumbing has an opinion about the nulls, frantic calls to the plumber
>> are not far behind. The nulls must flow unimpeded, because otherwise,
>> we create new anomalies and blockages like the streams examples I gave
>> earlier and refactoring surprises. And having these anomalies doesn't
>> really make life any better for the users -- it actually makes
>> everything just less predictable, because it means simple refactorings
>> are not simple -- and in a way that is very easy to forget about.
>>
>> If we really could keep the nulls out at the front gate, and thus define
>> a clear null-free domain to work in, then I would be far more
>> sympathetic to the calls of "new gates, new guards!" But the gates
>> approach just doesn't work, and we have ample evidence of this. And the
>> richer and more compositional we make the language, the more sharp edges
>> this creates, because old interiors become new gates.
>>
>> So, back to the case at hand (though we should bring the specifics back
>> to the case-at-hand thread): what's happening here is our baby switch is
>> growing up into a general purpose mechanism. And, we should expect it
>> to take on responsibilities suited to its new abilities.
>>
>>
>> Now, for the backlash. Whenever we make an argument for
>> what-appears-to-be relaxing an existing null-hostility, there is much
>> concern about how the nulls will run free and wreak havoc. But, let's
>> examine that more closely.
>>
>> The concern seems to be that, if we let the null through the gate,
>> we'll just get more NPEs, at worse places. Well, we can't get more
>> NPEs; at most, we can get exactly the same number. But in reality, we
>> will likely get fewer. There are three cases.
>>
>> 1. The domain is already null-free. In this case, it doesn't make a
>> difference; no NPEs before, none after.
>>
>> 2. The domain is mostly null-free, but nulls do creep in, we see them
>> as bugs, and we are happy to get notified. This is the case today with
>> enums, where a null enum is almost always a bug. Yes, in cases like
>> this, not guarding the gates means that the bug will get further before
>> it is detected, or might go undetected. This isn't fantastic, but this
>> also isn't a disaster, because it is rare and it is still likely to be
>> detected eventually.
>>
>> 3. The domain is at least partially null tolerant. Here, we are moving
>> an always-throw at the gates to a
>> might-throw-in-the-guts-if-you-forget. But also, there are plenty of
>> things you can do with a null binding that don't NPE, such as pass it to
>> a method that deals sensibly with nulls, add it to an ArrayList, print
>> it, etc. This is a huge improvement, from "must treat null in a
>> special, out of band way" to "treat null uniformly." At worst, it is no
>> worse, and often better.
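
For case 3, a short sketch of operations that handle a null binding uniformly, using only standard-library calls (the class NullTolerantUsesSketch and its method names are illustrative assumptions):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical example: the class and method names are illustrative assumptions.
class NullTolerantUsesSketch {
    // Printing: String.valueOf(Object) renders a null reference as "null".
    static String printed(Object binding) {
        return String.valueOf(binding);
    }

    // Passing to a method that deals sensibly with null.
    static String orDefault(Object binding) {
        return Objects.toString(binding, "none");
    }

    // Collecting: ArrayList stores null like any other element.
    static List<Object> collected(Object binding) {
        List<Object> seen = new ArrayList<>();
        seen.add(binding);
        return seen;
    }

    // Comparing: Objects.equals is null-safe on both sides.
    static boolean matches(Object binding, Object expected) {
        return Objects.equals(binding, expected);
    }
}
```

None of these calls throws when binding is null; an NPE arises only if the code goes on to dereference it.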
>>
>> And, when it comes to general purpose domains, #3 is much bigger than
>> #2. So I think we have to optimize for #3.
>>
>>
>> Finally, there are those who argue we should "just" have nullable types
>> (T? and T!), and then all of this goes away. I would love to get there,
>> but it would be a very long road. But let's imagine we do get there.
>> OMG how terrible it would be when constructs like lambdas, switches, or
>> patterns willfully try to save us from the nulls, thus doing the job
>> (badly) of the type system! We'd have explicitly nullable types for
>> which some constructs NPE anyway. Or, we'd have to redefine the
>> semantics of everything in complex ways based on whether the underlying
>> input types are nullable or not. We would feel pretty stupid for having
>> created new corners to paint ourselves into.
>>
>> Our fears of untamed nulls wantonly running through the streets are
>> overblown. Our attempts to contain the nulls through ad-hoc
>> gate-guarding have all been failures. Let the nulls flow.
>>