Patterns and nulls
Brian Goetz
brian.goetz at oracle.com
Tue Aug 21 18:56:21 UTC 2018
Returning to this topic…
As mentioned in the original thread, some of what was in here went too
far. I think we’re comfortable saying:
* A /type pattern/, on its own, should coincide with |instanceof|,
meaning it never matches null
* A /var pattern/ is just a type pattern with the type supplied by
inference
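As a minimal sketch of the first bullet, |Class.isInstance| can stand in for |instanceof| in today's Java. |TypePatterns| and |matchesTypePattern| are hypothetical names used only for illustration. A |var| pattern would be the same call with the inferred class passed in.

```java
// Hypothetical model of the first bullet: a type pattern `T t` tested against a
// target behaves like `target instanceof T`, so it never matches null.
final class TypePatterns {
    private TypePatterns() {}

    static boolean matchesTypePattern(Class<?> t, Object target) {
        // Class.isInstance gives the same answer as `target instanceof T`,
        // including returning false when target is null.
        return t.isInstance(target);
    }
}
```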
If a type pattern |T t| only matches non-null instances, then we need a
way to match |T or null| (with a binding variable). The obvious way to
spell this is |T? t|, and this doesn’t require adding nullable types at
all; it’s just a nullable type /pattern/. Let’s say we did this.
It still leaves us with two choices for how to write a pattern that
matches any |Box|, including |Box(null)|.
1. Just write |Box(Object? o)| if you want all boxes, or write
|Box(Object o)| if you mean a box containing a non-null value.
2. Adjust the rules for nested patterns to treat a total
(type-restating) type pattern specially, so |Box(Frog f)| would only
match boxes containing non-null frogs, but |Box(Object o)| would
match all boxes.
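The two rule sets differ only when the component is null. The sketch below models both for a single nested type pattern inside |Box(...)|. It assumes that |Box|'s component has static type |Object|. |NestedRules|, |Rule| and |matches| are hypothetical names, and |String| stands in for |Frog|.

```java
// Hypothetical model comparing the two candidate rule sets for a nested
// type pattern inside Box(...). Not real Java semantics; an illustration only.
final class NestedRules {
    private NestedRules() {}

    enum Rule { EXPLICIT_NULLABLE, TOTAL_IS_NULL_FRIENDLY }

    // componentType: static type of the deconstructed component (Object for Box)
    // patternType:   the T in `Box(T t)` or `Box(T? t)`
    // nullable:      true if the pattern was written `T?`
    static boolean matches(Rule rule, Class<?> componentType, Class<?> patternType,
                           boolean nullable, Object component) {
        if (component != null) {
            return patternType.isInstance(component); // non-null: an ordinary type test
        }
        switch (rule) {
            case EXPLICIT_NULLABLE:
                // Option 1: null matches only if the user wrote `T?`.
                return nullable;
            case TOTAL_IS_NULL_FRIENDLY:
                // Option 2: a total (type-restating) pattern also admits null.
                return nullable || patternType.isAssignableFrom(componentType);
            default:
                throw new AssertionError(rule);
        }
    }
}
```

Under option 1, |Box(Object o)| rejects |Box(null)| unless it is written |Box(Object? o)|. Under option 2, it accepts |Box(null)| while a partial pattern such as |Box(String s)| still rejects it.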
The former is more principled, as it lets you say what you mean in a
straightforward way. The latter is more irregular, but might be more
in line with user intuition. I still worry that people will repeatedly
cut themselves on the sharp edge of |Box(Object o)| not matching all boxes.
Under either of these rule sets, we can use |default| to mean “all other
non-null cases”, and/or |_| to mean “all other cases, including null”,
and we can allow |case null| to fall into |default|.
Whether |switch| throws on null depends on whether any patterns in the
switch are nullable; so far only |null| and |_| are nullable.
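As a rough model of that rule, a switch can be treated as an ordered list of labels. It throws only when the target is null and none of its labels is nullable. All names here (|NullSwitch|, |Kind|, |Label|, |dispatch|) are hypothetical. The sketch only reports the first label that matches and does not model |case null| falling through into |default|.

```java
import java.util.List;

// Hypothetical model of the top-level switch rules sketched above; not a real API.
final class NullSwitch {
    private NullSwitch() {}

    enum Kind { TYPE, NULL_CONSTANT, DEFAULT, WILDCARD }

    static final class Label {
        final Kind kind;
        final Class<?> type; // only meaningful for TYPE labels
        Label(Kind kind, Class<?> type) { this.kind = kind; this.type = type; }

        // So far only `null` and `_` are nullable.
        boolean isNullable() { return kind == Kind.NULL_CONSTANT || kind == Kind.WILDCARD; }
    }

    // Returns the index of the first matching label, or -1 if none matches.
    static int dispatch(List<Label> labels, Object target) {
        if (target == null && labels.stream().noneMatch(Label::isNullable)) {
            throw new NullPointerException("no nullable pattern in switch");
        }
        for (int i = 0; i < labels.size(); i++) {
            Label l = labels.get(i);
            switch (l.kind) {
                case TYPE:          if (l.type.isInstance(target)) return i; break;
                case NULL_CONSTANT: if (target == null) return i; break;
                case DEFAULT:       if (target != null) return i; break; // all other non-null cases
                case WILDCARD:      return i;                            // all other cases, including null
            }
        }
        return -1;
    }
}
```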
Under either of these rule sets, we can use |instanceof| as our match
operator.
So I think it comes down to a simple decision about whether we want to
distort nested total (type-restating) type patterns to be null-friendly
or null-hostile.
On 3/14/2018 12:58 PM, Brian Goetz wrote:
> In the message "More on patterns, generics, null, and primitives",
> Gavin outlines how these constructs will be treated in pattern
> matching. This mail is a refinement of that, specifically, to refine
> how nulls are treated.
>
> Rambling Background Of Why This Is A Problem At All
> ---------------------------------------------------
>
> Nulls will always be a source of corner cases and surprises, so the
> best we can likely do is move the surprises around to coincide with
> existing surprise modes. One of the existing surprise modes is that
> switches on reference types (boxes, strings, and enums) currently
> always NPE when passed a null. You could characterize switch's current
> treatment of null as "La la la can't hear you la la la." (I think
> this decision was mostly made by frog-boiling; in Java 1.0, there were
> no switches on reference types, so it was not an issue; when switches
> on boxes were added, it was done by appeal to auto-unboxing, which
> throws on null, and null enums are rare enough that no one felt it was
> important enough to do something different for them. Then when we
> added string switch in 7, we were already mostly sliding down the
> slippery slope of past precedent.)
>
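The existing behavior is easy to confirm in Java 7 or later. A string, boxed or enum switch with no null handling throws `NullPointerException` for a null selector. `LegacySwitchNulls` and its methods are just illustrative containers for the demonstration.

```java
// Demonstrates today's null-hostile switch on reference types.
final class LegacySwitchNulls {
    private LegacySwitchNulls() {}

    enum Color { RED, GREEN }

    static String onString(String s) {
        switch (s) {               // throws NullPointerException if s == null
            case "a": return "A";
            default:  return "other";
        }
    }

    static String onBox(Integer i) {
        switch (i) {               // auto-unboxing throws NullPointerException if i == null
            case 1:  return "one";
            default: return "other";
        }
    }

    static String onEnum(Color c) {
        switch (c) {               // throws NullPointerException if c == null
            case RED: return "red";
            default:  return "other";
        }
    }
}
```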
> The "la la la" approach has gotten us pretty far, but I think it
> finally runs out of gas when we have nested patterns. It might be OK to NPE
> when x = null here:
>
>     switch (x) {
>         case String: ...
>         case Integer: ...
>         default: ...
>     }
>
> but it is certainly not OK to NPE when b = new Box(null):
>
>     switch (b) {
>         case Box(String s): ...
>         case Box(Integer i): ...
>         case Box(Object o): ...
>     }
>
> since `Box(null)` is a perfectly reasonable box. (Which of these
> patterns matches `Box(null)` is a different story; see below.) So
> problem #1 is that we need a way to match nulls in nested
> patterns; having nested patterns throw whenever any intermediate
> binding produces null would be crazy. So we have to deal with nulls
> in nested patterns, and it seems natural to be able to confront them
> directly:
>
>     case Box(null): ...
>
> which is just an ordinary nested pattern, where our target matches
> `Box(var x)` and, further, x matches null. Which means `x matches null`
> needs to be a thing, even if switch is hostile to nulls.
>
> But if you pull on this string a bit more, we'd also like to do the
> same at the top level, because we'd like to be able to refactor
>
>     switch (b) {
>         case Box(null): ...
>         case Box(Candy): ...
>         case Box(Object): ...
>     }
>
> into
>
>     switch (b) {
>         case Box(var x):
>             switch (x) {
>                 case null: ...
>                 case Candy: ...
>                 case Object: ...
>             }
>     }
>
> with no subtle semantic changes. I think this is what users will
> expect, and cutting them on sharp edges here wouldn't be doing them
> any favors.
>
>
> Null and Type Patterns
> ----------------------
>
> The previous iteration outlined in Gavin's mail was motivated by a
> sensible goal, but I think we took it a little too literally. The goal
> is that if I have a `Box(null)`, it should match the following:
>
>     case Box(var x):
>
> because it would be weird if `var x` in a nested context really meant
> "everything but null." This led us to the position that
>
>     case Box(Object o):
>
> should also match `Box(null)`, because `var` is just type inference,
> and the compiler infers `Object` here from the signature of the `Box`
> deconstructor. So `var` and the type that gets inferred should be
> treated the same. (Note that Scala departs from this, and the results
> are pretty confusing.)
>
> You might convince yourself that `Box(Object)` not matching
> `Box(null)` is not a problem: just add a case to handle null, with an
> OR pattern (aka non-harmful fallthrough):
>
>     case Box(null): // fall through
>     case Box(Object): ...
>
> But, this only works in the simple case. What if my Box deconstructor
> had four binding variables:
>
>     case Box(P, Q, R, S):
>
> Now, to capture the same semantics, you need four more cases:
>
>     case Box(null, Q, R, S): // fall through
>     case Box(P, null, R, S): // fall through
>     case Box(P, Q, null, S): // fall through
>     case Box(P, Q, R, null): // fall through
>     case Box(P, Q, R, S):
>
> But wait, it gets worse, since if P and friends have binding
> variables, and the null pattern does not, the binding variables will
> not be definitely assigned (DA) and therefore not usable. And if we graft binding
> variables onto constant patterns, we have a potential typing problem,
> since the type of merged binding variables in OR patterns should
> match. So this is a tire fire, let's back away slowly.
>
> So, we want at least some type patterns to match null, at least in
> nested contexts. Got it.
>
> This led us to: a type pattern `T t` should match null. But clearly,
> in the switch
>
>     switch (aString) {
>         case String s: ...
>     }
>
> it NPEs (since that's what it does today). So we moved the null
> hostility to `switch`, which involved an analysis of whether `case
> null` was present. As Kevin pointed out, that was pretty confusing
> for the users to keep track of. So that's not so good.
>
> Also not so good: if type patterns match null, then the dominance
> order rule says you can't put a `case null` arm after a type pattern
> arm, because the `case null` will be dead. (Just like you can't catch
> `IOException` after catching `Throwable`.) This deprived `case null`
> of most of its remaining usefulness, which is to lump null in with the
> default. If users want to use `case null`, they most likely want this:
>
>     switch (o) {
>         case A: ...
>         case B: ...
>         case null: // fall through
>         default:
>             // deal with unexpected values
>     }
>
> If we can't do that (and the latest iteration said we can't), it's
> pretty useless. So we got something wrong with type patterns too.
> Tricky buggers, these nulls!
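A small dominance checker shows the difference. If every type pattern also matches null, any `case null` placed after a type pattern is reported dead. If type patterns are null-hostile, it is not. `Dominance`, `Label` and `firstDeadLabel` are hypothetical names. `default` is left out because it is not a pattern, and `String`/`Integer` stand in for `A`/`B`.

```java
import java.util.List;

// Hypothetical dominance check: a later label is dead if an earlier label
// already matches every value the later one could match.
final class Dominance {
    private Dominance() {}

    enum Kind { TYPE, NULL_CONSTANT }

    static final class Label {
        final Kind kind;
        final Class<?> type; // null for NULL_CONSTANT labels
        Label(Kind kind, Class<?> type) { this.kind = kind; this.type = type; }
    }

    static boolean dominates(Label earlier, Label later, boolean typePatternsMatchNull) {
        if (later.kind == Kind.NULL_CONSTANT) {
            // `case null` is shadowed by an earlier `case null`, or by any type
            // pattern if type patterns are defined to match null.
            return earlier.kind == Kind.NULL_CONSTANT
                    || (earlier.kind == Kind.TYPE && typePatternsMatchNull);
        }
        // A type pattern U is shadowed by an earlier type pattern T when U <: T.
        return earlier.kind == Kind.TYPE && earlier.type.isAssignableFrom(later.type);
    }

    // Index of the first dead label, or -1 if every label is reachable.
    static int firstDeadLabel(List<Label> labels, boolean typePatternsMatchNull) {
        for (int j = 0; j < labels.size(); j++) {
            for (int i = 0; i < j; i++) {
                if (dominates(labels.get(i), labels.get(j), typePatternsMatchNull)) {
                    return j;
                }
            }
        }
        return -1;
    }
}
```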
>
>
> Some Problems With the Current Plan
> -----------------------------------
>
> The current plan, even though it came via a sensible path, has lots of
> problems. Including:
>
> - It's hard to reason about which switches throw on null and which
> don't. (This will never be easy, but we can make it less hard.)
> - We have asymmetries between nested and non-nested patterns; if we
> unroll a nested pattern to a nested switch, the semantics shift subtly
> out from under us.
> - There's no way to say "default including null", which is what
> people would actually want to do if they had explicit control over
> nulls. Having `String s` match null means our ordering rules force
> the null case too early, depriving us of the ability to lump it in
> with another case.
>
> Further, while the intent that `Box(var x)` matches `Box(null)` was
> right, and that led us to `Box(Object)` matches `Box(null)`, we didn't
> pull this string to the end. So let's break some assumptions and
> start over.
>
> Let's assume we have the following declarations:
>
>     record Box(Object);
>     Object o;
>     String s;
>     Box b;
>
> Implicitly, `Box` has a deconstruction pattern whose signature is
> `Box(out Object o)`.
>
> What will users expect on the following?
>
>     Box b = new Box(null);
>     switch (b) {
>         case Box(Candy x): ...
>         case Box(Frog f): ...
>         case Box(Object o): ...
>     }
>
> There are four non-ridiculous possibilities:
> - NPE
> - Match none
> - Match Box(Candy)
> - Match Box(Object)
>
> I argued above why NPE is undesirable; I think matching none of them
> would also be pretty surprising, since `Box(null)` is a perfectly
> reasonable element of the value set described by the pattern
> `Box(Object)`. If all type patterns match null, we'd match
> `Box(Candy)` -- but that's pretty weird and arbitrary, and probably
> not what the user expects. It also means -- and this is a serious
> smell -- that we couldn't freely reorder the independent cases
> `Box(Candy)` and `Box(Frog)` without subtly altering behavior. Yuck!
>
> So the only reasonable outcome is that it matches `Box(Object)`.
> We'll need a credible theory of why we bypass the candy and the frog
> buckets, but I think this is what the user will expect --
> `Box(Object)` is our catch-all bucket.
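The reordering smell can be checked with a small model. It assumes that `Box`'s component has static type `Object`, and `Candy` and `Frog` are placeholder classes. The sketch dispatches `Box(null)` over the three nested type patterns under two hypothetical rules: either every type pattern matches null, or only a pattern total on the component type does. `BoxDispatch`, `Rule` and `dispatch` are illustrative names.

```java
import java.util.List;

// Hypothetical model: Box(null) dispatched over nested type patterns Box(T t).
// Candy and Frog are placeholders; Box's component is assumed to have static type Object.
final class BoxDispatch {
    private BoxDispatch() {}

    static final class Candy {}
    static final class Frog {}

    enum Rule { ALL_TYPE_PATTERNS_MATCH_NULL, ONLY_TOTAL_PATTERNS_MATCH_NULL }

    static final Class<?> COMPONENT_TYPE = Object.class;

    static boolean matches(Rule rule, Class<?> patternType, Object component) {
        if (component != null) {
            return patternType.isInstance(component);
        }
        // For null: either every type pattern matches, or only one that is
        // total on the component's static type (Object here).
        return rule == Rule.ALL_TYPE_PATTERNS_MATCH_NULL
                || patternType.isAssignableFrom(COMPONENT_TYPE);
    }

    // First-match semantics: returns the simple name of the first matching case.
    static String dispatch(Rule rule, List<Class<?>> cases, Object component) {
        for (Class<?> t : cases) {
            if (matches(rule, t, component)) {
                return t.getSimpleName();
            }
        }
        return "none";
    }
}
```

Under the first rule, swapping `Candy` and `Frog` changes which arm `Box(null)` lands in. Under the second rule, `Box(null)` always lands in `Box(Object)`.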
>
> A Credible Theory
> -----------------
>
> Recall that matching a nested pattern `x matches Box(P)` means:
>
>     x matches Box(var alpha) && alpha matches P
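One way to read that definition is to model a pattern as a `Predicate<Object>`. `Box(P)` first checks that the target is a `Box` (the `Box(var alpha)` step), then tests the extracted component against `P`. This is a sketch with hypothetical names (`NestedPatterns`, `boxOf`, `nullPattern`, `typePattern`). The type pattern here is the null-hostile one, like `instanceof`.

```java
import java.util.function.Predicate;

// Hypothetical encoding of `x matches Box(P)` as `x matches Box(var alpha) && alpha matches P`.
final class NestedPatterns {
    private NestedPatterns() {}

    static final class Box {
        final Object contents;
        Box(Object contents) { this.contents = contents; }
    }

    // Constant pattern `null`.
    static Predicate<Object> nullPattern() { return v -> v == null; }

    // Partial type pattern `T t`: null-hostile, like instanceof.
    static Predicate<Object> typePattern(Class<?> t) { return t::isInstance; }

    // `Box(P)`: deconstruct first (the `Box(var alpha)` step), then test alpha against P.
    static Predicate<Object> boxOf(Predicate<Object> p) {
        return x -> x instanceof Box && p.test(((Box) x).contents);
    }
}
```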
>
> The theory by which we can reasonably claim that `Box(Object)` matches
> `Box(null)` is that the nested pattern `Object` is _total_ on the type
> of its target (alpha), and therefore can be statically deemed to match
> without additional dynamic checks. In
>
>     case Box(Candy x): ...
>     case Box(Frog f): ...
>     case Box(Object o): ...
>
> the first two cases require additional dynamic type tests (instanceof
> Candy / Frog), but the last, if the target is a `Box` at all,
> requires no further dynamic testing. So we can _define_ `T t` to mean:
>
>     match(T t, e : U) === U <: T ? true : e instanceof T
>
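Below is a direct transcription of that rule. `Class` objects stand in for static types, so `U <: T` becomes `T.isAssignableFrom(U)`, and the dynamic branch tests `e` against the pattern type `T`. `Totality` and its methods are hypothetical. The sketch assumes the caller passes a value that really has static type `U`.

```java
// Sketch of the totality rule for a type pattern `T t` applied to an expression e of static type U.
final class Totality {
    private Totality() {}

    static boolean matchTypePattern(Class<?> t, Class<?> staticTypeOfE, Object e) {
        if (t.isAssignableFrom(staticTypeOfE)) {
            return true;            // total (U <: T): statically known to match, null included
        }
        return t.isInstance(e);     // partial: dynamic instanceof test, which rejects null
    }

    // For contrast: plain instanceof never accepts null, even against Object.
    static boolean nullInstanceofObject() {
        Object o = null;
        return o instanceof Object;
    }
}
```

`Object o` against an `Object` target matches null, while `null instanceof Object` is still false.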
> In other words, a total type pattern matches null, but a partial type
> pattern does not. That's great for the type system weenies, but does
> it help the users? I claim it does. It means that in:
>
>     Box b = new Box(null);
>     switch (b) {
>         case Box(Candy x): ...
>         case Box(Frog f): ...
>         case Box(Object o): ...
>     }
>
> We match `Box(Object)`, which is the catch-all `Box` handler. We can
> freely reorder the first two cases, because they're unordered by
> dominance, but we can't reorder either of them with `Box(Object)`,
> because that would create a dead case arm. `Box(var x)` and `Box(T
> x)` mean the same thing when `T` is the type that inference produces.
>
> So `Box(Candy)` selects all boxes known to contain candy; `Box(Frog)`
> all boxes known to contain frogs; `Box(null)` selects a box containing
> null, and `Box(_)` or `Box(var x)` or `Box(Object o)` selects all boxes.
>
> Further, we can unroll the above to:
>
>     Box b = new Box(null);
>     switch (b) {
>         case Box(var x):
>             switch (x) {
>                 case Candy c: ...
>                 case Frog f: ...
>                 case Object o: ...
>             }
>     }
>
> and it means _the same thing_; the nulls flow into the `Object` catch
> basin, and I can still freely reorder the Candy/Frog cases. Whew.
> This feels like we're getting somewhere.
>
> We can also now flow the `case null` down to where it falls through
> into the "everything else" bucket, because type patterns no longer
> match nulls. If specified at all, this is probably where the user
> most wants to put it.
>
> Note also that the notion of a "total pattern" (one whose
> applicability, possibly modulo null, can be determined statically)
> comes up elsewhere too. We talked about a let-bind statement:
>
>     let Point(var x, var y) = p
>
> In order for the compiler to know that an `else` is not required on a
> let-bind, the pattern has to be total on the static type of the
> target. So this notion of totality is a useful one.
>
> Where totality starts to feel uncomfortable is the fact that while
> null _matches_ `Object o`, it is not `instanceof Object`. More on
> this later.
>
> This addresses all the problems we stated above, so what's the problem?
>
> Default becomes legacy
> ----------------------
>
> The catch is that the irregularity of `default` becomes even more
> problematic. The cure is we give `default` a gold watch, thank it for
> its services, and grant it "Keyword Emeritus" status.
>
> What's wrong with default? First, it's syntactically irregular. It's
> not a pattern, so doesn't easily admit nesting or binding variables.
> And second, it's semantically irregular; it means "everything else (but
> not null!)" Which makes it a poor catch-all. We'd like for our
> catch-all case -- the one that dominates all other possible cases --
> to catch everything. We thought we wanted `default` to be equivalent
> to a total pattern, but default is insufficiently total.
>
> So, let's define a _constant switch_ as one whose target is the
> existing constant types (primitives, their boxes, strings, and enums)
> and whose labels are all constants (the latter condition might not be
> needed). In a constant switch, retcon default to mean "all the
> constants I've not explicitly enumerated, except null." (If you want
> to flow nulls into the default bin too, just add an explicit `case
> null` to fall into default, _or_ replace `default` with a total
> pattern.) We act as if constant switches have an implicit "case
> null: NPE" _at the bottom_. If you don't handle null explicitly (a
> total pattern counts as handling it explicitly), you fall into that
> bucket.
>
> Then, we _ban_ default in non-constant switches. So if you want
> patterns, swap your old deficient `default` for new shiny total
> patterns, which are a better default, and are truly exhaustive (rather
> than modulo-null exhaustive). If we can do a little more to express
> the intention of exhaustiveness for statement switches (which are not
> required to be exhaustive), this gives us a path to "switches never
> throw NPE if you follow XYZ rules."
>
> There's more work to do here to get to this statically-provable
> null-safe switch future, but I think this is a very positive
> direction. (Of course, we can't prevent NPEs from people matching
> against `Object o` and then dereferencing o.)
>
> Instanceof becomes instanceof
> -----------------------------
>
> The other catch is that we can't use `instanceof` to be the spelling
> of our `matches` operator, because it conflicts with existing
> `instanceof` treatment of nulls. I think that's OK; `instanceof` is a
> low-level primitive; matching is a high-level construct defined
> partially in terms of instanceof.
>
>