Telling the totality story

Victor Nazarov asviraspossible at gmail.com
Wed Mar 30 18:41:15 UTC 2022


Hello experts,

I've tried multiple times to come up with some constructive feedback about
patterns in switch and total patterns, but it seems very hard in this case.
After trying to analyse this thread along with the "Treatment of total
patterns", I can come up with the following points:

 * I've used switch on patterns a lot in the Java 17 preview, and had no
problems with the treatment of null.
   "Treatment of total patterns" proposed changes based on "people are
still uncomfortable with `case Object o` binding null to o",
   but I don't fully understand the exact problems that people see with this.

 * It's hard for me to discuss the didactic aspects of pattern matching,
because I haven't fully recovered from the introduction of explicit null
processing in switch.
   For me, the original model as introduced in Java 17 was the simpler one,
both mathematically and didactically, and
   I still can't fully justify the switch to the new model to myself.

 * If we are going to accept the complexity of additional null processing,
then I have to say (and this is surprising to me) that the story presented
here is
   more "natural" than the previous story.
   This story seems to me to follow the single-responsibility principle
better, in that there is a single null-handling responsibility.
   I think that even with the previous story I mentally translated it into
this one, even when I tried not to admit it to myself.

 * The story itself is still not good for me, but I think the problem is
not the story, but the model itself.
   And the solution is not a better story, but a better underlying model,
possibly a switch back to the previous model.

If I may indulge myself and go back to thinking about other possible models,
I can collect the following facts that seem important to the model:

 * There are statically total patterns and exhaustive patterns, and people
seem to confuse the two.
 * There are two sources of remainder for an exhaustive set of patterns
(see the sketch after this list):
    1. new classes or constants from the future, which are missing at
compile time but appear at run time;
    2. unexpected nulls, where the expectation of null is determined by the
totality of a pattern, which is a static property.
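
To make both sources of remainder concrete, here is a minimal sketch. The
sealed interface Shape and the records Circle and Square are hypothetical
names, the hierarchy is assumed to be compiled separately from the switch,
and the null case assumes the treatment in which a nested type pattern does
not match a null component:

sealed interface Shape permits Circle, Square {}
record Circle(Double radius) implements Shape {}
record Square(Double side) implements Shape {}

static String describe(Shape shape) {
    // Exhaustive over Shape, so it compiles without a default, but the
    // remainder is not empty:
    //  1. instances of a Shape subtype added and compiled later, without
    //     recompiling this switch;
    //  2. Circle(null), whenever the nested `Double radius` pattern is not
    //     considered to match a null component.
    // Both kinds of remainder values surface as MatchException at run time.
    return switch (shape) {
        case Circle(Double r) -> "circle with radius " + r;
        case Square(Double s) -> "square with side " + s;
    };
}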

We can observe the following (sketched in code below):

 * When a new class or constant appears, a switch without a default fails to
compile on recompilation, while a switch with a default silently accepts the
new class or constant.
 * When the totality of a pattern changes, a switch without a default and a
switch with a default both start to throw on null.
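
A hedged sketch of the first observation, with hypothetical names: suppose
the sealed interface Color originally permits only Red and Green, later
gains Blue, and only the hierarchy is recompiled:

sealed interface Color permits Red, Green {}   // later: permits Red, Green, Blue
record Red() implements Color {}
record Green() implements Color {}

static String withoutDefault(Color c) {
    // After Blue is added, recompiling this switch fails because it is no
    // longer exhaustive.
    return switch (c) {
        case Red r   -> "red";
        case Green g -> "green";
    };
}

static String withDefault(Color c) {
    // After Blue is added, this still compiles and silently routes Blue to
    // the default arm.
    return switch (c) {
        case Red r   -> "red";
        case Green g -> "green";
        default      -> "other";
    };
}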

The thing that I think people may actually want is some "fails to compile":

 * When the totality of a pattern changes, a switch without a default fails
to compile, and a switch with a default accepts null.

We can already get the desired switch-with-default behaviour by replacing
`default` with `case null, default` (sketched below), but we lack a way to
make a switch without a default fail to compile.
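
For reference, a minimal sketch of the combined label (the method and types
are hypothetical); null is routed to the shared arm instead of throwing
NullPointerException:

static String describe(Object obj) {
    return switch (obj) {
        case String s      -> "string: " + s;
        case Integer i     -> "int: " + i;
        // null and every otherwise-unmatched value land here
        case null, default -> "something else, possibly null";
    };
}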
The current model doesn't provide this behaviour; what it does is introduce
another dimension for switch, namely switch[{with/without} null,
{with/without} default], which only affects the dynamic behaviour of the
switch, not its compile-time behaviour.

Java has a long tradition of avoiding "action at a distance" and instead
encoding static properties at the type level, so that compilation fails when
a static property changes. Maybe we need to do the same here and have a way
to encode this static property, so that the example below fails to compile.

record B(String s) {}
record A(B b) {}
A a = new A(null);
String result = switch (a) no-null-in-remainder {
    case A(B(String s)) -> s;
}; // error: there exist remainder values containing null

But the adjusted version will successfully compile:

record B(String s) {}
record A(B b) {}
A a = new A(null);
String result = switch (a) no-null-in-remainder {
    case null -> "null";
    case A(B b) -> a.toString();
}; // ok

The same example, but without `no-null-in-remainder`, will also compile:

record B(String s) {}
record A(B b) {}
A a = new A(null);
String result = switch (a) {
    case A(B(String s)) -> s;
}; // ok, but throws MatchException at run time for A(null)

Possibly compilation should fail when a switch is annotated with
`no-null-in-remainder` but still has null-containing values in its
remainder, and possibly compilation should also fail when there is no null
in the remainder but the switch is not annotated with
`no-null-in-remainder`. This would mimic the rules for methods declared with
`throws`.

I think a proposal like this was dismissed early, because we do not want to
introduce new flags that people always need to set and will regret not
setting, but that argument doesn't apply here, because the compiler will ask
the user either to set the required flag or to remove it.

`no-null-in-remainder` is a static property only; there is no code that
distinguishes between null and an unexpected class or constant at run time.
In all cases the same MatchException is thrown.
`no-null-in-remainder` is defined so that no value in the remainder contains
null; this can be implemented because switch already performs an
exhaustiveness check at compile time.
From a didactic point of view `no-null-in-remainder` is about
exhaustiveness; it's not about null handling or about the semantics of
pattern matching.
Pattern matching is not affected at all; it's just an additional static
check, something akin to the previously proposed check for instanceof that
required the pattern not to be total.

`no-null-in-remainder` could be called `total`, but this is very confusing
given that a switch is already expected to be exhaustive and it's the
presence of `default` that controls "totality".
`no-null-in-remainder` could be called `no-remainder`, but this requires us
to think about the difference between compile time and run time: at run time
we can still get an exception from a newly appeared, separately compiled
class, so I think it's important to say that this has something to do with
null.
The syntactic form that I've chosen is here to mimic that of a method's
`throws` clause.
Here we try to state that the switch doesn't throw a MatchException caused
by a null value, so we could probably use the word "throw", as in
`no-throw-on-unmatched-null`.

Refactoring between a switch with nested patterns and nested switches is
preserved only when the no-null-in-remainder property is preserved:

    switch (x) {
        case P(Q): B
        case P(T): C
    }

is exactly the same as

    switch (x) {
        case P(var alpha):
            switch (alpha) no-null-in-remainder {
                case Q: B
                case T: C
            }
    }

Compilation will fail on a missing or superfluous `no-null-in-remainder`
annotation during the refactoring.
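
For comparison, here is a hedged sketch of the same refactoring using the
concrete A and B records from above and no annotation at all. Assuming the
treatment where a nested switch throws NullPointerException on a null
selector without a `case null`, and where A(null) is a remainder value of
the flattened form, the two forms already differ exactly at A(null):

A a = new A(null);

// Flattened form: A(null) is in the remainder, so this throws MatchException.
String flat = switch (a) {
    case A(B(String s)) -> s;
};

// Nested form: `case A(var alpha)` matches A(null) with alpha == null, and
// the inner switch then throws NullPointerException because it has no
// `case null` label.
String nested = switch (a) {
    case A(var alpha) -> switch (alpha) {
        case B(String s) -> s;
    };
};

The annotation proposed above is meant to make this kind of difference
visible at compile time during such a refactoring.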

I feel that `case null` in the current proposal tried to do something like
this, but failed to achieve it because it got carried away with `case null`
being just a case for the null value, and not an annotation affecting the
exhaustiveness analysis.

The proposed model requires some good syntax for `no-null-in-remainder`;
this is another problem, but maybe we can adopt just the `null` keyword, as
illustrated with the compilation errors below.

Again, we could say that this annotation is not about null but about the
remainder, but when we frame it like that we have to talk about static and
dynamic behaviour separately: we cannot prevent the introduction of new
classes and constants through separate compilation. If we frame it as
something about null only, then there is no such difference.

String result = switch (a) {
    case null -> "null";
    case A(B b) -> a.toString();
}; // compilation error: all null-containing values are matched, switch
should be annotated with `null`

String result = switch (a) null {
    case null -> "null";
    case A(B b) -> a.toString();
}; // ok

sealed interface A permits P, Q {}
A alpha = ...;
switch (alpha) null {
    case P p: B
    case Q q: C
} // compilation error: not all null-containing values are matched,
switch should not be annotated with `null`, possible null containing
values: {null}

String result = switch (a) null {
    case null -> "null";
    case A(B(String s)) -> a.toString();
}; // compilation error: not all null-containing values are matched, switch
should not be annotated with `null`, possible null containing values:
{A(null)}

--
Victor Nazarov


On Thu, Mar 3, 2022 at 9:42 PM Brian Goetz <brian.goetz at oracle.com> wrote:

> Given the misconceptions about totality and whether "pattern matching
> means the same thing in all contexts", it is pretty clear that the story
> we're telling is not evoking the right concepts.  It is important not
> only to have the right answer, but also to have the story that helps
> people understand why it's right, so let's take another whack at that.
>
> (The irony here is that it only makes a difference at one point -- null
> -- which is the least interesting part of the story.  So it is much
> sound and fury that signifies nothing.)
>
> None of what we say below represents a change in the language; it is
> just test-driving a new way to tell the same story.  (From a
> mathematical and specification perspective, it is a much worse story,
> since it is full of accidental detail about nulls (among other sins),
> but it might make some people happier, and maybe we can learn something
> from the reaction.)
>
> The key change here is to say that every pattern-matching construct
> needs special rules for null.  This is already true, so we're just
> doubling down on it.  None of this changes anything about how it works.
>
> #### Instanceof
>
> We define:
>
>      x instanceof P === (x != null) && x matches P
>
> This means that instanceof always says false on null; on non-null, we
> can ask the pattern.  This is consistent with instanceof today.  P will
> just never see a null in this case. We'll define "matches" soon.
>
> #### Switch
>
> We play the same game with switch.  A switch:
>
>      switch (x) {
>          case P1:
>          case P2:
>          ...
>      }
>
> is defined to mean:
>
>   - if x is null
>     - if one of P1..Pn is the special label "null", then we take that
> branch
>     - else we throw NPE
>   - otherwise, we start trying to match against P1..Pn sequentially
>   - if none match (its remainder), throw MatchException
>
> This is consistent with switch today, in that there are no null labels,
> so null always falls into the NPE bucket.  It is also easy to keep track
> of; if the switch does not say "case null", it does not match null.
> None of the patterns P1..Pn will ever see a null.
>
> #### Covering patterns
>
> Let's define what it means for a pattern to cover a type T. This is just
> a new name for totality; we say that an "any" pattern covers all types
> T, and a type pattern `U u` covers T when `T <: U`.  (When we do
> primitive type patterns, there will be more terms in this list, but
> they'll still be type patterns.)  No other pattern is a covering
> pattern.  Covering patterns all have a single binding (`var x`, `String
> s`).
>
> #### Let
>
> This is where it gets ugly.  Let has no opinions about null, but the
> game here is to pretend it does.  So in a let statement:
>
>      let P = e
>
>   - Evaluate e
>   - If e is null
>     - If P is a covering pattern, its binding is bound to null;
>     - else we throw NPE
>   - Otherwise, e is matched to P.  If it does not match (its remainder),
> a MatchException is thrown (or the else clause is taken, if one is
> present.)
>
> #### Nesting
>
> Given a nested pattern R(P), where R is `record R(T t) { }`, this means:
>
>   - Match the target to record R, and note the resulting t
>     - if P is a covering pattern, we put the resulting `t` in the
> binding of P without further matching
>     - Otherwise, we match t to P
>
> In other words: when a "non trivial" (i.e., non covering) pattern P is
> nested in a record pattern, then:
>
>      x instanceof R(P) === x instanceof R(var a) && a instanceof P
>
> and a covering pattern is interpreted as "not really a pattern", and we
> just slam the result of the outer binding into the inner binding.
>
> #### Matching
>
> What we've done now is ensure that no pattern ever actually encounters a
> null, because the enclosing constructs always filter the nulls out.
> This means we can say:
>
>   - var patterns match everything
>   - `T t` is gated by `instanceof T`
>   - R(P) is gated by `instanceof R`
>
>
> ## Is this a better way to tell the story?
>
> This is clearly a much more complex way to tell the story; every
> construct must have a special case for null, and we must be prepared to
> treat simple patterns (`var x`, `String s`) not always as patterns, but
> sometimes as mere declarations whose bindings will be forcibly crammed
> from "outside".
>
> Note too this will not scale to declared total patterns that treat null
> as a valid target value, if we ever want them; it depends on being able
> to "peek inside" the pattern from the outside, and say "oh, you're a
> covering pattern, I can set your binding without asking you."  That
> won't work with declared total patterns, which are opaque and may have
> multiple bindings.  (Saying "this doesn't bother me" is saying "I'm fine
> with a new null gate for total declared patterns.")
>
> It also raises the profile of null in the story, which kind of sucks;
> while it is not new "null gates", it is a new story that is prominently
> parameterized by null.  (Which is a funny outcome for a system
> explicitly designed to *not* create new null gates!)
>
>
> #### Further editorializing
>
> So, having written it out, I dislike it even more than I thought I
> would when I started writing this mail. Overall this seems like a lot of
> mental gymnastics to avoid acknowledging the reality that pattern
> matching has three parameters, not two -- the pattern, the target, and
> the *static type* of the target.
>
> I think the root cause here is that when we faced the lump-or-split
> decision of "do we reuse instanceof for pattern matching or not" back in
> Java 13, the choice to lump (which was a rational and defensible choice,
> but so was splitting) created an expectation that "no pattern ever
> matches null", because instanceof is the first place people saw
> patterns, and instanceof never matches null.  While this expectation is
> only consistent with the simplest of patterns (and is only problematic
> at null), there seems to be a desire to force "the semantics of matching
> pattern P is whatever instanceof says matches P."
>
> I suspect that had we chosen the split route for instanceof (which would
> not have been as silly as choosing the split route for switch, but still
> lumping seemed better), we would not be facing this situation.  But even
> if we decide, in hindsight, that lumping was a mistake, I do not want to
> compound this mistake by distorting the meaning of pattern matching to
> cater to it.  It might seem harmless from the outside, but it will
> dramatically limit how far we can push pattern matching in the future.
>
>
> ## Tweaking the current story
>
> So, I think the main thing we can control about the story is the
> terminology.  I think part of what people find confusing is the use of
> the term "total", since that's a math-y term, and also it collides with
> "exhaustive", which is similar but not entirely coincident.
>
> I won't try to find new terms here, but I think there are three relevant
> terms needed:
>
>   - "Pattern set P* is exhaustive on type T."  This is a static
> property; it means that the pattern set P* covers all the "reasonable"
> values of T.  Null is only one "potentially unreasonable" value.
>   - "Pattern P matches expression e".  This is a dynamic property.
> - "Pattern P is total on type T".  This is both a static and dynamic
> property; it means the pattern matches all values of T, and that we know
> this statically.
>
> What we've been calling "remainder" is the set of T values that are not
> matched by any of the patterns in a set P* that is exhaustive on T.
> (Exhaustiveness is static; matching is dynamic; the difference is
> remainder.)
>
> I think the name "exhaustive" is pretty good; we've been using the term
> "exhaustiveness checking" for switch for a while.  The problem could be
> "matches", as it has a dynamic sound to it; the problem could also be
> "total".  The above excursion calls totality "covering", but having both
> "exhaustive" and "covering" in the same scheme is likely to confuse
> people, so I won't suggest that.  Perhaps "statically total"?  This has
> the nice implication that matching is statically known to always
> succeed.  This would give us:
>
>   - If P is statically total on T, it is exhaustive on T;
>   - Any patterns (var x) are statically total on all T;
>   - Type patterns `T t` are statically total on `U <: T` (and more, when
> we get to primitive type patterns)
>   - Statically total patterns match null (because they match everything)
>   - instanceof intercepts nulls: x instanceof P === x != null && x matches
> P
>   - switch intercepts nulls: as above
>   - Let does not have any new opinions about null
>   - Nesting unrolls to instanceof only for non-statically-total nested
> patterns
>
> Also I think being more explicit about the switch/instanceof rules will
> help, as this filters out all top-level null reasoning.
>
>

