Remainder in pattern matching

Wed Mar 30 14:40:28 UTC 2022

We should have wrapped this up a while ago, so I apologize for the late 
notice, but we really have to wrap up exceptions thrown from pattern 
contexts (today, switch) when an exhaustive context encounters a 
remainder.  I think there's really only one sane choice, and the only 
thing to discuss is the spelling, but let's go through it.

In the beginning, nulls were special in switch.  The first step was to 
evaluate the switch operand; if it was null, switch threw NPE.  (I don't 
think this was motivated by any overt null hostility, at least not at 
first; it came from unboxing, where we said "if it's a box, unbox it", 
and the unboxing throws NPE, and the same treatment was applied to 
enums (which came out in the same version) and, later, to strings.)

We have since refined switch so that some switches accept null. But for 
those that don't, I see no other move besides "if the operand is null 
and there is no null handling case, throw NPE." Null will always be a 
special remainder value (when it appears in the remainder.)
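
To make the legacy behavior concrete, here is a minimal sketch (the 
class and method names are made up for illustration) showing that a 
switch with no null handling throws NPE for a null operand, whether the 
operand is a box, an enum, or a string.  Note that `default` does not 
catch null.

```java
// Illustrative only: NullSwitchDemo and its methods are hypothetical names.
class NullSwitchDemo {
    enum Color { RED, GREEN }

    // Switch on a box: the operand is unboxed first, which throws NPE on null.
    static String onBox(Integer i) {
        switch (i) {
            case 1:  return "one";
            default: return "other";
        }
    }

    // Switch on an enum: null is not matched by default.
    static String onEnum(Color c) {
        switch (c) {
            case RED: return "red";
            default:  return "other";
        }
    }

    // Switch on a string: same treatment.
    static String onString(String s) {
        switch (s) {
            case "a": return "got a";
            default:  return "other";
        }
    }

    // Runs the action and reports whether it threw NullPointerException.
    static boolean throwsNpe(Runnable action) {
        try {
            action.run();
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(throwsNpe(() -> onBox(null)));
        System.out.println(throwsNpe(() -> onEnum(null)));
        System.out.println(throwsNpe(() -> onString(null)));
    }
}
```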

In Java 12, when we did switch expressions, we had to confront the issue 
of novel enum constants.  We considered a number of alternatives, and 
came up with throwing ICCE.  This was a reasonable choice, though as it 
turns out, it is not one that scales as well as we had hoped it would at 
the time.  The choice here is based on "the view of classfiles at compile 
time and run time has shifted in an incompatible way."  ICCE is, as 
Kevin pointed out, a reliable signal that your classpath is borked.

We now have two precedents from which to extrapolate, but as it turns 
out, neither is really very good for the general remainder case.

Recall that we have a definition of _exhaustiveness_, which is, at some 
level, deliberately not exhaustive.  We know that there are edge cases 
that it is counterproductive to insist the user explicitly cover, for 
two reasons: one is that it's annoying to the user (writing cases for 
things they believe should never happen), and the other is that it 
undermines type checking (the most common way to cover them is a 
default clause, which can sweep other errors under the rug).

If we have an exhaustive set of patterns on a type, the set of possible 
values for that type that are not covered by some pattern in the set is 
called the _remainder_.  Computing the remainder exactly is hard, but 
computing an upper bound on the remainder is pretty easy.  I'll say "x 
may be in the remainder of P* on T" to indicate that we're defining the 
upper bound.

  - If P* contains a deconstruction pattern P(Q*), null may be in the 
remainder of P*.
  - If T is sealed, instances of a novel subtype of T may be in the 
remainder of P*.
  - If T is an enum, novel enum constants of T may be in the remainder 
of P*.
  - If R(X x, Y y) is a record, and x is in the remainder of Q* on X, 
then `R(x, any)` may be in the remainder of { R(q) : q in Q*} on R.

Examples:

     sealed interface X permits X1, X2 { }
     record X1(String s) implements X { }
     record X2(String s) implements X { }

     record R(X x1, X x2) { }

     switch (r) {
          case R(X1(String s), any):
          case R(X2(String s), X1(String s)):
          case R(X2(String s), X2(String s)):
     }

This switch is exhaustive.  Let N be a novel subtype of X.  Then the 
remainder includes:

     null, R(N, _), R(_, N), R(null, _), R(X2, null)

It might be tempting to argue (in fact, someone has) that we should try 
to pick a "root cause" (null or novel) and throw that.  But I think this 
is both excessive and unworkable.

Excessive: This means that the compiler would have to enumerate the 
remainder set (it's a set of patterns, so this is doable) and insert an 
extra synthetic clause for each.  This is a lot of code footprint and 
complexity for a questionable benefit, and the sort of place where bugs 
hide.

Unworkable: Ultimately such code will have to make an arbitrary choice, 
because R(N, null) and R(null, N) are both in the remainder set.  So 
which is the root cause?  Null or novel?

So what I propose is the following simple answer instead:

  - If the switch target is null and no case handles null, throw NPE.  
(We know statically whether any case handles null, so this is easy and 
similar to what we do today.)
  - If the switch is an exhaustive enum switch, and no case handles the 
target, throw ICCE.  (Again, we know statically whether the switch is 
over an enum type.)
  - In any other case of an exhaustive switch for which no case handles 
the target, we throw a new exception type, java.lang.MatchException, 
with an error message indicating remainder.
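
The three rules above can be sketched as a single decision made on the 
"no case matched" path of an exhaustive switch.  This is only an 
illustration: `RemainderRules`, `remainderFailure`, and 
`SketchMatchException` are hypothetical names, the last standing in for 
the proposed java.lang.MatchException, and a real compiler would make 
the enum/non-enum decision statically rather than through a flag.

```java
// Illustrative only: models which exception an exhaustive switch would
// throw when no case handles the target, under the proposal above.
class RemainderRules {
    // Hypothetical stand-in for the proposed java.lang.MatchException.
    static class SketchMatchException extends RuntimeException {
        SketchMatchException(String message) {
            super(message);
        }
    }

    // Called only when no case (including any null-handling case) matched.
    // isEnumSwitch is known statically in a real compiler; here it is a flag.
    static Throwable remainderFailure(Object target, boolean isEnumSwitch) {
        if (target == null) {
            // Rule 1: null target, no case handles null.
            return new NullPointerException();
        }
        if (isEnumSwitch) {
            // Rule 2: exhaustive enum switch meets a novel constant.
            return new IncompatibleClassChangeError();
        }
        // Rule 3: everything else in the remainder.
        return new SketchMatchException("remainder: " + target);
    }

    public static void main(String[] args) {
        System.out.println(remainderFailure(null, false).getClass().getSimpleName());
        System.out.println(remainderFailure("novel", true).getClass().getSimpleName());
        System.out.println(remainderFailure("novel", false).getClass().getSimpleName());
    }
}
```

Checking null first means a null target gets NPE even in an enum switch, 
which matches the order in which the rules are stated.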

The first two rules are basically dictated by compatibility.  In 
hindsight, we might not have chosen ICCE in 12, and gone with the 
general (third) rule instead, but that's water under the bridge.

We need to wrap this up in the next few days, so if you've concerns 
here, please get them on the record ASAP.

As a separate but not-separate exception problem, we have to deal with 
at least two additional sources of exceptions:

  - A dtor / record accessor may throw an arbitrary exception in the 
course of evaluating whether a case matches.

  - User code in the switch may throw an arbitrary exception.

For the latter, this has always been handled by having the switch 
terminate abruptly with the same exception, and we should continue to do 
this.

For the former, we surely do not want to swallow this exception (such an 
exception indicates a bug). The choices here are to treat this the same 
way we do with user code, throwing it out of the switch, or to wrap with 
MatchException.

I prefer the latter -- wrapping with MatchException -- because the 
exception is thrown from synthetic code between the user code and the 
ultimate thrower, which means the pattern matching feature is mediating 
access to the thrower.  I think we should handle this as "if a pattern 
invoked from pattern matching completes abruptly by throwing X, pattern 
matching completes abruptly with MatchException", because the specific X 
is not a detail we want the user to bind to.  (We don't want them to 
bind to anything, but if they do, we want them to bind to the logical 
action, not the implementation details.)
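
As a rough sketch of the wrapping option, here is hand-written code 
approximating what synthetic matching code might do.  All names are 
hypothetical: `SketchMatchException` stands in for the proposed 
MatchException, and `Box` is a plain class standing in for a record 
whose accessor throws.  The accessor call made on behalf of the pattern 
is wrapped; user code in the case body is not.

```java
// Illustrative only: not actual compiler output.
class AccessorWrapDemo {
    // Hypothetical stand-in for the proposed java.lang.MatchException.
    static class SketchMatchException extends RuntimeException {
        SketchMatchException(Throwable cause) {
            super(cause);
        }
    }

    // Stands in for a record with a misbehaving accessor.
    static final class Box {
        private final String s;

        Box(String s) {
            this.s = s;
        }

        String s() {
            if (s.isEmpty()) {
                throw new IllegalStateException("bad accessor");
            }
            return s;
        }
    }

    // Approximates matching `case Box(String s)` against o.
    static String match(Object o) {
        if (o instanceof Box) {
            String s;
            try {
                // Accessor invoked by the matching machinery, not by user code.
                s = ((Box) o).s();
            } catch (Throwable t) {
                // The specific exception is preserved only as the cause.
                throw new SketchMatchException(t);
            }
            // Case body: exceptions thrown here would propagate unwrapped.
            return "Box(" + s + ")";
        }
        return "no match";
    }

    public static void main(String[] args) {
        System.out.println(match(new Box("x")));
        try {
            match(new Box(""));
        } catch (SketchMatchException e) {
            System.out.println("wrapped: " + e.getCause().getClass().getSimpleName());
        }
    }
}
```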
