Remainder in pattern matching
Remi Forax
forax at univ-mlv.fr
Wed Mar 30 16:08:02 UTC 2022
> From: "Brian Goetz" <brian.goetz at oracle.com>
> To: "amber-spec-experts" <amber-spec-experts at openjdk.java.net>
> Sent: Wednesday, March 30, 2022 4:40:28 PM
> Subject: Remainder in pattern matching
> We should have wrapped this up a while ago, so I apologize for the late notice,
> but we really have to wrap up exceptions thrown from pattern contexts (today,
> switch) when an exhaustive context encounters a remainder. I think there's
> really one one sane choice, and the only thing to discuss is the spelling, but
> let's go through it.
> In the beginning, nulls were special in switch. The first thing is to evaluate
> the switch operand; if it is null, switch threw NPE. (I don't think this was
> motivated by any overt null hostility, at least not at first; it came from
> unboxing, where we said "if its a box, unbox it", and the unboxing throws NPE,
> and the same treatment was later added to enums (though that came out in the
> same version) and strings.)
> We have since refined switch so that some switches accept null. But for those
> that don't, I see no other move besides "if the operand is null and there is no
> null handling case, throw NPE." Null will always be a special remainder value
> (when it appears in the remainder.)
> In Java 12, when we did switch expressions, we had to confront the issue of
> novel enum constants. We considered a number of alternatives, and came up with
> throwing ICCE. This was a reasonable choice, though as it turns out is not one
> that scales as well as we had hoped it would at the time. The choice here is
> based on "the view of classfiles at compile time and run time has shifted in an
> incompatible way." ICCE is, as Kevin pointed out, a reliable signal that your
> classpath is borked.
> We now have two precedents from which to extrapolate, but as it turns out,
> neither is really very good for the general remainder case.
> Recall that we have a definition of _exhaustiveness_, which is, at some level,
> deliberately not exhaustive. We know that there are edge cases for which it is
> counterproductive to insist that the user explicitly cover, often for two
> reasons: one is that its annoying to the user (writing cases for things they
> believe should never happen), and the other that it undermines type checking
> (the most common way to do this is a default clause, which can sweep other
> errors under the rug.)
> If we have an exhaustive set of patterns on a type, the set of possible values
> for that type that are not covered by some pattern in the set is called the
> _remainder_. Computing the remainder exactly is hard, but computing an upper
> bound on the remainder is pretty easy. I'll say "x may be in the remainder of
> P* on T" to indicate that we're defining the upper bound.
> - If P* contains a deconstruction pattern P(Q*), null may be in the remainder of
> P*.
> - If T is sealed, instances of a novel subtype of T may be in the remainder of
> P*.
> - If T is an enum, novel enum constants of T may be in the remainder of P*.
> - If R(X x, Y y) is a record, and x is in the remainder of Q* on X, then `R(x,
> any)` may be in the remainder of { R(q) : q in Q*} on R.
> Examples:
> sealed interface X permits X1, X2 { }
> record X1(String s) implements X { }
> record X2(String s) implements X { }
> record R(X x1, X x2) { }
> switch (r) {
> case R(X1(String s), any):
> case R(X2(String s), X1(String s)):
> case R(X2(String s), X2(String s)):
> }
> This switch is exhaustive. Let N be a novel subtype of X. So the remainder
> includes:
> null, R(N, _), R(_, N), R(null, _), R(X2, null)
> It might be tempting to argue (in fact, someone has) that we should try to pick
> a "root cause" (null or novel) and throw that. But I think this is both
> excessive and unworkable.
[...] see below
> So what I propose is the following simple answer instead:
> - If the switch target is null and no case handles null, throw NPE. (We know
> statically whether any case handles null, so this is easy and similar to what
> we do today.)
> - If the switch is an exhaustive enum switch, and no case handles the target,
> throw ICCE. (Again, we know statically whether the switch is over an enum
> type.)
> - In any other case of an exhaustive switch for which no case handles the
> target, we throw a new exception type, java.lang.MatchException, with an error
> message indicating remainder.
I agree for the first rule, if null is not handled, let throw a NPE.
For when the static world and the dynamic world disagree, i think your analysis has miss an important question, switching on an enum throw an ICCE very late when we discover an unknown value, but in the case of a sealed type,
we can decide to reject the switch much sooner.
There is a spectrum of choices here, where to throw an ICCE, it can be
- when the class is verified (when the method is verified for OpenJ9)
- the first time the switch is reached (with an indy that validates once that the different sealed types "permits" set have not changed).
- when we have exhausted all branches associated with a sealed type
About excessive and unworkable,
> Excessive: This means that the compiler would have to enumerate the remainder
> set (its a set of patterns, so this is doable) and insert an extra synthetic
> clause for each. This is a lot of code footprint and complexity for a
> questionable benefit, and the sort of place where bugs hide.
Remainders are dangling else in a cascade of if ... else, so yes, we have to care of them.
So yes, it may a lot of bytecodes if we choose to add all branches but the benefit is not questionable, it's far better than the alternative which is GoodLuckFigureByYourselfException.
> Unworkable: Ultimately such code will have to make an arbitrary choice, because
> R(N, null) and R(null, N) are in the remainder set. So which is the root cause?
> Null or novel? We'd have to make an arbitrary choice.
Yes !
We have to make that choice, but it's not arbitrary, it's where we want to put the cursor, it's a trade off between supporting case when we are using a sealed type but not using that part of the pattern matching and an ICCE being thrown earlier.
Also, the rules you propose make the addition of patterns over enum values harder in the future.
Rémi
More information about the amber-spec-observers
mailing list