Compile-time type hierarchy information in pattern switch
Brian Goetz
brian.goetz at oracle.com
Tue Apr 3 16:36:43 UTC 2018
Along the lines of the previous discussion about separate compilation
skew with enums ... I'm trying to find the right place to draw the line
with respect to post-compilation class hierarchy changes.
Recall that we can impose a _dominance ordering_ on patterns; pattern P
dominates Q if everything that is matched by Q also is matched by P.
We already use this today, in catch blocks, to reject programs with dead
code; you can't say `catch Exception` before `catch IOException`,
because the latter block would be dead. We want to do the same with
patterns, so:
case String x: ...
case Object x: ...
is OK but
case Object x: ...
case String x: ...
is rejected at compile time.
Separately, we'd like for pattern matching to be efficient; the
definition of "inefficient" would be for pattern matching to be
inherently O(n), when we can frequently do much better. There's plenty
of literature on compiling patterns to decision trees, but none of them
address the problem we have to: separate compilation. So any decision
tree computed at compile time might be wrong in undesirable ways by
runtime. We could also compute a decision tree at runtime using indy;
while this is our intent, the devil is in the details. We don't want
computing the tree to be too expensive, nor do we want to have to
capture O(n^2) compile-time constraints to be validated at runtime. So
I'd like to focus on what changes we're willing to accept between
compilation and runtime, what our expectations would be for those changes.
We've already discussed one of these: novel values in enum / sealed type
switches, and for them, the answer is throwing some sort of exception.
Another that we dealt with long ago is changing enum ordinals; we
decided at the time that we're willing for this to be a BC change, so we
generate extra code that uses the as-runtime ordinals rather than the
as-compile-time ordinals when lowering the switch into an integer
switch. (If we weren't willing to tolerate such changes, we'd have a
simpler translation: just lower an enum switch to a switch on its
ordinal.)
Here's one that I suspect we're not expecting to recover terribly well
from: hierarchy inversion. Suppose at compile time A <: B. So the
following is a sensible switch body:
case String: println("String"); break;
case Object: println("Object"); break;
Now, imagine that by runtime, String no longer extends Object, but
instead Object absurdly extends String. Do we still expect the above to
print String for all Strings, and Object for everything else? Or is the
latter arm now dead at runtime, even though it wouldn't compile after
the change? Or is this now UB, because it would no longer compile?
A more realistic example of a hierarchy change is introducing an
interface. If we have:
interface I { }
class C { }
and a switch
case I: ...
case C: ...
and later, we make C implement I, we have a similar situation; the
switch would no longer compile. Are we allowed to make optimizations
based on the compile-time knowledge that C <! I?
As an example, suppose A, B, C, ... Z are final classes, and I is an
interface implemented by none of them. Then I can dispatch:
case A: ...
case B: ...
...
case I: ...
...
case Z: ...
case Object: ...
in two type operations; hash the class of the target and look it up in a
table containing A...Z, and then do a test against I. However, if I'm
required to deal with the case where some of A..Z are retrofitted to
implement I after compile time, and I'm expected to process the switch
in order based on how it is written, then I have to fall back to O(1)
type operations at runtime, or, I have to do as many as O(n^2) type
comparisons at link time. These are steep cliffs to fall off of.
(Mandating throwing an exception at link time is also expensive.)
Today, all switch cases are totally unordered, so we're free to execute
them in O(1) time. I'd like for that to continue to be the case, even
as we add more complex switches.
So, let's have a conversation about expectations for what we should do
for a switch at runtime that would no longer compile due to
post-compilation hierarchy changes (new supertypes, hierarchy
inversions, removed supertypes, final <--> nonfinal, etc.)
More information about the amber-spec-experts
mailing list