Rehabilitating switch -- a scorecard

Mon May 17 21:36:30 UTC 2021

This is a good time to look at the progress we've made with switch.  
When we started looking at extending switch to support pattern matching 
(four years ago!) we identified a lot of challenges deriving from 
switch's C legacy, some of which are summarized here:

http://cr.openjdk.java.net/~briangoetz/amber/switch-rehab.html

We had two primary driving goals for improving switch: switches as 
expressions, and switches with patterns as labels.  In turn, these 
pushed on a number of other uncomfortable aspects of switch: fall 
through, totality, scoping, and null handling.

Initially, we were unsure we would be able to rehabilitate switch to 
support these new requirements without being forever bogged down by the 
mistakes of the past.  Bit by bit, we have chipped away at the negative 
aspects of switch, while respecting the existing code that depends on 
those aspects.  I think where we've landed is, in many ways, better than 
we could have initially hoped for.

Throughout this exercise, there were periodic calls for "just toss it 
and invent something new" (which we sometimes called "snitch", shorthand 
for "new switch"*), and no shortage of people's attempts to design their 
ideal switch construct.  We resisted this line of attack, because we 
believed having two similar-but-different constructs living side by side 
would be more annoying (and confusing) to users than a rehabilitated, 
albeit more complex, construct.

The first round of improvements came with expression switches. This was 
the easy batch, because it didn't materially change the set of questions 
we could ask with switch, just the form in which we asked the question.  
This brought the following improvements:

  - Switches as expressions.  Many existing switch statements are in 
reality modeling expressions, in a more roundabout and less safe way.  
Expressing it directly is simpler and less error-prone.
  - Checked totality.  The compiler enforces that a switch expression is 
exhaustive (because expressions must be total). In the case of enum 
switches, a switch that covers all the cases needs no default clause, 
and the compiler inserts an extra case to catch novel values and throw 
(ICCE) on them.  (Eventually the same will be true for switches on 
sealed classes as well.)
  - A fallthrough-free option.  Switches now give us a choice between 
two styles of _switch blocks_, the old willy-nilly style, and the new 
single-consequent (arrow) style.  Switches that choose arrow-style need 
not reason about fallthrough.
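
As a rough illustration of these three points, here is the same 
computation written both ways.  This is a sketch only; the `Day` enum 
and the method names are made up for the example.  The expression form 
covers every constant, so it needs no default clause, and the arrow 
style means there is no fallthrough to reason about.

```java
public class ExpressionSwitchSketch {
    // Hypothetical enum, used only for illustration.
    enum Day { MONDAY, TUESDAY, WEDNESDAY, THURSDAY, FRIDAY, SATURDAY, SUNDAY }

    // Old style: a statement switch that is really modeling an expression.
    // Forgetting a "break" or an assignment here is easy to do.
    static int hoursOldStyle(Day day) {
        int hours;
        switch (day) {
            case SATURDAY:
            case SUNDAY:
                hours = 0;
                break;
            default:
                hours = 8;
                break;
        }
        return hours;
    }

    // Expression switch, arrow style. All constants are covered, so the
    // compiler accepts it without a default clause; adding a new constant
    // to Day later turns this into a compile error rather than a silent bug.
    static int hours(Day day) {
        return switch (day) {
            case SATURDAY, SUNDAY -> 0;
            case MONDAY, TUESDAY, WEDNESDAY, THURSDAY, FRIDAY -> 8;
        };
    }

    public static void main(String[] args) {
        for (Day d : Day.values()) {
            System.out.println(d + ": " + hours(d) + " (old style: " + hoursOldStyle(d) + ")");
        }
    }
}
```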

Unfortunately, it also brought a new asymmetry: switch expressions must 
be total (and you get enhanced type checking for this), but switch 
statements need not be, and get no such checking.  This is a shame, since the improved type checking 
for totality is one of the best things about the improvements in switch, 
as a switch that is total by virtue of actually covering all the cases 
acts as a tripwire against new enum constants / permitted subtypes being 
added later, rather than papering it over with a catch-all.  We explored 
several ways to explicitly add back totality checking, but these always 
felt like a hack, and required the programmer to remember to ask for 
this checking.

Our resolution here offers a path to true healing with minimal user 
impact, by (temporarily) carving out the semantic space of old statement 
switches.  A "legacy switch" is a statement switch on a numeric 
primitive or its box, enum, or string, and which contains no pattern 
labels (i.e., a statement switch that is valid today.)  Like expression 
switches, we will require non-legacy statement switches to be 
exhaustive, and warn on non-exhaustive legacy switches.  (To make the 
warning go away, just insert a "default: " or "default: break" at the 
bottom of the switch; not painful.)  After some time, we should be able 
to make this warning an error, which again is easy to mitigate with a 
single line.  In the end, all switch constructs will be total and 
type-checked for exhaustiveness, and once done, the notion of "legacy 
switch" can be garbage-collected.
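
To make the legacy / non-legacy split concrete, here is a minimal 
sketch.  The `Color` enum, the `Shape` hierarchy, and the method names 
are hypothetical, and the pattern-label syntax is the one that later 
shipped in Java 21, so treat this as illustrative rather than as the 
exact syntax under discussion here.  The first switch is a legacy 
statement switch that handles only some constants and says so with 
`default: break`; the second has pattern labels, so it has to cover 
every permitted subtype.

```java
public class StatementTotalitySketch {
    // Hypothetical types, used only for illustration.
    enum Color { RED, GREEN, BLUE }

    sealed interface Shape permits Circle, Square {}
    record Circle(double radius) implements Shape {}
    record Square(double side) implements Shape {}

    // Legacy statement switch: enum selector, constant labels only.
    // It deliberately handles just RED; the explicit "default: break"
    // is the one-line mitigation mentioned above.
    static String legacy(Color c) {
        String result = "other";
        switch (c) {
            case RED:
                result = "red";
                break;
            default:
                break;
        }
        return result;
    }

    // Non-legacy statement switch: it uses pattern labels, so it must be
    // exhaustive. Removing either case below is a compile error.
    static String describe(Shape s) {
        StringBuilder sb = new StringBuilder();
        switch (s) {
            case Circle c -> sb.append("circle of radius ").append(c.radius());
            case Square q -> sb.append("square of side ").append(q.side());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(legacy(Color.RED));
        System.out.println(legacy(Color.BLUE));
        System.out.println(describe(new Circle(1.0)));
        System.out.println(describe(new Square(2.0)));
    }
}
```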

Looking ahead to patterns in switch, we have several legacy 
considerations to navigate:

  - Fallthrough and bindings.  While fallthrough is not inherently 
problematic (though the choice of fallthrough-by-default was 
unfortunate), if a case label introduces a pattern variable, then 
fallthrough to another case (at least one that doesn't introduce the 
same pattern variable with the same type) makes little sense, and such 
fallthrough has been outlawed.
  - Scoping.  The block of a switch is one big scope, rather than each 
case label group being its own scope.  (Again, one might call this a 
historical error, since there's little good that comes from this.)  With 
case labels introducing variable declarations, this could have been a 
big problem, if one case polluted later cases (forcing users to pick 
unique names for each binding in a switch statement), but flow scoping 
solves that one.
  - Nulls.  In Java 1.0, switching over reference types was not 
permitted, so we didn't have to worry about this.  In Java 5, autoboxing 
and enums meant we could switch over some reference types, but for all 
of these, null was a "silly" value so we didn't care about NPEing on 
null.  In Java 7, when we added string switch, we could have conceivably 
allowed `case null`, but instead chose to follow the precedent set by 
Java 5.  But once we introduce switches over any type, with richer 
patterns, eagerly NPEing on null becomes much more problematic.  We've 
navigated this by saying that switches can NPE on null if they have no 
nullable cases; nullable cases are those that explicitly say "null", and 
total patterns (which always come last since they dominate all others.)  
The old rule of "switches throw on null" becomes "switches throw on 
null, except when they say 'case null' or the bottom case is total."  
Default continues to mean what it always did -- "anything not already 
matched, except null."
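
A small sketch of the scoping and null points, using the pattern-switch 
syntax that later shipped in Java 21 (the class and method names are 
made up for the example).  Both pattern cases reuse the binding name 
`v` without clashing, an explicit `case null` makes the switch accept 
null, and a switch with only `default` as its catch-all still throws on 
null.  The sketch deliberately does not exercise the "bottom case is 
total" rule, since that part of the behavior is specific to the design 
described here and may differ in the compiler you run it on.

```java
public class NullAndScopeSketch {
    // Explicit "case null": the switch handles null itself instead of throwing.
    // Both pattern cases bind a variable named "v"; each binding is in scope
    // only for its own case, so the names do not collide.
    static String describe(Object o) {
        return switch (o) {
            case null -> "null";
            case String v -> "String of length " + v.length();
            case Integer v -> "Integer " + v;
            default -> "something else";
        };
    }

    // No nullable case: "default" means "anything not already matched,
    // except null", so a null selector throws NullPointerException.
    static String describeHostile(Object o) {
        return switch (o) {
            case String v -> "String of length " + v.length();
            default -> "something else";
        };
    }

    public static void main(String[] args) {
        System.out.println(describe("abc"));
        System.out.println(describe(42));
        System.out.println(describe(null));
        try {
            describeHostile(null);
        } catch (NullPointerException e) {
            System.out.println("describeHostile(null) threw NPE");
        }
    }
}
```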

The new treatment of null actually would have fallen out of the 
decisions on totality, had we not gotten there already via another 
path.  Our notion of totality accounts for "remainder", which includes 
things like novel subclasses of sealed types that did not exist at 
compile time, which it would not be reasonable to ask users to write 
code to deal with, and null fits into this treatment as well.  We type 
check that a switch is sufficiently total, and then insert extra code to 
catch "silly" values that are not otherwise handled, including null, and 
throw.  (This also enables DA analysis to truly trust switch totality.)

Where we land is a single unified switch construct that can be either a 
statement or an expression; that can use either old-style flow (colon) 
or the more constrained flow style (arrow); whose case labels can be 
constant, patterns (including guarded patterns), or a mix of the two; 
which can accept the legacy null-hostility behavior, or can override it 
by explicitly using nullable case labels; and which is almost always 
type checked for totality (with some temporary, legacy exceptions.) 
Fallthrough is basically unchanged; you can get fallthrough when using 
the old-style flow, but it becomes less important as fallthrough is 
(mostly) nonsensical in the presence of pattern cases with bindings, and 
the compiler prevents this misuse.  The distinction between "legacy" 
switches and pattern switches is temporary, with a path to getting to 
"all switches are total" over time.

I think we've done a remarkable job at rehabilitating this monster.

*Someone actually suggested using the syntax "new switch", on the basis 
that new was already a keyword.  Would not have aged well.