[switch] Further unification on switch

Thu Apr 19 20:44:45 UTC 2018

We've been reviewing the work to date on switch expressions. Here's 
where we are, and here's a possible place we might move to, which I like 
a lot better than where we are now.

## Goals

As a reminder, the primary goal here is _not_ switch 
expressions; switch expressions are supposed to just be an 
uncontroversial waypoint on the way to the real goal, which is a more 
expressive and flexible switch construct that works in a wider variety 
of situations, including supporting patterns, being less hostile to 
null, use as either an expression or a statement, etc.

And the reason we think that improving switch is the right primary goal 
is because a "do one of these based on ..." construct is _better_ than 
the corresponding chain of if-else-if, for multiple reasons:

  - Possibility for the compiler to do exhaustiveness analysis, 
potentially finding more bugs;
  - Possibility for more efficient dispatch -- a switch could be O(1), 
whereas an if-else chain is almost certainly O(n);
  - More semantically transparent -- it's obvious the user is saying "do 
one of these, based on ...";
  - Eliminates the need to repeat (and possibly get wrong) the switch 
target.
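
To make the last point concrete, here is a minimal sketch of the same dispatch written both ways. The `Color` enum and both method names are made up for illustration; nothing here depends on the new features under discussion.

```java
public class DispatchSketch {
    // Hypothetical enum, used only for this illustration.
    enum Color { RED, GREEN, BLUE }

    // if-else chain: the target `c` is repeated (and could be mistyped)
    // in every test, and the tests are evaluated one after another.
    static String describeWithIf(Color c) {
        if (c == Color.RED) {
            return "warm";
        } else if (c == Color.GREEN) {
            return "neutral";
        } else if (c == Color.BLUE) {
            return "cool";
        } else {
            throw new IllegalArgumentException("unexpected: " + c);
        }
    }

    // switch: the target is named exactly once, and the reader can see
    // at a glance that this is "do one of these, based on c".
    static String describeWithSwitch(Color c) {
        switch (c) {
            case RED:   return "warm";
            case GREEN: return "neutral";
            case BLUE:  return "cool";
            default:    throw new IllegalArgumentException("unexpected: " + c);
        }
    }
}
```

Both methods agree on every `Color` constant; the difference is in how much the reader (and the compiler) can tell about the intent.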

Switch does come with a lot of baggage (fallthrough by default, 
questionable scoping, need to explicitly break), and this baggage has 
produced the predictable distractions in the discussion -- a desire that 
we subordinate the primary goal (making switch more expressive) to the 
more contingent goal of "fixing" the legacy problems of switch.

These legacy problems of switch may be unfortunate, but to whatever 
degree we end up ameliorating them, that has to be purely a 
side-benefit -- it's not the primary goal, no matter how annoying 
people find them.  (The desire to "fix" the mistakes of the past is 
frequently a siren song, which is why we don't allow ourselves to take 
such fixes as first-class requirements.)

#### What we're not going to do

The worst possible outcome (which is also the most commonly suggested 
"solution" in forums like reddit) would be to invent a new construct 
that is similar to, but not quite the same as, switch (`snitch`), without 
being a 100% replacement for today's quirky switch.  Today's switch is 
surely suboptimal, but it's not so fatally flawed that it needs to be 
euthanized, and we don't want to create an "undead" language construct 
that lingers forever, which everyone will still have to learn while also 
keeping track of the differences between `switch` and `snitch`.  No thank you.

That means we extend the existing switch statement, and increase 
flexibility by supporting an expression form, and to the degree needed, 
embrace its quirks.  ("No statement left behind.")

#### Where we started

In the first five minutes of working on this project, we sketched out 
the following (call it the "napkin sketch"), where an expression switch 
has case arms of the form:

    case L -> e;
or
    case L -> { statement*; break e; }

This was enough to get started, but of course the devil is in the details.

#### Where we are right now

We moved away from the napkin sketch for a few reasons, in part because 
it seemed to be drawing us down the road towards switch and snitch -- 
which was further worrying as we still had yet to deal with the 
potential that pattern switch and constant switch might have differences 
as well.  We want a unified model of switch that deals well enough with 
all the cases -- expressions and statements, patterns and constants.

Our current model (call this Unification Attempt #1, or UA1 for short) 
is a step towards a unified model of switch, and this is a huge step 
forward.  In this model, there's one switch construct, and there's one 
set of control flow rules, including for break (like return, break takes 
a value in a value context and is void in a void context).

For convenience and safety, we then layered a shorthand atop 
value-bearing switches, which is to interpret

     case L -> e;

as

     case L: break e;

expecting the shorter form would be used almost all the time.  (This has 
a pleasing symmetry with the expression form of lambdas, and (at least 
for expression switches) alleviates two of the legacy pain points.  
Switch expressions have other things in common with lambdas too; they 
are the only expressions that can contain statements, and the only 
expressions that interact with nonlocal control flow.)

This approach offers a lot of flexibility (some would say too much).  
You can write "remi-style" expression switches:

     int x = switch (y) {
         case 1: break 2;
         case 2: break 4;
         default: break 8;
     };

or you can write "new-style" expression switches:

     int x = switch (y) {
         case 1 -> 2;
         case 2 -> 4;
         default -> 8;
     };

Some people like the transparency of the first; others like the 
compactness and fallthrough-safety of the second.  And in cases where 
you mostly want the benefits of the second, but the real world conspires 
to make one or two cases difficult, you can mix them, and take full 
advantage of what "old switch" does -- with no new rules for control flow.

#### Complaints

There was the usual array of complaints over syntax -- many of which 
can be put down to "bleah, new is different, different is bad", but the 
most prominent one seems to be a generalized concern that other users 
(never us, of course, but we always fear for what others might do) won't 
be able to "handle" the power of mixed switches and will write terrible 
code, and then the world will burn.  (And, because the mixing comes with 
fallthrough, it further engenders the "you idiots, you fixed the wrong 
thing" reactions.) Personally, I think the fear of mixing is deeply 
overblown -- I think in most cases people will gravitate towards one of 
the two clean styles, and only mix where the complexity of the real 
world forces them to, but there's value in understanding the 
underpinnings of such reactions, even if in the end they'd turn out to 
be much hot air about nothing.

#### A real issue with mixing!

But, there is a real problem with our approach, which is: while a 
unified switch is the right goal, UA1 is not unified _enough_. 
Specifically, we haven't fully aligned the statement forms, and this 
conspires to reduce expressiveness and safety.  That is, in an 
expression switch you can say:

     case L -> e;

but in a statement switch you can't say

     case L -> s;

The reason for this is a purely accidental one: if we allowed this, then 
we _would_ likely find ourselves in the mixing hell that people are 
afraid of, which in turn would make the risk of accidental fallthrough 
_even worse_ than it is today.  So the failing of mixing is not that it 
will be abused, but that it constrains us from actually getting to a 
unified construct.

## Closing the gap

So, let's take one more step towards unifying the two forms (call this 
UA2), rather than a step away from it.  Let's say that _all_ switches 
can support either old-style (colon) or new-style (arrow) case labels -- 
but must stick to one kind of case label in a given switch:

     // statement switch
     switch (x) {
         case 1: println("one"); break;
         case 2: println("two"); break;
     }

or

     // also statement switch
     switch (x) {
         case 1 -> println("one");
         case 2 -> println("two");
     }

If a switch is a statement, the RHS of an arrow case is a statement, 
which can be a block statement:

     case L -> { a; b; }

We get there by first taking a step backwards, at least in terms of 
superficial syntax, to the syntax suggested by the napkin sketch, where 
if a switch is an expression, the RHS of an -> case is an expression or 
a block statement (in the latter case, it must complete abruptly by 
reason of either break-value or throw).  Just as we expected "break 
value" to be rare in expression switches under UA1 since developers will 
generally prefer the shorthand form where applicable, we expect it to be 
equally rare under UA2.
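
Here is a rough sketch of an expression switch whose arms are mostly plain expressions, with one block arm for the awkward case. It is written to compile on Java 14 or later, where the value-bearing `break e` discussed here is spelled `yield e`; that spelling is not part of this proposal, and the class and method names are made up for the example.

```java
public class BlockArmSketch {
    // Hypothetical helper: maps a word to a score.
    static int score(String word) {
        return switch (word) {
            // The common case: the RHS is just an expression.
            case "one" -> 1;
            case "two" -> 2;
            // The rare case: a block that must end by producing a value
            // or by throwing.
            default -> {
                int n = word.length();
                if (n == 0) {
                    throw new IllegalArgumentException("empty word");
                }
                yield n * 10;   // `break n * 10;` in this proposal's syntax
            }
        };
    }
}
```

With these definitions, `score("one")` is 1 and `score("three")` is 50; an empty string takes the throwing path instead of yielding a value.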

Then, as in UA1, we render unto expressions the things that belong to 
expressions; they must be total (an expression must yield a value or 
complete abruptly by reason of throwing.)

#### Look, accidental benefits!

Many of switch's failings (fallthrough, scoping) are not directly 
specified features, as much as emergent properties of the structure and 
control flow of switches.  Since by definition you can't fall out of an 
arrow case, an all-arrow switch gives the fallthrough-haters what 
they want "for free", with no need to treat it specially. In fact, it's 
even better; in the all-arrow form, all of the things people hate about 
switch -- the need to say break, the risk of fallthrough, and the 
questionable scoping -- go away.
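
A small sketch of the difference, assuming a compiler that accepts arrow labels in statement switches (Java 14 or later); the class and method names are hypothetical.

```java
public class ArrowVsColonSketch {
    // Colon form: fallthrough and switch-wide scoping are both in play.
    static String colonForm(int x) {
        StringBuilder out = new StringBuilder();
        switch (x) {
            case 1:
                String msg = "one";
                out.append(msg);
                // No break here, so control falls into case 2.
            case 2:
                // `msg` from case 1 is still in scope here; declaring
                // another `String msg` in this arm would not compile.
                out.append("two");
                break;
            default:
                out.append("other");
        }
        return out.toString();
    }

    // Arrow form: each arm runs alone, and each block has its own scope.
    static String arrowForm(int x) {
        StringBuilder out = new StringBuilder();
        switch (x) {
            case 1 -> { String msg = "one"; out.append(msg); }
            case 2 -> { String msg = "two"; out.append(msg); }
            default -> out.append("other");
        }
        return out.toString();
    }
}
```

For an input of 1, `colonForm` produces "onetwo" because of the missing break, while `arrowForm` produces "one"; no `break` appears anywhere in the arrow version.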

#### Scorecard

There is one switch construct, which can be used as either an expression 
or a statement; when used as an expression, it acquires the 
characteristics of expressions (must be total, no nonlocal control flow 
out.)  Each can be expressed in one of two syntactic forms (arrow and 
colon.)  All forms will support patterns, null handling, and multiple 
labels per case.  The control flow and scoping rules are driven by 
structural properties of the chosen form.

The (statement, colon) case is the switch we have had since Java 1.0, 
enhanced as above (patterns, nulls, etc.)

The (statement, arrow) case can be considered a nice syntactic shorthand 
for the previous, which obviates the annoyance of "break", implicitly 
prevents fallthrough of all forms, and avoids the confusion of current 
switch scoping.  Many existing statement switches that are not 
expressions in disguise can be refactored to this.

The (expression, colon) form is a subset of UA1, where you just never 
say "arrow".

The (expression, arrow) case can again be considered a nice shorthand 
for the previous, again a subset of UA1, where you just never say 
"colon", and as a result, again don't have to think about fallthrough.

Totality is a property of expression switches, regardless of form, 
because they are expressions, and expressions must be total.

Fallthrough is a property of the colon-structured switches; there are no 
changes here.

Nonlocal control flow _out_ of a switch (continue to an enclosing loop, 
break with label, return) is a property of statement switches.
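
As an illustration of that last rule, here is a sketch of a statement switch (arrow form) that continues an enclosing loop from one arm and returns from another. It assumes Java 14 or later; the method name is invented for the example. The same `continue` or `return` inside an expression switch would be rejected.

```java
public class ControlFlowSketch {
    // Hypothetical example: sums the values, skipping negatives and
    // stopping at the first zero.
    static int sumUntilZero(int[] values) {
        int sum = 0;
        for (int v : values) {
            switch (Integer.signum(v)) {
                case -1 -> { continue; }    // leaves the switch, resumes the loop
                case 0 -> { return sum; }   // leaves the method entirely
                default -> sum += v;        // ordinary arm, completes normally
            }
        }
        return sum;
    }
}
```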

So essentially, rather than dividing the semantics along 
expression/statement lines, and then attempting to opportunistically 
heap a bunch of irrelevant features like "no fallthrough" onto the 
expression side "because they're cool" even though they have nothing to 
do with expression-ness, we instead divide the world structurally: the 
colon form gives you the old control flow, and the arrow form gives you 
the new.  And either can be used as a statement, or an expression.  And 
no one will be confused by mixing.

Orthogonality FTW.  No statement gets left behind.
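
For reference, here is a sketch of all four quadrants computing the same thing. It is written against Java 14 or later, so the value-bearing break of the (expression, colon) form appears as `yield`, which is an assumption about the eventual spelling and not part of this proposal; the class and method names are made up.

```java
public class QuadrantSketch {
    // (statement, colon): the switch we have today.
    static String statementColon(int x) {
        String result;
        switch (x) {
            case 1: result = "one"; break;
            case 2: result = "two"; break;
            default: result = "many"; break;
        }
        return result;
    }

    // (statement, arrow): same statements, no break, no fallthrough.
    static String statementArrow(int x) {
        String result;
        switch (x) {
            case 1 -> result = "one";
            case 2 -> result = "two";
            default -> result = "many";
        }
        return result;
    }

    // (expression, colon): each arm hands back a value explicitly
    // (`break "one";` in this proposal's syntax).
    static String expressionColon(int x) {
        return switch (x) {
            case 1: yield "one";
            case 2: yield "two";
            default: yield "many";
        };
    }

    // (expression, arrow): the shorthand for the previous form.
    static String expressionArrow(int x) {
        return switch (x) {
            case 1 -> "one";
            case 2 -> "two";
            default -> "many";
        };
    }
}
```

All four methods return the same result for any input; what differs is only which control-flow rules the reader has to keep in mind.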

## Explaining it

Relative to UA1, we could describe this as adding back the blocks (it's 
not really a block expression) from the napkin model, supporting an 
arrow form of statement switches with blocks too, and then restricting 
switches to all-arrow or all-colon.  Then each quadrant is a restriction 
of this model.  But that's not how we'd teach it.

Relative to Java 10, we'd probably say:

  - Switch statements now come in a simpler (arrow) flavor, where there 
is no fallthrough, no weird scoping, and no need to say break most of 
the time.  Many switches can be rewritten this way, and this form can 
even be taught first.
  - Switches can be used as either expressions or statements, with 
essentially identical syntax (some grammar differences, but this is 
mostly interesting only to spec writers).  If a switch is an expression, 
it should contain expressions; if a switch is a statement, it should 
contain statements.
  - Expression switches have additional restrictions that are derived 
exclusively from their expression-ness: they must be total, and they can 
complete abruptly only by reason of throw.
  - We allow a break-with-value statement in an expression switch as a 
means of explicitly providing the switch result; this can be combined 
with a statement block to allow for statements+break-expression.

The result is one switch construct, with modern and legacy flavors, 
which supports either expressions or statements.  You can immediately 
look at the middle of a switch and tell (by arrow vs colon) whether it 
has the legacy control flow or not.