[switch] Further unification on switch
Brian Goetz
brian.goetz at oracle.com
Thu Apr 19 20:44:45 UTC 2018
We've been reviewing the work to date on switch expressions. Here's
where we are, and here's a possible place we might move to, which I like
a lot better than where we are now.
## Goals
As a reminder, the primary goal here is _not_ switch
expressions; switch expressions are supposed to just be an
uncontroversial waypoint on the way to the real goal, which is a more
expressive and flexible switch construct that works in a wider variety
of situations, including supporting patterns, being less hostile to
null, being usable as either an expression or a statement, etc.
And the reason we think that improving switch is the right primary goal
is that a "do one of these based on ..." construct is _better_ than
the corresponding chain of if-else-if, for multiple reasons:
- Possibility for the compiler to do exhaustiveness analysis,
potentially finding more bugs;
- Possibility for more efficient dispatch -- a switch could be O(1),
whereas an if-else chain is almost certainly O(n);
- More semantically transparent -- it's obvious the user is saying "do
one of these, based on ...";
- Eliminates the need to repeat (and possibly get wrong) the switch
target.
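To make the last two points concrete, here is a small sketch of the same dispatch written both ways. The names (`DispatchComparison`, `Color`, `describeWithIf`, `describeWithSwitch`) are made up for illustration and are not from any real code base.

```java
class DispatchComparison {
    // Hypothetical enum, used only for this illustration.
    enum Color { RED, GREEN, BLUE }

    // if-else chain: the target `c` is repeated in every test, and nothing
    // in the syntax says the tests are meant to be mutually exclusive.
    static String describeWithIf(Color c) {
        if (c == Color.RED) {
            return "warm";
        } else if (c == Color.GREEN) {
            return "neutral";
        } else if (c == Color.BLUE) {
            return "cool";
        } else {
            throw new IllegalArgumentException("unexpected: " + c);
        }
    }

    // switch: the target is named once, and the construct itself says
    // "do one of these, based on c".
    static String describeWithSwitch(Color c) {
        switch (c) {
            case RED:   return "warm";
            case GREEN: return "neutral";
            case BLUE:  return "cool";
            default:    throw new IllegalArgumentException("unexpected: " + c);
        }
    }
}
```

In the chain, a copy-paste slip such as testing the wrong variable in one branch still compiles; in the switch there is only one place to name the target.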
Switch does come with a lot of baggage (fallthrough by default,
questionable scoping, need to explicitly break), and this baggage has
produced the predictable distractions in the discussion -- a desire that
we subordinate the primary goal (making switch more expressive) to the
more contingent goal of "fixing" the legacy problems of switch.
These legacy problems of switch may be unfortunate, but to whatever
degree we end up ameliorating these, this has to be purely a
side-benefit -- it's not the primary goal, no matter how annoying
people find them. (The desire to "fix" the mistakes of the past is
frequently a siren song, which is why we don't allow ourselves to take
these as first-class requirements.)
#### What we're not going to do
The worst possible outcome (which is also the most commonly suggested
"solution" in forums like Reddit) would be to invent a new construct
that is similar to, but not quite the same as, switch (`snitch`), without
being a 100% replacement for today's quirky switch. Today's switch is
surely suboptimal, but it's not so fatally flawed that it needs to be
euthanized, and we don't want to leave behind an "undead" language
construct that everyone will still have to learn, forever keeping track
of the differences between `switch` and `snitch`. No thank you.
That means we extend the existing switch statement, and increase
flexibility by supporting an expression form, and to the degree needed,
embrace its quirks. ("No statement left behind.")
#### Where we started
In the first five minutes of working on this project, we sketched out
the following (call it the "napkin sketch"), where an expression switch
has case arms of the form:
    case L -> e;
or
    case L -> { statement*; break e; }
This was enough to get started, but of course the devil is in the details.
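For readers who want to try the two arm shapes on a released compiler, here is a minimal sketch. It assumes Java 14 or later, where the value-bearing `break e` of the sketch is spelled `yield e`; that spelling, the class name `NapkinSketch`, and the method `scale` are not part of this proposal.

```java
class NapkinSketch {
    // Hypothetical helper; the mapping itself is arbitrary.
    static int scale(int y) {
        return switch (y) {
            // "case L -> e;" : the arm is a single expression
            case 1 -> 2;
            // "case L -> { statement*; break e; }" : a block that ends by
            // producing the value (written `yield e` in released Java)
            case 2 -> {
                int doubled = y * 2;
                yield doubled;
            }
            default -> 8;
        };
    }
}
```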
#### Where we are right now
We moved away from the napkin sketch for a few reasons, in part because
it seemed to be drawing us down the road towards switch and snitch --
which was further worrying as we still had yet to deal with the
potential that pattern switch and constant switch might have differences
as well. We want a unified model of switch that deals well enough with
all the cases -- expressions and statements, patterns and constants.
Our current model (call this Unification Attempt #1, or UA1 for short)
is a step towards a unified model of switch, and a huge step
forward. In this model, there's one switch construct, and there's one
set of control flow rules, including for break (like return, break takes
a value in a value context and is void in a void context).
For convenience and safety, we then layered a shorthand atop
value-bearing switches, which is to interpret
    case L -> e;
as
    case L: break e;
expecting the shorter form would be used almost all the time. (This has
a pleasing symmetry with the expression form of lambdas, and (at least
for expression switches) alleviates two of the legacy pain points.
Switch expressions have other things in common with lambdas too: they
are the only expressions that can contain statements, and the only ones
that interact with nonlocal control flow.)
This approach offers a lot of flexibility (some would say too much).
You can write "remi-style" expression switches:
    int x = switch (y) {
        case 1: break 2;
        case 2: break 4;
        default: break 8;
    };
or you can write "new-style" expression switches:
    int x = switch (y) {
        case 1 -> 2;
        case 2 -> 4;
        default -> 8;
    };
Some people like the transparency of the first; others like the
compactness and fallthrough-safety of the second. And in cases where
you mostly want the benefits of the second, but the real world conspires
to make one or two cases difficult, you can mix them, and take full
advantage of what "old switch" does -- with no new rules for control flow.
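The two clean styles can be compared on a released compiler with one substitution: the value-bearing `break e` of UA1 is written `yield e` there (Java 14 and later). The sketch below makes that assumption, and its class and method names are illustrative. Released compilers reject a switch that mixes colon and arrow labels, so the mixed variant is not shown.

```java
class TwoStyles {
    // Colon ("remi-style") form: every arm names its result explicitly.
    static int colonStyle(int y) {
        return switch (y) {
            case 1: yield 2;
            case 2: yield 4;
            default: yield 8;
        };
    }

    // Arrow ("new-style") form: "case L -> e" stands for "case L: yield e".
    static int arrowStyle(int y) {
        return switch (y) {
            case 1 -> 2;
            case 2 -> 4;
            default -> 8;
        };
    }
}
```

Both methods compute the same function; only the label kind differs.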
#### Complaints
There was the usual array of complaints over syntax -- many of which
can be put down to "bleah, new is different, different is bad", but the
most prominent one seems to be a generalized concern that other users
(never us, of course, but we always fear for what others might do) won't
be able to "handle" the power of mixed switches and will write terrible
code, and then the world will burn. (And, because the mixing comes with
fallthrough, it further engenders the "you idiots, you fixed the wrong
thing" reactions.) Personally, I think the fear of mixing is deeply
overblown -- I think in most cases people will gravitate towards one of
the two clean styles, and only mix where the complexity of the real
world forces them to, but there's value in understanding the
underpinnings of such reactions, even if in the end they turn out to
be so much hot air.
#### A real issue with mixing!
But, there is a real problem with our approach, which is: while a
unified switch is the right goal, UA1 is not unified _enough_.
Specifically, we haven't fully aligned the statement forms, and this
conspires to reduce expressiveness and safety. That is, in an
expression switch you can say:
    case L -> e;
but in a statement switch you can't say
    case L -> s;
The reason for this is a purely accidental one: if we allowed this, then
we _would_ likely find ourselves in the mixing hell that people are
afraid of, which in turn would make the risk of accidental fallthrough
_even worse_ than it is today. So the failing of mixing is not that it
will be abused, but that it constrains us from actually getting to a
unified construct.
## Closing the gap
So, let's take one more step towards unifying the two forms (call this
UA2), rather than a step away from it. Let's say that _all_ switches
can support either old-style (colon) or new-style (arrow) case labels --
but must stick to one kind of case label in a given switch:
    // statement switch
    switch (x) {
        case 1: println("one"); break;
        case 2: println("two"); break;
    }
or
// also statement switch
switch (x) {
case 1 -> println("one");
case 2 -> println("two");
}
If a switch is a statement, the RHS is a statement, which can be a block
statement:
    case L -> { a; b; }
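A runnable sketch of an arrow-form statement switch, assuming a Java 14 or later compiler. The class `ArrowStatementSwitch` and the logging list are hypothetical; the list only exists so the effect of each arm can be observed.

```java
import java.util.ArrayList;
import java.util.List;

class ArrowStatementSwitch {
    // Hypothetical helper: records what the selected arm did.
    static List<String> describe(int x) {
        List<String> log = new ArrayList<>();
        switch (x) {
            // RHS is a single (expression) statement
            case 1 -> log.add("one");
            // RHS is a block statement
            case 2 -> {
                log.add("two");
                log.add("second");
            }
        }
        return log;
    }
}
```

Only the selected arm runs, and since this is a statement switch, a value of `x` that matches no label simply does nothing.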
We get there by first taking a step backwards, at least in terms of
superficial syntax, to the syntax suggested by the napkin sketch, where
if a switch is an expression, the RHS of an -> case is an expression or
a block statement (in the latter case, it must complete abruptly by
reason of either break-value or throw). Just as we expected "break
value" to be rare in expression switches under UA1 since developers will
generally prefer the shorthand form where applicable, we expect it to be
equally rare under UA2.
Then, as in UA1, we render unto expressions the things that belong to
expressions; they must be total (an expression must yield a value or
complete abruptly by reason of throwing.)
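A sketch of what totality looks like in practice, assuming the expression-switch rules as they appear in released Java (14 and later). The class, enum, and method names are invented for the example.

```java
class Totality {
    // Hypothetical enum for illustration.
    enum Signal { RED, AMBER, GREEN }

    // Every Signal constant is covered, so the switch always yields a value.
    static int waitSeconds(Signal s) {
        return switch (s) {
            case RED -> 30;
            case AMBER -> 5;
            case GREEN -> 0;
        };
    }

    // An int target cannot be enumerated case by case; a default arm makes
    // the switch total. Here that arm completes abruptly by throwing
    // instead of yielding a value.
    static String name(int code) {
        return switch (code) {
            case 0 -> "zero";
            case 1 -> "one";
            default -> throw new IllegalArgumentException("no name for " + code);
        };
    }
}
```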
#### Look, accidental benefits!
Many of switch's failings (fallthrough, scoping) are not directly
specified features, as much as emergent properties of the structure and
control flow of switches. Since by definition you can't fall out of an
arrow case, an all-arrow switch gives the fallthrough-haters what
they want "for free", with no need to treat it specially. In fact, it's
even better; in the all-arrow form, all of the things people hate about
switch -- the need to say break, the risk of fallthrough, and the
questionable scoping -- all go away.
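The contrast can be sketched as follows, assuming a Java 14 or later compiler. The class and method names are illustrative, and the missing `break` in the colon form is deliberate.

```java
class FallthroughAndScope {
    // Colon form: the missing break lets case 1 fall through into case 2.
    static String colonForm(int x) {
        StringBuilder out = new StringBuilder();
        switch (x) {
            case 1:
                out.append("one ");
                // no break here, so control falls through
            case 2:
                out.append("two ");
                break;
            default:
                out.append("other ");
        }
        return out.toString().trim();
    }

    // Arrow form: exactly one arm runs, and each block is its own scope,
    // so both arms may declare a local named `msg`.
    static String arrowForm(int x) {
        StringBuilder out = new StringBuilder();
        switch (x) {
            case 1 -> { String msg = "one"; out.append(msg); }
            case 2 -> { String msg = "two"; out.append(msg); }
            default -> out.append("other");
        }
        return out.toString();
    }
}
```

In the colon form, declaring `msg` directly under both `case 1:` and `case 2:` would be a redeclaration error, because the whole switch body is a single block.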
#### Scorecard
There is one switch construct, which can be used as either an expression
or a statement; when used as an expression, it acquires the
characteristics of expressions (must be total, no nonlocal control flow
out.) Each can be expressed in one of two syntactic forms (arrow and
colon.) All forms will support patterns, null handling, and multiple
labels per case. The control flow and scoping rules are driven by
structural properties of the chosen form.
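As one small illustration of "multiple labels per case", here is a sketch using the comma-separated label syntax found in released Java (14 and later). Whether that is the exact syntax intended here is an assumption, and the class and method names are invented.

```java
class MultipleLabels {
    // Hypothetical helper: days are numbered 1 (Monday) through 7 (Sunday).
    static String kind(int day) {
        return switch (day) {
            case 1, 2, 3, 4, 5 -> "weekday";
            case 6, 7 -> "weekend";
            default -> throw new IllegalArgumentException("day out of range: " + day);
        };
    }
}
```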
The (statement, colon) case is the switch we have had since Java 1.0,
enhanced as above (patterns, nulls, etc.)
The (statement, arrow) case can be considered a nice syntactic shorthand
for the previous, which obviates the annoyance of "break", implicitly
prevents fallthrough of all forms, and avoids the confusion of current
switch scoping. Many existing statement switches that are not
expressions in disguise can be refactored to this.
The (expression, colon) form is a subset of UA1, where you just never
say "arrow".
The (expression, arrow) case can again be considered a nice shorthand
for the previous, again a subset of UA1, where you just never say
"colon", and as a result, again don't have to think about fallthrough.
Totality is a property of expression switches, regardless of form,
because they are expressions, and expressions must be total.
Fallthrough is a property of the colon-structured switches; there are no
changes here.
Nonlocal control flow _out_ of a switch (continue to an enclosing loop,
break with label, return) is a property of statement switches.
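A sketch of nonlocal control flow out of a statement switch, written in the arrow form and assuming a Java 14 or later compiler. The class and method names are illustrative. The same `continue` or `return` inside an expression switch is rejected by released compilers.

```java
class NonlocalControlFlow {
    // Sums the positive entries that appear before the first zero,
    // skipping negative entries.
    static int sumUntilZero(int[] values) {
        int sum = 0;
        for (int v : values) {
            switch (Integer.signum(v)) {
                case -1 -> { continue; }    // continue the enclosing loop
                case 0 -> { return sum; }   // return from the enclosing method
                default -> sum += v;        // positive: accumulate
            }
        }
        return sum;
    }
}
```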
So essentially, rather than dividing the semantics along
expression/statement lines, and then attempting to opportunistically
heap a bunch of irrelevant features like "no fallthrough" onto the
expression side "because they're cool" even though they have nothing to
do with expression-ness, we instead divide the world structurally: the
colon form gives you the old control flow, and the arrow form gives you
the new. And either can be used as a statement, or an expression. And
no one will be confused by mixing.
Orthogonality FTW. No statement gets left behind.
## Explaining it
Relative to UA1, we could describe this as adding back the blocks (it's
not really a block expression) from the napkin model, supporting an
arrow form of statement switches with blocks too, and then restricting
switches to all-arrow or all-colon. Then each quadrant is a restriction
of this model. But that's not how we'd teach it.
Relative to Java 10, we'd probably say:
- Switch statements now come in a simpler (arrow) flavor, where there
is no fallthrough, no weird scoping, and no need to say break most of
the time. Many switches can be rewritten this way, and this form can
even be taught first.
- Switches can be used as either expressions or statements, with
essentially identical syntax (some grammar differences, but this is
mostly interesting only to spec writers). If a switch is an expression,
it should contain expressions; if a switch is a statement, it should
contain statements.
- Expression switches have additional restrictions that are derived
exclusively from their expression-ness: they must be total, and they can
complete abruptly only by reason of throw.
- We allow a break-with-value statement in an expression switch as a
means of explicitly providing the switch result; this can be combined
with a statement block to allow for statements+break-expression.
The result is one switch construct, with modern and legacy flavors,
which supports either expressions or statements. You can immediately
look at the middle of a switch and tell (by arrow vs colon) whether it
has the legacy control flow or not.
More information about the amber-spec-observers
mailing list