[switch] Further unification on switch

Thu Apr 19 21:31:42 UTC 2018

I was starting to get fatalistically pessimistic about switch,
but the all-colon-as-statement vs all-arrow-as-expression
idea (with nothing in-between) seems pretty good!
And would be even better if JLS impact were carefully checked.

-Doug

On 04/19/2018 04:44 PM, Brian Goetz wrote:
> We've been reviewing the work to date on switch expressions. Here's
> where we are, and here's a possible place we might move to, which I like
> a lot better than where we are now.
> 
> ## Goals
> 
> As a reminder, the primary goal here is _not_ switch
> expressions; switch expressions are supposed to just be an
> uncontroversial waypoint on the way to the real goal, which is a more
> expressive and flexible switch construct that works in a wider variety
> of situations, including supporting patterns, being less hostile to
> null, use as either an expression or a statement, etc.
> 
> And the reason we think that improving switch is the right primary goal
> is because a "do one of these based on ..." construct is _better_ than
> the corresponding chain of if-else-if, for multiple reasons:
> 
>  - Possibility for the compiler to do exhaustiveness analysis,
> potentially finding more bugs;
>  - Possibility for more efficient dispatch -- a switch could be O(1),
> whereas an if-else chain is almost certainly O(n);
>  - More semantically transparent -- it's obvious the user is saying "do
> one of these, based on ...";
>  - Eliminates the need to repeat (and possibly get wrong) the switch
> target.
> 
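A minimal sketch of the last two bullets, written in the switch-expression syntax that later shipped in Java 14 rather than anything fixed at the time of this mail. The enum `Day` and both method names are hypothetical, chosen only for illustration.

```java
// Illustrative only: Day, lengthWithIfChain and lengthWithSwitch are made-up names.
class SwitchVsIfChain {
    enum Day { SATURDAY, SUNDAY, MONDAY }

    // if-else chain: the target `d` is repeated in every test, and the
    // compiler has no way to notice that a constant was forgotten.
    static int lengthWithIfChain(Day d) {
        if (d == Day.SATURDAY) {
            return 8;
        } else if (d == Day.SUNDAY) {
            return 6;
        } else if (d == Day.MONDAY) {
            return 6;
        } else {
            throw new IllegalArgumentException("unexpected: " + d);
        }
    }

    // switch expression: the target appears once. Every enum constant is
    // covered, so no default is needed; deleting an arm is a compile error.
    static int lengthWithSwitch(Day d) {
        return switch (d) {
            case SATURDAY -> 8;
            case SUNDAY, MONDAY -> 6;
        };
    }

    public static void main(String[] args) {
        for (Day d : Day.values()) {
            System.out.println(d + " " + lengthWithIfChain(d) + " " + lengthWithSwitch(d));
        }
    }
}
```

Both methods return the same values; the difference is in what the compiler can check and how often the target has to be spelled out.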
> Switch does come with a lot of baggage (fallthrough by default,
> questionable scoping, need to explicitly break), and this baggage has
> produced the predictable distractions in the discussion -- a desire that
> we subordinate the primary goal (making switch more expressive) to the
> more contingent goal of "fixing" the legacy problems of switch.
> 
> These legacy problems of switch may be unfortunate, but to whatever
> degree we end up ameliorating these, this has to be purely a
> side-benefit -- it's not the primary goal, no matter how annoying
> people find them.  (The desire to "fix" the mistakes of the past is
> frequently a siren song, which is why we don't allow ourselves to take
> these as first-class requirements.)
> 
> #### What we're not going to do
> 
> The worst possible outcome (which is also the most commonly suggested
> "solution" in forums like reddit) would be to invent a new construct
> that is similar to, but not quite the same as switch (`snitch`), without
> being a 100% replacement for today's quirky switch.  Today's switch is
> surely suboptimal, but it's not so fatally flawed that it needs to be
> euthanized, and we don't want to create an "undead" language construct
> that lingers forever, leaving everyone to learn both and keep track of the
> differences between `switch` and `snitch`.  No thank you.
> 
> That means we extend the existing switch statement, and increase
> flexibility by supporting an expression form, and to the degree needed,
> embrace its quirks.  ("No statement left behind.")
> 
> #### Where we started
> 
> In the first five minutes of working on this project, we sketched out
> the following (call it the "napkin sketch"), where an expression switch
> has case arms of the form:
> 
>    case L -> e;
> or
>    case L -> { statement*; break e; }
> 
> This was enough to get started, but of course the devil is in the details.
> 
> #### Where we are right now
> 
> We moved away from the napkin sketch for a few reasons, in part because
> it seemed to be drawing us down the road towards switch and snitch --
> which was further worrying as we still had yet to deal with the
> potential that pattern switch and constant switch might have differences
> as well.  We want a unified model of switch that deals well enough with
> all the cases -- expressions and statements, patterns and constants.
> 
> Our current model (call this Unification Attempt #1, or UA1 for short)
> is a step towards a unified model of switch, and this is a huge step
> forward.  In this model, there's one switch construct, and there's one
> set of control flow rules, including for break (like return, break takes
> a value in a value context and is void in a void context).
> 
> For convenience and safety, we then layered a shorthand atop
> value-bearing switches, which is to interpret
> 
>     case L -> e;
> 
> as
> 
>     case L: break e;
> 
> expecting the shorter form would be used almost all the time.  (This has
> a pleasing symmetry with the expression form of lambdas, and (at least
> for expression switches) alleviates two of the legacy pain points. 
> Switch expressions have other things in common with lambdas too; they
> are the only expressions that can contain statements, and the only ones
> that interact with nonlocal control flow.)
> 
> This approach offers a lot of flexibility (some would say too much). 
> You can write "remi-style" expression switches:
> 
>     int x = switch (y) {
>         case 1: break 2;
>         case 2: break 4;
>         default: break 8;
>     };
> 
> or you can write "new-style" expression switches:
> 
>     int x = switch (y) {
>         case 1 -> 2;
>         case 2 -> 4;
>         default -> 8;
>     };
> 
> Some people like the transparency of the first; others like the
> compactness and fallthrough-safety of the second.  And in cases where
> you mostly want the benefits of the second, but the real world conspires
> to make one or two cases difficult, you can mix them, and take full
> advantage of what "old switch" does -- with no new rules for control flow.
> 
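For anyone wanting to run the two clean styles on a current JDK, here is a sketch under one loudly stated assumption: it uses the syntax that shipped in Java 14, where `yield e` is the spelling of the `break e` written in this mail. The class and method names are hypothetical. As shipped, javac also rejects a single switch that mixes colon and arrow labels, so only the unmixed styles are shown.

```java
// Sketch only; ExpressionSwitchForms, colonForm and arrowForm are made-up names.
class ExpressionSwitchForms {
    // Colon form: each arm hands back its value explicitly.
    // `yield` here plays the role of the value-bearing `break` in the mail.
    static int colonForm(int y) {
        return switch (y) {
            case 1: yield 2;
            case 2: yield 4;
            default: yield 8;
        };
    }

    // Arrow form: the expression to the right of the arrow is the result.
    static int arrowForm(int y) {
        return switch (y) {
            case 1 -> 2;
            case 2 -> 4;
            default -> 8;
        };
    }

    public static void main(String[] args) {
        for (int y = 1; y <= 3; y++) {
            System.out.println(y + ": " + colonForm(y) + " " + arrowForm(y));
        }
    }
}
```

The two methods agree for every input, which is the sense in which the arrow arm is shorthand for the colon arm.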
> #### Complaints
> 
> There was the usual array of complaints over syntax -- many of which
> can be put down to "bleah, new is different, different is bad", but the
> most prominent one seems to be a generalized concern that other users
> (never us, of course, but we always fear for what others might do) won't
> be able to "handle" the power of mixed switches and will write terrible
> code, and then the world will burn.  (And, because the mixing comes with
> fallthrough, it further engenders the "you idiots, you fixed the wrong
> thing" reactions.) Personally, I think the fear of mixing is deeply
> overblown -- I think in most cases people will gravitate towards one of
> the two clean styles, and only mix where the complexity of the real
> world forces them to, but there's value in understanding the
> underpinnings of such reactions, even if in the end they'd turn out to
> be much hot air about nothing.
> 
> #### A real issue with mixing!
> 
> But, there is a real problem with our approach, which is: while a
> unified switch is the right goal, UA1 is not unified _enough_.
> Specifically, we haven't fully aligned the statement forms, and this
> conspires to reduce expressiveness and safety.  That is, in an
> expression switch you can say:
> 
>     case L -> e;
> 
> but in a statement switch you can't say
> 
>     case L -> s;
> 
> The reason for this is a purely accidental one: if we allowed this, then
> we _would_ likely find ourselves in the mixing hell that people are
> afraid of, which in turn would make the risk of accidental fallthrough
> _even worse_ than it is today.  So the failing of mixing is not that it
> will be abused, but that it constrains us from actually getting to a
> unified construct.
> 
> ## Closing the gap
> 
> So, let's take one more step towards unifying the two forms (call this
> UA2), rather than a step away from it.  Let's say that _all_ switches
> can support either old-style (colon) or new-style (arrow) case labels --
> but must stick to one kind of case label in a given switch:
> 
>     // statement switch
>     switch (x) {
>         case 1: println("one"); break;
>         case 2: println("two"); break;
>     }
> 
> or
> 
>     // also statement switch
>     switch (x) {
>         case 1 -> println("one");
>         case 2 -> println("two");
>     }
> 
> If a switch is a statement, the RHS is a statement, which can be a block
> statement:
> 
>     case L -> { a; b; }
> 
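A runnable sketch of the two statement forms above, in the syntax that shipped in Java 14. The class and method names are hypothetical, and the arms append to a list instead of calling `println` so the two forms can be compared directly.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only; StatementSwitchForms, colonForm and arrowForm are made-up names.
class StatementSwitchForms {
    // Old-style statement switch: colon labels, explicit break.
    static List<String> colonForm(int x) {
        List<String> out = new ArrayList<>();
        switch (x) {
            case 1: out.add("one"); break;
            case 2: out.add("two"); break;
        }
        return out;
    }

    // New-style statement switch: arrow labels, no break needed.
    // The right-hand side is a single statement or a block statement.
    static List<String> arrowForm(int x) {
        List<String> out = new ArrayList<>();
        switch (x) {
            case 1 -> out.add("one");
            case 2 -> { String word = "two"; out.add(word); }
        }
        return out;
    }

    public static void main(String[] args) {
        for (int x = 1; x <= 3; x++) {
            System.out.println(x + ": " + colonForm(x) + " " + arrowForm(x));
        }
    }
}
```

Neither switch is total: for `x == 3` both simply do nothing, which is allowed for a statement switch.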
> We get there by first taking a step backwards, at least in terms of
> superficial syntax, to the syntax suggested by the napkin sketch, where
> if a switch is an expression, the RHS of an -> case is an expression or
> a block statement (in the latter case, it must complete abruptly by
> reason of either break-value or throw).  Just as we expected "break
> value" to be rare in expression switches under UA1 since developers will
> generally prefer the shorthand form where applicable, we expect it to be
> equally rare under UA2.
> 
> Then, as in UA1, we render unto expressions the things that belong to
> expressions; they must be total (an expression must yield a value or
> complete abruptly by reason of throwing.)
> 
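The totality rule and the block form can be sketched as follows. This assumes the Java 14 syntax, in which a block arm produces its value with `yield` where this mail writes `break e`; the names `TotalExpressionSwitch` and `describe` are hypothetical.

```java
// Sketch only; TotalExpressionSwitch and describe are made-up names.
class TotalExpressionSwitch {
    static String describe(int n) {
        return switch (n) {
            // Plain expression on the right-hand side.
            case 0 -> "zero";
            // Block on the right-hand side: it cannot simply fall off the
            // end, it has to yield a value (or throw).
            case 1 -> {
                String word = "one";
                yield word + "!";
            }
            // An int switch cannot list every value, so totality needs a
            // default; here the default completes abruptly by throwing.
            default -> throw new IllegalArgumentException("unsupported: " + n);
        };
    }

    public static void main(String[] args) {
        System.out.println(describe(0));
        System.out.println(describe(1));
    }
}
```

Removing the `default` arm turns this into a compile-time error, because the expression would no longer produce a result for every input.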
> #### Look, accidental benefits!
> 
> Many of switch's failings (fallthrough, scoping) are not directly
> specified features, as much as emergent properties of the structure and
> control flow of switches.  Since by definition you can't fall out of an
> arrow case, an all-arrow switch gives the fallthrough-haters what
> they want "for free", with no need to treat it specially. In fact, it's
> even better; in the all-arrow form, all of the things people hate about
> switch -- the need to say break, the risk of fallthrough, and the
> questionable scoping -- all go away.
> 
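A small sketch of how fallthrough and scoping differ between the two forms, using the arrow syntax as it shipped in Java 14. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only; FallthroughAndScope, colonForm and arrowForm are made-up names.
class FallthroughAndScope {
    // Colon form: the switch body is a single scope, and a missing break
    // lets control run on into the next group.
    static List<String> colonForm(int x) {
        List<String> out = new ArrayList<>();
        switch (x) {
            case 1:
                String msg = "one";
                out.add(msg);
                // no break here: control falls through into case 2
            case 2:
                msg = "two";   // same variable as declared under case 1
                out.add(msg);
                break;
            default:
                out.add("other");
        }
        return out;
    }

    // Arrow form: each arm runs alone, and each block has its own scope,
    // so both arms may declare their own `msg`.
    static List<String> arrowForm(int x) {
        List<String> out = new ArrayList<>();
        switch (x) {
            case 1 -> { String msg = "one"; out.add(msg); }
            case 2 -> { String msg = "two"; out.add(msg); }
            default -> out.add("other");
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(colonForm(1)); // [one, two]
        System.out.println(arrowForm(1)); // [one]
    }
}
```

Nothing in the arrow version is a special anti-fallthrough rule; there is simply no next statement for control to fall into.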
> #### Scorecard
> 
> There is one switch construct, which can be used as either an expression
> or a statement; when used as an expression, it acquires the
> characteristics of expressions (must be total, no nonlocal control flow
> out.)  Each can be expressed in one of two syntactic forms (arrow and
> colon.)  All forms will support patterns, null handling, and multiple
> labels per case.  The control flow and scoping rules are driven by
> structural properties of the chosen form.
> 
> The (statement, colon) case is the switch we have had since Java 1.0,
> enhanced as above (patterns, nulls, etc.)
> 
> The (statement, arrow) case can be considered a nice syntactic shorthand
> for the previous, which obviates the annoyance of "break", implicitly
> prevents fallthrough of all forms, and avoids the confusion of current
> switch scoping.  Many existing statement switches that are not
> expressions in disguise can be refactored to this.
> 
> The (expression, colon) form is a subset of UA1, where you just never
> say "arrow".
> 
> The (expression, arrow) case can again be considered a nice shorthand
> for the previous, again a subset of UA1, where you just never say
> "colon", and as a result, again don't have to think about fallthrough.
> 
> Totality is a property of expression switches, regardless of form,
> because they are expressions, and expressions must be total.
> 
> Fallthrough is a property of the colon-structured switches; there are no
> changes here.
> 
> Nonlocal control flow _out_ of a switch (continue to an enclosing loop,
> break with label, return) is a property of statement switches.
> 
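To illustrate nonlocal control flow out of a statement switch, here is a sketch in the arrow syntax that shipped in Java 14. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only; NonlocalControlFlow and positivesBeforeZero are made-up names.
class NonlocalControlFlow {
    // Collects positive values until the first zero, skipping negatives.
    static List<Integer> positivesBeforeZero(int[] values) {
        List<Integer> out = new ArrayList<>();
        for (int v : values) {
            // A statement switch may jump out of itself.
            switch (Integer.signum(v)) {
                case -1 -> { continue; }    // continue the enclosing loop
                case 0 -> { return out; }   // return from the enclosing method
                default -> out.add(v);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(positivesBeforeZero(new int[] {3, -1, 4, 0, 5})); // [3, 4]
    }
}
```

The same `continue` or `return` inside a switch used as an expression is rejected by the compiler, which matches the rule that an expression can complete abruptly only by throwing.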
> So essentially, rather than dividing the semantics along
> expression/statement lines, and then attempting to opportunistically
> heap a bunch of irrelevant features like "no fallthrough" onto the
> expression side "because they're cool" even though they have nothing to
> do with expression-ness, we instead divide the world structurally: the
> colon form gives you the old control flow, and the arrow form gives you
> the new.  And either can be used as a statement, or an expression.  And
> no one will be confused by mixing.
> 
> Orthogonality FTW.  No statement gets left behind.
> 
> ## Explaining it
> 
> Relative to UA1, we could describe this as adding back the blocks (it's
> not really a block expression) from the napkin model, supporting an
> arrow form of statement switches with blocks too, and then restricting
> switches to all-arrow or all-colon.  Then each quadrant is a restriction
> of this model.  But that's not how we'd teach it.
> 
> Relative to Java 10, we'd probably say:
> 
>  - Switch statements now come in a simpler (arrow) flavor, where there
> is no fallthrough, no weird scoping, and no need to say break most of
> the time.  Many switches can be rewritten this way, and this form can
> even be taught first.
>  - Switches can be used as either expressions or statements, with
> essentially identical syntax (some grammar differences, but this is
> mostly interesting only to spec writers).  If a switch is an expression,
> it should contain expressions; if a switch is a statement, it should
> contain statements.
>  - Expression switches have additional restrictions that are derived
> exclusively from their expression-ness: they must be total, and they can
> complete abruptly only by reason of throw.
>  - We allow a break-with-value statement in an expression switch as a
> means of explicitly providing the switch result; this can be combined
> with a statement block to allow for statements+break-expression.
> 
> The result is one switch construct, with modern and legacy flavors,
> which supports either expressions or statements.  You can immediately
> look at the middle of a switch and tell (by arrow vs colon) whether it
> has the legacy control flow or not.
> 