[switch] Further unification on switch

Mon Apr 23 18:48:37 UTC 2018

----- Original Message -----
> From: "Brian Goetz" <brian.goetz at oracle.com>
> To: "amber-spec-experts" <amber-spec-experts at openjdk.java.net>
> Sent: Thursday, April 19, 2018 22:44:45
> Subject: [switch] Further unification on switch

> We've been reviewing the work to date on switch expressions. Here's
> where we are, and here's a possible place we might move to, which I like
> a lot better than where we are now.
> 
> ## Goals
> 
> As a reminder, the primary goal here is _not_ switch
> expressions; switch expressions are supposed to just be an
> uncontroversial waypoint on the way to the real goal, which is a more
> expressive and flexible switch construct that works in a wider variety
> of situations, including supporting patterns, being less hostile to
> null, use as either an expression or a statement, etc.
> 
> And the reason we think that improving switch is the right primary goal
> is because a "do one of these based on ..." construct is _better_ than
> the corresponding chain of if-else-if, for multiple reasons:
> 
>  - Possibility for the compiler to do exhaustiveness analysis,
> potentially finding more bugs;
>  - Possibility for more efficient dispatch -- a switch could be O(1),
> whereas an if-else chain is almost certainly O(n);

   and we can also generate the dispatch code dynamically, which can be more efficient
   (for example, when the switch is only ever called with a few distinct values)

>  - More semantically transparent -- it's obvious the user is saying "do
> one of these, based on ...";
>  - Eliminates the need to repeat (and possibly get wrong) the switch
> target.
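
To make the last point concrete, here is a minimal sketch of the same dispatch written both ways. The class name, method names, and labels are hypothetical and chosen only for illustration; nothing here comes from the proposal itself.

```java
// Hypothetical example: names and values are illustrative only.
class DispatchSketch {
    // if-else chain: the target `day` must be repeated in every test,
    // so each repetition is a chance to name the wrong variable.
    static String viaIfElse(int day) {
        if (day == 1) {
            return "Monday";
        } else if (day == 2) {
            return "Tuesday";
        } else {
            return "other";
        }
    }

    // Classic colon-form switch: the target is named exactly once.
    static String viaSwitch(int day) {
        switch (day) {
            case 1: return "Monday";
            case 2: return "Tuesday";
            default: return "other";
        }
    }

    public static void main(String[] args) {
        for (int d = 0; d <= 3; d++) {
            System.out.println(d + ": " + viaIfElse(d) + " / " + viaSwitch(d));
        }
    }
}
```

Both methods return the same result for every input; the difference is only in how often the target has to be spelled out.
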
> 
> Switch does come with a lot of baggage (fallthrough by default,
> questionable scoping, need to explicitly break), and this baggage has
> produced the predictable distractions in the discussion -- a desire that
> we subordinate the primary goal (making switch more expressive) to the
> more contingent goal of "fixing" the legacy problems of switch.
> 
> These legacy problems of switch may be unfortunate, but to whatever
> degree we end up ameliorating these, this has to be purely a
> side-benefit -- it's not the primary goal, no matter how annoying
> people find them.  (The desire to "fix" the mistakes of the past is
> frequently a siren song, which is why we don't allow ourselves to take
> these as first-class requirements.)
> 
> #### What we're not going to do
> 
> The worst possible outcome (which is also the most commonly suggested
> "solution" in forums like reddit) would be to invent a new construct
> that is similar to, but not quite the same as switch (`snitch`), without
> being a 100% replacement for today's quirky switch.  Today's switch is
> surely suboptimal, but it's not so fatally flawed that it needs to be
> euthanized, and we don't want to create an "undead" language construct
> forever, which everyone will still have to learn, and keep track of the
> differences between `switch` and `snitch`.  No thank you.
> 
> That means we extend the existing switch statement, and increase
> flexibility by supporting an expression form, and to the degree needed,
> embrace its quirks.  ("No statement left behind.")
> 
> #### Where we started
> 
> In the first five minutes of working on this project, we sketched out
> the following (call it the "napkin sketch"), where an expression switch
> has case arms of the form:
> 
>    case L -> e;
> or
>    case L -> { statement*; break e; }
> 
> This was enough to get started, but of course the devil is in the details.
> 
> #### Where we are right now
> 
> We moved away from the napkin sketch for a few reasons, in part because
> it seemed to be drawing us down the road towards switch and snitch --
> which was further worrying as we still had yet to deal with the
> potential that pattern switch and constant switch might have differences
> as well.  We want a unified model of switch that deals well enough with
> all the cases -- expressions and statements, patterns and constants.
> 
> Our current model (call this Unification Attempt #1, or UA1 for short)
> is a step towards a unified model of switch, and this is a huge step
> forward.  In this model, there's one switch construct, and there's one
> set of control flow rules, including for break (like return, break takes
> a value in a value context and is void in a void context).
> 
> For convenience and safety, we then layered a shorthand atop
> value-bearing switches, which is to interpret
> 
>     case L -> e;
> 
> as
> 
>     case L: break e;
> 
> expecting the shorter form would be used almost all the time.  (This has
> a pleasing symmetry with the expression form of lambdas, and (at least
> for expression switches) alleviates two of the legacy pain points.
> Switch expressions have other things in common with lambdas too; they
> are the only expressions that can contain statements; they are the only
> expressions that interact with nonlocal control flow.)
> 
> This approach offers a lot of flexibility (some would say too much).
> You can write "remi-style" expression switches:
> 
>     int x = switch (y) {
>         case 1: break 2;
>         case 2: break 4;
>         default: break 8;
>     };
> 
> or you can write "new-style" expression switches:
> 
>     int x = switch (y) {
>         case 1 -> 2;
>         case 2 -> 4;
>         default -> 8;
>     };
> 
> Some people like the transparency of the first; others like the
> compactness and fallthrough-safety of the second.  And in cases where
> you mostly want the benefits of the second, but the real world conspires
> to make one or two cases difficult, you can mix them, and take full
> advantage of what "old switch" does -- with no new rules for control flow.
> 
> #### Complaints
> 
> There was the usual array of complaints over syntax -- many of which
> can be put down to "bleah, new is different, different is bad", but the
> most prominent one seems to be a generalized concern that other users
> (never us, of course, but we always fear for what others might do) won't
> be able to "handle" the power of mixed switches and will write terrible
> code, and then the world will burn.  (And, because the mixing comes with
> fallthrough, it further engenders the "you idiots, you fixed the wrong
> thing" reactions.) Personally, I think the fear of mixing is deeply
> overblown -- I think in most cases people will gravitate towards one of
> the two clean styles, and only mix where the complexity of the real
> world forces them to, but there's value in understanding the
> underpinnings of such reactions, even if in the end they'd turn out to
> be much hot air about nothing.
> 
> #### A real issue with mixing!
> 
> But, there is a real problem with our approach, which is: while a
> unified switch is the right goal, UA1 is not unified _enough_.
> Specifically, we haven't fully aligned the statement forms, and this
> conspires to reduce expressiveness and safety.  That is, in an
> expression switch you can say:
> 
>     case L -> e;
> 
> but in a statement switch you can't say
> 
>     case L -> s;
> 
> The reason for this is a purely accidental one: if we allowed this, then
> we _would_ likely find ourselves in the mixing hell that people are
> afraid of, which in turn would make the risk of accidental fallthrough
> _even worse_ than it is today.  So the failing of mixing is not that it
> will be abused, but that it constrains us from actually getting to a
> unified construct.

Good argument!

> 
> ## Closing the gap
> 
> So, let's take one more step towards unifying the two forms (call this
> UA2), rather than a step away from it.  Let's say that _all_ switches
> can support either old-style (colon) or new-style (arrow) case labels --
> but must stick to one kind of case label in a given switch:
> 
>     // statement switch
>     switch (x) {
>         case 1: println("one"); break;
>         case 2: println("two"); break;
>     }
> 
> or
> 
>     // also statement switch
>     switch (x) {
>         case 1 -> println("one");
>         case 2 -> println("two");
>     }
> 
> If a switch is a statement, the RHS is a statement, which can be a block
> statement:
> 
>     case L -> { a; b; }
> 
> We get there by first taking a step backwards, at least in terms of
> superficial syntax, to the syntax suggested by the napkin sketch, where
> if a switch is an expression, the RHS of an -> case is an expression or
> a block statement (in the latter case, it must complete abruptly by
> reason of either break-value or throw).  Just as we expected "break
> value" to be rare in expression switches under UA1 since developers will
> generally prefer the shorthand form where applicable, we expect it to be
> equally rare under UA2.
> 
> Then, as in UA1, we render unto expressions the things that belong to
> expressions; they must be total (an expression must yield a value or
> complete abruptly by reason of throwing.)
> 
> #### Look, accidental benefits!
> 
> Many of switch's failings (fallthrough, scoping) are not directly
> specified features, as much as emergent properties of the structure and
> control flow of switches.  Since by definition you can't fall out of an
> arrow case, then an all-arrow switch gives the fallthrough-haters what
> they want "for free", with no need to treat it specially. In fact, it's
> even better; in the all-arrow form, all of the things people hate about
> switch -- the need to say break, the risk of fallthrough, and the
> questionable scoping -- all go away.
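
A minimal sketch of the fallthrough and scoping difference, assuming the arrow form as it later shipped in Java 14 (so this needs Java 14 or newer to compile). The class and method names are hypothetical and only for illustration.

```java
// Hypothetical example; requires Java 14+ for the arrow form.
class FallthroughSketch {
    // Colon form: the missing break after case 1 falls through into case 2.
    static String colonForm(int x) {
        StringBuilder out = new StringBuilder();
        switch (x) {
            case 1:
                out.append("one");   // no break here, so control falls through
            case 2:
                out.append("two");
                break;
            default:
                out.append("many");
        }
        return out.toString();
    }

    // Arrow form: each arm runs on its own, with no break and no fallthrough.
    // Each block arm also has its own scope, so both arms can declare `msg`;
    // declaring `msg` twice directly under colon labels would not compile.
    static String arrowForm(int x) {
        StringBuilder out = new StringBuilder();
        switch (x) {
            case 1 -> { String msg = "one"; out.append(msg); }
            case 2 -> { String msg = "two"; out.append(msg); }
            default -> out.append("many");
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(colonForm(1)); // prints "onetwo"
        System.out.println(arrowForm(1)); // prints "one"
    }
}
```
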
> 
> #### Scorecard
> 
> There is one switch construct, which can be used as either an expression
> or a statement; when used as an expression, it acquires the
> characteristics of expressions (must be total, no nonlocal control flow
> out.)  Each can be expressed in one of two syntactic forms (arrow and
> colon.)  All forms will support patterns, null handling, and multiple
> labels per case.  The control flow and scoping rules are driven by
> structural properties of the chosen form.
> 
> The (statement, colon) case is the switch we have had since Java 1.0,
> enhanced as above (patterns, nulls, etc.)
> 
> The (statement, arrow) case can be considered a nice syntactic shorthand
> for the previous, which obviates the annoyance of "break", implicitly
> prevents fallthrough of all forms, and avoids the confusion of current
> switch scoping.  Many existing statement switches that are not
> expressions in disguise can be refactored to this.
> 
> The (expression, colon) form is a subset of UA1, where you just never
> say "arrow".
> 
> The (expression, arrow) case can again be considered a nice shorthand
> for the previous, again a subset of UA1, where you just never say
> "colon", and as a result, again don't have to think about fallthrough.
> 
> Totality is a property of expression switches, regardless of form,
> because they are expressions, and expressions must be total.
> 
> Fallthrough is a property of the colon-structured switches; there are no
> changes here.
> 
> Nonlocal control flow _out_ of a switch (continue to an enclosing loop,
> break with label, return) is a property of statement switches.
> 
> So essentially, rather than dividing the semantics along
> expression/statement lines, and then attempting to opportunistically
> heap a bunch of irrelevant features like "no fallthrough" onto the
> expression side "because they're cool" even though they have nothing to
> do with expression-ness, we instead divide the world structurally: the
> colon form gives you the old control flow, and the arrow form gives you
> the new.  And either can be used as a statement, or an expression.  And
> no one will be confused by mixing.
> 
> Orthogonality FTW.  No statement gets left behind.
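
The four quadrants of the scorecard can be sketched side by side. One assumption to flag loudly: the proposal spells the value-bearing exit `break e`, but in the Java that eventually shipped (Java 14) it is spelled `yield e`, so the sketch below uses `yield` in order to compile. The class and method names are hypothetical.

```java
// Hypothetical example; requires Java 14+. Uses `yield` where the
// proposal text writes `break e` (that is the spelling that shipped).
class QuadrantSketch {
    // (statement, colon): the legacy form, with explicit break.
    static String statementColon(int x) {
        String r;
        switch (x) {
            case 1: r = "one"; break;
            case 2: r = "two"; break;
            default: r = "many";
        }
        return r;
    }

    // (statement, arrow): same effect, no break, no fallthrough.
    static String statementArrow(int x) {
        String r;
        switch (x) {
            case 1 -> r = "one";
            case 2 -> r = "two";
            default -> r = "many";
        }
        return r;
    }

    // (expression, colon): every path must produce a value.
    static String expressionColon(int x) {
        return switch (x) {
            case 1: yield "one";
            case 2: yield "two";
            default: yield "many";
        };
    }

    // (expression, arrow): shorthand arms, plus one block arm that
    // runs statements and then yields its result.
    static String expressionArrow(int x) {
        return switch (x) {
            case 1 -> "one";
            case 2 -> { String s = "t" + "wo"; yield s; }
            default -> "many";
        };
    }

    public static void main(String[] args) {
        for (int x = 1; x <= 3; x++) {
            System.out.println(statementColon(x) + " " + statementArrow(x) + " "
                    + expressionColon(x) + " " + expressionArrow(x));
        }
    }
}
```

All four methods compute the same mapping; only the form (colon or arrow) and the context (statement or expression) change.
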
> 
> ## Explaining it
> 
> Relative to UA1, we could describe this as adding back the blocks (it's
> not really a block expression) from the napkin model, supporting an
> arrow form of statement switches with blocks too, and then restricting
> switches to all-arrow or all-colon.  Then each quadrant is a restriction
> of this model.  But that's not how we'd teach it.
> 
> Relative to Java 10, we'd probably say:
> 
>  - Switch statements now come in a simpler (arrow) flavor, where there
> is no fallthrough, no weird scoping, and no need to say break most of
> the time.  Many switches can be rewritten this way, and this form can
> even be taught first.
>  - Switches can be used as either expressions or statements, with
> essentially identical syntax (some grammar differences, but this is
> mostly interesting only to spec writers).  If a switch is an expression,
> it should contain expressions; if a switch is a statement, it should
> contain statements.
>  - Expression switches have additional restrictions that are derived
> exclusively from their expression-ness: totality, can only complete
> abruptly if by reason of throw.
>  - We allow a break-with-value statement in an expression switch as a
> means of explicitly providing the switch result; this can be combined
> with a statement block to allow for statements+break-expression.
> 
> The result is one switch construct, with modern and legacy flavors,
> which supports either expressions or statements.  You can immediately
> look at the middle of a switch and tell (by arrow vs colon) whether it
> has the legacy control flow or not.

I really like this proposal.
The main issue I see (and I've already said this) is that when you now see -> { ... } in Java code, it's not clear whether it opens a function scope or a block scope.
But given how far down the rabbit hole we went when we tried to stick with the colon syntax, and the fact that nobody among the conference attendees I've discussed this with seems to care, going full arrow, with the interesting spin of letting statement switches be refactored to use ->, is the way to go.

Rémi