[switch] Further unification on switch

Wed Apr 25 04:02:29 UTC 2018

I like this proposal and, in particular, I strongly support the "## Closing
the gap" section.
Enforcing a uniform style within each switch allows clean and intuitive
semantics for arrow switches, while giving old-style switches a
straightforward migration path that tools can assist.

-Dmitry

On Thu, Apr 19, 2018 at 1:44 PM, Brian Goetz <brian.goetz at oracle.com> wrote:

> We've been reviewing the work to date on switch expressions. Here's where
> we are, and here's a possible place we might move to, which I like a lot
> better than where we are now.
>
> ## Goals
>
> As a reminder, the primary goal here is _not_ switch
> expressions; switch expressions are supposed to just be an uncontroversial
> waypoint on the way to the real goal, which is a more expressive and
> flexible switch construct that works in a wider variety of situations,
> including supporting patterns, being less hostile to null, use as either an
> expression or a statement, etc.
>
> And the reason we think that improving switch is the right primary goal is
> because a "do one of these based on ..." construct is _better_ than the
> corresponding chain of if-else-if, for multiple reasons:
>
>  - Possibility for the compiler to do exhaustiveness analysis, potentially
> finding more bugs;
>  - Possibility for more efficient dispatch -- a switch could be O(1),
> whereas an if-else chain is almost certainly O(n);
>  - More semantically transparent -- it's obvious the user is saying "do
> one of these, based on ...";
>  - Eliminates the need to repeat (and possibly get wrong) the switch
> target.
>
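
To make the comparison with an if-else chain concrete, here is a minimal sketch, assuming Java 14 or later (where switch expressions are a standard feature). The `Color` enum and both method names are hypothetical, introduced only for illustration.

```java
public class DispatchSketch {
    // Hypothetical enum, used only for illustration.
    enum Color { RED, GREEN, BLUE }

    // if-else chain: the target `c` is repeated in every test, and the
    // compiler cannot tell whether every constant has been handled.
    static String describeWithIfElse(Color c) {
        if (c == Color.RED) {
            return "warm";
        } else if (c == Color.GREEN) {
            return "neutral";
        } else if (c == Color.BLUE) {
            return "cool";
        } else {
            throw new IllegalArgumentException("unexpected: " + c);
        }
    }

    // Switch expression over an enum: the target is named once, and the
    // compiler rejects the switch if a constant is left uncovered.
    static String describeWithSwitch(Color c) {
        return switch (c) {
            case RED -> "warm";
            case GREEN -> "neutral";
            case BLUE -> "cool";
        };
    }

    public static void main(String[] args) {
        for (Color c : Color.values()) {
            System.out.println(c + ": " + describeWithSwitch(c));
        }
    }
}
```

Deleting one of the three arms in `describeWithSwitch` turns into a compile-time error, whereas deleting a branch of the if-else chain only shows up at run time.
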
> Switch does come with a lot of baggage (fallthrough by default,
> questionable scoping, need to explicitly break), and this baggage has
> produced the predictable distractions in the discussion -- a desire that we
> subordinate the primary goal (making switch more expressive) to the more
> contingent goal of "fixing" the legacy problems of switch.
>
> These legacy problems of switch may be unfortunate, but to whatever degree
> we end up ameliorating these, this has to be purely a side-benefit -- it's
> not the primary goal, no matter how annoying people find them.  (The
> desire to "fix" the mistakes of the past is frequently a siren song, which
> is why we don't allow ourselves to take these as first-class requirements.)
>
> #### What we're not going to do
>
> The worst possible outcome (which is also the most commonly suggested
> "solution" in forums like reddit) would be to invent a new construct that
> is similar to, but not quite the same as switch (`snitch`), without being a
> 100% replacement for today's quirky switch.  Today's switch is surely
> suboptimal, but it's not so fatally flawed that it needs to be euthanized,
> and we don't want to create an "undead" language construct forever, which
> everyone will still have to learn, and keep track of the differences
> between `switch` and `snitch`.  No thank you.
>
> That means we extend the existing switch statement, and increase
> flexibility by supporting an expression form, and to the degree needed,
> embrace its quirks.  ("No statement left behind.")
>
> #### Where we started
>
> In the first five minutes of working on this project, we sketched out the
> following (call it the "napkin sketch"), where an expression switch has
> case arms of the form:
>
>    case L -> e;
> or
>    case L -> { statement*; break e; }
>
> This was enough to get started, but of course the devil is in the details.
>
> #### Where we are right now
>
> We moved away from the napkin sketch for a few reasons, in part because it
> seemed to be drawing us down the road towards switch and snitch -- which
> was further worrying as we still had yet to deal with the potential that
> pattern switch and constant switch might have differences as well.  We want
> a unified model of switch that deals well enough with all the cases --
> expressions and statements, patterns and constants.
>
> Our current model (call this Unification Attempt #1, or UA1 for short) is
> a step towards a unified model of switch, and this is a huge step forward.
> In this model, there's one switch construct, and there's one set of control
> flow rules, including for break (like return, break takes a value in a
> value context and is void in a void context).
>
> For convenience and safety, we then layered a shorthand atop value-bearing
> switches, which is to interpret
>
>     case L -> e;
>
> as
>
>     case L: break e;
>
> expecting the shorter form would be used almost all the time.  (This has a
> pleasing symmetry with the expression form of lambdas, and (at least for
> expression switches) alleviates two of the legacy pain points.  Switch
> expressions have other things in common with lambdas too; they are the only
> expressions that can contain statements, and the only ones that interact
> with nonlocal control flow.)
>
> This approach offers a lot of flexibility (some would say too much).  You
> can write "remi-style" expression switches:
>
>     int x = switch (y) {
>         case 1: break 2;
>         case 2: break 4;
>         default: break 8;
>     };
>
> or you can write "new-style" expression switches:
>
>     int x = switch (y) {
>         case 1 -> 2;
>         case 2 -> 4;
>         default -> 8;
>     };
>
> Some people like the transparency of the first; others like the
> compactness and fallthrough-safety of the second.  And in cases where you
> mostly want the benefits of the second, but the real world conspires to
> make one or two cases difficult, you can mix them, and take full advantage
> of what "old switch" does -- with no new rules for control flow.
>
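
For readers who want to compile the two styles: the `break e` spelling used in this proposal is not what eventually shipped. In Java 14 and later the value-bearing statement is spelled `yield e`. The sketch below assumes Java 14 or later and rewrites both examples on that basis; the class and method names are hypothetical.

```java
public class ExpressionStyles {
    // Colon form: each arm hands back its value explicitly. The proposal
    // spells this `break 2;`; shipped Java spells it `yield 2;`.
    static int colonStyle(int y) {
        return switch (y) {
            case 1: yield 2;
            case 2: yield 4;
            default: yield 8;
        };
    }

    // Arrow form: the expression to the right of `->` is the arm's value.
    static int arrowStyle(int y) {
        return switch (y) {
            case 1 -> 2;
            case 2 -> 4;
            default -> 8;
        };
    }

    public static void main(String[] args) {
        for (int y = 1; y <= 3; y++) {
            System.out.println(y + " -> " + colonStyle(y) + ", " + arrowStyle(y));
        }
    }
}
```

Both methods compute the same function. Combining `case 1 -> 2;` and `case 2: yield 4;` in a single switch does not compile in shipped Java, so the mixed style described above cannot be reproduced there.
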
> #### Complaints
>
> There was the usual array of complaints over syntax -- many of which can
> be put down to "bleah, new is different, different is bad", but the most
> prominent one seems to be a generalized concern that other users (never us,
> of course, but we always fear for what others might do) won't be able to
> "handle" the power of mixed switches and will write terrible code, and then
> the world will burn.  (And, because the mixing comes with fallthrough, it
> further engenders the "you idiots, you fixed the wrong thing" reactions.)
> Personally, I think the fear of mixing is deeply overblown -- I think in
> most cases people will gravitate towards one of the two clean styles, and
> only mix where the complexity of the real world forces them to, but there's
> value in understanding the underpinnings of such reactions, even if in the
> end they'd turn out to be much hot air about nothing.
>
> #### A real issue with mixing!
>
> But, there is a real problem with our approach, which is: while a unified
> switch is the right goal, UA1 is not unified _enough_. Specifically, we
> haven't fully aligned the statement forms, and this conspires to reduce
> expressiveness and safety.  That is, in an expression switch you can say:
>
>     case L -> e;
>
> but in a statement switch you can't say
>
>     case L -> s;
>
> The reason for this is a purely accidental one: if we allowed this, then
> we _would_ likely find ourselves in the mixing hell that people are afraid
> of, which in turn would make the risk of accidental fallthrough _even
> worse_ than it is today.  So the failing of mixing is not that it will be
> abused, but that it constrains us from actually getting to a unified
> construct.
>
> ## Closing the gap
>
> So, let's take one more step towards unifying the two forms (call this
> UA2), rather than a step away from it.  Let's say that _all_ switches can
> support either old-style (colon) or new-style (arrow) case labels -- but
> must stick to one kind of case label in a given switch:
>
>     // statement switch
>     switch (x) {
>         case 1: println("one"); break;
>         case 2: println("two"); break;
>     }
>
> or
>
>     // also statement switch
>     switch (x) {
>         case 1 -> println("one");
>         case 2 -> println("two");
>     }
>
> If a switch is a statement, the RHS is a statement, which can be a block
> statement:
>
>     case L -> { a; b; }
>
> We get there by first taking a step backwards, at least in terms of
> superficial syntax, to the syntax suggested by the napkin sketch, where if
> a switch is an expression, the RHS of an -> case is an expression or a
> block statement (in the latter case, it must complete abruptly by reason of
> either break-value or throw).  Just as we expected "break value" to be rare
> in expression switches under UA1 since developers will generally prefer the
> shorthand form where applicable, we expect it to be equally rare under UA2.
>
> Then, as in UA1, we render unto expressions the things that belong to
> expressions; they must be total (an expression must yield a value or
> complete abruptly by reason of throwing.)
>
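
A minimal sketch of the right-hand-side rules described above, assuming Java 14 or later, where the break-with-value of this proposal is spelled `yield`. The class, the method names, and the strings being switched on are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class ArrowBlocks {
    // Statement switch, arrow form: the right-hand side is a statement,
    // which may be a block. No default is needed, because a statement
    // switch over an int does not have to be total.
    static List<String> log(int x) {
        List<String> out = new ArrayList<>();
        switch (x) {
            case 1 -> out.add("one");
            case 2 -> {
                out.add("two");
                out.add("(even)");
            }
        }
        return out;
    }

    // Expression switch, arrow form: the right-hand side is an expression,
    // a block that completes by yielding a value, or a throw. Every input
    // must produce a value or an exception, hence the default arm.
    static int score(String s) {
        return switch (s) {
            case "low" -> 1;
            case "high" -> {
                int base = 10;
                yield base * 2;
            }
            default -> throw new IllegalArgumentException("unknown: " + s);
        };
    }

    public static void main(String[] args) {
        System.out.println(log(2));
        System.out.println(score("high"));
    }
}
```

Removing the `default` arm from `score` is a compile-time error, while `log` compiles without one. That is the totality requirement falling on the expression form only.
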
> #### Look, accidental benefits!
>
> Many of switch's failings (fallthrough, scoping) are not directly
> specified features, as much as emergent properties of the structure and
> control flow of switches.  Since by definition you can't fall out of an
> arrow case, an all-arrow switch gives the fallthrough-haters what they
> want "for free", with no need to treat it specially. In fact, it's even
> better; in the all-arrow form, all of the things people hate about switch
> -- the need to say break, the risk of fallthrough, and the questionable
> scoping -- all go away.
>
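
The difference in fallthrough and scoping can be seen in a small sketch, assuming Java 14 or later for the arrow form. The class and method names are hypothetical.

```java
public class FallthroughAndScope {
    // Colon form: a forgotten `break` lets control fall into the next arm.
    static String colonForm(int x) {
        StringBuilder sb = new StringBuilder();
        switch (x) {
            case 1:
                sb.append("one");
                // no break here, so case 1 also runs the case 2 statements
            case 2:
                sb.append("two");
                break;
            default:
                sb.append("other");
        }
        return sb.toString();
    }

    // Arrow form: exactly one arm runs, and a local declared in one arm's
    // block is not visible in any other arm.
    static String arrowForm(int x) {
        StringBuilder sb = new StringBuilder();
        switch (x) {
            case 1 -> {
                String label = "one";
                sb.append(label);
            }
            case 2 -> {
                String label = "two";   // independent of the `label` above
                sb.append(label);
            }
            default -> sb.append("other");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(colonForm(1));   // prints "onetwo"
        System.out.println(arrowForm(1));   // prints "one"
    }
}
```

In the colon form the whole switch body is a single scope, so declaring `label` under both `case 1:` and `case 2:` would be rejected as a duplicate variable. The arrow form sidesteps this because each block is its own scope.
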
> #### Scorecard
>
> There is one switch construct, which can be used as either an expression or
> a statement; when used as an expression, it acquires the characteristics of
> expressions (must be total, no nonlocal control flow out.)  Each can be
> expressed in one of two syntactic forms (arrow and colon.)  All forms will
> support patterns, null handling, and multiple labels per case.  The control
> flow and scoping rules are driven by structural properties of the chosen
> form.
>
> The (statement, colon) case is the switch we have had since Java 1.0, enhanced
> as above (patterns, nulls, etc.)
>
> The (statement, arrow) case can be considered a nice syntactic shorthand
> for the previous, which obviates the annoyance of "break", implicitly
> prevents fallthrough of all forms, and avoids the confusion of current
> switch scoping.  Many existing statement switches that are not expressions
> in disguise can be refactored to this.
>
> The (expression, colon) form is a subset of UA1, where you just never say
> "arrow".
>
> The (expression, arrow) case can again be considered a nice shorthand for
> the previous, again a subset of UA1, where you just never say "colon", and
> as a result, again don't have to think about fallthrough.
>
> Totality is a property of expression switches, regardless of form, because
> they are expressions, and expressions must be total.
>
> Fallthrough is a property of the colon-structured switches; there are no
> changes here.
>
> Nonlocal control flow _out_ of a switch (continue to an enclosing loop,
> break with label, return) is a property of statement switches.
>
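
A minimal sketch of nonlocal control flow out of a statement switch, assuming Java 14 or later for the arrow form. The method name and the input data are hypothetical.

```java
public class NonlocalFlow {
    // Statement switch inside a loop: an arm may `continue` the enclosing
    // loop or `return` from the enclosing method.
    static int sumUntilNegative(int[] values) {
        int sum = 0;
        for (int v : values) {
            switch (Integer.signum(v)) {
                case 0 -> { continue; }      // skip zeros
                case -1 -> { return sum; }   // stop at the first negative
                default -> sum += v;         // positive values accumulate
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumUntilNegative(new int[] {1, 0, 2, -5, 10}));   // prints 3
    }
}
```

Writing the same arms inside a switch used as an expression is rejected by the compiler: an expression switch can only produce a value or throw.
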
> So essentially, rather than dividing the semantics along
> expression/statement lines, and then attempting to opportunistically heap a
> bunch of irrelevant features like "no fallthrough" onto the expression side
> "because they're cool" even though they have nothing to do with
> expression-ness, we instead divide the world structurally: the colon form
> gives you the old control flow, and the arrow form gives you the new.  And
> either can be used as a statement, or an expression.  And no one will be
> confused by mixing.
>
> Orthogonality FTW.  No statement gets left behind.
>
> ## Explaining it
>
> Relative to UA1, we could describe this as adding back the blocks (it's not
> really a block expression) from the napkin model, supporting an arrow form
> of statement switches with blocks too, and then restricting switches to
> all-arrow or all-colon.  Then each quadrant is a restriction of this
> model.  But that's not how we'd teach it.
>
> Relative to Java 10, we'd probably say:
>
>  - Switch statements now come in a simpler (arrow) flavor, where there is
> no fallthrough, no weird scoping, and no need to say break most of the
> time.  Many switches can be rewritten this way, and this form can even be
> taught first.
>  - Switches can be used as either expressions or statements, with
> essentially identical syntax (some grammar differences, but this is mostly
> interesting only to spec writers).  If a switch is an expression, it should
> contain expressions; if a switch is a statement, it should contain
> statements.
>  - Expression switches have additional restrictions that are derived
> exclusively from their expression-ness: totality, can only complete
> abruptly if by reason of throw.
>  - We allow a break-with-value statement in an expression switch as a
> means of explicitly providing the switch result; this can be combined with
> a statement block to allow for statements+break-expression.
>
> The result is one switch construct, with modern and legacy flavors, which
> supports either expressions or statements.  You can immediately look at the
> middle of a switch and tell (by arrow vs colon) whether it has the legacy
> control flow or not.
>
>
>