From dl at cs.oswego.edu  Sun Apr 1 11:59:07 2018
From: dl at cs.oswego.edu (Doug Lea)
Date: Sun, 1 Apr 2018 07:59:07 -0400
Subject: Expression switch exception naming
In-Reply-To: <1705576534.2315182.1522498466503.JavaMail.zimbra@u-pem.fr>
References: <0CB0D6F1-83AF-4C91-8A86-77BB8201DF67@oracle.com>
 <2a83d8fb-df1d-a523-3399-44e1c4b5b891@oracle.com>

 <1705576534.2315182.1522498466503.JavaMail.zimbra@u-pem.fr>
Message-ID:

On Sat, March 31, 2018 8:14 am, Remi Forax wrote:
> An enum class is always sealed, there is a fixed number of constants
> values, and there is also a fixed number of subtypes,
> otherwise values() is not correctly implemented.

Right. My point was that allowing "final" might allow programmers to
distinguish the cases under question, rather than forcing a new
arbitrary rule about switch and enum being in the same module, or
whatever.

-Doug

Is non-exhaustiveness an Error or not?

>
> Rémi
>
> ----- Original Message -----
>> From: "Doug Lea"

>> To: "Brian Goetz"
>> Cc: "amber-spec-experts"
>> Sent: Saturday, March 31, 2018 13:56:44
>> Subject: Re: Expression switch exception naming
>
>> On Fri, March 30, 2018 1:48 pm, Brian Goetz wrote:
>>
>>>
>>> So an alternative here is to tweak the language so that the "conclude
>>> exhaustiveness if all enum constants are present" behavior should be
>>> reserved for the cases where the switch and the enum are in the same
>>> module?
>>>
>>
>> I might have missed discussion of this, but has anyone considered
>> the alternative of finally allowing "final" on an enum class? In
>> this case, several sets of simpler alternatives would be possible.
>>
>> -Doug
>

From brian.goetz at oracle.com  Tue Apr 3 16:36:43 2018
From: brian.goetz at oracle.com (Brian Goetz)
Date: Tue, 3 Apr 2018 12:36:43 -0400
Subject: Compile-time type hierarchy information in pattern switch
Message-ID: <2a815079-881a-4a79-592e-7f86a90cae88@oracle.com>

Along the lines of the previous discussion about separate compilation
skew with enums ... I'm trying to find the right place to draw the line
with respect to post-compilation class hierarchy changes.

Recall that we can impose a _dominance ordering_ on patterns; pattern P
dominates Q if everything that is matched by Q also is matched by P.
We already use this today, in catch blocks, to reject programs with dead
code; you can't say `catch Exception` before `catch IOException`,
because the latter block would be dead. We want to do the same with
patterns, so:

    case String x: ...
    case Object x: ...

is OK but

    case Object x: ...
    case String x: ...

is rejected at compile time.

Separately, we'd like for pattern matching to be efficient; the
definition of "inefficient" would be for pattern matching to be
inherently O(n), when we can frequently do much better. There's plenty
of literature on compiling patterns to decision trees, but none of them
address the problem we have to: separate compilation.
So any decision
tree computed at compile time might be wrong in undesirable ways by
runtime. We could also compute a decision tree at runtime using indy;
while this is our intent, the devil is in the details. We don't want
computing the tree to be too expensive, nor do we want to have to
capture O(n^2) compile-time constraints to be validated at runtime. So
I'd like to focus on what changes we're willing to accept between
compilation and runtime, what our expectations would be for those changes.

We've already discussed one of these: novel values in enum / sealed type
switches, and for them, the answer is throwing some sort of exception.
Another that we dealt with long ago is changing enum ordinals; we
decided at the time that we're willing for this to be a BC change, so we
generate extra code that uses the as-runtime ordinals rather than the
as-compile-time ordinals when lowering the switch into an integer
switch. (If we weren't willing to tolerate such changes, we'd have a
simpler translation: just lower an enum switch to a switch on its
ordinal.)

Here's one that I suspect we're not expecting to recover terribly well
from: hierarchy inversion. Suppose at compile time A <: B. So the
following is a sensible switch body:

    case String: println("String"); break;
    case Object: println("Object"); break;

Now, imagine that by runtime, String no longer extends Object, but
instead Object absurdly extends String. Do we still expect the above to
print String for all Strings, and Object for everything else? Or is the
latter arm now dead at runtime, even though it wouldn't compile after
the change? Or is this now UB, because it would no longer compile?

A more realistic example of a hierarchy change is introducing an
interface. If we have:

    interface I { }
    class C { }

and a switch

    case I: ...
    case C: ...

and later, we make C implement I, we have a similar situation; the
switch would no longer compile.
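The ordinal-tolerant enum translation mentioned above can be illustrated in plain Java. The sketch below is modeled loosely on the synthetic `$SwitchMap$` lookup arrays that javac generates for enum switches, but it is an illustrative analogue under that assumption, not javac's actual output; `EnumSwitchMapSketch`, `SWITCH_MAP` and `describe` are hypothetical names. The table is indexed by each constant's ordinal as seen at run time and holds the case number fixed at compile time, so reordering the enum's constants after the switch was compiled does not change which arm runs.

```java
class EnumSwitchMapSketch {

    enum Color { RED, GREEN, BLUE }

    // Indexed by a constant's ordinal as seen at run time; holds the case number
    // fixed at compile time (0 means "no case label", i.e. the default arm).
    private static final int[] SWITCH_MAP = buildSwitchMap();

    private static int[] buildSwitchMap() {
        int[] map = new int[Color.values().length];
        // Case labels in source order, identified by name rather than by ordinal.
        String[] caseLabels = { "RED", "BLUE" };
        for (int i = 0; i < caseLabels.length; i++) {
            try {
                map[Color.valueOf(caseLabels[i]).ordinal()] = i + 1;
            } catch (IllegalArgumentException constantRemoved) {
                // The constant no longer exists at run time; its arm becomes unreachable.
            }
        }
        return map;
    }

    // Lowered form of: switch (c) { case RED: ...; case BLUE: ...; default: ... }
    static String describe(Color c) {
        switch (SWITCH_MAP[c.ordinal()]) {
            case 1:  return "warm";
            case 2:  return "cool";
            default: return "other";
        }
    }

    public static void main(String[] args) {
        for (Color c : Color.values()) {
            System.out.println(c + " -> " + describe(c));
        }
    }
}
```

A plain `switch (c.ordinal())` would avoid the table, but it would silently pick the wrong arm if the constants were reordered between compilation and run time, which is exactly the trade-off described in the message.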
Are we allowed to make optimizations based on the compile-time knowledge
that C nonfinal, etc.)

From mark at io7m.com  Wed Apr 4 17:01:05 2018
From: mark at io7m.com (Mark Raynsford)
Date: Wed, 4 Apr 2018 17:01:05 +0000
Subject: Compile-time type hierarchy information in pattern switch
In-Reply-To: <2a815079-881a-4a79-592e-7f86a90cae88@oracle.com>
References: <2a815079-881a-4a79-592e-7f86a90cae88@oracle.com>
Message-ID: <20180404170105.5486a1c7@copperhead.int.arc7.info>

On 2018-04-03T12:36:43 -0400
Brian Goetz wrote:
>
> Here's one that I suspect we're not expecting to recover terribly well
> from: hierarchy inversion. Suppose at compile time A <: B. So the
> following is a sensible switch body:
>
>     case String: println("String"); break;
>     case Object: println("Object"); break;
>
> Now, imagine that by runtime, String no longer extends Object, but
> instead Object absurdly extends String. Do we still expect the above to
> print String for all Strings, and Object for everything else? Or is the
> latter arm now dead at runtime, even though it wouldn't compile after
> the change? Or is this now UB, because it would no longer compile?

I'm still giving thought to everything you've written, but I am
wondering: How feasible is it to get the above to fail early with an
informative exception/Error?
-- 
Mark Raynsford | http://www.io7m.com

From brian.goetz at oracle.com  Wed Apr 4 17:07:17 2018
From: brian.goetz at oracle.com (Brian Goetz)
Date: Wed, 4 Apr 2018 13:07:17 -0400
Subject: Compile-time type hierarchy information in pattern switch
In-Reply-To: <20180404170105.5486a1c7@copperhead.int.arc7.info>
References: <2a815079-881a-4a79-592e-7f86a90cae88@oracle.com>
 <20180404170105.5486a1c7@copperhead.int.arc7.info>
Message-ID: <0e38a01f-b9bb-7539-5e0e-1df02f33d69f@oracle.com>

The intended implementation strategy is to lower complex switches to
densely-numbered `int` switches, and then invoke a classifier function
that takes a target and returns the int corresponding to the lowered
case number. The classifier function will be an `invokedynamic`, whose
static bootstrap will contain a summary of the patterns. (We've already
done this for switches on strings, enums, longs, non-dense ints, etc.)

To deliver an early error, that means that (a) the compiler must encode
through the static argument list all the assumptions it needs verified
at runtime (e.g., `String <: Object`), and (b) at linkage time (the
first time the switch is executed), those have to be tested.

Doing so is plenty easy, but there's a startup cost, which could be as
bad as _O(n^2)_, if I have to validate that no two case labels are
ordered inconsistently with subtyping.

A possible mitigation is to do the check as a system assertion, which
only gets run if we are run with `-esa`; we then might still have some
static code bloat (depending on how we encode the assumptions), but at
least skip the dynamic check most of the time.

On 4/4/2018 1:01 PM, Mark Raynsford wrote:
> I'm still giving thought to everything you've written, but I am
> wondering: How feasible is it to get the above to fail early with an
> informative exception/Error?
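Under the strategy Brian describes, the pieces fit together roughly as follows. This is a plain-Java sketch, not the actual `invokedynamic` machinery: the hypothetical `link` method stands in for the bootstrap, its `List<Class<?>>` argument stands in for the static bootstrap arguments, and it performs the O(n^2) pairwise check of step (b) before handing back a classifier whose result drives an ordinary `int` switch. Only simple type patterns are modeled (no guards, no nested patterns), and throwing `IncompatibleClassChangeError` is just one plausible choice of error.

```java
import java.util.List;
import java.util.function.ToIntFunction;

class TypeSwitchLinkage {

    // Hypothetical stand-in for a bootstrap method: validates the compile-time
    // ordering against the run-time hierarchy, then returns a classifier.
    static ToIntFunction<Object> link(List<Class<?>> caseTypes) {
        // Step (b): no later case may be a subtype of an earlier one (O(n^2) pairs).
        for (int later = 1; later < caseTypes.size(); later++) {
            for (int earlier = 0; earlier < later; earlier++) {
                if (caseTypes.get(earlier).isAssignableFrom(caseTypes.get(later))) {
                    throw new IncompatibleClassChangeError(caseTypes.get(later).getName()
                            + " is now a subtype of earlier case " + caseTypes.get(earlier).getName());
                }
            }
        }
        List<Class<?>> types = List.copyOf(caseTypes);
        // Classifier: -1 for null, else the index of the first matching case,
        // or types.size() when no case matches (the default arm).
        return target -> {
            if (target == null) return -1;
            for (int i = 0; i < types.size(); i++) {
                if (types.get(i).isInstance(target)) return i;
            }
            return types.size();
        };
    }

    // Linked once, as an indy call site would be on first execution.
    private static final ToIntFunction<Object> CLASSIFIER =
            link(List.of(String.class, Integer.class, Number.class));

    // Lowered form of a switch with arms: String s, Integer i, Number n, default.
    static String describe(Object o) {
        switch (CLASSIFIER.applyAsInt(o)) {
            case -1: return "null";
            case 0:  return "String of length " + ((String) o).length();
            case 1:  return "Integer " + o;
            case 2:  return "other Number " + o;
            default: return "something else";
        }
    }

    public static void main(String[] args) {
        for (Object o : new Object[] { "abc", 42, 2.5, null, new Object() }) {
            System.out.println(describe(o));
        }
    }
}
```

The classifier here is deliberately the naive first-match loop, so it is O(n) per call; the point of validating the ordering up front is that a smarter classifier, such as a decision tree built from the compile-time view of the hierarchy, could then rely on that view still holding.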
From peter.levart at gmail.com  Thu Apr 5 14:40:28 2018
From: peter.levart at gmail.com (Peter Levart)
Date: Thu, 5 Apr 2018 16:40:28 +0200
Subject: Compile-time type hierarchy information in pattern switch
In-Reply-To: <0e38a01f-b9bb-7539-5e0e-1df02f33d69f@oracle.com>
References: <2a815079-881a-4a79-592e-7f86a90cae88@oracle.com>
 <20180404170105.5486a1c7@copperhead.int.arc7.info>
 <0e38a01f-b9bb-7539-5e0e-1df02f33d69f@oracle.com>
Message-ID: <67674e17-1e86-252f-72f2-9dd6ab78c03e@gmail.com>

Hi,

On 04/04/2018 07:07 PM, Brian Goetz wrote:
> The intended implementation strategy is to lower complex switches to
> densely-numbered `int` switches, and then invoke a classifier function
> that takes a target and returns the int corresponding to the lowered
> case number. The classifier function will be an `invokedynamic`,
> whose static bootstrap will contain a summary of the patterns. (We've
> already done this for switches on strings, enums, longs, non-dense
> ints, etc.)
>
> To deliver an early error, that means that (a) the compiler must
> encode through the static argument list all the assumptions it needs
> verified at runtime (e.g., `String <: Object`), and (b) at linkage
> time (the first time the switch is executed), those have to be tested.
>
> Doing so is plenty easy, but there's a startup cost, which could be as
> bad as _O(n^2)_, if I have to validate that no two case labels are
> ordered inconsistently with subtyping.

Not necessarily. O(n log n) at worst for stable-sorting n cases which,
if already sorted at compile time (i.e. no subtype changes between
compile and link time), are re-sorted using just n-1 comparisons.
That's if you want to "fix" the order of cases at link-time in order
to compute optimal dispatch logic. If you only want to verify and
bail-out if they are not sorted already (i.e. you only accept changes
in type hierarchy that don't change order of cases), you always need
just n-1 comparisons.
The question is whether you only want to re-order / check-order
according to type hierarchy or also according to other aspects of
"dominance", for example:

    case Point p where (p.x >= 0 && p.y >= 0): ...
    case Point p where (p.x >= 0): ...

Other aspects of dominance usually don't change between compile and
link time, so stable-sorting cases could take just the type hierarchy
into account, unless you also allow type-hierarchy-based conditions in
where patterns, for example:

    case Holder h where (h.value instanceof TypeA): ...
    case Holder h where (h.value instanceof TypeB): ...

Another problem with re-ordering cases at link time is when you support
fall-through. What are fall-through(s) in a switch with re-ordered
cases? For example:

    interface A {}
    interface B extends A {}

    switch (x) {
        case B b:
            ...
            // fall-through...
        case A a:
            A ab = ... ? a : b;
            ...

What happens when you remove A from the supertypes of B in separately
compiled code:

    interface A {}
    interface B {}

Perhaps there's no need to worry about this, as the verifier would
already catch such invalid code during runtime. So fall-through(s) could
just stay the same even if cases are virtually reordered for the purpose
of computing dispatch logic. The fall-through logic could sometimes
survive changes in the type hierarchy unnoticed by the verifier, but
would give questionable results when executed. But that could be said
for any logic, not necessarily concerned with switch statements.

Here's an experiment I played with that clearly separates the
compile-time, link-time and run-time parts of the logic and is just an
API.
You can even simulate the effects of adding subtype relationship(s)
between compile time of the switch and link time:

http://cr.openjdk.java.net/~plevart/misc/TypeSwitch/TypeSwitch.java

Regards, Peter

>
> A possible mitigation is to do the check as a system assertion, which
> only gets run if we are run with `-esa`; we then might still have some
> static code bloat (depending on how we encode the assumptions), but at
> least skip the dynamic check most of the time.
>
> On 4/4/2018 1:01 PM, Mark Raynsford wrote:
>> I'm still giving thought to everything you've written, but I am
>> wondering: How feasible is it to get the above to fail early with an
>> informative exception/Error?
>

From forax at univ-mlv.fr  Thu Apr 5 15:21:52 2018
From: forax at univ-mlv.fr (Remi Forax)
Date: Thu, 5 Apr 2018 17:21:52 +0200 (CEST)
Subject: Compile-time type hierarchy information in pattern switch
In-Reply-To: <0e38a01f-b9bb-7539-5e0e-1df02f33d69f@oracle.com>
References: <2a815079-881a-4a79-592e-7f86a90cae88@oracle.com>
 <20180404170105.5486a1c7@copperhead.int.arc7.info>
 <0e38a01f-b9bb-7539-5e0e-1df02f33d69f@oracle.com>
Message-ID: <1510056942.420626.1522941712175.JavaMail.zimbra@u-pem.fr>

----- Original Message -----
> From: "Brian Goetz"
> To: "mark"
> Cc: "amber-spec-experts"
> Sent: Wednesday, April 4, 2018 19:07:17
> Subject: Re: Compile-time type hierarchy information in pattern switch

> The intended implementation strategy is to lower complex switches to
> densely-numbered `int` switches, and then invoke a classifier function
> that takes a target and returns the int corresponding to the lowered
> case number. The classifier function will be an `invokedynamic`, whose
> static bootstrap will contain a summary of the patterns. (We've already
> done this for switches on strings, enums, longs, non-dense ints, etc.)
>
> To deliver an early error, that means that (a) the compiler must encode
> through the static argument list all the assumptions it needs verified
> at runtime (e.g., `String <: Object`), and (b) at linkage time (the
> first time the switch is executed), those have to be tested.
>
> Doing so is plenty easy, but there's a startup cost, which could be as
> bad as _O(n^2)_, if I have to validate that no two case labels are
> ordered inconsistently with subtyping.
>
> A possible mitigation is to do the check as a system assertion, which
> only gets run if we are run with `-esa`; we then might still have some
> static code bloat (depending on how we encode the assumptions), but at
> least skip the dynamic check most of the time.

Or we can not try to do any check at runtime that validates the view of
the world at compile time.
Currently, there is no check that verifies that the catch clauses are in
the right order or that a cascade of if-instanceofs means the same thing
at compile time and at runtime.

My opinion, we should just run the code that was compiled, even if the
world has changed between the compilation and the execution.

Rémi

>
> On 4/4/2018 1:01 PM, Mark Raynsford wrote:
>> I'm still giving thought to everything you've written, but I am
>> wondering: How feasible is it to get the above to fail early with an
>> informative exception/Error?

From brian.goetz at oracle.com  Thu Apr 5 15:25:36 2018
From: brian.goetz at oracle.com (Brian Goetz)
Date: Thu, 5 Apr 2018 11:25:36 -0400
Subject: Compile-time type hierarchy information in pattern switch
In-Reply-To: <1510056942.420626.1522941712175.JavaMail.zimbra@u-pem.fr>
References: <2a815079-881a-4a79-592e-7f86a90cae88@oracle.com>
 <20180404170105.5486a1c7@copperhead.int.arc7.info>
 <0e38a01f-b9bb-7539-5e0e-1df02f33d69f@oracle.com>
 <1510056942.420626.1522941712175.JavaMail.zimbra@u-pem.fr>
Message-ID: <9bbda7e1-62be-e737-e277-c437da3c241b@oracle.com>

Yes, this is surely an option.
But it doesn't answer the underlying question -- if the hierarchy
changes in various ways between compile and runtime, what behavior can
the user count on, and what changes yield "undefined" behavior?

While it's easy to say "you should do what the code says", taking that
too far ties our hands behind our back, and makes switches that should
be O(1) into O(n).

On 4/5/2018 11:21 AM, Remi Forax wrote:
> Or we can not try to do any check at runtime that validates the view of the world at compile time.
> Currently, there is no check that verifies that the catch clauses are in the right order or that a cascade of if-instanceofs means the same thing at compile time and at runtime.
>
> My opinion, we should just run the code that was compiled, even if the world has changed between the compilation and the execution.

From forax at univ-mlv.fr  Thu Apr 5 15:40:59 2018
From: forax at univ-mlv.fr (forax at univ-mlv.fr)
Date: Thu, 5 Apr 2018 17:40:59 +0200 (CEST)
Subject: Compile-time type hierarchy information in pattern switch
In-Reply-To: <9bbda7e1-62be-e737-e277-c437da3c241b@oracle.com>
References: <2a815079-881a-4a79-592e-7f86a90cae88@oracle.com>
 <20180404170105.5486a1c7@copperhead.int.arc7.info>
 <0e38a01f-b9bb-7539-5e0e-1df02f33d69f@oracle.com>
 <1510056942.420626.1522941712175.JavaMail.zimbra@u-pem.fr>
 <9bbda7e1-62be-e737-e277-c437da3c241b@oracle.com>
Message-ID: <11186599.458653.1522942859001.JavaMail.zimbra@u-pem.fr>

----- Original Message -----
> From: "Brian Goetz"
> To: "Remi Forax"
> Cc: "mark" , "amber-spec-experts"
> Sent: Thursday, April 5, 2018 17:25:36
> Subject: Re: Compile-time type hierarchy information in pattern switch

> Yes, this is surely an option.
>
> But it doesn't answer the underlying question -- if the hierarchy
> changes in various ways between compile and runtime, what behavior can
> the user count on, and what changes yield "undefined" behavior?

no, it's not undefined, at least not an "undefined behavior" as in C.
At runtime, the code executed will be the one compiled. A hierarchy
change is not a backward-compatible change, so one can expect surprises,
but not something undefined.

>
> While it's easy to say "you should do what the code says", taking that
> too far ties our hands behind our back, and makes switches that
> should be O(1) into O(n).

???, not sure I understand.
If we record which case was executed for a given class in a hashmap and
use it as a cache, it will always be O(1) for all subsequent calls with
the same class.

Rémi

>
> On 4/5/2018 11:21 AM, Remi Forax wrote:
>> Or we can not try to do any check at runtime that validates the view of
>> the world at compile time.
>> Currently, there is no check that verifies that the catch clauses are in
>> the right order or that a cascade of if-instanceofs means the same thing
>> at compile time and at runtime.
>>
>> My opinion, we should just run the code that was compiled, even if the
>> world has changed between the compilation and the execution.

From amaembo at gmail.com  Thu Apr 5 19:41:16 2018
From: amaembo at gmail.com (Tagir Valeev)
Date: Thu, 5 Apr 2018 22:41:16 +0300
Subject: Compile-time type hierarchy information in pattern switch
In-Reply-To: <11186599.458653.1522942859001.JavaMail.zimbra@u-pem.fr>
References: <2a815079-881a-4a79-592e-7f86a90cae88@oracle.com>
 <20180404170105.5486a1c7@copperhead.int.arc7.info>
 <0e38a01f-b9bb-7539-5e0e-1df02f33d69f@oracle.com>
 <1510056942.420626.1522941712175.JavaMail.zimbra@u-pem.fr>
 <9bbda7e1-62be-e737-e277-c437da3c241b@oracle.com>
 <11186599.458653.1522942859001.JavaMail.zimbra@u-pem.fr>
Message-ID:

Hello!

Is it too harsh to reject the whole class if the assumptions on class
hierarchy which were necessary to compile the switch statements used in
the class are not valid at runtime? E.g.
the compiler may gather all the assumptions across all the
pattern-matching switches within the class and add some instructions to
the which check these assumptions at once (probably calling some
validation method which receives the expected hierarchy in some packed
way)? This way the fail-fast behavior will be guaranteed (the class
refuses to initialize), and while some expensive runtime checks are to
be made during class initialization, in the case of several pattern
switches in the same class the number of checks will be reduced
(although they will still be performed even if no such switch is
actually executed).

With best regards,
Tagir Valeev.

On Thu, Apr 5, 2018 at 6:40 PM, wrote:

> ----- Original Message -----
> > From: "Brian Goetz"
> > To: "Remi Forax"
> > Cc: "mark" , "amber-spec-experts" <
> amber-spec-experts at openjdk.java.net>
> > Sent: Thursday, April 5, 2018 17:25:36
> > Subject: Re: Compile-time type hierarchy information in pattern switch
>
> > Yes, this is surely an option.
> >
> > But it doesn't answer the underlying question -- if the hierarchy
> > changes in various ways between compile and runtime, what behavior can
> > the user count on, and what changes yield "undefined" behavior?
>
> no, it's not undefined, at least not an "undefined behavior" as in C.
> At runtime, the code executed will be the one compiled. A hierarchy
> change is not a backward-compatible change, so one can expect surprises,
> but not something undefined.
>
> >
> > While it's easy to say "you should do what the code says", taking that
> > too far ties our hands behind our back, and makes switches that
> > should be O(1) into O(n).
>
> ???, not sure I understand.
> If we record which case was executed for a given class in a hashmap and
> use it as a cache, it will always be O(1) for all subsequent calls with
> the same class.
>
> Rémi
>
> >
> > On 4/5/2018 11:21 AM, Remi Forax wrote:
> >> Or we can not try to do any check at runtime that validates the view of
> >> the world at compile time.
> >> Currently, there is no check that verifies that the catch clauses are in
> >> the right order or that a cascade of if-instanceofs means the same thing
> >> at compile time and at runtime.
> >>
> >> My opinion, we should just run the code that was compiled, even if the
> >> world has changed between the compilation and the execution.
>

From brian.goetz at oracle.com  Thu Apr 5 19:42:12 2018
From: brian.goetz at oracle.com (Brian Goetz)
Date: Thu, 5 Apr 2018 15:42:12 -0400
Subject: Compile-time type hierarchy information in pattern switch
In-Reply-To: <67674e17-1e86-252f-72f2-9dd6ab78c03e@gmail.com>
References: <2a815079-881a-4a79-592e-7f86a90cae88@oracle.com>
 <20180404170105.5486a1c7@copperhead.int.arc7.info>
 <0e38a01f-b9bb-7539-5e0e-1df02f33d69f@oracle.com>
 <67674e17-1e86-252f-72f2-9dd6ab78c03e@gmail.com>
Message-ID:

> That's if you want to "fix" the order of cases at link-time in order
> to compute optimal dispatch logic. If you only want to verify and
> bail-out if they are not sorted already (i.e. you only accept changes
> in type hierarchy that don't change order of cases), you always need
> just n-1 comparisons.

Perhaps I'm dense, but I don't see this. Suppose I have completely
unrelated interfaces I, J, K, and L. The user says:

    case I:
    case J:
    case K:
    case L:

which is fine because they're unordered. At runtime, any of the
following type relations could have been injected:

    J <: I, K <: I, L <: I
    K <: J, L <: J
    L <: K

and these would cause the switch to be misordered (and would have been
rejected at compile time.)

How am I to detect any of these with just three comparisons? If I pick
the obvious n-1 (compare each to their neighbor) I wouldn't detect any
of { L <: J, K <: I, L <: I }.
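Brian's counterexample can be checked mechanically. Since real JVM classes cannot gain supertypes while a program runs, the sketch below models the run-time hierarchy as a plain map from a type name to its direct supertypes (a hypothetical stand-in; `NeighborCheckCounterexample` and its methods are illustrative names), injects `L <: J`, and compares the n-1 neighbor test with the all-pairs test.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

class NeighborCheckCounterexample {

    // Toy hierarchy: type name -> names of its direct supertypes.
    static boolean isSubtype(Map<String, Set<String>> supers, String sub, String sup) {
        if (sub.equals(sup)) return true;
        for (String s : supers.getOrDefault(sub, Set.of())) {
            if (isSubtype(supers, s, sup)) return true;
        }
        return false;
    }

    // n-1 tests: only adjacent case labels are compared.
    static boolean neighborsOk(List<String> cases, Map<String, Set<String>> supers) {
        for (int i = 0; i + 1 < cases.size(); i++) {
            if (isSubtype(supers, cases.get(i + 1), cases.get(i))) return false;
        }
        return true;
    }

    // n(n-1)/2 tests: every later case is compared with every earlier one.
    static boolean allPairsOk(List<String> cases, Map<String, Set<String>> supers) {
        for (int later = 1; later < cases.size(); later++) {
            for (int earlier = 0; earlier < later; earlier++) {
                if (isSubtype(supers, cases.get(later), cases.get(earlier))) return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> cases = List.of("I", "J", "K", "L");
        Map<String, Set<String>> runtime = Map.of("L", Set.of("J")); // L <: J injected after compilation
        System.out.println("neighbors only: " + neighborsOk(cases, runtime)); // true: misses it
        System.out.println("all pairs:      " + allPairsOk(cases, runtime));  // false: catches it
    }
}
```

Checking every earlier/later pair catches each of the injected relations listed above, at the cost of n(n-1)/2 subtype tests, which is where the O(n^2) figure mentioned earlier in the thread comes from.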
Skipping ahead, yes, guards do play a part in the ordering, and (a) we
can't detect changes to data in at runtime and (b) we can't even
necessarily order the guards. But we can detect changes to type tests
at runtime. The question is whether we should.

> Another problem with re-ordering cases at link time is when you
> support fall-through. What are fall-through(s) in a switch with
> re-ordered cases?

Our story here is straightforward; we lower a switch whose labels are
patterns to a switch whose labels are ints, and encode the patterns (or
parts of them) as the static bootstrap arguments of the classifier
bootstrap (just a more sophisticated version of what we do for longs,
strings, and enums, as discussed previously.) The classifier spits out a
number, and int switch mechanics does the rest. The question is to what
degree we can rely on the compile-time assertion that the inputs are
topologically sorted.

From brian.goetz at oracle.com  Thu Apr 5 19:44:30 2018
From: brian.goetz at oracle.com (Brian Goetz)
Date: Thu, 5 Apr 2018 15:44:30 -0400
Subject: Compile-time type hierarchy information in pattern switch
In-Reply-To:
References: <2a815079-881a-4a79-592e-7f86a90cae88@oracle.com>
 <20180404170105.5486a1c7@copperhead.int.arc7.info>
 <0e38a01f-b9bb-7539-5e0e-1df02f33d69f@oracle.com>
 <1510056942.420626.1522941712175.JavaMail.zimbra@u-pem.fr>
 <9bbda7e1-62be-e737-e277-c437da3c241b@oracle.com>
 <11186599.458653.1522942859001.JavaMail.zimbra@u-pem.fr>
Message-ID: <32a49502-952d-bb60-5572-b356794530e5@oracle.com>

> Is it too harsh to reject the whole class if the assumptions on class
> hierarchy which were necessary to compile the switch statements used
> in the class are not valid at runtime?

That is one of the questions! And the other question is: is it too
expensive to do this check at runtime, given that it will fail so
infrequently?

If we can detect it cheaply enough, though, we can also repair the
situation and fall back to linear testing of patterns.
This seems better (we can execute the statement the user wrote) than
failing. My real question is: can I punt on trying to detect it, and
still optimize the common cases down to O(1) dispatch?

From forax at univ-mlv.fr  Thu Apr 5 20:28:19 2018
From: forax at univ-mlv.fr (Remi Forax)
Date: Thu, 5 Apr 2018 22:28:19 +0200 (CEST)
Subject: Compile-time type hierarchy information in pattern switch
In-Reply-To: <32a49502-952d-bb60-5572-b356794530e5@oracle.com>
References: <2a815079-881a-4a79-592e-7f86a90cae88@oracle.com>
 <20180404170105.5486a1c7@copperhead.int.arc7.info>
 <0e38a01f-b9bb-7539-5e0e-1df02f33d69f@oracle.com>
 <1510056942.420626.1522941712175.JavaMail.zimbra@u-pem.fr>
 <9bbda7e1-62be-e737-e277-c437da3c241b@oracle.com>
 <11186599.458653.1522942859001.JavaMail.zimbra@u-pem.fr>
 <32a49502-952d-bb60-5572-b356794530e5@oracle.com>
Message-ID: <712022705.545287.1522960099581.JavaMail.zimbra@u-pem.fr>

----- Original Message -----
> From: "Brian Goetz"
> To: "Tagir Valeev" , "amber-spec-experts"
> Sent: Thursday, April 5, 2018 21:44:30
> Subject: Re: Compile-time type hierarchy information in pattern switch

>> Is it too harsh to reject the whole class if the assumptions on class
>> hierarchy which were necessary to compile the switch statements used
>> in the class are not valid at runtime?
>
> That is one of the questions! And the other question is: is it too
> expensive to do this check at runtime, given that it will fail so
> infrequently?
>
> If we can detect it cheaply enough, though, we can also repair the
> situation and fall back to linear testing of patterns. This seems
> better (we can execute the statement the user wrote) than failing. My
> real question is: can I punt on trying to detect it, and still optimize
> the common cases down to O(1) dispatch?
the way to detect it is to use the DAG of the supertypes (lazily
constructed*), from the last to the first case, the idea is to propagate
the index of down to the supertypes; if during the propagation you find
a supertype which is also a case and with an index lower than the
currently propagated one, then it's a failure.

Rémi

* you do not have to actually create the DAG, just be able to traverse
it from the subtype to the supertypes.

From peter.levart at gmail.com  Thu Apr 5 21:06:50 2018
From: peter.levart at gmail.com (Peter Levart)
Date: Thu, 5 Apr 2018 23:06:50 +0200
Subject: Compile-time type hierarchy information in pattern switch
In-Reply-To:
References: <2a815079-881a-4a79-592e-7f86a90cae88@oracle.com>
 <20180404170105.5486a1c7@copperhead.int.arc7.info>
 <0e38a01f-b9bb-7539-5e0e-1df02f33d69f@oracle.com>
 <67674e17-1e86-252f-72f2-9dd6ab78c03e@gmail.com>
Message-ID:

On 04/05/18 21:42, Brian Goetz wrote:
>
>> That's if you want to "fix" the order of cases at link-time in order
>> to compute optimal dispatch logic. If you only want to verify and
>> bail-out if they are not sorted already (i.e. you only accept changes
>> in type hierarchy that don't change order of cases), you always need
>> just n-1 comparisons.
>
> Perhaps I'm dense, but I don't see this. Suppose I have completely
> unrelated interfaces I, J, K, and L. The user says:
>
>     case I:
>     case J:
>     case K:
>     case L:
>
> which is fine because they're unordered. At runtime, any of the
> following type relations could have been injected:
>
>     J <: I, K <: I, L <: I
>     K <: J, L <: J
>     L <: K
>
> and these would cause the switch to be misordered (and would have been
> rejected at compile time.)
>
> How am I to detect any of these with just three comparisons? If I
> pick the obvious n-1 (compare each to their neighbor) I wouldn't
> detect any of { L <: J, K <: I, L <: I }.

You're right.
Linear sorting would not help as there's no total order that could be
derived from subtyping relationships. But as you say at the end,
subtyping relationships form a directed acyclic graph on which you can
perform topological sorting in linear time.

Let's start with a list of cases that have already been ordered
topologically at compile time. Say I, J, K, L (as in your example
above). The types could be completely unrelated or there could be type
relationships among them. Let's add to them synthetic "subtype"
relationships (marked with <. to distinguish them from real subtype
relationships <:) according to the compile-time order of cases:

    I <. J
    J <. K
    K <. L

Together with the real direct subtype relationships, those form a graph.
We just have to find out if this graph is acyclic or not. If it does
not have a cycle, the order of case(s) is still OK and the switch is
still valid. Otherwise, the subtype relationships have changed in a way
that makes the compile-time order of cases invalid. Finding a cycle can
be performed in linear time.

Have I missed something this time too?

Regards, Peter
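Peter's construction translates fairly directly into code. A minimal sketch, assuming cases are plain type tests represented as `Class` objects (no guards): the graph has a synthetic `<.` edge from each case to the case that follows it, plus a real `<:` edge from every type to its direct supertypes, and the compile-time order is still valid exactly when a depth-first search finds no cycle. `CaseOrderCycleCheck` is a hypothetical name; interfaces are given an explicit edge to `Object`, since `Class.getSuperclass()` returns null for them.

```java
import java.util.*;

class CaseOrderCycleCheck {

    static boolean isOrderConsistent(List<Class<?>> cases) {
        // Synthetic "<." edges: each case points to the case that follows it.
        Map<Class<?>, Set<Class<?>>> nextCase = new HashMap<>();
        for (int i = 0; i + 1 < cases.size(); i++) {
            nextCase.computeIfAbsent(cases.get(i), k -> new HashSet<>()).add(cases.get(i + 1));
        }
        Map<Class<?>, Integer> state = new HashMap<>(); // absent = unvisited, 1 = on stack, 2 = done
        for (Class<?> c : cases) {
            if (hasCycleFrom(c, nextCase, state)) {
                return false;
            }
        }
        return true;
    }

    private static boolean hasCycleFrom(Class<?> node, Map<Class<?>, Set<Class<?>>> nextCase,
                                        Map<Class<?>, Integer> state) {
        Integer s = state.get(node);
        if (s != null) {
            return s == 1; // reaching a node still on the DFS stack closes a cycle
        }
        state.put(node, 1);
        List<Class<?>> successors = new ArrayList<>(nextCase.getOrDefault(node, Set.of()));
        successors.addAll(directSupertypes(node)); // real "<:" edges
        for (Class<?> next : successors) {
            if (hasCycleFrom(next, nextCase, state)) {
                return true;
            }
        }
        state.put(node, 2);
        return false;
    }

    static List<Class<?>> directSupertypes(Class<?> c) {
        List<Class<?>> result = new ArrayList<>(Arrays.asList(c.getInterfaces()));
        if (c.getSuperclass() != null) {
            result.add(c.getSuperclass());
        } else if (c.isInterface()) {
            result.add(Object.class); // interfaces have no superclass but are subtypes of Object
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(isOrderConsistent(List.of(String.class, CharSequence.class, Object.class)));
        System.out.println(isOrderConsistent(List.of(CharSequence.class, String.class)));
    }
}
```

Each type is marked finished once and each edge followed once, so the cost is linear in the number of cases plus the size of the explored part of the hierarchy, in line with Peter's estimate.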
From peter.levart at gmail.com  Thu Apr 5 21:20:11 2018
From: peter.levart at gmail.com (Peter Levart)
Date: Thu, 5 Apr 2018 23:20:11 +0200
Subject: Compile-time type hierarchy information in pattern switch
In-Reply-To: <712022705.545287.1522960099581.JavaMail.zimbra@u-pem.fr>
References: <2a815079-881a-4a79-592e-7f86a90cae88@oracle.com>
 <20180404170105.5486a1c7@copperhead.int.arc7.info>
 <0e38a01f-b9bb-7539-5e0e-1df02f33d69f@oracle.com>
 <1510056942.420626.1522941712175.JavaMail.zimbra@u-pem.fr>
 <9bbda7e1-62be-e737-e277-c437da3c241b@oracle.com>
 <11186599.458653.1522942859001.JavaMail.zimbra@u-pem.fr>
 <32a49502-952d-bb60-5572-b356794530e5@oracle.com>
 <712022705.545287.1522960099581.JavaMail.zimbra@u-pem.fr>
Message-ID: <857ce394-f249-c1e9-27b7-223697078744@gmail.com>

On 04/05/18 22:28, Remi Forax wrote:
> the way to detect it is to use the DAG of the supertypes (lazily
> constructed*), from the last to the first case, the idea is to propagate
> the index of down to the supertypes; if during the propagation you find
> a supertype which is also a case and with an index lower than the
> currently propagated one, then it's a failure.
>
> Rémi
>
> * you do not have to actually create the DAG, just be able to traverse
> it from the subtype to the supertypes.

Yes, this idea is similar to mine. We just have to find a conflict
between subtype relationships and the compile-time order of cases, which
could be viewed as forming implicit pair-by-pair relationships of
consecutive cases. If there's a cycle, we have a conflict.

Peter
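Rémi's variant avoids building the synthetic edges: walk the cases from last to first and push each case's index up through its supertypes, failing as soon as an earlier case is reached. A minimal sketch under the same assumptions (type-test cases as `Class` objects, hypothetical class name). The `visited` set is an added optimization not spelled out in the mail: a supertype already reached from a later case was checked against a larger index, so its own supertypes need not be walked again, which keeps the total work linear in the size of the supertype graph. As in the mail, the DAG is never materialized; it is traversed on demand via `getSuperclass()` and `getInterfaces()`.

```java
import java.util.*;

class SupertypePropagationCheck {

    static boolean isOrderConsistent(List<Class<?>> cases) {
        Map<Class<?>, Integer> caseIndex = new HashMap<>();
        for (int i = 0; i < cases.size(); i++) {
            caseIndex.putIfAbsent(cases.get(i), i);
        }
        Set<Class<?>> visited = new HashSet<>();
        for (int i = cases.size() - 1; i >= 0; i--) {
            if (caseIndex.get(cases.get(i)) < i) {
                return false; // the same type appears earlier: this arm is dead
            }
            Deque<Class<?>> work = new ArrayDeque<>(directSupertypes(cases.get(i)));
            while (!work.isEmpty()) {
                Class<?> t = work.pop();
                Integer j = caseIndex.get(t);
                if (j != null && j < i) {
                    return false; // an earlier case is a supertype of case i
                }
                // Supertypes of t were already checked if t was reached from a later case.
                if (visited.add(t)) {
                    work.addAll(directSupertypes(t));
                }
            }
        }
        return true;
    }

    static List<Class<?>> directSupertypes(Class<?> c) {
        List<Class<?>> result = new ArrayList<>(Arrays.asList(c.getInterfaces()));
        if (c.getSuperclass() != null) {
            result.add(c.getSuperclass());
        } else if (c.isInterface()) {
            result.add(Object.class); // interfaces have no superclass but are subtypes of Object
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(isOrderConsistent(List.of(String.class, CharSequence.class, Object.class)));
        System.out.println(isOrderConsistent(List.of(Number.class, Integer.class)));
    }
}
```

For the plain type-test cases modeled here, this check and the cycle search should agree on every input; the difference is only in what gets built along the way.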
From forax at univ-mlv.fr  Thu Apr 5 23:49:59 2018
From: forax at univ-mlv.fr (Remi Forax)
Date: Fri, 6 Apr 2018 01:49:59 +0200 (CEST)
Subject: Compile-time type hierarchy information in pattern switch
In-Reply-To: <2a815079-881a-4a79-592e-7f86a90cae88@oracle.com>
References: <2a815079-881a-4a79-592e-7f86a90cae88@oracle.com>
Message-ID: <1078249793.557744.1522972199554.JavaMail.zimbra@u-pem.fr>

I've implemented a first version

https://github.com/forax/exotic/blob/master/src/main/java/com.github.forax.exotic/com/github/forax/exotic/TypeSwitch.java
https://github.com/forax/exotic/blob/master/src/main/java/com.github.forax.exotic/com/github/forax/exotic/TypeSwitchCallSite.java

(I've changed the convention to be null -> -1, unknown -> -2 because it's
easier to do the null check upfront instead of at the end)

and I've written a small JMH benchmark

https://github.com/forax/exotic/blob/master/src/test/java/com.github.forax.exotic/com/github/forax/exotic/perf/TypeSwitchBenchMark.java

that compares the type-switch with a cascade of if ... else.

I've found (on my laptop, so it may not be true on a server) that the
speed depends on
- the number of cases
- the number of different classes a switch can see at runtime.

The current implementation is independent of the number of cases; it
uses an inlining cache of 'if getClass', which is great if there are few
classes at runtime, and changes itself to use a ClassValue if there are
too many classes at runtime.

Benchmark                                     Mode  Cnt    Score   Error  Units
TypeSwitchBenchMark.long_instanceof_cascade   avgt   15  358.876 ± 2.868  ns/op
TypeSwitchBenchMark.long_type_switch          avgt   15   49.870 ± 0.702  ns/op
TypeSwitchBenchMark.short_instanceof_cascade  avgt   15    7.016 ± 0.017  ns/op
TypeSwitchBenchMark.short_type_switch         avgt   15    5.978 ± 0.054  ns/op

I think the current implementation is not enough, because the cost of
using a ClassValue is quite high, so if there are few cases and quite a
lot of different classes at runtime, the implementation should switch to
using a cascade of instanceof instead of a ClassValue.

What should be implemented, in my opinion, is something like this:

                              number of classes seen at runtime
                              small          | big
  number of cases   small     if getClass    | if instanceof
                    big       if getClass    | ClassValue.get

And once we have an implementation that is a little more realistic, we
can implement the verification (see my previous mail) to see its impact.

cheers,
Rémi

----- Original Message -----
> From: "Brian Goetz"
> To: "amber-spec-experts"
> Sent: Tuesday, April 3, 2018 18:36:43
> Subject: Compile-time type hierarchy information in pattern switch

> Along the lines of the previous discussion about separate compilation
> skew with enums ... I'm trying to find the right place to draw the line
> with respect to post-compilation class hierarchy changes.
>
> Recall that we can impose a _dominance ordering_ on patterns; pattern P
> dominates Q if everything that is matched by Q also is matched by P.
> We already use this today, in catch blocks, to reject programs with dead
> code; you can't say `catch Exception` before `catch IOException`,
> because the latter block would be dead. We want to do the same with
> patterns, so:
>
>     case String x: ...
>     case Object x: ...
>
> is OK but
>
>     case Object x: ...
>     case String x: ...
>
> is rejected at compile time.
>
> Separately, we'd like for pattern matching to be efficient; the
> definition of "inefficient" would be for pattern matching to be
> inherently O(n), when we can frequently do much better. There's plenty
> of literature on compiling patterns to decision trees, but none of them
> address the problem we have to: separate compilation.
> So any decision tree computed at compile time might be wrong in
> undesirable ways by runtime. We could also compute a decision tree at
> runtime using indy; while this is our intent, the devil is in the
> details. We don't want computing the tree to be too expensive, nor do
> we want to have to capture O(n^2) compile-time constraints to be
> validated at runtime. So I'd like to focus on what changes we're
> willing to accept between compilation and runtime, and what our
> expectations would be for those changes.
>
> We've already discussed one of these: novel values in enum / sealed type
> switches, and for them, the answer is throwing some sort of exception.
> Another that we dealt with long ago is changing enum ordinals; we
> decided at the time that we're willing for this to be a BC change, so we
> generate extra code that uses the as-runtime ordinals rather than the
> as-compile-time ordinals when lowering the switch into an integer
> switch. (If we weren't willing to tolerate such changes, we'd have a
> simpler translation: just lower an enum switch to a switch on its
> ordinal.)
>
> Here's one that I suspect we're not expecting to recover terribly well
> from: hierarchy inversion. Suppose at compile time A <: B. So the
> following is a sensible switch body:
>
>     case String: println("String"); break;
>     case Object: println("Object"); break;
>
> Now, imagine that by runtime, String no longer extends Object, but
> instead Object absurdly extends String. Do we still expect the above to
> print String for all Strings, and Object for everything else? Or is the
> latter arm now dead at runtime, even though it wouldn't compile after
> the change? Or is this now UB, because it would no longer compile?
>
> A more realistic example of a hierarchy change is introducing an
> interface. If we have:
>
>     interface I { }
>     class C { }
>
> and a switch
>
>     case I: ...
>     case C: ...
>
> and later, we make C implement I, we have a similar situation; the
> switch would no longer compile. Are we allowed to make optimizations
> based on the compile-time knowledge that C
>
> As an example, suppose A, B, C, ... Z are final classes, and I is an
> interface implemented by none of them. Then I can dispatch:
>
>     case A: ...
>     case B: ...
>     ...
>     case I: ...
>     ...
>     case Z: ...
>     case Object: ...
>
> in two type operations; hash the class of the target and look it up in a
> table containing A...Z, and then do a test against I. However, if I'm
> required to deal with the case where some of A..Z are retrofitted to
> implement I after compile time, and I'm expected to process the switch
> in order based on how it is written, then I have to fall back to O(n)
> type operations at runtime, or, I have to do as many as O(n^2) type
> comparisons at link time. These are steep cliffs to fall off of.
> (Mandating throwing an exception at link time is also expensive.)
>
> Today, all switch cases are totally unordered, so we're free to execute
> them in O(1) time. I'd like for that to continue to be the case, even
> as we add more complex switches.
>
> So, let's have a conversation about expectations for what we should do
> for a switch at runtime that would no longer compile due to
> post-compilation hierarchy changes (new supertypes, hierarchy
> inversions, removed supertypes, final <--> nonfinal, etc.)
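Rémi's table pairs each regime with a dispatch strategy. As a rough illustration of the "ClassValue.get" corner only, here is a minimal sketch, assuming plain `java.lang.ClassValue` memoization keyed on the target's runtime class. The class name `ClassValueTypeSwitch` is hypothetical, and this is not the code in Rémi's repository; it simply follows his stated convention of -1 for `null` and -2 for "unknown".

```java
// Hypothetical sketch of the "ClassValue.get" strategy; not the code in Rémi's repository.
final class ClassValueTypeSwitch {
    static final int NULL_INDEX = -1;     // convention from the mail: null -> -1
    static final int NO_MATCH_INDEX = -2; // convention from the mail: unknown -> -2

    // Memoizes, per runtime class, the index of the first case type it is assignable to.
    private final ClassValue<Integer> firstMatch;

    ClassValueTypeSwitch(Class<?>... caseTypes) {
        Class<?>[] cases = caseTypes.clone();
        this.firstMatch = new ClassValue<Integer>() {
            @Override
            protected Integer computeValue(Class<?> type) {
                for (int i = 0; i < cases.length; i++) {
                    if (cases[i].isAssignableFrom(type)) {
                        return i; // source order is kept, so a dominating case wins
                    }
                }
                return NO_MATCH_INDEX;
            }
        };
    }

    int typeSwitch(Object target) {
        if (target == null) {
            return NULL_INDEX; // null check done upfront
        }
        return firstMatch.get(target.getClass()); // scan runs once per distinct class
    }

    public static void main(String[] args) {
        ClassValueTypeSwitch sw =
            new ClassValueTypeSwitch(String.class, CharSequence.class, Number.class);
        System.out.println(sw.typeSwitch("hello"));             // 0
        System.out.println(sw.typeSwitch(new StringBuilder())); // 1
        System.out.println(sw.typeSwitch(42));                  // 2
        System.out.println(sw.typeSwitch(new Object()));        // -2
        System.out.println(sw.typeSwitch(null));                // -1
    }
}
```

Each distinct runtime class pays for the linear scan only once; what remains per call is the `ClassValue` lookup itself, which is the cost Rémi reports as too high when there are few cases but many runtime classes.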
From peter.levart at gmail.com Fri Apr 6 09:01:23 2018
From: peter.levart at gmail.com (Peter Levart)
Date: Fri, 6 Apr 2018 11:01:23 +0200
Subject: Compile-time type hierarchy information in pattern switch
In-Reply-To: <857ce394-f249-c1e9-27b7-223697078744@gmail.com>
References: <2a815079-881a-4a79-592e-7f86a90cae88@oracle.com> <20180404170105.5486a1c7@copperhead.int.arc7.info> <0e38a01f-b9bb-7539-5e0e-1df02f33d69f@oracle.com> <1510056942.420626.1522941712175.JavaMail.zimbra@u-pem.fr> <9bbda7e1-62be-e737-e277-c437da3c241b@oracle.com> <11186599.458653.1522942859001.JavaMail.zimbra@u-pem.fr> <32a49502-952d-bb60-5572-b356794530e5@oracle.com> <712022705.545287.1522960099581.JavaMail.zimbra@u-pem.fr> <857ce394-f249-c1e9-27b7-223697078744@gmail.com>
Message-ID: <9367cb97-12b0-b6ac-ca18-fece90e027b5@gmail.com>

On 04/05/2018 11:20 PM, Peter Levart wrote:
>
>
> On 04/05/18 22:28, Remi Forax wrote:
>> the way to detect it is to use the DAG of the supertypes (lazily
>> constructed*): from the last to the first case, the idea is to propagate
>> the index of the case to its supertypes; if during the propagation you
>> find a supertype which is also a case and has an index lower than the
>> one currently propagated, then it's a failure.
>>
>> Rémi
>>
>> * you do not have to actually create the DAG, just be able to traverse
>> it from the subtype to the supertypes.
>
> Yes, this idea is similar to mine. We just have to find a conflict
> between subtype relationships and the compile-time order of cases, which
> could be viewed as forming implicit pair-by-pair relationships of
> consecutive cases. If there's a cycle, we have a conflict.
>

And Remi's algorithm is of course the best implementation of this search. Here's a variant that does not need an index, just a set of types:

    start with an empty set S
    for each case type T from the last case up to the first:
        if S contains T:
            bail out with error
        add T and all its supertypes to S

The time complexity of this algorithm is O(n). It takes at most n * k lookups into a (hash)set, where k is the average number of supertypes of a case type. Usually, when case types share common supertypes not far away, the algorithm can prune branches of the type hierarchy that were already visited. Implementation-wise, if the algorithm uses a HashMap mapping each visited type to the case type it was visited from (the analogue of Remi's case index), it can also produce a meaningful diagnostic message, mentioning precisely which two cases are in the wrong order according to the type hierarchy:

    import java.util.HashMap;

    Class<?>[] caseTypes = ...;
    TypeVisitor visitor = new TypeVisitor();
    // walk the cases from the last one up to the first one
    for (int i = caseTypes.length - 1; i >= 0; i--) {
        visitor.visitType(caseTypes[i]);
    }

    // maps each visited type to the (later) case type it was reached from
    class TypeVisitor extends HashMap<Class<?>, Class<?>> {

        void visitType(Class<?> caseType) {
            Class<?> conflictingCaseType = putIfAbsent(caseType, caseType);
            if (conflictingCaseType != null) {
                // caseType was already reached as a supertype of a later case
                throw new IllegalStateException(
                    "Case " + conflictingCaseType.getName() +
                    " matches a subtype of what case " + caseType.getName() +
                    " matches but is located after it");
            }
            visitSupertypes(caseType, caseType);
        }

        private void visitSupertypes(Class<?> type, Class<?> caseType) {
            // an interface has no superclass, but conceptually its supertype is Object
            Class<?> superclass = type.isInterface() ? Object.class : type.getSuperclass();
            if (superclass != null && putIfAbsent(superclass, caseType) == null) {
                visitSupertypes(superclass, caseType);
            }
            for (Class<?> superinterface : type.getInterfaces()) {
                // a type already present was visited from a later case: prune
                if (putIfAbsent(superinterface, caseType) == null) {
                    visitSupertypes(superinterface, caseType);
                }
            }
        }
    }

Regards, Peter

From brian.goetz at oracle.com Fri Apr 6 12:48:50 2018
From: brian.goetz at oracle.com (Brian Goetz)
Date: Fri, 6 Apr 2018 08:48:50 -0400
Subject: Compile-time type hierarchy information in pattern switch
In-Reply-To: <9367cb97-12b0-b6ac-ca18-fece90e027b5@gmail.com>
References: <2a815079-881a-4a79-592e-7f86a90cae88@oracle.com> <20180404170105.5486a1c7@copperhead.int.arc7.info> <0e38a01f-b9bb-7539-5e0e-1df02f33d69f@oracle.com> <1510056942.420626.1522941712175.JavaMail.zimbra@u-pem.fr> <9bbda7e1-62be-e737-e277-c437da3c241b@oracle.com> <11186599.458653.1522942859001.JavaMail.zimbra@u-pem.fr> <32a49502-952d-bb60-5572-b356794530e5@oracle.com> <712022705.545287.1522960099581.JavaMail.zimbra@u-pem.fr> <857ce394-f249-c1e9-27b7-223697078744@gmail.com> <9367cb97-12b0-b6ac-ca18-fece90e027b5@gmail.com>
Message-ID: <9f103cbc-5750-c58b-6d29-33c14ca7b45d@oracle.com>

This may be O(n), but it's not really something I want to do when linking a call site...

On 4/6/2018 5:01 AM, Peter Levart wrote:
>
>
> On 04/05/2018 11:20 PM, Peter Levart wrote:
>>
>>
>> On 04/05/18 22:28, Remi Forax wrote:
>>> the way to detect it is to use the DAG of the supertypes (lazily
>>> constructed*): from the last to the first case, the idea is to
>>> propagate the index of the case to its supertypes; if during the
>>> propagation you find a supertype which is also a case and has an
>>> index lower than the one currently propagated, then it's a failure.
>>>
>>> Rémi
>>>
>>> * you do not have to actually create the DAG, just be able to
>>> traverse it from the subtype to the supertypes.
>>
>> Yes, this idea is similar to mine. We just have to find a conflict
>> between subtype relationships and the compile-time order of cases, which
>> could be viewed as forming implicit pair-by-pair relationships of
>> consecutive cases. If there's a cycle, we have a conflict.
>>
>
> And Remi's algorithm is of course the best implementation of this
> search.
> Here's a variant that does not need an index, just a set of
> types:
> [...]
>
> Regards, Peter

From forax at univ-mlv.fr Fri Apr 6 13:10:20 2018
From: forax at univ-mlv.fr (forax at univ-mlv.fr)
Date: Fri, 6 Apr 2018 15:10:20 +0200 (CEST)
Subject: Compile-time type hierarchy information in pattern switch
In-Reply-To: <9f103cbc-5750-c58b-6d29-33c14ca7b45d@oracle.com>
References: <2a815079-881a-4a79-592e-7f86a90cae88@oracle.com> <11186599.458653.1522942859001.JavaMail.zimbra@u-pem.fr> <32a49502-952d-bb60-5572-b356794530e5@oracle.com> <712022705.545287.1522960099581.JavaMail.zimbra@u-pem.fr> <857ce394-f249-c1e9-27b7-223697078744@gmail.com> <9367cb97-12b0-b6ac-ca18-fece90e027b5@gmail.com> <9f103cbc-5750-c58b-6d29-33c14ca7b45d@oracle.com>
Message-ID: <1102362626.806180.1523020220830.JavaMail.zimbra@u-pem.fr>

----- Original message -----
> From: "Brian Goetz"
> To: "Peter Levart", "Remi Forax"
> Cc: "amber-spec-experts"
> Sent: Friday, 6 April 2018 14:48:50
> Subject: Re: Compile-time type hierarchy information in pattern switch

> This may be O(n), but it's not really something I want to do when linking
> a call site...

I agree :)

Anyway, I've implemented Peter's algorithm, fixed a typo (i++ instead of i--) and added the fact that conceptually the supertype of an interface is java.lang.Object:
https://github.com/forax/exotic/blob/master/src/main/java/com.github.forax.exotic/com/github/forax/exotic/TypeSwitch.java#L96

I've also implemented a strategy that uses if instanceof, but it doesn't perform well. I suppose it's because with a real instanceof the VM gathers a profile, while with Class.isInstance() it does not. It could be fixed by adding a special method handle combiner to java.lang.invoke.MethodHandles. That said, I may be wrong; it's perhaps something else.

    Benchmark                                           Mode  Cnt    Score   Error  Units
    TypeSwitchBenchMark.big_big_instanceof_cascade      avgt   15  367.409 ± 1.844  ns/op
    TypeSwitchBenchMark.big_big_type_switch             avgt   15   52.238 ± 0.455  ns/op
    TypeSwitchBenchMark.small_big_instanceof_cascade    avgt   15   34.940 ± 0.287  ns/op
    TypeSwitchBenchMark.small_big_type_switch           avgt   15   52.204 ± 0.319  ns/op
    TypeSwitchBenchMark.small_small_instanceof_cascade  avgt   15    7.320 ± 0.100  ns/op
    TypeSwitchBenchMark.small_small_type_switch         avgt   15    6.122 ± 0.027  ns/op

The first big/small is for the number of cases, the second is for the number of classes seen at runtime, so small_big_type_switch means a small number of cases with a lot of runtime classes.

To summarize: if the number of classes seen at runtime is small, the type_switch wins (it uses if getClass); if there are a lot of cases, the type_switch wins (it uses ClassValue.get()); but if there are few cases and a lot of classes, the type_switch is behind :(

Rémi

>
> On 4/6/2018 5:01 AM, Peter Levart wrote:
>>
>>
>> On 04/05/2018 11:20 PM, Peter Levart wrote:
>>>
>>>
>>> On 04/05/18 22:28, Remi Forax wrote:
>>>> the way to detect it is to use the DAG of the supertypes (lazily
>>>> constructed*): from the last to the first case, the idea is to
>>>> propagate the index of the case to its supertypes; if during the
>>>> propagation you find a supertype which is also a case and has an
>>>> index lower than the one currently propagated, then it's a failure.
>>>>
>>>> Rémi
>>>>
>>>> * you do not have to actually create the DAG, just be able to
>>>> traverse it from the subtype to the supertypes.
>>>
>>> Yes, this idea is similar to mine. We just have to find a conflict
>>> between subtype relationships and the compile-time order of cases, which
>>> could be viewed as forming implicit pair-by-pair relationships of
>>> consecutive cases. If there's a cycle, we have a conflict.
>>>
>>
>> And Remi's algorithm is of course the best implementation of this
>> search.
>> Here's a variant that does not need an index, just a set of
>> types:
>> [...]
>>
>> Regards, Peter

From brian.goetz at oracle.com Fri Apr 6 15:51:49 2018
From: brian.goetz at oracle.com (Brian Goetz)
Date: Fri, 6 Apr 2018 11:51:49 -0400
Subject: Switch translation
Message-ID: <3de80623-baf4-e8b1-f58c-2e3e52c52b2a@oracle.com>

The following outlines our story for translating improved switches, including both the switch improvements coming as part of JEP 325, and follow-on work to add pattern matching to switches. Much of this has been discussed already over the last year, but here it is in one place.

# Switch Translation
#### Maurizio Cimadamore and Brian Goetz
#### April 2018

## Part 1 -- constant switches

This part examines the current translation of `switch` constructs by `javac`, and proposes a more general translation for switching on primitives, boxes, strings, and enums, with the goals of:

 - Unifying the treatment of `switch` variants, simplifying the compiler implementation and reducing the static footprint of generated code;
 - Moving responsibility for target classification from compile time to run time, allowing us to more freely update the logic without updating the compiler.

## Current translation

Switches on `int` (and the smaller integer primitives) are translated in one of two ways. If the labels are relatively dense, we translate an `int` switch to a `tableswitch`; if they are sparse, we translate to a `lookupswitch`. The current heuristic appears to be that we use a `tableswitch` if it results in smaller bytecode than a `lookupswitch` (which uses twice as many bytes per entry), which is a reasonable heuristic.

#### Switches on boxes

Switches on primitive boxes are currently implemented as if they were primitive switches, unconditionally unboxing the target before entry (possibly throwing NPE).
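The density heuristic described under "Current translation" can be approximated by comparing operand sizes as laid out in the JVM specification: `tableswitch` stores a default offset, `low` and `high`, plus one 4-byte offset per value in the range, while `lookupswitch` stores a default offset and a pair count, plus an 8-byte (match, offset) pair per label. The sketch below is a hypothetical helper (`SwitchSizeHeuristic`); it ignores opcodes and alignment padding, and it is not javac's actual code, which may weigh other factors.

```java
// Hypothetical helper approximating the size-only heuristic; not javac's implementation.
final class SwitchSizeHeuristic {
    // tableswitch operands: default, low, high (4 bytes each) + a 4-byte offset per value in [lo, hi]
    static long tableswitchBytes(int[] labels) {
        int lo = Integer.MAX_VALUE, hi = Integer.MIN_VALUE;
        for (int label : labels) {
            lo = Math.min(lo, label);
            hi = Math.max(hi, label);
        }
        long range = (long) hi - lo + 1; // long arithmetic: the range can exceed int
        return 12 + 4 * range;
    }

    // lookupswitch operands: default, npairs (4 bytes each) + an 8-byte (match, offset) pair per label
    static long lookupswitchBytes(int[] labels) {
        return 8 + 8L * labels.length;
    }

    static boolean preferTableswitch(int[] labels) {
        return labels.length > 0 && tableswitchBytes(labels) <= lookupswitchBytes(labels);
    }

    public static void main(String[] args) {
        System.out.println(preferTableswitch(new int[] {1, 2, 3, 4, 5}));   // true: dense labels
        System.out.println(preferTableswitch(new int[] {1, 1000, 100000})); // false: sparse labels
    }
}
```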
#### Switches on strings

Switches on strings are implemented as a two-step process, exploiting the fact that strings cache their `hashCode()` and that hash codes are reasonably spread out. Given a switch on strings like the one below:

    switch (s) {
        case "Hello": ...
        case "World": ...
        default: ...
    }

The compiler desugars this into two separate switches, where the first switch maps the input strings into a range of numbers [0..1], as shown below, which can then be used in a subsequent plain switch on ints. The generated code unconditionally calls `hashCode()`, again possibly throwing NPE.

    int index = -1;
    switch (s.hashCode()) {
        case 12345: if (!s.equals("Hello")) break; index = 0; break;
        case 6789: if (!s.equals("World")) break; index = 1; break;
        default: index = -1;
    }
    switch (index) {
        case 0: ...
        case 1: ...
        default: ...
    }

If there are hash collisions between the strings, the first switch must try all possible matching strings.

#### Switches on enums

Switches on `enum` constants exploit the fact that enums have (usually dense) integral ordinal values. Unfortunately, because an ordinal value can change between compilation time and runtime, we cannot rely on this mapping directly, but instead need to do an extra layer of mapping. Given a switch like:

    switch (color) {
        case RED: ...
        case GREEN: ...
    }

As with string switch, the compiler numbers the cases, here starting at 1 (so that the default array value 0 means "no case"), and creates a synthetic class that maps the runtime values of the enum ordinals to the statically numbered cases:

    class Outer$0 {
        synthetic static final int[] $EnumMap$Color = new int[Color.values().length];
        static {
            try { $EnumMap$Color[RED.ordinal()] = 1; } catch (NoSuchFieldError ex) {}
            try { $EnumMap$Color[GREEN.ordinal()] = 2; } catch (NoSuchFieldError ex) {}
        }
    }

Then, the switch is translated as follows:

    switch (Outer$0.$EnumMap$Color[color.ordinal()]) {
        case 1: stmt1;
        case 2: stmt2;
    }

In other words, we construct an array whose size is the cardinality of the enum; the element at position *i* of this array contains the case index corresponding to the enum constant whose ordinal is *i*.

## A more general scheme

The handling of strings and enums gives us a hint of how to create a more regular scheme: for `switch` targets more complex than `int`, we lower the `switch` to an `int` switch with consecutive `case` labels, and use a separate process to map the target into the range of synthetic case labels.

Now that we have `invokedynamic` in our toolbox, we can reduce all of the non-`int` cases to a single form, where we number the cases with consecutive integers, and perform case selection via an `invokedynamic`-based classifier function, whose static argument list receives a description of the actual targets, and which returns an `int` identifying what `case` to select.

This approach has several advantages:

 - Reduced compiler complexity -- all switches follow a common pattern;
 - Reduced static code size;
 - The classification function can select from a wide range of strategies (linear search, binary search, building a `HashMap`, constructing a perfect hash function, etc.), which can vary over time or from situation to situation;
 - We are free to improve the strategy or select an alternate strategy (say, to optimize for startup time) without having to recompile the code;
 - Hopefully it is at least as JIT-friendly as the existing translation, if not more so.

We can also use this approach in preference to `lookupswitch` for non-dense `int` switches, as well as use it to extend `switch` to handle `long`, `float`, and `double` targets (which were surely excluded in part because the JVM didn't provide a convenient translation target for these types).
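To make the classifier idea concrete without `invokedynamic`, here is a minimal sketch of what a `String` classifier might compute, assuming the numbering conventions used elsewhere in this proposal (cases numbered 0..N-1, N for "no match", -1 for `null`). `StringClassifier` and `run` are hypothetical names; the sketch uses the "building a `HashMap`" strategy, whereas a real bootstrap would wrap such logic in a `CallSite` and could choose a different strategy.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a stringSwitch bootstrap: plain Java, no invokedynamic.
final class StringClassifier {
    private final Map<String, Integer> index = new HashMap<>();
    private final int noMatch;

    StringClassifier(String... caseValues) {
        for (int i = 0; i < caseValues.length; i++) {
            index.putIfAbsent(caseValues[i], i); // case labels numbered densely 0..N-1
        }
        this.noMatch = caseValues.length;        // N signals "no match" (default)
    }

    int classify(String target) {
        if (target == null) {
            return -1;                           // distinguished null sentinel
        }
        return index.getOrDefault(target, noMatch);
    }

    // Hand translation of: case "Foo": m(); break; case "Bar": n(); // fall through
    //                      case "Baz": r(); break; default: p();
    static String run(StringClassifier c, String x) {
        StringBuilder out = new StringBuilder();
        switch (c.classify(x)) {
            case -1: throw new NullPointerException(); // implicit null case
            case 0: out.append("m"); break;
            case 1: out.append("n");                   // fall through
            case 2: out.append("r"); break;
            default: out.append("p");                  // index 3 == N
        }
        return out.toString();
    }

    public static void main(String[] args) {
        StringClassifier c = new StringClassifier("Foo", "Bar", "Baz");
        System.out.println(run(c, "Bar")); // nr
        System.out.println(run(c, "Qux")); // p
    }
}
```

The `int` switch in `run` only ever sees dense case numbers, so the classification strategy inside `classify` can change without touching the translated switch body.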

#### Bootstrap design

When designing the `invokedynamic` bootstraps to support this translation, we face the classic lumping-vs-splitting decision. For now, we'll bias towards splitting. In the following example, `BOOTSTRAP_PREAMBLE` indicates the usual leading arguments for an indy bootstrap. We assume the compiler has numbered the case values densely from 0 to N-1, and the bootstrap will return a value in [0, N) for success, or N for "no match".

A strawman design might be:

    // Numeric switches for P, accepts invocation as P -> I or Box(P) -> I
    CallSite intSwitch(BOOTSTRAP_PREAMBLE, int... caseValues)

    // Switch for String, invocation descriptor is String -> I
    CallSite stringSwitch(BOOTSTRAP_PREAMBLE, String... caseValues)

    // Switch for Enum, invocation descriptor is E -> I
    CallSite enumSwitch(BOOTSTRAP_PREAMBLE, Class<? extends Enum<?>> clazz,
                        String... caseNames)

It might be possible to encode all of these into a single bootstrap, but given that the compiler already treats each type slightly differently, it seems there is little value in this sort of lumping for non-pattern switches.

The `enumSwitch` bootstrap as proposed uses `String` values to describe the enum constants, rather than encoding the enum constants directly via condy. This allows us to be more robust to enums disappearing after compilation.

This strategy is also dependent on having broken the limitation on 253 bootstrap arguments in indy/condy.

#### Extending to other primitive types

This approach extends naturally to other primitive types (long, double, float), by the addition of some more bootstraps (which need to deal with the additional complexities of infinity, NaN, etc.):

    CallSite longSwitch(BOOTSTRAP_PREAMBLE, long... caseValues)
    CallSite floatSwitch(BOOTSTRAP_PREAMBLE, float... caseValues)
    CallSite doubleSwitch(BOOTSTRAP_PREAMBLE, double... caseValues)

#### Extending to null

The scheme as proposed above does not explicitly handle nulls, which is a feature we'd like to have in `switch`. There are a few ways we could add null handling into the API:

 - Split entry points into null-friendly or null-hostile switches;
 - Find a way to encode nulls in the array of case values (which can be done with condy);
 - Always treat null as a possible input and a distinguished output, and have the compiler ensure the switch can handle this distinguished output.

The last strategy is appealing and straightforward: assign a sentinel value (-1) to `null`, and always return this sentinel when the input is null. The compiler ensures that some case handles `null`, and if no case handles `null` then it inserts an implicit

    case -1: throw new NullPointerException();

into the generated code.

#### General example

If we have a string switch:

    switch (x) {
        case "Foo": m(); break;
        case "Bar": n(); // fall through
        case "Baz": r(); break;
        default: p();
    }

we translate it into:

    int t = indy[bsm=stringSwitch["Foo", "Bar", "Baz"]](x);
    switch (t) {
        case -1: throw new NullPointerException();  // implicit null case
        case 0: m(); break;
        case 1: n(); // fall through
        case 2: r(); break;
        case 3: p();                                 // default case
    }

All switches, with the exception of `int` switches (and maybe not even non-dense `int` switches), follow this exact pattern. If the target type is not a reference type, the `null` case is not needed.

This strategy is implemented in the `switch` branch of the amber repository; see `java.lang.runtime.SwitchBootstraps` in that branch for (rough!) implementations of the bootstraps.

## Patterns in narrow-target switches

When we add patterns, we may encounter switches whose targets are tightly typed (e.g., `String` or `int`) but still use some patterns in their expression.
For switches whose target type is a primitive, primitive box, `String`, or `enum`, we'd like to use the optimized translation strategy outlined here, but the following kinds of patterns might still show up in a switch on, say, `Integer`:

    case var x:
    case _:
    case Integer x:
    case Integer(var x):

The first three can be translated away by the source compiler, as they are semantically equivalent to `default`. If any nontrivial patterns are present (including deconstruction patterns), we may need to translate as a pattern switch scheme -- see Part 2. (While the language may not distinguish between "legacy" and "pattern" switches -- in that all switches are pattern switches -- we'd like to avoid giving up obvious optimizations if we can.)

# Part 2 -- type test patterns and guards

A key motivation for reexamining switch translation is the impending arrival of patterns in switch. We expect switch translation for the pattern case to follow a similar structure -- lower to an `int` switch and use an indy-based classifier to select an index. However, there are a few additional complexities. One is that pattern cases may have guards, which means we need to be able to re-enter the bootstrap with an indication to "continue matching from case N", in the event of a failed guard. (Even if the language doesn't support guards directly, the obvious implementation strategy for nested patterns is to desugar them into guards.)

Translating pattern switches is more complicated because there are more options for how to divide the work between the statically generated code and the switch classifier, and different choices have different performance side-effects (are binding variables "boxed" into a tuple to be returned, or do they need to be redundantly calculated?).
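The "continue matching from case N" re-entry can be illustrated with a hypothetical, deliberately linear classifier that takes a resume index alongside the target. This is only a sketch under assumed names (`GuardedTypeSwitch`, `classify`, `describe`) using the -1 / N sentinel conventions from Part 1; a real bootstrap would precompute a smarter dispatch, but the control flow around a failed guard would look similar.

```java
// Hypothetical sketch of guard re-entry; not a proposed bootstrap API.
final class GuardedTypeSwitch {
    private final Class<?>[] caseTypes;

    GuardedTypeSwitch(Class<?>... caseTypes) {
        this.caseTypes = caseTypes.clone();
    }

    // Index of the first case at or after 'from' whose type matches the target;
    // -1 for null, caseTypes.length for "no match". Linear scan, for illustration only.
    int classify(Object target, int from) {
        if (target == null) {
            return -1;
        }
        for (int i = from; i < caseTypes.length; i++) {
            if (caseTypes[i].isInstance(target)) {
                return i;
            }
        }
        return caseTypes.length;
    }

    // Hand translation of:
    //   case String s where (s.length() > 0): "nonempty string"
    //   case CharSequence cs:                 "char sequence"
    //   case Integer i where (i > 0):         "positive int"
    //   default:                              "other"
    static String describe(GuardedTypeSwitch sw, Object x) {
        int from = 0;
        while (true) {
            switch (sw.classify(x, from)) {
                case -1:
                    throw new NullPointerException(); // implicit null case
                case 0:
                    if (((String) x).length() > 0) return "nonempty string";
                    from = 1; // guard failed: re-enter, matching from case 1
                    continue;
                case 1:
                    return "char sequence";
                case 2:
                    if ((Integer) x > 0) return "positive int";
                    from = 3; // guard failed: re-enter, matching from case 3
                    continue;
                default:
                    return "other";
            }
        }
    }

    public static void main(String[] args) {
        GuardedTypeSwitch sw =
            new GuardedTypeSwitch(String.class, CharSequence.class, Integer.class);
        System.out.println(describe(sw, "abc")); // nonempty string
        System.out.println(describe(sw, ""));    // char sequence
        System.out.println(describe(sw, 7));     // positive int
        System.out.println(describe(sw, -7));    // other
    }
}
```

Note that the empty string, after failing the first guard, is correctly picked up by the later `CharSequence` case rather than falling straight to `default`.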

## Type-test patterns

Type-test patterns are notable because their applicability predicate is purely based on the type system, meaning that the compiler can directly reason about it both statically (using flow analysis, optimizing away dynamic type tests) and dynamically (with `instanceof`). A switch involving type-tests:

    switch (x) {
        case String s: ...
        case Integer i: ...
        case Long l: ...
    }

can (among other strategies) be translated into a chain of `if-else` using `instanceof` and casts:

    if (x instanceof String) { String s = (String) x; ... }
    else if (x instanceof Integer) { Integer i = (Integer) x; ... }
    else if (x instanceof Long) { Long l = (Long) x; ... }

#### Guards

The `if-else` desugaring can also naturally handle guards:

    switch (x) {
        case String s
            where (s.length() > 0): ...
        case Integer i
            where (i > 0): ...
        case Long l
            where (l > 0L): ...
    }

can be translated to:

    if (x instanceof String
        && ((String) x).length() > 0) { String s = (String) x; ... }
    else if (x instanceof Integer
             && ((Integer) x) > 0) { Integer i = (Integer) x; ... }
    else if (x instanceof Long
             && ((Long) x) > 0L) { Long l = (Long) x; ... }

#### Performance concerns

The translation to `if-else` chains is simple (for switches without fallthrough), but is harder for the VM to optimize, because we've used a more general control flow mechanism. If the target is an empty `String`, which means we'd pass the first `instanceof` but fail the guard, class-hierarchy analysis could tell us that it can't possibly be an `Integer` or a `Long`, and so there's no need to perform those tests. But generating code that takes advantage of this information is more complex.

In the extreme case, where a switch consists entirely of type test patterns for final classes, this could be performed as an O(1) operation by hashing.
This is a common case, arising whenever we switch over the alternatives of a sum (sealed) type. (We shouldn't rely on finality at compile time, as this can change between compile and run time, but we should take advantage of it at run time if we can.)

Finally, the straightforward static translation may miss opportunities for optimization. For example:

    switch (x) {
        case Point p
            where p.x > 0 && p.y > 0: A
        case Point p
            where p.x > 0 && p.y == 0: B
    }

Here we would potentially test the target twice to see if it is a `Point`. We would also extract the `x` component twice and perform the `p.x > 0` test twice.

#### Optimization opportunities

The compiler can eliminate some redundant calculations through straightforward techniques. The previous switch can be transformed to:

    switch (x) {
        case Point p:
            if (p.x > 0 && p.y > 0) { A }
            else if (p.x > 0 && p.y == 0) { B }
    }

This eliminates the redundant `instanceof` and admits further common-subexpression elimination, such as evaluating `p.x > 0` only once.

#### Clause reordering

The above example was easy to transform because the two `case Point` clauses were adjacent. But what if they are not? In some cases, it is safe to reorder them. For types `T` and `U`, it is safe to reorder `case T` and `case U` if the two types have no intersection, that is, if no type can be a subtype of both. This is true when:

- `T` and `U` are classes and neither extends the other, or
- one is a final class and the other is an interface that the class does not implement.

The compiler could then reorder case clauses so that all the ones whose first test is `case Point` are adjacent, and then coalesce them all into a single arm of the `if-else` chain.

A possible spoiler here is fallthrough: if case A falls into case B, then cases A and B have to be moved as a group. (This is another reason to consider limiting fallthrough.)
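The coalescing described under "Optimization opportunities" can be checked with a small runnable sketch. It is hand-written, not compiler output, and assumes a hypothetical `Point` class with `int` fields `x` and `y`. The strings "A", "B", and "none" stand in for the two case arms and the no-match path. `naive` mirrors the straightforward `if-else` lowering, while `coalesced` performs the type test and cast once and hoists the shared `p.x > 0` condition.

```java
// Hand-written illustration; Point and the arm labels are hypothetical.
final class PointSwitch {

    static final class Point {
        final int x;
        final int y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    // Naive lowering: each guarded clause repeats the instanceof test,
    // the cast, and the p.x > 0 check.
    static String naive(Object x) {
        if (x instanceof Point && ((Point) x).x > 0 && ((Point) x).y > 0) {
            return "A";
        } else if (x instanceof Point && ((Point) x).x > 0 && ((Point) x).y == 0) {
            return "B";
        }
        return "none";
    }

    // Coalesced lowering: one type test and one cast, with the common
    // subexpression p.x > 0 hoisted out of both guards.
    static String coalesced(Object x) {
        if (x instanceof Point) {
            Point p = (Point) x;
            if (p.x > 0) {
                if (p.y > 0) return "A";
                if (p.y == 0) return "B";
            }
        }
        return "none";
    }

    public static void main(String[] args) {
        Object[] inputs = {new Point(1, 1), new Point(1, 0), new Point(0, 1), new Point(2, -1), "not a point"};
        for (Object o : inputs) {
            System.out.println(naive(o) + " " + coalesced(o));
        }
    }
}
```

Both methods return the same answer for every input; the coalesced one simply does less repeated work. Clause reordering aims to extend the same saving to clauses that are not adjacent in the source.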
A bigger possible spoiler here is separate compilation. If at compile time we see that `T` and `U` are disjoint types, do we want to bake that assumption into the compilation, or do we have to re-check that assumption at run time?

#### Summary of if-else translation

While the if-else translation at first looks pretty bad, we are able to eliminate a fair amount of redundancy through well-understood compiler transformations. If an N-way switch has only M distinct types in it, in most cases we can reduce the cost from _O(N)_ to _O(M)_. Sometimes _M == N_, so this doesn't help, but sometimes _M << N_ (and sometimes _N_ is small, in which case _O(N)_ is fine).

Reordering clauses involves some risk; specifically, that the class hierarchy will change between compile and run time. It seems eminently safe to reorder `String` and `Integer`. It is more questionable to reorder an arbitrary class `Foo` with `Runnable`, even if `Foo` doesn't implement `Runnable` now, because it might easily be changed to do so later. Ideally we'd like to perform class-hierarchy optimizations using the runtime hierarchy, not the compile-time hierarchy.

## Type classifiers

The technique outlined in _Part 1_, where we lower the complex switch to a dense `int` switch and use an indy-based classifier to select an index, is applicable here as well. First let's consider a switch consisting only of unguarded type-test patterns, optionally with a default clause.

We'll start with an `indy` bootstrap whose static arguments are `Class` constants corresponding to each arm of the switch, whose dynamic argument is the switch target, and whose return value is a case number (or distinguished sentinels for "no match" and `null`).
We can easily implement such a bootstrap with a linear search, but we can also do better. If some subset of the classes are `final`, we can choose among them more quickly (for example, via binary search on `hashCode()`, a hash function, or a hash table), and we need to perform only a single operation to test all of them at once. Dynamic techniques (such as building a hash map of previously seen target types), to which `indy` is well suited, can asymptotically approach _O(1)_ even when the classes involved are not final.

So we can lower:

    switch (x) {
        case T t: A
        case U u: B
        case V v: C
    }

to

    int y = indy[bootstrap=typeSwitch(T.class, U.class, V.class)](x)
    switch (y) {
        case 0: A
        case 1: B
        case 2: C
    }

This has several advantages:

- the generated code is very similar to the source code;
- we can (in some cases) get _O(1)_ dispatch performance;
- we can handle fallthrough with no additional complexity.

#### Guards

There are two approaches we could take to add support for guards into the process:

- teach the bootstrap about guards (and pass locals that appear in guard expressions as additional arguments to the classifier), or
- leave guards to the generated bytecode.

The latter seems far more attractive, but requires some tweaks to the bootstrap arguments and to the shape of the generated code.

If the classifier says "you have matched case #3", but then we fail the guard for #3, we want to go back into the classifier and start again at #4. (Sometimes the classifier can also use this information ("start over at #4") to optimize away unnecessary tests.)

We add a second argument (where to start) to the classifier invocation signature, and wrap the switch in a loop, lowering:

    switch (target) {
        case T t where (e1): A
        case T t where (e2): B
        case U u where (e3): C
    }

into

    int index = -1; // start at the top
    while (true) {
        index = indy[...](target, index)
        switch (index) {
            case 0: if (!e1) continue; A
            case 1: if (!e2) continue; B
            case 2: if (!e3) continue; C
            default: break;
        }
        break;
    }

Sometimes the same type test is repeated in consecutive positions (at N and N+1). We can have the static compiler coalesce such tests as above. Alternatively, the bootstrap can maintain a table so that, on re-entry where the previous answer was N, it can immediately return N+1. Similarly, if N and N+1 are known to be mutually exclusive types (like `String` and `Integer`), then on re-entering the classifier with N we can skip right to N+2: if we matched `String`, we cannot match `Integer`. Lookup tables for such optimizations can be built at call site linkage time.

#### Mixing constants and type tests

This approach also extends to switches that mix constant patterns and type-test patterns, such as:

    switch (x) {
        case "Foo": ...
        case 0L: ...
        case Integer i:
    }

We can extend the bootstrap protocol to accept constants as well as types, and it is a straightforward optimization to combine type matching and constant matching in a single pass.

## Nested patterns

Nested patterns are essentially guards; even if we don't expose guards in the language, we can desugar

    case Point(0, var x):

into the equivalent of

    case Point(var a, var x) && a matches 0:

using the same translation story as above:

1. Use the classifier to select a candidate case arm based on the top-level type of the pattern.
2. Perform the additional checks in the generated bytecode.
3. If those checks fail, continue and re-enter the classifier starting at the next case.

#### Explicit continue

An alternative to exposing guards is to expose an explicit `continue` statement in switch, which would have the effect of "keep matching at the next case." Then guards could be expressed imperatively as:

    case P:
        if (!guard)
            continue;
        ...
        break;
    case Q: ...

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From guy.steele at oracle.com Fri Apr 6 16:45:52 2018
From: guy.steele at oracle.com (Guy Steele)
Date: Fri, 6 Apr 2018 12:45:52 -0400
Subject: Switch translation
In-Reply-To: <3de80623-baf4-e8b1-f58c-2e3e52c52b2a@oracle.com>
References: <3de80623-baf4-e8b1-f58c-2e3e52c52b2a@oracle.com>
Message-ID: <6FDE8CBD-14BA-4A18-9030-877E9C664194@oracle.com>

Very comprehensive. Four groups of comments below (one at the very bottom).

> On Apr 6, 2018, at 11:51 AM, Brian Goetz wrote:
>
> The following outlines our story for translating improved switches, including both the switch improvements coming as part of JEP 325, and follow-on work to add pattern matching to switches. Much of this has been discussed already over the last year, but here it is in one place.
>
> # Switch Translation
> #### Maurizio Cimadamore and Brian Goetz
> #### April 2018
>
> ## Part 1 -- constant switches
>
> This part examines the current translation of `switch` constructs by `javac`, and proposes a more general translation for switching on primitives, boxes, strings, and enums, with the goals of:
>
> - Unify the treatment of `switch` variants, simplifying the compiler implementation and reducing the static footprint of generated code;
> - Move responsibility for target classification from compile time to run time, allowing us to more freely update the logic without updating the compiler.
>
> ## Current translation
>
> Switches on `int` (and the smaller integer primitives) are translated in one of two ways. If the labels are relatively dense, we translate an `int` switch to a `tableswitch`; if they are sparse, we translate to a `lookupswitch`. The current heuristic appears to be that we use a `tableswitch` if it results in a smaller bytecode than a `lookupswitch` (which uses twice as many bytes per entry), which is a reasonable heuristic.
>
> #### Switches on boxes
>
> Switches on primitive boxes are currently implemented as if they were primitive switches, unconditionally unboxing the target before entry (possibly throwing NPE).
>
> #### Switches on strings
>
> Switches on strings are implemented as a two-step process, exploiting the fact that strings cache their `hashCode()` and that hash codes are reasonably spread out. Given a switch on strings like the one below:
>
>     switch (s) {
>         case "Hello": ...
>         case "World": ...
>         default: ...
>     }
>
> The compiler desugars this into two separate switches, where the first switch maps the input strings into a range of numbers [0..1], as shown below, which can then be used in a subsequent plain switch on ints. The generated code unconditionally calls `hashCode()`, again possibly throwing NPE.
>
>     int index=-1;
>     switch (s.hashCode()) {
>         case 12345: if (!s.equals("Hello")) break; index = 1; break;
>         case 6789: if (!s.equals("World")) break; index = 0; break;
>         default: index = -1;
>     }
>     switch (index) {
>         case 0: ...
>         case 1: ...
>         default: ...
>     }
>
> If there are hash collisions between the strings, the first switch must try all possible matching strings.

Minor point: it is unclear why the default case has to assign -1 to index, when it is already initialized to -1.

I see why you use this structure, because it fits a general paradigm of first mapping to an integer. However, a post-optimization might be able to turn such a structure, where the assignments to "index" are all constants rather than the results of calling some opaque classifier method, into a control structure with a single switch statement and no use of the intermediate integer encoding. I'll show a more general example to give you the idea:

  switch (s) {
    case "Hello": stmts1;
    case "World": stmts2;
    case "Goodbye": stmts3;
    default: stmtsD;
  }

and suppose that "Hello" and "Goodbye" happen to have the same hashcode.
It might be transformed into:

  int index=-1;
  switch (s.hashCode()) {
    case 12345: if (s.equals("Hello")) index = 0; else if (s.equals("Goodbye")) index = 2; break;
    case 6789: if (s.equals("World")) index = 1; break;
    default: break;
  }
  switch (index) {
    case 0: stmts1;
    case 1: stmts2;
    case 2: stmts3;
    default: stmtsD;
  }

I now suggest that a post-optimization might then turn this into:

  SUCCESS: {
    DEFAULT: {
      switch (s.hashCode()) {
        case 12345: if (s.equals("Hello")) { stmts1; break SUCCESS; }
                    else if (s.equals("Goodbye")) { stmts3; break SUCCESS; }
                    else break DEFAULT;
        case 6789: if (s.equals("World")) { stmts2; break SUCCESS; }
                   else break DEFAULT;
        default: break DEFAULT;
      }
      break SUCCESS;
    }
    do { stmtsD; } while (false);
  }

where `SUCCESS` and `DEFAULT` are suitably generated fresh statement labels.

(You might think that simply

  SUCCESS: {
    switch (s.hashCode()) {
      case 12345: if (s.equals("Hello")) { stmts1; break SUCCESS; }
                  else if (s.equals("Goodbye")) { stmts3; break SUCCESS; }
                  else break;
      case 6789: if (s.equals("World")) { stmts2; break SUCCESS; }
                 else break;
      default: break;
    }
    stmtsD;
  }

would do the job. However, the first version, using a `DEFAULT` label and a dummy `do` statement, allows stmts1, stmts2, stmts3, and stmtsD to contain `break` statements. Of course, that's assuming use of only surface syntax to express the transformations; the compiler can probably be smarter than that in practice.)

> #### Switches on enums
>
> Switches on `enum` constants exploit the fact that enums have (usually dense) integral ordinal values. Unfortunately, because an ordinal value can change between compilation time and runtime, we cannot rely on this mapping directly, but instead need to do an extra layer of mapping. Given a switch like:
>
>     switch(color) {
>         case RED: ...
>         case GREEN: ...
>     }
>
> The compiler numbers the cases starting at 1 (as with string switch), and creates a synthetic class that maps the runtime values of the enum ordinals to the statically numbered cases:

Inconsistency: in the string example above, you actually numbered the cases 0 and 1, not 1 and 2.

>
>     class Outer$0 {
>         synthetic final int[] $EnumMap$Color = new int[Color.values().length];
>         static {
>             try { $EnumMap$Color[RED.ordinal()] = 1; } catch (NoSuchFieldError ex) {}
>             try { $EnumMap$Color[GREEN.ordinal()] = 2; } catch (NoSuchFieldError ex) {}
>         }
>         ...
>     }
>
> Then, the switch is translated as follows:
>
>     switch(Outer$0.$EnumMap$Color[color.ordinal()]) {
>         case 1: stmt1;
>         case 2: stmt2
>     }

Presumably for this example the chosen integers start with 1 rather than 0. That way, if any element of the array is not explicitly initialized by Outer$0, its default 0 value will not be confused with an actual enum value. This subtle point should be mentioned explicitly.

An interesting question: should a "case 0: throw new Exception(...);" be supplied? The argument would be that it's okay for the programmer to ignore enum values presumably known at compile time, but not to ignore values that sneaked in later. (If the desire is really to ignore all such values, known and unknown, the programmer can always write "default: break;".)

> In other words, we construct an array whose size is the cardinality of the enum, and then the element at position *i* of such array will contain the case index corresponding to the enum constant whose ordinal is *i*.
>
> ## A more general scheme
>
> The handling of strings and enums gives us a hint of how to create a more regular scheme; for `switch` targets more complex than `int`, we lower the `switch` to an `int` switch with consecutive `case` labels, and use a separate process to map the target into the range of synthetic case labels.
>
> Now that we have `invokedynamic` in our toolbox, we can reduce all of the non-`int` cases to a single form. We number the cases with consecutive integers and perform case selection via an `invokedynamic`-based classifier function, whose static argument list receives a description of the actual targets, and which returns an `int` identifying what `case` to select.
>
> This approach has several advantages:
> - Reduced compiler complexity -- all switches follow a common pattern;
> - Reduced static code size;
> - The classification function can select from a wide range of strategies (linear search, binary search, building a `HashMap`, constructing a perfect hash function, etc.), which can vary over time or from situation to situation;
> - We are free to improve the strategy or select an alternate strategy (say, to optimize for startup time) without having to recompile the code;
> - It is hopefully at least as JIT-friendly as the existing translation, if not more so.
>
> We can also use this approach in preference to `lookupswitch` for non-dense `int` switches, as well as use it to extend `switch` to handle `long`, `float`, and `double` targets (which were surely excluded in part because the JVM didn't provide a convenient translation target for these types).
>
> #### Bootstrap design
>
> When designing the `invokedynamic` bootstraps to support this translation, we face the classic lumping-vs-splitting decision. For now, we'll bias towards splitting. In the following example, `BOOTSTRAP_PREAMBLE` indicates the usual leading arguments for an indy bootstrap. We assume the compiler has numbered the case values densely from 0 to N-1, and the bootstrap will return a value in [0, N) for success, or N for "no match".
>
> A strawman design might be:
>
>     // Numeric switches for P, accepts invocation as P -> I or Box(P) -> I
>     CallSite intSwitch(BOOTSTRAP_PREAMBLE, int... caseValues)
>
>     // Switch for String, invocation descriptor is String -> I
>     CallSite stringSwitch(BOOTSTRAP_PREAMBLE, String... caseValues)
>
>     // Switch for Enum, invocation descriptor is E -> I
>     CallSite enumSwitch(BOOTSTRAP_PREAMBLE, Class>> clazz,
>                         String... caseNames)
>
> It might be possible to encode all of these into a single bootstrap. However, the compiler already treats each type slightly differently, so there seems to be little value in this sort of lumping for non-pattern switches.
>
> The `enumSwitch` bootstrap as proposed uses `String` values to describe the enum constants, rather than encoding the enum constants directly via condy. This makes us more robust to enum constants disappearing after compilation.
>
> This strategy also depends on having lifted the limit of 253 bootstrap arguments in indy/condy.
>
> #### Extending to other primitive types
>
> This approach extends naturally to other primitive types (long, double, float) by adding more bootstraps (which need to deal with the additional complexities of infinity, NaN, etc.):
>
>     CallSite longSwitch(BOOTSTRAP_PREAMBLE, long... caseValues)
>     CallSite floatSwitch(BOOTSTRAP_PREAMBLE, float... caseValues)
>     CallSite doubleSwitch(BOOTSTRAP_PREAMBLE, double... caseValues)
>
> #### Extending to null
>
> The scheme as proposed above does not explicitly handle nulls, which is a feature we'd like to have in `switch`. There are a few ways we could add null handling into the API:
>
> - Split entry points into null-friendly or null-hostile switches;
> - Find a way to encode nulls in the array of case values (which can be done with condy);
> - Always treat null as a possible input and a distinguished output, and have the compiler ensure the switch can handle this distinguished output.
>
> The last strategy is appealing and straightforward: assign a sentinel value (-1) to `null`, and always return this sentinel when the input is null.
> The compiler ensures that some case handles `null`; if no case handles `null`, it inserts an implicit
>
>     case -1: throw new NullPointerException();
>
> into the generated code.
>
> #### General example
>
> If we have a string switch:
>
>     switch (x) {
>         case "Foo": m(); break;
>         case "Bar": n(); // fall through
>         case "Baz": r(); break;
>         default: p();
>     }
>
> we translate into:
>
>     int t = indy[bsm=stringSwitch["Foo", "Bar", "Baz"]](x)
>     switch (t) {
>         case -1: throw new NullPointerException(); // implicit null case
>         case 0: m(); break;
>         case 1: n(); // fall through
>         case 2: r(); break;
>         case 3: p(); // default case
>     }
>
> All switches, with the exception of `int` switches (and maybe not even non-dense `int` switches), follow this exact pattern. If the target type is not a reference type, the `null` case is not needed.
>
> This strategy is implemented in the `switch` branch of the amber repository; see `java.lang.runtime.SwitchBootstraps` in that branch for (rough!) implementations of the bootstraps.
>
> ## Patterns in narrow-target switches
>
> When we add patterns, we may encounter switches whose targets are tightly typed (e.g., `String` or `int`) but still use some patterns in their expression. For switches whose target type is a primitive, primitive box, `String`, or `enum`, we'd like to use the optimized translation strategy outlined here, but the following kinds of patterns might still show up in a switch on, say, `Integer`:
>
>     case var x:
>     case _:
>     case Integer x:
>     case Integer(var x):
>
> The first three can be translated away by the source compiler, as they are semantically equivalent to `default`. If any nontrivial patterns are present (including deconstruction patterns), we may need to translate using the pattern switch scheme -- see Part 2. (While the language may not distinguish between "legacy" and "pattern" switches -- in that all switches are pattern switches -- we'd like to avoid giving up obvious optimizations if we can.)
>
> # Part 2 -- type test patterns and guards
>
> A key motivation for reexamining switch translation is the impending arrival of patterns in switch. We expect switch translation for the pattern case to follow a similar structure -- lower to an `int` switch and use an indy-based classifier to select an index. However, there are a few additional complexities. One is that pattern cases may have guards, which means we need to be able to re-enter the bootstrap with an indication to "continue matching from case N" in the event of a failed guard. (Even if the language doesn't support guards directly, the obvious implementation strategy for nested patterns is to desugar them into guards.)
>
> Translating pattern switches is more complicated because there are more options for how to divide the work between the statically generated code and the switch classifier. Different choices have different performance side-effects, for example whether binding variables are "boxed" into a tuple to be returned, or must be redundantly recalculated.
>
> ## Type-test patterns
>
> Type-test patterns are notable because their applicability predicate is based purely on the type system. The compiler can therefore reason about it directly, both statically (using flow analysis, optimizing away dynamic type tests) and dynamically (with `instanceof`). A switch involving type tests:
>
>     switch (x) {
>         case String s: ...
>         case Integer i: ...
>         case Long l: ...
>     }
>
> can (among other strategies) be translated into a chain of `if-else` using `instanceof` and casts:
>
>     if (x instanceof String) { String s = (String) x; ... }
>     else if (x instanceof Integer) { Integer i = (Integer) x; ... }
>     else if (x instanceof Long) { Long l = (Long) x; ... }
>
> #### Guards
>
> The `if-else` desugaring can also naturally handle guards:
>
>     switch (x) {
>         case String s
>             where (s.length() > 0): ...
>         case Integer i
>             where (i > 0): ...
>         case Long l
>             where (l > 0L): ...
>     }
>
> can be translated to:
>
>     if (x instanceof String
>         && ((String) x).length() > 0) { String s = (String) x; ... }
>     else if (x instanceof Integer
>              && ((Integer) x) > 0) { Integer i = (Integer) x; ... }
>     else if (x instanceof Long
>              && ((Long) x) > 0L) { Long l = (Long) x; ... }
>
> #### Performance concerns
>
> The translation to `if-else` chains is simple (for switches without fallthrough), but is harder for the VM to optimize, because we've used a more general control flow mechanism. Suppose the target is an empty `String`: we'd pass the first `instanceof` but fail the guard. Class-hierarchy analysis could tell us that the target can't possibly be an `Integer` or a `Long`, so there's no need to perform those tests. But generating code that takes advantage of this information is more complex.
>
> In the extreme case, where a switch consists entirely of type test patterns for final classes, this could be performed as an O(1) operation by hashing. This is a common case, arising whenever we switch over the alternatives of a sum (sealed) type. (We shouldn't rely on finality at compile time, as this can change between compile and run time, but we should take advantage of it at run time if we can.)
>
> Finally, the straightforward static translation may miss opportunities for optimization. For example:
>
>     switch (x) {
>         case Point p
>             where p.x > 0 && p.y > 0: A
>         case Point p
>             where p.x > 0 && p.y == 0: B
>     }
>
> Here we would potentially test the target twice to see if it is a `Point`. We would also extract the `x` component twice and perform the `p.x > 0` test twice.
>
> #### Optimization opportunities
>
> The compiler can eliminate some redundant calculations through straightforward techniques.
> The previous switch can be transformed to:
>
>     switch (x) {
>         case Point p:
>             if (p.x > 0 && p.y > 0) { A }
>             else if (p.x > 0 && p.y == 0) { B }
>     }
>
> This eliminates the redundant `instanceof` and admits further common-subexpression elimination, such as evaluating `p.x > 0` only once.
>
> #### Clause reordering
>
> The above example was easy to transform because the two `case Point` clauses were adjacent. But what if they are not? In some cases, it is safe to reorder them. For types `T` and `U`, it is safe to reorder `case T` and `case U` if the two types have no intersection, that is, if no type can be a subtype of both. This is true when:
>
> - `T` and `U` are classes and neither extends the other, or
> - one is a final class and the other is an interface that the class does not implement.
>
> The compiler could then reorder case clauses so that all the ones whose first test is `case Point` are adjacent, and then coalesce them all into a single arm of the `if-else` chain.
>
> A possible spoiler here is fallthrough: if case A falls into case B, then cases A and B have to be moved as a group. (This is another reason to consider limiting fallthrough.)
>
> A bigger possible spoiler here is separate compilation. If at compile time we see that `T` and `U` are disjoint types, do we want to bake that assumption into the compilation, or do we have to re-check that assumption at run time?
>
> #### Summary of if-else translation
>
> While the if-else translation at first looks pretty bad, we are able to eliminate a fair amount of redundancy through well-understood compiler transformations. If an N-way switch has only M distinct types in it, in most cases we can reduce the cost from _O(N)_ to _O(M)_. Sometimes _M == N_, so this doesn't help, but sometimes _M << N_ (and sometimes _N_ is small, in which case _O(N)_ is fine).
>
> Reordering clauses involves some risk; specifically, that the class hierarchy will change between compile and run time.
> It seems eminently safe to reorder `String` and `Integer`. It is more questionable to reorder an arbitrary class `Foo` with `Runnable`, even if `Foo` doesn't implement `Runnable` now, because it might easily be changed to do so later. Ideally we'd like to perform class-hierarchy optimizations using the runtime hierarchy, not the compile-time hierarchy.
>
> ## Type classifiers
>
> The technique outlined in _Part 1_, where we lower the complex switch to a dense `int` switch and use an indy-based classifier to select an index, is applicable here as well. First let's consider a switch consisting only of unguarded type-test patterns, optionally with a default clause.
>
> We'll start with an `indy` bootstrap whose static arguments are `Class` constants corresponding to each arm of the switch, whose dynamic argument is the switch target, and whose return value is a case number (or distinguished sentinels for "no match" and `null`). We can easily implement such a bootstrap with a linear search, but we can also do better. If some subset of the classes are `final`, we can choose among them more quickly (for example, via binary search on `hashCode()`, a hash function, or a hash table), and we need to perform only a single operation to test all of them at once. Dynamic techniques (such as building a hash map of previously seen target types), to which `indy` is well suited, can asymptotically approach _O(1)_ even when the classes involved are not final.
>
> So we can lower:
>
>     switch (x) {
>         case T t: A
>         case U u: B
>         case V v: C
>     }
>
> to
>
>     int y = indy[bootstrap=typeSwitch(T.class, U.class, V.class)](x)
>     switch (y) {
>         case 0: A
>         case 1: B
>         case 2: C
>     }
>
> This has several advantages:
>
> - the generated code is very similar to the source code;
> - we can (in some cases) get _O(1)_ dispatch performance;
> - we can handle fallthrough with no additional complexity.
>
> #### Guards
>
> There are two approaches we could take to add support for guards into the process:
>
> - teach the bootstrap about guards (and pass locals that appear in guard expressions as additional arguments to the classifier), or
> - leave guards to the generated bytecode.
>
> The latter seems far more attractive, but requires some tweaks to the bootstrap arguments and to the shape of the generated code.
>
> If the classifier says "you have matched case #3", but then we fail the guard for #3, we want to go back into the classifier and start again at #4. (Sometimes the classifier can also use this information ("start over at #4") to optimize away unnecessary tests.)
>
> We add a second argument (where to start) to the classifier invocation signature, and wrap the switch in a loop, lowering:
>
>     switch (target) {
>         case T t where (e1): A
>         case T t where (e2): B
>         case U u where (e3): C
>     }
>
> into
>
>     int index = -1; // start at the top
>     while (true) {
>         index = indy[...](target, index)
>         switch (index) {
>             case 0: if (!e1) continue; A
>             case 1: if (!e2) continue; B
>             case 2: if (!e3) continue; C
>             default: break;
>         }
>         break;
>     }
>
> Sometimes the same type test is repeated in consecutive positions (at N and N+1). We can have the static compiler coalesce such tests as above. Alternatively, the bootstrap can maintain a table so that, on re-entry where the previous answer was N, it can immediately return N+1. Similarly, if N and N+1 are known to be mutually exclusive types (like `String` and `Integer`), then on re-entering the classifier with N we can skip right to N+2: if we matched `String`, we cannot match `Integer`. Lookup tables for such optimizations can be built at call site linkage time.
>
> #### Mixing constants and type tests
>
> This approach also extends to switches that mix constant patterns and type-test patterns, such as:
>
>     switch (x) {
>         case "Foo": ...
>         case 0L: ...
> case Integer i: > } > > We can extend the bootstrap protocol to accept constants as well as types, and it is a straightforward optimization to combine both type matching and constant matching in a single pass. > > ## Nested patterns > > Nested patterns are essentially guards; even if we don't expose guards in the language, we can desugar > > case Point(0, var x): > > into the equivalent of > > case Point(var a, var x) && a matches 0: > > using the same translation story as above -- use the classifier to select a candidate case arm based on the top-type of the pattern, and then do additional checks in the generated bytecode, and if the checks fail, continue and re-enter the classifier starting at the next case. > > #### Explicit continue > > An alternative to exposing guards is to expose an explicit `continue` statement in switch, which would have the effect of "keep matching at the next case." Then guards could be expressed imperatively as: > > case P: > if (!guard) > continue; > ... > break; > case Q: ? > A nice idea, but careful: it is already meaningful to write: while (?) { switch (?) { case 1: ? case 2: if (foo) continue; ? } } and expect the `continue` to start a new iteration of the `while` loop. Indeed, this fact was already exploited above under ?### Guards?. If you really want to introduce the idea of ?continuing a switch dispatch" into the surface syntax, even if only for expository purposes, let me suggest the form `continue switch;`. switch (?) { case P: if (!guard) continue switch; ... break; case Q: ? } ?Guy -------------- next part -------------- An HTML attachment was scrubbed... 
From brian.goetz at oracle.com Fri Apr 6 17:58:24 2018 From: brian.goetz at oracle.com (Brian Goetz) Date: Fri, 6 Apr 2018 13:58:24 -0400 Subject: Switch translation In-Reply-To: <6FDE8CBD-14BA-4A18-9030-877E9C664194@oracle.com> References: <3de80623-baf4-e8b1-f58c-2e3e52c52b2a@oracle.com> <6FDE8CBD-14BA-4A18-9030-877E9C664194@oracle.com> Message-ID: <2b413b07-1c48-216c-a010-552d6b75f52f@oracle.com> >> >> int index=-1; >> switch (s.hashCode()) { >> case 12345: if (!s.equals("Hello")) break; index = 1; break; >> case 6789: if (!s.equals("World")) break; index = 0; break; >> default: index = -1; >> } >> switch (index) { >> case 0: ... >> case 1: ... >> default: ... >> } >> >> If there are hash collisions between the strings, the first switch >> must try all possible matching strings. > > I see why you use this structure, because it fits a general paradigm > of first mapping to an integer. Or, "used", since this is the historical strategy which we're tossing over for the indy-based one. > I now suggest that a post-optimization might then turn this into: > > SUCCESS: { > DEFAULT: { > switch (s.hashCode()) { > case 12345: if (s.equals("Hello")) { stmts1; break SUCCESS; } > else if (s.equals("Goodbye")) { stmts3; break SUCCESS; } else break > DEFAULT; Yes; the thing that pushed us to this translation was fallthrough and other weird control flow; by lowering the string-switch to an int-switch, the control structure is preserved, so any complex control flow comes along "for free" by existing int-switch translation. Of course, it's not free; we pay with a pre-switch. (When we added strings in switch, it was part of "Project Coin", whose mandate was "small features", so it was preferable at the time to choose a simpler but less efficient desugaring.)
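To make the two shapes concrete, here is a hedged sketch in plain Java of the historical two-step lowering (hash switch to a dense index, then an `int` switch) and of Guy's labeled-block post-optimization, for a switch without fallthrough. The labels and result strings are invented. The hash constants are the real `String.hashCode()` values, precomputed because `case` labels must be compile-time constants. `"Aa"` and `"BB"` are used because they genuinely collide, which exercises the "try all possible matching strings" path.

```java
class StringSwitchShapes {
    static final int HASH_HELLO = 69609650;   // "Hello".hashCode()
    static final int HASH_AA_BB = 2112;       // "Aa".hashCode() == "BB".hashCode()

    // Historical shape: map the string to a dense index first, then switch on it.
    static String twoStep(String s) {
        int index = -1;
        switch (s.hashCode()) {               // NPE if s is null, as in the old translation
            case HASH_HELLO:
                if (s.equals("Hello")) index = 0;
                break;
            case HASH_AA_BB:                  // collision: try every label with this hash
                if (s.equals("Aa")) index = 1;
                else if (s.equals("BB")) index = 2;
                break;
        }
        switch (index) {
            case 0:  return "hello";
            case 1:  return "aa";
            case 2:  return "bb";
            default: return "default";
        }
    }

    // Post-optimized shape (only valid without fallthrough): run arm bodies directly.
    static String direct(String s) {
        String result;
        SUCCESS: {
            DEFAULT: {
                switch (s.hashCode()) {
                    case HASH_HELLO:
                        if (s.equals("Hello")) { result = "hello"; break SUCCESS; }
                        break DEFAULT;
                    case HASH_AA_BB:
                        if (s.equals("Aa")) { result = "aa"; break SUCCESS; }
                        else if (s.equals("BB")) { result = "bb"; break SUCCESS; }
                        else break DEFAULT;
                    default:
                        break DEFAULT;
                }
            }
            result = "default";               // body of the default arm
        }
        return result;
    }
}
```

Both methods agree on every input. The second avoids the intermediate index, but only because no arm falls through into another.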
> > >> #### Switches on enums >> >> Switches on `enum` constants exploit the fact that enums have >> (usually dense) integral ordinal values. Unfortunately, because an >> ordinal value can change between compilation time and runtime, we >> cannot rely on this mapping directly, but instead need to do an extra >> layer of mapping. Given a switch like: >> >> switch(color) { >> case RED: ... >> case GREEN: ... >> } >> >> The compiler numbers the cases starting at 1 (as with string switch), >> and creates a synthetic class that maps the runtime values of the >> enum ordinals to the statically numbered cases: > > Inconsistency: in the string example above, you actually numbered the > cases 0 and 1, not 1 and 2. The old way, where the compiler generated the transform table (Java 5 and later), used 1-origin, for the reason you surmise. The new, indy-based translation uses 0, like the String example. > > Presumably for this example the chosen integers start with 1 rather > than 0, so that if any element of the array is not explicitly > initialized by Outer$0, its default 0 value will not be confused with > an actual enum value. This subtle point should be mentioned explicitly. Yes, that's exactly why the historical approach did it this way. The new way (which is uniform with other indy-based switch types) takes care of this by pre-filling the array with the index that indicates "default" at linkage time. From SwitchBootstraps::enumSwitch: ordinalMap = new int[enumClass.getEnumConstants().length]; Arrays.fill(ordinalMap, enumNames.length); for (int i=0; i >> >> #### Explicit continue >> >> An alternative to exposing guards is to expose an explicit `continue` >> statement in switch, which would have the effect of "keep matching at >> the next case." Then guards could be expressed imperatively as: >> >> case P: >> if (!guard) >> continue; >> ... >> break; >> case Q: …
>> > A nice idea, but careful: it is already meaningful to write: > > while (…) { switch (…) { case 1: … case 2: if (foo) continue; … } } > > and expect the `continue` to start a new iteration of the `while` > loop. Indeed, this fact was already exploited above under "### Guards". Yes. One of the downsides of exposing `continue` is that currently the (switch, continue) entry in my table from "Disallowing break label and continue label inside expression switch" has a P instead of an X, meaning that continue is currently allowed in a switch if there's an enclosing continue-able context. So this could be disambiguated as you say with "continue switch", or with requiring a label in some or all circumstances. From guy.steele at oracle.com Fri Apr 6 18:38:25 2018 From: guy.steele at oracle.com (Guy Steele) Date: Fri, 6 Apr 2018 14:38:25 -0400 Subject: Switch translation In-Reply-To: <2b413b07-1c48-216c-a010-552d6b75f52f@oracle.com> References: <3de80623-baf4-e8b1-f58c-2e3e52c52b2a@oracle.com> <6FDE8CBD-14BA-4A18-9030-877E9C664194@oracle.com> <2b413b07-1c48-216c-a010-552d6b75f52f@oracle.com> Message-ID: > On Apr 6, 2018, at 1:58 PM, Brian Goetz wrote: > > >>> >>> int index=-1; >>> switch (s.hashCode()) { >>> case 12345: if (!s.equals("Hello")) break; index = 1; break; >>> case 6789: if (!s.equals("World")) break; index = 0; break; >>> default: index = -1; >>> } >>> switch (index) { >>> case 0: ... >>> case 1: ... >>> default: ... >>> } >>> >>> If there are hash collisions between the strings, the first switch must try all possible matching strings. >> >> I see why you use this structure, because it fits a general paradigm of first mapping to an integer. > > Or, "used", since this is the historical strategy which we're tossing over for the indy-based one. Sorry, I incorrectly interpreted some of the transitional text.
> >> I now suggest that a post-optimization might then turn this into: >> >> SUCCESS: { >> DEFAULT: { >> switch (s.hashCode()) { >> case 12345: if (s.equals("Hello")) { stmts1; break SUCCESS; } else if (s.equals("Goodbye")) { stmts3; break SUCCESS; } else break DEFAULT; > > Yes; the thing that pushed us to this translation was fallthrough and other weird control flow; by lowering the string-switch to an int-switch, the control structure is preserved, so any complex control flow comes along "for free" by existing int-switch translation. Of course, it's not free; we pay with a pre-switch. (When we added strings in switch, it was part of "Project Coin", whose mandate was "small features", so it was preferable at the time to choose a simpler but less efficient desugaring.) Oops, I forgot about preserving fallthrough. Yuck. "Never mind." Well, the post-optimization can still be used in situations where no fallthrough occurs. Can decide later whether it is worth the trouble to avoid the integer encoding. From forax at univ-mlv.fr Fri Apr 6 21:22:53 2018 From: forax at univ-mlv.fr (Remi Forax) Date: Fri, 6 Apr 2018 23:22:53 +0200 (CEST) Subject: Switch translation In-Reply-To: <3de80623-baf4-e8b1-f58c-2e3e52c52b2a@oracle.com> References: <3de80623-baf4-e8b1-f58c-2e3e52c52b2a@oracle.com> Message-ID: <1592418474.1146193.1523049772989.JavaMail.zimbra@u-pem.fr> > From: "Brian Goetz" > To: "amber-spec-experts" > Sent: Friday, 6 April 2018 17:51:49 > Subject: Switch translation > The following outlines our story for translating improved switches, including > both the switch improvements coming as part of JEP 325, and follow-on work to > add pattern matching to switches. Much of this has been discussed already over > the last year, but here it is in one place.
> # Switch Translation > #### Maurizio Cimadamore and Brian Goetz > #### April 2018 > ## Part 1 -- constant switches > This part examines the current translation of `switch` constructs by `javac`, > and proposes a more general translation for switching on primitives, boxes, > strings, and enums, with the goals of: > - Unify the treatment of `switch` variants, simplifying the compiler > implementation and reducing the static footprint of generated code; > - Move responsibility for target classification from compile time to run time, > allowing us to more freely update the logic without updating the compiler. > ## Current translation > Switches on `int` (and the smaller integer primitives) are translated in one of > two ways. If the labels are relatively dense, we translate an `int` switch to a > `tableswitch`; if they are sparse, we translate to a `lookupswitch`. The > current heuristic appears to be that we use a `tableswitch` if it results in a > smaller bytecode than a `lookupswitch` (which uses twice as many bytes per > entry), which is a reasonable heuristic. > #### Switches on boxes > Switches on primitive boxes are currently implemented as if they were primitive > switches, unconditionally unboxing the target before entry (possibly throwing > NPE). > #### Switches on strings > Switches on strings are implemented as a two-step process, exploiting the fact > that strings cache their `hashCode()` and that hash codes are reasonably spread > out. Given a switch on strings like the one below: > switch (s) { > case "Hello": ... > case "World": ... > default: ... > } > The compiler desugars this into two separate switches, where the first switch > maps the input strings into a range of numbers [0..1], as shown below, which > can then be used in a subsequent plain switch on ints. The generated code > unconditionally calls `hashCode()`, again possibly throwing NPE.
> int index=-1; > switch (s.hashCode()) { > case 12345: if (!s.equals("Hello")) break; index = 1; break; > case 6789: if (!s.equals("World")) break; index = 0; break; > default: index = -1; > } > switch (index) { > case 0: ... > case 1: ... > default: ... > } > If there are hash collisions between the strings, the first switch must try all > possible matching strings. > #### Switches on enums > Switches on `enum` constants exploit the fact that enums have (usually dense) > integral ordinal values. Unfortunately, because an ordinal value can change > between compilation time and runtime, we cannot rely on this mapping directly, > but instead need to do an extra layer of mapping. Given a switch like: > switch(color) { > case RED: ... > case GREEN: ... > } > The compiler numbers the cases starting at 1 (as with string switch), and creates > a synthetic class that maps the runtime values of the enum ordinals to the > statically numbered cases: > class Outer$0 { > synthetic static final int[] $EnumMap$Color = new int[Color.values().length]; > static { > try { $EnumMap$Color[RED.ordinal()] = 1; } catch (NoSuchFieldError ex) {} > try { $EnumMap$Color[GREEN.ordinal()] = 2; } catch (NoSuchFieldError ex) {} > } > } > Then, the switch is translated as follows: > switch(Outer$0.$EnumMap$Color[color.ordinal()]) { > case 1: stmt1; > case 2: stmt2; > } > In other words, we construct an array whose size is the cardinality of the enum, > and then the element at position _i_ of this array contains the case > index corresponding to the enum constant whose ordinal is _i_. > ## A more general scheme > The handling of strings and enums gives us a hint of how to create a more regular > scheme; for `switch` targets more complex than `int`, we lower the `switch` to > an `int` switch with consecutive `case` labels, and use a separate process to > map the target into the range of synthetic case labels.
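The following sketch contrasts the two enum schemes discussed in this thread. The first is the historical 1-origin table in a synthetic holder class, where an untouched 0 entry means "no case". The second is a link-time table that is 0-origin and pre-filled with N, the "default" index. The second half only approximates what Brian quotes from `SwitchBootstraps::enumSwitch` (the archive cuts off its loop body); `ordinalMap`, the invented `PURPLE` label, and skipping missing names via `Enum.valueOf` are assumptions made for illustration.

```java
import java.util.Arrays;

enum Color { RED, GREEN, BLUE }

class EnumSwitchMaps {
    // Historical scheme (sketch): holder named after javac's synthetic Outer$0.
    // Constants no case names (here BLUE) keep the default 0, meaning "no case".
    static final class Outer$0 {
        static final int[] $EnumMap$Color = new int[Color.values().length];
        static {
            try { $EnumMap$Color[Color.RED.ordinal()] = 1; } catch (NoSuchFieldError ex) { }
            try { $EnumMap$Color[Color.GREEN.ordinal()] = 2; } catch (NoSuchFieldError ex) { }
        }
    }

    static String legacy(Color color) {
        switch (Outer$0.$EnumMap$Color[color.ordinal()]) {
            case 1:  return "red";
            case 2:  return "green";
            default: return "other";      // 0: no case label for this constant
        }
    }

    // Link-time scheme (sketch): 0-origin case numbers; every slot starts as
    // N = caseNames.length, the "default" index.
    static <E extends Enum<E>> int[] ordinalMap(Class<E> enumClass, String... caseNames) {
        int[] map = new int[enumClass.getEnumConstants().length];
        Arrays.fill(map, caseNames.length);
        for (int i = 0; i < caseNames.length; i++) {
            try {
                map[Enum.valueOf(enumClass, caseNames[i]).ordinal()] = i;
            } catch (IllegalArgumentException missing) {
                // no such constant at run time: this case simply never matches
            }
        }
        return map;
    }

    // "PURPLE" stands for a constant that existed at compile time but was removed.
    static final int[] COLOR_MAP = ordinalMap(Color.class, "RED", "GREEN", "PURPLE");

    static String linked(Color color) {
        switch (COLOR_MAP[color.ordinal()]) {
            case 0:  return "red";
            case 1:  return "green";
            case 2:  return "purple";     // never selected
            default: return "other";      // 3 == caseNames.length
        }
    }
}
```

In both versions, `color.ordinal()` throws `NullPointerException` for a `null` target, matching the "possibly throwing NPE" behavior described for boxes and strings.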
> Now that we have `invokedynamic` in our toolbox, we can reduce all of the > non-`int` cases to a single form, where we number the cases with consecutive > integers, and perform case selection via an `invokedynamic`-based classifier > function, whose static argument list receives a description of the actual > targets, and which returns an `int` identifying what `case` to select. > This approach has several advantages: > - Reduced compiler complexity -- all switches follow a common pattern; > - Reduced static code size; > - The classification function can select from a wide range of strategies (linear > search, binary search, building a `HashMap`, constructing a perfect hash > function, etc), which can vary over time or from situation to situation; > - We are free to improve the strategy or select an alternate strategy (say, to > optimize for startup time) without having to recompile the code; > - Hopefully at least as JIT-friendly as, if not more than, the existing translation. > We can also use this approach in preference to `lookupswitch` for non-dense > `int` switches, as well as use it to extend `switch` to handle `long`, `float`, > and `double` targets (which were surely excluded in part because the JVM didn't > provide a convenient translation target for these types.) It seems to be a good general approach, but it has several drawbacks: - it does not work well with the type switch, because the instanceof part (at least the part that recognizes the type) will be inside the invokedynamic while the cast part will be in the tableswitch, so there is little chance that the VM can optimize such a construction to avoid doing the instanceof/checkcast twice. - it is not optimal in terms of bytecode size for an expression switch that has no side effects on local variables, because there is a better representation where each case is desugared as a static method, as for a lambda. In that case, you do not need a tableswitch; an invokedynamic is enough.
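Remi's second point can be illustrated with method handles. If the arms of an expression switch have no side effects on locals, each arm can become a static method, and dispatch reduces to selecting and invoking a handle, with no `tableswitch` in the caller. This is a hypothetical sketch: `arm0`, `arm1`, `armDefault`, and the `Map`-based lookup are invented. In a real translation, the selection would sit behind the `invokedynamic` call site rather than in a helper method like `eval`.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.Map;

class ArmsAsMethods {
    // Each arm of the (side-effect-free) expression switch becomes a static method.
    static String arm0(String s) { return "foo:" + s; }
    static String arm1(String s) { return "bar:" + s; }
    static String armDefault(String s) { return "default:" + s; }

    private static final Map<String, Integer> INDEX = Map.of("Foo", 0, "Bar", 1);
    private static final MethodHandle[] ARMS;
    static {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            MethodType mt = MethodType.methodType(String.class, String.class);
            ARMS = new MethodHandle[] {
                lookup.findStatic(ArmsAsMethods.class, "arm0", mt),
                lookup.findStatic(ArmsAsMethods.class, "arm1", mt),
                lookup.findStatic(ArmsAsMethods.class, "armDefault", mt)
            };
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Classify, then invoke the selected arm directly: no int switch over bodies.
    static String eval(String s) {
        int index = INDEX.getOrDefault(s, ARMS.length - 1);
        try {
            return (String) ARMS[index].invokeExact(s);
        } catch (RuntimeException | Error e) {
            throw e;
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }
}
```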
In terms of performance, because the VM did not gather profiles when executing tableswitch/lookupswitch, performance was poor compared to using only an invokedynamic, but JDK-8200303 may change things. So trying to detect whether an invokedynamic alone is enough can be interesting. In terms of bootstrap methods, most of the code can be shared, apart from returning an int versus calling a static method on the leaf. > #### Bootstrap design > When designing the `invokedynamic` bootstraps to support this translation, we > face the classic lumping-vs-splitting decision. For now, we'll bias towards > splitting. In the following example, `BOOTSTRAP_PREAMBLE` indicates the usual > leading arguments for an indy bootstrap. We assume the compiler has numbered > the case values densely from 0..N-1, and the bootstrap will return [0, N) for > success, or N for "no match". > A strawman design might be: > // Numeric switches for P, accepts invocation as P -> I or Box(P) -> I > CallSite intSwitch(BOOTSTRAP_PREAMBLE, int... caseValues) > // Switch for String, invocation descriptor is String -> I > CallSite stringSwitch(BOOTSTRAP_PREAMBLE, String... caseValues) > // Switch for Enum, invocation descriptor is E -> I > CallSite enumSwitch(BOOTSTRAP_PREAMBLE, Class>> clazz, > String... caseNames) > It might be possible to encode all of these into a single bootstrap, but given > that the compiler already treats each type slightly differently, it seems there > is little value in this sort of lumping for non-pattern switches. > The `enumSwitch` bootstrap as proposed uses `String` values to describe the enum > constants, rather than encoding the enum constants directly via condy. This > allows us to be more robust to enums disappearing after compilation. > This strategy is also dependent on having broken the limitation on 253 bootstrap > arguments in indy/condy.
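Here is a minimal sketch of a bootstrap with the `stringSwitch` shape above, written only against `java.lang.invoke` APIs. The assumptions are as follows. The call-site type is `(String)int`. The lookup strategy is a `HashMap`; the proposal deliberately leaves the strategy open. `null` simply falls into "no match", since null handling is discussed separately. `link` and `invoke` are hypothetical helpers that stand in for what the JVM does when it links and then executes the `invokedynamic` instruction.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.ConstantCallSite;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.HashMap;
import java.util.Map;

class StringSwitchBootstrap {
    // Hypothetical bootstrap: the call-site type must be (String)int; the target
    // returns i for caseValues[i], or N = caseValues.length for "no match".
    static CallSite stringSwitch(MethodHandles.Lookup lookup, String name,
                                 MethodType type, String... caseValues)
            throws ReflectiveOperationException {
        if (!type.equals(MethodType.methodType(int.class, String.class)))
            throw new IllegalArgumentException("expected (String)int, got " + type);
        Map<String, Integer> table = new HashMap<>();
        for (int i = 0; i < caseValues.length; i++)
            table.putIfAbsent(caseValues[i], i);          // first occurrence wins
        MethodHandle classify = MethodHandles.lookup().findStatic(
                StringSwitchBootstrap.class, "classify",
                MethodType.methodType(int.class, Map.class, int.class, String.class));
        // Pre-bind the table and N; the remaining parameter is the switch target.
        return new ConstantCallSite(
                MethodHandles.insertArguments(classify, 0, table, caseValues.length));
    }

    private static int classify(Map<String, Integer> table, int noMatch, String target) {
        Integer index = table.get(target);                // HashMap accepts a null key
        return index != null ? index : noMatch;
    }

    // Stand-in for linking an indy instruction with these static arguments.
    static CallSite link(String... caseValues) {
        try {
            return stringSwitch(MethodHandles.lookup(), "switch",
                    MethodType.methodType(int.class, String.class), caseValues);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Stand-in for executing the linked instruction.
    static int invoke(CallSite site, String target) {
        try {
            return (int) site.dynamicInvoker().invokeExact(target);
        } catch (RuntimeException | Error e) {
            throw e;
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }
}
```

Because the table is built once at link time, a different strategy (binary search, perfect hashing) could be swapped in without changing the caller's bytecode, which is the flexibility the proposal is after.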
> #### Extending to other primitive types > This approach extends naturally to other primitive types (long, double, float), > by the addition of some more bootstraps (which need to deal with the additional > complexities of infinity, NaN, etc): > CallSite longSwitch(BOOTSTRAP_PREAMBLE, long... caseValues) > CallSite floatSwitch(BOOTSTRAP_PREAMBLE, float... caseValues) > CallSite doubleSwitch(BOOTSTRAP_PREAMBLE, double... caseValues) > #### Extending to null > The scheme as proposed above does not explicitly handle nulls, which is a > feature we'd like to have in `switch`. There are a few ways we could add null > handling into the API: > - Split entry points into null-friendly or null-hostile switches; > - Find a way to encode nulls in the array of case values (which can be done with > condy); > - Always treat null as a possible input and a distinguished output, and have the > compiler ensure the switch can handle this distinguished output. > The last strategy is appealing and straightforward; assign a sentinel value (-1) > to `null`, and always return this sentinel when the input is null. The compiler > ensures that some case handles `null`, and if no case handles `null` then it > inserts an implicit > case -1: throw new NullPointerException(); > into the generated code. Or: - use a boolean as the first bootstrap constant argument to indicate whether you want null to map to -1 or to throw an NPE. It will make the generated bytecode smaller, and ensure that most of the time, when there is no case null, the handling of null can be done implicitly by the VM.
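The two null policies can be compared with a plain-Java stand-in for the classifier (no `invokedynamic`; `classify` and its `nullHostile` flag are hypothetical names, and the flag's polarity is our choice). With the sentinel policy, the classifier returns -1 for `null` and the compiler adds the implicit `case -1`. With Remi's flag, the classifier itself throws, so the generated switch needs no null case. The lowered switch is the proposal's `"Foo"`/`"Bar"`/`"Baz"` string-switch example, with the calls `m()`, `n()`, `r()`, `p()` replaced by trace letters.

```java
class NullPolicySketch {
    static final int NULL_CASE = -1;

    // Index i for labels[i], labels.length for "no match"; null is either
    // rejected (nullHostile) or mapped to the NULL_CASE sentinel.
    static int classify(boolean nullHostile, String target, String... labels) {
        if (target == null) {
            if (nullHostile) throw new NullPointerException();
            return NULL_CASE;
        }
        for (int i = 0; i < labels.length; i++)
            if (labels[i].equals(target)) return i;
        return labels.length;
    }

    // Sentinel policy: the compiler inserts the implicit null case.
    static String loweredWithSentinel(String x) {
        StringBuilder trace = new StringBuilder();
        switch (classify(false, x, "Foo", "Bar", "Baz")) {
            case NULL_CASE: throw new NullPointerException();  // implicit null case
            case 0: trace.append('m'); break;
            case 1: trace.append('n');                          // fall through
            case 2: trace.append('r'); break;
            default: trace.append('p');                         // index 3: default case
        }
        return trace.toString();
    }

    // Flag policy: the classifier throws, so no case -1 is generated.
    static String loweredNullHostile(String x) {
        StringBuilder trace = new StringBuilder();
        switch (classify(true, x, "Foo", "Bar", "Baz")) {
            case 0: trace.append('m'); break;
            case 1: trace.append('n');                          // fall through
            case 2: trace.append('r'); break;
            default: trace.append('p');
        }
        return trace.toString();
    }
}
```

Both lowerings behave identically on non-null input; they differ only in where the null check lives.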
> #### General example > If we have a string switch: > switch (x) { > case "Foo": m(); break; > case "Bar": n(); // fall through > case "Baz": r(); break; > default: p(); > } > we translate into: > int t = indy[bsm=stringSwitch["Foo", "Bar", "Baz"]](x) > switch (t) { > case -1: throw new NullPointerException(); // implicit null case > case 0: m(); break; > case 1: n(); // fall through > case 2: r(); break; > case 3: p(); // default case > } With my proposed bootstrap recipe (use a boolean to indicate whether a null check is needed or not): int t = indy[bsm=stringSwitch[false, "Foo", "Bar", "Baz"]](x) switch (t) { case 0: m(); break; case 1: n(); // fall through case 2: r(); break; case 3: p(); // default case } > All switches, with the exception of `int` switches (and maybe not even non-dense > `int` switches), follow this exact pattern. If the target type is not a > reference type, the `null` case is not needed. > This strategy is implemented in the `switch` branch of the amber repository; see > `java.lang.runtime.SwitchBootstraps` in that branch for (rough!) > implementations of the bootstraps. > ## Patterns in narrow-target switches > When we add patterns, we may encounter switches whose targets are tightly typed > (e.g., `String` or `int`) but still use some patterns in their expression. For > switches whose target type is a primitive, primitive box, `String`, or `enum`, > we'd like to use the optimized translation strategy outlined here, but the > following kinds of patterns might still show up in a switch on, say, `Integer`: > case var x: > case _: > case Integer x: > case Integer(var x): > The first three can be translated away by the source compiler, as they are > semantically equivalent to `default`. If any nontrivial patterns are present > (including deconstruction patterns), we may need to translate as a pattern > switch scheme -- see Part 2. (While the language may not distinguish between > "legacy" and "pattern" switches -- in that all switches are pattern switches -- > we'd like to avoid giving up obvious optimizations if we can.) > # Part 2 -- type test patterns and guards > A key motivation for reexamining switch translation is the impending arrival of > patterns in switch. We expect switch translation for the pattern case to follow > a similar structure -- lower to an `int` switch and use an indy-based > classifier to select an index. However, there are a few additional > complexities. One is that pattern cases may have guards, which means we need to > be able to re-enter the bootstrap with an indication to "continue matching from > case N", in the event of a failed guard. (Even if the language doesn't support > guards directly, the obvious implementation strategy for nested patterns is to > desugar them into guards.) > Translating pattern switches is more complicated because there are more options > for how to divide the work between the statically generated code and the switch > classifier, and different choices have different performance side-effects (are > binding variables "boxed" into a tuple to be returned, or do they need to be > redundantly calculated). I'm still not sure that having guards makes sense from the language perspective; I find a switch with guards less readable than a switch with an if (at least in Scala). > ## Type-test patterns > Type-test patterns are notable because their applicability predicate is purely > based on the type system, meaning that the compiler can directly reason about > it both statically (using flow analysis, optimizing away dynamic type tests) > and dynamically (with `instanceof`.) A switch involving type-tests: > switch (x) { > case String s: ... > case Integer i: ... > case Long l: ... > } > can (among other strategies) be translated into a chain of `if-else` using > `instanceof` and casts: > if (x instanceof String) { String s = (String) x; ...
} > else if (x instanceof Integer) { Integer i = (Integer) x; ... } > else if (x instanceof Long) { Long l = (Long) x; ... } > #### Guards > The `if-else` desugaring can also naturally handle guards: > switch (x) { > case String s > where (s.length() > 0): ... > case Integer i > where (i > 0): ... > case Long l > where (l > 0L): ... > } > can be translated to: > if (x instanceof String > && ((String) x).length() > 0) { String s = (String) x; ... } > else if (x instanceof Integer > && ((Integer) x) > 0) { Integer i = (Integer) x; ... } > else if (x instanceof Long > && ((Long) x) > 0L) { Long l = (Long) x; ... } > #### Performance concerns > The translation to `if-else` chains is simple (for switches without > fallthrough), but is harder for the VM to optimize, because we've used a more > general control flow mechanism. If the target is an empty `String`, which means > we'd pass the first `instanceof` but fail the guard, class-hierarchy analysis > could tell us that it can't possibly be an `Integer` or a `Long`, and so > there's no need to perform those tests. But generating code that takes > advantage of this information is more complex. It's worse than that: it's not an if-else chain, it's an if-instanceof-else chain, and instanceof by itself is decomposed into several ifs. So when you have enough if-instanceof-else branches (the number depends on your CPU), the assembly code is full of conditional branches, the branch predictor gets lost, and it will be really slow. > In the extreme case, where a switch consists entirely of type test patterns for > final classes, this could be performed as an O(1) operation by hashing. And > this is a common case involving switches over alternatives in a sum (sealed) > type. (We shouldn't rely on finality at compile time, as this can change > between compile and run time, but we should take advantage of this at run time > if we can.)
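For reference, here is the guarded `if-else` translation quoted above, made runnable. The method name and return strings are invented. Trace the empty string through it: it passes the `String` test, fails the guard, and then still pays for the `Integer` and `Long` tests even though it cannot be either. That is exactly the redundancy the performance discussion points at.

```java
class GuardedIfElseChain {
    // if-else translation of a type-test switch whose cases carry guards.
    static String classify(Object x) {
        if (x instanceof String && ((String) x).length() > 0) {
            String s = (String) x;
            return "non-empty string " + s;
        } else if (x instanceof Integer && ((Integer) x) > 0) {
            Integer i = (Integer) x;
            return "positive int " + i;
        } else if (x instanceof Long && ((Long) x) > 0L) {
            Long l = (Long) x;
            return "positive long " + l;
        }
        // Reached by "" after three instanceof tests, although only the first could succeed.
        return "no match";
    }
}
```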
Hashing is complex without VM support, because you have to be able to update the cache dynamically and must not hold strong references to the classes; otherwise you will not be able to unload them. So while the complexity is O(1), it may require several loads, making hashing useful only when there are quite a few cases. > Finally, the straightforward static translation may miss opportunities for > optimization. For example: > switch (x) { > case Point p > where p.x > 0 && p.y > 0: A > case Point p > where p.x > 0 && p.y == 0: B > } > Here, not only would we potentially test the target twice to see if it is a > `Point`, but we then further extract the `x` component twice and perform the > `p.x > 0` test twice. > #### Optimization opportunities > The compiler can eliminate some redundant calculations through straightforward > techniques. The previous switch can be transformed to: > switch (x) { > case Point p: > if (((Point) p).x > 0 && ((Point) p).y > 0) { A } > else if (((Point) p).x > 0 && ((Point) p).y == 0) { B } > to eliminate the redundant `instanceof` (and admit further CSE optimizations.) > #### Clause reordering > The above example was easy to transform because the two `case Point` clauses > were adjacent. But what if they are not? In some cases, it is safe to reorder > them. For types `T` and `U`, it is safe to reorder `case T` and `case U` if the > two types have no intersection; that is, there can be no types that are subtypes of > them both. This is true when `T` and `U` are classes and neither extends the > other, or when one is a final class and the other is an interface that the > class does not implement. > The compiler could then reorder case clauses so that all the ones whose first > test is `case Point` are adjacent, and then coalesce them all into a single arm > of the `if-else` chain. > A possible spoiler here is fallthrough; if case A falls into case B, then cases > A and B have to be moved as a group.
(This is another reason to consider > limiting fallthrough.) > A bigger possible spoiler here is separate compilation. If at compile time, we > see that `T` and `U` are disjoint types, do we want to bake that assumption > into the compilation, or do we have to re-check that assumption at runtime? > #### Summary of if-else translation > While the if-else translation at first looks pretty bad, we are able to extract > a fair amount of redundancy through well-understood compiler transformations. > If an N-way switch has only M distinct types in it, in most cases we can reduce > the cost from _O(N)_ to _O(M)_. Sometimes _M == N_, so this doesn't help, but > sometimes _M << N_ (and sometimes `N` is small, in which case _O(N)_ is > fine.) > Reordering clauses involves some risk; specifically, that the class hierarchy > will change between compile and run time. It seems eminently safe to reorder > `String` and `Integer`, but more questionable to reorder an arbitrary class > `Foo` with `Runnable`, even if `Foo` doesn't implement `Runnable` now, because > it might easily be changed to do so later. Ideally we'd like to perform > class-hierarchy optimizations using the runtime hierarchy, not the compile-time > hierarchy. > ## Type classifiers > The technique outlined in _Part 1_, where we lower the complex switch to a dense > `int` switch, and use an indy-based classifier to select an index, is > applicable here as well. First let's consider a switch consisting only of > unguarded type-test patterns, optionally with a default clause. > We'll start with an `indy` bootstrap whose static arguments are `Class` constants > corresponding to each arm of the switch, whose dynamic argument is the switch > target, and whose return value is a case number (or distinguished sentinels for > "no match" and `null`.)
We can easily implement such a bootstrap with a linear > search, but can also do better; if some subset of the classes are `final`, we > can choose between these more quickly (such as via binary search on > `hashCode()`, hash function, or hash table), and we need perform only a single > operation to test all of those at once. Dynamic techniques (such as building > a hash map of previously seen target types), which `indy` is well-suited to, > can asymptotically approach _O(1)_ even when the classes involved are not > final. > So we can lower: > switch (x) { > case T t: A > case U u: B > case V v: C > } > to > int y = indy[bootstrap=typeSwitch(T.class, U.class, V.class)](x) > switch (y) { > case 0: A > case 1: B > case 2: C > } > This has the advantages that the generated code is very similar to the source > code, we can (in some cases) get _O(1)_ dispatch performance, and we can handle > fallthrough with no additional complexity. As I said above, you need to do the checkcast twice in that case. > #### Guards > There are two approaches we could take to add support for guards into the > process; we could try to teach the bootstrap about guards (and would have to > pass locals that appear in guard expressions as additional arguments to the > classifier), or we could leave guards to the generated bytecode. The latter > seems far more attractive, but requires some tweaks to the bootstrap arguments > and to the shape of the generated code. > If the classifier says "you have matched case #3", but then we fail the guard > for #3, we want to go back into the classifier and start again at #4. > (Sometimes the classifier can also use this information ("start over at #4") to > optimize away unnecessary tests.)
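One way to build the "hash map of previously seen target types" without the class-unloading problem Remi raises is `java.lang.ClassValue`. As we understand its design, it attaches the computed value to each `Class` object, so the classifier below holds no map with strong references to the classes it has seen. The sketch is a hypothetical, non-indy stand-in for a `typeSwitch` classifier: a linear first-match search, memoized per runtime class. It returns the case index, `cases.length` for no match, and -1 for `null`. `describe` shows the lowered switch for a classifier built from `String`, `Integer`, and `Number`; the casts in it repeat the type test, as Remi notes.

```java
class CachingTypeClassifier {
    // Memo of "first matching case index", computed once per runtime class.
    private final ClassValue<Integer> memo;

    CachingTypeClassifier(Class<?>... cases) {
        Class<?>[] copy = cases.clone();
        this.memo = new ClassValue<Integer>() {
            @Override
            protected Integer computeValue(Class<?> type) {
                // Linear first-match search; the result depends only on the class.
                for (int i = 0; i < copy.length; i++)
                    if (copy[i].isAssignableFrom(type)) return i;
                return copy.length;                       // "no match"
            }
        };
    }

    // Case index, cases.length for "no match", or -1 for null.
    int classify(Object target) {
        if (target == null) return -1;
        return memo.get(target.getClass());
    }

    // Lowered switch; assumes c was built with (String.class, Integer.class, Number.class).
    static String describe(CachingTypeClassifier c, Object x) {
        switch (c.classify(x)) {
            case 0:  { String s = (String) x;   return "String " + s; }   // cast re-tests
            case 1:  { Integer i = (Integer) x; return "Integer " + i; }
            case 2:  { Number n = (Number) x;   return "Number " + n; }
            default: return "other";
        }
    }
}
```

Note that `Integer` is listed before `Number`, so an `Integer` target selects case 1 even though case 2 would also match; first-match order is preserved by the search.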
> We add a second argument (where to start) to the classifier invocation > signature, and wrap the switch in a loop, lowering: > switch (target) { > case T t where (e1): A > case T t where (e2): B > case U u where (e3): C > } > into > int index = -1; // start at the top > while (true) { > index = indy[...](target, index) > switch (index) { > case 0: if (!e1) continue; A > case 1: if (!e2) continue; B > case 2: if (!e3) continue; C > default: break; > } > break; > } > For cases where the same type test is repeated in consecutive positions (at N > and N+1), we can have the static compiler coalesce them as above, or we could > have the bootstrap maintain a table so that if you re-enter the bootstrap where > the previous answer was N, then it can immediately return N+1. Similarly, if N > and N+1 are known to be mutually exclusive types (like `String` and `Integer`), > on reentering the classifier with N, we can skip right to N+2 since if we > matched `String`, we cannot match `Integer`. Lookup tables for such > optimizations can be built at callsite linkage time. > #### Mixing constants and type tests > This approach also extends to tests that are a mix of constant patterns and > type-test patterns, such as: > switch (x) { > case "Foo": ... > case 0L: ... > case Integer i: > } > We can extend the bootstrap protocol to accept constants as well as types, and > it is a straightforward optimization to combine both type matching and constant > matching in a single pass. > ## Nested patterns > Nested patterns are essentially guards; even if we don't expose guards in the > language, we can desugar > case Point(0, var x): > into the equivalent of > case Point(var a, var x) && a matches 0: > using the same translation story as above -- use the classifier to select a > candidate case arm based on the top-type of the pattern, and then do additional > checks in the generated bytecode, and if the checks fail, continue and re-enter > the classifier starting at the next case. 
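The loop-based guard lowering above, together with the nested-pattern desugaring, can be sketched in plain Java. `typeSwitch` is a hypothetical linear stand-in for the indy classifier that takes the previously returned index as its restart point. `Point` is a minimal invented class, and the first case plays the role of the desugared `case Point(0, var y)`. Each arm returns a value, so no fallthrough or trailing `break` is needed, and a failed guard simply `continue`s the enclosing loop.

```java
class GuardedTypeSwitch {
    // Minimal invented class standing in for a deconstructable Point.
    static final class Point {
        final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    // Restartable classifier (linear search): first index i > previous such that
    // cases[i] accepts the target, or cases.length for "no match".
    // previous == -1 means "start at the top".
    static int typeSwitch(Object target, int previous, Class<?>... cases) {
        for (int i = previous + 1; i < cases.length; i++)
            if (cases[i].isInstance(target)) return i;
        return cases.length;
    }

    // Lowering of (proposal's guard syntax):
    //   case Point p where (p.x == 0): A     // i.e. Point(0, var y)
    //   case Point p where (p.y == 0): B
    //   case String s where (!s.isEmpty()): C
    //   default: D
    static String lowered(Object target) {
        int index = -1;                                   // start at the top
        while (true) {
            index = typeSwitch(target, index, Point.class, Point.class, String.class);
            switch (index) {
                case 0: {
                    Point p = (Point) target;
                    if (p.x != 0) continue;               // guard failed: resume at case 1
                    return "A: on the y-axis, y=" + p.y;
                }
                case 1: {
                    Point p = (Point) target;
                    if (p.y != 0) continue;               // guard failed: resume at case 2
                    return "B: on the x-axis, x=" + p.x;
                }
                case 2: {
                    String s = (String) target;
                    if (s.isEmpty()) continue;            // guard failed: resume at default
                    return "C: " + s;
                }
                default:
                    return "D";
            }
        }
    }
}
```

For `new Point(3, 4)`, the classifier is entered three times (returning 0, then 1, then 3), which is the re-entry cost that the coalescing and lookup-table optimizations described above aim to reduce.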
> #### Explicit continue > An alternative to exposing guards is to expose an explicit `continue` statement > in switch, which would have the effect of "keep matching at the next case." > Then guards could be expressed imperatively as: > case P: > if (!guard) > continue; > ... > break; > case Q: ... Rémi From brian.goetz at oracle.com Sat Apr 7 15:39:09 2018 From: brian.goetz at oracle.com (Brian Goetz) Date: Sat, 7 Apr 2018 11:39:09 -0400 Subject: Feedback wanted: switch expression typing In-Reply-To: References: <350254623.2244204.1522428861283.JavaMail.zimbra@u-pem.fr> <17865431.2309562.1522491830602.JavaMail.zimbra@u-pem.fr> Message-ID: <99E5310B-D197-4E44-89C0-B39FA8F8D672@oracle.com> We can start on "later" now. We can warn on conditionals where the two would give a different answer, nudging people to fix their code, and then bring them into alignment after everyone has been suitably irritated by warnings. > On Mar 31, 2018, at 7:13 AM, Doug Lea

wrote: > > On Sat, March 31, 2018 6:23 am, forax at univ-mlv.fr wrote: >> The fact that the semantics of ?: is very ad-hoc is a kind of accident of >> history; >> we may want to fix it, but I do not see why we have to fix it at the same >> time that we introduce the expression switch; >> we can fix the semantics of ?: later or never. > > Where "later" probably means "never". It should be fixed now. > I agree that (B) and (C) are basically the same, so choose (C). > I've had to fiddle with ?: to get the compiler to shut up about > reasonable-looking expressions. (Sorry, I can't recall examples.) > Having the same story for both of them would be best, assuming > that existing code doesn't break. > > -Doug > >> >> Rémi >> >> ----- Original Message ----- >>> From: "daniel smith" >>> To: "Remi Forax" >>> Cc: "amber-spec-experts" >>> Sent: Saturday, 31 March 2018 03:44:49 >>> Subject: Re: Feedback wanted: switch expression typing >> >>>> On Mar 30, 2018, at 10:54 AM, Remi Forax wrote: >>>> >>>> I do not see (B) as sacrificing consistency, because the premise is >>>> that an >>>> expression switch should be consistent with ?: >>>> >>>> But an expression switch can also be modeled as a classical switch that >>>> returns >>>> its value to a local variable. >>>> >>>> int a = switch(foo) { >>>> case 'a' -> 2; >>>> case 'b' -> 3; >>>> } >>>> can be seen as >>>> int a = $switch(foo); >>>> with >>>> int $switch(char foo) { >>>> case 'a': return 2; >>>> case 'b': return 3; >>>> } >>> >>> I mean, sure, this is another way to assert "switches in assignment >>> contexts >>> should always be poly expressions". >>> >>> But it's just as easy to assert "conditional expressions in assignment >>> contexts >>> should always be poly expressions". >>> >>> int a = test ? 2 : 3; >>> can be seen as >>> int a = $conditional(test); >>> with >>> int $conditional(boolean test) { >>> if (test) return 2; >>> else return 3; >>> } >>> >>> Those are probably good principles.
>>> But if we embrace them, we're doing (C).
>>>
>>> -Dan
>>
>

From amaembo at gmail.com Sun Apr 8 12:09:46 2018
From: amaembo at gmail.com (Tagir Valeev)
Date: Sun, 08 Apr 2018 12:09:46 +0000
Subject: Switch translation
In-Reply-To: <3de80623-baf4-e8b1-f58c-2e3e52c52b2a@oracle.com>
References: <3de80623-baf4-e8b1-f58c-2e3e52c52b2a@oracle.com>
Message-ID: 

Hello!

> A possible spoiler here is fallthrough; if case A falls into case B, then cases A and B have to be moved as a group. (This is another reason to consider limiting fallthrough.)

I don't think it's a big problem. If we first just need to determine an index to be passed to the tableswitch, then only the final tableswitch will have a fallthrough, while the index determination procedure never needs a fallthrough. Thus during the index determination we are free to reorder branches along with the index values.

With best regards,
Tagir Valeev.

On 6 Apr 2018 at 22:52, "Brian Goetz" wrote:

The following outlines our story for translating improved switches, including both the switch improvements coming as part of JEP 325, and follow-on work to add pattern matching to switches. Much of this has been discussed already over the last year, but here it is in one place.

# Switch Translation
#### Maurizio Cimadamore and Brian Goetz
#### April 2018

## Part 1 -- constant switches

This part examines the current translation of `switch` constructs by `javac`, and proposes a more general translation for switching on primitives, boxes, strings, and enums, with the goals of:

 - Unify the treatment of `switch` variants, simplifying the compiler implementation and reducing the static footprint of generated code;
 - Move responsibility for target classification from compile time to run time, allowing us to more freely update the logic without updating the compiler.

## Current translation

Switches on `int` (and the smaller integer primitives) are translated in one of two ways.
If the labels are relatively dense, we translate an `int` switch to a `tableswitch`; if they are sparse, we translate to a `lookupswitch`. The current heuristic appears to be that we use a `tableswitch` if it results in smaller bytecode than a `lookupswitch` (which uses twice as many bytes per entry), which is a reasonable heuristic.

#### Switches on boxes

Switches on primitive boxes are currently implemented as if they were primitive switches, unconditionally unboxing the target before entry (possibly throwing NPE).

#### Switches on strings

Switches on strings are implemented as a two-step process, exploiting the fact that strings cache their `hashCode()` and that hash codes are reasonably spread out. Given a switch on strings like the one below:

    switch (s) {
        case "Hello": ...
        case "World": ...
        default: ...
    }

The compiler desugars this into two separate switches, where the first switch maps the input strings into a range of numbers [0..1], as shown below, which can then be used in a subsequent plain switch on ints. The generated code unconditionally calls `hashCode()`, again possibly throwing NPE.

    int index = -1;
    switch (s.hashCode()) {
        case 12345: if (!s.equals("Hello")) break; index = 0; break;
        case 6789:  if (!s.equals("World")) break; index = 1; break;
        default: index = -1;
    }
    switch (index) {
        case 0: ...
        case 1: ...
        default: ...
    }

If there are hash collisions between the strings, the first switch must try all possible matching strings.

#### Switches on enums

Switches on `enum` constants exploit the fact that enums have (usually dense) integral ordinal values. Unfortunately, because an ordinal value can change between compilation time and runtime, we cannot rely on this mapping directly, but instead need to do an extra layer of mapping. Given a switch like:

    switch(color) {
        case RED: ...
        case GREEN: ...
    }

The compiler numbers the cases starting at 1 (so that the zero-initialized array entries of constants with no case select `default`), and creates a synthetic class that maps the runtime values of the enum ordinals to the statically numbered cases:

    class Outer$0 {
        synthetic static final int[] $EnumMap$Color = new int[Color.values().length];
        static {
            try { $EnumMap$Color[RED.ordinal()] = 1; } catch (NoSuchFieldError ex) {}
            try { $EnumMap$Color[GREEN.ordinal()] = 2; } catch (NoSuchFieldError ex) {}
        }
    }

Then, the switch is translated as follows:

    switch(Outer$0.$EnumMap$Color[color.ordinal()]) {
        case 1: stmt1;
        case 2: stmt2;
    }

In other words, we construct an array whose size is the cardinality of the enum, and then the element at position **i** of such array will contain the case index corresponding to the enum constant whose ordinal is **i**.

## A more general scheme

The handling of strings and enums gives us a hint of how to create a more regular scheme; for `switch` targets more complex than `int`, we lower the `switch` to an `int` switch with consecutive `case` labels, and use a separate process to map the target into the range of synthetic case labels.

Now that we have `invokedynamic` in our toolbox, we can reduce all of the non-`int` cases to a single form, where we number the cases with consecutive integers, and perform case selection via an `invokedynamic`-based classifier function, whose static argument list receives a description of the actual targets, and which returns an `int` identifying what `case` to select.
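To make the general scheme concrete, here is a minimal sketch in plain Java, assuming an ordinary static method stands in for the `invokedynamic` classifier. The names `StringSwitchSketch`, `classify` and `dispatch` are hypothetical and not part of any JDK API. The classifier searches linearly and returns N (the number of cases) for "no match", so the lowered switch only ever sees dense `int` labels.

```java
import java.util.Objects;

// Hypothetical illustration of the general scheme: a classifier maps the target
// to a dense case number, and the original switch becomes a plain int switch.
final class StringSwitchSketch {

    // Stand-in for an indy classifier: returns the index of the matching label,
    // or caseValues.length (N) for "no match". Null targets are rejected here.
    static int classify(String target, String... caseValues) {
        Objects.requireNonNull(target);
        for (int i = 0; i < caseValues.length; i++) {
            if (caseValues[i].equals(target)) {
                return i;
            }
        }
        return caseValues.length;
    }

    // Lowering of: switch (s) { case "Hello": ... case "World": ... default: ... }
    static String dispatch(String s) {
        switch (classify(s, "Hello", "World")) {
            case 0:  return "matched Hello";
            case 1:  return "matched World";
            default: return "default"; // classify returned N == 2
        }
    }

    public static void main(String[] args) {
        System.out.println(dispatch("World")); // matched World
        System.out.println(dispatch("Java"));  // default
    }
}
```

A real bootstrap would be free to swap the linear search for any other strategy without changing the shape of the generated `switch`.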
This approach has several advantages:

 - Reduced compiler complexity -- all switches follow a common pattern;
 - Reduced static code size;
 - The classification function can select from a wide range of strategies (linear search, binary search, building a `HashMap`, constructing a perfect hash function, etc.), which can vary over time or from situation to situation;
 - We are free to improve the strategy or select an alternate strategy (say, to optimize for startup time) without having to recompile the code;
 - Hopefully at least as JIT-friendly as the existing translation, if not more so.

We can also use this approach in preference to `lookupswitch` for non-dense `int` switches, as well as use it to extend `switch` to handle `long`, `float`, and `double` targets (which were surely excluded in part because the JVM didn't provide a convenient translation target for these types).

#### Bootstrap design

When designing the `invokedynamic` bootstraps to support this translation, we face the classic lumping-vs-splitting decision. For now, we'll bias towards splitting. In the following example, `BOOTSTRAP_PREAMBLE` indicates the usual leading arguments for an indy bootstrap. We assume the compiler has numbered the case values densely from 0 to N-1, and the bootstrap will return a value in [0, N) for success, or N for "no match". A strawman design might be:

    // Numeric switches for P, accepts invocation as P -> I or Box(P) -> I
    CallSite intSwitch(BOOTSTRAP_PREAMBLE, int... caseValues)

    // Switch for String, invocation descriptor is String -> I
    CallSite stringSwitch(BOOTSTRAP_PREAMBLE, String... caseValues)

    // Switch for Enum, invocation descriptor is E -> I
    CallSite enumSwitch(BOOTSTRAP_PREAMBLE, Class<? extends Enum<?>> clazz, String... caseNames)

It might be possible to encode all of these into a single bootstrap, but given that the compiler already treats each type slightly differently, it seems there is little value in this sort of lumping for non-pattern switches.
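As one example of the strategy freedom described above, a classifier for a sparse `int` switch could sort its labels once (roughly what a bootstrap might do at link time) and then answer each query by binary search. This is a hypothetical sketch (`SparseIntClassifier` is an illustrative name, not an existing API), assuming distinct case labels and the [0, N) / N result convention above.

```java
import java.util.Arrays;

// Hypothetical classifier for a sparse int switch: sort the labels once
// (roughly what a bootstrap could do at link time), then binary-search.
final class SparseIntClassifier {
    private final int[] sortedLabels; // case labels in ascending order
    private final int[] caseIndexOf;  // caseIndexOf[k] = case number of sortedLabels[k]
    private final int noMatch;        // N, the "no match" result

    SparseIntClassifier(int... caseValues) {
        int n = caseValues.length;
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) {
            order[i] = i;
        }
        // Sort case numbers by the label each one stands for.
        Arrays.sort(order, (a, b) -> Integer.compare(caseValues[a], caseValues[b]));
        sortedLabels = new int[n];
        caseIndexOf = new int[n];
        for (int k = 0; k < n; k++) {
            sortedLabels[k] = caseValues[order[k]];
            caseIndexOf[k] = order[k];
        }
        noMatch = n;
    }

    // Returns the case number in [0, N) for a matching label, or N otherwise.
    int classify(int target) {
        int k = Arrays.binarySearch(sortedLabels, target);
        return k >= 0 ? caseIndexOf[k] : noMatch;
    }

    public static void main(String[] args) {
        // Models: switch (v) { case 1000: ... case 7: ... case -42: ... default: ... }
        SparseIntClassifier c = new SparseIntClassifier(1000, 7, -42);
        System.out.println(c.classify(7)); // 1
        System.out.println(c.classify(5)); // 3 (no match)
    }
}
```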
The `enumSwitch` bootstrap as proposed uses `String` values to describe the enum constants, rather than encoding the enum constants directly via condy. This allows us to be more robust to enums disappearing after compilation.

This strategy is also dependent on having broken the limitation on 253 bootstrap arguments in indy/condy.

#### Extending to other primitive types

This approach extends naturally to other primitive types (long, double, float), by the addition of some more bootstraps (which need to deal with the additional complexities of infinity, NaN, etc.):

    CallSite longSwitch(BOOTSTRAP_PREAMBLE, long... caseValues)
    CallSite floatSwitch(BOOTSTRAP_PREAMBLE, float... caseValues)
    CallSite doubleSwitch(BOOTSTRAP_PREAMBLE, double... caseValues)

#### Extending to null

The scheme as proposed above does not explicitly handle nulls, which is a feature we'd like to have in `switch`. There are a few ways we could add null handling into the API:

 - Split entry points into null-friendly or null-hostile switches;
 - Find a way to encode nulls in the array of case values (which can be done with condy);
 - Always treat null as a possible input and a distinguished output, and have the compiler ensure the switch can handle this distinguished output.

The last strategy is appealing and straightforward; assign a sentinel value (-1) to `null`, and always return this sentinel when the input is null. The compiler ensures that some case handles `null`, and if no case handles `null` then it inserts an implicit

    case -1: throw new NullPointerException();

into the generated code.
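A minimal sketch of an `enumSwitch`-style classifier that combines the two ideas above, assuming case labels are passed by name and `-1` is the null sentinel. The class names are hypothetical and this is not the `SwitchBootstraps` implementation. Because names are resolved with `Enum.valueOf` when the classifier is built, a constant that no longer exists at run time is simply never selected.

```java
import java.util.Arrays;

final class EnumSwitchDemo {
    enum Color { RED, GREEN, BLUE }

    // "PURPLE" plays the role of a constant that was removed after compilation.
    static final EnumNameClassifier<Color> CLASSIFIER =
            new EnumNameClassifier<>(Color.class, "GREEN", "RED", "PURPLE");

    // Lowered form of a switch on Color with cases GREEN, RED, PURPLE and default.
    static String describe(Color c) {
        switch (CLASSIFIER.classify(c)) {
            case -1: throw new NullPointerException(); // implicit null case
            case 0:  return "go";
            case 1:  return "stop";
            case 2:  return "purple";                  // never selected
            default: return "other";                   // N == 3, e.g. BLUE
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(Color.GREEN)); // go
        System.out.println(describe(Color.BLUE));  // other
    }
}

// Hypothetical enumSwitch-style classifier: labels are passed by name and
// resolved when the classifier is built; -1 is the sentinel for null.
final class EnumNameClassifier<E extends Enum<E>> {
    static final int NULL_CASE = -1;
    private final int[] caseByOrdinal; // ordinal -> case number, or noMatch
    private final int noMatch;         // N, the "no match" result

    EnumNameClassifier(Class<E> enumClass, String... caseNames) {
        noMatch = caseNames.length;
        caseByOrdinal = new int[enumClass.getEnumConstants().length];
        Arrays.fill(caseByOrdinal, noMatch);
        for (int i = 0; i < caseNames.length; i++) {
            try {
                caseByOrdinal[Enum.valueOf(enumClass, caseNames[i]).ordinal()] = i;
            } catch (IllegalArgumentException constantRemoved) {
                // The constant does not exist at run time: its case can never be selected.
            }
        }
    }

    int classify(E target) {
        return target == null ? NULL_CASE : caseByOrdinal[target.ordinal()];
    }
}
```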
#### General example

If we have a string switch:

    switch (x) {
        case "Foo": m(); break;
        case "Bar": n(); // fall through
        case "Baz": r(); break;
        default: p();
    }

we translate into:

    int t = indy[bsm=stringSwitch["Foo", "Bar", "Baz"]](x)
    switch (t) {
        case -1: throw new NullPointerException(); // implicit null case
        case 0: m(); break;
        case 1: n(); // fall through
        case 2: r(); break;
        case 3: p(); // default case
    }

All switches, with the exception of `int` switches (and maybe not even non-dense `int` switches), follow this exact pattern. If the target type is not a reference type, the `null` case is not needed.

This strategy is implemented in the `switch` branch of the amber repository; see `java.lang.runtime.SwitchBootstraps` in that branch for (rough!) implementations of the bootstraps.

## Patterns in narrow-target switches

When we add patterns, we may encounter switches whose targets are tightly typed (e.g., `String` or `int`) but still use some patterns in their case labels. For switches whose target type is a primitive, primitive box, `String`, or `enum`, we'd like to use the optimized translation strategy outlined here, but the following kinds of patterns might still show up in a switch on, say, `Integer`:

    case var x:
    case _:
    case Integer x:
    case Integer(var x):

The first three can be translated away by the source compiler, as they are semantically equivalent to `default`. If any nontrivial patterns are present (including deconstruction patterns), we may need to translate as a pattern switch scheme -- see Part 2. (While the language may not distinguish between "legacy" and "pattern" switches -- in that all switches are pattern switches -- we'd like to avoid giving up obvious optimizations if we can.)

# Part 2 -- type test patterns and guards

A key motivation for reexamining switch translation is the impending arrival of patterns in switch.
We expect switch translation for the pattern case to follow a similar structure -- lower to an `int` switch and use an indy-based classifier to select an index. However, there are a few additional complexities.

One is that pattern cases may have guards, which means we need to be able to re-enter the bootstrap with an indication to "continue matching from case N", in the event of a failed guard. (Even if the language doesn't support guards directly, the obvious implementation strategy for nested patterns is to desugar them into guards.)

Translating pattern switches is more complicated because there are more options for how to divide the work between the statically generated code and the switch classifier, and different choices have different performance side-effects (are binding variables "boxed" into a tuple to be returned, or do they need to be redundantly calculated?).

## Type-test patterns

Type-test patterns are notable because their applicability predicate is purely based on the type system, meaning that the compiler can directly reason about it both statically (using flow analysis, optimizing away dynamic type tests) and dynamically (with `instanceof`). A switch involving type-tests:

    switch (x) {
        case String s: ...
        case Integer i: ...
        case Long l: ...
    }

can (among other strategies) be translated into a chain of `if-else` using `instanceof` and casts:

    if (x instanceof String) { String s = (String) x; ... }
    else if (x instanceof Integer) { Integer i = (Integer) x; ... }
    else if (x instanceof Long) { Long l = (Long) x; ... }

#### Guards

The `if-else` desugaring can also naturally handle guards:

    switch (x) {
        case String s where (s.length() > 0): ...
        case Integer i where (i > 0): ...
        case Long l where (l > 0L): ...
    }

can be translated to:

    if (x instanceof String && ((String) x).length() > 0) {
        String s = (String) x;
        ...
    }
    else if (x instanceof Integer && ((Integer) x) > 0) {
        Integer i = (Integer) x;
        ...
    }
    else if (x instanceof Long && ((Long) x) > 0L) {
        Long l = (Long) x;
        ...
    }

#### Performance concerns

The translation to `if-else` chains is simple (for switches without fallthrough), but is harder for the VM to optimize, because we've used a more general control flow mechanism. If the target is an empty `String`, which means we'd pass the first `instanceof` but fail the guard, class-hierarchy analysis could tell us that it can't possibly be an `Integer` or a `Long`, and so there's no need to perform those tests. But generating code that takes advantage of this information is more complex.

In the extreme case, where a switch consists entirely of type test patterns for final classes, this could be performed as an O(1) operation by hashing. And this is a common case involving switches over alternatives in a sum (sealed) type. (We shouldn't rely on finality at compile time, as this can change between compile and run time, but we should take advantage of this at run time if we can.)

Finally, the straightforward static translation may miss opportunities for optimization. For example:

    switch (x) {
        case Point p where p.x > 0 && p.y > 0: A
        case Point p where p.x > 0 && p.y == 0: B
    }

Here, not only would we potentially test the target twice to see if it is a `Point`, but we then further extract the `x` component twice and perform the `p.x > 0` test twice.

#### Optimization opportunities

The compiler can eliminate some redundant calculations through straightforward techniques. The previous switch can be transformed to:

    switch (x) {
        case Point p:
            if (p.x > 0 && p.y > 0) { A }
            else if (p.x > 0 && p.y == 0) { B }
    }

to eliminate the redundant `instanceof` (which also admits further CSE optimizations, such as evaluating `p.x > 0` only once).

#### Clause reordering

The above example was easy to transform because the two `case Point` clauses were adjacent. But what if they are not? In some cases, it is safe to reorder them.
For types `T` and `U`, it is safe to reorder `case T` and `case U` if the two types have no intersection; that is, there can be no type that is a subtype of both. This is true when `T` and `U` are classes and neither extends the other, or when one is a final class and the other is an interface that the class does not implement.

The compiler could then reorder case clauses so that all the ones whose first test is `case Point` are adjacent, and then coalesce them all into a single arm of the `if-else` chain.

A possible spoiler here is fallthrough; if case A falls into case B, then cases A and B have to be moved as a group. (This is another reason to consider limiting fallthrough.)

A bigger possible spoiler here is separate compilation. If at compile time, we see that `T` and `U` are disjoint types, do we want to bake that assumption into the compilation, or do we have to re-check that assumption at runtime?

#### Summary of if-else translation

While the if-else translation at first looks pretty bad, we are able to extract a fair amount of redundancy through well-understood compiler transformations. If an N-way switch has only M distinct types in it, in most cases we can reduce the cost from _O(N)_ to _O(M)_. Sometimes _M == N_, so this doesn't help, but sometimes _M << N_ (and sometimes `N` is small, in which case _O(N)_ is fine).

Reordering clauses involves some risk; specifically, that the class hierarchy will change between compile and run time. It seems eminently safe to reorder `String` and `Integer`, but more questionable to reorder an arbitrary class `Foo` with `Runnable`, even if `Foo` doesn't implement `Runnable` now, because it might easily be changed to do so later. Ideally we'd like to perform class-hierarchy optimizations using the runtime hierarchy, not the compile-time hierarchy.
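A conservative check along these lines could be evaluated against the runtime hierarchy (for example, when a bootstrap links) instead of being baked in at compile time. The sketch below is hypothetical: it applies only the two rules stated above, answers `true` only when no type can be a subtype of both, and ignores refinements such as sealed hierarchies. A `false` answer just means "keep the original order".

```java
import java.lang.reflect.Modifier;

// Hypothetical conservative disjointness check over the *runtime* hierarchy.
// Returns true only when no type can be a subtype of both a and b;
// false means "not provably disjoint", so the cases keep their order.
final class Disjointness {

    static boolean provablyDisjoint(Class<?> a, Class<?> b) {
        if (a.isAssignableFrom(b) || b.isAssignableFrom(a)) {
            return false; // one is a subtype of the other
        }
        if (!a.isInterface() && !b.isInterface()) {
            return true;  // unrelated classes: single inheritance rules out a common subclass
        }
        if (a.isInterface() && b.isInterface()) {
            return false; // some class could implement both interfaces
        }
        // Exactly one is an interface and (from the first test) the class does not
        // implement it; they are disjoint only if the class is final.
        Class<?> cls = a.isInterface() ? b : a;
        return Modifier.isFinal(cls.getModifiers());
    }

    public static void main(String[] args) {
        System.out.println(provablyDisjoint(String.class, Integer.class));  // true
        System.out.println(provablyDisjoint(Runnable.class, String.class)); // true
        System.out.println(provablyDisjoint(Runnable.class, Number.class)); // false
    }
}
```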
## Type classifiers

The technique outlined in _Part 1_, where we lower the complex switch to a dense `int` switch, and use an indy-based classifier to select an index, is applicable here as well. First let's consider a switch consisting only of unguarded type-test patterns, optionally with a default clause.

We'll start with an `indy` bootstrap whose static arguments are `Class` constants corresponding to each arm of the switch, whose dynamic argument is the switch target, and whose return value is a case number (or distinguished sentinels for "no match" and `null`). We can easily implement such a bootstrap with a linear search, but can also do better; if some subset of the classes are `final`, we can choose between these more quickly (such as via binary search on `hashCode()`, a hash function, or a hash table), and we need to perform only a single operation to test all of those at once. Dynamic techniques (such as building a hash map of previously seen target types), which `indy` is well-suited to, can asymptotically approach _O(1)_ even when the classes involved are not final.

So we can lower:

    switch (x) {
        case T t: A
        case U u: B
        case V v: C
    }

to

    int y = indy[bootstrap=typeSwitch(T.class, U.class, V.class)](x)
    switch (y) {
        case 0: A
        case 1: B
        case 2: C
    }

This has the advantages that the generated code is very similar to the source code, we can (in some cases) get _O(1)_ dispatch performance, and we can handle fallthrough with no additional complexity.

#### Guards

There are two approaches we could take to add support for guards into the process; we could try to teach the bootstrap about guards (and would have to pass locals that appear in guard expressions as additional arguments to the classifier), or we could leave guards to the generated bytecode. The latter seems far more attractive, but requires some tweaks to the bootstrap arguments and to the shape of the generated code.
If the classifier says "you have matched case #3", but then we fail the guard for #3, we want to go back into the classifier and start again at #4. (Sometimes the classifier can also use this information ("start over at #4") to optimize away unnecessary tests.)

We add a second argument (where to start) to the classifier invocation signature, and wrap the switch in a loop, lowering:

    switch (target) {
        case T t where (e1): A
        case T t where (e2): B
        case U u where (e3): C
    }

into

    int index = -1; // start at the top
    while (true) {
        index = indy[...](target, index)
        switch (index) {
            case 0: if (!e1) continue; A
            case 1: if (!e2) continue; B
            case 2: if (!e3) continue; C
            default: break;
        }
        break;
    }

For cases where the same type test is repeated in consecutive positions (at N and N+1), we can have the static compiler coalesce them as above, or we could have the bootstrap maintain a table so that if you re-enter the bootstrap where the previous answer was N, then it can immediately return N+1. Similarly, if N and N+1 are known to be mutually exclusive types (like `String` and `Integer`), on reentering the classifier with N, we can skip right to N+2 since if we matched `String`, we cannot match `Integer`. Lookup tables for such optimizations can be built at callsite linkage time.

#### Mixing constants and type tests

This approach also extends to tests that are a mix of constant patterns and type-test patterns, such as:

    switch (x) {
        case "Foo": ...
        case 0L: ...
        case Integer i:
    }

We can extend the bootstrap protocol to accept constants as well as types, and it is a straightforward optimization to combine both type matching and constant matching in a single pass.
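The re-entry protocol can be sketched in plain Java, assuming a static method stands in for the indy classifier: it takes the previous answer and resumes the search after it. The names are hypothetical, null targets are rejected rather than mapped to a sentinel, and each failed guard re-enters the classifier exactly as in the loop above. Unlike that loop, each arm here ends with `return`, so there is no fallthrough between arms.

```java
import java.util.Objects;

final class TypeSwitchSketch {

    // Hypothetical stand-in for the typeSwitch bootstrap: returns the first case
    // index greater than `previous` whose class accepts the target, or
    // cases.length for "no match". Null handling is omitted in this sketch.
    static int typeSwitch(Object target, int previous, Class<?>... cases) {
        Objects.requireNonNull(target);
        for (int i = previous + 1; i < cases.length; i++) {
            if (cases[i].isInstance(target)) {
                return i;
            }
        }
        return cases.length;
    }

    // Lowering of:
    //   case String s where (s.length() > 3): "long string"
    //   case String s where (s.isEmpty()):    "empty string"
    //   case Integer i where (i > 0):         "positive int"
    //   default:                              "other"
    static String describe(Object target) {
        int index = -1; // start at the top
        while (true) {
            index = typeSwitch(target, index, String.class, String.class, Integer.class);
            switch (index) {
                case 0:
                    if (!(((String) target).length() > 3)) continue; // guard failed: re-enter
                    return "long string";
                case 1:
                    if (!((String) target).isEmpty()) continue;
                    return "empty string";
                case 2:
                    if (!(((Integer) target) > 0)) continue;
                    return "positive int";
                default:
                    return "other";
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(describe("hello")); // long string
        System.out.println(describe(""));      // empty string
        System.out.println(describe(-1));      // other
    }
}
```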
## Nested patterns

Nested patterns are essentially guards; even if we don't expose guards in the language, we can desugar

    case Point(0, var x):

into the equivalent of

    case Point(var a, var x) && a matches 0:

using the same translation story as above -- use the classifier to select a candidate case arm based on the top-type of the pattern, and then do additional checks in the generated bytecode, and if the checks fail, continue and re-enter the classifier starting at the next case.

#### Explicit continue

An alternative to exposing guards is to expose an explicit `continue` statement in switch, which would have the effect of "keep matching at the next case." Then guards could be expressed imperatively as:

    case P:
        if (!guard)
            continue;
        ...
        break;
    case Q: ...

From forax at univ-mlv.fr Sun Apr 8 20:16:54 2018
From: forax at univ-mlv.fr (Remi Forax)
Date: Sun, 8 Apr 2018 22:16:54 +0200 (CEST)
Subject: Record and annotation values
Message-ID: <594033480.1352711.1523218614316.JavaMail.zimbra@u-pem.fr>

Currently annotation values are limited to what is encodable as a constant in the constant pool.
With condy, we can expand the set of values that can be encoded as a constant in the constant pool to infinity, by allowing a reference to any non-mutable class to be encoded as an annotation value.

For that we need to have a 'protocol' that
- encodes an instance of a user-defined non-mutable class as a condy, by the compiler;
- decodes an instance of a user-defined non-mutable class, by the JDK runtime.

Records with their constructors do not provide enough meta-information for that; the parameter names of the constructors may not be available at runtime.

So I think the constructor parameter names of a Record should always be recorded (as if --parameters were specified for the constructors) to enable non-mutable records to be annotation values.
Rémi

From brian.goetz at oracle.com Sun Apr 8 21:33:33 2018
From: brian.goetz at oracle.com (Brian Goetz)
Date: Sun, 8 Apr 2018 17:33:33 -0400
Subject: Record and annotation values
In-Reply-To: <594033480.1352711.1523218614316.JavaMail.zimbra@u-pem.fr>
References: <594033480.1352711.1523218614316.JavaMail.zimbra@u-pem.fr>
Message-ID: 

I think this is one in the category of "just because you can, doesn't mean you should." So before discussing mechanisms, let's discuss goals.

Annotations are for metaDATA. The restriction on what you can put in an annotation stems in part from what you can put in the constant pool, but busting the constant pool limits doesn't automatically mean it's a good idea to make it an annotation value.

The rule that I've been gravitating towards is: anything that is important enough to have a _literal form_ in the language is probably a good candidate to consider as an annotation value. The obvious first candidate is method refs (excluding instance-bound ones). But even with mrefs in, I'd say no to lambdas, because annotation values should also be scrutable to annotation processors. (If we had collection literals, then collections of things that can already go in annos would probably also be valid candidates.)

Don't forget that just because a record has all-final fields doesn't mean it's immutable all the way down. And while some notion of "immutable all the way down" has been a frequent wish-list item, taking that on just so you can stash records in annotations is definitely the tail wagging the dog.

On the second point (reifying parameter names), while I don't object to doing this, this can't be the "official" way to get this data. It should be reflectively accessible, because you need to nominally tie the constructor arguments to some way of getting their values (getters or fields).
The compiler prototype uses an annotation on the class declaration that stashes this, but that's just a prototyping hack; this probably needs a RecordSignature attribute.

On 4/8/2018 4:16 PM, Remi Forax wrote:
> Currently annotation values are limited to what is encodable as a constant in the constant pool.
> With Condy, we can expand the number of values that can be encodable as a constant in the constant pool to the infinity by allowing a reference to any non-mutable class to be encoded as an annotation values.
>
> For that we need to have a 'protocol' that
> - encode an instance of a user defined non-mutable class as a condy by the compiler.
> - decode an instance of a user defined non-mutable class by the JDK runtime.
>
> Records with their constructors do not provide enough meta-information for that, the parameter names of the constructors may not be available at runtime.
>
> So i think the constructors parameter names of a Record should be always recorded (as with --parameters was specified for the constructors) to enable non-mutable records to be annotation values.
>
> Rémi
>

From amaembo at gmail.com Mon Apr 9 05:07:05 2018
From: amaembo at gmail.com (Tagir Valeev)
Date: Mon, 9 Apr 2018 12:07:05 +0700
Subject: Switch on java.lang.Class
Message-ID: 

Hello!

I don't remember whether switch on java.lang.Class instances was discussed. I guess this pattern is quite common and it would be useful to support it. Such code often appears in deserialization logic when we branch on the desired type to deserialize. Here are a couple of examples from open-source libraries:

1.
   com.google.gson.DefaultDateTypeAdapter#read (gson-2.8.2):

    Date date = deserializeToDate(in.nextString());
    if (dateType == Date.class) {
      return date;
    } else if (dateType == Timestamp.class) {
      return new Timestamp(date.getTime());
    } else if (dateType == java.sql.Date.class) {
      return new java.sql.Date(date.getTime());
    } else {
      // This must never happen: dateType is guarded in the primary constructor
      throw new AssertionError();
    }

Could be rewritten as:

    Date date = deserializeToDate(in.nextString());
    return switch(dateType) {
      case Date.class -> date;
      case Timestamp.class -> new Timestamp(date.getTime());
      case java.sql.Date.class -> new java.sql.Date(date.getTime());
      default ->
        // This must never happen: dateType is guarded in the primary constructor
        throw new AssertionError();
    };

2. com.fasterxml.jackson.databind.deser.std.FromStringDeserializer#findDeserializer (jackson-databind-2.9.4):

    public static Std findDeserializer(Class<?> rawType) {
      int kind = 0;
      if (rawType == File.class) {
        kind = Std.STD_FILE;
      } else if (rawType == URL.class) {
        kind = Std.STD_URL;
      } else if (rawType == URI.class) {
        kind = Std.STD_URI;
      } else if (rawType == Class.class) {
        kind = Std.STD_CLASS;
      } else if (rawType == JavaType.class) {
        kind = Std.STD_JAVA_TYPE;
      // ... more branches like this ...
      } else {
        return null;
      }
      return new Std(rawType, kind);
    }

Could be rewritten as:

    public static Std findDeserializer(Class<?> rawType) {
      int kind = switch(rawType) {
        case File.class -> Std.STD_FILE;
        case URL.class -> Std.STD_URL;
        case URI.class -> Std.STD_URI;
        case Class.class -> Std.STD_CLASS;
        case JavaType.class -> Std.STD_JAVA_TYPE;
        ...
        default -> 0;
      };
      return kind == 0 ? null : new Std(rawType, kind);
    }

In such code all branches are mutually exclusive. The bootstrap method can generate a lookupswitch based on Class.hashCode, then equals checks, pretty similar to the String switch implementation.
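A rough sketch of such a classifier, assuming the case labels are available as `Class` objects when the call site is linked. Since `Class` uses identity for `equals` and `hashCode`, an `IdentityHashMap` built once gives expected constant-time lookup that stays valid for the life of the VM. The class name is hypothetical, and matching is exact (like the `==` chains above), not by subtype.

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical classifier for a switch on a Class value. Class uses identity for
// equals/hashCode, so a map built once per call site is valid for the VM's lifetime.
final class ClassSwitchSketch {
    private final Map<Class<?>, Integer> caseIndex = new IdentityHashMap<>();
    private final int noMatch; // N, the "no match" result

    ClassSwitchSketch(Class<?>... caseLabels) {
        for (int i = 0; i < caseLabels.length; i++) {
            caseIndex.putIfAbsent(caseLabels[i], i); // first occurrence wins
        }
        noMatch = caseLabels.length;
    }

    // Exact match only, like the `==` chains above; subclasses do not match.
    int classify(Class<?> target) {
        return caseIndex.getOrDefault(target, noMatch);
    }

    public static void main(String[] args) {
        // Models: switch (type) { case String.class: ... case Integer.class: ... case int.class: ... }
        ClassSwitchSketch s = new ClassSwitchSketch(String.class, Integer.class, int.class);
        System.out.println(s.classify(int.class));  // 2
        System.out.println(s.classify(Long.class)); // 3 (no match)
    }
}
```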
Unlike String hash codes, Class.hashCode is not stable and varies between JVM launches, but the hash codes are already known during the bootstrap and we can trust them during the VM lifetime, so we can generate a lookupswitch. The minor problematic point is supporting primitive classes like int.class. These cannot be passed directly as indy static arguments, but this can be solved with condy.

What do you think?

With best regards,
Tagir Valeev.

From forax at univ-mlv.fr Mon Apr 9 06:16:26 2018
From: forax at univ-mlv.fr (forax at univ-mlv.fr)
Date: Mon, 9 Apr 2018 08:16:26 +0200 (CEST)
Subject: Record and annotation values
In-Reply-To: 
References: <594033480.1352711.1523218614316.JavaMail.zimbra@u-pem.fr>
Message-ID: <487962183.23171.1523254586643.JavaMail.zimbra@u-pem.fr>

----- Original Message -----
> From: "Brian Goetz"
> To: "Remi Forax", "amber-spec-experts"
> Sent: Sunday, April 8, 2018 23:33:33
> Subject: Re: Record and annotation values

> I think this is one in the category of "just because you can, doesn't
> meant you should." So before discussing mechanisms, let's discuss goals.
>
> Annotations are for metaDATA. The restriction on what you can put in an
> annotation stems in part from what you can put in the constant pool, but
> busting the constant pool limits doesn't automatically mean its a good
> idea to make it an annotation value.
>
> The rule that I've been gravitating towards is: anything that is
> important enough to have a _literal form_ in the language, probably is
> is a good candidate to consider as an annotation value. The obvious
> first candidate is method refs (excluding instance-bound ones). But
> even with mrefs in, I'd say no to lambdas, because annotation values
> should also be scrutable to annotation processors. (If we had
> collection literals, then collections of things that can already go in
> annos is probably also a valid candidate.)
>
> Don't forget that just because a record has all-final fields, doesn't
> mean its immutable all the way down. And while some notion of
> "immutable all the way down" has been a frequent wish-list item, taking
> that on just so you can stash records in annotations is definitely tail
> wagging the dog.
>

I agree, I do not think it's a good idea to introduce records as annotation values just because we can, but that's not the reason why I think we should introduce records as annotation values.

There are many cases where you can construct an annotation with invalid values, and those invalid annotations will not be caught at compile time but only at runtime. For example,
  @Test({ignore=false, timeout_value=-3, timeout_unit=SECOND})

So having a way to specify a contract for annotation values would make Java safer.

>
> On the second point (reifying parameter names), while I don't object to
> doing this, this can't be the "official" way to get this data. It
> should be reflectively accessible, because you need to nominally tie the
> constructor arguments to some way of getting their values (getters or
> fields). The compiler prototype uses an annotation on the class
> declaration that stashes this, but that's just a prototyping hack; this
> probably needs a RecordSignature attribute.

Yes, a specific attribute is perhaps better; I wonder if the information in the RecordSignature should not be part of the Extractor.

Rémi

>
>
> On 4/8/2018 4:16 PM, Remi Forax wrote:
>> Currently annotation values are limited to what is encodable as a constant in
>> the constant pool.
>> With Condy, we can expand the number of values that can be encodable as a
>> constant in the constant pool to the infinity by allowing a reference to any
>> non-mutable class to be encoded as an annotation values.
>>
>> For that we need to have a 'protocol' that
>> - encode an instance of a user defined non-mutable class as a condy by the
>> compiler.
>> - decode an instance of a user defined non-mutable class by the JDK runtime.
>>
>> Records with their constructors do not provide enough meta-information for that,
>> the parameter names of the constructors may not be available at runtime.
>>
>> So i think the constructors parameter names of a Record should be always
>> recorded (as with --parameters was specified for the constructors) to enable
>> non-mutable records to be annotation values.
>>
>> Rémi

From forax at univ-mlv.fr Mon Apr 9 11:49:15 2018
From: forax at univ-mlv.fr (Remi Forax)
Date: Mon, 9 Apr 2018 13:49:15 +0200 (CEST)
Subject: Expression switch - an alternate proposal
In-Reply-To: 
References: 
Message-ID: <1418035131.260541.1523274555469.JavaMail.zimbra@u-pem.fr>

Moving to spec-experts as it may interest others.

Hi Stephen,
First, thanks for doing a detailed analysis of the rationale of your proposal.

I think I agree with you about the fact that the expression switch does need to support fallthrough; more on that in a following email.
I also agree with you that mixing arrows and colons is confusing.

I am not sure it's that important to make a strong distinction between the statement switch and the expression switch. You do not give any argument for why you think it's important, and in my opinion, it's the kind of thing that seems important when a feature is introduced and tends to matter less once the feature is no longer new.

Basically, your proposal is to use -> everywhere; I think I prefer the opposite: do not use arrows at all.

Using an arrow in this context is disturbing because it doesn't mean the same thing as the arrow of a lambda.

As I know that you love puzzlers, how about this?
int a = 0;
switch(x) {
  case 0 -> { a = 3; };
  case 1 -> () -> { a = 3; };
}

or this one:

switch(x) {
  case 0 -> { break 3; }
  case 1 -> () -> { break 3; };
  case 2 -> { return 3; }
  case 3 -> () -> { return 3; };
}

The problem is that currently -> means "create a new function scope", not "create a new code scope".
So if not mixing arrows and colons is an important goal, and I think it is, I think it's better not to use the arrow. After all, we need the arrow syntax in lambdas only to know whether (x) is the start of a lambda or a cast; there is no need for an arrow in the expression switch.

Moreover, do we really need a shorter syntax, given that we can use break with a value?
Here is your example with no arrow and no short syntax:

var action = switch (light) {
  case RED:
    log("Red found");
    break "Stop";
  case YELLOW, GREEN:
    break "Go go go";
  default:
    log("WTF: " + light);
    throw new WtfException("Unexpected color: " + light);
};

and now we can discuss adding a shorter syntax by making break optional if there is a single expression.

Rémi

----- Original message -----
> From: "Stephen Colebourne"
> To: "amber-dev"
> Sent: Monday 9 April 2018 01:58:03
> Subject: Expression switch - an alternate proposal
> What follows is a set of changes to the current expression switch
> proposal that I believe result in a better outcome.
>
> The goal is to tackle four specific things (in order):
> 1) The context as to whether it is a statement or expression switch
> (and thus what is or is not allowed) is too remote/subtle
> 2) Mixing arrows and colons is confusing to read
> 3) Blocks that do not have a separate scope
> 4) Fall through by default
> while still keeping the design as a unified switch language feature.
>
> To tackle #1 and #2, all cases in an expression switch must start with
> arrow -> (and all in statement switch must start with colon :)
> To tackle #3, all blocks in an expression switch must have braces
> To tackle #4, disallow fall through in expression switch (or add a
> fallthrough keyword)
>
> Here is the impact on some code:
>
> Current:
>
> var action = switch (light) {
>   case RED:
>     log("Red found");
>     break "Stop";
>   case YELLOW:
>   case GREEN -> "Go go go";
>   default:
>     log("WTF: " + light);
>     throw new WtfException("Unexpected color: " + light);
> };
>
> Alternate proposal:
>
> var action = switch (light) {
>   case RED -> {
>     log("Red found");
>     break "Stop";
>   }
>   case YELLOW, GREEN -> "Go go go";
>   default -> {
>     log("WTF: " + light);
>     throw new WtfException("Unexpected color: " + light);
>   }
> };
>
> How is this still a unified switch? By observing that switch can be
> broken down into two distinct phases:
> - matching
> - action
> What makes it unified is that the matching phase is shared. Where
> statement and expression switch differ is in the action phase.
>
> The unified matching phase includes:
> - target expression to switch on
> - case null
> - constant case clauses
> - pattern matching case clauses
> - default clause
>
> The action phase of a statement switch:
> - is followed by a colon
> - has non-scoped blocks
> - falls through by default
> - can use return/continue/break
>
> The action phase of an expression switch:
> - is followed by an arrow
> - has an expression or a block (aka block-expression)
> - cannot fall through
> - cannot use return/continue/break
>
> By having a unified matching phase and a separate (but consistent)
> action phase in each form, I believe that the overall language feature
> would be much simpler to learn. And importantly, it achieves the goal
> of not deprecating or threatening the existence of the classic
> statement switch.
>
> All the key differences are in the action phase, which is clearly
> identified by arrow or colon (no remote context). Developers will come
> to associate the rule differences between the two forms with the arrow
> or colon, while the pattern matching knowledge is shared.
>
> Of course, the matching phase is not completely unified - expression
> switches must be exhaustive, and they may have auto default case
> clauses. (Perhaps the unified matching phase mental model suggests
> that auto default would be better written explicitly, e.g. "default
> throw;", which could then apply to both statement and expression. Not
> sure.)
>
> I hope this alternate proposal is clear. To me, the split between a
> unified matching phase and a consistent but different action phase
> clearly identified in syntax results in much better readability,
> learning and understandability.
>
> Stephen
> PS. I think there are alternate block expression syntaxes, including
> ones that avoid "break expression", but I've chosen to avoid that
> bikeshed and use the closest one to the current proposal for the
> purpose of this mail.

From forax at univ-mlv.fr Mon Apr 9 11:55:12 2018
From: forax at univ-mlv.fr (Remi Forax)
Date: Mon, 9 Apr 2018 13:55:12 +0200 (CEST)
Subject: Expression switch - an alternate proposal
In-Reply-To:
References:
Message-ID: <1415747104.263756.1523274912104.JavaMail.zimbra@u-pem.fr>

Do we need fallthrough in an expression switch? Like Stephen, I believe we don't.
First, as Stephen points out in his example, if we have comma-separated cases, we need fallthrough less, and even if we have code like this

var value = switch(x) {
  case 0:
    foo();
  case 1:
    bar();
    break 42;
};

one can always rewrite it with comma-separated cases and an if:

var value = switch(x) {
  case 0, 1:
    if (x == 0) {
      foo();
    }
    bar();
    break 42;
};

There is another reason not to allow fallthrough: we have ruled out all goto-related syntax, like break label/continue label, from the expression switch, yet we would still allow fallthrough, which is a goto to the next basic block. I think we should be coherent here and not allow fallthrough in the expression switch.

Rémi

From brian.goetz at oracle.com Mon Apr 9 13:30:07 2018
From: brian.goetz at oracle.com (Brian Goetz)
Date: Mon, 9 Apr 2018 09:30:07 -0400
Subject: Record and annotation values
In-Reply-To: <487962183.23171.1523254586643.JavaMail.zimbra@u-pem.fr>
References: <594033480.1352711.1523218614316.JavaMail.zimbra@u-pem.fr> <487962183.23171.1523254586643.JavaMail.zimbra@u-pem.fr>
Message-ID:

> I agree, I do not think it's a good idea to introduce records as annotation values just because we can, but that's not the reason why I think we should introduce them.
>
> There are many cases where you can construct an annotation with invalid values, and those invalid annotations will not be caught at compile time but only at runtime.
> For example,
> @Test({ignore=false, timeout_value=-3, timeout_unit=SECOND})
>
> So having a way to specify a contract for annotation values would make Java safer.

This is why it is best to start with problems first, rather than solutions.  It was far from obvious that this was your underlying motivation, and given this motivation, it's far from obvious this is the best way to get there.

So let's start over: the problem you're trying to solve is that there is currently no good way to do compile- or run-time annotation validation?

> Yes, a specific attribute is perhaps better;
> I wonder whether the information in the RecordSignature should rather be part of the Extractor.
>

I think the containment here is backwards.  An extractor is a lower-level mechanism for implementing conditional deconstruction, which includes pattern matching.  A class can have multiple extractors (patterns), just as it can have multiple constructors.

Records have a distinguished extractor (the primary deconstructor pattern), just as they have a distinguished constructor.  It should be possible to reflectively navigate from a record class to its primary ctor/dtor.  That might be by referring to them from the RecordSignature, or might be some other way (e.g., an invariant that you can use the record signature as an input to findConstructor / findDeconstructionPattern.)

From brian.goetz at oracle.com Mon Apr 9 13:38:15 2018
From: brian.goetz at oracle.com (Brian Goetz)
Date: Mon, 9 Apr 2018 09:38:15 -0400
Subject: Switch on java.lang.Class
In-Reply-To:
References:
Message-ID: <3510133d-4147-fab9-366f-6ea42b523c4b@oracle.com>

I'm skeptical of this feature, because (a) it's not as widely applicable as it looks, and (b) it's error-prone.

Both of these stem from the fact that comparing classes with == excludes subtypes.
So it really only works with final classes -- but if we had a feature like this, people might mistakenly use it with nonfinal classes, and be surprised when a subtype shows up (this can happen even when your IDE tells you there are no subtypes, because of dynamic proxies).  And all of the examples you show are in low-level libraries, which is a warning sign.

Where did these snippets get their Class from?  Chances are, case 1 got it from calling Object.getClass().  In which case, they can just pattern match on the type of the thing:

    switch (date) {
        case Date d: ...
        case Timestamp t: ...
        default: ...
    }

Case 2 is more likely just operating on types that it got from a reflection API.  If you have only a few entries, an if-else will do; if you have more entries, a Map is likely to be the better choice.  For situations like this, I'd rather invest in map literals or better Map.of() builders.

So, I would worry this feature is unlikely to carry its weight, and further, may lead to misuse.

On 4/9/2018 1:07 AM, Tagir Valeev wrote:
> Hello!
>
> I don't remember whether switch on java.lang.Class instance was
> discussed. I guess this pattern is quite common and it would be useful
> to support it. Such code often appears in deserialization logic when
> we branch on the desired type to deserialize. Here are a couple of
> examples from open-source libraries:
>
> 1. com.google.gson.DefaultDateTypeAdapter#read (gson-2.8.2):
>
>     Date date = deserializeToDate(in.nextString());
>     if (dateType == Date.class) {
>       return date;
>     } else if (dateType == Timestamp.class) {
>       return new Timestamp(date.getTime());
>     } else if (dateType == java.sql.Date.class) {
>       return new java.sql.Date(date.getTime());
>     } else {
>       // This must never happen: dateType is guarded in the primary constructor
>       throw new AssertionError();
>     }
>
> Could be rewritten as:
>
>     Date date = deserializeToDate(in.nextString());
>     return switch(dateType) {
>       case Date.class -> date;
>       case Timestamp.class -> new Timestamp(date.getTime());
>       case java.sql.Date.class -> new java.sql.Date(date.getTime());
>       default ->
>         // This must never happen: dateType is guarded in the primary constructor
>         throw new AssertionError();
>     };
>
> 2. com.fasterxml.jackson.databind.deser.std.FromStringDeserializer#findDeserializer
> (jackson-databind-2.9.4):
>
>     public static Std findDeserializer(Class rawType)
>     {
>         int kind = 0;
>         if (rawType == File.class) {
>             kind = Std.STD_FILE;
>         } else if (rawType == URL.class) {
>             kind = Std.STD_URL;
>         } else if (rawType == URI.class) {
>             kind = Std.STD_URI;
>         } else if (rawType == Class.class) {
>             kind = Std.STD_CLASS;
>         } else if (rawType == JavaType.class) {
>             kind = Std.STD_JAVA_TYPE;
>         } else if // more branches like this
>         } else {
>             return null;
>         }
>         return new Std(rawType, kind);
>     }
>
> Could be rewritten as:
>
>     public static Std findDeserializer(Class rawType)
>     {
>         int kind = switch(rawType) {
>         case File.class -> Std.STD_FILE;
>         case URL.class -> Std.STD_URL;
>         case URI.class -> Std.STD_URI;
>         case Class.class -> Std.STD_CLASS;
>         case JavaType.class -> Std.STD_JAVA_TYPE;
>         ...
>         default -> 0;
>         };
>         return kind == 0 ? null : new Std(rawType, kind);
>     }
>
> In such code all branches are mutually exclusive. The bootstrap method
> can generate a lookupswitch based on Class.hashCode, then equals
> checks, pretty similar to the String switch implementation.
> Unlike String hash codes, Class.hashCode is not stable and varies between JVM
> launches, but the hash codes are already known during the bootstrap and we can
> trust them during the VM lifetime, so we can generate a lookupswitch.
> The minor problematic point is to support primitive classes like
> int.class. This cannot be passed directly as an indy static argument, but
> this can be solved with condy.
>
> What do you think?
>
> With best regards,
> Tagir Valeev.
>

From brian.goetz at oracle.com Mon Apr 9 15:03:12 2018
From: brian.goetz at oracle.com (Brian Goetz)
Date: Mon, 9 Apr 2018 11:03:12 -0400
Subject: Expression switch - an alternate proposal
In-Reply-To: <1418035131.260541.1523274555469.JavaMail.zimbra@u-pem.fr>
References: <1418035131.260541.1523274555469.JavaMail.zimbra@u-pem.fr>
Message-ID: <43593260-3529-94f9-a55a-f568c5fec5f7@oracle.com>

> I think I agree with you about the fact that the expression switch does need to support fallthrough,
> more on that in a following email.

I've been leaving this topic until we have ironed out the higher-order bits, but this seems a good enough time to start this discussion.

> I also agree with you that mixing arrows and colons is confusing.

I agree this is confusing, but I think it is also not likely to be something people do naturally -- because the -> form, where it is applicable, is so much more attractive -- so the risk of confusion is low.  Just as style guides say to users "if you're going to use fall through, label it clearly", and most code does, style guides will guide users away from this confusion.

> Basically, your proposal is to use -> everywhere; I think I prefer the opposite: do not use the arrow at all.
> Using the arrow in this context is disturbing because it would not mean the same thing in a lambda and inside an expression switch.

This is a reasonable alternative, but I don't think it would be very popular.  I think people will really love being able to write:

    case MONDAY -> 1;
    case TUESDAY -> 2;

and will be sad if we make them write

    case MONDAY: break 1;
    case TUESDAY: break 2;

Not only will they be sad, but they will point out that the "obvious" answer was in front of our noses, and we did something different just to be different.  (You can easily imagine the "There those Java guys go again, verbosity for its own sake" rants, but this time they might actually be right, rather than the folks who can't spell "migration compatibility" complaining about erasure.)

> The problem is that currently -> means "create a new function scope", not "create a new code scope".

I think the scopes issue is a red herring.

> So if not mixing arrows and colons is an important goal, and I think it is, I think it's better not to use the arrow.

Or just: avoid mixing arrows and colons.

> Moreover, do we really need a shorter syntax, given that we can use break with a value?

I suggest you do this poll at Devoxx.  Make sure to wear flame-proof pants!

> and now we can discuss adding a shorter syntax by making break optional if there is a single expression.

... which we expect to be true almost all the time.

From brian.goetz at oracle.com Mon Apr 9 19:14:47 2018
From: brian.goetz at oracle.com (Brian Goetz)
Date: Mon, 9 Apr 2018 15:14:47 -0400
Subject: Switch expressions -- gathering the threads
Message-ID: <403596bb-406b-6b99-1dd5-420f7bea5dfa@oracle.com>

There's been some active discussion on "Is this the switch expression construct we're looking for" over on amber-dev.  It's a good time to take stock of where we are, and to identify any loose ends.

## Approach

Our approach is driven not merely by the desire to have an expression form of switch, but to make switch more generally useful as a multi-way conditional construct.  The biggest driver here of course is making it work well with pattern matching.
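
To make "work well with pattern matching" concrete, here is a minimal sketch in the type-pattern syntax that was eventually standardized in Java 21 (JEP 441), well after this thread; it is not the 2018 prototype, and the class name `DateDispatch` and its return strings are illustrative only. `java.sql.Timestamp` and `java.sql.Date` both extend `java.util.Date`, so their cases have to come before `case Date d`; in the opposite order javac rejects the later labels as dominated, the same dominance ordering discussed on this list for catch blocks.

```java
import java.sql.Timestamp;
import java.util.Date;

public class DateDispatch {
    // Type patterns are tried top to bottom. Timestamp and java.sql.Date are
    // subclasses of java.util.Date, so they must be listed first; putting
    // `case Date d` above them makes javac reject the later labels as dominated.
    static String describe(Object value) {
        return switch (value) {
            case Timestamp t -> "java.sql.Timestamp";
            case java.sql.Date d -> "java.sql.Date";
            case Date d -> "java.util.Date";
            default -> "something else";
        };
    }

    public static void main(String[] args) {
        System.out.println(describe(new Timestamp(0L)));
        System.out.println(describe(new java.sql.Date(0L)));
        System.out.println(describe(new Date(0L)));
        System.out.println(describe("text"));
    }
}
```

Because a type pattern tests subtyping rather than class equality, a subclass of `Timestamp` would also land in the first arm; and without a `case null` label, this switch throws NullPointerException for a null selector.
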
Pattern matching is a driver for better handling of nulls and primitives (though these are also useful on their own); additionally, the more useful we make switch, the more obvious the cumbersomeness of its statement-orientation becomes.  Pattern matching also pushes hard on the somewhat unfortunate scoping behavior; a straightforward interpretation of existing scoping of locals in switch would not be very good for pattern bindings.

At first, given all the constraints of existing switches, we thought it unlikely that we'd be able to get away with teaching switch some new tricks, and would have to create a new construct (say, "match").  Bit by bit, though, we were able to chip away at the accidental complexity of the { constants, patterns } x { statement, expression } space, to the point where it seemed practical to unify the construct.

Having a single construct has pros and cons.  On the one hand, entities should not be multiplied without necessity; on the other, a one-size-fits-all construct might exhibit schizoid behavior.  And the switch statement probably has more unusual (some would say objectionable) behaviors than any other Java construct, putting us in tension between compatibility and perceived complexity.

## Current proposal

The current proposal starts with the existing statement switch, extending `break` to support a value, and requiring that the value-ness of the break match the value-ness of the switch (just as return must with methods or lambdas).  We also slightly adjust the rules regarding nonlocal control flow _through_ an expression switch.  Because expression switches are expressions, they must be total.  For expression switches over enums and sealed types, we have the option to infer a throwing default when all sealed members are provided.

We then offer a shorthand form for case labels in expression switches, where:

    case P -> e;

is shorthand for
    case P: break e;

This leaves the following differences between expression switches and statement switches:

 - Expression switches are required to be exhaustive; statement switches cannot be required to be exhaustive.
 - Expression switches permit the `->` shorthand form.
 - Expression switches may restrict fallthrough in some way, or may not, TBD.
 - You can `return` and `continue` out of a statement switch, but not out of an expression switch (like lambdas.)
 - You cannot `break` or `continue` _through_ an expression switch (like lambdas and conditionals.)

And it leaves some open issues for discussion:

 - We have some options as to whether to restrict fallthrough in expression switches, and also whether to restrict fallthrough into patterns.
 - We have the option to try and give the `->` form some meaning in statement switches.

## Commentary

The concerns raised so far mostly revolve around potential confusion.  Because the two forms are mostly alike, but have subtle differences, the fear is this will lead to confusion.  Various schemes have been suggested to make them look more different, or to make them behave more differently, to make it clearer where the lines are.  For example, the following have been cited:

 - Saying `break expression` is ugly, or confusable with a labeled break;
 - Concerns that fallthrough-by-default is an even worse default for expression switches than for statements (and, if we restrict fallthrough in switch expressions, the gap between the forms grows);
 - The asymmetry of the implicit throwing default in apparently-exhaustive enum switches will be a sharp edge;
 - That a user might not be able to tell, by looking at the middle of a large switch, whether it's an expression or statement switch;
 - The possibility people will write code with mixed label forms (colon and arrow) seems to scare the heck out of people;
 - The arrows might confuse people with similarity to lambdas.

My reaction to most of these is "meh".
I think the arrow-form is going to be so preferable that the risk of fallthrough will be low (because there are few statements in the first place), and can be lowered further with restrictions; similarly, I think unrestricted mixing of arrow and colon forms will be quite rare (except for the case where there is one catch-all case, often a default, which will take statement form, which seems mostly harmless), and strongly discouraged.  And that means that the confusion between expression and statement will be nonexistent -- because the expression ones will have arrows and the statement ones will not.

There are also a number of calls for "If X is rare, just disallow X", where X could be a statement-plus-expression form in expression switches or mixed label forms in one switch.  The problem is that they are usually not rare _enough_ that their lack would not cause a different kind of backlash.

#### Some alternatives that have been suggested

**Separate keyword.**  Having a separate keyword ("choose") for expression switch seems like it should dispel all the "but people will be confused" issues, but I'm not sure it actually will.  Because the two constructs will still be so similar, the differences will likely still surprise people.  It is also not a magic wand; we still have to figure out how to deal with statement+expression compounds, and it doesn't automatically rule out the "mixed colons and arrows" problem.

**Block expression.**  For the "mixed colons and arrows" problem, several have suggested some sort of ad-hoc, switch-specific block expression, but from a language evolution perspective, I think this is a cure that is worse than the disease.  Having an ad-hoc form just for switch is terrible, and adding a general block expression form to the language is not where we want to go -- and doing it to avoid the perception of rampant mixed colons-and-arrows would be killing a dust mite with a napalm blast.

**No colons in expression switch.**
Without a block expression, this is a non-starter; there are way too many legitimate uses for compound expressions in expression switches.

**No mixed colons and arrows.**  This will be intensely irritating to users; if you add one compound expression in a 50-way switch, you have to change the 49 others from the nice form to the nasty one.

## Open issues

The main issue we need to address is whether we want to restrict fallthrough in expression switches (or in the extreme case, prohibit it entirely.)

One argument why fallthrough might be desirable is that some existing statement switches that make use of fallthrough (such as string or packet parsers) could become expression switches; these frequently have a "main result" they want to return (such as the index of the next character), while at the same time recording some side state about the context.  Refactoring these to expression switches could be beneficial just as it is for many other statement switches.  On the other hand, it would also be reasonable to say we should leave these cases in statement-world where they are now.

A form of fallthrough that I think may be more common in expression switches is when something wants to fall _into_ the default:

    int x = switch (y) {
        case "Foo" -> 1;
        case "Bar" -> 2;
        case null:
        default:
            // handle exceptional case here
    };

Because `default` is not a pattern, we can't say:

    case null, default:

here.  (Well, we could make it one.)  Though we could carve out an exception for such "trivial" fallthrough.

I think a reasonable restriction that might preserve flexibility while avoiding most accidental uses is to make it illegal to fall _into_ an arrow-labeled case; if you want fallthrough, stay in colon-world.  (It's impossible to fall _out of_ an arrow case.)
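
For comparison, here is a minimal sketch of this example in the language as eventually standardized, which differs from the 2018 proposal: the value-carrying `break e` became `yield e` (Java 14, JEP 361), `case null, default` became a legal combined label (Java 21, JEP 441), and a single switch may not mix `->` and `:` labels. The class and method names are illustrative, and `-1` stands in for the "exceptional case" handling.

```java
public class NullDefault {
    // Arrow form: no fallthrough is possible, and null shares the default arm.
    static int arrowForm(String y) {
        return switch (y) {
            case "Foo" -> 1;
            case "Bar" -> 2;
            case null, default -> -1; // stand-in for the exceptional case
        };
    }

    // Colon form of the same switch: `yield` is the final spelling of `break value`.
    static int colonForm(String y) {
        return switch (y) {
            case "Foo":
                yield 1;
            case "Bar":
                yield 2;
            case null, default:
                yield -1;
        };
    }

    public static void main(String[] args) {
        for (String s : new String[] {"Foo", "Bar", "Baz", null}) {
            System.out.println(s + " -> " + arrowForm(s) + " / " + colonForm(s));
        }
    }
}
```

Both methods return 1, 2 and -1 for "Foo", "Bar" and anything else (including null), so in this spelling falling into the default needs no fallthrough at all.
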
Given that most users would rather live in arrow-world, this means that for practical purposes, there's no fallthrough in expression switches at all, but advanced users have a fallback that works just like the switch and fallthrough they've always known.

While it is not specific to expression vs statement switch, we should also ask whether we want to restrict fallthrough into certain kinds of pattern labels (i.e., those without binding variables), even in statement switch.  (I don't really see the point, though; I don't see a path to getting rid of the breaks, which would be the real payoff.)  Further, because of the intersection rules about OR patterns, it's more likely that an accidental fallthrough from one pattern label to another would result in a compile error anyway.

#### -> in statement switch

Finally, people have asked about whether we should consider allowing `->` for statement switches too (perhaps on the theory that they're kind of like void-valued expression switches.)  I see the attraction here -- when the majority of actions are single-line, this would be a winner, and you could drop the breaks.  However, because the distribution of statement count in switch arms is all over the map, this would dramatically increase the prevalence of mixed colon-and-arrow switches, and would probably further expose people to the risk of accidental fallthrough, as now break is needed sometimes and not others _in the same statement switch_.

From amaembo at gmail.com Tue Apr 10 05:12:59 2018
From: amaembo at gmail.com (Tagir Valeev)
Date: Tue, 10 Apr 2018 12:12:59 +0700
Subject: Switch on java.lang.Class
In-Reply-To: <3510133d-4147-fab9-366f-6ea42b523c4b@oracle.com>
References: <3510133d-4147-fab9-366f-6ea42b523c4b@oracle.com>
Message-ID:

Hello!

That does not sound convincing.

First, in my experience, it's quite widely applicable. My first two samples were from what you called low-level libraries just because I wanted to grep something well-known.
Now I grepped the jdk10 source for `\w+ == \w+\.class`, manually scanned about 10% of the results, and found about 10 places where it's useful (some examples are shown below). So, extrapolating, I may assume that this construct can be applied roughly 100 times in the JDK (note that my regexp does not cover xyz.equals(foo.class), which some developers prefer; different spacing is not covered either). You may well call the JDK code "low-level libraries", but grepping the IntelliJ IDEA source code I also see a significant number of occurrences. And I don't see why usefulness of the feature in low-level libraries should be a warning sign.

In any case I'm pretty sure that switch on class will be more applicable than switch on floats. But you are doing switch on floats. Why? For consistency, of course. You want to support all literals in switch. But class literals are also literals, according to JLS 15.8.2, so it is inconsistent not to support them (especially taking into account that their usefulness is not the lowest of all possible literals). Another comparison: all literals (including class literals) and enum values are acceptable as annotation values. The same holds for switch, except that class literals are excluded, which is inconsistent.

I don't buy the error-prone argument either. Is `switch(doubleValue) {case Math.PI: ...}` error-prone? Why can't somebody assume that the comparison should tolerate some delta between doubleValue and Math.PI? Somebody surely can, but that's silly. We know that switch checks for equality; it was always so. It will be so for classes as well, and assuming something different is inconsistent. After all, writing foo.equals(Bar.class) or foo == Bar.class is allowed in the language, people use these constructions, and often it's the right thing to do. Of course their code becomes erroneous sometimes, because in a particular place inheritance should be taken into account.
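
The inheritance caveat can be made concrete with a small sketch. The class `AlignmentTable` and its string values are hypothetical stand-ins for the `JLabel` constants used in sample 6 below. An exact-match lookup keyed on `Class` (which is what a chain of `==` tests, or a plain `Map`, gives you) misses subclasses, while walking `getSuperclass()`, as `configureValue` in sample 6 does recursively, finds the nearest registered ancestor. Like that recursion, it ignores interfaces.

```java
import java.util.Map;

public class AlignmentTable {
    // Hypothetical table: column class -> alignment name.
    private static final Map<Class<?>, String> ALIGNMENT = Map.of(
            Object.class, "LEADING",
            Number.class, "TRAILING",
            java.util.Date.class, "LEADING");

    // Exact match only, like a chain of `columnClass == X.class` tests.
    static String exact(Class<?> c) {
        return ALIGNMENT.get(c);
    }

    // Walks up the superclass chain until a registered class is found;
    // returns null if none is (e.g. for interfaces, whose superclass is null).
    static String nearest(Class<?> c) {
        for (Class<?> k = c; k != null; k = k.getSuperclass()) {
            String a = ALIGNMENT.get(k);
            if (a != null) {
                return a;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(exact(Integer.class));   // null: Integer itself is not a key
        System.out.println(nearest(Integer.class)); // TRAILING, found via Number
        System.out.println(Number.class.isAssignableFrom(Integer.class)); // true
    }
}
```

`Class.isAssignableFrom` answers the same subtype question for a single pair of classes, which is the check an exact `==` comparison silently skips.
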
But the same is true for the doubleValue == Math.PI comparison: sometimes it's ok, sometimes it's wrong and some tolerance interval should be checked instead. And when it's ok, you add a new option to use switch on doubles.

Several code samples found in the JDK:

1. javafx.base/javafx/util/converter/LocalDateTimeStringConverter.java:197 (final classes)

    if (type == LocalDate.class) {
        return (T)LocalDate.from(chronology.date(temporal));
    } else if (type == LocalTime.class) {
        return (T)LocalTime.from(temporal);
    } else {
        return (T)LocalDateTime.from(chronology.localDateTime(temporal));
    }

2. java.desktop/sun/print/Win32PrintService.java:928 (final classes)

    if (category == ColorSupported.class) {
        int caps = getPrinterCapabilities();
        if ((caps & DEVCAP_COLOR) != 0) {
            return (T)ColorSupported.SUPPORTED;
        } else {
            return (T)ColorSupported.NOT_SUPPORTED;
        }
    } else if (category == PrinterName.class) {
        return (T)getPrinterName();
    } else if (category == PrinterState.class) {
        return (T)getPrinterState();
    } else if (category == PrinterStateReasons.class) {
        return (T)getPrinterStateReasons();
    } else if (category == QueuedJobCount.class) {
        return (T)getQueuedJobCount();
    } else if (category == PrinterIsAcceptingJobs.class) {
        return (T)getPrinterIsAcceptingJobs();
    } else {
        return null;
    }

3. com.sun.media.sound.SoftSynthesizer#getPropertyInfo (line 926) - final classes; several blocks like this:

    if (c == Byte.class)
        item2.value = Byte.valueOf(s);
    else if (c == Short.class)
        item2.value = Short.valueOf(s);
    else if (c == Integer.class)
        item2.value = Integer.valueOf(s);
    else if (c == Long.class)
        item2.value = Long.valueOf(s);
    else if (c == Float.class)
        item2.value = Float.valueOf(s);
    else if (c == Double.class)
        item2.value = Double.valueOf(s);

4. java.awt.Component#getListeners (interfaces!), Window#getListeners, List#getListeners, JComponent#getListeners, etc.
are similar public T[] getListeners(Class listenerType) { EventListener l = null; if (listenerType == ComponentListener.class) { l = componentListener; } else if (listenerType == FocusListener.class) { l = focusListener; } else if (listenerType == HierarchyListener.class) { l = hierarchyListener; } else if (listenerType == HierarchyBoundsListener.class) { l = hierarchyBoundsListener; } else if (listenerType == KeyListener.class) { l = keyListener; } else if (listenerType == MouseListener.class) { l = mouseListener; } else if (listenerType == MouseMotionListener.class) { l = mouseMotionListener; } else if (listenerType == MouseWheelListener.class) { l = mouseWheelListener; } else if (listenerType == InputMethodListener.class) { l = inputMethodListener; } else if (listenerType == PropertyChangeListener.class) { return (T[])getPropertyChangeListeners(); } return AWTEventMulticaster.getListeners(l, listenerType); } 5. java.beans.XMLEncoder#primitiveTypeFor (final classes) if (wrapper == Boolean.class) return Boolean.TYPE; if (wrapper == Byte.class) return Byte.TYPE; if (wrapper == Character.class) return Character.TYPE; if (wrapper == Short.class) return Short.TYPE; if (wrapper == Integer.class) return Integer.TYPE; if (wrapper == Long.class) return Long.TYPE; if (wrapper == Float.class) return Float.TYPE; if (wrapper == Double.class) return Double.TYPE; if (wrapper == Void.class) return Void.TYPE; return null; 6. javax.swing.plaf.synth.SynthTableUI.SynthTableCellRenderer#configureValue (mix of abstract, non-final and final classes) private void configureValue(Object value, Class columnClass) { if (columnClass == Object.class || columnClass == null) { // case Object.class, null! setHorizontalAlignment(JLabel.LEADING); } else if (columnClass == Float.class || columnClass == Double.class) { if (numberFormat == null) { numberFormat = NumberFormat.getInstance(); } setHorizontalAlignment(JLabel.TRAILING); setText((value == null) ? 
"" : ((NumberFormat)numberFormat).format(value)); } else if (columnClass == Number.class) { setHorizontalAlignment(JLabel.TRAILING); // Super will have set value. } else if (columnClass == Icon.class || columnClass == ImageIcon.class) { setHorizontalAlignment(JLabel.CENTER); setIcon((value instanceof Icon) ? (Icon)value : null); setText(""); } else if (columnClass == Date.class) { if (dateFormat == null) { dateFormat = DateFormat.getDateInstance(); } setHorizontalAlignment(JLabel.LEADING); setText((value == null) ? "" : ((Format)dateFormat).format(value)); } else { configureValue(value, columnClass.getSuperclass()); // note this: recursively going to superclass automatically } } With best regards, Tagir Valeev. On Mon, Apr 9, 2018 at 8:38 PM, Brian Goetz wrote: > I'm skeptical of this feature, because (a) its not as widely applicable as > it looks, (b) its error-prone. > > Both of these stem from the fact that comparing classes with == excludes > subtypes. So it really only works with final classes -- but if we had a > feature like this, people might mistakenly use it with nonfinal classes, > and be surprised when a subtype shows up (this can happen even when your > IDE tells you there are no subtypes, because of dynamic proxies). And all > of the examples you show are in low-level libraries, which is a warning > sign. > > Where did these snippets get their Class from? Good chance, case 1 got it > from calling Object.getClass(). In which case, they can just pattern match > on the type of the thing: > > switch (date) { > case Date d: ... > case Timestamp t: ... > default: ... > } > > Case 2 is more likely just operating on types that it got from a > reflection API. If you have only a few entries, an if-else will do; if you > have more entries, a Map is likely to be the better choice. For situations > like this, I'd rather invest in map literals or better Map.of() builders. 
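Brian's Map suggestion can be sketched roughly as follows. `ClassDispatch`, `on`, `exact` and `nearestSuperclass` are hypothetical names invented for this illustration, not a JDK API. The sketch also makes the error-proneness point concrete: a Map keyed on Class has exactly the semantics of a chain of `c == X.class` tests, so a subclass is not matched unless you walk up the hierarchy explicitly, as the SynthTableUI sample does with getSuperclass().

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: a lookup table keyed on Class tokens, replacing an
// if/else chain of `c == X.class` comparisons.
final class ClassDispatch<V> {
    private final Map<Class<?>, V> table = new HashMap<>();

    // Registers a value for an exact Class; returns this for chaining.
    ClassDispatch<V> on(Class<?> type, V value) {
        table.put(type, value);
        return this;
    }

    // Exact match only: Class.equals is identity, so this behaves like
    // `c == X.class` and ignores subclasses (returns null when absent).
    V exact(Class<?> c) {
        return table.get(c);
    }

    // Walks the superclass chain (interfaces are not consulted), similar to
    // the recursive getSuperclass() fallback in SynthTableCellRenderer.
    V nearestSuperclass(Class<?> c) {
        for (Class<?> k = c; k != null; k = k.getSuperclass()) {
            V v = table.get(k);
            if (v != null) {
                return v;
            }
        }
        return null;
    }
}
```

For example, with only `Number.class` registered, `exact(Integer.class)` returns null, while `nearestSuperclass(Integer.class)` finds the Number entry. Which of the two a caller wants has to be decided explicitly, which is exactly the choice a class-literal switch would hide.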
>
> So, I would worry this feature is unlikely to carry its weight, and
> further, may lead to misuse.
>
> On 4/9/2018 1:07 AM, Tagir Valeev wrote:
>
>> Hello!
>>
>> I don't remember whether switch on java.lang.Class instances was
>> discussed. I guess this pattern is quite common and it would be useful to
>> support it. Such code often appears in deserialization logic, where we branch
>> on the desired type to deserialize. Here are a couple of examples from
>> open-source libraries:
>>
>> 1. com.google.gson.DefaultDateTypeAdapter#read (gson-2.8.2):
>>
>>     Date date = deserializeToDate(in.nextString());
>>     if (dateType == Date.class) {
>>         return date;
>>     } else if (dateType == Timestamp.class) {
>>         return new Timestamp(date.getTime());
>>     } else if (dateType == java.sql.Date.class) {
>>         return new java.sql.Date(date.getTime());
>>     } else {
>>         // This must never happen: dateType is guarded in the primary constructor
>>         throw new AssertionError();
>>     }
>>
>> Could be rewritten as:
>>
>>     Date date = deserializeToDate(in.nextString());
>>     return switch(dateType) {
>>         case Date.class -> date;
>>         case Timestamp.class -> new Timestamp(date.getTime());
>>         case java.sql.Date.class -> new java.sql.Date(date.getTime());
>>         default ->
>>             // This must never happen: dateType is guarded in the primary constructor
>>             throw new AssertionError();
>>     };
>>
>> 2. com.fasterxml.jackson.databind.deser.std.FromStringDeserializer#findDeserializer
>> (jackson-databind-2.9.4):
>>
>>     public static Std findDeserializer(Class rawType)
>>     {
>>         int kind = 0;
>>         if (rawType == File.class) {
>>             kind = Std.STD_FILE;
>>         } else if (rawType == URL.class) {
>>             kind = Std.STD_URL;
>>         } else if (rawType == URI.class) {
>>             kind = Std.STD_URI;
>>         } else if (rawType == Class.class) {
>>             kind = Std.STD_CLASS;
>>         } else if (rawType == JavaType.class) {
>>             kind = Std.STD_JAVA_TYPE;
>>         } else if // more branches like this
>>         } else {
>>             return null;
>>         }
>>         return new Std(rawType, kind);
>>     }
>>
>> Could be rewritten as:
>>
>>     public static Std findDeserializer(Class rawType)
>>     {
>>         int kind = switch(rawType) {
>>             case File.class -> Std.STD_FILE;
>>             case URL.class -> Std.STD_URL;
>>             case URI.class -> Std.STD_URI;
>>             case Class.class -> Std.STD_CLASS;
>>             case JavaType.class -> Std.STD_JAVA_TYPE;
>>             ...
>>             default -> 0;
>>         };
>>         return kind == 0 ? null : new Std(rawType, kind);
>>     }
>>
>> In such code all branches are mutually exclusive. The bootstrap method
>> can generate a lookupswitch based on Class.hashCode, then equals checks,
>> much like the String switch implementation. Unlike String hash codes,
>> Class.hashCode is not stable and varies between JVM launches, but the hash
>> codes are already known during the bootstrap and we can trust them for the
>> VM's lifetime, so we can generate a lookupswitch. A minor problem is
>> supporting primitive classes like int.class. These cannot be passed directly
>> as indy static arguments, but this can be solved with condy.
>>
>> What do you think?
>>
>> With best regards,
>> Tagir Valeev.
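The translation Tagir describes can be approximated in ordinary Java to see its shape: a hash-based lookup from the Class to a case index, followed by a plain int switch. javac already compiles String switches in this two-step form. `ClassSwitchSketch` and its case labels are hypothetical; an actual indy bootstrap would presumably build the lookup at link time (for example as a lookupswitch over the hash codes known at that point) rather than hold a map, and, as noted above, would need condy for primitive labels such as int.class. At run time, a plain map handles int.class without trouble.

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical sketch of "hash lookup, then switch on an index".
final class ClassSwitchSketch {
    // Built once. Class does not override equals/hashCode, so identity
    // semantics here are exactly the semantics of `c == X.class`.
    private static final Map<Class<?>, Integer> CASE_INDEX = new IdentityHashMap<>();
    static {
        Class<?>[] labels = { int.class, String.class, java.io.File.class, Class.class };
        for (int i = 0; i < labels.length; i++) {
            CASE_INDEX.put(labels[i], i);
        }
    }

    static String describe(Class<?> c) {
        // Step 1: hash-based lookup; -1 means "no label matched" (also for null).
        int index = CASE_INDEX.getOrDefault(c, -1);
        // Step 2: a dense int switch, which javac can compile to a tableswitch.
        switch (index) {
            case 0:  return "primitive int";
            case 1:  return "string";
            case 2:  return "file";
            case 3:  return "class";
            default: return "unknown";
        }
    }
}
```

Note that `describe(CharSequence.class)` and `describe(Integer.class)` both fall into the default branch: the lookup is exact, with no subtype matching.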
From forax at univ-mlv.fr  Tue Apr 10 08:02:24 2018
From: forax at univ-mlv.fr (forax at univ-mlv.fr)
Date: Tue, 10 Apr 2018 10:02:24 +0200 (CEST)
Subject: Expression switch - an alternate proposal
In-Reply-To: <43593260-3529-94f9-a55a-f568c5fec5f7@oracle.com>
References: <1418035131.260541.1523274555469.JavaMail.zimbra@u-pem.fr>
 <43593260-3529-94f9-a55a-f568c5fec5f7@oracle.com>
Message-ID: <113201964.596558.1523347344352.JavaMail.zimbra@u-pem.fr>

----- Original Message -----
> From: "Brian Goetz"
> To: "Remi Forax" , "Stephen Colebourne"
> Cc: "amber-spec-experts"
> Sent: Monday, April 9, 2018 17:03:12
> Subject: Re: Expression switch - an alternate proposal

>> I think I agree with you about the fact that the expression switch does need to
>> support fallthrough; more on that in a following email.
>
> I've been leaving this topic until we have ironed out the higher-order
> bits, but this seems a good enough time to start this discussion.
>
>> I also agree with you that mixing arrows and colons is confusing.
>
> I agree this is confusing, but I think it is also not likely to be
> something people do naturally -- because the -> form, where it is
> applicable, is so much more attractive -- so the risk of confusion is
> low. Just as style guides say to users "if you're going to use fall
> through, label it clearly", and most code does, style guides will guide
> users away from this confusion.
>
>> Basically, your proposal is to use -> everywhere; I think I prefer the opposite:
>> do not use arrow at all.
>> Using arrow in this context is disturbing because the arrow of a lambda and
>> the arrow inside an expression switch don't mean the same thing.
>
> This is a reasonable alternative, but I don't think it would be very
> popular. I think people will really love being able to write:
>
>     case MONDAY -> 1;
>     case TUESDAY -> 2;
>
> and will be sad if we make them write
>
>     case MONDAY: break 1;
>     case TUESDAY: break 2;
>
> Not only will they be sad, but they will point out that the "obvious"
> answer was in front of our noses, and we did something different just to
> be different. (You can easily imagine the "There those Java guys go
> again, verbosity for its own sake" rants, but this time they might
> actually be right, rather than the folks who can't spell "migration
> compatibility" complaining about erasure.)

Apart from the semantic difference between -> inside a lambda and -> inside a case, the fact that you can use -> but not -> { } leads me to think that, if we need a shorter syntax, one that uses -> is not the best one.

>
>> The problem is that currently -> means "create a new function scope", not
>> "create a new code scope".
>
> I think the scopes issue is a red herring.
>
>> So if not mixing arrows and colons is an important goal, and I think it is,
>> I think it's better not to use arrow.
>
> Or just: avoid mixing arrows and colons.

That may be hard. Take ASM as an example: we have two kinds of switches. The low-level ones, which parse method descriptors, generic signatures, etc., will continue to use the statement switch. Then there are "association" switches, which associate one value with another, e.g. when ASM transforms the high-level Visitor API into low-level bytecodes, or when ASM does abstract analysis such as computing the stack frames. Those can be transformed into expression switches, but if you look at these switches, there is usually no computation or allocation, so, written as an expression switch, most cases will be a single expression, but one or two cases per switch will also assign a local variable. With the currently proposed syntax, that means mixing arrows and colons.

I'm sure other shorter syntaxes that do not use -> are possible; technically we don't even need the -> symbol, so why not just use ':' as the shorter syntax?
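For comparison, this is roughly how the forms debated in this thread look in the switch expressions that eventually became standard in Java 14 (JEP 361). There, the value-carrying `break` of this 2018 draft became `yield`, a block after `->` is allowed, and a single switch may not mix `->` cases with `:` cases. The enum and method names are illustrative only.

```java
// Illustrative only: the two case forms as they exist since Java 14.
final class SwitchForms {
    enum Day { MONDAY, TUESDAY, WEDNESDAY }

    // Arrow form: one expression per case, no fallthrough.
    static int arrow(Day d) {
        return switch (d) {
            case MONDAY -> 1;
            case TUESDAY -> 2;
            case WEDNESDAY -> 3;
        };
    }

    // Colon form: statements per case; the value is produced with `yield`
    // (the `break 1;` spelling discussed above did not survive).
    static int colon(Day d) {
        return switch (d) {
            case MONDAY: yield 1;
            case TUESDAY: yield 2;
            case WEDNESDAY: yield 3;
        };
    }

    // The ASM-style situation: most cases are one expression, one case also
    // updates a local variable. With block-bodied arrows, no colons are needed.
    static int withSideEffect(Day d) {
        int extra = 0;
        int value = switch (d) {
            case MONDAY -> 1;
            case TUESDAY -> {
                extra = 10; // legal here, unlike assigning a captured local in a lambda
                yield 2;
            }
            case WEDNESDAY -> 3;
        };
        return value + extra;
    }
}
```

All three switches cover every constant of `Day`, so none needs a `default` branch.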
You may think that using ':' means the grammar has to be smarter, to distinguish between a single expression and a statement that may be followed by other statements, but you can parse everything as statements and, in a later phase, if there is only one expression, treat it as a break expression. The main drawback I see in not using '->' in the grammar is that you cannot allow fallthrough, but I think we should disable fallthrough in an expression switch anyway.

So in terms of design, I see it the opposite way: the fact that we do not allow fallthrough gives us more freedom in terms of syntax, so let us use a more regular syntax and avoid introducing '->'. I think not introducing -> also has the nice effect of making the expression switch less alien compared to the statement switch, because it removes one of the syntactic differences between them.

>
>> Moreover, do we really need a shorter syntax, given that we can use break with a
>> value?
>
> I suggest you do this poll at Devoxx. Make sure to wear flame-proof pants!

I have a 3-hour session with José Paumard at Devoxx France (only 3000 attendees, so a little smaller than the real Devoxx in Belgium) next week on Amber and Valhalla. So I will run the poll, and we will see. For the pants, I have a plan :)

>
>> And now we can discuss adding a shorter syntax by making break optional if
>> there is one expression.
>
> ... which we expect to be true almost all the time.

Rémi

From brian.goetz at oracle.com  Tue Apr 10 12:25:48 2018
From: brian.goetz at oracle.com (Brian Goetz)
Date: Tue, 10 Apr 2018 08:25:48 -0400
Subject: Switch on java.lang.Class
In-Reply-To:
References: <3510133d-4147-fab9-366f-6ea42b523c4b@oracle.com>
Message-ID: <0d10a4b9-8153-3c1e-9981-97c38a9dd16a@oracle.com>

> Though I don't see why usefulness of the feature in low-level
> libraries should be a warning sign.
I don't dispute its usefulness, but I'm sure you agree that "usefulness > 0" is not the bar for putting a feature in the language -- it is way, way higher than that.

My bigger concern is that it is error-prone -- especially if we release type-test patterns and Class constant patterns at the same time. Many users will be tempted to use Class patterns as a less ugly alternative to instanceof tests -- and then their code will be subtly wrong. When given the choice between what looks like "old-fashioned switch with a new type" and "new-fangled type-test patterns", many users will lean towards the former because it's familiar. And get the wrong thing.

So the problem is not that it's only useful to low-level users; it's that other users may be tempted to use it, and get the wrong thing. (This isn't theoretical. "Type Switch" has been an RFE for years; in the examples presented as justification for the feature, many wrongly conflate it with instanceof.)

> In any case I'm pretty sure that switch on class will be more
> applicable than switch on floats. But you are doing switch on
> floats. Why? For consistency, of course. You want to support all
> literals in switch. But class literals are also literals, according to
> JLS 15.8.2, so it is inconsistent not to support them (especially
> taking into account that their usefulness is not the lowest of all
> possible literals).

"For consistency" arguments are always weak justifications for including a feature, because you can always find a precedent or rule to be consistent with. The justification is not merely consistency; it is to avoid introducing _new_ asymmetries. It would be silly not to allow float literals as patterns; then you couldn't match against `Complex(0.0f, 0.0f)`.

So the choice is not about float in _switch_, but about float as a _pattern_. Once you admit the latter, it is hard to say no to the former.
(Same with null; we're not doing `case null` for its own sake, it's to support `null` as a pattern; using it in `case` is a consequence of that.)

So, should class literals be a pattern? That would also mean that you could say

    if (x instanceof Foo.class) { ... }

and it would mean something subtly different than

    if (x instanceof Foo) { ... }

That's not so good. Or the same mistake in switch:

    switch (anObject) {
        case "Foo":
        case String.class:
        ...
    }

which means "does anObject equal the constant String.class". The confusion between `String` and `String.class` as patterns is a pretty serious risk. And again, introducing both kinds of patterns at once makes it worse.

So, while I think it's a consistent and useful feature, I also don't think it's necessarily a good idea to expose everyone to this confusion.

From daniel.smith at oracle.com  Tue Apr 10 19:34:13 2018
From: daniel.smith at oracle.com (Dan Smith)
Date: Tue, 10 Apr 2018 13:34:13 -0600
Subject: Switch expressions -- gathering the threads
In-Reply-To: <403596bb-406b-6b99-1dd5-420f7bea5dfa@oracle.com>
References: <403596bb-406b-6b99-1dd5-420f7bea5dfa@oracle.com>
Message-ID: <55F4951B-277D-4AD9-A96E-DE36406C6ACB@oracle.com>

> On Apr 9, 2018, at 1:14 PM, Brian Goetz wrote:
>
> A form of fallthrough that I think may be more common in expression switches is when something wants to fall _into_ the default:
>
>     int x = switch (y) {
>         case "Foo" -> 1;
>         case "Bar" -> 2;
>
>         case null:
>         default:
>             // handle exceptional case here
>     }
>
> Because `default` is not a pattern, we can't say:
>
>     case null, default:
>
> here. (Well, we could make it one.) Though we could carve out an exception for such "trivial" fallthrough.

As a matter of terminology, I think it would be helpful for us to not call this fallthrough at all.
It creates a lot of confusion when somebody is making an assertion about fallthrough, and it's unclear whether this kind of thing is being included or not.

JLS is a good guide: grammatically, the body of a switch statement is a sequence of SwitchBlocks, each of which has a sequence of SwitchLabels followed by some BlockStatements.

https://docs.oracle.com/javase/specs/jls/se10/html/jls-14.html#jls-14.11

JLS doesn't formally define the concept of "fallthrough", but I suggest we use it to describe the situation in which control flows from one SwitchBlock to another.

What you've illustrated is instead a "switch case with multiple labels" -- something deserving scrutiny on its own, but really a different sort of problem than fallthrough.

-Dan

From forax at univ-mlv.fr  Tue Apr 10 20:30:53 2018
From: forax at univ-mlv.fr (Remi Forax)
Date: Tue, 10 Apr 2018 22:30:53 +0200 (CEST)
Subject: Switch expressions -- gathering the threads
In-Reply-To: <55F4951B-277D-4AD9-A96E-DE36406C6ACB@oracle.com>
References: <403596bb-406b-6b99-1dd5-420f7bea5dfa@oracle.com>
 <55F4951B-277D-4AD9-A96E-DE36406C6ACB@oracle.com>
Message-ID: <1288016983.922021.1523392253596.JavaMail.zimbra@u-pem.fr>

> From: "daniel smith"
> To: "Brian Goetz"
> Cc: "amber-spec-experts"
> Sent: Tuesday, April 10, 2018 21:34:13
> Subject: Re: Switch expressions -- gathering the threads

>> On Apr 9, 2018, at 1:14 PM, Brian Goetz <brian.goetz at oracle.com> wrote:
>> A form of fallthrough that I think may be more common in expression switches is
>> when something wants to fall _into_ the default:
>>     int x = switch (y) {
>>         case "Foo" -> 1;
>>         case "Bar" -> 2;
>>         case null:
>>         default:
>>             // handle exceptional case here
>>     }
>> Because `default` is not a pattern, we can't say:
>>     case null, default:
>> here. (Well, we could make it one.) Though we could carve out an exception for
>> such "trivial" fallthrough.
> As a matter of terminology, I think it would be helpful for us to not call this
> fallthrough at all. It creates a lot of confusion when somebody is making an
> assertion about fallthrough, and it's unclear whether this kind of thing is
> being included or not.
> JLS is a good guide: grammatically, the body of a switch statement is a sequence
> of SwitchBlocks, each of which has a sequence of SwitchLabels followed by some
> BlockStatements.
> https://docs.oracle.com/javase/specs/jls/se10/html/jls-14.html#jls-14.11
> JLS doesn't formally define the concept of "fallthrough" but I suggest we use it
> to describe the situation in which control flows from one SwitchBlock to
> another.
> What you've illustrated is instead a "switch case with multiple
> labels" -- something deserving scrutiny on its own, but really a different sort of
> problem than fallthrough.
> -Dan

I'm not sure this difference is important. What about the example below: multiple labels, or fallthrough?

    switch(x) {
        case 0: ;
        case 1:
    }

regards,
Rémi

From brian.goetz at oracle.com  Tue Apr 10 20:38:54 2018
From: brian.goetz at oracle.com (Brian Goetz)
Date: Tue, 10 Apr 2018 16:38:54 -0400
Subject: Annos on records (was: Records -- Using them as JPA entities and validating them with Bean Validation)
In-Reply-To:
References:

 <606812520.898989.1523381675460.JavaMail.zimbra@u-pem.fr>
Message-ID:

[ moving to amber-spec-experts]

I tend to agree. It will take longer to adopt, but it _is_ a new kind of target in a source file, and then frameworks can decide what it should mean, and then there's no confusion.

It's possible, too, as a migration move, to split the difference, though I'm not sure it's worth it -- add a new target, _and_, if the target includes param/field/method, but does _not_ include record, then lower the anno onto all applicable members.

On 4/10/2018 1:34 PM, Remi Forax wrote:
> No, not right for me;
> a new annotation target is better, so each framework can decide what it means for its annotation.
>
> It will slow the adoption but it's better in the long term.
>
> Rémi
>
> ----- Original Message -----
>> From: "Kevin Bourrillion"
>> To: "Gunnar Morling"
>> Cc: "amber-dev"
>> Sent: Tuesday, April 10, 2018 19:25:57
>> Subject: Re: Records -- Using them as JPA entities and validating them with Bean Validation
>
>> On Mon, Apr 9, 2018 at 1:39 PM, Gunnar Morling wrote:
>>
>>> * Annotation semantics: I couldn't find any example of records with
>>> annotations, but IIUC, something like
>>>
>>>     @Entity record Book(@Id long id, String isbn) { ... }
>>>
>>> would desugar into
>>>
>>>     @Entity public class Book { private @Id long id; private String isbn; ... }
>>>
>>> For the JPA entity use case it'd be helpful to have an option to lift
>>> annotations to the corresponding getters instead of the fields (as the
>>> location of the @Id annotation controls the default strategy -- field vs.
>>> property -- for reading/writing entity state). Similarly, Bean Validation
>>> would benefit from such an option.
>>>
>>
>> My assumption has been that we would allow an annotation on a record
>> parameter as long as it has *any of* {FIELD,METHOD,PARAMETER} as target,
>> and that the annotation would be automatically propagated to each
>> synthesized element it applies to.
>> Does this sound about right to everyone?
>>
>> --
>> Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com

From kevinb at google.com  Tue Apr 10 20:42:24 2018
From: kevinb at google.com (Kevin Bourrillion)
Date: Tue, 10 Apr 2018 13:42:24 -0700
Subject: Annos on records (was: Records -- Using them as JPA entities and validating them with Bean Validation)
In-Reply-To:
References:

 <606812520.898989.1523381675460.JavaMail.zimbra@u-pem.fr>
Message-ID:

If we create a new ElementType.RECORD, the annotation in question won't even be *able* to add that target type until it is ready to *require* JDK 13 (or whatever) as its new minimum version.

On Tue, Apr 10, 2018 at 1:38 PM, Brian Goetz wrote:

> [ moving to amber-spec-experts]
>
> I tend to agree. It will take longer to adopt, but it _is_ a new kind of
> target in a source file, and then frameworks can decide what it should
> mean, and then there's no confusion.
>
> It's possible, too, as a migration move, to split the difference, though
> I'm not sure it's worth it -- add a new target, _and_, if the target
> includes param/field/method, but does _not_ include record, then lower the
> anno onto all applicable members.
>
> On 4/10/2018 1:34 PM, Remi Forax wrote:
>
>> No, not right for me;
>> a new annotation target is better, so each framework can decide what it
>> means for its annotation.
>>
>> It will slow the adoption but it's better in the long term.
>>
>> Rémi
>>
>> ----- Original Message -----
>>
>>> From: "Kevin Bourrillion"
>>> To: "Gunnar Morling"
>>> Cc: "amber-dev"
>>> Sent: Tuesday, April 10, 2018 19:25:57
>>> Subject: Re: Records -- Using them as JPA entities and validating them
>>> with Bean Validation
>>> On Mon, Apr 9, 2018 at 1:39 PM, Gunnar Morling
>>> wrote:
>>>
>>>> * Annotation semantics: I couldn't find any example of records with
>>>> annotations, but IIUC, something like
>>>>
>>>>     @Entity record Book(@Id long id, String isbn) { ... }
>>>>
>>>> would desugar into
>>>>
>>>>     @Entity public class Book { private @Id long id; private String isbn; ... }
>>>>
>>>> For the JPA entity use case it'd be helpful to have an option to lift
>>>> annotations to the corresponding getters instead of the fields (as the
>>>> location of the @Id annotation controls the default strategy -- field vs.
>>>> property -- for reading/writing entity state).
>>>> Similarly, Bean Validation would benefit from such an option.
>>>>
>>> My assumption has been that we would allow an annotation on a record
>>> parameter as long as it has *any of* {FIELD,METHOD,PARAMETER} as target,
>>> and that the annotation would be automatically propagated to each
>>> synthesized element it applies to. Does this sound about right to
>>> everyone?
>>>
>>> --
>>> Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com
>>
>
--
Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com

From brian.goetz at oracle.com  Tue Apr 10 20:53:43 2018
From: brian.goetz at oracle.com (Brian Goetz)
Date: Tue, 10 Apr 2018 16:53:43 -0400
Subject: Annos on records
In-Reply-To:
References:

<606812520.898989.1523381675460.JavaMail.zimbra@u-pem.fr>

Message-ID:

MR-JARs bust that restriction; in the main part of your jar, you have

    @Target(A, B)
    @interface Foo { }

and in the 13 section you have

    @Target(A, B, RECORD)
    @interface Foo { }

(that's not the whole thing, but it means you only need to wait until you _accept_ 13 rather than _require_ 13.)

On 4/10/2018 4:42 PM, Kevin Bourrillion wrote:
> If we create a new ElementType.RECORD, the annotation in question
> won't even be /able/ to add that target type until it is ready to
> /require/ JDK 13 (or whatever) as its new minimum version.
>
>
> On Tue, Apr 10, 2018 at 1:38 PM, Brian Goetz
> wrote:
>
> > [ moving to amber-spec-experts]
> >
> > I tend to agree. It will take longer to adopt, but it _is_ a new
> > kind of target in a source file, and then frameworks can decide
> > what it should mean, and then there's no confusion.
> >
> > It's possible, too, as a migration move, to split the difference,
> > though I'm not sure it's worth it -- add a new target, _and_, if
> > the target includes param/field/method, but does _not_ include
> > record, then lower the anno onto all applicable members.
> >
> > On 4/10/2018 1:34 PM, Remi Forax wrote:
> >
> > > No, not right for me;
> > > a new annotation target is better, so each framework can decide
> > > what it means for its annotation.
> > >
> > > It will slow the adoption but it's better in the long term.
> > >
> > > Rémi
> > >
> > > ----- Original Message -----
> > >
> > > > From: "Kevin Bourrillion"
> > > > To: "Gunnar Morling"
> > > > Cc: "amber-dev"

<606812520.898989.1523381675460.JavaMail.zimbra@u-pem.fr>

Message-ID: <350222201.925811.1523394456214.JavaMail.zimbra@u-pem.fr>

Here is what I've done to support ElementType.MODULE in a library that has to work with Java 8. Adding a target type is usually compatible, because whoever adds the annotation target is often the one in control of the code that will also consume the annotation.

To make it work, you need to answer two questions:
- how to create a Java 8-compatible annotation with a meta-annotation value only available in Java 9: using ASM to add the right value to the annotation's meta-annotation is a 10-line program;
- how to consume a non-existent meta-annotation value: I do a switch on the name of the enum constant instead of a switch on the enum itself.

Rémi

> From: "Kevin Bourrillion"
> To: "Brian Goetz"
> Cc: "amber-spec-experts"
> Sent: Tuesday, April 10, 2018 22:42:24
> Subject: Re: Annos on records (was: Records -- Using them as JPA entities and
> validating them with Bean Validation)

> If we create a new ElementType.RECORD, the annotation in question won't even be
> able to add that target type until it is ready to require JDK 13 (or whatever)
> as its new minimum version.
> On Tue, Apr 10, 2018 at 1:38 PM, Brian Goetz <brian.goetz at oracle.com> wrote:
>> [ moving to amber-spec-experts]
>> I tend to agree. It will take longer to adopt, but it _is_ a new kind of target
>> in a source file, and then frameworks can decide what it should mean, and then
>> there's no confusion.
>> It's possible, too, as a migration move, to split the difference, though I'm not
>> sure it's worth it -- add a new target, _and_, if the target includes
>> param/field/method, but does _not_ include record, then lower the anno onto all
>> applicable members.
>> On 4/10/2018 1:34 PM, Remi Forax wrote:
>>> No, not right for me;
>>> a new annotation target is better, so each framework can decide what it means for
>>> its annotation.
>>> It will slow the adoption but it's better in the long term.
>>> Rémi
>>> ----- Original Message -----
>>>> From: "Kevin Bourrillion" <kevinb at google.com>
>>>> To: "Gunnar Morling" <gunnar at hibernate.org>
>>>> Cc: "amber-dev" <amber-dev at openjdk.java.net>
>>>> Sent: Tuesday, April 10, 2018 19:25:57
>>>> Subject: Re: Records -- Using them as JPA entities and validating them with Bean
>>>> Validation
>>>> On Mon, Apr 9, 2018 at 1:39 PM, Gunnar Morling <gunnar at hibernate.org> wrote:
>>>>> * Annotation semantics: I couldn't find any example of records with
>>>>> annotations, but IIUC, something like
>>>>>     @Entity record Book(@Id long id, String isbn) { ... }
>>>>> would desugar into
>>>>>     @Entity public class Book { private @Id long id; private String isbn; ... }
>>>>> For the JPA entity use case it'd be helpful to have an option to lift
>>>>> annotations to the corresponding getters instead of the fields (as the
>>>>> location of the @Id annotation controls the default strategy -- field vs.
>>>>> property -- for reading/writing entity state). Similarly, Bean Validation
>>>>> would benefit from such an option.
>>>> My assumption has been that we would allow an annotation on a record
>>>> parameter as long as it has *any of* {FIELD,METHOD,PARAMETER} as target,
>>>> and that the annotation would be automatically propagated to each
>>>> synthesized element it applies to. Does this sound about right to everyone?
>>>> --
>>>> Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com
> --
> Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com
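Rémi's second point, consuming an enum constant that may not exist on the JDK the code was compiled against, can be sketched like this. `TargetNames` is a hypothetical helper; the only JDK facts it relies on are that `ElementType.MODULE` was added in Java 9 and that the record target eventually shipped in Java 16 as `ElementType.RECORD_COMPONENT` (not the `RECORD` discussed here). Because the switch is on `name()`, the code never references those constants directly, so it compiles against Java 8 and still recognizes them when running on a newer JDK.

```java
import java.lang.annotation.Annotation;
import java.lang.annotation.ElementType;
import java.lang.annotation.Target;
import java.util.Arrays;

// Hypothetical helper illustrating the "switch on the constant's name" trick.
final class TargetNames {
    static String describe(ElementType type) {
        // A String switch: no newer enum constant is referenced at compile
        // time, yet MODULE (Java 9+) and RECORD_COMPONENT (Java 16+) still
        // match when the code runs on those JDKs.
        switch (type.name()) {
            case "TYPE":             return "type declaration";
            case "FIELD":            return "field";
            case "METHOD":           return "method";
            case "MODULE":           return "module";
            case "RECORD_COMPONENT": return "record component";
            default:                 return "other";
        }
    }

    // Describes each element type listed in an annotation type's @Target,
    // or returns an empty array if it has no @Target meta-annotation.
    static String[] targetsOf(Class<? extends Annotation> annotationType) {
        Target target = annotationType.getAnnotation(Target.class);
        if (target == null) {
            return new String[0];
        }
        return Arrays.stream(target.value())
                .map(TargetNames::describe)
                .toArray(String[]::new);
    }
}
```

A switch on the enum itself would not compile against Java 8 as soon as a label names `MODULE`, which is why the name-based form is the portable one here.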
From brian.goetz at oracle.com  Tue Apr 10 21:18:14 2018
From: brian.goetz at oracle.com (Brian Goetz)
Date: Tue, 10 Apr 2018 17:18:14 -0400
Subject: Switch on java.lang.Class
In-Reply-To:
References: <3510133d-4147-fab9-366f-6ea42b523c4b@oracle.com>
Message-ID: <9cb0d60e-18ae-b06d-915a-c29a80b823cf@oracle.com>

Also, let's separate the problem from the solution here.

Problem: switching on class values.
Solution: make class literals be constant patterns.

I don't think the problem is unworthy of solution, but I don't like the specific solution. But there may be other ways to get there.

Here are two workarounds that you can do today:

    switch (c.getName()) {
        case "java.lang.String": ...
        case "java.lang.Integer": ...
    }

    enum KnownTypes {
        STRING(String.class), INTEGER(Integer.class), ...;

        static Map classToEnum = new HashMap<>();
        ... constructor populates map ...
    }

    switch (KnownTypes.classToEnum.get(c)) {
        case null: ...
        case STRING: ...
        case INTEGER: ...
    }

These are both workarounds, for sure. (With map literals, we can make either cleaner.)

What other approaches might there be? Well, I don't want to open that discussion now, but clearly, at some point, we'll have the ability to declare explicit patterns. This opens doors to writing your own patterns that let you switch on arbitrary inputs:

    switch (c) {
        case isStringClass(): ...
        case isIntegerClass(): ...
    }

There's a long way to go to get there, and lots of ways to slice this, but I think, if this problem is worth solving, there are other candidate solutions that don't have the confusion downside.

On 4/10/2018 1:12 AM, Tagir Valeev wrote:
> Hello!
>
> Does not sound convincing. First, in my experience, it's quite widely
> applicable. My first two samples were from what you called low-level
> libraries just because I wanted to grep something well-known.
Now I > grepped the jdk10 source for `\w+ == \w+\.class`, scanned manually about > 10% of the results, and found about 10 places where it's useful (some > examples are shown below). So, extrapolating, I may assume that this > construct can be applied roughly 100 times in the JDK (note that my regexp > does not cover xyz.equals(foo.class), which some developers prefer as a > style; different spacing is not covered either). You may surely call the > JDK code "low-level libraries", but grepping the IntelliJ IDEA source > code I also see a significant number of occurrences. Though I don't see > why usefulness of the feature in low-level libraries should be a > warning sign. > > In any case I'm pretty sure that switch on class will be more > applicable than the switch on floats. But you are doing the switch on > floats. Why? For consistency, of course. You want to support all > literals in switch. But class literals are also literals, according to > JLS 15.8.2, so it is inconsistent not to support them (especially > taking into account that their usefulness is not the lowest of all > possible literals). Another comparison: all literals (including class > literals) and enum values are acceptable as annotation values. The > same holds in switch expressions, except for the class literals, which is > inconsistent. > > I don't buy the error-prone argument either. Is `switch(doubleValue) > {case Math.PI: ...}` error-prone? Why can't somebody assume that the > comparison should tolerate some delta difference between doubleValue > and Math.PI? Somebody surely can, but that's silly. We know that the > switch checks for equality; it was always so. It will be so for > classes as well, and assuming something different is inconsistent. > After all, writing foo.equals(Bar.class) or foo == Bar.class is > allowed in the language, people use these constructions, and often > it's the right thing to do.
Of course their code becomes erroneous > sometimes, because in this particular place the inheritance should be > taken into account. But the same is true for doubleValue == Math.PI > comparison: sometimes it's ok, sometimes it's wrong and some tolerance > interval should be checked instead. And when it's ok, you add a new > option to use switch on doubles. > > Several code samples found in the JDK: > > 1. javafx.base/javafx/util/converter/LocalDateTimeStringConverter.java:197 > (final classes) > if (type == LocalDate.class) { > return (T)LocalDate.from(chronology.date(temporal)); > } else if (type == LocalTime.class) { > return (T)LocalTime.from(temporal); > } else { > return (T)LocalDateTime.from(chronology.localDateTime(temporal)); > } > > 2. java.desktop/sun/print/Win32PrintService.java:928 (final classes) > if (category == ColorSupported.class) { > int caps = getPrinterCapabilities(); > if ((caps & DEVCAP_COLOR) != 0) { > return (T)ColorSupported.SUPPORTED; > } else { > return (T)ColorSupported.NOT_SUPPORTED; > } > } else if (category == PrinterName.class) { > return (T)getPrinterName(); > } else if (category == PrinterState.class) { > return (T)getPrinterState(); > } else if (category == PrinterStateReasons.class) { > return (T)getPrinterStateReasons(); > } else if (category == QueuedJobCount.class) { > return (T)getQueuedJobCount(); > } else if (category == PrinterIsAcceptingJobs.class) { > return (T)getPrinterIsAcceptingJobs(); > } else { > return null; > } > 3. com.sun.media.sound.SoftSynthesizer#getPropertyInfo (line 926) - > final classes; several blocks like this > if (c == Byte.class) > item2.value = Byte.valueOf(s); > else if (c == Short.class) >
item2.value = Short.valueOf(s); > else if (c == Integer.class) > item2.value = Integer.valueOf(s); > else if (c == Long.class) > item2.value = Long.valueOf(s); > else if (c == Float.class) > item2.value = Float.valueOf(s); > else if (c == Double.class) > item2.value = Double.valueOf(s); > 4. java.awt.Component#getListeners (interfaces!), Window#getListeners, > List#getListeners, JComponent#getListeners, etc. are similar > public T[] getListeners(Class > listenerType) { > EventListener l = null; > if (listenerType == ComponentListener.class) { > l = componentListener; > } else if (listenerType == FocusListener.class) { > l = focusListener; > } else if (listenerType == HierarchyListener.class) { > l = hierarchyListener; > } else if (listenerType == HierarchyBoundsListener.class) { > l = hierarchyBoundsListener; > } else if (listenerType == KeyListener.class) { > l = keyListener; > } else if (listenerType == MouseListener.class) { > l = mouseListener; > } else if (listenerType == MouseMotionListener.class) { > l = mouseMotionListener; > } else if (listenerType == MouseWheelListener.class) { > l = mouseWheelListener; > } else if (listenerType == InputMethodListener.class) { > l = inputMethodListener; > } else if (listenerType == PropertyChangeListener.class) { > return (T[])getPropertyChangeListeners(); > } > return AWTEventMulticaster.getListeners(l, listenerType); > } > 5. java.beans.XMLEncoder#primitiveTypeFor (final classes) >
if (wrapper == Boolean.class) return Boolean.TYPE; > if (wrapper == Byte.class) return Byte.TYPE; > if (wrapper == Character.class) return Character.TYPE; > if (wrapper == Short.class) return Short.TYPE; > if (wrapper == Integer.class) return Integer.TYPE; > if (wrapper == Long.class) return Long.TYPE; > if (wrapper == Float.class) return Float.TYPE; > if (wrapper == Double.class) return Double.TYPE; > if (wrapper == Void.class) return Void.TYPE; > return null; > 6. javax.swing.plaf.synth.SynthTableUI.SynthTableCellRenderer#configureValue > (mix of abstract, non-final and final classes) > private void configureValue(Object value, Class columnClass) { > if (columnClass == Object.class || columnClass == null) { > // case Object.class, null! > setHorizontalAlignment(JLabel.LEADING); > } else if (columnClass == Float.class || columnClass == > Double.class) { > if (numberFormat == null) { > numberFormat = NumberFormat.getInstance(); > } > setHorizontalAlignment(JLabel.TRAILING); > setText((value == null) ? "" : > ((NumberFormat)numberFormat).format(value)); > } > else if (columnClass == Number.class) { > setHorizontalAlignment(JLabel.TRAILING); > // Super will have set value. > } > else if (columnClass == Icon.class || columnClass == > ImageIcon.class) { > setHorizontalAlignment(JLabel.CENTER); > setIcon((value instanceof Icon) ? (Icon)value : null); > setText(""); > } > else if (columnClass == Date.class) { > if (dateFormat == null) { > dateFormat = DateFormat.getDateInstance(); > } > setHorizontalAlignment(JLabel.LEADING); >
setText((value == null) ? "" : > ((Format)dateFormat).format(value)); > } > else { > configureValue(value, columnClass.getSuperclass()); // > note this: recursively going to superclass automatically > } > } > > With best regards, > Tagir Valeev. > > > On Mon, Apr 9, 2018 at 8:38 PM, Brian Goetz > wrote: > > I'm skeptical of this feature, because (a) it's not as widely > applicable as it looks, and (b) it's error-prone. > > Both of these stem from the fact that comparing classes with == > excludes subtypes. So it really only works with final classes -- > but if we had a feature like this, people might mistakenly use it > with nonfinal classes, and be surprised when a subtype shows up > (this can happen even when your IDE tells you there are no > subtypes, because of dynamic proxies). And all of the examples > you show are in low-level libraries, which is a warning sign. > > Where did these snippets get their Class from? Good chance, case > 1 got it from calling Object.getClass(). In which case, they can > just pattern match on the type of the thing: > > switch (date) { > case Timestamp t: ... > case Date d: ... > default: ... > } > > Case 2 is more likely just operating on types that it got from a > reflection API. If you have only a few entries, an if-else will > do; if you have more entries, a Map is likely to be the better > choice. For situations like this, I'd rather invest in map > literals or better Map.of() builders. > > So, I would worry this feature is unlikely to carry its weight, > and further, may lead to misuse. > > > > On 4/9/2018 1:07 AM, Tagir Valeev wrote: > > Hello! > > I don't remember whether switch on a java.lang.Class instance > was discussed. I guess this pattern is quite common and it > would be useful to support it. Such code often appears in > deserialization logic when we branch on the desired type to > deserialize.
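Here is a minimal runnable sketch of Brian's `KnownTypes` workaround, using a small hypothetical set of types. It differs from the sketch in the thread in one respect: an enum constructor may not write to the enum's own static fields, so the map is filled in a static initializer instead. A `HashMap` keyed on `Class` relies on the identity-based `Class.hashCode`/`equals`. That is roughly the hash-then-equals dispatch Tagir describes for a bootstrap method. The last lookup illustrates Brian's objection: `Long` is a `Number`, but an exact-class lookup does not find it.

```java
import java.util.HashMap;
import java.util.Map;

public class ClassDispatch {

    // Hypothetical set of known types; NUMBER is included to show the subtype pitfall.
    enum KnownType {
        STRING(String.class), INTEGER(Integer.class), NUMBER(Number.class);

        final Class<?> type;

        KnownType(Class<?> type) {
            this.type = type;
        }

        // Filled after all constants exist; the constructor cannot touch this map.
        private static final Map<Class<?>, KnownType> BY_CLASS = new HashMap<>();

        static {
            for (KnownType k : values()) {
                BY_CLASS.put(k.type, k);
            }
        }

        /** Exact-class lookup: returns null for unknown classes, including subtypes of known ones. */
        static KnownType of(Class<?> c) {
            return BY_CLASS.get(c);
        }
    }

    static String describe(Class<?> c) {
        KnownType k = KnownType.of(c);
        if (k == null) {
            return "unknown"; // plays the role of the hypothetical `case null`
        }
        switch (k) {
            case STRING:  return "string";
            case INTEGER: return "integer";
            case NUMBER:  return "number";
            default:      throw new AssertionError(k);
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(String.class));  // string
        System.out.println(describe(Integer.class)); // integer
        // Long extends Number, but exact-class matching does not see that:
        System.out.println(describe(Long.class));    // unknown
        System.out.println(Number.class.isAssignableFrom(Long.class)); // true
    }
}
```

Primitive classes such as `int.class` are ordinary `Class` keys here, so the map approach does not hit the indy static-argument limitation Tagir mentions.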
Here are a couple of examples from open-source > libraries: > > 1. com.google.gson.DefaultDateTypeAdapter#read (gson-2.8.2): > > Date date = deserializeToDate(in.nextString()); > if (dateType == Date.class) { > return date; > } else if (dateType == Timestamp.class) { > return new Timestamp(date.getTime()); > } else if (dateType == java.sql.Date.class) { > return new java.sql.Date(date.getTime()); > } else { > // This must never happen: dateType is guarded in the > primary constructor > throw new AssertionError(); > } > > Could be rewritten as: > > Date date = deserializeToDate(in.nextString()); > return switch(dateType) { > case Date.class -> date; > case Timestamp.class -> new Timestamp(date.getTime()); > case java.sql.Date.class -> new > java.sql.Date(date.getTime()); > default -> > // This must never happen: dateType is guarded in the > primary constructor > throw new AssertionError(); > }; > > 2. > com.fasterxml.jackson.databind.deser.std.FromStringDeserializer#findDeserializer > (jackson-databind-2.9.4): > > public static Std findDeserializer(Class rawType) > { > int kind = 0; > if (rawType == File.class) { > kind = Std.STD_FILE; > } else if (rawType == URL.class) { > kind = Std.STD_URL; > } else if (rawType == URI.class) { > kind = Std.STD_URI; > } else if (rawType == Class.class) { > kind = Std.STD_CLASS; > } else if (rawType == JavaType.class) { > kind = Std.STD_JAVA_TYPE; > } else if // more branches like this > } else { > return null; > } > return new Std(rawType, kind); > } > > Could be rewritten as: > > public static Std findDeserializer(Class rawType) > { > int kind = switch(rawType) { > case File.class -> Std.STD_FILE; > case URL.class -> Std.STD_URL; >
case URI.class -> Std.STD_URI; > case Class.class -> Std.STD_CLASS; > case JavaType.class -> Std.STD_JAVA_TYPE; > ... > default -> 0; > }; > return kind == 0 ? null : new Std(rawType, kind); > } > > In such code all branches are mutually exclusive. The > bootstrap method can generate a lookupswitch based on > Class.hashCode, then equals checks, pretty similar to the String > switch implementation. Unlike String hash codes, Class.hashCode > is not stable and varies between JVM launches, but the hash codes are > already known during the bootstrap and we can trust them > during the VM lifetime, so we can generate a lookupswitch. The > minor problematic point is to support primitive classes like > int.class. This cannot be passed directly as an indy static > argument, but this can be solved with condy. > > What do you think? > > With best regards, > Tagir Valeev. > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From brian.goetz at oracle.com Tue Apr 10 21:19:16 2018 From: brian.goetz at oracle.com (Brian Goetz) Date: Tue, 10 Apr 2018 17:19:16 -0400 Subject: Annos on records In-Reply-To: <350222201.925811.1523394456214.JavaMail.zimbra@u-pem.fr> References:

<606812520.898989.1523381675460.JavaMail.zimbra@u-pem.fr>

<350222201.925811.1523394456214.JavaMail.zimbra@u-pem.fr> Message-ID: <6938fa4e-6f08-865b-88f4-fbf31ff70e2d@oracle.com> And an MR-Jar obviates using ASM to add the right value to the meta-annotation; you just have two sources, one for 13 and one for prior. On 4/10/2018 5:07 PM, Remi Forax wrote: > Here is what I've done to support ElementType.MODULE in a library that > has to work with Java 8; > adding a target type is usually compatible because the one that adds > the annotation target is often the one in control of the code that > will also consume the annotation. > > To make it work you need to answer two questions: > - how to create an annotation compatible with 8 with a meta-annotation > value only available in 9: > using ASM to add the right value to the annotation's meta-annotation > is a 10-line program; > - how to consume a non-existing meta-annotation value: > I do a switch on the name of the enum instead of doing a switch on > the enum itself. > > Rémi > > ------------------------------------------------------------------------ > > *From: *"Kevin Bourrillion" > *To: *"Brian Goetz" > *Cc: *"amber-spec-experts" > *Sent: *Tuesday, 10 April 2018 22:42:24 > *Subject: *Re: Annos on records (was: Records -- Using them as JPA > entities and validating them with Bean Validation) > > If we create a new ElementType.RECORD, the annotation in question > won't even be /able/ to add that target type until it is ready to > /require/ JDK 13 (or whatever) as its new minimum version. > > > On Tue, Apr 10, 2018 at 1:38 PM, Brian Goetz > > wrote: > > [ moving to amber-spec-experts] > > I tend to agree. It will take longer to adopt, but it _is_ a > new kind of target in a source file, and then frameworks can > decide what it should mean, and then there's no confusion.
> > It's possible, too, as a migration move, to split the > difference, though I'm not sure it's worth it -- add a new > target, _and_, if the target includes param/field/method, but > does _not_ include record, then lower the anno onto all > applicable members. > > On 4/10/2018 1:34 PM, Remi Forax wrote: > > No, not right for me; > a new Annotation target is better so each framework can > decide what it means for its annotation. > > It will slow the adoption but it's better in the long term. > > Rémi > > ----- Original Message ----- > > From: "Kevin Bourrillion" > > To: "Gunnar Morling" > > Cc: "amber-dev"

> > Sent: Tuesday, 10 April 2018 19:25:57 > Subject: Re: Records -- Using them as JPA entities and > validating them with Bean Validation > On Mon, Apr 9, 2018 at 1:39 PM, Gunnar Morling > > > wrote: > > * Annotation semantics: I couldn't find any > example of records with > annotations, but IIUC, something like > > @Entity record Book(@Id long id, String > isbn) { ... } > > would desugar into > > @Entity public class Book { private > @Id long id; private > String isbn; ... } > > For the JPA entity use case it'd be helpful > to have an option to lift > annotations to the corresponding getters instead > of the fields (as the > location of the @Id annotation controls the > default strategy -- field vs. > property -- for reading/writing entity state). > Similarly, Bean Validation > would benefit from such an option. > > My assumption has been that we would allow an > annotation on a record > parameter as long as it has *any of* > {FIELD, METHOD, PARAMETER} as target, > and that the annotation would be automatically > propagated to each > synthesized element it applies to. Does this sound > about right to everyone? > > > -- > Kevin Bourrillion | Java Librarian | Google, Inc. | > kevinb at google.com > > > > > > -- > Kevin Bourrillion | Java Librarian | Google, > Inc. | kevinb at google.com > -------------- next part -------------- An HTML attachment was scrubbed...
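Rémi's trick lets code consume an enum constant that may not exist in the JDK it is compiled against. The sketch below shows the idea; the `describe` helper and its strings are hypothetical. The switch is on `name()`, so the source never references `ElementType.MODULE` directly. It therefore compiles against Java 8 and still recognizes `MODULE` at run time on Java 9 and later. Any other name simply falls into `default`.

```java
import java.lang.annotation.ElementType;

public class TargetNames {

    // Switching on the constant's name instead of the constant itself:
    // "MODULE" is just a string here, so no Java 9 symbol is needed to compile.
    static String describe(ElementType t) {
        switch (t.name()) {
            case "MODULE": return "module declaration";
            case "TYPE":   return "type declaration";
            case "FIELD":  return "field declaration";
            default:       return "other: " + t.name();
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(ElementType.FIELD)); // field declaration
        for (ElementType t : ElementType.values()) {
            if (t.name().equals("MODULE")) {
                // Only reached when running on a JDK that has MODULE.
                System.out.println(describe(t));
            }
        }
    }
}
```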
URL: From daniel.smith at oracle.com Tue Apr 10 22:20:01 2018 From: daniel.smith at oracle.com (Dan Smith) Date: Tue, 10 Apr 2018 16:20:01 -0600 Subject: Switch expressions -- gathering the threads In-Reply-To: <1288016983.922021.1523392253596.JavaMail.zimbra@u-pem.fr> References: <403596bb-406b-6b99-1dd5-420f7bea5dfa@oracle.com> <55F4951B-277D-4AD9-A96E-DE36406C6ACB@oracle.com> <1288016983.922021.1523392253596.JavaMail.zimbra@u-pem.fr> Message-ID: <324360E8-22AC-4947-8D3F-6D364436CA0A@oracle.com> > On Apr 10, 2018, at 2:30 PM, Remi Forax wrote: > > I'm not sure this difference is important. > > What about the example below, multiple labels or a fallthrough? > switch(x) { > case 0: > ; > case 1: > } My request is to call this an example of fallthrough. I think you're trying to make a point that some forms of switches with fallthrough behave the same as switches with multiple labels. Sure, that's fine. I still think it's helpful to talk about the two cases separately, as distinct features, because the practical use cases are very different. -Dan From forax at univ-mlv.fr Tue Apr 10 22:47:45 2018 From: forax at univ-mlv.fr (forax at univ-mlv.fr) Date: Wed, 11 Apr 2018 00:47:45 +0200 (CEST) Subject: Switch expressions -- gathering the threads In-Reply-To: <324360E8-22AC-4947-8D3F-6D364436CA0A@oracle.com> References: <403596bb-406b-6b99-1dd5-420f7bea5dfa@oracle.com> <55F4951B-277D-4AD9-A96E-DE36406C6ACB@oracle.com> <1288016983.922021.1523392253596.JavaMail.zimbra@u-pem.fr> <324360E8-22AC-4947-8D3F-6D364436CA0A@oracle.com> Message-ID: <834610449.936839.1523400465859.JavaMail.zimbra@u-pem.fr> ----- Original Message ----- > From: "daniel smith" > To: "Remi Forax" > Cc: "Brian Goetz" , "amber-spec-experts" > Sent: Wednesday, 11 April 2018 00:20:01 > Subject: Re: Switch expressions -- gathering the threads >> On Apr 10, 2018, at 2:30 PM, Remi Forax wrote: >> >> I'm not sure this difference is important. >> >> What about the example below, multiple labels or a fallthrough?
>> switch(x) { >> case 0: >> ; >> case 1: >> } > > My request is to call this an example of fallthrough. > > I think you're trying to make a point that some forms of switches with > fallthrough behave the same as switches with multiple labels. Sure, that's > fine. I still think it's helpful to talk about the two cases separately, as > distinct features, because the practical use cases are very different. I think I see all forms as being fallthrough, and what you call the multiple-labels form as the result of a peephole optimization, i.e., if there is no instruction between the two cases, then the compiler will make them share the same label. > > -Dan Rémi From gavin.bierman at oracle.com Thu Apr 12 21:27:06 2018 From: gavin.bierman at oracle.com (Gavin Bierman) Date: Thu, 12 Apr 2018 22:27:06 +0100 Subject: JEP325: Switch expressions spec Message-ID: <8E28CEE7-0F85-485A-9AE7-15801522B06C@oracle.com> I have uploaded a draft spec for JEP 325: Switch expressions at http://cr.openjdk.java.net/~gbierman/switch-expressions.html Note there are still three things missing: * There is no text about typing a switch expression, as this is still being discussed on this list. * There is no name given for the exception raised at runtime when a switch expression fails to find a matching pattern label, as this is still being discussed on this list. * The spec currently permits fall through from a "case pattern:" statement group into a "case pattern ->" clause. We are still working through the consequences of removing this possibility. Comments welcomed! Gavin From maurizio.cimadamore at oracle.com Fri Apr 13 11:15:01 2018 From: maurizio.cimadamore at oracle.com (Maurizio Cimadamore) Date: Fri, 13 Apr 2018 12:15:01 +0100 Subject: JEP325: Switch expressions spec In-Reply-To: <8E28CEE7-0F85-485A-9AE7-15801522B06C@oracle.com> References: <8E28CEE7-0F85-485A-9AE7-15801522B06C@oracle.com> Message-ID: <973dfd0e-034c-246b-ecf9-3c50a7f6d501@oracle.com> Looks neat.
Some comments: * I note that you introduced patterns to describe the new syntactic options; while that's a completely fine choice, I wonder if it could lead to confusion - I always thought of JEP 325 as a set of standalone switch improvements, which don't need the P-word to be justified. Of course, I'm not opposed to what you have done, just noting (aloud) the mismatch with my expectations. * In 14.11 I find these sentences: "then we say that the null pattern matches", "then we say that the pattern matches". They are a bit odd to read, as the transitive verb 'matches' is missing its object. * Also, I note some replication: "If all these statements complete normally, or if there are no statements after the pattern label containing the matching pattern, then the entire switch statement completes normally." "If all these statements complete normally, or if there are no statements after the pattern label containing the matching pattern, then the entire switch statement completes normally." "If all these statements complete normally, or if there are no statements after the default pattern label, then the entire switch statement completes normally." The first two are identical, and the last is only slightly different; perhaps something can be done to consolidate them. * "A break statement either transfers control out of an enclosing statement or returns a value to an immediately enclosing switch expression." Is it an either/or? My mental model is that break always transfers control out - it can do so with a value, or w/o a value (as in a classic break). * I like the fact that you define the semantics of the expression switch clauses in terms of desugaring to statement blocks - this is consistent with what we do in other areas (enhanced for loop, try with resources).
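The desugaring point can be made concrete by contrasting an arrow-form switch expression with a hand-written statement-switch equivalent. The sketch below uses the syntax that was eventually standardized in Java 14, in which a block clause produces its value with `yield`. The draft under discussion used `break` with an expression instead, so this illustrates the idea rather than the draft spec. The method names are hypothetical. The `case 0: case 1:` group in the statement form is also the "multiple labels" shape that Dan and Rémi debate.

```java
public class SwitchDesugar {

    // Switch expression with arrow clauses (standard syntax since Java 14).
    static String arrow(int x) {
        return switch (x) {
            case 0, 1 -> "small";          // multiple labels, no fallthrough
            case 2 -> {
                String s = "two";
                yield s.toUpperCase();     // a block clause yields its value
            }
            default -> "other";
        };
    }

    // The same logic written as a statement switch: each clause becomes a
    // statement group that assigns the result and then breaks out.
    static String desugared(int x) {
        String result;
        switch (x) {
            case 0:
            case 1:                        // adjacent labels sharing one group
                result = "small";
                break;
            case 2: {
                String s = "two";
                result = s.toUpperCase();
                break;
            }
            default:
                result = "other";
                break;
        }
        return result;
    }

    public static void main(String[] args) {
        for (int x = 0; x <= 3; x++) {
            System.out.println(x + " " + arrow(x) + " " + desugared(x));
        }
    }
}
```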
* I suggest putting the paragraph in 15.29 starting with: "Given a switch expression, all of the following must be true" ahead of the desugaring paragraph, which seems more execution/semantics-related, while this one is still about well-formedness. * On totality - this line: default -> 10; // Legal deserves some more explanation - e.g. one might think it's unreachable, but it's not, because new constants could pop up at runtime; maybe add a clarification. * On non-returning, this sentence is obscure: "Thus a switch expression block that can not complete normally, can only do so by occurrences of a break statement with an Expression. This ensures that a switch expression must either result in a value, or complete abruptly." because it contradicts what is said just a line above: "an occurrence of a break statement with an Expression in a switch expression means that the switch expression will complete normally with the value of the Expression" The way I read this is: 1) the only way for the block after a 'case' in a switch pattern to complete abnormally is via a break expression 2) even if the _block_ completes abnormally, the containing switch expression will complete normally, with the value of Expression Is that what you meant? * At the end of the switch expression section there are sub-optimal sentences like the ones that appear for switch statements (e.g. "pattern matches") - see above. Cheers Maurizio On 12/04/18 22:27, Gavin Bierman wrote: > I have uploaded a draft spec for JEP 325: Switch expressions at http://cr.openjdk.java.net/~gbierman/switch-expressions.html > > Note there are still three things missing: > > * There is no text about typing a switch expression, as this is still being discussed on this list. > * There is no name given for the exception raised at runtime when a switch expression fails to find a matching pattern label, as this is still being discussed on this list.
> * The spec currently permits fall through from a "case pattern:" statement group into a "case pattern ->" clause. We are still working through the consequences of removing this possibility. > > Comments welcomed! > Gavin From brian.goetz at oracle.com Fri Apr 13 16:46:39 2018 From: brian.goetz at oracle.com (Brian Goetz) Date: Fri, 13 Apr 2018 12:46:39 -0400 Subject: [records] Ancillary fields (was: Records -- current status) In-Reply-To: References: Message-ID: <75ded733-0a75-b72c-2990-d418e87edb40@oracle.com> Let's see if we can make some progress on the elephant in the room -- ancillary fields. Several have expressed the concern that without the ability to declare some additional instance state, the feature will be too limited. The argument in favor of additional fields is the obvious one; more classes can be records. And there are some arguably valid use cases for additional fields that don't conflict with the design center for records. The best example is derived state: - When a field is a cached property derived from the record state (such as how String caches its hashCode) Arguably, if a field is derived deterministically from immutable record state, then it is not creating any new record state. This surely seems within the circle. The argument against is more of a slippery-slope one; I believe developers would like to view this feature through the lens of syntactic boilerplate, rather than through semantics. If we let them, they would surely and routinely do the following: record A(int a, int b) { private int c; public A(int a, int b, int c) { this(a, b); this.c = c; } public boolean equals(Object other) { return default.equals(other) && ((A) other).c == c; } } Here, `c` is surely part of the state of `A`.
And, they wouldn't even know what they'd lost; they would just assume records are a way of "kickstarting" a class declaration with some public fields, and then you can mix in whatever private state you want. Why is this bad? While "reduced-boilerplate classes" is a valid feature idea, our design goal for records is much more than that. The semantic constraints on records are valuable because they yield useful invariants; that they are "just" their state vector, that they can be freely taken apart and put back together with no loss of information, and hence can be freely serialized/marshaled to JSON and back, etc. We currently prohibit records like `A` via a number of restrictions: no additional fields, no override of equals. We don't need all of these restrictions to achieve the desired goal, but we also can't relax them all without opening the gate. So we should decide carefully which we want to relax, as making the wrong choice constrains us in the future. Before I dive into details of how we might extend records to support the case of "cached derived state", I'd like to first come to some agreement that this covers the use cases that we think fall into the "legitimate" uses of additional fields. On 3/16/2018 2:55 PM, Brian Goetz wrote: > There are a number of potentially open details on the design for > records. My inclination is to start with the simplest thing that > preserves the flexibility and expectations we want, and consider > opening up later as necessary. > > One of the biggest issues, which Kevin raised as a must-address issue, > is having sufficient support for precondition validation. Without > foreclosing on the ability to do more later with declarative guards, I > think the recent construction proposal meets the requirement for > lightweight enforcement with minimal or no duplication. I'm hopeful > that this bit is "there". > > Our goal all along has been to define records as being "just macros" > for a finer-grained set of features.
Some of these are motivated by > boilerplate; some are motivated by semantics (coupling semantics of > API elements to state.) In general, records will get there first, and > then ordinary classes will get the more general feature, but the > default answer for "can you relax records, so I can use it in this > case that almost but doesn't quite fit" should be "no, but there will > probably be a feature coming that makes that class simpler, wait for > that." > > > Some other open issues (please see my writeup at > http://cr.openjdk.java.net/~briangoetz/amber/datum.html for > reference), and my current thoughts on these, are outlined below. > Comments welcome! > > - Extension. The proposal outlines a notion of abstract record, > which provides a "width subtyped" hierarchy. Some have questioned > whether this carries its weight, especially given how Scala doesn't > support case-to-case extension (some see this as a bug, others as an > existence proof.) Records can implement interfaces. > > - Concrete records are final. Relaxing this adds complexity to the > equality story; I'm not seeing good reasons to do so. > > - Additional constructors. I don't see any reason why additional > constructors are problematic, especially if they are constrained to > delegate to the default constructor (which in turn is made far simpler > if there can be statements ahead of the this() call.) Users may find > the lack of additional constructors to be an arbitrary limitation (and > they'd probably be right.) > > - Static fields. Static fields seem harmless. > > - Additional instance fields. These are a much bigger concern. While > the primary arguments against them are of the "slippery slope" > variety, I still have deep misgivings about supporting unrestricted > non-principal instance fields, and I also haven't found a reasonable > set of restrictions that makes this less risky.
I'd like to keep > looking for a better story here, before just caving on this, as I > worry doing so will end up biting us in the back. > > - Mutability and accessibility. I'd like to propose an odd choice > here, which is: fields are final and package (protected for abstract > records) by default, but finality can be explicitly opted out of > (non-final) and accessibility can be explicitly widened (public). > > - Accessors. Perhaps the most controversial aspect is that records > are inherently transparent to read; if something wants to truly > encapsulate state, it's not a record. Records will eventually have > pattern deconstructors, which will expose their state, so we should go > out of the gate with the equivalent. The obvious choice is to expose > read accessors automatically. (These will not be named getXxx; we are > not burning the ill-advised Javabean naming conventions into the > language, no matter how much people think it already is.) The obvious > naming choice for these accessors is fieldName(). No provision for > write accessors; that's bring-your-own. > > - Core methods. Records will get equals, hashCode, and toString. > There's a good argument for making equals/hashCode final (so they > can't be explicitly redeclared); this gives us stronger preservation > of the data invariants that allow us to safely and mechanically > snapshot / serialize / marshal (we'd definitely want this if we ever > allowed additional instance fields.) No reason to suppress override > of toString, though. Records could be safely made cloneable() with > automatic support too (like arrays), but not clear if this is worth it > (it's darn useful for arrays, though.) I think the auto-generated > getters should be final too; this leaves arrays as second-class > components, but I am not sure that bothers me.
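For comparison, the sketch below shows how several of these bullets look in the records feature as it eventually shipped (Java 16, JEP 395). It covers:

- accessors named after components (`x()`, not `getX()`);
- implicitly final records with generated `equals`/`hashCode`/`toString`;
- a compact constructor for validation;
- an extra constructor that must delegate to the canonical one;
- static fields but no additional instance fields.

Some details differ from this early proposal. For example, shipped record fields are always private and final, and there are no abstract records. The `Point` record is a hypothetical example, and compiling it requires Java 16 or later.

```java
public class RecordInvariants {

    record Point(int x, int y) {
        // Compact canonical constructor: precondition checks without
        // restating the parameter list or the field assignments.
        Point {
            if (x < 0 || y < 0) {
                throw new IllegalArgumentException("negative coordinate");
            }
        }

        // An additional constructor must delegate to the canonical one.
        Point(int xy) {
            this(xy, xy);
        }

        // Static fields are allowed; instance fields are not.
        static final Point ORIGIN = new Point(0, 0);
    }

    public static void main(String[] args) {
        Point p = new Point(3, 4);
        // Take the record apart via its accessors and put it back together:
        Point copy = new Point(p.x(), p.y());
        System.out.println(p.equals(copy));                   // true
        System.out.println(p.hashCode() == copy.hashCode());  // true
        System.out.println(new Point(2).equals(new Point(2, 2))); // true
        System.out.println(p);
    }
}
```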

From kevinb at google.com  Fri Apr 13 17:15:47 2018
From: kevinb at google.com (Kevin Bourrillion)
Date: Fri, 13 Apr 2018 10:15:47 -0700
Subject: [records] Ancillary fields (was: Records -- current status)
In-Reply-To: <75ded733-0a75-b72c-2990-d418e87edb40@oracle.com>
References: <75ded733-0a75-b72c-2990-d418e87edb40@oracle.com>
Message-ID:

As one of the voices demanding we allow ancillary fields, I can confirm that I had only these derived-state use cases in mind. I don't see anything else as legitimate. That is, I think that the semantic invariants you're trying to preserve for records are worth fighting for, and additional *non-derived* state would violate them.

On Fri, Apr 13, 2018 at 9:46 AM, Brian Goetz wrote:

> Let's see if we can make some progress on the elephant in the room --
> ancillary fields. Several have expressed the concern that without the
> ability to declare some additional instance state, the feature will be too
> limited.
>
> The argument in favor of additional fields is the obvious one; more
> classes can be records. And there are some arguably valid use cases for
> additional fields that don't conflict with the design center for records.
> The best example is derived state:
>
> - When a field is a cached property derived from the record state (such
> as how String caches its hashCode)
>
> Arguably, if a field is derived deterministically from immutable record
> state, then it is not creating any new record state. This surely seems
> within the circle.
>
> The argument against is more of a slippery-slope one; I believe developers
> would like to view this feature through the lens of syntactic boilerplate,
> rather than through semantics.
> If we let them, they would surely and routinely do the following:
>
>     record A(int a, int b) {
>         private int c;
>
>         public A(int a, int b, int c) {
>             this(a, b);
>             this.c = c;
>         }
>
>         public boolean equals(Object other) {
>             return default.equals(other) && ((A) other).c == c;
>         }
>     }
>
> Here, `c` is surely part of the state of `A`. And, they wouldn't even
> know what they'd lost; they would just assume records are a way of
> "kickstarting" a class declaration with some public fields, and then you
> can mix in whatever private state you want.
>
> Why is this bad? While "reduced-boilerplate classes" is a valid feature
> idea, our design goal for records is much more than that. The semantic
> constraints on records are valuable because they yield useful invariants;
> that they are "just" their state vector, that they can be freely taken
> apart and put back together with no loss of information, and hence can be
> freely serialized/marshaled to JSON and back, etc.
>
> We currently prohibit records like `A` via a number of restrictions: no
> additional fields, no override of equals. We don't need all of these
> restrictions to achieve the desired goal, but we also can't relax them all
> without opening the gate. So we should decide carefully which we want to
> relax, as making the wrong choice constrains us in the future.
>
> Before I dive into details of how we might extend records to support the
> case of "cached derived state", I'd like to first come to some agreement
> that this covers the use cases that we think fall into the "legitimate"
> uses of additional fields.
>
>
>
> On 3/16/2018 2:55 PM, Brian Goetz wrote:
>
>> There are a number of potentially open details on the design for
>> records. My inclination is to start with the simplest thing that preserves
>> the flexibility and expectations we want, and consider opening up later as
>> necessary.
>>
>> One of the biggest issues, which Kevin raised as a must-address issue, is
>> having sufficient support for precondition validation. Without foreclosing
>> on the ability to do more later with declarative guards, I think the recent
>> construction proposal meets the requirement for lightweight enforcement
>> with minimal or no duplication. I'm hopeful that this bit is "there".
>>
>> Our goal all along has been to define records as being "just macros" for
>> a finer-grained set of features. Some of these are motivated by
>> boilerplate; some are motivated by semantics (coupling semantics of API
>> elements to state.) In general, records will get there first, and then
>> ordinary classes will get the more general feature, but the default answer
>> for "can you relax records, so I can use it in this case that almost but
>> doesn't quite fit" should be "no, but there will probably be a feature
>> coming that makes that class simpler, wait for that."
>>
>>
>> Some other open issues (please see my writeup at
>> http://cr.openjdk.java.net/~briangoetz/amber/datum.html for reference),
>> and my current thoughts on these, are outlined below. Comments welcome!
>>
>> - Extension. The proposal outlines a notion of abstract record, which
>> provides a "width subtyped" hierarchy. Some have questioned whether this
>> carries its weight, especially given how Scala doesn't support case-to-case
>> extension (some see this as a bug, others as an existence proof.) Records
>> can implement interfaces.
>>
>> - Concrete records are final. Relaxing this adds complexity to the
>> equality story; I'm not seeing good reasons to do so.
>>
>> - Additional constructors. I don't see any reason why additional
>> constructors are problematic, especially if they are constrained to
>> delegate to the default constructor (which in turn is made far simpler if
>> there can be statements ahead of the this() call.)
>> Users may find the lack of additional constructors to be an arbitrary
>> limitation (and they'd probably be right.)
>>
>> - Static fields. Static fields seem harmless.
>>
>> - Additional instance fields. These are a much bigger concern. While
>> the primary arguments against them are of the "slippery slope" variety, I
>> still have deep misgivings about supporting unrestricted non-principal
>> instance fields, and I also haven't found a reasonable set of restrictions
>> that makes this less risky. I'd like to keep looking for a better story
>> here, before just caving on this, as I worry doing so will end up biting us
>> in the back.
>>
>> - Mutability and accessibility. I'd like to propose an odd choice here,
>> which is: fields are final and package (protected for abstract records) by
>> default, but finality can be explicitly opted out of (non-final) and
>> accessibility can be explicitly widened (public).
>>
>> - Accessors. Perhaps the most controversial aspect is that records are
>> inherently transparent to read; if something wants to truly encapsulate
>> state, it's not a record. Records will eventually have pattern
>> deconstructors, which will expose their state, so we should go out of the
>> gate with the equivalent. The obvious choice is to expose read accessors
>> automatically. (These will not be named getXxx; we are not burning the
>> ill-advised Javabean naming conventions into the language, no matter how
>> much people think it already is.) The obvious naming choice for these
>> accessors is fieldName(). No provision for write accessors; that's
>> bring-your-own.
>>
>> - Core methods. Records will get equals, hashCode, and toString.
>> There's a good argument for making equals/hashCode final (so they can't be
>> explicitly redeclared); this gives us stronger preservation of the data
>> invariants that allow us to safely and mechanically snapshot / serialize /
>> marshal (we'd definitely want this if we ever allowed additional instance
>> fields.)
>> No reason to suppress override of toString, though. Records could
>> be safely made cloneable() with automatic support too (like arrays), but
>> not clear if this is worth it (it's darn useful for arrays, though.) I
>> think the auto-generated getters should be final too; this leaves arrays as
>> second-class components, but I am not sure that bothers me.

--
Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com

From brian.goetz at oracle.com  Fri Apr 13 17:17:10 2018
From: brian.goetz at oracle.com (Brian Goetz)
Date: Fri, 13 Apr 2018 13:17:10 -0400
Subject: [records] equals / hashCode (was: Records -- current status)
In-Reply-To:
References:
Message-ID: <64e8fca3-ab81-e65e-85d5-26bd9f69eabd@oracle.com>

Along the lines of the previous mail, people have asked, and will ask, "why can't I redefine equals/hashCode". And the answer has two layers:

 - The constraints on equals/hashCode are stronger for records, and users might inadvertently violate them. (They can be specified in the overrides of equals/hashCode in AbstractRecord, so there at least can be a place where this specification lives, even if no one reads it.)
 - In conjunction with ancillary fields, the constraints are sure to be violated, whether inadvertently or deliberately.

Let's take a look at what sorts of modifications to equals/hashCode would be OK, should we decide to relax this restriction. Equality should still derive from the record's state, but there might be acceptable variations.

Would it be OK to _widen_ the definition of equality, by ignoring a component of the record? This is an example of what Gunnar asked for, which is to restrict equality to the primary key fields:

    record PersonEntity(int primaryKey, String name, int age) {
        // equality based only on primaryKey
    }

Is this OK? Well, let's look at our model:

 - Does ctor(dtor(c)) == c? Yes.
 - If S1 == S2, does ctor(S1) == ctor(S2)? Yes.
 - For equal instances, does mutating them in the same way yield equal instances? Yes.
 - For equal instances, does calling the same method on both with the same parameters yield equivalent results? No.

So, if p1 == p2, we cannot rely on p1.age() == p2.age(), so this fails the requirements of our pseudo-formal model. (Assuming our model is the right one.) So, how would we feel about that? Two records that are equals() to each other, but not substitutable?

A more subtle version of this would be to consider all components, but use a more inclusive notion of equality for that field, such as comparing array components by contents.

    record Numbers(int[] numbers) {
        // equality based on Arrays.equals()
    }

 - Does ctor(dtor(c)) == c? Yes.
 - Do equal state vectors produce equal records? Yes.
 - Do identical mutations on equal records produce equal records? Yes.
 - Do identical operations on equal records produce equal results? Almost...

The Almost qualification can be seen here:

    int[] a1;
    int[] a2 = copyOf(a1);
    Numbers r1 = new Numbers(a1), r2 = new Numbers(a2);
    boolean same = r1.numbers().equals(r2.numbers());

The accessor will yield up the array references, which will not be equals() to each other. This is essentially the same problem as above.

You get a similar result if your record represents something like a rational number and you don't normalize to lowest terms in the constructor; then you can have q1 equal q2, but q1.numerator() != q2.numerator().

Are any of these variations compelling enough to suggest we've got the wrong model?

On 3/16/2018 2:55 PM, Brian Goetz wrote:
> There are a number of potentially open details on the design for
> records. My inclination is to start with the simplest thing that
> preserves the flexibility and expectations we want, and consider
> opening up later as necessary.
>
> One of the biggest issues, which Kevin raised as a must-address issue,
> is having sufficient support for precondition validation. Without
> foreclosing on the ability to do more later with declarative guards, I
> think the recent construction proposal meets the requirement for
> lightweight enforcement with minimal or no duplication. I'm hopeful
> that this bit is "there".
>
> Our goal all along has been to define records as being "just macros"
> for a finer-grained set of features. Some of these are motivated by
> boilerplate; some are motivated by semantics (coupling semantics of
> API elements to state.) In general, records will get there first, and
> then ordinary classes will get the more general feature, but the
> default answer for "can you relax records, so I can use it in this
> case that almost but doesn't quite fit" should be "no, but there will
> probably be a feature coming that makes that class simpler, wait for
> that."
>
>
> Some other open issues (please see my writeup at
> http://cr.openjdk.java.net/~briangoetz/amber/datum.html for
> reference), and my current thoughts on these, are outlined below.
> Comments welcome!
>
>  - Extension. The proposal outlines a notion of abstract record,
> which provides a "width subtyped" hierarchy. Some have questioned
> whether this carries its weight, especially given how Scala doesn't
> support case-to-case extension (some see this as a bug, others as an
> existence proof.) Records can implement interfaces.
>
>  - Concrete records are final. Relaxing this adds complexity to the
> equality story; I'm not seeing good reasons to do so.
>
>  - Additional constructors. I don't see any reason why additional
> constructors are problematic, especially if they are constrained to
> delegate to the default constructor (which in turn is made far simpler
> if there can be statements ahead of the this() call.)
> Users may find the lack of additional constructors to be an arbitrary
> limitation (and they'd probably be right.)
>
>  - Static fields. Static fields seem harmless.
>
>  - Additional instance fields. These are a much bigger concern. While
> the primary arguments against them are of the "slippery slope"
> variety, I still have deep misgivings about supporting unrestricted
> non-principal instance fields, and I also haven't found a reasonable
> set of restrictions that makes this less risky. I'd like to keep
> looking for a better story here, before just caving on this, as I
> worry doing so will end up biting us in the back.
>
>  - Mutability and accessibility. I'd like to propose an odd choice
> here, which is: fields are final and package (protected for abstract
> records) by default, but finality can be explicitly opted out of
> (non-final) and accessibility can be explicitly widened (public).
>
>  - Accessors. Perhaps the most controversial aspect is that records
> are inherently transparent to read; if something wants to truly
> encapsulate state, it's not a record. Records will eventually have
> pattern deconstructors, which will expose their state, so we should go
> out of the gate with the equivalent. The obvious choice is to expose
> read accessors automatically. (These will not be named getXxx; we are
> not burning the ill-advised Javabean naming conventions into the
> language, no matter how much people think it already is.) The obvious
> naming choice for these accessors is fieldName(). No provision for
> write accessors; that's bring-your-own.
>
>  - Core methods. Records will get equals, hashCode, and toString.
> There's a good argument for making equals/hashCode final (so they
> can't be explicitly redeclared); this gives us stronger preservation
> of the data invariants that allow us to safely and mechanically
> snapshot / serialize / marshal (we'd definitely want this if we ever
> allowed additional instance fields.)
> No reason to suppress override
> of toString, though. Records could be safely made cloneable() with
> automatic support too (like arrays), but not clear if this is worth it
> (it's darn useful for arrays, though.) I think the auto-generated
> getters should be final too; this leaves arrays as second-class
> components, but I am not sure that bothers me.

From forax at univ-mlv.fr  Sat Apr 14 22:14:13 2018
From: forax at univ-mlv.fr (Remi Forax)
Date: Sun, 15 Apr 2018 00:14:13 +0200 (CEST)
Subject: Record design (and ancillary fields)
In-Reply-To:
References:
Message-ID: <469125311.2612492.1523744053717.JavaMail.zimbra@u-pem.fr>

----- Original Message -----
> From: "Daniel Latrémolière"
> To: "amber-spec-comments"
> Sent: Saturday, April 14, 2018 05:43:40
> Subject: Record design (and ancillary fields)

> Isn't it possible to do for a record, like database design:

interesting question

>
> - fields are, by default, read-write and not concerned by identity of
> the row/instance.
>
> - one special field (primary key) has all constraints of the identity of
> the row/instance.
>
>
> For a record, that would signify that one field has to be marked
> __Identity. It will be the only field used in equals/hashCode methods of
> the record.
>
> For satisfying constraints of identity (immutability), this field would
> be final and necessarily of a primitive type or value type (composite
> primary key). Given a value type can be scalarized in the class,
> restricting identity to only one field would not have real cost in instance.

I do not think we have to do something specific for supporting relational database mapping. The tools that do this mapping already rely on an annotation processor or a bytecode agent to change the user code (at least to track the changes), so those tools can be updated to detect that a class is a record and provide the right equals/hashCode if those methods are not user-defined.

>
>
> Just my point of view,
>
> Daniel.
>
>
> PS: Given primitive/value types disallow cyclical references, this will
> prohibit StackOverflowException in equals/hashCode methods.

Only if an equals on a value type that contains an object doesn't call equals on that object.

Rémi

From brian.goetz at oracle.com  Sat Apr 14 23:11:21 2018
From: brian.goetz at oracle.com (Brian Goetz)
Date: Sat, 14 Apr 2018 19:11:21 -0400
Subject: Reader mail bag
References:
Message-ID: <6E0B3626-8CBB-47C5-80D5-C94040CACD40@oracle.com>

This was received on the amber-spec-comments list.

> Begin forwarded message:
>
> From: Daniel Latrémolière
> Subject: Record design (and ancillary fields)
> Date: April 13, 2018 at 11:43:40 PM EDT
> To: amber-spec-comments at openjdk.java.net
>
> Isn't it possible to do for a record, like database design:
>
> - fields are, by default, read-write and not concerned by identity of the row/instance.
>
> - one special field (primary key) has all constraints of the identity of the row/instance.
>
>
> For a record, that would signify that one field has to be marked __Identity. It will be the only field used in equals/hashCode methods of the record.
>
> For satisfying constraints of identity (immutability), this field would be final and necessarily of a primitive type or value type (composite primary key). Given a value type can be scalarized in the class, restricting identity to only one field would not have real cost in instance.
>
>
> Just my point of view,
>
> Daniel.
>
>
> PS: Given primitive/value types disallow cyclical references, this will prohibit StackOverflowException in equals/hashCode methods.

From gavin.bierman at oracle.com  Mon Apr 16 16:53:17 2018
From: gavin.bierman at oracle.com (Gavin Bierman)
Date: Mon, 16 Apr 2018 17:53:17 +0100
Subject: JEP325: Switch expressions spec
In-Reply-To: <973dfd0e-034c-246b-ecf9-3c50a7f6d501@oracle.com>
References: <8E28CEE7-0F85-485A-9AE7-15801522B06C@oracle.com> <973dfd0e-034c-246b-ecf9-3c50a7f6d501@oracle.com>
Message-ID:

Thanks Maurizio. Some replies inline.

> On 13 Apr 2018, at 12:15, Maurizio Cimadamore wrote:
>
> Looks neat. Some comments:
>
> * I note that you introduced patterns to describe the new syntactic options; while that's a completely fine choice, I wonder if it could lead to confusion - I always thought of JEP 325 as a set of standalone switch improvements, which don't need the P-word to be justified. Of course, I'm not opposed to what you have done, just noting (aloud) the mismatch with my expectations.

Yes, you spotted me setting things up for a future release :-) But in my defence: in the current spec, we say "case constant" where constant is either a constant expression or an enum constant. We are adding to this the possibility of a "null", so we need to find another word anyhow. That said, I think you have a point, so I'll look again to see if I can dial it down a bit.

>
> * in 14.11 I find these sentences:
>
> "then we say that the null pattern matches", "then we say that the pattern matches"
>
> A bit odd to read, as the transitive verb 'matches' is missing its object.

I know what you mean, but the spec today already states "...then we say that the case label *matches*." So I actually kept that text as it is.

>
> * also I note some replication:
>
> "If all these statements complete normally, or if there are no statements after the pattern label containing the matching pattern, then the entire switch statement completes normally."
> "If all these statements complete normally, or if there are no statements after the pattern label containing the matching pattern, then the entire switch statement completes normally."
> "If all these statements complete normally, or if there are no statements after the default pattern label, then the entire switch statement completes normally."
>
> The first two are identical, the last only slightly different; perhaps something can be done to consolidate.

I'll take another look.

>
> * "A break statement either transfers control out of an enclosing statement or returns a value to an immediately enclosing switch expression."
>
> Is it an either/or? My mental model is that break always transfers control out - it can do so with a value, or w/o a value (as in a classic break).

This is a good question, although probably only one for spec-nerds. The problem is that the concept of "transfer of control" is only valid for statements - in essence you jump from one statement to the other. There is no concept in the JLS of control for *expressions*. So you can't really say that the break statement with a value transfers control to an *expression*. This is what is so "unusual" about switch expressions: they are expressions with statements inside... This either/or distinction makes clear, for better or for worse, the new dual nature of break statements: they either transfer control to another statement, or they end up returning a value to an enclosing expression.

>
> * I like the fact that you define the semantics of the expression switch clauses in terms of desugaring to statement blocks - this is consistent with what we do in other areas (enhanced for loop, try with resources).

Thanks! Although, with the proposed change to forbid fall through from statement groups into clauses, I'm not sure they can stay.
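The standard version of switch expressions (Java 14, JEP 361) changed the value-returning `break` discussed here. A block on the right of `->` now produces its value with `yield` instead of `break`. The sketch below assumes JDK 14 or later. `SwitchExpressions`, `Day` and `nameLength` are hypothetical names. The sketch also covers the totality case from the review: a `default` is still legal even when every enum constant is listed.

```java
// Minimal sketch, assuming JDK 14+ (standard switch expressions).
// SwitchExpressions, Day and nameLength are hypothetical illustrations.
final class SwitchExpressions {
    enum Day { MONDAY, TUESDAY, WEDNESDAY, THURSDAY, FRIDAY, SATURDAY, SUNDAY }

    static int nameLength(Day day) {
        return switch (day) {
            // Arrow clause with an expression: that expression is the result.
            case SATURDAY, SUNDAY -> 0;
            // Arrow clause with a block: the block must produce the result.
            // The 2018 draft wrote this as "break n;"; Java 14 uses "yield n;".
            case MONDAY, TUESDAY, WEDNESDAY, THURSDAY, FRIDAY -> {
                int n = day.name().length();
                yield n;
            }
            // Every constant is already listed, yet default is still legal:
            // it covers constants added to Day after this code was compiled.
            default -> 10;
        };
    }

    public static void main(String[] args) {
        System.out.println(nameLength(Day.MONDAY)); // prints 6
        System.out.println(nameLength(Day.SUNDAY)); // prints 0
    }
}
```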
>
> * I suggest putting the paragraph in 15.29 starting with:
>
> "Given a switch expression, all of the following must be true"
>
> ahead of the desugaring paragraph, which seems more execution/semantics-related, while this one is still about well-formedness.

Yes! Thanks.

> * On totality - this line:
>
> default -> 10; // Legal
>
> deserves some more explanation - e.g. one might think it's unreachable, but it's not because new constants could pop up at runtime; maybe add a clarification.

Yes! Thanks.

>
> * On non-returning, this sentence is obscure:
>
> "Thus a switch expression block that can not complete normally, can only do so by occurrences of a break statement with an Expression. This ensures that a switch expression must either result in a value, or complete abruptly."
>
> because it contradicts what is said just a line above:
>
> "an occurrence of a break statement with an Expression in a switch expression means that the switch expression will complete normally with the value of the Expression"
>
> The way I read this is:
>
> 1) the only way for the block after a 'case' in a switch pattern to complete abnormally is via a break expression
> 2) even if the _block_ completes abnormally, the containing switch expression will complete normally, with the value of Expression
>
> Is that what you meant?

Yes, although I'm not sure I quite see the "contradiction". I'll take another look at this text.

> * At the end of the switch expression section there are sub-optimal sentences like the ones that appear for switch statements (e.g. "pattern matches") - see above.

Okay, thanks.

>
> Cheers
>
> Maurizio
>
> On 12/04/18 22:27, Gavin Bierman wrote:
>> I have uploaded a draft spec for JEP 325: Switch expressions at http://cr.openjdk.java.net/~gbierman/switch-expressions.html
>>
>> Note there are still three things missing:
>>
>> * There is no text about typing a switch expression, as this is still being discussed on this list.
>> * There is no name given for the exception raised at runtime when a switch expression fails to find a matching pattern label, as this is still being discussed on this list.
>> * The spec currently permits fall through from a "case pattern:" statement group into a "case pattern ->" clause. We are still working through the consequences of removing this possibility.
>>
>> Comments welcomed!
>> Gavin
>

From brian.goetz at oracle.com  Wed Apr 18 17:58:31 2018
From: brian.goetz at oracle.com (Brian Goetz)
Date: Wed, 18 Apr 2018 13:58:31 -0400
Subject: [records] Ancillary fields
In-Reply-To:
References: <75ded733-0a75-b72c-2990-d418e87edb40@oracle.com>
Message-ID:

Seeing no dissent on the claim that the essential use case for ancillary fields is caching derived properties, let me talk about how I would like to handle this: lazy (final) fields.

For background, this is something we've been exploring for a long time (see for example http://cr.openjdk.java.net/~jrose/draft/lazy-final.html), but this is also something that we can do in the context of the language if we're willing to relax the requirements a bit.

The basic idea is that we can describe fields as `lazy` (either static or instance fields), with an initializer, which are implicitly `final`, and have the compiler rewrite reads of those fields to do a lazy initialization instead. For static fields, we can use ConstantDynamic and get lazy initialization for free; for instance fields, we have to do a little more work (CASes, fences), but the game is the same.

This is useful well beyond records. For example, classes like `String` cache a lazily computed hash code; these classes could just do

    private lazy int cacheHash = computeHashCode();
    public int hashCode() { return cacheHash; }

It's also useful for frequently used static fields:
    private static lazy Logger logger = Logger.of("com.foo.bar");

Much lazy initialization code is error-prone, so this would eliminate those errors; it's also tempting to avoid lazy initialization where it might be marginally useful. (Static initializers are also one of the big pain points in AOT; this eliminates many static initializers.)

What does this have to do with records? Well, if the goal is to cache lazily computed values derived from the state, then lazy fields would give us that without opening up to the full generality of ancillary fields. We'd then say that records can only have additional _lazy_ instance fields.

(Sometimes lazy fields are cast in the opposite direction -- cached methods rather than lazy fields. There is an obvious set of tradeoffs for how to structure it, but neither is strictly more powerful than the other.)

On 4/13/2018 1:15 PM, Kevin Bourrillion wrote:
> As one of the voices demanding we allow ancillary fields, I can
> confirm that I had only these derived-state use cases in mind. I don't
> see anything else as legitimate. That is, I think that the semantic
> invariants you're trying to preserve for records are worth fighting
> for, and additional /non-derived/ state would violate them.
>
> On Fri, Apr 13, 2018 at 9:46 AM, Brian Goetz wrote:
>
> Let's see if we can make some progress on the elephant in the room
> -- ancillary fields. Several have expressed the concern that
> without the ability to declare some additional instance state, the
> feature will be too limited.
>
> The argument in favor of additional fields is the obvious one;
> more classes can be records. And there are some arguably valid
> use cases for additional fields that don't conflict with the
> design center for records. The best example is derived state:
The best example is derived state: > > ?- When a field is a cached property derived from the record state > (such as how String caches its hashCode) > > Arguably, if a field is derived deterministically from immutable > record state, then it is not creating any new record state.? This > surely seems within the circle. > > The argument against is more of a slippery-slope one; I believe > developers would like to view this feature through the lens of > syntactic boilerplate, rather than through semantics.? If we let > them, they would surely and routinely do the following: > > ??? record A(int a, int b) { > ??????? private int c; > > ??????? public A(int a, int b, int c) { > ??????????? this(a, b); > ??????????? this.c = c; > ??????? } > > ??????? public boolean equals(Object other) { > ??????????? return default.equals(other) && ((A) other).c == c; > ??????? } > ??? } > > Here, `c` is surely part of the state of `A`.? And, they wouldn't > even know what they'd lost; they would just assume records are a > way of "kickstarting" a class declaration with some public fields, > and then you can mix in whatever private state you want. > > Why is this bad?? While "reduced-boilerplate classes" is a valid > feature idea, our design goal for records is much more than that. > The semantic constraints on records are valuable because they > yield useful invariants; that they are "just" their state vector, > that they can be freely taken apart and put back together with no > loss of information, and hence can be freely serialized/marshaled > to JSON and back, etc. > > We currently prohibit records like `A` via a number of > restrictions: no additional fields, no override of equals. We > don't need all of these restrictions to achieve the desired goal, > but we also can't relax them all without opening the gate.? So we > should decide carefully which we want to relax, as making the > wrong choice constrains us in the future. 
>
> Before I dive into details of how we might extend records to
> support the case of "cached derived state", I'd like to first come
> to some agreement that this covers the use cases that we think
> fall into the "legitimate" uses of additional fields.
>
>
>
> On 3/16/2018 2:55 PM, Brian Goetz wrote:
>
> There are a number of potentially open details on the design
> for records. My inclination is to start with the simplest
> thing that preserves the flexibility and expectations we want,
> and consider opening up later as necessary.
>
> One of the biggest issues, which Kevin raised as a
> must-address issue, is having sufficient support for
> precondition validation. Without foreclosing on the ability to
> do more later with declarative guards, I think the recent
> construction proposal meets the requirement for lightweight
> enforcement with minimal or no duplication. I'm hopeful that
> this bit is "there".
>
> Our goal all along has been to define records as being "just
> macros" for a finer-grained set of features. Some of these
> are motivated by boilerplate; some are motivated by semantics
> (coupling semantics of API elements to state.) In general,
> records will get there first, and then ordinary classes will
> get the more general feature, but the default answer for "can
> you relax records, so I can use it in this case that almost
> but doesn't quite fit" should be "no, but there will probably
> be a feature coming that makes that class simpler, wait for that."
>
>
> Some other open issues (please see my writeup at
> http://cr.openjdk.java.net/~briangoetz/amber/datum.html
> for reference), and my current thoughts on these, are outlined
> below. Comments welcome!
>
>  - Extension. The proposal outlines a notion of abstract
> record, which provides a "width subtyped" hierarchy.
> Some have questioned whether this carries its weight, especially
> given how Scala doesn't support case-to-case extension (some
> see this as a bug, others as an existence proof.) Records can
> implement interfaces.
>
>  - Concrete records are final. Relaxing this adds complexity
> to the equality story; I'm not seeing good reasons to do so.
>
>  - Additional constructors. I don't see any reason why
> additional constructors are problematic, especially if they
> are constrained to delegate to the default constructor (which
> in turn is made far simpler if there can be statements ahead
> of the this() call.) Users may find the lack of additional
> constructors to be an arbitrary limitation (and they'd
> probably be right.)
>
>  - Static fields. Static fields seem harmless.
>
>  - Additional instance fields. These are a much bigger
> concern. While the primary arguments against them are of the
> "slippery slope" variety, I still have deep misgivings about
> supporting unrestricted non-principal instance fields, and I
> also haven't found a reasonable set of restrictions that makes
> this less risky. I'd like to keep looking for a better story
> here, before just caving on this, as I worry doing so will end
> up biting us in the back.
>
>  - Mutability and accessibility. I'd like to propose an odd
> choice here, which is: fields are final and package (protected
> for abstract records) by default, but finality can be
> explicitly opted out of (non-final) and accessibility can be
> explicitly widened (public).
>
>  - Accessors. Perhaps the most controversial aspect is that
> records are inherently transparent to read; if something wants
> to truly encapsulate state, it's not a record. Records will
> eventually have pattern deconstructors, which will expose
> their state, so we should go out of the gate with the
> equivalent. The obvious choice is to expose read accessors
> automatically.
(These will not be named getXxx; we are not > burning the ill-advised Javabean naming conventions into the > language, no matter how much people think it already is.) The > obvious naming choice for these accessors is fieldName(). No > provision for write accessors; that's bring-your-own. > > - Core methods. Records will get equals, hashCode, and > toString. There's a good argument for making equals/hashCode > final (so they can't be explicitly redeclared); this gives us > stronger preservation of the data invariants that allow us to > safely and mechanically snapshot / serialize / marshal (we'd > definitely want this if we ever allowed additional instance > fields.) No reason to suppress override of toString, though. > Records could be safely made cloneable() with automatic > support too (like arrays), but not clear if this is worth it > (it's darn useful for arrays, though.) I think the > auto-generated getters should be final too; this leaves arrays > as second-class components, but I am not sure that bothers me. > > -- > Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com

From kevinb at google.com Wed Apr 18 18:16:30 2018 From: kevinb at google.com (Kevin Bourrillion) Date: Wed, 18 Apr 2018 11:16:30 -0700 Subject: JEP325: Switch expressions spec In-Reply-To: <8E28CEE7-0F85-485A-9AE7-15801522B06C@oracle.com> References: <8E28CEE7-0F85-485A-9AE7-15801522B06C@oracle.com> Message-ID:

> If one of the patterns is a constant expression or enum constant that is > equal to the value of the selector expression, then we say that the pattern > *matches*. >

I think "equal" is ambiguous for strings (and will be for doubles when they happen).
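For the constant switches that exist today, "equal" for a String selector means content equality: since Java 7, a String switch compares the selector with each case constant as if by `String.equals`, not by reference. A minimal sketch (the class and method names are hypothetical) in which a freshly allocated string, distinct from the literal, still matches its label:

```java
final class StringSwitchEquality {
    // Classic colon-form switch on a String selector (Java 7+).
    static String classify(String s) {
        switch (s) {
            case "hello":
                return "greeting";
            case "bye":
                return "farewell";
            default:
                return "other";
        }
    }

    public static void main(String[] args) {
        // new String(...) yields an object distinct from the interned literal,
        // yet the case label still matches because comparison is by content.
        String fresh = new String("hello");
        System.out.println(fresh == "hello"); // false: different objects
        System.out.println(classify(fresh));  // greeting
    }
}
```

Doubles are the harder case Kevin alludes to: `==` and `Double.equals` disagree on `NaN` and on `0.0` versus `-0.0`, and switch does not accept a `double` selector in the Java versions contemporary with this thread, so the sketch sticks to strings.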
switch (s) { > // Even default does not match > // Will throw an exception > default: > System.out.println("It's a string"); > }

I think it would be good to show the modified example that uses `case null: default:` together in order to produce the expected default behavior.

> *A pattern label can contain multiple patterns, and is said to match if > any one of these patterns matches. The pattern label can then be seen to be > a disjunction of its constituent patterns.* > switch (day) { > case SATURDAY, SUNDAY: > // matches if it is a Saturday OR a Sunday > System.out.println("It's the weekend!"); > }

Were we considering allowing `case *something*, default:` or `default, case *something*:`? Of course you would never ever actually *need* this... except in the one case that *something* is null. In a switch expression it would be sad to be forced to revert to the old syntax for only this reason. If we're not allowing that, perhaps that's worth pointing out.

Example 14.11-1. Fall-Through in the switch Statement

Since there's a whole section on this, it might be helpful to point out that when multiple labels are used with no intervening code (not using the new comma feature), this is *not* considered fall-through. Everyone gets confused about the terminology. Meh...

> Evaluation of an expression can produce side effects, because expressions > may contain embedded assignments, increment operators, decrement operators, > and method invocations. > *In addition, lambda expressions and switch expressions have bodies that > may contain arbitrary statements.*

A lambda "contains" statements *physically*, but nothing gets executed. If anything, it is anonymous *classes* that belong here (though maybe, arguably, that would be covered if "method invocations" was changed to "method or constructor invocations"?).

Suggestion: "...
because expressions may contain embedded assignments, increment operators, decrement operators, and method or constructor invocations, as well as arbitrary statements nested inside a switch expression."

From kevinb at google.com Wed Apr 18 18:46:47 2018 From: kevinb at google.com (Kevin Bourrillion) Date: Wed, 18 Apr 2018 11:46:47 -0700 Subject: [records] Ancillary fields In-Reply-To: References: <75ded733-0a75-b72c-2990-d418e87edb40@oracle.com>

Message-ID:

Lazy initialization is a massive pain to get right, so I'm very intrigued by this proposal. On Wed, Apr 18, 2018 at 10:58 AM, Brian Goetz wrote: This is useful well beyond records. For example, classes like `String` > cache a lazily computed hash code; these classes could just do > > private int cacheHash = computeHashCode(); > > public int hashCode() { return cacheHash; } > ('course, String itself may not use this, as it prefers to save memory by just letting rare values like "drumwood boulderhead" be uncacheable.) Ahh, you missed the `lazy` keyword on there :-) Which is good because it raises an issue: when you forget it, bad performance may result without other observable consequence. Although, it's already the case that reading code like the above ought to raise all kinds of alarm bells (e.g., now I want to go check which fields computeHashCode() might be referring to, and where *they're* initialized), so I *should* be looking for that `lazy` keyword to put my mind at ease. So maybe this is okay. I assume that, unlike other field initializers, I'm safe to refer to *any* other field regardless of how and where that field is initialized. Right? The intersection with primitives is interesting. I assume it gets secretly created as an Integer? So there's a little extra hidden memory consumption. For a reference type, what happens if the initialization produces `null`? (I suggest throwing NPE, because I think the alternatives are worse?) I pondered also allowing a method to be marked lazy (memoized, really) and let the field(s) be created behind the scenes to store its result, but the risk of that being applied to an impure method is probably too scary.

On 4/13/2018 1:15 PM, Kevin Bourrillion wrote: > > As one of the voices demanding we allow ancillary fields, I can confirm > that I had only these derived-state use cases in mind. I don't see anything > else as legitimate.
That is, I think that the semantic invariants you're > trying to preserve for records are worth fighting for, and additional > *non-derived* state would violate them. > > On Fri, Apr 13, 2018 at 9:46 AM, Brian Goetz > wrote: > >> Let's see if we can make some progress on the elephant in the room -- >> ancillary fields. Several have expressed the concern that without the >> ability to declare some additional instance state, the feature will be too >> limited. >> >> The argument in favor of additional fields is the obvious one; more >> classes can be records. And there are some arguably valid use cases for >> additional fields that don't conflict with the design center for records. >> The best example is derived state: >> >> - When a field is a cached property derived from the record state (such >> as how String caches its hashCode) >> >> Arguably, if a field is derived deterministically from immutable record >> state, then it is not creating any new record state. This surely seems >> within the circle. >> >> The argument against is more of a slippery-slope one; I believe >> developers would like to view this feature through the lens of syntactic >> boilerplate, rather than through semantics. If we let them, they would >> surely and routinely do the following: >> >> record A(int a, int b) { >> private int c; >> >> public A(int a, int b, int c) { >> this(a, b); >> this.c = c; >> } >> >> public boolean equals(Object other) { >> return default.equals(other) && ((A) other).c == c; >> } >> } >> >> Here, `c` is surely part of the state of `A`. And, they wouldn't even >> know what they'd lost; they would just assume records are a way of >> "kickstarting" a class declaration with some public fields, and then you >> can mix in whatever private state you want. >> >> Why is this bad? While "reduced-boilerplate classes" is a valid feature >> idea, our design goal for records is much more than that. 
The semantic >> constraints on records are valuable because they yield useful invariants; >> that they are "just" their state vector, that they can be freely taken >> apart and put back together with no loss of information, and hence can be >> freely serialized/marshaled to JSON and back, etc. >> >> We currently prohibit records like `A` via a number of restrictions: no >> additional fields, no override of equals. We don't need all of these >> restrictions to achieve the desired goal, but we also can't relax them all >> without opening the gate. So we should decide carefully which we want to >> relax, as making the wrong choice constrains us in the future. >> >> Before I dive into details of how we might extend records to support the >> case of "cached derived state", I'd like to first come to some agreement >> that this covers the use cases that we think fall into the "legitimate" >> uses of additional fields. >> >> >> >> On 3/16/2018 2:55 PM, Brian Goetz wrote: >> >>> There are a number of potentially open details on the design for >>> records. My inclination is to start with the simplest thing that preserves >>> the flexibility and expectations we want, and consider opening up later as >>> necessary. >>> >>> One of the biggest issues, which Kevin raised as a must-address issue, >>> is having sufficient support for precondition validation. Without >>> foreclosing on the ability to do more later with declarative guards, I >>> think the recent construction proposal meets the requirement for >>> lightweight enforcement with minimal or no duplication. I'm hopeful that >>> this bit is "there". >>> >>> Our goal all along has been to define records as being "just macros" for >>> a finer-grained set of features. Some of these are motivated by >>> boilerplate; some are motivated by semantics (coupling semantics of API >>> elements to state.)
In general, records will get there first, and then >>> ordinary classes will get the more general feature, but the default answer >>> for "can you relax records, so I can use it in this case that almost but >>> doesn't quite fit" should be "no, but there will probably be a feature >>> coming that makes that class simpler, wait for that." >>> >>> >>> Some other open issues (please see my writeup at >>> http://cr.openjdk.java.net/~briangoetz/amber/datum.html for reference), >>> and my current thoughts on these, are outlined below. Comments welcome! >>> >>> - Extension. The proposal outlines a notion of abstract record, which >>> provides a "width subtyped" hierarchy. Some have questioned whether this >>> carries its weight, especially given how Scala doesn't support case-to-case >>> extension (some see this as a bug, others as an existence proof.) Records >>> can implement interfaces. >>> >>> - Concrete records are final. Relaxing this adds complexity to the >>> equality story; I'm not seeing good reasons to do so. >>> >>> - Additional constructors. I don't see any reason why additional >>> constructors are problematic, especially if they are constrained to >>> delegate to the default constructor (which in turn is made far simpler if >>> there can be statements ahead of the this() call.) Users may find the lack >>> of additional constructors to be an arbitrary limitation (and they'd >>> probably be right.) >>> >>> - Static fields. Static fields seem harmless. >>> >>> - Additional instance fields. These are a much bigger concern. While >>> the primary arguments against them are of the "slippery slope" variety, I >>> still have deep misgivings about supporting unrestricted non-principal >>> instance fields, and I also haven't found a reasonable set of restrictions >>> that makes this less risky. I'd like to keep looking for a better story >>> here, before just caving on this, as I worry doing so will end up biting us >>> in the back. 
>>> >>> - Mutability and accessibility. I'd like to propose an odd choice >>> here, which is: fields are final and package (protected for abstract >>> records) by default, but finality can be explicitly opted out of >>> (non-final) and accessibility can be explicitly widened (public). >>> >>> - Accessors. Perhaps the most controversial aspect is that records are >>> inherently transparent to read; if something wants to truly encapsulate >>> state, it's not a record. Records will eventually have pattern >>> deconstructors, which will expose their state, so we should go out of the >>> gate with the equivalent. The obvious choice is to expose read accessors >>> automatically. (These will not be named getXxx; we are not burning the >>> ill-advised Javabean naming conventions into the language, no matter how >>> much people think it already is.) The obvious naming choice for these >>> accessors is fieldName(). No provision for write accessors; that's >>> bring-your-own. >>> >>> - Core methods. Records will get equals, hashCode, and toString. >>> There's a good argument for making equals/hashCode final (so they can't be >>> explicitly redeclared); this gives us stronger preservation of the data >>> invariants that allow us to safely and mechanically snapshot / serialize / >>> marshal (we'd definitely want this if we ever allowed additional instance >>> fields.) No reason to suppress override of toString, though. Records could >>> be safely made cloneable() with automatic support too (like arrays), but >>> not clear if this is worth it (it's darn useful for arrays, though.) I >>> think the auto-generated getters should be final too; this leaves arrays as >>> second-class components, but I am not sure that bothers me. >>> >> > > > -- > Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com > > > -- Kevin Bourrillion | Java Librarian | Google, Inc.
| kevinb at google.com

From alex.buckley at oracle.com Wed Apr 18 19:02:58 2018 From: alex.buckley at oracle.com (Alex Buckley) Date: Wed, 18 Apr 2018 12:02:58 -0700 Subject: JEP325: Switch expressions spec In-Reply-To: References: <8E28CEE7-0F85-485A-9AE7-15801522B06C@oracle.com> Message-ID: <5AD79662.60605@oracle.com>

On 4/18/2018 11:16 AM, Kevin Bourrillion wrote: > Evaluation of an expression can produce side effects, because > expressions may contain embedded assignments, increment operators, > decrement operators, and method invocations. *In addition, lambda > expressions and switch expressions have bodies that may contain > arbitrary statements. > > A lambda "contains" statements /physically/, but nothing gets > executed. If anything, it is anonymous /classes/ that belong here > (though maybe, arguably, that would be covered if "method invocations" > was changed to "method or constructor invocations"?).

The goal was to highlight that a lambda/switch expression is not like (say) a field access expression, because of the ability to have a body of statements rather than merely a tree of subexpressions ... but you're right, "Evaluation of a lambda expression is distinct from execution of the lambda body." (JLS 15.27.4)

> Suggestion: "... because expressions may contain embedded assignments, > increment operators, decrement operators, and method or constructor > invocations, as well as arbitrary statements nested inside a switch > expression."

Yes, limiting the arbitrariness to switch expressions (the sole "home" for something-resembling-block-expressions) is right.
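The distinction Alex cites can be observed directly. A minimal sketch, assuming Java 14 or later, where the thread's `break value` is spelled `yield value`; the class and member names are hypothetical. Evaluating the lambda expression only produces a function object, while evaluating the switch expression runs the statements in the selected arm:

```java
import java.util.function.IntSupplier;

final class EvaluationVsExecution {
    static int sideEffects = 0;

    // Evaluating this lambda expression only creates a function object;
    // the increment in its body does not run until getAsInt() is called.
    static IntSupplier makeLambda() {
        return () -> ++sideEffects;
    }

    // Evaluating this switch expression executes the statements in the
    // selected block immediately, including the increment.
    static int evalSwitch(int k) {
        return switch (k) {
            case 0 -> {
                sideEffects++;
                yield 10;
            }
            default -> -1;
        };
    }
}
```

Calling `makeLambda()` leaves `sideEffects` unchanged; calling `evalSwitch(0)` bumps it before the value is produced.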
Alex

From brian.goetz at oracle.com Wed Apr 18 19:30:23 2018 From: brian.goetz at oracle.com (Brian Goetz) Date: Wed, 18 Apr 2018 15:30:23 -0400 Subject: JEP325: Switch expressions spec In-Reply-To: References: <8E28CEE7-0F85-485A-9AE7-15801522B06C@oracle.com> Message-ID: <129fb52c-d6b9-cf73-b664-1b888b1b8f56@oracle.com>

All good points. Minor comments inline.

> Were we considering allowing `case /something/, default:` or > `default, case /something/:`? Of course you would never ever actually > /need/ this... except in the one case that /something/ is null. In a > switch expression it would be sad to be forced to revert to the old > syntax for only this reason.

This may well be needed, especially if we prohibit fallthrough from a colon label into an arrow label. Another case where a similar problem arises is this:

    case null:
    case String s:
        // whoops, s is not DA here

Really, we'd like to say

    case null s, String s:

or

    case (null | String) s:

or something similar. We don't have to cross this bridge until we get to type patterns, but it's on the horizon.

From brian.goetz at oracle.com Wed Apr 18 20:39:12 2018 From: brian.goetz at oracle.com (Brian Goetz) Date: Wed, 18 Apr 2018 16:39:12 -0400 Subject: [records] Ancillary fields In-Reply-To: References: <75ded733-0a75-b72c-2990-d418e87edb40@oracle.com>

Message-ID: <74880bca-fc77-0d84-3827-025b241c6a6e@oracle.com>

> Ahh, you missed the `lazy` keyword on there :-) Which is good because > it raises an issue: when you forget it, bad performance may result > without other observable consequence. Although, it's already the case > that reading code like the above ought to raise all kinds of alarm > bells (e.g., now I want to go check which fields computeHashCode() > might be referring to, and where /they're/ initialized), so I > /should/ be looking for that `lazy` keyword to put my mind at ease. So > maybe this is okay.

Well, "bad" is relative; it won't be any worse than what you do today with eager static fields. But yes, I did drop the lazy there.

> I assume that, unlike other field initializers, I'm safe to refer > to /any/ other field regardless of how and where that field is > initialized. Right?

I think you are mostly asking about instance fields. It would be safe to refer to any other field; however, if you _read_ a lazy field in the constructor, it might trigger computation of the field based on a partially initialized object. The compiler could warn on the obvious cases where this happens, but of course it can be buried in a chain of method calls.

> The intersection with primitives is interesting. I assume it gets > secretly created as an Integer? So there's a little extra hidden > memory consumption.

For static fields, there's an obvious and good answer that is optimally time and space efficient with no anomalies: condy. We desugar

    lazy static T t = e
    ...
    moo(t)

into

    // no field needed
    static t$init() { return ; }
    ...
    moo( ldc condy[ ... ] )

and let the constant pool do the lazy initialization and caching. JITs love this.
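Condy cannot be written directly in Java source, so purely as a point of comparison: the closest thing available today for a lazily initialized static is the initialization-on-demand holder idiom, a different technique that relies on the JVM initializing a nested class only on its first use. A minimal sketch with hypothetical names, where `computeValue()` stands in for the initializer expression `e`:

```java
final class LazyStaticHolder {
    static int initCount = 0;

    // Stands in for the (possibly expensive) initializer of a
    // hypothetical `lazy static` field.
    static String computeValue() {
        initCount++;
        return "expensive";
    }

    // The JVM initializes Holder only when Holder.VALUE is first read,
    // and class initialization runs at most once, even under contention.
    private static final class Holder {
        static final String VALUE = computeValue();
    }

    static String value() {
        return Holder.VALUE;
    }
}
```

One semantic difference from the condy translation: a condy bootstrap method may be invoked more than once if threads race to resolve the constant (only one result is kept), whereas class initialization runs the initializer at most once, with other threads waiting. The idiom also costs a nested class per field, which the condy desugaring avoids.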
For instance fields, we have a choice; use extra space in the object to store the "already initialized" bit, or satisfy ourselves with the trick that String does with hashCode() -- allow redundant recomputation in the case where the initializer serves up the default value.

So I think the divide is not ref-vs-primitive but whether we are willing to take the recomputation hit when it serves up a default value.

From forax at univ-mlv.fr Wed Apr 18 21:45:12 2018 From: forax at univ-mlv.fr (Remi Forax) Date: Wed, 18 Apr 2018 21:45:12 +0000 Subject: [records] Ancillary fields In-Reply-To: <74880bca-fc77-0d84-3827-025b241c6a6e@oracle.com> References: <75ded733-0a75-b72c-2990-d418e87edb40@oracle.com>

<74880bca-fc77-0d84-3827-025b241c6a6e@oracle.com> Message-ID: <02A0AF7A-218C-4FC6-9EB9-4570588533DE@univ-mlv.fr>

On April 18, 2018 8:39:12 PM UTC, Brian Goetz wrote: > > >> Ahh, you missed the `lazy` keyword on there :-) Which is good because > >> it raises an issue: when you forget it, bad performance may result >> without other observable consequence. Although, it's already the case > >> that reading code like the above ought to raise all kinds of alarm >> bells (e.g., now I want to go check which fields computeHashCode() >> might be referring to, and where /they're/ initialized), so I >> /should/ be looking for that `lazy` keyword to put my mind at ease. >So >> maybe this is okay. > >Well, "bad" is relative; it won't be any worse than what you do today >with eager static fields. But yes, I did drop the lazy there. > >> I assume that, unlike other field initializers, I'm safe to refer >> to /any/ other field regardless of how and where that field is >> initialized. Right? > >I think you mostly are asking about instance fields. It would be safe >to refer to any other field, however, if you _read_ a lazy field in the > >constructor, it might trigger computation of the field based on a >partially initialized object. The compiler could warn on the obvious >cases where this happens, but of course it can be buried in a chain of >method calls. > >> The intersection with primitives is interesting. I assume it gets >> secretly created as an Integer? So there's a little extra hidden >> memory consumption. > >For static fields, there's an obvious and good answer that is optimally > >time and space efficient with no anomalies: condy. We desugar > > lazy static T t = e > ... > moo(t) > >into > > // no field needed > static t$init() { return ; } > ... > moo( ldc condy[ ... ] ) > >and let the constant pool do the lazy initialization and caching. JITs >love this.
> >For instance fields, we have a choice; use extra space in the object to > >store the "already initialized" bit, or satisfy ourselves with the >trick >that String does with hashCode() -- allow redundant recomputation in >the >case where the initializer serves up the default value. > >So I think the divide is not ref-vs-primitive but whether we are >willing >to take the recomputation hit when it serves up a default value.

I fully agree. The lazy static with condy also has the same semantics: if the bootstrap method (bsm) has a side effect, you may see that the bsm can be called multiple times.

For the record, I presented the lazy static this afternoon at Devoxx FR (in order to explain the semantics of condy), and several people came to me afterward saying it was an interesting idea.

Remi -- Sent from my Android device with K-9 Mail. Please excuse my brevity.

From kevinb at google.com Wed Apr 18 21:59:02 2018 From: kevinb at google.com (Kevin Bourrillion) Date: Wed, 18 Apr 2018 14:59:02 -0700 Subject: [records] Ancillary fields In-Reply-To: <74880bca-fc77-0d84-3827-025b241c6a6e@oracle.com> References: <75ded733-0a75-b72c-2990-d418e87edb40@oracle.com>

<74880bca-fc77-0d84-3827-025b241c6a6e@oracle.com> Message-ID:

On Wed, Apr 18, 2018 at 1:39 PM, Brian Goetz wrote: > Ahh, you missed the `lazy` keyword on there :-) Which is good because it > raises an issue: when you forget it, bad performance may result without > other observable consequence. Although, it's already the case that reading > code like the above ought to raise all kinds of alarm bells (e.g., now I > want to go check which fields computeHashCode() might be referring to, and > where *they're* initialized), so I *should* be looking for that `lazy` > keyword to put my mind at ease. So maybe this is okay. > > > Well, "bad" is relative; it won't be any worse than what you do today with > eager static fields. >

Yes, it's just that lazy and eager code aren't as trivially distinguishable anymore, so... I thought I should mention it, but it's no kind of dealbreaker.

> For instance fields, we have a choice; use extra space in the object to > store the "already initialized" bit, or satisfy ourselves with the trick > that String does with hashCode() -- allow redundant recomputation in the > case where the initializer serves up the default value. >

I strongly suspect there isn't going to be any generally safe way to do the latter.

> So I think the divide is not ref-vs-primitive but whether we are willing to > take the recomputation hit when it serves up a default value. > > > -- Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com

From brian.goetz at oracle.com Wed Apr 18 22:19:25 2018 From: brian.goetz at oracle.com (Brian Goetz) Date: Wed, 18 Apr 2018 18:19:25 -0400 Subject: [records] Ancillary fields In-Reply-To: References: <75ded733-0a75-b72c-2990-d418e87edb40@oracle.com>

<74880bca-fc77-0d84-3827-025b241c6a6e@oracle.com> Message-ID: <1ce6065a-3619-b32c-e3af-0f51e130bc77@oracle.com>

For primitives, you can always force yourself to use Integer:

    lazy Integer i = f();

and make sure f() never returns null. You can do something similar with a library class (e.g., Optional) for references. So there are surely _safe_ ways to do it, albeit ugly ones.

I kind of prefer to have boxing like this be explicit rather than implicit; if the user thinks they're putting an `int` in their class, I'd like to be as transparent about that as we can.

You were willing to throw on null in the reference case; that can also be simulated by:

    lazy Foo f = requireNonNull(f());

Which isn't even that ugly or expensive. So I suspect that this is less of a problem than one might first think, but I could be wrong.

On 4/18/2018 5:59 PM, Kevin Bourrillion wrote: > > For instance fields, we have a choice; use extra space in the > object to store the "already initialized" bit, or satisfy > ourselves with the trick that String does with hashCode() -- allow > redundant recomputation in the case where the initializer serves > up the default value. > > > I strongly suspect there isn't going to be any generally safe way to > do the latter.

From brian.goetz at oracle.com Thu Apr 19 20:44:45 2018 From: brian.goetz at oracle.com (Brian Goetz) Date: Thu, 19 Apr 2018 16:44:45 -0400 Subject: [switch] Further unification on switch Message-ID: <88ba33f3-5c6a-62ac-21ad-f703e705f0cc@oracle.com>

We've been reviewing the work to date on switch expressions. Here's where we are, and here's a possible place we might move to, which I like a lot better than where we are now.
## Goals

As a reminder, the primary goal here is _not_ switch expressions; switch expressions are supposed to be just an uncontroversial waypoint on the way to the real goal, which is a more expressive and flexible switch construct that works in a wider variety of situations, including supporting patterns, being less hostile to null, use as either an expression or a statement, etc.

And the reason we think that improving switch is the right primary goal is that a "do one of these based on ..." construct is _better_ than the corresponding chain of if-else-if, for multiple reasons:

 - Possibility for the compiler to do exhaustiveness analysis, potentially finding more bugs;
 - Possibility for more efficient dispatch -- a switch could be O(1), whereas an if-else chain is almost certainly O(n);
 - More semantically transparent -- it's obvious the user is saying "do one of these, based on ...";
 - Eliminates the need to repeat (and possibly get wrong) the switch target.

Switch does come with a lot of baggage (fallthrough by default, questionable scoping, need to explicitly break), and this baggage has produced the predictable distractions in the discussion -- a desire that we subordinate the primary goal (making switch more expressive) to the more contingent goal of "fixing" the legacy problems of switch.

These legacy problems of switch may be unfortunate, but to whatever degree we end up ameliorating them, that has to be purely a side benefit -- it's not the primary goal, no matter how annoying people find them. (The desire to "fix" the mistakes of the past is frequently a siren song, which is why we don't allow ourselves to take these as first-class requirements.)
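The exhaustiveness and repeated-target points can be seen side by side. A minimal sketch, assuming Java 14 or later; the `Day` enum and method names are hypothetical. The if-else chain names `d` in every test and needs a catch-all branch, while the switch expression names its target once and, because it lists every constant, compiles without a `default`:

```java
final class SwitchVsIfChain {
    enum Day { MONDAY, TUESDAY, WEDNESDAY, THURSDAY, FRIDAY, SATURDAY, SUNDAY }

    // if-else chain: the target is repeated in each test, and nothing checks
    // that every constant is handled, so a final catch-all is required.
    static String kindIf(Day d) {
        if (d == Day.SATURDAY || d == Day.SUNDAY) {
            return "weekend";
        } else if (d == Day.FRIDAY) {
            return "almost";
        } else {
            return "weekday";
        }
    }

    // switch expression: the target appears once, and the compiler accepts
    // the switch without a default only because every constant is listed.
    static String kindSwitch(Day d) {
        return switch (d) {
            case SATURDAY, SUNDAY -> "weekend";
            case FRIDAY -> "almost";
            case MONDAY, TUESDAY, WEDNESDAY, THURSDAY -> "weekday";
        };
    }
}
```

If a constant is dropped from the switch's labels, the switch expression stops compiling, whereas the if-else version keeps compiling and silently routes the missing constant to its final `else`.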
#### What we're not going to do

The worst possible outcome (which is also the most commonly suggested "solution" in forums like reddit) would be to invent a new construct that is similar to, but not quite the same as, switch (`snitch`), without being a 100% replacement for today's quirky switch. Today's switch is surely suboptimal, but it's not so fatally flawed that it needs to be euthanized, and we don't want to create an "undead" language construct that everyone will forever still have to learn, while also keeping track of the differences between `switch` and `snitch`. No thank you.

That means we extend the existing switch statement, increase flexibility by supporting an expression form, and, to the degree needed, embrace its quirks. ("No statement left behind.")

#### Where we started

In the first five minutes of working on this project, we sketched out the following (call it the "napkin sketch"), where an expression switch has case arms of the form:

    case L -> e;

or

    case L -> { statement*; break e; }

This was enough to get started, but of course the devil is in the details.

#### Where we are right now

We moved away from the napkin sketch for a few reasons, in part because it seemed to be drawing us down the road towards switch and snitch -- which was all the more worrying because we had yet to deal with the potential that pattern switch and constant switch might have differences as well. We want a unified model of switch that deals well enough with all the cases -- expressions and statements, patterns and constants.

Our current model (call this Unification Attempt #1, or UA1 for short) is a step towards a unified model of switch, and this is a huge step forward. In this model, there's one switch construct, and there's one set of control flow rules, including for break (like return, break takes a value in a value context and is void in a void context).

For convenience and safety, we then layered a shorthand atop value-bearing switches, which is to interpret

    case L -> e;

as

    case L: break e;

expecting the shorter form to be used almost all the time. (This has a pleasing symmetry with the expression form of lambdas, and (at least for expression switches) alleviates two of the legacy pain points. Switch expressions have other things in common with lambdas too: they are the only expressions that can contain statements, and the only ones that interact with nonlocal control flow.)

This approach offers a lot of flexibility (some would say too much). You can write "remi-style" expression switches:

    int x = switch (y) {
        case 1: break 2;
        case 2: break 4;
        default: break 8;
    };

or you can write "new-style" expression switches:

    int x = switch (y) {
        case 1 -> 2;
        case 2 -> 4;
        default -> 8;
    };

Some people like the transparency of the first; others like the compactness and fallthrough-safety of the second. And in cases where you mostly want the benefits of the second, but the real world conspires to make one or two cases difficult, you can mix them, and take full advantage of what "old switch" does -- with no new rules for control flow.

#### Complaints

There was the usual array of complaints over syntax -- many of which can be put down to "bleah, new is different, different is bad", but the most prominent one seems to be a generalized concern that other users (never us, of course, but we always fear for what others might do) won't be able to "handle" the power of mixed switches and will write terrible code, and then the world will burn. (And, because the mixing comes with fallthrough, it further engenders the "you idiots, you fixed the wrong thing" reactions.)
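For reference, the two styles above can be written in the syntax that eventually shipped with switch expressions in Java 14, where the thread's `break value` became `yield value`. A minimal sketch with hypothetical class and method names, showing that both forms compute the same mapping:

```java
final class TwoStyles {
    // "Remi-style": colon labels, each arm hands back its value explicitly.
    // (Spelled `break 2;` in this thread; the shipped keyword is `yield`.)
    static int colonStyle(int y) {
        return switch (y) {
            case 1: yield 2;
            case 2: yield 4;
            default: yield 8;
        };
    }

    // "New-style": arrow labels; each right-hand side is the arm's value.
    static int arrowStyle(int y) {
        return switch (y) {
            case 1 -> 2;
            case 2 -> 4;
            default -> 8;
        };
    }
}
```

The shipped language does not allow the mixing described here: a single switch may use `:` labels or `->` labels, but not both.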
Personally, I think the fear of mixing is deeply overblown -- I think in most cases people will gravitate towards one of the two clean styles, and only mix where the complexity of the real world forces them to, but there's value in understanding the underpinnings of such reactions, even if in the end they turn out to be much hot air about nothing.

#### A real issue with mixing!

But there is a real problem with our approach, which is: while a unified switch is the right goal, UA1 is not unified _enough_. Specifically, we haven't fully aligned the statement forms, and this conspires to reduce expressiveness and safety. That is, in an expression switch you can say:

    case L -> e;

but in a statement switch you can't say

    case L -> s;

The reason for this is a purely accidental one: if we allowed this, then we _would_ likely find ourselves in the mixing hell that people are afraid of, which in turn would make the risk of accidental fallthrough _even worse_ than it is today. So the failing of mixing is not that it will be abused, but that it constrains us from actually getting to a unified construct.

## Closing the gap

So, let's take one more step towards unifying the two forms (call this UA2), rather than a step away from it. Let's say that _all_ switches can support either old-style (colon) or new-style (arrow) case labels -- but must stick to one kind of case label in a given switch:

    // statement switch
    switch (x) {
        case 1: println("one"); break;
        case 2: println("two"); break;
    }

or

    // also statement switch
    switch (x) {
        case 1 -> println("one");
        case 2 -> println("two");
    }

If a switch is a statement, the RHS is a statement, which can be a block statement:
    case L -> { a; b; }

We get there by first taking a step backwards, at least in terms of superficial syntax, to the syntax suggested by the napkin sketch, where if a switch is an expression, the RHS of an -> case is an expression or a block statement (in the latter case, it must complete abruptly by reason of either break-value or throw). Just as we expected "break value" to be rare in expression switches under UA1, since developers will generally prefer the shorthand form where applicable, we expect it to be equally rare under UA2.

Then, as in UA1, we render unto expressions the things that belong to expressions; they must be total (an expression must yield a value or complete abruptly by reason of throwing).

#### Look, accidental benefits!

Many of switch's failings (fallthrough, scoping) are not directly specified features so much as emergent properties of the structure and control flow of switches. Since by definition you can't fall out of an arrow case, an all-arrow switch gives the fallthrough-haters what they want "for free", with no need to treat it specially. In fact, it's even better: in the all-arrow form, all of the things people hate about switch -- the need to say break, the risk of fallthrough, and the questionable scoping -- go away.

#### Scorecard

There is one switch construct, which can be used as either an expression or a statement; when used as an expression, it acquires the characteristics of expressions (must be total, no nonlocal control flow out). Each can be expressed in one of two syntactic forms (arrow and colon). All forms will support patterns, null handling, and multiple labels per case. The control flow and scoping rules are driven by structural properties of the chosen form.

The (statement, colon) case is the switch we have had since Java 1.0, enhanced as above (patterns, nulls, etc.).
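The legacy quirks that the arrow form removes are easiest to see side by side. Below is a minimal sketch in the syntax that eventually shipped in Java 14 (where the arrow form exists and `yield` replaced the `break value` spelling used in this thread); the class `ColonVsArrow` and its method names are hypothetical. In the colon form a missing `break` falls through, and a local declared under one label is still in scope under the next; arrow arms do neither.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; the arrow forms and `yield` need Java 14 or later.
class ColonVsArrow {

    // (statement, colon): case 1 has no break, so control falls through into case 2.
    static List<String> colonStatement(int x) {
        List<String> out = new ArrayList<>();
        switch (x) {
            case 1:
                out.add("one");
                // no break here: execution continues into case 2
            case 2:
                out.add("two");
                break;
            default:
                out.add("other");
        }
        return out;
    }

    // (statement, arrow): each arm runs on its own; no break, no fallthrough.
    static List<String> arrowStatement(int x) {
        List<String> out = new ArrayList<>();
        switch (x) {
            case 1 -> out.add("one");
            case 2 -> out.add("two");
            default -> out.add("other");
        }
        return out;
    }

    // Colon-form scoping: n is declared under case 1, but its scope is the rest
    // of the switch block, so case 2 sees the same variable (uninitialized on
    // that path, hence it must be assigned before it is read).
    static int colonScope(int k) {
        switch (k) {
            case 1:
                int n = 10;
                return n;
            case 2:
                n = 20;
                return n;
            default:
                return 0;
        }
    }

    // Arrow-form scoping: each block arm is its own scope, so n can be redeclared.
    static int arrowScope(int k) {
        return switch (k) {
            case 1 -> { int n = 10; yield n; }
            case 2 -> { int n = 20; yield n; }
            default -> 0;
        };
    }

    public static void main(String[] args) {
        System.out.println(colonStatement(1));                   // [one, two]
        System.out.println(arrowStatement(1));                   // [one]
        System.out.println(colonScope(2) + " " + arrowScope(2)); // 20 20
    }
}
```

Both scoping methods return 20 for input 2, but only the colon version gets there by reusing a variable declared under a different label.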
The (statement, arrow) case can be considered a nice syntactic shorthand for the (statement, colon) case, which obviates the annoyance of "break", implicitly prevents fallthrough of all forms, and avoids the confusion of current switch scoping. Many existing statement switches that are not expressions in disguise can be refactored to this.

The (expression, colon) form is a subset of UA1, where you just never say "arrow".

The (expression, arrow) case can again be considered a nice shorthand for the previous, again a subset of UA1, where you just never say "colon", and as a result, again don't have to think about fallthrough.

Totality is a property of expression switches, regardless of form, because they are expressions, and expressions must be total.

Fallthrough is a property of the colon-structured switches; there are no changes here.

Nonlocal control flow _out_ of a switch (continue to an enclosing loop, break with label, return) is a property of statement switches.

So essentially, rather than dividing the semantics along expression/statement lines, and then attempting to opportunistically heap a bunch of irrelevant features like "no fallthrough" onto the expression side "because they're cool" even though they have nothing to do with expression-ness, we instead divide the world structurally: the colon form gives you the old control flow, and the arrow form gives you the new. And either can be used as a statement or an expression. And no one will be confused by mixing.

Orthogonality FTW. No statement gets left behind.

## Explaining it

Relative to UA1, we could describe this as adding back the blocks (it's not really a block expression) from the napkin model, supporting an arrow form of statement switches with blocks too, and then restricting switches to all-arrow or all-colon. Then each quadrant is a restriction of this model. But that's not how we'd teach it.
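For readers on a current JDK, the expression quadrants and the statement-only control flow can be sketched roughly as follows. This assumes Java 14 or later, where switch expressions were finalized (JEP 361) with two differences worth knowing: the value-bearing `break 2;` is spelled `yield 2;`, and a switch that mixes colon and arrow labels is a compile-time error, much as UA2 proposes. The class `SwitchQuadrants`, the enum `Size`, and all method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; needs Java 14 or later, where the thread's
// "break value" is spelled "yield value".
class SwitchQuadrants {

    enum Size { SMALL, MEDIUM, LARGE }

    // (expression, colon): every group supplies the result with yield.
    static int expressionColon(int y) {
        return switch (y) {
            case 1: yield 2;
            case 2: yield 4;
            default: yield 8;
        };
    }

    // (expression, arrow): the same switch in shorthand form.
    static int expressionArrow(int y) {
        return switch (y) {
            case 1 -> 2;
            case 2 -> 4;
            default -> 8;
        };
    }

    // Totality: all enum constants are covered, so no default is required.
    // A block arm must either yield a value or complete abruptly by throwing.
    static int weight(Size s, boolean strict) {
        return switch (s) {
            case SMALL -> 1;
            case MEDIUM -> 5;
            case LARGE -> {
                if (strict) {
                    throw new IllegalArgumentException("LARGE not allowed");
                }
                yield 10;
            }
        };
    }

    // (statement, arrow) inside a loop: continue and return may transfer control
    // out of a statement switch; a switch expression may not do this.
    static List<Integer> positivesUntilZero(int[] xs) {
        List<Integer> out = new ArrayList<>();
        for (int x : xs) {
            switch (Integer.signum(x)) {
                case -1 -> { continue; }   // skip negatives: next loop iteration
                case 0 -> { return out; }  // stop at the first zero: leave the method
                default -> { }             // positive: proceed to out.add below
            }
            out.add(x);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(expressionColon(2) + " " + expressionArrow(2)); // 4 4
        System.out.println(weight(Size.LARGE, false));                      // 10
        System.out.println(positivesUntilZero(new int[] {3, -1, 4, 0, 5})); // [3, 4]
    }
}
```

The `continue` and `return` in `positivesUntilZero` are exactly the nonlocal exits the scorecard assigns to statement switches; moving them into a switch _expression_ would not compile.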
Relative to Java 10, we'd probably say:

 - Switch statements now come in a simpler (arrow) flavor, where there is no fallthrough, no weird scoping, and no need to say break most of the time. Many switches can be rewritten this way, and this form can even be taught first.
 - Switches can be used as either expressions or statements, with essentially identical syntax (some grammar differences, but this is mostly interesting only to spec writers). If a switch is an expression, it should contain expressions; if a switch is a statement, it should contain statements.
 - Expression switches have additional restrictions that are derived exclusively from their expression-ness: totality, and the rule that they can complete abruptly only by reason of a throw.
 - We allow a break-with-value statement in an expression switch as a means of explicitly providing the switch result; this can be combined with a statement block to allow for statements+break-expression.

The result is one switch construct, with modern and legacy flavors, which supports either expressions or statements. You can immediately look at the middle of a switch and tell (by arrow vs colon) whether it has the legacy control flow or not.

From guy.steele at oracle.com  Thu Apr 19 21:06:35 2018
From: guy.steele at oracle.com (Guy Steele)
Date: Thu, 19 Apr 2018 17:06:35 -0400
Subject: [switch] Further unification on switch
In-Reply-To: <88ba33f3-5c6a-62ac-21ad-f703e705f0cc@oracle.com>
References: <88ba33f3-5c6a-62ac-21ad-f703e705f0cc@oracle.com>
Message-ID:

> On Apr 19, 2018, at 4:44 PM, Brian Goetz wrote:
>
> We've been reviewing the work to date on switch expressions. Here's where we are, and here's a possible place we might move to, which I like a lot better than where we are now.
> . . .
> ## Closing the gap
>
> So, let's take one more step towards unifying the two forms (call this UA2), rather than a step away from it.
> Let's say that _all_ switches can support either old-style (colon) or new-style (arrow) case labels -- but must stick to one kind of case label in a given switch . . .
>
> The result is one switch construct, with modern and legacy flavors, which supports either expressions or statements. You can immediately look at the middle of a switch and tell (by arrow vs colon) whether it has the legacy control flow or not.

I like it. I would like to think that an IDE could help you with changing between colon and arrow flavors.

-Guy

From dl at cs.oswego.edu  Thu Apr 19 21:31:42 2018
From: dl at cs.oswego.edu (Doug Lea)
Date: Thu, 19 Apr 2018 17:31:42 -0400
Subject: [switch] Further unification on switch
In-Reply-To: <88ba33f3-5c6a-62ac-21ad-f703e705f0cc@oracle.com>
References: <88ba33f3-5c6a-62ac-21ad-f703e705f0cc@oracle.com>
Message-ID: <806dfd48-b19f-919b-16bd-4edee8f680a7@cs.oswego.edu>

I was starting to get fatalistically pessimistic about switch, but the all-colon-as-statement vs all-arrow-as-expression idea (with nothing in-between) seems pretty good! And would be even better if JLS impact were carefully checked.

-Doug

On 04/19/2018 04:44 PM, Brian Goetz wrote:
> We've been reviewing the work to date on switch expressions. Here's
> where we are, and here's a possible place we might move to, which I like
> a lot better than where we are now.
> . . .

From kevinb at google.com  Thu Apr 19 21:43:30 2018
From: kevinb at google.com (Kevin Bourrillion)
Date: Thu, 19 Apr 2018 14:43:30 -0700
Subject: Expression switch exception naming
In-Reply-To:
References: <0CB0D6F1-83AF-4C91-8A86-77BB8201DF67@oracle.com> <2125955809.1367233.1522250385521.JavaMail.zimbra@u-pem.fr> <27798a9d-d88f-8265-3c22-337d6d07bcb1@oracle.com>