From forax at univ-mlv.fr  Sat Mar 18 17:22:02 2017
From: forax at univ-mlv.fr (Remi Forax)
Date: Sat, 18 Mar 2017 18:22:02 +0100 (CET)
Subject: Pattern Matching
Message-ID: <1397250216.1674494.1489857722330.JavaMail.zimbra@u-pem.fr>

Hi guys,
I've already implemented a kind of pattern matching in a language that runs on the JVM (still closed source :( ),
so I told Brian that I would send a summary of how it is done.
Please don't pay attention to the syntax, it's just for explanation :)

regards,
Rémi

---

Pattern Matching

1. expression problem

polymorphism:

  interface I { String m(); }
  class A implements I { String m() { return "A::m"; } }
  class B implements I { String m() { return "B::m"; } }

pattern matching:

  static String m(I i) {
    switch(i) {
      case A a: return "A::m";
      case B b: return "B::m";
      default: throw new AssertionError();
    }
  }

polymorphism => easy to add a new subtype of I, cannot add a new operation
pattern matching => easy to add a new operation, cannot add a new subtype

Said differently, if the hierarchy is open => use polymorphism,
if the hierarchy is closed => use pattern matching.

The expression problem (Philip Wadler) => you cannot have both and keep type safety.

Other ways:
- The visitor pattern uses the double dispatch technique to implement pattern matching; the visitor interface has to list all the methods, so it's equivalent to closing the hierarchy.
- A hashmap of lambdas [1], extensible in both directions, but it requires an unsafe cast.
- Multiple dispatch like in CLOS, Dylan or Clojure, not typesafe.

2. smart cast / flow type inference

  class A { int value; }
  class B { String s; }

Introducing a new name/local variable in the case avoids unnecessary casts, so instead of

  switch(val) {
    case A: return ((A)val).value;
    case B: return ((B)val).s.length();
  }

one can write:

  switch(val) {
    case A a: return a.value;
    case B b: return b.s.length();
  }

The other solution is to specify a kind of flow inference, i.e.
inside the case A, val is now typed as 'A', and inside the case B, val is now typed as 'B'.

  switch(val) {
    case A: // here val is now an A
      return val.value;
    case B: // here val is now a B
      return val.s.length();
  }

This doesn't work well if you can fall through from case A to case B (more on this later), and in terms of implementation it's better for the debugger to create a new variable inside the case, so the generated code is more like:

  switch(val) {
    case A: {
      A $val = (A)val;
      return $val.value;
    }
    case B: {
      B $val = (B)val;
      return $val.s.length();
    }
  }

3. block vs expression

When switching on type, being able to fall through from one case to another is less useful, because it's only safe if the type of the following case is a supertype.

  switch(val) {
    case Foo:
    case Bar:
      // here Bar has to be a supertype of Foo.
  }

Moreover, because of separate compilation, the relationship between Foo and Bar can change after the code containing the switch is compiled, so Bar being a supertype of Foo has to be verified (once) at runtime or by the bytecode verifier.

So instead of a C-like switch based on statements, a construct based on expressions, like match in Scala or when in Kotlin, is better, something like

  match(val) {
    case A a -> a.value;
    case B b -> b.s.length();
  }

4. generalized cases

Instead of a switch on type, we can go a step further and allow arbitrary tests, for example comparison by value. In Java, unlike in Scala, there is no hierarchy to distinguish whether an Optional is present or not, so with a match that allows comparison on values we can write

  match(opt) {
    case Optional.empty() -> "empty"
    case Optional opt -> opt.get()
  }

One problem with this construct is that, unlike the match on type, there is an implicit order between the cases, i.e. at runtime the cases have to be tested in order, like a cascade of if/else.
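To make the ordering issue concrete, here is a minimal Java sketch, assuming a hypothetical OrderedMatch helper (not an existing API and not the implementation described in this mail): each case is a (test, action) pair, and the cases are tried in declaration order, the first successful test winning, which is exactly the if/else-cascade semantics. The describe method mirrors the Optional match above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical helper: a match expression as an ordered list of (test, action) pairs.
final class OrderedMatch<T, R> {
    private final List<Predicate<? super T>> tests = new ArrayList<>();
    private final List<Function<? super T, ? extends R>> actions = new ArrayList<>();

    // Appends a case; declaration order is the evaluation order.
    OrderedMatch<T, R> when(Predicate<? super T> test, Function<? super T, ? extends R> action) {
        tests.add(test);
        actions.add(action);
        return this;
    }

    // Linear scan: the first case whose test succeeds wins.
    R apply(T value) {
        for (int i = 0; i < tests.size(); i++) {
            if (tests.get(i).test(value)) {
                return actions.get(i).apply(value);
            }
        }
        throw new IllegalStateException("no case matched: " + value);
    }

    // Equivalent of:
    //   match(opt) {
    //     case Optional.empty() -> "empty"
    //     case Optional opt -> opt.get()
    //   }
    static String describe(Optional<String> opt) {
        return new OrderedMatch<Optional<String>, String>()
                .when(o -> o.equals(Optional.empty()), o -> "empty") // value case first
                .when(o -> true, Optional::get)                      // catch-all type case last
                .apply(opt);
    }
}
```

Swapping the two when calls would let the catch-all case shadow the Optional.empty() case, which is why the order of such cases is observable and the cascade cannot simply be reordered.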
From the performance point of view, the drawback is that if you have a big hierarchy and want to do a switch on type, a linear test is slow. From my own experience, it's a serious issue when you try to use a switch on type with that performance property in a compiler.

A solution I've used in one language is to force users to define all the cases on values first and then the cases on types. At runtime, the cases on values are executed as a cascade of if/else, and the cases on types are executed using an if/else or a hashmap, depending on the number of cases.

5. structural matching

One problem of pattern matching is that, because it's not expressed inside a hierarchy, it ruins encapsulation: either your users have to write getters, or you have to expose the structure of the class to the pattern matching.

Case classes in Scala or records in C# 7 [2] are special constructs that expose the structure of a class, so the cases of a match can be specified in terms of destructured local variables

  interface I { }
  case class A(int value) implements I;
  case class B(String s, String s2) implements I;

  match(i) {
    case A(val) -> val
    case B(s, _) -> s
  }

Both C# and Scala allow users to define how the matching is done: C# relies on the is operator, which uses out parameters to extract the values, and Scala uses an extractor method (unapply), which uses tuples to extract the values.

Java has none of these mechanisms, so either we introduce a mechanism that provides a structural definition + getters, or we wait for the introduction of value types (and tuples).

A simple proposal for a structural definition of a class:

  structural class B(final String s, final String s2) implements I;

which is equivalent to

  /*structural*/ class B implements I {
    private final String s;
    private final String s2;

    public B(String s, String s2) {
      this.s = s;
      this.s2 = s2;
    }

    public String getS() { return s; }
    public String getS2() { return s2; }

    + equals and hashCode
    + a StructuralAttribute that describes the attributes s and s2.
  }

If a body is declared, then nothing is generated by the compiler apart from the StructuralAttribute, and it's an error not to provide the getters, equals and hashCode.

  structural class B(final String s, final String s2) implements I {
    // needs getS, getS2, equals and hashCode
  }

Again, it may be better to wait for value types instead of relying on getters.

6. Compilation of a pattern matching

As said previously, a switch on types should not be compiled to a cascade of if/else:
- It's dog slow if you have more than a dozen cases, because it's a linear scan, and each instanceof is, in the best case, also translated to an if/else by the JIT.
- It doesn't work well with separate compilation, because an if/else cascade is a sequence, so it defines an order.

The solution is to use invokedynamic, obviously :)

The recipe of a pattern matching can be described as an array of pattern/action pairs. A pattern is one of:
  a test against a constant (static final field)
  a test against an argument (arguments are like the captured arguments of a lambda, passed as arguments of invokedynamic)
  a test of a type (a class)
  a destructured definition (a string encoding the matching tree, indicating the extracted values)
An action is:
  a constant method handle that references a static method taking as parameters the local variables defined by the pattern.

Note: in my language, I've encoded the recipe as a JSON-like string, uuencoded and seen as a constant String, because there is no way to define a tree-like constant in the bytecode (yet!).

Examples:

  match(opt) {
    case Optional.empty() -> "empty"
    case Optional opt -> opt.get()
  }

can be encoded as
  invokedynamic(opt) (Ljava/util/Optional;)Ljava/lang/String;
with the recipe:
  constant: method: Ljava/util/Optional;.empty()Ljava/util/Optional;
  type: Optional.class

and

  match(i) {
    case A(val) -> val
    case B(s, _) -> s
  }

can be encoded as
  invokedynamic(i) (LI;)Ljava/lang/Object;
with the recipe:
  structural: LA;(?)
  structural: LB;(?_)
(where ?
means capture and _ forget)

At runtime, in the bootstrap method, the type of each constant, the type of each argument, the type of each type test and the type of each structural matching are checked to verify that they are subtypes of the first parameter of invokedynamic; otherwise an Error is raised.

The tests against constants and the tests against arguments are created eagerly, as guardWithTest.
The tests on types are created lazily, when a new class is seen: if more than one pattern matches, an error is reported; otherwise a guardWithTest is created, and if there are too many guardWithTests (12 currently), a ClassValue is used instead.

GWTs are inserted one after the other. A first prototype tried to profile the branches taken in order to re-balance the tree if necessary, but sometimes C1 was able to compile the code doing the profiling before the code doing the re-balancing was called, leading to a stupid code invalidation. A specific MethodHandle could do a better job here!

[1] https://github.com/forax/design-pattern-reloaded/blob/master/src/main/java/visitor/visitor6.java#L28
[2] https://github.com/dotnet/roslyn/blob/features/records/docs/features/records.md

From john.r.rose at oracle.com  Sat Mar 18 18:50:08 2017
From: john.r.rose at oracle.com (John Rose)
Date: Sat, 18 Mar 2017 11:50:08 -0700
Subject: Pattern Matching
In-Reply-To: <1397250216.1674494.1489857722330.JavaMail.zimbra@u-pem.fr>
References: <1397250216.1674494.1489857722330.JavaMail.zimbra@u-pem.fr>
Message-ID: <60D628EB-CC65-4F11-89FB-AD6AFBC02241@oracle.com>

On Mar 18, 2017, at 10:22 AM, Remi Forax wrote:
>
> Hi guys,
> I've already implemented a kind of pattern matching in a language that runs on the JVM (still closed source :( ),
> so I told Brian that I would send a summary of how it is done.

Thanks, Remi. This is useful. At many points it parallels thinking that we have been doing.

I'm glad to see you use indy.
As with the string-concat work, there is an opportunity to perform run-time optimizations that way, which an eager bytecode spinning approach can't touch.

For example, the static compiler can inspect type relations among different case branches, and it can make requirements such as no dead cases, but it's only at runtime that the final type relations are available. And even if the static compiler ignores some type relations, the BSM for the switch can use such information to flatten the decision tree semi-statically.

(My thought ATM is to forbid dead code at compile time but allow it at runtime if the type relations have changed. I think that's in keeping with the way we manage other binary compatibility questions. The runtime optimization can ignore the static presuppositions. This means that the semantics of switch need to be "as if" the switch is "really just" a linear decision chain. We save performance by using a declarative formulation to the BSM which allows the BSM code generator to reorganize the chain as a true switch, or shallow cascade of switches. BTW, we forgot the switch combinator. Maybe we can fix this in the next release? More indy leftovers! It should be used to compile switching on strings and enums, as well as any future pattern matches on constants. It should be optionally lazy and open, which is what I think you are also calling for at the end of your message.)

The theory of type safety of multiple dispatch has moved forward with Fortress. Alternatively, if you can sugar up a visitor deployment as if it were multimethods added as extensions, you could prove type-safety and still define apparent methods, outside the type capsule. When we get value types, visitors will become cheaper. Maybe at that point we can think about doing multi-method sugar that compiles to invisible visitor classes. (I'm not suggesting we do this now!)

Maybe one of our lambda leftovers will help with the problem of flow-typing.

  switch (val) { case A val: ...
  } // relax shadowing rule for some bindings??

It's a fraught question, because the anti-shadowing rule prevents confusions which a relaxation re-introduces, if users misuse the freedom of expression. The goal would be to guide users to shadow variables only when the shadowing binding is somehow a logical repeat of the original binding. This cannot be done automatically in all cases we care about. Tricky tradeoffs.

We are also struggling with unapply. There are several hard parts, including the encapsulation model and the API for delivering multiple components. The low-level parts probably require more indy-level combinators with special JVM optimizations. I wish we had full value types today so we could just do tuples for multiple-value packages. But even that is a simulation of the true shape of the problem, and simulation has overheads even if we stay out of the heap. A lower-level solution which requires no companion types at all (not even tuples) would be to reify argument lists per se, at the same level of abstraction as method handle types. That's a building block I am currently experimenting with, as a value-based class which can be upgraded to a value type. The type information is encoded as (wait for it) a MethodType, interpreted with the arrows reversed.

Thanks for the brain dump!

--
John

From forax at univ-mlv.fr  Sat Mar 18 22:32:04 2017
From: forax at univ-mlv.fr (forax at univ-mlv.fr)
Date: Sat, 18 Mar 2017 23:32:04 +0100 (CET)
Subject: Pattern Matching
In-Reply-To: <60D628EB-CC65-4F11-89FB-AD6AFBC02241@oracle.com>
References: <1397250216.1674494.1489857722330.JavaMail.zimbra@u-pem.fr> <60D628EB-CC65-4F11-89FB-AD6AFBC02241@oracle.com>
Message-ID: <935045536.1688361.1489876324086.JavaMail.zimbra@u-pem.fr>

----- Original Message -----
> From: "John Rose"
> To: "Rémi Forax"
> Cc: "amber-spec-experts"
> Sent: Saturday, March 18, 2017 19:50:08
> Subject: Re: Pattern Matching

> On Mar 18, 2017, at 10:22 AM, Remi Forax wrote:
>>
>> Hi guys,
>> I've already implemented a kind of pattern matching in a language that runs on
>> the JVM (still closed source :( ),
>> so I told Brian that I would send a summary of how it is done.
>
> Thanks, Remi. This is useful. At many points it parallels thinking that we have been doing.
>
> I'm glad to see you use indy. As with the string-concat work, there is an opportunity to perform
> run-time optimizations that way, which an eager bytecode spinning approach can't touch.

yes!

> For example, the static compiler can inspect type relations among different case branches,
> and it can make requirements such as no dead cases, but it's only at runtime
> that the final type relations are available. And even if the static compiler ignores some type
> relations, the BSM for the switch can use such information to flatten the decision tree
> semi-statically.

yes

>
> (My thought ATM is to forbid dead code at compile time but allow it at runtime if
> the type relations have changed. I think that's in keeping with the way we manage
> other binary compatibility questions.

Yes, we have decided to throw an error at runtime in case of dead code, mostly to report a compilation bug, because we do not have a real IDE/incremental compilation.

> The runtime optimization can ignore the
> static presuppositions.
> This means that the semantics of switch need to be
> "as if" the switch is "really just" a linear decision chain. We save performance
> by using a declarative formulation to the BSM which allows the BSM code
> generator to reorganize the chain as a true switch, or shallow cascade of
> switches. BTW, we forgot the switch combinator. Maybe we can fix this in
> the next release? More indy leftovers! It should be used to compile switching
> on strings and enums, as well as any future pattern matches on constants.
> It should be optionally lazy and open, which is what I think you are also
> calling for at the end of your message.)

Yes, switching on enums should be done using indy too; currently, swapping two constants is not backward compatible because there may be a switch somewhere.
And +1 for adding the switch combinator in 10.
I would really like to be able to tell the JIT that the tests have no side effects (maybe that can be another combinator too), even if some parts of the tests are not inlinable.

>
> The theory of type safety of multiple dispatch has moved forward with Fortress.
> Alternatively, if you can sugar up a visitor deployment as if it were multimethods
> added as extensions, you could prove type-safety and still define apparent methods,
> outside the type capsule. When we get value types, visitors will become cheaper.
> Maybe at that point we can think about doing multi-method sugar that compiles
> to invisible visitor classes. (I'm not suggesting we do this now!)

The main issue with the visitor is that you have to add the accept methods to the classes of the hierarchy before being able to use the visitor; this is equivalent to being able to insert a method or a constant into a class, which is also equivalent to being able to inject the implementation of a not-yet-existing interface.

>
> Maybe one of our lambda leftovers will help with the problem of flow-typing.
> switch (val) { case A val: ... } // relax shadowing rule for some bindings??
>
> It's a fraught question, because the anti-shadowing rule prevents confusions
> which a relaxation re-introduces, if users misuse the freedom of expression.
> The goal would be to guide users to shadow variables only when the shadowing
> binding is somehow a logical repeat of the original binding. This cannot be
> done automatically in all cases we care about. Tricky tradeoffs.

For lambdas, a lambda is an anonymous function, and even in JavaScript, opening a function opens a new scope.
If the switch has a syntax close to the lambda one (and less close to C), it may not be a big deal.

  switch(val) {
    A val -> ...
  }

but given that usually you have more than one case, it will rapidly become unreadable

  switch(val) {
    A val -> foo(val)
    B val -> foo(val)
    ...
  }

>
> We are also struggling with unapply. There are several hard parts, including
> the encapsulation model and the API for delivering multiple components.
> The low-level parts probably require more indy-level combinators with
> special JVM optimizations. I wish we had full value types today so we
> could just do tuples for multiple-value packages. But even that is a
> simulation of the true shape of the problem, and simulation has overheads
> even if we stay out of the heap. A lower-level solution which requires
> no companion types at all (not even tuples) would be to reify argument
> lists per se, at the same level of abstraction as method handle types.

Will it work if the pattern supports nested types?

  case A(B(x, y), _): ...

You will have to have a kind of growable argument list, very similar to the one you need to implement the method handle combinators.
Technically, for the MH combinators you also need to shrink the argument list, but you can cheat if you are able to extract a sub-part when doing the calls to the direct method handle.
Jerome has used that trick when implementing invokedynamic on Android.
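For the nested case, the fan-out from one matched object to the bound variables can already be approximated with existing java.lang.invoke combinators. This is a minimal sketch, assuming the type tests for A and B have already succeeded and that the components are plain fields; the classes A and B and the method action are hypothetical illustrations, not part of any real recipe. The single incoming A is duplicated with permuteArguments, and each copy is narrowed by a getter chain, so the action receives only x and y.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

final class NestedExtract {
    // Hypothetical data classes for the pattern  case A(B(x, y), _)
    static final class B {
        final int x;
        final String y;
        B(int x, String y) { this.x = x; this.y = y; }
    }
    static final class A {
        final B b;
        final Object other; // the component matched by '_', never read
        A(B b, Object other) { this.b = b; this.other = other; }
    }

    // Hypothetical case action: takes only the bound variables x and y.
    static String action(int x, String y) { return y + x; }

    // Builds a (A)String handle; a real implementation would build it once (e.g. in a bootstrap method).
    static MethodHandle buildCase() throws ReflectiveOperationException {
        MethodHandles.Lookup lookup = MethodHandles.lookup();
        MethodHandle getB = lookup.findGetter(A.class, "b", B.class);      // (A)B
        MethodHandle getX = lookup.findGetter(B.class, "x", int.class);    // (B)int
        MethodHandle getY = lookup.findGetter(B.class, "y", String.class); // (B)String
        MethodHandle act = lookup.findStatic(NestedExtract.class, "action",
                MethodType.methodType(String.class, int.class, String.class)); // (int,String)String

        // One extractor per bound variable, starting from the matched A.
        MethodHandle xOfA = MethodHandles.filterReturnValue(getB, getX); // (A)int
        MethodHandle yOfA = MethodHandles.filterReturnValue(getB, getY); // (A)String

        // (A, A)String: each argument slot is replaced by one extracted component.
        MethodHandle twoA = MethodHandles.filterArguments(act, 0, xOfA, yOfA);

        // (A)String: duplicate the single incoming A into both slots.
        return MethodHandles.permuteArguments(twoA,
                MethodType.methodType(String.class, A.class), 0, 0);
    }

    static String matchNested(A a) {
        try {
            return (String) buildCase().invokeExact(a);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }
}
```

Note that the getter for b runs once per bound variable here; that redundancy is the kind of cost a dedicated argument-list abstraction could avoid.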
> That's a building block I am currently experimenting with, as a value-based
> class which can be upgraded to a value type. The type information is
> encoded as (wait for it) a MethodType, interpreted with the arrows reversed.

The arrow-reversed MethodType is a way to represent a flattened tuple of types, mapped to a named tuple (the value type) with the same component types.
So in order to represent a tuple of multiple values, which are the structural values of a class, you need a reified tuple of types, and you use a MethodType as a hack.

Another way to see unapply is as a kind of 'maybe tailcall' in a continuation: if the pattern doesn't match, it tailcalls and checks the next pattern; otherwise, it drops the unnecessary parameters and calls the action.
In case of an embedded component, the action itself is to do a 'maybe tailcall' after the component values are on the stack, but if the pattern doesn't match, it now has to pop two stack frames.

  class A {
    int value;
    B b;

    // acts as a continuation; match has type (int, B)void, reject has type ()void
    void unapply(MethodHandle match, MethodHandle reject) throws Throwable {
      if (value == 0) {
        reject.invokeExact(); // tail call to the next pattern
        return;
      }
      match.invokeExact(value, b); // inserts the arguments into the previous call: calls unapply on b if nested, calls the action otherwise
    }
  }

>
> Thanks for the brain dump!
>
> -- John

Rémi

From john.r.rose at oracle.com  Sun Mar 19 00:05:20 2017
From: john.r.rose at oracle.com (John Rose)
Date: Sat, 18 Mar 2017 17:05:20 -0700
Subject: combinators for pattern matching
In-Reply-To: <935045536.1688361.1489876324086.JavaMail.zimbra@u-pem.fr>
References: <1397250216.1674494.1489857722330.JavaMail.zimbra@u-pem.fr> <60D628EB-CC65-4F11-89FB-AD6AFBC02241@oracle.com> <935045536.1688361.1489876324086.JavaMail.zimbra@u-pem.fr>
Message-ID:

On Mar 18, 2017, at 3:32 PM, forax at univ-mlv.fr wrote:
>
> switching on enums should be done using indy too; currently, swapping two constants is not backward compatible because there may be a switch somewhere.
Actually it's OK to permute enums because each switch uses a local mapping table. But that causes overheads (partly because arrays are not constant-foldable, another place we need frozen arrays). A switch combinator for enums would optimize the case where caller and callee agreed on enum order, while covering the corner cases with a @Stable indirection table. That's better in two ways than what we do now.

In general, the static compiler would "guess" at some sort of enumeration (or perfect hash function) of the inputs and trust the runtime to correct any flaws.

> and +1 for adding the switch combinator in 10.

This would be a perfect non-Oracle contribution from the mlvm crowd. Hint, hint. (Sadly, Michael Haupt is not available for the next round of combinators.) Let's follow up on mlvm-dev if we want to discuss that combinator apart from its use in Amber.

> I would really like to be able to tell the JIT that the tests have no side effects (maybe that can be another combinator too), even if some parts of the tests are not inlinable.

That's a tough one. It's not like Java to allow the user to assert something that cannot be proven by the JVM (usually by static types or runtime checks).

  /** Returns a method handle that operates exactly like target,
   *  but passes all side effect attempts to the indicated collector.
   */
  MethodHandle MHs.asPureMethod(MethodHandle target, SideEffectCollector effects);

That's a research project. I'd almost settle for a hardwired collector which just throws IllegalSideEffectException or some such. (This is a version of freezing for arrays or other objects, but for methods.) Since it's a research project, it's not something I'm going to say more about in this venue. We can talk on mlvm-dev about the details if you want.

>>
>> The theory of type safety of multiple dispatch has moved forward with Fortress.
>> Alternatively, if you can sugar up a visitor deployment as if it were multimethods
>> added as extensions, you could prove type-safety and still define apparent methods,
>> outside the type capsule. When we get value types, visitors will become cheaper.
>> Maybe at that point we can think about doing multi-method sugar that compiles
>> to invisible visitor classes. (I'm not suggesting we do this now!)
>
> The main issue with the visitor is that you have to add the accept methods to the classes of the hierarchy before being able to use the visitor; this is equivalent to being able to insert a method or a constant into a class, which is also equivalent to being able to inject the implementation of a not-yet-existing interface.

But you can usually write the accept methods once, universally, right? The multi-method part is often just the specific visit* methods of the visitor. The parts of the multi-methods which switch on the acceptor object types can be factored out just once.

Here's the pattern I'm thinking of in multimethods:

  multimethod WalkY.Result walkY(NodeTypeX node, WalkY walker);
  ==>
  value class WalkY$Walker implements NodeTypeTop.Visitor {
    ...
    visit(NodeTypeX node) { return walkY(node, this.walker); }
    ...
  }

The hooks for all the NodeTypeX things would go once into the node hierarchy, assuming such a hierarchy can be pre-configured in a uniform manner, while the various ad hoc methods would go into an ad hoc WalkY walker value class.

At this point I would need to study the "expr problem" literature to see how this ties in (I'm sure it's not a new idea), so I'll just say it seems useful as a FUTURE (not NOW) option for sugary multi-methods, and leave it there. No way are we doing multi-methods any time soon.

But: The combined pattern of visitor + factory is sometimes called a metamorphism. I think that what we are after, in the end, is a full set of hooks for building ad hoc metamorphisms, without having to add more logic to the participating data classes.
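For readers less familiar with this shape, here is a minimal, hypothetical Java sketch of the visitor side (all names are illustrative, not from the thread): the accept hook is written once per node class, and each ad hoc operation is a separate visitor class added from outside the hierarchy, roughly what multi-method sugar might compile down to. It is a plain visitor, not the value-class walker John describes.

```java
final class VisitorSketch {
    // Node hierarchy: the accept hook is written once per node type, universally.
    interface Node { <R> R accept(Visitor<R> v); }
    interface Visitor<R> {
        R visitNum(Num n);
        R visitAdd(Add a);
    }

    static final class Num implements Node {
        final int value;
        Num(int value) { this.value = value; }
        public <R> R accept(Visitor<R> v) { return v.visitNum(this); }
    }

    static final class Add implements Node {
        final Node left, right;
        Add(Node left, Node right) { this.left = left; this.right = right; }
        public <R> R accept(Visitor<R> v) { return v.visitAdd(this); }
    }

    // An ad hoc operation added later, without touching Num or Add.
    static final class Eval implements Visitor<Integer> {
        public Integer visitNum(Num n) { return n.value; }
        public Integer visitAdd(Add a) { return a.left.accept(this) + a.right.accept(this); }
    }

    // A second ad hoc operation, also defined outside the hierarchy.
    static final class Show implements Visitor<String> {
        public String visitNum(Num n) { return Integer.toString(n.value); }
        public String visitAdd(Add a) {
            return "(" + a.left.accept(this) + " + " + a.right.accept(this) + ")";
        }
    }
}
```

New operations cost nothing in the node classes, but adding a new node type still means changing the Visitor interface and every existing visitor, which is the closed-hierarchy trade-off discussed earlier in the thread.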
A matcher hook is really a co-constructor, which provides a mirror-image computation to the constructor. One takes some values and produces an object from the values, while the other takes an object and produces some values from the object. Both halves are important, and it is also important to make the two halves work together gracefully. IMO that's the deep reason why notations for patterns look like notations for object construction expressions. I am hoping we can exploit this mirror structure in Amber designs.

>
>>
>> The goal would be to guide users to shadow variables only when the shadowing
>> binding is somehow a logical repeat of the original binding. This cannot be
>> done automatically in all cases we care about. Tricky tradeoffs.
>
> For lambdas, a lambda is an anonymous function, and even in JavaScript, opening a function opens a new scope.
> If the switch has a syntax close to the lambda one (and less close to C), it may not be a big deal.
>
>   switch(val) {
>     A val -> ...
>   }
>
> but given that usually you have more than one case, it will rapidly become unreadable
>
>   switch(val) {
>     A val -> foo(val)
>     B val -> foo(val)
>     ...
>   }

Actually, I find that a reasonable compromise between pure flow typing and explicit rebinding. Yes, the names are stuttery, but there's a logical reason for it; we are explicitly instructing the compiler to rebind the name after each case label. Compare similar stuttering in "this.foo = foo" or "map.computeIfAbsent(k, k -> k.toString())".

>>
>> A lower-level solution which requires
>> no companion types at all (not even tuples) would be to reify argument
>> lists per se, at the same level of abstraction as method handle types.
>
> Will it work if the pattern supports nested types?
>   case A(B(x, y), _): ...

Yes. You need a composite MH which takes a possible A (with possible B) and returns an argument list of x,y. The type of the argument list is totally ad hoc. Consider:

  case A(B(x,_),C(y)): ...
In that case, the argument list of (x,y) might have types that never appear together, except at this one place in the source code.

>
> You will have to have a kind of growable argument list, very similar to the one you need to implement the method handle combinators.
> Technically, for the MH combinators you also need to shrink the argument list, but you can cheat if you are able to extract a sub-part when doing the calls to the direct method handle.

I think that means you need argument list utilities (are they combinators?) which perform for argument lists what dropArgs, insertArgs, collectArgs, spreadArgs, etc., do for MHs. Actually, the existing MH combinators will do all of this, given two more MH hooks which allow method handles to map between the two forms of argument lists: normal, and "collected into a single slug". With a little forcing, collectArguments and spreadArguments could be trained to look for the argument list type as well as array types, et voila.

> Jerome has used that trick when implementing invokedynamic on Android.

Ignoring arguments is a good trick. If you have an argument list pointer, you can ignore both a prefix and a suffix.

>
>> That's a building block I am currently experimenting with, as a value-based
>> class which can be upgraded to a value type. The type information is
>> encoded as (wait for it) a MethodType, interpreted with the arrows reversed.
>
> The arrow-reversed MethodType is a way to represent a flattened tuple of types, mapped to a named tuple (the value type) with the same component types.

Hold on, there are no names here, just positions. That's important! We call Math.max on (DD) not on (Dx;Dy;).

> So in order to represent a tuple of multiple values, which are the structural values of a class, you need a reified tuple of types, and you use a MethodType as a hack.

The MT just provides the dynamic type framework. (The return type is always void.class.)
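To make the "MethodType with the arrows reversed" idea concrete, here is a hypothetical Java sketch; it is not John's actual experiment, whose API is not shown in the thread. An immutable list of values carries a MethodType whose return type is void and whose parameter types describe the components purely by position (so (DD) would be methodType(void.class, double.class, double.class)), with weakly-typed access through the List API.

```java
import java.lang.invoke.MethodType;
import java.util.AbstractList;

// Hypothetical value-based "argument list": positional values whose shape
// is described by a MethodType with a void return type.
final class ArgList extends AbstractList<Object> {
    private final MethodType type;
    private final Object[] values;

    private ArgList(MethodType type, Object[] values) {
        this.type = type;
        this.values = values;
    }

    static ArgList of(MethodType type, Object... values) {
        if (type.returnType() != void.class) {
            throw new IllegalArgumentException("return type must be void");
        }
        if (values.length != type.parameterCount()) {
            throw new IllegalArgumentException("arity mismatch");
        }
        MethodType boxed = type.wrap(); // int -> Integer etc., to check boxed values
        for (int i = 0; i < values.length; i++) {
            Class<?> t = type.parameterType(i);
            boolean bad = values[i] == null
                    ? t.isPrimitive()                              // null only for reference slots
                    : !boxed.parameterType(i).isInstance(values[i]); // exact box, no widening
            if (bad) {
                throw new ClassCastException("component " + i + " is not a " + t.getName());
            }
        }
        return new ArgList(type, values.clone()); // defensive copy keeps it immutable
    }

    MethodType type() { return type; }

    @Override public Object get(int i) { return values[i]; } // weakly typed access
    @Override public int size() { return values.length; }
}
```

Strongly-typed, MH-based access would sit on top of this, for example by deriving accessor handles from the MethodType once it is known in advance.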
The List API provides weakly-typed access, and there also has to be a way to get strongly-typed MH-based access if you know the MT in advance.

> Another way to see unapply is as a kind of 'maybe tailcall' in a continuation,

Yes, but the JVM is very bad at tailcalls, so that's not practical. And in any case, even if tailcalls were great, you'd leave behind your local state, which will make compilers unhappy and lead to the boxing of frame-state (in closures or whatnot). It's the difference between internal and external iterators all over again. I think we need both options. And in this case I think we want to build the internal (CPS flavor) on top of the external (accessor flavor), and not vice versa.

The CPS flavor suffers from bad support for tailcall, loss of local state, and one more thing: It's a little too eager. It assumes that the client wants *all* of the component values from the match. It would be nice (though it's not required) if you could say "_" for a component pattern, and avoid the cost of reifying the component. Why is this important? Well, the component might actually cost something to reify; standard examples are defensive copying of array components, or creation of invariant-enforcing view objects. We can't handwave these away by saying that pattern-matchable objects will always be so simple that component extraction will always be cheap. (This is a limitation of the argument-list approach also, so we might want something more here.)

None of this bears on the surface syntax of the language, but it's important for some of us to think about, since classfiles containing compiled Amber code need to be compact and fully optimizable by the Java runtime. In particular, if we didn't have indy to perform boilerplate generation for us at runtime, our class files would become terribly bloated by the boilerplate features we are adding.
A one-line source file could expand by 100x if all the boilerplate had to be statically elaborated into the classfile. With indy-style bootstrapping we can avoid paying the cost for boilerplate functions until they are actually used.

-- John

From forax at univ-mlv.fr  Sun Mar 19 18:22:27 2017
From: forax at univ-mlv.fr (forax at univ-mlv.fr)
Date: Sun, 19 Mar 2017 19:22:27 +0100 (CET)
Subject: combinators for pattern matching
In-Reply-To:
References: <1397250216.1674494.1489857722330.JavaMail.zimbra@u-pem.fr> <60D628EB-CC65-4F11-89FB-AD6AFBC02241@oracle.com> <935045536.1688361.1489876324086.JavaMail.zimbra@u-pem.fr>
Message-ID: <2080140442.1828720.1489947747837.JavaMail.zimbra@u-pem.fr>

> From: "John Rose"
> To: "Rémi Forax"
> Cc: "amber-spec-experts"
> Sent: Sunday, March 19, 2017 01:05:20
> Subject: combinators for pattern matching

> On Mar 18, 2017, at 3:32 PM, forax at univ-mlv.fr wrote:
>> switching on enums should be done using indy too; currently, swapping two
>> constants is not backward compatible because there may be a switch somewhere.
> Actually it's OK to permute enums because each switch uses a local mapping
> table. But that causes overheads (partly because arrays are not constant-foldable,
> another place we need frozen arrays). A switch combinator for enums would
> optimize the case where caller and callee agreed on enum order, while covering
> the corner cases with a @Stable indirection table. That's better in two ways than
> what we do now.
> In general, the static compiler would "guess" at some sort of enumeration (or
> perfect hash function) of the inputs and trust the runtime to correct any flaws.

Right. It just reminded me of something, so before I forget: one reason why I came up with a string recipe and not something that uses an array of method handles is that you have to support the stupid corner case where there are more than 253 cases in a switch.
Given that I hit that peculiar case with a language used by no more than a hundred people, I suppose Java will have the same issue.

[... discussion moved to mlvm ...]

>>> The theory of type safety of multiple dispatch has moved forward with Fortress.
>>> Alternatively, if you can sugar up a visitor deployment as if it were multimethods
>>> added as extensions, you could prove type-safety and still define apparent methods,
>>> outside the type capsule. When we get value types, visitors will become cheaper.
>>> Maybe at that point we can think about doing multi-method sugar that compiles
>>> to invisible visitor classes. (I'm not suggesting we do this now!)

>> The main issue with the visitor is that you have to add the accept methods to the
>> classes of the hierarchy before being able to use the visitor; this is equivalent
>> to being able to insert a method or a constant into a class, which is also
>> equivalent to being able to inject the implementation of a not-yet-existing
>> interface.

> But you can usually write the accept methods once, universally, right?

If you are able to abstract over an unknown number of parameters, yes.

> The multi-method part is often just the specific visit* methods of the visitor.
> The parts of the multi-methods which switch on the acceptor object types
> can be factored out just once.
> Here's the pattern I'm thinking of in multimethods:
>   multimethod WalkY.Result walkY(NodeTypeX node, WalkY walker);
>   ==> value class WalkY$Walker implements NodeTypeTop.Visitor { ...
>         visit(NodeTypeX node) { return walkY(node, this.walker); } ... }
> The hooks for all the NodeTypeX things would go once into the node hierarchy,
> assuming such a hierarchy can be pre-configured in a uniform manner, while
> the various ad hoc methods would go into an ad hoc WalkY walker value class.
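The shape John describes can be sketched with an ordinary Java visitor. The accept hooks are written once in the node hierarchy, and each new operation lives in its own ad hoc "walker" class. This is a minimal illustration under assumptions. Node, Num, Add, Eval and Show are hypothetical names that do not come from the thread. The walkers are plain classes rather than value classes, and only single dispatch is shown.

```java
// Hypothetical node hierarchy: the accept() hooks are written once, here.
interface Node {
    <R> R accept(Visitor<R> v);
}

// One visit method per concrete node type.
interface Visitor<R> {
    R visitNum(Num n);
    R visitAdd(Add a);
}

final class Num implements Node {
    final int value;
    Num(int value) { this.value = value; }
    public <R> R accept(Visitor<R> v) { return v.visitNum(this); }
}

final class Add implements Node {
    final Node left, right;
    Add(Node left, Node right) { this.left = left; this.right = right; }
    public <R> R accept(Visitor<R> v) { return v.visitAdd(this); }
}

// Ad hoc "walkers": each new operation is a new class; Node is never edited again.
final class Eval implements Visitor<Integer> {
    public Integer visitNum(Num n) { return n.value; }
    public Integer visitAdd(Add a) { return a.left.accept(this) + a.right.accept(this); }
}

final class Show implements Visitor<String> {
    public String visitNum(Num n) { return Integer.toString(n.value); }
    public String visitAdd(Add a) {
        return "(" + a.left.accept(this) + " + " + a.right.accept(this) + ")";
    }
}

public class WalkerDemo {
    public static void main(String[] args) {
        Node expr = new Add(new Num(1), new Add(new Num(2), new Num(3)));
        System.out.println(expr.accept(new Eval()));  // 6
        System.out.println(expr.accept(new Show()));  // (1 + (2 + 3))
    }
}
```

Dispatching on more than one argument this way requires one nested accept per argument, which is the cost Rémi points out for multimethods.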
> At this point I would need to study the "expr problem" literature to see how this
> ties in (I'm sure it's not a new idea), so I'll just say it seems useful as a FUTURE
> (not NOW) option for sugary multi-methods, and leave it there. No way are we
> doing multi-methods any time soon.

Implementing multimethods with a visitor is a pain, because you have to dispatch on each parameter. In the middle of a dispatch, you need a way to extract an argument and call accept on it with a new extractor that will extract the next argument, so you need tuples and array-style access to tuples. And this is not the fastest way to implement multimethods. It's better to analyse the signatures of all implementations, because dispatching on the 3rd parameter and then checking the types of the other arguments (not unlike an unverified entry point) may be a lot faster than dispatching on the arguments in order.

> But: The combined pattern of visitor + factory is sometimes called a metamorphism.
> I think that what we are after, in the end, is a full set of hooks for building ad hoc
> metamorphisms, without having to add more logic to the participating data classes.
> A matcher hook is really a co-constructor, which provides a mirror-image computation
> to the constructor. One takes some values and produces an object from the values,
> while the other takes an object and produces some values from the object.

Yes. If you forget the '_', the matcher hook is the dual of a constructor: an extractor. It's even more true if you compare it with a vnew, which loads the arguments into the fields directly; a default extractor is something that takes the fields from an object and puts them on the stack. But for general objects, vnew is not customizable enough (no null check, no defensive copy, etc.). That's why we have constructors, and unapply should likewise let users specify which fields are extracted.
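The constructor/extractor duality can be sketched in plain Java. The extractor has to mirror the constructor's non-trivial work, including its defensive copy, and that copy is exactly the cost a '_' component would like to skip. This is a sketch under assumptions. Polyline and extract() are hypothetical names. Returning an Object[] is only a stand-in for a real multiple-value return, which Java does not have.

```java
import java.util.Arrays;
import java.util.Objects;

// Hypothetical class whose constructor does real work: a null check and a defensive copy.
final class Polyline {
    private final String name;
    private final int[] points;

    Polyline(String name, int[] points) {
        this.name = Objects.requireNonNull(name);  // validate on the way in
        this.points = points.clone();              // defensive copy on the way in
    }

    // Co-constructor: object -> component values, in constructor-parameter order.
    // Object[] is only a stand-in for a multiple-value return, which Java lacks.
    Object[] extract() {
        return new Object[] { name, points.clone() };  // defensive copy on the way out
    }
}

public class ExtractorDemo {
    public static void main(String[] args) {
        Polyline p = new Polyline("tri", new int[] {1, 2, 3});
        Object[] parts = p.extract();
        ((int[]) parts[1])[0] = 99;  // mutating the extracted array...
        // ...leaves p untouched, and the components rebuild an equivalent object.
        Polyline q = new Polyline((String) parts[0], (int[]) p.extract()[1]);
        System.out.println(Arrays.toString((int[]) q.extract()[1]));  // [1, 2, 3]
    }
}
```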
> Both halves are important, and it is also important to make the two halves
> work together gracefully. IMO that's the deep reason why notations for patterns
> look like notations for object construction expressions. I am hoping we can
> exploit this mirror structure in Amber designs.

Yes, I fully agree.

>>> The goal would be to guide users to shadow variables only when the shadowing
>>> binding is somehow a logical repeat of the original binding. This cannot be
>>> done automatically in all cases we care about. Tricky tradeoffs.

>> For lambdas, a lambda is an anonymous function, and even in JavaScript, opening
>> a function opens a new scope.
>> If the switch has a syntax close to the lambda one (and less close to C's), it
>> may not be a big deal.
>>   switch(val) {
>>     A val -> ...
>>   }
>> but given that you usually have more than one case, it will rapidly become
>> unreadable:
>>   switch(val) {
>>     A val -> foo(val)
>>     B val -> foo(val)
>>     ...
>>   }

> Actually, I find that a reasonable compromise between pure flow typing and explicit rebinding.
> Yes, the names are stuttery, but there's a logical reason for it; we are explicitly instructing the
> compiler to rebind the name after each case label. Compare similar stuttering in "this.foo = foo"
> or "map.computeIfAbsent(k, k -> k.toString())".

Maybe; let's see what the others in the EG think.

>>> A lower-level solution which requires
>>> no companion types at all (not even tuples) would be to reify argument
>>> lists per se, at the same level of abstraction as method handle types.

>> Will it work if the pattern supports nested types?
>>   case A(B(x, y), _): ...

> Yes. You need a composite MH which takes a possible A (with possible B)
> and returns an argument list of x,y. The type of the argument list is totally
> ad hoc. Consider:
>   case A(B(x,_),C(y)): ...
> In that case, the argument list of (x,y) might have types that never appear
> together, except at this one place in the source code.
Yes:
  A.unapply returns { k, l }
  B.unapply returns { m, n }
  C.unapply returns { o }
so
  a.unapply() -> { k, l }
  k.unapply() -> { m, n }
  l.unapply() -> { o }
and then box(m, o) -> { m, o }.

In terms of implementation, you can either have one argument list per call, or consider that what you want is just an object with stable fields that represents the whole computation. Here we need 5 slots, which are empty by default:
  z = { B k, C l, int m, int n, int o }
  a.unapply()   is equivalent to z.k = k; z.l = l;
  z.k.unapply() is equivalent to z.m = m; z.n = n;
  z.l.unapply() is equivalent to z.o = o;
Then you need to be able to call with all the z slots on the stack, and use permuteArguments or dropArguments to call the action. It's really as if you were inside a constructor of the z object, passing itself to all the unapply() calls so that each one initializes a part of it.

>> You will have to have a kind of growable argument list very similar to the one
>> you need to implement the method handle combinators.
>> Technically, for the MH combinators you also need to shrink the argument list,
>> but you can cheat if you are able to extract a sub-part when doing the calls to
>> the direct method handle.

> I think that means you need argument list utilities (are they combinators?)
> which perform for argument lists what dropArgs, insertArgs, collectArgs,
> spreadArgs, etc., do for MHs. Actually, the existing MH combinators
> will do all of this, given two more MH hooks which allow method handles
> to map between the two forms of argument lists: normal, and "collected
> into a single slug". With a little forcing, collectArguments and spreadArguments
> could be trained to look for the argument list type as well as array types,
> et voila.

>> Jerome has used that trick when implementing invokedynamic on Android.

> Ignoring arguments is a good trick. If you have an argument list pointer,
> you can ignore both a prefix and a suffix.

Or you copy them on top of the stack and then do the method call.
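Rémi's single-carrier idea for `case A(B(x, _), C(y))` can be made concrete with a minimal Java sketch, under stated assumptions. The classes Term, A, B, C and Carrier are hypothetical. Plain type tests and field reads stand in for the unapply() calls. A real translation would presumably generate this through indy rather than by hand. The carrier's slots (k is the B, l is the C) are filled step by step. The action is then invoked with all int slots on the stack, and `MethodHandles.dropArguments` discards the '_' component n.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

// Hypothetical data classes for the pattern  case A(B(x, _), C(y)).
interface Term {}

final class A implements Term {
    final Term left, right;
    A(Term left, Term right) { this.left = left; this.right = right; }
}

final class B implements Term {
    final int m, n;
    B(int m, int n) { this.m = m; this.n = n; }
}

final class C implements Term {
    final int o;
    C(int o) { this.o = o; }
}

// The carrier z = { B k, C l, int m, int n, int o }: one slot per value, empty by default.
final class Carrier {
    B k;
    C l;
    int m, n, o;
}

public class NestedMatch {
    // The action only needs (x, y) = (m, o).
    static int action(int x, int y) { return x * 10 + y; }

    // Action adapted to take all three int slots (m, n, o); n (the '_') is dropped.
    private static final MethodHandle ACTION;
    static {
        try {
            MethodHandle act = MethodHandles.lookup().findStatic(NestedMatch.class, "action",
                    MethodType.methodType(int.class, int.class, int.class));
            ACTION = MethodHandles.dropArguments(act, 1, int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Each step is a type test plus an "unapply" that fills part of the carrier.
    static boolean match(Term val, Carrier z) {
        if (!(val instanceof A)) return false;
        A a = (A) val;
        if (!(a.left instanceof B) || !(a.right instanceof C)) return false;
        z.k = (B) a.left;  z.l = (C) a.right;  // a.unapply()
        z.m = z.k.m;       z.n = z.k.n;        // z.k.unapply()
        z.o = z.l.o;                           // z.l.unapply()
        return true;
    }

    // Returns the action's result, or null if the pattern does not match.
    static Integer run(Term val) {
        Carrier z = new Carrier();
        if (!match(val, z)) return null;
        try {
            return (int) ACTION.invokeExact(z.m, z.n, z.o);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(run(new A(new B(1, 2), new C(3))));  // 13
        System.out.println(run(new A(new C(3), new C(4))));     // null
    }
}
```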
>>> That's a building block I am currently experimenting with, as a value-based
>>> class which can be upgraded to a value type. The type information is
>>> encoded as (wait for it) a MethodType, interpreted with the arrows reversed.

>> The arrow-reversed MethodType is a way to map a flattened tuple of types to
>> a named tuple (the value type) with the same component types.

> Hold on, there are no names here, just positions. That's important! We call
> Math.max on (DD) not on (Dx;Dy;).

For me, the name was the name of the value type to upgrade to.

>> So in order to represent a tuple of multiple values which are the structural
>> value of a class, you need a reified tuple of types, and you use a MethodType as
>> a hack.

> The MT just provides the dynamic type framework. (The return type is always void.class.)
> The List API provides weakly-typed access, and there also has to be a way to get
> strongly-typed MH-based access if you know the MT in advance.

For me (see above), everything is always strongly typed, but maybe uninitialized.

>> Another way to see unapply is to see it as a kind of 'maybe tailcall' in a
>> continuation,

> Yes, but the JVM is very bad at tailcalls, so that's not practical. And in any case,
> even if tailcalls were great, you'd leave behind your local state, which will make
> compilers unhappy and lead to the boxing of frame-state (in closures or whatnot).

About losing the local variables: first, you can avoid local variables (i.e., only allow effectively final ones, like with lambdas); then, you can always transform a switch that assigns mutable local variables into two switches, not unlike how a switch on strings is done:

  int foo;
  switch(val) {
    A a -> { foo = a.value; }
    B b -> { foo = b.s.length(); }
    ...
  }

can be translated into:

  int index = switch(val) {
    A a -> { return 0; }
    B b -> { return 1; }
  }
  int foo;
  switch(index) {
    case 0:
      A a = (A) val;
      foo = a.value;
      break;
    case 1:
      B b = (B) val;
      foo = b.s.length();
      break;
  }

But for the unapply/structural-matching switch, we should limit ourselves to actions that have the same restrictions as lambdas, i.e., an expression switch. For the plain switch on type, we can support the C switch, but even that I find evil. BTW, one of the most painful debugging sessions I ever had with Scala code was caused by an unapply() doing side effects.

> It's the difference between internal and external iterators all over again.
> I think we need both options. And in this case I think we want to build the
> internal (CPS flavor) on top of the external (accessor flavor), and not vice versa.
> The CPS flavor suffers from bad support for tailcalls, loss of local state, and one
> more thing: It's a little too eager. It assumes that the client wants *all* of the
> component values from the match. It would be nice (though it's not required)
> if you could say "_" for a component pattern, and avoid the cost of reifying
> the component. Why is this important? Well, the component might actually
> cost something to reify; standard examples are defensive copying of array
> components, or creation of invariant-enforcing view objects. We can't handwave
> these away by saying that pattern-matchable objects will always be so simple
> that component extraction will always be cheap. (This is a limitation of
> the argument-list approach also, so we might want something more here.)

If you drop the argument after some point, either the JIT will see the whole picture and remove this useless defensive copy, etc.,
or the JIT will think something escapes, and you will pay for it (not Mexico).
If you want a more fine-grained solution, it's like providing a lambda for each extracted field. In that case, instead of having an unapply function, I think it's better and simpler to record, in the code, an association between a field, its getter, and its position in the pattern.

> None of this bears on the surface syntax of the language, but it's
> important for some of us to think about since classfiles containing
> compiled Amber code need to be compact and fully optimizable
> by the Java runtime. In particular, if we didn't have indy to perform
> boilerplate generation for us at runtime, our class files would become
> terribly bloated by the boilerplate features we are adding. A one-line
> source file could expand by 100x if all the boilerplate had
> to be statically elaborated into the classfile. With indy-style
> bootstrapping we can avoid paying the cost for boilerplate
> functions until they are actually used.
> – John

Rémi

From maurizio.cimadamore at oracle.com Fri Mar 24 13:56:42 2017
From: maurizio.cimadamore at oracle.com (Maurizio Cimadamore)
Date: Fri, 24 Mar 2017 13:56:42 +0000
Subject: amber mailing lists traffic control
Message-ID:

Hi,
one further organizational announcement: as Amber is composed of several branches, I suggest that, when sending an email _to any of the amber lists_ to discuss a given topic (regardless of whether the email is related to the code or not), the branch name be used as a header for the email subject, using the following pattern:

[]:

As we have three branches now, I recommend the following headers:

* enhanced enums (JEP 301) -> [enhanced-enums]
* lambda leftovers (JEP 302) -> [lambda-leftovers]
* local variable type inference (JEP 286) -> [lvti]

Example:

[lvti]: problem when building repo

This should help in keeping things under control.
Of course it's not a hard rule, and should you feel that your email doesn't fit any of the specific sub-projects, feel free to omit the header.

One last note: currently the only way to try out the branches is to build them. We don't have binary snapshots for Amber, but we're working on a solution. Should this situation change, we will let you know.

Cheers
Maurizio

From andrey.breslav at jetbrains.com Mon Mar 27 11:04:36 2017
From: andrey.breslav at jetbrains.com (Andrey Breslav)
Date: Mon, 27 Mar 2017 11:04:36 +0000
Subject: Pattern Matching
In-Reply-To: <935045536.1688361.1489876324086.JavaMail.zimbra@u-pem.fr>
References: <1397250216.1674494.1489857722330.JavaMail.zimbra@u-pem.fr> <60D628EB-CC65-4F11-89FB-AD6AFBC02241@oracle.com> <935045536.1688361.1489876324086.JavaMail.zimbra@u-pem.fr>
Message-ID:

My two cents from the Kotlin camp:

1. We still don't have full structural matching in Kotlin, because smart casts + unconditional destructuring work well enough for most cases (unless you are a compiler person, but we realize that such people are a minority :)). Yes, there are use cases outside compilers, but my point is that smart casts hit the sweet spot.
Plus, we didn't come up with a syntax for structural matching that would be both clear (variable usage vs definition) and clean (inserting val's everywhere makes complex patterns rather scary-looking).
So, my take on this would be: better not do it for now, and wait until it gets wide adoption in C#, since we have the luxury of someone else being brave and taking the risks already.

2. We've been using per-component getter methods for destructuring, which is a huge performance win compared to Scala's unapply, which creates a box on every (successful) match. C++17 has gone roughly the same direction, but leveraged templates with values as arguments.
In our "open world" with static extension functions, having independent functions for every component raises some questions, but if Java requires members there, it should be simpler.

3. And to share a little bit about smart casts (I'm flattered that you are using our terminology here ;) ): as soon as you have smart casts, the urge to get intersection types into the language strengthens, because of cases like this:

if (x is A) {
    if (x is B) {
        // x is A&B here
    }
}

We still get away without making intersection types explicit, and will probably not add them in the very near future, but they become a lot less exotic with smart casts.

--
Andrey Breslav
Project Lead of Kotlin
JetBrains
http://kotlinlang.org/
The Drive to Develop

From maurizio.cimadamore at oracle.com Mon Mar 27 12:10:48 2017
From: maurizio.cimadamore at oracle.com (Maurizio Cimadamore)
Date: Mon, 27 Mar 2017 13:10:48 +0100
Subject: Pattern Matching
In-Reply-To:
References: <1397250216.1674494.1489857722330.JavaMail.zimbra@u-pem.fr> <60D628EB-CC65-4F11-89FB-AD6AFBC02241@oracle.com> <935045536.1688361.1489876324086.JavaMail.zimbra@u-pem.fr>
Message-ID: <6301f270-b247-d6b4-648b-b070b236e003@oracle.com>

Hi Andrey

On 27/03/17 12:04, Andrey Breslav wrote:
> 3. And to share a little bit about smart casts (I'm flattered that you
> are using our terminology here ;) ): as soon as you have smart casts,
> the urge to get intersection types into the language strengthens,
> because of cases like this:
>
> if (x is A) {
>     if (x is B) {
>         // x is A&B here
>     }
> }
>
> We still get away without making intersection types explicit, and will
> probably not add them in the very near future, but they become a
> lot less exotic with smart casts.

Interesting point: our current scoping rules would claim that it is an error to introduce a binding for 'x' with type T if 'x' is already in scope with a type S != T. Do you have any compelling use cases from Kotlin to go down the more precise (and liberal) intersection type path?
Cheers
Maurizio

From forax at univ-mlv.fr Mon Mar 27 13:26:46 2017
From: forax at univ-mlv.fr (forax at univ-mlv.fr)
Date: Mon, 27 Mar 2017 15:26:46 +0200 (CEST)
Subject: Pattern Matching
In-Reply-To:
References: <1397250216.1674494.1489857722330.JavaMail.zimbra@u-pem.fr> <60D628EB-CC65-4F11-89FB-AD6AFBC02241@oracle.com> <935045536.1688361.1489876324086.JavaMail.zimbra@u-pem.fr>
Message-ID: <409770778.306509.1490621206014.JavaMail.zimbra@u-pem.fr>

Hi Andrey,

> From: "Andrey Breslav"
> To: forax at univ-mlv.fr, "John Rose"
> Cc: "amber-spec-experts"
> Sent: Monday, March 27, 2017 13:04:36
> Subject: Re: Pattern Matching

> My two cents from the Kotlin camp:
> 1. We still don't have full structural matching in Kotlin, because smart casts +
> unconditional destructuring work well enough for most cases (unless you are a
> compiler person, but we realize that such people are a minority :)). Yes, there
> are use cases outside compilers, but my point is that smart casts hit the sweet
> spot.
> Plus, we didn't come up with a syntax for structural matching that would be both
> clear (variable usage vs definition) and clean (inserting val's everywhere
> makes complex patterns rather scary-looking).
> So, my take on this would be: better not do it for now, and wait until it gets
> wide adoption in C#, since we have the luxury of someone else being brave and
> taking the risks already.

Yes, one way to do that is to do simple pattern matching for 10 and add structural matching for 11 (see below).

> 2. We've been using per-component getter methods for destructuring, which is a
> huge performance win compared to Scala's unapply that creates a box on every
> (successful) match.

It's worse than that if you have more than one component: the return type of the extractor (the de-constructor) is an Option of a tuple, so you have two boxes :(

> C++17 has gone roughly the same direction, but leveraged the templates with
> values as arguments.
> In our "open world" with static extension functions, having independent functions
> for every component raises some questions, but if Java requires members there,
> it should be simpler.

Getters and unapply are two sides of the same coin. Unapply lets you share code when processing the extracted values. Getters let you avoid computing extracted values you do not need (if there are _ in the pattern). You can try to unify them with an unapply that takes a representation of the variables you want, something like unapply(["x", "z"]). But coming up with the right signature for such an unapply is hard, at least until we have generics over primitives. As you said, a simpler solution is just to use getters; Java dukes are familiar with them, after all.

> 3. And to share a little bit about smart casts (I'm flattered that you are using
> our terminology here ;) ): as soon as you have smart casts, the urge to get
> intersection types into the language strengthens, because of cases like this:
> if (x is A) {
>   if (x is B) {
>     // x is A&B here
>   }
> }
> We still get away without making intersection types explicit, and will probably
> not add them in the very near future, but they become a lot less exotic with
> smart casts.

Given the way pattern matching is constructed, you cannot type the same variable more than once, so no intersection types can be created through pattern matching. As for adding 'is' to the language as an instanceof + smart cast, I think it's a separate issue, not directly related to pattern matching.
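The getters-versus-unapply trade-off can be sketched in plain Java. With per-component getters, a '_' component is simply never computed. A single unapply() reifies every component, including its defensive copy. The sketch rests on assumptions. Segment, component1()/component2() and unapply() are hypothetical names, loosely modeled on Kotlin's componentN() convention and Scala's unapply. The Object[] stands in for Scala's Option-of-tuple result, and a static counter makes the extra copy visible.

```java
// Hypothetical class with one cheap and one expensive component.
final class Segment {
    static int copies = 0;  // number of defensive copies made by component2()

    private final int id;
    private final int[] points;

    Segment(int id, int[] points) { this.id = id; this.points = points.clone(); }

    // Per-component accessors, in the spirit of Kotlin's componentN() functions.
    int component1() { return id; }
    int[] component2() { copies++; return points.clone(); }

    // Scala-style extractor: reifies every component at once.
    Object[] unapply() { return new Object[] { component1(), component2() }; }
}

public class GetterDemo {
    // "case Segment(id, _) -> id" compiled against the getters: '_' costs nothing.
    static int idViaGetters(Object o) {
        if (!(o instanceof Segment)) return -1;
        return ((Segment) o).component1();
    }

    // The same pattern compiled against unapply(): the '_' component is still copied.
    static int idViaUnapply(Object o) {
        if (!(o instanceof Segment)) return -1;
        Object[] parts = ((Segment) o).unapply();
        return (Integer) parts[0];
    }

    public static void main(String[] args) {
        Segment s = new Segment(7, new int[] {1, 2, 3});
        int a = idViaGetters(s);
        int b = idViaUnapply(s);
        System.out.println(a + " " + b + " copies=" + Segment.copies);  // 7 7 copies=1
    }
}
```

As John notes, a JIT may sometimes remove the unused copy after inlining. The getter form avoids it by construction rather than relying on escape analysis.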
regards, R?mi > On Sun, Mar 19, 2017 at 1:32 AM < forax at univ-mlv.fr > wrote: >> ----- Mail original ----- >> > De: "John Rose" < john.r.rose at oracle.com > >> > ?: "R?mi Forax" < forax at univ-mlv.fr > >> > Cc: "amber-spec-experts" < amber-spec-experts at openjdk.java.net > >> > Envoy?: Samedi 18 Mars 2017 19:50:08 >> > Objet: Re: Pattern Matching >> > On Mar 18, 2017, at 10:22 AM, Remi Forax < forax at univ-mlv.fr > wrote: >> >> Hi guys, >> >> i've already implemented a kind of pattern matching in a language that run on >> >> the JVM (still closed source :( ), >> >> so i've said to Brian that i will send a summary of how it is done. >>> Thanks, Remi. This is useful. At many points it parallels thinking that we have >> > been doing. >>> I'm glad to see you use indy. As with the string-concat work, there is an >> > opportunity to perform >>> run-time optimizations that way, which an eager bytecode spinning approach can't >> > touch. >> yes ! >>> For example, the static compiler can inspect type relations among different case >> > branches, >> > and it can make requirements such as no dead cases, but it's only at runtime >>> that the final type relations are available. And even if the static compiler >> > ignores some type >>> relations, the BSM for the switch can use such information to flatten the >> > decision tree >> > semi-statically. >> yes >>> (My thought ATM is to forbid dead code at compile time but allow it at runtime >> > if >>> the type relations have changed. I think that's in keeping with the way we >> > manage >> > other binary compatibility questions. >> yes, >> we have decided to throw an error at runtime in case of dead code mostly to >> report a compilation bug because we do not have a real IDE/incremental >> compilation. >> > The runtime optimization can ignore the >> > static presuppositions. This means that the semantics of switch need to be >> > "as if" the switch is "really just" a linear decision chain. 
We save performance >> > by using a declarative formulation to the BSM which allows the BSM code >> > generator to reorganize the chain as a true switch, or shallow cascade of >> > switches. BTW, we forgot the switch combinator. Maybe we can fix this in >> > the next release? More indy leftovers! It should be used to compile switching >> > on strings and enums, as well as any future pattern matches on constants. >> > It should be optionally lazy and open, which is what I think you are also >> > calling for at the end of your message.) >> yes, >> switching on enums should be done using the indy too, currently swapping two >> constants is not backward compatible because they is maybe a switch somewhere. >> and +1 for adding the switch combinator in 10. I would really like to be able to >> tell the JIT that the tests has no side effect (maybe it can be another >> combinator too) even if some part of the test are not inlinable. >> > The theory of type safety of multiple dispatch has moved forward with Fortress. >>> Alternatively, if you can sugar up a visitor deployment as if it were >> > multimethods >>> added as extensions, you could prove type-safety and still define apparent >> > methods, >> > outside the type capsule. When we get value types visitors will become cheaper. >> > Maybe at that point we can think about doing multi-method sugar that compiles >> > to invisible visitor classes. (I'm not suggesting we do this now!) >> The main issue of the visitor is that you have to add the accept methods on the >> class of the hierarchy before being able to use the visitor, this is equivalent >> to be able to insert a method or a constant into a class, which is also >> equivalent to be able to inject the implementation of a not yet existing >> interface. >> > Maybe one of our lambda leftovers will help with the problem of flow-typing. >> > switch (val) { case A val: ? } // relax shadowing rule for some bindings?? 
>> > It's a fraught question, because the anti-shadowing rule prevents confusions >> > which a relaxation re-introduces, if users misuse the freedom of expression. >> > The goal would be to guide users to shadow variables only when the shadowing >> > binding is somehow a logical repeat of the original binding. This cannot be >> > done automatically in all cases we care about. Tricky tradeoffs. >> For lambdas, a lambda is an anonymous function, and even in JavaScript, opening >> a function open a new scope. >> If the switch as a syntax close to the lambda one (and less close to the C) it >> may not be a big deal. >> switch(val) { >> A val -> ... >> } >> but given that usually you have more that one case, it will rapidly become >> unreadable >> switch(val) { >> A val -> foo(val) >> B val -> foo(val) >> ... >> } >> > We are also struggling with unapply. There are several hard parts, including >> > the encapsulation model and the API for delivering multiple components. >> > The low-level parts probably require more indy-level combinators with >> > special JVM optimizations. I wish we had full value types today so we >> > could just do tuples for multiple-value packages. But even that is a >> > simulation of the true shape of the problem, and simulation has overheads >> > even if we stay out of the heap. A lower-level solution which requires >> > no companion types at all (not even tuples) would be to reify argument >> > lists per se, at the same level of abstraction as method handle types. >> Will it work if the pattern support nested types ? >> case A(B(x y), _): ... >> You will have to have a kind of growable arguments list very similar to the one >> you need to implement the method handle combinators. >> Technically, for the mh combinators you also need to shrink the arguments list >> but you can cheat if you are able to extract a sub part when doing the calls to >> the direct method handle. >> Jerome has used that trick when implementing invokedynamic on android. 
>> > That's a building block I am currently experimenting with, as a value-based >> > class which can be upgraded to a value type. The type information is >> > encoded as (wait for it) a MethodType, interpreted with the arrows reversed. >> The arrow-reversed MethodType is a way to map a flattened tuple of types onto >> a named tuple (the value type) with the same component types. >> So in order to represent a tuple of multiple values which are the structural >> value of a class, you need a reified tuple of types, and you use a MethodType as >> a hack. >> Another way to see unapply is as a kind of 'maybe tailcall' in a >> continuation: >> if the pattern doesn't match, it tailcalls and checks the next pattern; >> otherwise, it drops the unnecessary parameters and calls the action. >> In case of an embedded component, the action itself is to do a 'maybe tailcall' >> after the component values are on the stack, but if the pattern doesn't match it now >> has to pop two stack frames. >> class A { >> int value; >> B b; >> void unapply(MethodHandle[int, B] match, MethodHandle reject) { // act as a >> continuation >> if (value == 0) { >> reject.invokeExact(); // tail call to the next pattern >> } >> match.invokeExact(value, b); // insert the argument into the previous call, >> calls unapply on b if nested, calls the action otherwise >> } >> } >> > Thanks for the brain dump! >> > John >> Rémi > -- > Andrey Breslav > Project Lead of Kotlin > JetBrains > http://kotlinlang.org/ > The Drive to Develop
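Rémi's continuation-style `unapply` above uses pseudo-syntax (`MethodHandle[int, B]`). A minimal runnable sketch, assuming a hypothetical `A`/`B` pair and plain `MethodHandle`s, could look like the following. The `MethodType` `(int,B)void` plays the role of the reified "tuple of types" read with the arrow reversed. Plain Java has no tail calls, so an explicit `return` after `reject` stands in for one. Nested patterns are not handled.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

// Hypothetical sketch of unapply as a continuation: the object either calls
// 'reject' (no match) or calls 'match' with its components as arguments.
class UnapplyContinuation {
    static final class B {
        final String s;
        B(String s) { this.s = s; }
    }

    static final class A {
        final int value;
        final B b;
        A(int value, B b) { this.value = value; this.b = b; }

        // The component types (int, B) as a MethodType with a void return:
        // a reified tuple of types, read "with the arrow reversed".
        static final MethodType COMPONENTS = MethodType.methodType(void.class, int.class, B.class);

        // 'match' must have type (int,B)void and 'reject' type ()void.
        void unapply(MethodHandle match, MethodHandle reject) throws Throwable {
            if (value == 0) {
                reject.invokeExact();    // conceptually a tail call to the next pattern
                return;                  // explicit return, since Java has no tail calls
            }
            match.invokeExact(value, b); // push the components and call the action
        }
    }

    // Actions; the StringBuilder is bound first so the remaining type is COMPONENTS.
    static void onMatch(StringBuilder out, int value, B b) {
        out.append("matched ").append(value).append(' ').append(b.s);
    }

    static void onReject(StringBuilder out) {
        out.append("rejected");
    }

    static String run(A a) {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            StringBuilder out = new StringBuilder();
            MethodHandle match = lookup.findStatic(UnapplyContinuation.class, "onMatch",
                    A.COMPONENTS.insertParameterTypes(0, StringBuilder.class)).bindTo(out);
            MethodHandle reject = lookup.findStatic(UnapplyContinuation.class, "onReject",
                    MethodType.methodType(void.class, StringBuilder.class)).bindTo(out);
            a.unapply(match, reject);
            return out.toString();
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }
}
```

For a nested pattern such as `case A(B(x, y), _)`, the `match` handle would itself call `unapply` on the `B` component, which is where the growable argument list and the "pop two stack frames" problem described above come in.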
From andrey.breslav at jetbrains.com Mon Mar 27 15:06:47 2017 From: andrey.breslav at jetbrains.com (Andrey Breslav) Date: Mon, 27 Mar 2017 15:06:47 +0000 Subject: Pattern Matching In-Reply-To: <409770778.306509.1490621206014.JavaMail.zimbra@u-pem.fr> References: <1397250216.1674494.1489857722330.JavaMail.zimbra@u-pem.fr> <60D628EB-CC65-4F11-89FB-AD6AFBC02241@oracle.com> <935045536.1688361.1489876324086.JavaMail.zimbra@u-pem.fr> <409770778.306509.1490621206014.JavaMail.zimbra@u-pem.fr> Message-ID: To clarify, in Kotlin, "is" is the analog of "instanceof" that has smart casts as a "side effect", so my note above may not really apply to the Java case if we consider smart casts for patterns in the context of switch/case only. > Do you have any compelling use cases from Kotlin to go down the more precise (and liberal) intersection type path? In the Kotlin case, where we are basically talking about ifs with instanceof/comparison conditions, this happens a lot in all kinds of use cases: - I can check for (a != null) and then somewhere inside check for (a is MyClass). This does not directly apply to Java (unless we envision introduction of nullable types in the future). - In a mixed hierarchy, it may often make sense to first check for (a is CharSequence) and later for (a is List), having methods of both CharSequence and List on a. I can look for exact examples, if they are of interest to you. I'd like to note that such requests don't tend to cross the user's mind unless smart casts are introduced in the ubiquitous way we have them in Kotlin. So, what I was saying is "smart casts in arbitrary conditions => intersection types become appealing". On Mon, Mar 27, 2017 at 4:26 PM wrote: > Hi Andrey, > > ------------------------------ > > *From: *"Andrey Breslav" > *To: *forax at univ-mlv.fr, "John Rose" > *Cc: *"amber-spec-experts" > *Sent: *Monday, 27 March 2017 13:04:36 > *Subject: *Re: Pattern Matching > > My two cents from the Kotlin camp: > > 1.
We still don't have full structural matching in Kotlin, because smart > casts + unconditional destructuring work well enough for most cases (unless > you are a compiler person, but we realize that such people are a minority > :)). Yes, there are use cases outside compilers, but my point is that smart > casts hit the sweet spot. > Plus, we didn't come up with a syntax for structural matching that would > be both clear (variable usage vs definition) and clean (inserting val's > everywhere makes complex patterns rather scary-looking). > So, my take on this would be: better not do it for now, and wait until it > gets wide adoption in C#, since we have the luxury of someone else being > brave and taking the risks already. > > > yes, one way to do that is to do simple pattern matching for 10 and add > structural matching for 11 (see below). > > > 2. We've been using per-component getter methods for destructuring, which > is a huge performance win compared to Scala's unapply that creates a box on > every (successful) match. > > > it's worse than that if you have more than one component: the return type > of the extractor (the de-constructor) is an Option of a tuple, so you have > two boxes :( > > C++17 has gone in roughly the same direction, but leveraged the templates > with values as arguments. In our "open world" with static extension > functions, having independent functions for every component raises some > questions, but if Java requires members there, it should be simpler. > > > getters and unapply are two sides of the same coin: > unapply allows sharing code when computing the extracted values; getters > allow avoiding the computation of extracted values you do not need (if > there are _ in the pattern). > You can try to unify them by having an unapply that takes a representation > of the variables you want, something like unapply(["x", "z"]), but coming > up with the right signature for unapply is hard, at least until we have > generics over primitives.
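The allocation difference between the two extraction styles discussed above can be sketched in Java. The `Point` class and both helpers are hypothetical: `unapply` mimics the Scala shape (an `Optional` wrapping a tuple, approximated here by an `Object[]` of boxed components), while the getter path reads only the component the pattern actually needs.

```java
import java.util.Optional;

// Hypothetical comparison of two ways to support a pattern like "case Point(int x, _)".
class GettersVsUnapply {
    static final class Point {
        private final int x;
        private final int y;
        Point(int x, int y) { this.x = x; this.y = y; }

        // Per-component getters: no allocation, and a component matched by '_'
        // is simply never read.
        int x() { return x; }
        int y() { return y; }

        // Scala-style unapply transliterated to Java: box #1 is the Optional,
        // box #2 is the tuple (an Object[] holding boxed Integers).
        static Optional<Object[]> unapply(Object o) {
            if (!(o instanceof Point)) {
                return Optional.empty();
            }
            Point p = (Point) o;
            return Optional.of(new Object[] { p.x, p.y });
        }
    }

    // "case Point(int x, _)" via getters; -1 stands for "no match".
    static int firstViaGetters(Object o) {
        return (o instanceof Point) ? ((Point) o).x() : -1;
    }

    // The same match via unapply: both components are computed and boxed.
    static int firstViaUnapply(Object o) {
        return Point.unapply(o).map(t -> (Integer) t[0]).orElse(-1);
    }
}
```

Neither shape wins everywhere: as noted in the thread, an unapply can share work between components, while getters skip work for ignored components but may repeat shared computation.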
> > As you said, a simpler solution is just to use getters; Java dukes are > familiar with them after all. > > > 3. And to share a little bit about smart casts (I'm flattered that you are > using our terminology here ;) ): as soon as you have smart casts, the urge > to get intersection types into the language strengthens, because of cases > like this: > > if (x is A) { > if (x is B) { > // x is A&B here > } > } > > We still get away without making intersection types explicit, and will > probably not add them in the very near future, but they become a lot > less exotic with smart casts. > > > Given the way pattern matching is constructed, you cannot > type the same variable more than once, so no intersection types can be > created by pattern matching. > About adding 'is' to the language as an instanceof + smart cast, I think > it's another issue that is not directly related to pattern matching. > > regards, > Rémi > > > On Sun, Mar 19, 2017 at 1:32 AM wrote: > > ----- Original Message ----- > > From: "John Rose" > > To: "Rémi Forax" > > Cc: "amber-spec-experts" > > Sent: Saturday, 18 March 2017 19:50:08 > > Subject: Re: Pattern Matching > > > On Mar 18, 2017, at 10:22 AM, Remi Forax wrote: > >> > >> Hi guys, > >> i've already implemented a kind of pattern matching in a language that > runs on > >> the JVM (still closed source :( ), > >> so i've said to Brian that i will send a summary of how it is done. > > > > Thanks, Remi. This is useful. At many points it parallels thinking that > we have been doing. > > > > I'm glad to see you use indy. As with the string-concat work, there is > an opportunity to perform > > run-time optimizations that way, which an eager bytecode spinning > approach can't touch. > > yes! > > > For example, the static compiler can inspect type relations among > different case branches, > > and it can make requirements such as no dead cases, but it's only at > runtime > > that the final type relations are available.
And even if the static > compiler ignores some type > > relations, the BSM for the switch can use such information to flatten > the decision tree > > semi-statically. > > yes > > > > > (My thought ATM is to forbid dead code at compile time but allow it at > runtime if > > the type relations have changed. I think that's in keeping with the way > we manage > > other binary compatibility questions. > > yes, > we have decided to throw an error at runtime in case of dead code, mostly > to report compilation bugs, because we do not have a real IDE/incremental > compilation. > > > The runtime optimization can ignore the > > static presuppositions. This means that the semantics of switch need to > be > > "as if" the switch is "really just" a linear decision chain. We save > performance > > by using a declarative formulation to the BSM which allows the BSM code > > generator to reorganize the chain as a true switch, or shallow cascade of > > switches. BTW, we forgot the switch combinator. Maybe we can fix this > in > > the next release? More indy leftovers! It should be used to compile > switching > > on strings and enums, as well as any future pattern matches on constants. > > It should be optionally lazy and open, which is what I think you are also > > calling for at the end of your message.) > > yes, > switching on enums should be done using indy too; currently, swapping > two constants is not backward compatible because there may be a switch > somewhere. > and +1 for adding the switch combinator in 10. I would really like to be > able to tell the JIT that the tests have no side effects (maybe that can be > another combinator too) even if some parts of the tests are not inlinable. > > > > > The theory of type safety of multiple dispatch has moved forward with > Fortress.
> > Alternatively, if you can sugar up a visitor deployment as if it were > multimethods > > added as extensions, you could prove type-safety and still define > apparent methods, > > outside the type capsule. When we get value types visitors will become > cheaper. > > Maybe at that point we can think about doing multi-method sugar that > compiles > > to invisible visitor classes. (I'm not suggesting we do this now!) > > The main issue of the visitor is that you have to add the accept methods > to the classes of the hierarchy before being able to use the visitor; this is > equivalent to being able to insert a method or a constant into a class, which > is also equivalent to being able to inject the implementation of a not yet > existing interface. > > > > > Maybe one of our lambda leftovers will help with the problem of > flow-typing. > > switch (val) { case A val: ... } // relax shadowing rule for some > bindings?? > > It's a fraught question, because the anti-shadowing rule prevents > confusions > > which a relaxation re-introduces, if users misuse the freedom of > expression. > > The goal would be to guide users to shadow variables only when the > shadowing > > binding is somehow a logical repeat of the original binding. This > cannot be > > done automatically in all cases we care about. Tricky tradeoffs. > > For lambdas, a lambda is an anonymous function, and even in JavaScript, > opening a function opens a new scope. > If the switch has a syntax close to the lambda one (and less close to > C's), it may not be a big deal. > > switch(val) { > A val -> ... > } > > but given that usually you have more than one case, it will rapidly become > unreadable: > > switch(val) { > A val -> foo(val) > B val -> foo(val) > ... > } > > > > > > We are also struggling with unapply. There are several hard parts, > including > > the encapsulation model and the API for delivering multiple components.
> > The low-level parts probably require more indy-level combinators with > > special JVM optimizations. I wish we had full value types today so we > > could just do tuples for multiple-value packages. But even that is a > > simulation of the true shape of the problem, and simulation has overheads > > even if we stay out of the heap. A lower-level solution which requires > > no companion types at all (not even tuples) would be to reify argument > > lists per se, at the same level of abstraction as method handle types. > > Will it work if patterns support nested types? > case A(B(x, y), _): ... > > You will need a kind of growable argument list very similar to > the one you need to implement the method handle combinators. > Technically, for the mh combinators you also need to shrink the argument > list, but you can cheat if you are able to extract a subpart when doing the > calls to the direct method handle. > Jerome used that trick when implementing invokedynamic on Android. > > > That's a building block I am currently experimenting with, as a > value-based > > class which can be upgraded to a value type. The type information is > > encoded as (wait for it) a MethodType, interpreted with the arrows > reversed. > > The arrow-reversed MethodType is a way to map a flattened tuple of > types onto a named tuple (the value type) with the same component types. > > So in order to represent a tuple of multiple values which are the > structural value of a class, you need a reified tuple of types, and you use > a MethodType as a hack. > > Another way to see unapply is as a kind of 'maybe tailcall' in a > continuation: > if the pattern doesn't match, it tailcalls and checks the next pattern; > otherwise, it drops the unnecessary parameters and calls the action. > In case of an embedded component, the action itself is to do a 'maybe > tailcall' after the component values are on the stack, but if the pattern > doesn't match it now has to pop two stack frames.
> > class A { > int value; > B b; > > void unapply(MethodHandle[int, B] match, MethodHandle reject) { // act > as a continuation > if (value == 0) { > reject.invokeExact(); // tail call to the next pattern > } > match.invokeExact(value, b); // insert the argument into the > previous call, calls unapply on b if nested, calls the action otherwise > } > } > > > > > > Thanks for the brain dump! > > > > John > > Rémi > > -- > Andrey Breslav > Project Lead of Kotlin > JetBrains > http://kotlinlang.org/ > The Drive to Develop > > -- Andrey Breslav Project Lead of Kotlin JetBrains http://kotlinlang.org/ The Drive to Develop From maurizio.cimadamore at oracle.com Mon Mar 27 17:02:01 2017 From: maurizio.cimadamore at oracle.com (Maurizio Cimadamore) Date: Mon, 27 Mar 2017 18:02:01 +0100 Subject: Pattern Matching In-Reply-To: References: <1397250216.1674494.1489857722330.JavaMail.zimbra@u-pem.fr> <60D628EB-CC65-4F11-89FB-AD6AFBC02241@oracle.com> <935045536.1688361.1489876324086.JavaMail.zimbra@u-pem.fr> <409770778.306509.1490621206014.JavaMail.zimbra@u-pem.fr> Message-ID: On 27/03/17 16:06, Andrey Breslav wrote: > - I can check for (a != null) and then somewhere inside check for (a > is MyClass). This does not directly apply to Java (unless we envision > introduction of nullable types in the future). > - In a mixed hierarchy, it may often make sense to first check for (a > is CharSequence) and later for (a is List), having methods of both > CharSequence and List on a. I can look for exact examples, if they are > of interest to you. Thanks - that answers my question :-) Maurizio
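Andrey's nested-`is` example and Rémi's reply can be contrasted with what pre-pattern Java offers. In this hypothetical sketch, one object tested against two types ends up with two differently typed names rather than one name of type `CharSequence & Comparable`; the only place Java lets you write such an intersection type directly is a type-variable bound, shown in the second method.

```java
// Hypothetical sketch contrasting Kotlin-style nested smart casts with Java.
class IntersectionSketch {
    // One object, two checks, two names: Java has no single name here whose
    // type is CharSequence & Comparable.
    static String describe(Object a) {
        if (a instanceof CharSequence) {
            CharSequence cs = (CharSequence) a;
            if (a instanceof Comparable) {
                // In Kotlin, 'a' itself would now be typed CharSequence & Comparable<*>.
                @SuppressWarnings("unchecked")
                Comparable<Object> cmp = (Comparable<Object>) a;
                return "len=" + cs.length() + ", selfCompare=" + cmp.compareTo(a);
            }
            return "len=" + cs.length();
        }
        return "other";
    }

    // Where Java does let you name an intersection type: a type-variable bound.
    // Members of both CharSequence and Comparable are available on x and y.
    static <T extends CharSequence & Comparable<T>> T longerOrGreater(T x, T y) {
        if (x.length() != y.length()) {
            return x.length() > y.length() ? x : y;
        }
        return x.compareTo(y) >= 0 ? x : y;
    }
}
```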
From john.r.rose at oracle.com Mon Mar 27 21:39:23 2017 From: john.r.rose at oracle.com (John Rose) Date: Mon, 27 Mar 2017 14:39:23 -0700 Subject: Pattern Matching In-Reply-To: <935045536.1688361.1489876324086.JavaMail.zimbra@u-pem.fr> References: <1397250216.1674494.1489857722330.JavaMail.zimbra@u-pem.fr> <60D628EB-CC65-4F11-89FB-AD6AFBC02241@oracle.com> <935045536.1688361.1489876324086.JavaMail.zimbra@u-pem.fr> Message-ID: <1ECE266D-C8D4-4504-B542-94A505599096@oracle.com> On Mar 18, 2017, at 3:32 PM, forax at univ-mlv.fr wrote: > > The main issue of the visitor is that you have to add the accept methods to the classes of the hierarchy before being able to use the visitor; this is equivalent to being able to insert a method or a constant into a class, which is also equivalent to being able to inject the implementation of a not yet existing interface. Good point. Some form of post-facto interface injection (if we could figure out the details, which is very hard) would presumably address this problem. The issue of visitors and matchers is important because if we introduce a new kind of class (data class, record class, whatever) with enhanced pattern capabilities, we have basically one chance to define a universal pattern match interface for that kind of class. (We could add it in after first release, but it's hard to add more than once.) Here's a second point of similar weight: The interface itself has parts which are signature-polymorphic. If you try to represent it as a classic Java interface you can see that the polymorphism causes boxing: interface Matchable { R match(); } Whatever the API structure is for match patterns and results, the result eventually has to deliver a tuple of extracted values. But there is no good way (yet, until value types and any-generics) to express the type of a tuple. So we get List, etc. The closest we can get to a tuple type in the JVM is an argument list type, reified as a MethodType and accepted by a MethodHandle.
Therefore, I think a workable "Matchable" API can involve method handles and be type-classified by MethodTypes (returning a conventional void result). As a first cut: interface Matchable> { boolean match(MethodHandle collector); MT matchType(); } (The pattern part of the match is left out for clarity. You can put it back in easily enough as another argument to the "match" call. Maybe match is overloaded by pattern kind.) The type variable decorations are ill-defined and have to be stripped out of the real code. Second cut: interface Matchable> { BUF match(); // returns null on failure, buffered match-values on success R matchExtract(BUF, MethodHandle> collector); MT matchType(); } The extract calls either look into the match-result buffer for the required match components, or (as an optimization) might look directly into the object fields, if it is safe to do so. A third cut might break the mold completely (of a classic interface) and present the Matchable API as a statically linkable bundle of method handles, one bundle per match API binding (i.e., per concrete class). The bundle would look like: interface Extractor> { MethodHandle matchHandle(); // null on failure, T or other value on success MethodHandle componentHandle(int i); // extract one of the A values MT matchType(); Class targetType(); Class bufferType(); } You could omit the BUF type completely, but there is a big cost: There is no way for the T object to deliver a tuple of types apart from being willing at any moment to be the subject of an accessor call. Those accessor calls will need in general to do redundant calculations and are subject to race conditions which might make the match disappear before the components were extracted. The presence of the T type (alongside BUF) in the component handles allows an object with immutable fields to deliver those particular values by direct access, instead of copying them through a buffer object. The BUF type is secret to the implementation of T. 
You can use an extractor without knowing it except via a wildcard. John From andrey.breslav at jetbrains.com Tue Mar 28 08:04:57 2017 From: andrey.breslav at jetbrains.com (Andrey Breslav) Date: Tue, 28 Mar 2017 08:04:57 +0000 Subject: Pattern Matching In-Reply-To: <1ECE266D-C8D4-4504-B542-94A505599096@oracle.com> References: <1397250216.1674494.1489857722330.JavaMail.zimbra@u-pem.fr> <60D628EB-CC65-4F11-89FB-AD6AFBC02241@oracle.com> <935045536.1688361.1489876324086.JavaMail.zimbra@u-pem.fr> <1ECE266D-C8D4-4504-B542-94A505599096@oracle.com> Message-ID: For completeness, since Remi aimed at listing all alternatives, I think we should not forget Object algebras: https://www.cs.utexas.edu/~wcook/Drafts/2012/ecoop2012.pdf My experience has been that this pattern is good for observing data structures (transforming, pretty-printing, etc.), but it's usually hard to implement, say, equals() through it. On Tue, Mar 28, 2017 at 12:40 AM John Rose wrote: > On Mar 18, 2017, at 3:32 PM, forax at univ-mlv.fr wrote: > > > The main issue of the visitor is that you have to add the accept methods > to the classes of the hierarchy before being able to use the visitor; this is > equivalent to being able to insert a method or a constant into a class, which > is also equivalent to being able to inject the implementation of a not yet > existing interface. > > > Good point. Some form of post-facto interface injection (if we could > figure > out the details, which is very hard) would presumably address this problem. > > The issue of visitors and matchers is important because if we introduce > a new kind of class (data class, record class, whatever) with enhanced > pattern capabilities, we have basically one chance to define a universal > pattern match interface for that kind of class. (We could add it in after > first release, but it's hard to add more than once.)
> > Here's a second point of similar weight: The interface itself has parts > which are signature-polymorphic. If you try to represent it as a classic > Java interface you can see that the polymorphism causes boxing: > > interface Matchable { > R match(); > } > > Whatever the API structure is for match patterns and results, > the result eventually has to deliver a tuple of extracted values. > But there is no good way (yet, until value types and any-generics) > to express the type of a tuple. So we get List, etc. > > The closest we can get to a tuple type in the JVM is an argument > list type, reified as a MethodType and accepted by a MethodHandle. > Therefore, I think a workable "Matchable" API can involve method > handles and be type-classified by MethodTypes (returning a > conventional void result). > > As a first cut: > > interface Matchable> { > boolean match(MethodHandle collector); > MT matchType(); > } > > (The pattern part of the match is left out for clarity. > You can put it back in easily enough as another argument > to the "match" call. Maybe match is overloaded by pattern > kind.) > > The type variable decorations are ill-defined and have to > be stripped out of the real code. > > Second cut: > > interface Matchable> { > BUF match(); // returns null on failure, buffered match-values on > success > R matchExtract(BUF, MethodHandle> collector); > MT matchType(); > } > > The extract calls either look into the match-result buffer > for the required match components, or (as an optimization) > might look directly into the object fields, if it is safe to do so. > > A third cut might break the mold completely (of a classic > interface) and present the Matchable API as a statically > linkable bundle of method handles, one bundle per match > API binding (i.e., per concrete class). 
The bundle would > look like: > > interface Extractor> { > MethodHandle matchHandle(); // null on failure, T or other > value on success > MethodHandle componentHandle(int i); // extract one of the A > values > MT matchType(); > Class targetType(); > Class bufferType(); > } > > You could omit the BUF type completely, but there is a big cost: > There is no way for the T object to deliver a tuple of types apart from > being willing at any moment to be the subject of an accessor call. > Those accessor calls will need in general to do redundant calculations > and are subject to race conditions which might make the match > disappear before the components were extracted. > > The presence of the T type (alongside BUF) in the component > handles allows an object with immutable fields to deliver those > particular values by direct access, instead of copying them > through a buffer object. > > The BUF type is secret to the implementation of T. You can > use an extractor without knowing it except via a wildcard. > > John > -- Andrey Breslav Project Lead of Kotlin JetBrains http://kotlinlang.org/ The Drive to Develop
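For readers unfamiliar with object algebras (the ECOOP 2012 paper Andrey links above), here is a minimal sketch of the pattern in Java; the names `ExpAlg`, `MulAlg`, and the small expression language are hypothetical. Each operation is a new implementation of a generic factory interface, and each new variant is a sub-interface, so both dimensions of the expression problem can grow without casts.

```java
// Hypothetical object-algebra sketch for a tiny expression language.
class ObjectAlgebraSketch {
    // The algebra: one factory method per variant, abstract over the result E.
    interface ExpAlg<E> {
        E lit(int n);
        E add(E left, E right);
    }

    // Operation 1: evaluation.
    static class Eval implements ExpAlg<Integer> {
        public Integer lit(int n) { return n; }
        public Integer add(Integer l, Integer r) { return l + r; }
    }

    // Operation 2: pretty-printing. A new operation is just a new implementation.
    static class Print implements ExpAlg<String> {
        public String lit(int n) { return Integer.toString(n); }
        public String add(String l, String r) { return "(" + l + " + " + r + ")"; }
    }

    // A new variant is a sub-interface; ExpAlg and existing code stay untouched.
    interface MulAlg<E> extends ExpAlg<E> {
        E mul(E left, E right);
    }

    static final class EvalMul extends Eval implements MulAlg<Integer> {
        public Integer mul(Integer l, Integer r) { return l * r; }
    }

    static final class PrintMul extends Print implements MulAlg<String> {
        public String mul(String l, String r) { return "(" + l + " * " + r + ")"; }
    }

    // A term is written once, generically, and interpreted by any algebra.
    static <E> E term(MulAlg<E> alg) {
        return alg.mul(alg.add(alg.lit(1), alg.lit(2)), alg.lit(3));
    }
}
```

Here `term(new EvalMul())` yields `9` and `term(new PrintMul())` yields `"((1 + 2) * 3)"`. One plausible reading of Andrey's caveat is visible in the shape of the interfaces: every operation consumes a single term bottom-up, so a binary operation such as `equals`, which must inspect two terms at once, has no natural place in this structure.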
From forax at univ-mlv.fr Tue Mar 28 15:42:19 2017 From: forax at univ-mlv.fr (forax at univ-mlv.fr) Date: Tue, 28 Mar 2017 17:42:19 +0200 (CEST) Subject: Pattern Matching In-Reply-To: <1ECE266D-C8D4-4504-B542-94A505599096@oracle.com> References: <1397250216.1674494.1489857722330.JavaMail.zimbra@u-pem.fr> <60D628EB-CC65-4F11-89FB-AD6AFBC02241@oracle.com> <935045536.1688361.1489876324086.JavaMail.zimbra@u-pem.fr> <1ECE266D-C8D4-4504-B542-94A505599096@oracle.com> Message-ID: <2122555469.996329.1490715739189.JavaMail.zimbra@u-pem.fr> > From: "John Rose" > To: "Rémi Forax" > Cc: "amber-spec-experts" > Sent: Monday, 27 March 2017 23:39:23 > Subject: Re: Pattern Matching > On Mar 18, 2017, at 3:32 PM, forax at univ-mlv.fr wrote: >> The main issue of the visitor is that you have to add the accept methods to the >> classes of the hierarchy before being able to use the visitor; this is equivalent >> to being able to insert a method or a constant into a class, which is also >> equivalent to being able to inject the implementation of a not yet existing >> interface. > Good point. Some form of post-facto interface injection (if we could figure > out the details, which is very hard) would presumably address this problem. > The issue of visitors and matchers is important because if we introduce > a new kind of class (data class, record class, whatever) with enhanced > pattern capabilities, we have basically one chance to define a universal > pattern match interface for that kind of class. (We could add it in after > first release, but it's hard to add more than once.) I think we should try to come up with extractors/deconstructors for 10, but if we are not satisfied, we can support getters as a backup strategy (if the class is immutable) and still introduce extractors in 11. > Here's a second point of similar weight: The interface itself has parts > which are signature-polymorphic.
If you try to represent it as a classic > Java interface you can see that the polymorphism causes boxing: > interface Matchable { > R match(); > } > Whatever the API structure is for match patterns and results, > the result eventually has to deliver a tuple of extracted values. > But there is no good way (yet, until value types and any-generics) > to express the type of a tuple. So we get List, etc. Yes, very true. If Optional and tuples are in the language, writing an API for extractors is easy: it's Scala's unapply. As you already said, it still requires computing every extracted value, even if that value will be ignored by a '_'. > The closest we can get to a tuple type in the JVM is an argument > list type, reified as a MethodType and accepted by a MethodHandle. > Therefore, I think a workable "Matchable" API can involve method > handles and be type-classified by MethodTypes (returning a > conventional void result). > As a first cut: > interface Matchable> { > boolean match(MethodHandle collector); > MT matchType(); > } > (The pattern part of the match is left out for clarity. > You can put it back in easily enough as another argument > to the "match" call. Maybe match is overloaded by pattern > kind.) This is the part I do not like about unapply: unapply mixes an extractor, i.e. something that decomposes an object into several components, with something that checks whether the pattern is accepted. My gut feeling is that we should only provide a deconstructor; the pattern recognition should be encoded in terms of component values, and thus be independent of a particular object once it has been decomposed. For example, with: int j = ... switch(i) { case A(B(int x, _), j) : ... } A should be decomposed using the extractor, and the second component value should be compared with the value of j; the comparison should be installed by the bootstrap method, independently of the way the extractor is written.
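Rémi's suggestion, a deconstructor that only hands out components while the bootstrap method installs the tests, can be approximated with plain method handles. In this hypothetical sketch, `A.deconstruct` knows nothing about patterns (it just invokes a collector, loosely in the spirit of John's first cut); the caller builds the equivalent of `case A(var x, j)` by guarding the action with an equality test on the second component, the kind of composition a switch bootstrap method could perform. Nested patterns such as `B(int x, _)` are left out.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

// Hypothetical sketch: the class only deconstructs; the test on a component
// is composed separately, as a switch bootstrap method might do.
class DeconstructorSketch {
    static final class A {
        final Object first;
        final int second;
        A(Object first, int second) { this.first = first; this.second = second; }

        // Deconstructor: passes all components to the collector, no matching logic.
        // The collector must have type (Object,int)String.
        String deconstruct(MethodHandle collector) throws Throwable {
            return (String) collector.invokeExact(first, second);
        }
    }

    static String action(Object x, int y) { return "x=" + x; }
    static boolean eqInt(int expected, int actual) { return expected == actual; }

    // Roughly: switch (a) { case A(var x, j) -> "x=" + x; default -> "no match"; }
    static String matchWithConstant(A a, int j) {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            MethodHandle action = lookup.findStatic(DeconstructorSketch.class, "action",
                    MethodType.methodType(String.class, Object.class, int.class));
            MethodHandle eq = lookup.findStatic(DeconstructorSketch.class, "eqInt",
                    MethodType.methodType(boolean.class, int.class, int.class));
            // Fix 'expected' to j, giving (int)boolean, then ignore component 1:
            // the result has type (Object,int)boolean.
            MethodHandle test = MethodHandles.dropArguments(
                    MethodHandles.insertArguments(eq, 0, j), 0, Object.class);
            MethodHandle fallback = MethodHandles.dropArguments(
                    MethodHandles.constant(String.class, "no match"), 0, Object.class, int.class);
            MethodHandle collector = MethodHandles.guardWithTest(test, action, fallback);
            return a.deconstruct(collector);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }
}
```

The same `deconstruct` method serves any pattern over `A`; only the composed collector changes, which is the separation Rémi argues for.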
> The type variable decorations are ill-defined and have to > be stripped out of the real code. > Second cut: > interface Matchable> { > BUF match(); // returns null on failure, buffered match-values on success > R matchExtract(BUF, MethodHandle> collector); > MT matchType(); > } > The extract calls either look into the match-result buffer > for the required match components, or (as an optimization) > might look directly into the object fields, if it is safe to do so. > A third cut might break the mold completely (of a classic > interface) and present the Matchable API as a statically > linkable bundle of method handles, one bundle per match > API binding (i.e., per concrete class). The bundle would > look like: > interface Extractor> { > MethodHandle matchHandle(); // null on failure, T or other value on > success > MethodHandle componentHandle(int i); // extract one of the A values > MT matchType(); > Class targetType(); > Class bufferType(); > } > You could omit the BUF type completely, but there is a big cost: > There is no way for the T object to deliver a tuple of types apart from > being willing at any moment to be the subject of an accessor call. > Those accessor calls will need in general to do redundant calculations > and are subject to race conditions which might make the match > disappear before the components were extracted. > The presence of the T type (alongside BUF) in the component > handles allows an object with immutable fields to deliver those > particular values by direct access, instead of copying them > through a buffer object. > The BUF type is secret to the implementation of T. You can > use an extractor without knowing it except via a wildcard. I still think you do not need all of that, at least for the extractor part; what a user could write is data class A(B b, int y) { public extractor B, int () { return b, y; } } and the compiler can transform it to // attribute data class class A { private final B b; private final int y; ...
// attribute: components: B, int // attribute: extractor, synthetic? public Empty () { return invokedynamic (B, int)Empty (b, y); } } The return value is a dynamically created value type (an anonymous value type?), perhaps masqueraded behind an empty interface (Empty), allocated by invokedynamic, that cannot be used by anything other than a MethodHandle (from the bootstrap method of the switch) which will destructure the value type into the component values passed to the implementation part of the pattern matching. > John Rémi From john.r.rose at oracle.com Wed Mar 29 22:01:05 2017 From: john.r.rose at oracle.com (John Rose) Date: Wed, 29 Mar 2017 15:01:05 -0700 Subject: Pattern Matching In-Reply-To: References: <1397250216.1674494.1489857722330.JavaMail.zimbra@u-pem.fr> <60D628EB-CC65-4F11-89FB-AD6AFBC02241@oracle.com> <935045536.1688361.1489876324086.JavaMail.zimbra@u-pem.fr> <1ECE266D-C8D4-4504-B542-94A505599096@oracle.com> Message-ID: On Mar 28, 2017, at 1:04 AM, Andrey Breslav wrote: > > For completeness, since Remi aimed at listing all alternatives, I think we should not forget Object algebras: https://www.cs.utexas.edu/~wcook/Drafts/2012/ecoop2012.pdf > > My experience has been that this pattern is good for observing data structures (transforming, pretty-printing, etc.), but it's usually hard to implement, say, equals() through it. Thanks for the reference. Cook's work on batches is probably relevant. There is something deep going on here, and that paper shines some light on it. I hope we can connect Java deeply to the sorts of patterns they talk about. Since matching is not (yet, officially) on the table for Amber, it would be premature to use this venue to say much more. One teaser: I hope to get some time to write about what I think of as "metamorphic programming", or coordinated hooks for the "tear down" and "build up" of data.
John From daniel.smith at oracle.com Wed Mar 29 23:12:53 2017 From: daniel.smith at oracle.com (Dan Smith) Date: Wed, 29 Mar 2017 17:12:53 -0600 Subject: Spec draft for JEP 286 Local Variable Type Inference Message-ID: Hello, all. Please see below for a draft spec of language changes to support local variable type inference ('var'). If you haven't read it already, you'll want to look at the proposal [1] and Brian's summary of design considerations and community feedback (sent to platform-jep-discuss) [2]. With that context, here's the spec document: http://cr.openjdk.java.net/~dlsmith/local-var-inference.html I'll follow up with some supplementary background on design considerations, beyond what's covered by the platform-jep-discuss email. Feel free to respond with your feedback. Thanks, Dan [1]: http://openjdk.java.net/jeps/286 [2]: http://mail.openjdk.java.net/pipermail/platform-jep-discuss/2016-December/000066.html From forax at univ-mlv.fr Wed Mar 29 23:58:33 2017 From: forax at univ-mlv.fr (Remi Forax) Date: Thu, 30 Mar 2017 01:58:33 +0200 (CEST) Subject: Spec draft for JEP 286 Local Variable Type Inference In-Reply-To: References: Message-ID: <843158725.1510959.1490831913910.JavaMail.zimbra@u-pem.fr> Hi Dan, Looks great! In 14.4.1, a minor remark: I had to look up what a standalone expression is; "not a poly expression" seems better to me, so I suggest that Because the initializer is a standalone expression, an error occurs if it is a lambda expression (15.27) or a method reference expression (15.13). can be Because the initializer is not a poly expression, an error occurs if it is a lambda expression (15.27) or a method reference expression (15.13). So a var initialized by an anonymous class with diamond is not supported because it's not a poly expression (but its type is denotable), and an anonymous class with type arguments is not supported because its type is not denotable.
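The rule discussed here, that a lambda or method reference cannot be a `var` initializer because nothing supplies a target type, can be illustrated with `var` as it eventually shipped in Java 10. This sketch reflects the released behavior, not necessarily the draft under discussion, and the class and variable names are hypothetical. A cast supplies the missing target type, and a diamond with no other constraint infers `Object` as the type argument.

```java
import java.util.ArrayList;
import java.util.function.Supplier;

// Requires Java 10 or later.
class VarInitializers {
    static int demo() {
        // var f = () -> 42;          // rejected: a lambda needs a target type
        // var g = String::length;    // rejected for the same reason
        var f = (Supplier<Integer>) () -> 42;  // the cast provides the target type

        var names = new ArrayList<String>();   // inferred as ArrayList<String>
        names.add("a");

        var anything = new ArrayList<>();      // diamond, no constraint: ArrayList<Object>
        anything.add(1);
        anything.add("two");

        return f.get() + names.size() + anything.size(); // 42 + 1 + 2
    }
}
```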
Given that, without var, when the type of an anonymous class with a diamond is inferred the resulting type is the supertype, I think you can relax the current rule to allow var to be initialized with an anonymous class (with no diamond); the resulting type would be the supertype. regards, Rémi ----- Original Message ----- > From: "Dan Smith" > To: "amber-spec-experts" > Sent: Thursday, March 30, 2017 01:12:53 > Subject: Spec draft for JEP 286 Local Variable Type Inference > Hello, all. Please see below for a draft spec of language changes to support > local variable type inference ('var'). > > If you haven't read it already, you'll want to look at the proposal [1] and > Brian's summary of design considerations and community feedback (sent to > platform-jep-discuss) [2]. > > With that context, here's the spec document: > http://cr.openjdk.java.net/~dlsmith/local-var-inference.html > > I'll follow up with some supplementary background on design considerations, > beyond what's covered by the platform-jep-discuss email. > > Feel free to respond with your feedback. > > Thanks, > Dan > > [1]: http://openjdk.java.net/jeps/286 > [2]: > http://mail.openjdk.java.net/pipermail/platform-jep-discuss/2016-December/000066.html From daniel.smith at oracle.com Thu Mar 30 16:53:30 2017 From: daniel.smith at oracle.com (Dan Smith) Date: Thu, 30 Mar 2017 10:53:30 -0600 Subject: Spec draft for JEP 286 Local Variable Type Inference In-Reply-To: <843158725.1510959.1490831913910.JavaMail.zimbra@u-pem.fr> References: <843158725.1510959.1490831913910.JavaMail.zimbra@u-pem.fr> Message-ID: <50FD0AAF-3FD1-426C-9C29-CD863A8EEA16@oracle.com> > On Mar 29, 2017, at 5:58 PM, Remi Forax wrote: > > Hi Dan, > Looks great ! > > In 14.4.1, a minor remark, i had to look for what standalone expression is, not poly expression seems better for me so i suggest that > Because the initializer is a standalone expression, an error occurs if it is a lambda expression (15.27) or a method reference expression (15.13).
> can be > Because the initializer is not a poly expression, an error occurs if it is a lambda expression (15.27) or a method reference expression (15.13). Thanks for raising this. I don't like either "is a standalone expression" or "is not a poly expression", because lambda expressions are *always* poly expressions. I guess what I want to say is this: "Because the initializer is treated as if it did not appear in an assignment context, an error occurs if it is a lambda expression or a method reference expression." > So a var initialized by an anonymous class with diamond is not supported because it's not a poly expression (but its type is denotable) and > an anonymous class with type arguments is not supported because its type is not denotable. No. There is not a requirement that diamond class instance creation expressions (for anonymous classes or otherwise) be poly expressions. It's perfectly fine to use one of these in a context that has no target type. > Given that without var, when the type of an anonymous class with a diamond is inferred, the resulting type is the super type, > i think you can relax the current rule to allow var to be initialized with an anonymous class (with no diamond), the resulting type will be the super type. I'll have more to say about the treatment of non-denotable types, but a couple of corrections: - The type of an anonymous class instance creation expression, diamond or not, is always a non-denotable class type. There's an anomaly in which *type inference* can't use that type and uses the superclass/interface instead, but that doesn't change the ultimate type of the expression. - The likely treatment of all anonymous instance creation expressions that are 'var' initializers, diamond or not, is one of 1) use the non-denotable anonymous class type, or 2) use the class's immediate superclass/superinterface.
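For comparison with options 1) and 2): Java 10, as released, kept the anonymous class type for a `var` initialized with an anonymous class (option 1), so the class's extra members stay accessible through the variable. A diamond with no target type is also accepted, with the type argument inferred as `Object` when nothing else constrains it. A small sketch with hypothetical names:

```java
import java.util.ArrayList;

// Hypothetical demo class; requires Java 10 or later for 'var'.
class VarAnonymousDemo {
    static int anonymousClassType() {
        // The initializer's type is the (non-denotable) anonymous class itself,
        // and 'var' keeps it, so the extra members are accessible.
        var counter = new Object() {
            int hits = 0;
            void hit() { hits++; }
        };
        counter.hit();
        counter.hit();
        return counter.hits;   // would not compile if 'counter' were typed as Object
    }

    static int diamondWithoutTarget() {
        // A diamond expression needs no target type: with no constructor
        // arguments to go on, E is inferred as Object.
        var list = new ArrayList<>();   // ArrayList<Object>
        list.add("text");
        list.add(42);                   // fine, because the element type is Object
        return list.size();
    }

    public static void main(String[] args) {
        System.out.println(anonymousClassType());    // 2
        System.out.println(diamondWithoutTarget());  // 2
    }
}
```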
Dan From forax at univ-mlv.fr Thu Mar 30 18:24:34 2017 From: forax at univ-mlv.fr (forax at univ-mlv.fr) Date: Thu, 30 Mar 2017 20:24:34 +0200 (CEST) Subject: Spec draft for JEP 286 Local Variable Type Inference In-Reply-To: <50FD0AAF-3FD1-426C-9C29-CD863A8EEA16@oracle.com> References: <843158725.1510959.1490831913910.JavaMail.zimbra@u-pem.fr> <50FD0AAF-3FD1-426C-9C29-CD863A8EEA16@oracle.com> Message-ID: <1138334629.1909649.1490898274908.JavaMail.zimbra@u-pem.fr> ----- Original Message ----- > From: "Dan Smith" > To: "Remi Forax" > Cc: "amber-spec-experts" > Sent: Thursday, March 30, 2017 18:53:30 > Subject: Re: Spec draft for JEP 286 Local Variable Type Inference >> On Mar 29, 2017, at 5:58 PM, Remi Forax wrote: >> >> Hi Dan, >> Looks great ! >> >> In 14.4.1, a minor remark, i had to look for what standalone expression is, not >> poly expression seems better for me so i suggest that >> Because the initializer is a standalone expression, an error occurs if it is a >> lambda expression (15.27) or a method reference expression (15.13). >> can be >> Because the initializer is not a poly expression, an error occurs if it is a >> lambda expression (15.27) or a method reference expression (15.13). > > Thanks for raising this. I don't like either "is a standalone expression" or "is > not a poly expression", because lambda expressions are *always* poly > expressions. I guess what I want to say is this: > > "Because the initializer is treated as if it did not appear in an assignment > context, an error occurs if it is a lambda expression or a method reference > expression." Yes, better. > >> So a var initialized by an anonymous class with diamond is not supported because >> it's not a poly expression (but its type is denotable) and >> an anonymous class with type arguments is not supported because its type is not >> denotable. > > No. There is not a requirement that diamond class instance creation expressions > (for anonymous classes or otherwise) be poly expressions. It's perfectly fine
It's perfectly fine > to use one of these in a context that has no target type. Ok, my bad, an expression can be right hand side context sensitive without being a poly expression. > >> Given that without var, when the type of an anonymous class with a diamond is >> inferred, the resulting type is the super type, >> i think you can relax the current rule to allow var to be initialized with an >> anonymous class (with no diamond), the resulting type will be the super type. > > I'll have more to say about the treatment of non-denoteable types, but a couple > of corrections: > > - The type of an anonymous class instance creation expression, diamond or not, > is always a non-denoteable class type. There's an anomaly in which *type > inference* can't use that type and uses the superclass/interface instead, but > that doesn't change the ultimate type of the expression. Ok, got it, only type inference uses the supertype. > > - The likely treatment of all anonymous instance creation expressions that are > 'var' initializers, diamond or not, is one of 1) use the non-denoteable > anonymous class type, or 2) use the class's immediate > superclass/superinterface. I agree. Having a var that uses a non denotable type seems wrong to me, showing/hiding the type of a var should be a valid refactoring in any cases, IMO. So i vote for (2). > > ?Dan R?mi From daniel.smith at oracle.com Thu Mar 30 18:56:58 2017 From: daniel.smith at oracle.com (Dan Smith) Date: Thu, 30 Mar 2017 12:56:58 -0600 Subject: Spec draft for JEP 286 Local Variable Type Inference In-Reply-To: <1138334629.1909649.1490898274908.JavaMail.zimbra@u-pem.fr> References: <843158725.1510959.1490831913910.JavaMail.zimbra@u-pem.fr> <50FD0AAF-3FD1-426C-9C29-CD863A8EEA16@oracle.com> <1138334629.1909649.1490898274908.JavaMail.zimbra@u-pem.fr> Message-ID: > On Mar 30, 2017, at 12:24 PM, forax at univ-mlv.fr wrote: > >> No. 
>> There is not a requirement that diamond class instance creation expressions >> (for anonymous classes or otherwise) be poly expressions. It's perfectly fine >> to use one of these in a context that has no target type. > > Ok, my bad, an expression can be right hand side context sensitive without being a poly expression. More accurately (see JLS 15.2), a poly expression is an expression that: 1) Is of an appropriate syntactic form (like a class instance creation expression), 2) Appears in an appropriate context (like an assignment), and 3) Has certain properties, specific to its form, that make it context-sensitive (like using diamond). Dan From brian.goetz at oracle.com Thu Mar 30 19:00:09 2017 From: brian.goetz at oracle.com (Brian Goetz) Date: Thu, 30 Mar 2017 15:00:09 -0400 Subject: Spec draft for JEP 286 Local Variable Type Inference In-Reply-To: <1138334629.1909649.1490898274908.JavaMail.zimbra@u-pem.fr> References: <843158725.1510959.1490831913910.JavaMail.zimbra@u-pem.fr> <50FD0AAF-3FD1-426C-9C29-CD863A8EEA16@oracle.com> <1138334629.1909649.1490898274908.JavaMail.zimbra@u-pem.fr> Message-ID: <67ed0fd7-7cbb-b124-9b6d-3f6edfdb565e@oracle.com> We're not voting yet -- we haven't even explained the issues yet :) The issue of non-denotable types is where all the complexity (and opportunity to get it wrong) in this feature lives. Dan will soon post some examples that hopefully will illustrate why both "just don't infer them, make the user say what they mean" and "just infer them, they're types" -- as "simple" and consistent as both of these seem -- are both extreme (and/or naive) positions. (FWIW, initially I was in the "just don't infer" camp too; the attraction of that is that every program with `var` corresponds to an equivalent program without `var`.
But the number of times where inference produces a capture or intersection is surprisingly high, and it will absolutely be perceived as "that stupid Java compiler, can't they just tell that..." Additionally, users will perceive the "penalty" of inference failure as messing up how their code prettily lines up -- and likely will seek to distort their code to avoid this aesthetic fail.) On 3/30/2017 2:24 PM, forax at univ-mlv.fr wrote: > Having a var that uses a non denotable type seems wrong to me, showing/hiding the type of a var should be a valid refactoring in any cases, IMO. > So i vote for (2). From forax at univ-mlv.fr Fri Mar 31 16:25:06 2017 From: forax at univ-mlv.fr (forax at univ-mlv.fr) Date: Fri, 31 Mar 2017 18:25:06 +0200 (CEST) Subject: Spec draft for JEP 286 Local Variable Type Inference In-Reply-To: <67ed0fd7-7cbb-b124-9b6d-3f6edfdb565e@oracle.com> References: <843158725.1510959.1490831913910.JavaMail.zimbra@u-pem.fr> <50FD0AAF-3FD1-426C-9C29-CD863A8EEA16@oracle.com> <1138334629.1909649.1490898274908.JavaMail.zimbra@u-pem.fr> <67ed0fd7-7cbb-b124-9b6d-3f6edfdb565e@oracle.com> Message-ID: <445070358.2384198.1490977506688.JavaMail.zimbra@u-pem.fr> ----- Original Message ----- > From: "Brian Goetz" > To: forax at univ-mlv.fr, "Dan Smith" > Cc: "amber-spec-experts" > Sent: Thursday, March 30, 2017 21:00:09 > Subject: Re: Spec draft for JEP 286 Local Variable Type Inference > We're not voting yet -- we haven't even explained the issues yet :) :) > > The issue of non-denotable types is where all the complexity (and > opportunity to get it wrong) in this feature lives. Dan will soon post > some examples that hopefully will illustrate why both "just don't infer > them, make the user say what they mean" and "just infer them, they're > types" -- as "simple" and consistent as both of these seem -- are both > extreme (and/or naive) positions. There is a third case: the inference gives you a non-denotable type, but you can lower it to a compatible denotable type.
In that case: - something will be inferred; - it's denotable, so you can replace var with the inferred type and it will still work. Examples of non-denotable types that can be lowered (I'm not saying we should use these rules; they are just examples): - anonymous class -> supertype - intersection type -> first type (other than Object) - null -> Object - capture -> use the bound, Luke etc. > > (FWIW, initially I was in the "just don't infer" camp too; the > attraction of that is that every program with `var` corresponds to an > equivalent program without `var`. But the number of times where inference > produces a capture or intersection is surprisingly high, and it will > absolutely be perceived as "that stupid Java compiler, can't they just > tell that..." Additionally, users will perceive the "penalty" of > inference failure as messing up how their code prettily lines up -- and > likely will seek to distort their code to avoid this aesthetic fail.) I agree, but you do not have to infer a non-denotable type and stick with it. regards, Rémi > > On 3/30/2017 2:24 PM, forax at univ-mlv.fr wrote: >> Having a var that uses a non denotable type seems wrong to me, showing/hiding >> the type of a var should be a valid refactoring in any cases, IMO. > > So i vote for (2). From daniel.smith at oracle.com Fri Mar 31 23:39:00 2017 From: daniel.smith at oracle.com (Dan Smith) Date: Fri, 31 Mar 2017 17:39:00 -0600 Subject: [lvti] Handling of capture variables Message-ID: <488ABD21-11DC-4E4E-8462-04C4F76BB50D@oracle.com> As described in the JEP 286 spec document, inferring the type of a local variable to be a non-denotable type (one that can't be written in source) is something to be careful about, due to "potential for confusion, bad error messages, or added exposure to bugs". The most significant area here (in terms of likely frequency) is the presence of capture variables in the type. I did some analysis of the Java SE APIs to identify and illustrate problematic cases.
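The analysis that follows concentrates on capture variables. For contrast, the intersection types raised earlier in the thread appear whenever a conditional expression has unrelated reference operands. The sketch below uses hypothetical names and relies on the behavior that shipped in Java 10: `var` keeps the intersection type, so the variable can be used as any of its component types.

```java
import java.io.Serializable;

// Hypothetical demo class; requires Java 10 or later for 'var'.
class VarIntersectionDemo {
    static String describe(boolean flag) {
        // The type of this conditional is lub(Integer, String), an intersection
        // that includes Serializable and a Comparable parameterization.
        var value = flag ? Integer.valueOf(1) : "one";

        Serializable asSerializable = value;  // accepted: one component of the intersection
        Comparable<?> asComparable = value;   // accepted: another component

        return asSerializable.getClass().getSimpleName() + "/" + (asComparable == asSerializable);
    }

    public static void main(String[] args) {
        System.out.println(describe(true));   // Integer/true
        System.out.println(describe(false));  // String/true
    }
}
```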
== Case 1: wildcard-parameterized return type == Any method (or field) that returns a wildcard-parameterized type will produce a non-denotable type on invocation, because the return type must be captured (JLS 15.12.3). var myClass = getClass(); var c = Class.forName("java.lang.Object"); var sup = String.class.getSuperclass(); var entries = new ZipFile("/etc/filename.zip").entries(); var joiner = Collectors.joining(" - \n", "", ""); var plusCollector = Collectors.reducing(BigInteger.ZERO, BigInteger::add); var future = Executors.newCachedThreadPool().submit(System::gc); void m(MethodType type) { var ret = type.returnType(); } void m(TreeSet set) { var comparator = set.comparator(); } void m(Annotation ann) { var annClass = ann.annotationType(); } void m(ReferenceQueue queue) { var stringRef = queue.poll(); } Using wildcards in a return type is sometimes discouraged, but other times it's the right thing to do. So while I wouldn't say these methods are pervasive, there are quite a few of them (especially where the common idiom is to almost always use a wildcard, as in Class and Collector). There are no capture variables present for methods that return arrays, lists, etc., of wildcard-parameterized types, because capture doesn't touch those nested wildcards: void m(MethodType type) { var params = type.parameterArray(); } void m(MethodType type) { var params = type.parameterList(); } == Case 2: instance method returning a class type parameter == A method (or field) whose return type is a class type parameter will produce a capture variable when invoked for a wildcard-parameterized type. 
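The following is a runnable sketch of Case 1 and of the Case 2 rule just stated. It assumes the mapping to a denotable supertype that Java 10 eventually adopted (called upward projection in the JLS), and the demo names are hypothetical. In both methods the captured type is recorded as its bound, which is why the bound's members remain usable.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class; requires Java 10 or later for 'var'.
class VarCaptureDemo {
    // Case 1: getClass() on a String returns Class<? extends String>, which is
    // captured on invocation; 'var' records it as Class<? extends String>.
    static String case1() {
        var c = "abc".getClass();
        String s = c.cast("xyz");   // the bound, String, is still usable
        return s;
    }

    // Case 2: Map.get returns V, which is a capture variable for this
    // wildcard-parameterized map; 'var' records it as its bound, Runnable.
    static void case2(Map<String, ? extends Runnable> tasks) {
        var task = tasks.get("job");
        task.run();
    }

    static int runJobTwice() {
        int[] runs = {0};
        Map<String, Runnable> tasks = new HashMap<>();
        tasks.put("job", () -> runs[0]++);
        case2(tasks);
        case2(tasks);
        return runs[0];
    }

    public static void main(String[] args) {
        System.out.println(case1());        // xyz
        System.out.println(runJobTwice());  // 2
    }
}
```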
void m(Class c) throws Exception { var runnable = c.newInstance(); } void m(Map map) { var e = map.get("some.key"); } void m(List> sets) { var first = sets.get(0); } Object find(Collection coll, Object o) { for (var elt : coll) { if (elt.equals(o)) return elt; } return null; } void m(Optional opt) { var num = opt.get(); } void m(IntFunction f) { var reader = f.apply(14); } void m(Future future) { var entry = future.get(10, TimeUnit.SECONDS); } If you substitute a wildcard-parameterized type into the return type, that also leads to capture: void m(List> list) { var set = list.get(0); } This is true for for-each, too (for now, javac fails to perform capture correctly, so you don't see this in the prototype): void m(List> list) { for (var set : list) set.clear(); } == Case 3: instance method returning a type that mentions a class type parameter == A method (or field) whose return type *mentions* a class type parameter (e.g., Iterator in Iterable.iterator) will also produce a non-denotable type when invoked for a wildcard-parameterized type. Unlike Case 2 methods, which tend to be "terminal operations", these types often arise in chains. var constructor = Class.forName("java.lang.Object").getConstructor(); void m(Map map) { var keys = map.keySet(); } void m(Map map) { var iter = map.keySet().iterator(); } void m(TreeMap map) { var tail = map.subMap("b", "c"); } void m(TreeSet set) { var reverseOrder = set.comparator().reversed(); } void m(List list) { var unique = list.stream().distinct().sorted(); } void m(List stream) { var best = stream.min(Comparator.comparing(e -> e.getStackTrace().length)); } void m(Function f1, Function f2) { var f = f1.andThen(f2); } void m(Predicate discard) { var keep = discard.negate(); } == Case 4: method with inferred type parameter in return type == A method (or constructor) whose return type includes an inferred type parameter may end up substituting capture variables or other non-denotable types.
This typically depends on the types of the arguments, again with a wildcard-parameterized type showing up somewhere. void m(Enumeration tasks) { var list = Collections.list(tasks); } void m(Set set) { var syncSet = Collections.synchronizedSet(set); } void m(Function f) { var es = Stream.of("a", "b", "c").map(f); } There are also cases here that are specified to produce capture vars but do not in javac: void m(List ns) { var firstSet = Collections.singleton(ns.get(0)); } ---------------- With that in mind, looking at our three options for dealing with capture variables: 1) Allow the non-denotable type 2) Map the type to a supertype that is denotable 3) Report an error (3) isn't viable. "You can't use 'var' with 'getClass'" is already pretty bad. Prohibiting all the uses above would be really bad. We've thought a lot about (1) and (2). The JEP includes this example: void test(List l1, List l2) { var l3 = l1; // List or List? l3 = l2; // error? l3.add(l3.get(0)); // error? } On 'l3 = l2': I wouldn't say it's an important priority that all 'var' variables have a type that is convenient for future mutation. But we do expect users to be able to easily see *why* an assignment wouldn't be allowed. Unfortunately, capture variables are such a subtle thing that they're often invisible, and programmers don't even realize that they appear as an intermediate step. So, most people would see 'var l3 = l1' and expect that the type of l3 is List. On 'l3.add(l3.get(0))': This is a cool trick. The use of 'var' essentially serves the same purpose as invoking a generic method in order to give a capture variable a name: dupFirst(List list) { list.add(list.get(0)); } ... dupFirst(l1); On the other hand, it's a subtle trick, and the average user isn't going to understand what's going on. (Or, more likely: 'l3.add(l3.get(0))' looks fine to them, but they won't understand why it stops working when that gets refactored to 'l1.add(l1.get(0))'.)
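The generic-method idiom used as the comparison point here is the usual "capture helper". In the sketch below (hypothetical names), the public method accepts a wildcard type, and the private generic method gives the capture variable a name, `T`, so that `add(get(0))` type-checks. Under the mapping Java 10 ended up adopting, a `var` copy of a wildcard-typed list does not carry a usable name for its element type, so this helper remains the way to express the operation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class illustrating the capture-helper idiom.
class CaptureHelperDemo {
    // Wildcard-typed entry point: list.add(list.get(0)) would not compile here,
    // because the element type is an unnamed capture variable.
    static void dupFirst(List<?> list) {
        dupFirstHelper(list);   // capture conversion binds T to the captured type
    }

    // The type parameter T names the capture variable, so add(get(0)) type-checks.
    private static <T> void dupFirstHelper(List<T> list) {
        list.add(list.get(0));
    }

    static String demo() {
        List<String> names = new ArrayList<>(List.of("a", "b"));
        dupFirst(names);
        return names.toString();
    }

    public static void main(String[] args) {
        System.out.println(demo());   // [a, b, a]
    }
}
```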
So, in terms of user experience, it seems like (2) is the desired outcome here. That choice isn't without some sacrifice: it would be a nice property if lifting a subexpression out of an expression into its own 'var' declaration yielded identical types. Since (2) changes the intermediate type, that doesn't hold. That said, hopefully our mapping function is reasonably unobtrusive... How do we define the mapping? "Use the bound" is the easy answer, although in practice it's more complicated than that: - Which bound? (upper or lower?) - What if the bound contains the capture var? - What do you do with a capture variable appearing as an (invariant) type argument? - What do you do with a capture variable appearing as a wildcard bound? We're working on finalizing the details. While this operation isn't trivial, it turns out it's pretty important: we already need it to solve bugs in the type system involving type inference [1] and lambda expressions [2]. It's a useful general-purpose tool. Dan [1] https://bugs.openjdk.java.net/browse/JDK-8016196 [2] https://bugs.openjdk.java.net/browse/JDK-8170887
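For reference, a mapping of this kind was standardized in Java 10 as "upward projection" (JLS §4.10.5), which §14.4.1 applies to `var` initializers. The sketch below (hypothetical names) replays the JEP's questions against that behavior, using a list of `? extends Number` chosen here for illustration. The variable gets the wildcard type rather than a capture variable, so reassignment from another list of the same wildcard type succeeds, while the `add(get(0))` trick is rejected.

```java
import java.util.List;

// Hypothetical demo class; requires Java 10 or later for 'var'.
class ProjectionDemo {
    static int test(List<? extends Number> l1, List<? extends Number> l2) {
        var l3 = l1;              // typed List<? extends Number>, not List<CAP#1>
        l3 = l2;                  // accepted: l2 is assignable to that wildcard type
        // l3.add(l3.get(0));     // rejected: each use of l3 is captured afresh
        Number first = l3.get(0); // reads still see the upper bound, Number
        return first.intValue() + l3.size();
    }

    public static void main(String[] args) {
        System.out.println(test(List.of(1, 2), List.of(5.5)));  // 5 + 1 = 6
    }
}
```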