From brian.goetz at oracle.com Sat Aug 3 00:00:03 2013
From: brian.goetz at oracle.com (Brian Goetz)
Date: Sat, 3 Aug 2013 00:00:03 -0700
Subject: Stability of lambda serialization
In-Reply-To: <572125674.885842.1374600926640.JavaMail.root@redhat.com>
References: <572125674.885842.1374600926640.JavaMail.root@redhat.com>
Message-ID: <4FE8395D-2CDA-44A8-95F3-884F04899569@oracle.com>

This was discussed in the meeting after JVMLS this week. The consensus was that, while *this particular* issue could be addressed, there is an infinite spectrum of similar issues that cannot be addressed, and that it is preferable to draw a clean line about what the user can expect in terms of code changes destabilizing lambdas.

That line is: If the code inside a method, or the method's signature, changes *in any way*, lambdas captured in that method should be considered destabilized. Changes to other methods, or changes to the order of methods, do not affect lambdas in an unchanged method.

On Jul 23, 2013, at 10:35 AM, Scott Stark wrote:

> Red Hat has a concern regarding how fragile the default serialization behavior of lambda expressions is in the current reference implementation, currently:
> ironmaiden:OpenJDK starksm$ /Library/Java/JavaVirtualMachines/jdk1.8.0.jdk/Contents/Home/bin/java -version
> java version "1.8.0-ea"
> Java(TM) SE Runtime Environment (build 1.8.0-ea-b98)
> Java HotSpot(TM) 64-Bit Server VM (build 25.0-b40, mixed mode)
>
> The problem is that the serialized form of a lambda expression depends on the order in which captured arguments are declared. The attached simple example demonstrates how easy it is for a trivial reordering of the lambda code block to result in an inability to deserialize a previously saved expression.
>
> To produce this exception:
> 1.
Run the serialization.AuthenticationContext.testWriteLambda method with the lambda expression written as:
> Authenticator a = (Authenticator & Serializable) (String principal, char[] pass) -> {
>     // Run with p declared first when writing out the /tmp/testWriteLambda.bin, then switch
>     // to declare u first when running testReadLambda
>     String p = "-> Password " + password + " <-";
>     String u = "-> User " + user + " <-";
>     return u + " " + p;
> };
> 2. Change the lambda expression to:
> Authenticator a = (Authenticator & Serializable) (String principal, char[] pass) -> {
>     // Run with p declared first when writing out the /tmp/testWriteLambda.bin, then switch
>     // to declare u first when running testReadLambda
>     String u = "-> User " + user + " <-";
>     String p = "-> Password " + password + " <-";
>     return u + " " + p;
> };
>
> Recompile and run serialization.AuthenticationContext.testReadLambda to produce:
>
> java.io.IOException: unexpected exception type
>     at java.io.ObjectStreamClass.throwMiscException(ObjectStreamClass.java:1538)
>     at java.io.ObjectStreamClass.invokeReadResolve(ObjectStreamClass.java:1110)
>     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1807)
>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
>     at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
>     at serialization.AuthenticationContext.testReadLambda(AuthenticationContext.java:34)
> ...
> Caused by: java.lang.reflect.InvocationTargetException
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.invoke.SerializedLambda.readResolve(SerializedLambda.java:222)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.io.ObjectStreamClass.invokeReadResolve(ObjectStreamClass.java:1104)
> ... 30 more
> Caused by: java.lang.IllegalArgumentException: Invalid lambda deserialization
>     at serialization.AuthenticationContext.$deserializeLambda$(AuthenticationContext.java:1)
> ... 40 more
>
> One does not see the same level of sensitivity to the ordering of the serialization fields in a POJO, as demonstrated by the serialization.AuthenticationContext.testWritePOJO/testReadPOJO cases, where one can reorder the TestPOJO.{user,password} fields without having serialization fail.
>
> We would like to see at least that level of stability in the serialized form of lambda expressions.

From sstark at redhat.com Mon Aug 5 11:09:14 2013
From: sstark at redhat.com (Scott Stark)
Date: Mon, 5 Aug 2013 14:09:14 -0400 (EDT)
Subject: Stability of lambda serialization
In-Reply-To: 
References: <572125674.885842.1374600926640.JavaMail.root@redhat.com> <4FE8395D-2CDA-44A8-95F3-884F04899569@oracle.com>
Message-ID: <206041618.5397062.1375726154541.JavaMail.root@redhat.com>

So Red Hat will go on the record to state that if serialization of lambdas is beyond the scope that DML suggested, we will vote no on the JSR. As a concluding remark on this issue, consider an alternate form of the example given where the user and password variable types do not differ.
In this situation, the reordering results in the password being seen as the user. I can see that this will be the subject of yet another CVE somewhere down the road.

Running testReadLambda after reorder...
-> User secret1password <- -> Password bob <-

----- Original Message -----
From: "Paul Benedict"
To: lambda-spec-observers at openjdk.java.net
Cc: "Scott Stark" , lambda-spec-experts at openjdk.java.net
Sent: Saturday, August 3, 2013 4:54:58 PM
Subject: Re: Stability of lambda serialization

I imagine this will mostly impact EE applications in a very negative way. It's one thing if my method signatures change, or I explicitly add some fields, but now adding/removing a local lambda will do that? Serialization errors are the worst thing in EE -- and this kind of thing just heaps on more misery.

Paul

On Sat, Aug 3, 2013 at 2:00 AM, Brian Goetz wrote:

> This was discussed in the meeting after JVMLS this week. The consensus was that, while *this particular* issue could be addressed, there is an infinite spectrum of similar issues that cannot be addressed, and that it is preferable to draw a clean line about what the user can expect in terms of code changes destabilizing lambdas.
>
> That line is: If the code inside a method, or the method's signature, changes *in any way*, lambdas captured in that method should be considered destabilized. Changes to other methods, or changes to the order of methods, do not affect lambdas in an unchanged method.
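The round trip Scott describes can be sketched in a single self-contained class. This is a hypothetical reconstruction, not the actual test code from the thread: the `Authenticator` interface and class names below are stand-ins. The intersection cast is what makes the lambda serializable, and the captured locals (`user`, `password`) are recorded in the serialized form in declaration order, which is why swapping the declarations and recompiling breaks old streams:

```java
import java.io.*;

public class LambdaRoundTrip {
    // Hypothetical stand-in for the Authenticator interface in the example;
    // the real serialization.AuthenticationContext classes are not shown here.
    interface Authenticator {
        String authenticate(String principal, char[] pass);
    }

    static String roundTrip() throws Exception {
        String user = "bob";
        String password = "secret1password";
        // The intersection cast makes the lambda serializable; javac records
        // the captured arguments (user, password) in capture order.
        Authenticator a = (Authenticator & Serializable) (String principal, char[] pass) -> {
            String p = "-> Password " + password + " <-";
            String u = "-> User " + user + " <-";
            return u + " " + p;
        };
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(a);
        }
        // Reading back succeeds here only because the same compiled method
        // body performs both the write and the read; recompiling with the
        // declarations of u and p swapped breaks the stream, as reported.
        try (ObjectInputStream ois = new ObjectInputStream(
                new ByteArrayInputStream(bos.toByteArray()))) {
            Authenticator b = (Authenticator) ois.readObject();
            return b.authenticate("bob", new char[0]);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip());
    }
}
```

Running both halves against the same build prints the composed string; the failure mode in the thread only appears when the stream is written by one build and read by another.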
> > On Jul 23, 2013, at 10:35 AM, Scott Stark wrote:
> > > Red Hat has a concern regarding how fragile the default serialization behavior of lambda expressions is in the current reference implementation [...]
> > > We would like to see at least that level of stability of the serialized form of lambda expressions.

-- 
Cheers, Paul

From forax at univ-mlv.fr Mon Aug 5 12:08:28 2013
From: forax at univ-mlv.fr (Remi Forax)
Date: Mon, 05 Aug 2013 21:08:28 +0200
Subject: Stability of lambda serialization
In-Reply-To: <206041618.5397062.1375726154541.JavaMail.root@redhat.com>
References: <572125674.885842.1374600926640.JavaMail.root@redhat.com> <4FE8395D-2CDA-44A8-95F3-884F04899569@oracle.com> <206041618.5397062.1375726154541.JavaMail.root@redhat.com>
Message-ID: <51FFF82C.50608@univ-mlv.fr>

You really want to serialize a password :) Funny idea.

Anyway, we all know that serialization is broken in many ways. Usually, the only scenario that works is when the same code is used on both sides of the wire; trying to do anything different is like playing Russian roulette. It may work for some time.

If you change the code of an inner class, you will have the same issue as the one you describe with lambdas. You may argue that lambdas should be better than inner classes, but even if we find a way to make the serialization stable under this kind of change, there are still plenty of changes, for example switching two lambdas in the same method, that make the serialization unstable.

So the EG decided that serialization of lambdas should be no better and no worse than serialization of inner classes.

Rémi

On 08/05/2013 08:09 PM, Scott Stark wrote:
> So Red Hat will go on the record to state that if serialization of lambdas is beyond the scope that DML suggested, we will vote no on the JSR.
> As a concluding remark on this issue, consider an alternate form of the example given where the user and password variable types do not differ. In this situation, the reordering results in the password being seen as the user. I can see that this will be the subject of yet another CVE somewhere down the road.
>
> Running testReadLambda after reorder...
> -> User secret1password <- -> Password bob <-
>
> [...]

From david.lloyd at redhat.com Mon Aug 5 15:15:06 2013
From: david.lloyd at redhat.com (David M. Lloyd)
Date: Mon, 05 Aug 2013 17:15:06 -0500
Subject: Stability of lambda serialization
In-Reply-To: <51FFF82C.50608@univ-mlv.fr>
References: <572125674.885842.1374600926640.JavaMail.root@redhat.com> <4FE8395D-2CDA-44A8-95F3-884F04899569@oracle.com> <206041618.5397062.1375726154541.JavaMail.root@redhat.com> <51FFF82C.50608@univ-mlv.fr>
Message-ID: <520023EA.80906@redhat.com>

This was and is a bad decision. Some things serialize better than others. It makes far more sense to attempt to serialize as well as the things which serialize in the best possible way (i.e. with the fewest known issues). By simply requiring stable names for all serializable state, you achieve the best possible result.

Justifying one bad solution because of another existing bad solution is bad engineering, and results in the perpetuation of bad software. There's absolutely no reason to settle for this. There are simply no benefits to it other than the emotional comfort of being similar to something that exists, which is a foolish benchmark. If anonymous inner classes were not implemented in this manner, why would that make a difference to the way lambdas are serialized?

The password example is deliberately chosen to be a simple expression of the potential problem. I imagine much more sophisticated problems resulting from modified sort order, information exposure from inadvertent capture, and so on. It's so much simpler to simply forbid serializing capturing lambdas, and much safer besides.
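A framework wishing to enforce the restriction David proposes could, at least in principle, reject capturing lambdas before writing them. The sketch below is a hypothetical illustration (the `Greeter` interface and `capturedArgCount` helper are inventions, not code from this thread); it relies on the real `java.lang.invoke.SerializedLambda` API, which a serializable lambda exposes through a compiler-generated `writeReplace` method:

```java
import java.io.Serializable;
import java.lang.invoke.SerializedLambda;
import java.lang.reflect.Method;

public class CaptureCheck {
    // Hypothetical serializable functional interface for illustration.
    interface Greeter extends Serializable {
        String greet();
    }

    // A serializable lambda's class carries a private, compiler-generated
    // writeReplace() returning java.lang.invoke.SerializedLambda; its
    // getCapturedArgCount() reveals whether the lambda captured any state.
    static int capturedArgCount(Serializable lambda) throws Exception {
        Method writeReplace = lambda.getClass().getDeclaredMethod("writeReplace");
        writeReplace.setAccessible(true);
        SerializedLambda sl = (SerializedLambda) writeReplace.invoke(lambda);
        return sl.getCapturedArgCount();
    }

    public static void main(String[] args) throws Exception {
        Greeter plain = () -> "hello";
        String name = "bob";
        Greeter capturing = () -> "hello " + name;
        System.out.println(capturedArgCount(plain));     // nothing captured
        System.out.println(capturedArgCount(capturing)); // captured `name`
    }
}
```

A guard like this would let a library fail fast at write time rather than surface an `IllegalArgumentException` from `$deserializeLambda$` much later at read time.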
On the other hand, I don't think we have really captured any compelling use cases for supporting serializable, capturing lambdas other than a lot of hand-wavy kinds of things.

On 08/05/2013 02:08 PM, Remi Forax wrote:
> You really want to serialize a password :) Funny idea.
> [...]
> So the EG decided that serialization of lambdas should be no better and no worse than serialization of inner classes.
>
> Rémi

-- 
- DML

From daniel.smith at oracle.com Mon Aug 5 16:49:46 2013
From: daniel.smith at oracle.com (Dan Smith)
Date: Mon, 5 Aug 2013 16:49:46 -0700
Subject: Stability of lambda serialization
In-Reply-To: <520023EA.80906@redhat.com>
References: <572125674.885842.1374600926640.JavaMail.root@redhat.com> <4FE8395D-2CDA-44A8-95F3-884F04899569@oracle.com> <206041618.5397062.1375726154541.JavaMail.root@redhat.com> <51FFF82C.50608@univ-mlv.fr> <520023EA.80906@redhat.com>
Message-ID: 

On Sat, Aug 3, 2013 at 2:00 AM, Brian Goetz wrote:
> while *this particular* issue could be addressed, there is an infinite spectrum of similar issues that cannot be addressed, and that it is preferable to draw a clean line about what the user can expect in terms of code changes destabilizing lambdas.
>
> That line is: If the code inside a method, or the method's signature, changes *in any way*, lambdas captured in that method should be considered destabilized.

On Aug 5, 2013, at 3:15 PM, David M. Lloyd wrote:
> It's so much simpler to simply forbid serializing capturing lambdas, and much safer besides.

It's not clear to me why you feel that problems arising from captured variables are more serious than other problems that can arise from changes to the code inside a method.
For example, if I rearrange the order in which lambdas appear, old serialized (non-capturing) lambdas may match up with the wrong lambda on deserialization.

Also note that there are really two issues here: first, which changes are considered destabilizing, and second, what to do with destabilized lambdas.

What is destabilizing? EG: any change to a method; you: reordering captured variables.

What to do? EG: make a best effort, with documented caveats; you: conservatively prohibit serialization of capturing lambdas; third alternative: conservatively detect problems and break at deserialization.

If we have zero tolerance for destabilization of the serialized form, then we should either prohibit serialization of _all_ lambdas, or somehow encode the method contents (a hash?) and detect changes at deserialization time. Prohibiting just capturing lambdas is a half measure. The EG agreed instead to be tolerant, acknowledging that in the presence of a destabilizing change, sometimes everything works perfectly well, and occasionally things will break.

- Dan

From brian.goetz at oracle.com Mon Aug 5 16:52:29 2013
From: brian.goetz at oracle.com (Brian Goetz)
Date: Mon, 5 Aug 2013 16:52:29 -0700
Subject: Stability of lambda serialization
In-Reply-To: <520023EA.80906@redhat.com>
References: <572125674.885842.1374600926640.JavaMail.root@redhat.com> <4FE8395D-2CDA-44A8-95F3-884F04899569@oracle.com> <206041618.5397062.1375726154541.JavaMail.root@redhat.com> <51FFF82C.50608@univ-mlv.fr> <520023EA.80906@redhat.com>
Message-ID: 

I understand your objection. This was a difficult issue, and one which occupied a frustratingly large amount of time for the expert group. This is an area where there are no good solutions, just tradeoffs between different levels of user surprise, stability, and complexity.
Different parties may have different preferences due to their own subjective ordering of the undesirability of different failure modes, and therefore may reach different conclusions about which solutions are more or less desirable. The consensus of the EG has been -- with the sole exception of Red Hat -- that the current target is the least bad of the options proposed, and that's what we chose to pursue.

Named lambdas were included on the list of issues gathered at the recent meeting to consider for the next round. Indeed, named lambdas would address this problem once and for all -- and we all knew this when it was raised the previous several times it came up. We left room to pursue them later, but for the time being, the EG felt that the syntactic overhead did not pull its weight.

The idea of restricting lambda serialization to non-capturing lambdas gained zero traction when it was proposed, in part because it makes things more confusing for the user, who now has to reason about which lambdas are capturing (it's not like C++, where capturing lambdas are syntactically distinct from non-capturing ones).

It is unfortunate that Red Hat sees fit to threaten a No vote because an extensively discussed decision went in a direction they did not prefer. But the reality is, this issue *was* extensively discussed, Red Hat had many opportunities to attempt to win other EG members over to their preference, and the EG still -- after reopening the issue multiple times -- preferred the "weakly serializable" guarantees offered by the current model.

On Aug 5, 2013, at 3:15 PM, David M. Lloyd wrote:

> This was and is a bad decision. Some things serialize better than others. It makes far more sense to attempt to serialize as well as the things which serialize in the best possible way (i.e. fewest known issues). By simply requiring stable names for all serializable state, you achieve the best possible result.
> > Justifying one bad solution because of another existing bad solution is bad engineering, and results in perpetuation of bad software. There's absolutely no reason to settle for this. There are simply no benefits to it other than the emotional comfort of being similar to something that exists, which is a foolish benchmark. If anonymous inner classes were not implemented in this manner, why would that make a difference to the way lambdas are serialized? > > The password example is deliberately chosen to be a simple expression of the potential problem. I imagine much more sophisticated problems resulting from modified sort order, information exposure from inadvertent capture, and so on. It's so much simpler to simply forbid serializing capturing lambdas, and much safer besides. On the other hand, I don't think we have really captured any compelling use cases for supporting serializable, capturing lambdas other than a lot of hand-wavy kinds of things. > > On 08/05/2013 02:08 PM, Remi Forax wrote: >> You really want to serialize a password :) Funny idea. >> >> Anyway, we all know that serialization is broken, in many ways. >> Usually, the only scenario that works is when the same code is used by >> on both side of the wires, >> trying to do something different is like playing the Russian roulette. >> It may works for some time. >> >> If you change the code of an inner classes, you will have the same issue >> as the one you describe >> with lambdas. You may argue that lambdas should be better than inner >> classes, >> but given that even if we find a way to make the serialization stable to >> this kind of change >> there are still a lot of changes, like by example switching two lambdas >> in the same method, >> that makes the serialization not stable. >> >> So the EG had decided that serialization of lambdas should be not better >> and not worst >> than serialization of inner classes. 
>> >> Rémi >> >> >> On 08/05/2013 08:09 PM, Scott Stark wrote: >>> So Red Hat will go on the record to state that if serialization of >>> lambdas is beyond the scope that DML suggested, we will vote no on the >>> JSR. As a concluding remark on this issue, consider an alternate form of >>> the example given where the user and password variable types do not >>> differ. In this situation, the reordering results in the password being seen >>> as the user. I can see that this will be the subject of yet another >>> CVE somewhere down the road. >>> >>> >>> Running testReadLambda after reorder... >>> -> User secret1password <- -> Password bob <- >>> >>> >>> >>> ----- Original Message ----- >>> From: "Paul Benedict" >>> To: lambda-spec-observers at openjdk.java.net >>> Cc: "Scott Stark" , >>> lambda-spec-experts at openjdk.java.net >>> Sent: Saturday, August 3, 2013 4:54:58 PM >>> Subject: Re: Stability of lambda serialization >>> >>> I imagine this will mostly impact EE applications in a very negative way. >>> It's one thing if my method signatures change, or I explicitly add some >>> fields, but now adding/removing a local lambda will do that? >>> Serialization >>> errors are the worst thing in EE -- and this kind of just heaps on more >>> misery. >>> >>> Paul >>> >>> >>> On Sat, Aug 3, 2013 at 2:00 AM, Brian Goetz >>> wrote: >>> >>>> This was discussed in the meeting after JVMLS this week. The consensus >>>> was that, while *this particular* issue could be addressed, there is an >>>> infinite spectrum of similar issues that cannot be addressed, and >>>> that it >>>> is preferable to draw a clean line about what the user can expect in >>>> terms >>>> of code changes destabilizing lambdas. >>>> >>>> That line is: If the code inside a method, or the method's signature, >>>> changes *in any way*, lambdas captured in that method should be >>>> considered >>>> destabilized. Changes to other methods, or changes to the order of >>>> methods, do not affect lambdas in an unchanged method. 
>>>> >>>> On Jul 23, 2013, at 10:35 AM, Scott Stark wrote: >>>> >>>>> Red Hat has a concern regarding how fragile the default serialization >>>> behavior of lambda expressions is in the current reference >>>> implementation, >>>> currently: >>>>> ironmaiden:OpenJDK starksm$ >>>> /Library/Java/JavaVirtualMachines/jdk1.8.0.jdk/Contents/Home/bin/java >>>> -version >>>>> java version "1.8.0-ea" >>>>> Java(TM) SE Runtime Environment (build 1.8.0-ea-b98) >>>>> Java HotSpot(TM) 64-Bit Server VM (build 25.0-b40, mixed mode) >>>>> >>>>> The problem is that the serialized form of a lambda expression depends >>>> on the order in which captured arguments are declared. The attached >>>> simple >>>> example demonstrates how easy it is for a trivial reordering of the >>>> lambda >>>> code block to result in an inability to deserialize a previously saved >>>> expression. >>>>> To produce this exception: >>>>> 1. Run the serialization.AuthenticationContext.testWriteLambda method >>>> with the lambda expression written as: >>>>> Authenticator a = (Authenticator & Serializable) (String >>>>> principal, >>>> char[] pass) -> { >>>>> // Run with p declared first when writing out the >>>> /tmp/testWriteLambda.bin, then switch >>>>> // to declare u first when running testReadLambda >>>>> String p = "-> Password " + password + " <-"; >>>>> String u = "-> User " + user + " <-"; >>>>> return u + " " + p; >>>>> }; >>>>> 2. 
Change the lambda expression to: >>>>> Authenticator a = (Authenticator & Serializable) (String >>>>> principal, >>>> char[] pass) -> { >>>>> // Run with p declared first when writing out the >>>> /tmp/testWriteLambda.bin, then switch >>>>> // to declare u first when running testReadLambda >>>>> String u = "-> User " + user + " <-"; >>>>> String p = "-> Password " + password + " <-"; >>>>> return u + " " + p; >>>>> }; >>>>> >>>>> Recompile and run serialization.AuthenticationContext.testReadLambda to >>>> produce: >>>>> java.io.IOException: unexpected exception type >>>>> at >>>> java.io.ObjectStreamClass.throwMiscException(ObjectStreamClass.java:1538) >>>> >>>>> at >>>> java.io.ObjectStreamClass.invokeReadResolve(ObjectStreamClass.java:1110) >>>>> at >>>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1807) >>>> >>>>> at >>>> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348) >>>>> at >>>>> java.io.ObjectInputStream.readObject(ObjectInputStream.java:370) >>>>> at >>>> serialization.AuthenticationContext.testReadLambda(AuthenticationContext.java:34) >>>> >>>>> ... >>>>> Caused by: java.lang.reflect.InvocationTargetException >>>>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) >>>>> at >>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) >>>> >>>>> at >>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) >>>> >>>>> at >>>> java.lang.invoke.SerializedLambda.readResolve(SerializedLambda.java:222) >>>>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) >>>>> at >>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) >>>> >>>>> at >>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) >>>> >>>>> at >>>> java.io.ObjectStreamClass.invokeReadResolve(ObjectStreamClass.java:1104) >>>>> ... 
30 more >>>>> Caused by: java.lang.IllegalArgumentException: Invalid lambda >>>> deserialization >>>>> at >>>> serialization.AuthenticationContext.$deserializeLambda$(AuthenticationContext.java:1) >>>> >>>>> ... 40 more >>>>> >>>>> One does not see the same level of sensitivity to the ordering of the >>>> serialization fields in a POJO as demonstrated by the >>>> serialization.AuthenticationContext.testWritePOJO/testReadPOJO cases >>>> where >>>> one can reorder the TestPOJO.{user,password} fields without having >>>> serialization fail. >>>>> We would like to see at least that level of stability of the serialized >>>> form of lambda expressions. >>>>> >>>> >>> >> > > > -- > - DML From david.lloyd at redhat.com Tue Aug 6 06:17:53 2013 From: david.lloyd at redhat.com (David M. Lloyd) Date: Tue, 06 Aug 2013 08:17:53 -0500 Subject: Stability of lambda serialization In-Reply-To: References: <572125674.885842.1374600926640.JavaMail.root@redhat.com> <4FE8395D-2CDA-44A8-95F3-884F04899569@oracle.com> <206041618.5397062.1375726154541.JavaMail.root@redhat.com> <51FFF82C.50608@univ-mlv.fr> <520023EA.80906@redhat.com> Message-ID: <5200F781.8080501@redhat.com> On 08/05/2013 06:49 PM, Dan Smith wrote: > > On Sat, Aug 3, 2013 at 2:00 AM, Brian Goetz > wrote: > >> while *this particular* issue could be addressed, there is an >> infinite spectrum of similar issues that cannot be addressed, and >> that it is preferable to draw a clean line about what the user can >> expect in terms of code changes destabilizing lambdas. >> >> That line is: If the code inside a method, or the method's >> signature, changes *in any way*, lambdas captured in that method >> should be considered destabilized. > > > On Aug 5, 2013, at 3:15 PM, David M. Lloyd > wrote: > >> It's so much simpler to simply forbid serializing capturing >> lambdas, and much safer besides. 
> > It's not clear to me why you feel that problems arising from captured > variables are more serious than other problems that can arise from > changes to the code inside a method. For example, if I rearrange the > order in which lambdas appear, old serialized (non-capturing) lambdas > may match up with the wrong lambda on deserialization. I do not think these problems are more serious. In fact I had proposed multiple times to limit serializability to named method refs due to this potential issue. > Also note that there are really two issues here: first, which changes > are considered destabilizing, and second, what to do with > destabilized lambdas. > > What is destabilizing? EG: any change to a method; you: reordering > captured variables Me: Reordering captured variables, reordering lambda incidence. The EG's stance is just a generalization. It's not a stance in any case: things which destabilize lambdas in terms of serializability are not a question of opinion, and it's bizarre to frame it that way. > What to do? EG: make a best effort, with documented caveats; you: > conservatively prohibit serialization of capturing lambdas; third > alternative: conservatively detect problems and break at > deserialization I'm OK with either "you" or "third alternative". > If we have zero tolerance for destabilization of the serialized form, > then we should either prohibit serialization of _all_ lambdas, or > somehow encode the method contents (a hash?) and detect changes at > deserialization time. Prohibiting just capturing lambdas is a half > measure. I always advocate security over tolerance - to do otherwise invites CVEs (a possibly familiar story). Capturing lambdas have been the focus of a couple recent emails of mine but indeed I do not think we should have any tolerance for any destabilizing of serialized lambdas. 
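[For background, the round trip under discussion can be exercised within a single JVM; with an unchanged class it succeeds, and the breakage reported above appears only once the enclosing class is recompiled with edits. A sketch, with the Authenticator interface from the original report replaced by a plain Supplier:]

```java
import java.io.*;
import java.util.function.Supplier;

public class LambdaRoundTrip {
    // Serialize a capturing, serializable lambda and read it straight back
    // in the same JVM, where the class that declared it is unchanged.
    static String roundTrip() throws Exception {
        String user = "bob";
        Supplier<String> s =
            (Supplier<String> & Serializable) () -> "-> User " + user + " <-";

        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(s);  // written as a java.lang.invoke.SerializedLambda
        }
        try (ObjectInputStream in = new ObjectInputStream(
                 new ByteArrayInputStream(bytes.toByteArray()))) {
            @SuppressWarnings("unchecked")
            Supplier<String> copy = (Supplier<String>) in.readObject();
            return copy.get();  // resolved via this class's $deserializeLambda$
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip());  // -> User bob <-
    }
}
```

[The "Invalid lambda deserialization" failure in the original report is this same path, but with a stream written by a previous compilation of the class.]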
I think that calculating a non-changeable serialVersionUID might be a good way forward, if we can work out what must enter in to the calculation (perhaps it's simply the entire compilation unit which includes the lambda). > The EG agreed instead to be tolerant, acknowledging that in the > presence of a destabilizing change, sometimes everything works > perfectly well, and occasionally things will break. I'd say breakage is the least concern. But in any case I do not agree, and unless we can come to some solution, we will codify that disagreement in a No vote, since that's what it's for, after all. -- - DML From dl at cs.oswego.edu Tue Aug 6 07:36:25 2013 From: dl at cs.oswego.edu (Doug Lea) Date: Tue, 06 Aug 2013 10:36:25 -0400 Subject: Stability of lambda serialization In-Reply-To: <5200F781.8080501@redhat.com> References: <572125674.885842.1374600926640.JavaMail.root@redhat.com> <4FE8395D-2CDA-44A8-95F3-884F04899569@oracle.com> <206041618.5397062.1375726154541.JavaMail.root@redhat.com> <51FFF82C.50608@univ-mlv.fr> <520023EA.80906@redhat.com> <5200F781.8080501@redhat.com> Message-ID: <520109E9.4050304@cs.oswego.edu> On 08/06/13 09:17, David M. Lloyd wrote: > Me: Reordering captured variables, reordering lambda incidence. The EG's stance > is just a generalization. It's not a stance in any case: things which > destabilize lambdas in terms of serializability are not a question of opinion, > and it's bizarre to frame it that way. > >> What to do? EG: make a best effort, with documented caveats; you: >> conservatively prohibit serialization of capturing lambdas; third >> alternative: conservatively detect problems and break at >> deserialization > > I'm OK with either "you" or "third alternative". I'm OK with 3rd alternative if some reasonably efficient checksum/serial id ensuring breakage could be devised. David, any ideas? 
-Doug > >> If we have zero tolerance for destabilization of the serialized form, >> then we should either prohibit serialization of _all_ lambdas, or >> somehow encode the method contents (a hash?) and detect changes at >> deserialization time. Prohibiting just capturing lambdas is a half >> measure. > > I always advocate security over tolerance - to do otherwise invites CVEs (a > possibly familiar story). Capturing lambdas have been the focus of a couple > recent emails of mine but indeed I do not think we should have any tolerance for > any destabilizing of serialized lambdas. > > I think that calculating a non-changeable serialVersionUID might be a good way > forward, if we can work out what must enter into the calculation (perhaps it's > simply the entire compilation unit which includes the lambda). > >> The EG agreed instead to be tolerant, acknowledging that in the >> presence of a destabilizing change, sometimes everything works >> perfectly well, and occasionally things will break. > > I'd say breakage is the least concern. But in any case I do not agree, and > unless we can come to some solution, we will codify that disagreement in a No > vote, since that's what it's for, after all. From dl at cs.oswego.edu Tue Aug 6 08:42:21 2013 From: dl at cs.oswego.edu (Doug Lea) Date: Tue, 06 Aug 2013 11:42:21 -0400 Subject: FJ parallelism zero Message-ID: <5201195D.1050501@cs.oswego.edu> Mostly to Brian, but other thoughts welcome: As a compromise of sorts with the EE folks, we allow the ForkJoinPool.commonPool to be set to parallelism zero via a system property. By default we set it to Runtime.availableProcessors()-1. All the Stream stuff can handle parallelism zero (in these cases, the submitter ends up executing all the subtasks). 
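[For reference, a quick way to observe the common pool's configured parallelism and the property that controls it; the property name is real JDK API, while whether zero should be permitted is exactly the question here:]

```java
import java.util.concurrent.ForkJoinPool;

public class CommonPoolCheck {
    public static void main(String[] args) {
        // With no property set, the common pool's parallelism derives from
        // Runtime.getRuntime().availableProcessors() - 1; it can be forced
        // with -Djava.util.concurrent.ForkJoinPool.common.parallelism=N.
        // Under the minimum-of-one behavior discussed here, async tasks
        // submitted to the common pool always have at least one worker.
        int p = ForkJoinPool.getCommonPoolParallelism();
        System.out.println("common pool parallelism = " + p);
    }
}
```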
But when the ForkJoinPool.commonPool is used for purely async stuff, it is surprising at best that tasks may never run unless somehow joined or the submitter takes some alternative action (which we sometimes do in other JDK usages of commonPool). I think the EE folks are in general OK with the surprise. But not so for those few people out there running JDK8 on uniprocessors, with default configurations, who thus end up with parallelism zero. So unless anyone can think of a reason otherwise, I'm going to change to a minimum parallelism of one unless set to zero by system property. (Brian: do you have any recollection of why we didn't do it this way in the first place?) -Doug From david.lloyd at redhat.com Tue Aug 6 09:14:53 2013 From: david.lloyd at redhat.com (David M. Lloyd) Date: Tue, 06 Aug 2013 11:14:53 -0500 Subject: Stability of lambda serialization In-Reply-To: <520109E9.4050304@cs.oswego.edu> References: <572125674.885842.1374600926640.JavaMail.root@redhat.com> <4FE8395D-2CDA-44A8-95F3-884F04899569@oracle.com> <206041618.5397062.1375726154541.JavaMail.root@redhat.com> <51FFF82C.50608@univ-mlv.fr> <520023EA.80906@redhat.com> <5200F781.8080501@redhat.com> <520109E9.4050304@cs.oswego.edu> Message-ID: <520120FD.3040004@redhat.com> On 08/06/2013 09:36 AM, Doug Lea wrote: > On 08/06/13 09:17, David M. Lloyd wrote: > >> Me: Reordering captured variables, reordering lambda incidence. The >> EG's stance >> is just a generalization. It's not a stance in any case: things which >> destabilize lambdas in terms of serializability are not a question of >> opinion, >> and it's bizarre to frame it that way. >> >>> What to do? EG: make a best effort, with documented caveats; you: >>> conservatively prohibit serialization of capturing lambdas; third >>> alternative: conservatively detect problems and break at >>> deserialization >> >> I'm OK with either "you" or "third alternative". 
> > I'm OK with 3rd alternative if some reasonably efficient > checksum/serial id ensuring breakage could be devised. > David, any ideas? It's a good idea. I can think of a few requirements offhand: * Generation of the hash would necessarily occur at compile time * The hash would have to be unique for each lambda within a class and/or compilation unit * The hash would have to be sensitive to any changes which would cause any indeterminism in how the lambda is resolved - this may extend to hashing even the bytecode of methods which include the lambda. This is the key/most complex concept to tackle to make this solution work. * The last step is to tag the UID value on to the serialized lambda representation. I don't think there is much more to it than this; the hardest part is determining what/how to hash. If it happens at compile time then resolution at run time (i.e. the more performance-sensitive context) should be the same kind of numerical comparison which is already done for serialVersionUID. > > -Doug > > > >> >>> If we have zero tolerance for destabilization of the serialized form, >>> then we should either prohibit serialization of _all_ lambdas, or >>> somehow encode the method contents (a hash?) and detect changes at >>> deserialization time. Prohibiting just capturing lambdas is a half >>> measure. >> >> I always advocate security over tolerance - to do otherwise invites >> CVEs (a >> possibly familiar story). Capturing lambdas has been the focus of a >> couple >> recent emails of mine but indeed I do not think we should have any >> tolerance for >> any destabilizing of serialized lambdas. >> >> I think that calculating a non-changeable serialVersionUID might be a >> good way >> forward, if we can work out what must enter in to the calculation >> (perhaps it's >> simply the entire compilation unit which includes the lambda). 
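[As a reference point for what the UID would be tagged onto: the information the serialized form already records can be inspected directly. This sketch digs the SerializedLambda out via the synthetic writeReplace() method the compiler adds to serializable lambdas; the reflection is illustrative only, but SerializedLambda and its accessors are real JDK API:]

```java
import java.io.Serializable;
import java.lang.invoke.SerializedLambda;
import java.lang.reflect.Method;
import java.util.function.Supplier;

public class InspectSerializedForm {
    // Every serializable lambda class gets a synthetic writeReplace() that
    // returns its SerializedLambda; dig it out reflectively.
    static SerializedLambda serializedFormOf(Object lambda) throws Exception {
        Method writeReplace = lambda.getClass().getDeclaredMethod("writeReplace");
        writeReplace.setAccessible(true);
        return (SerializedLambda) writeReplace.invoke(lambda);
    }

    public static void main(String[] args) throws Exception {
        String user = "bob";
        Supplier<String> s =
            (Supplier<String> & Serializable) () -> "-> User " + user + " <-";
        SerializedLambda sl = serializedFormOf(s);
        // The stream records the synthetic implementation method's name and
        // the captured arguments in declaration order; both shift when the
        // enclosing method is edited, which is exactly the fragility at issue.
        System.out.println(sl.getImplMethodName());   // e.g. "lambda$main$0"
        System.out.println(sl.getCapturedArgCount()); // 1: the captured 'user'
    }
}
```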
>> >>> The EG agreed instead to be tolerant, acknowledging that in the >>> presence of a destabilizing change, sometimes everything works >>> perfectly well, and occasionally things will break. >> >> I'd say breakage is the least concern. But in any case I do not >> agree, and >> unless we can come to some solution, we will codify that disagreement >> in a No >> vote, since that's what it's for, after all. > -- - DML From david.lloyd at redhat.com Tue Aug 6 09:43:40 2013 From: david.lloyd at redhat.com (David M. Lloyd) Date: Tue, 06 Aug 2013 11:43:40 -0500 Subject: Stability of lambda serialization In-Reply-To: <520120FD.3040004@redhat.com> References: <572125674.885842.1374600926640.JavaMail.root@redhat.com> <4FE8395D-2CDA-44A8-95F3-884F04899569@oracle.com> <206041618.5397062.1375726154541.JavaMail.root@redhat.com> <51FFF82C.50608@univ-mlv.fr> <520023EA.80906@redhat.com> <5200F781.8080501@redhat.com> <520109E9.4050304@cs.oswego.edu> <520120FD.3040004@redhat.com> Message-ID: <520127BC.9090705@redhat.com> On 08/06/2013 11:14 AM, David M. Lloyd wrote: > On 08/06/2013 09:36 AM, Doug Lea wrote: >> On 08/06/13 09:17, David M. Lloyd wrote: >> >>> Me: Reordering captured variables, reordering lambda incidence. The >>> EG's stance >>> is just a generalization. It's not a stance in any case: things which >>> destabilize lambdas in terms of serializability are not a question of >>> opinion, >>> and it's bizarre to frame it that way. >>> >>>> What to do? EG: make a best effort, with documented caveats; you: >>>> conservatively prohibit serialization of capturing lambdas; third >>>> alternative: conservatively detect problems and break at >>>> deserialization >>> >>> I'm OK with either "you" or "third alternative". >> >> I'm OK with 3rd alternative if some reasonably efficient >> checksum/serial id ensuring breakage could be devised. >> David, any ideas? > > It's a good idea. 
I can think of a few requirements offhand: > > * Generation of the hash would necessarily occur at compile time > * The hash would have to be unique for each lambda within a class and/or > compilation unit > * The hash would have to be sensitive to any changes which would cause > any indeterminism in how the lambda is resolved - this may extend to > hashing even the bytecode of methods which include the lambda. This is > the key/most complex concept to tackle to make this solution work. > * The last step is to tag the UID value on to the serialized lambda > representation. > > I don't think there is much more to it than this; the hardest part is > determining what/how to hash. If it happens at compile time then > resolution at run time (i.e. the more performance-sensitive context) > should be the same kind of numerical comparison which is already done > for serialVersionUID. Brian pointed out a couple things: * Such a scheme would have to be very strongly and clearly specified * The scheme cannot depend on any particular non-spec compiler behavior (i.e. the same source file should create the same hashes regardless of compiler version or vendor) I suggested as a possible starting point a scheme which could create a 64-bit hash based on a combination of: * Any captured variables' name and declaration order * The declaration order of the lambda * Information about the enclosing method: name and signature, maybe decl order? (though it should be redundant wrt. the lambda decl order) * The usual serialVersionUID calculation I would really appreciate anyone's thoughts as to the efficacy of this approach and any potential weaknesses; in particular I'd like to hear if anyone thinks this is a non-trivial change in terms of compilation and runtime. In particular, it is not 100% clear how the calculation would work with nested lambdas or lambdas nested in inner classes for example. 
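[As a hedged illustration only, those four inputs could be folded into a 64-bit value along these lines. The class name, the mixing function, and the signature strings below are invented for the sketch; ObjectStreamClass.getSerialVersionUID() is the real JDK entry point for the "usual" calculation:]

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

public final class ProposedLambdaUid {
    static long uidFor(Class<? extends Serializable> declaringClass,
                       String enclosingMethodNameAndSig,
                       int lambdaDeclOrder,
                       String... capturedVarNamesInOrder) {
        // Start from the usual serialVersionUID of the declaring class ...
        long h = ObjectStreamClass.lookup(declaringClass).getSerialVersionUID();
        // ... then mix in the enclosing method, the lambda's declaration
        // order, and each captured variable name in declaration order.
        h = h * 31 + enclosingMethodNameAndSig.hashCode();
        h = h * 31 + lambdaDeclOrder;
        for (String name : capturedVarNamesInOrder) {
            h = h * 31 + name.hashCode();  // order-sensitive by construction
        }
        return h;
    }
}
```

[Reordering the captured variables, or renumbering the lambda within the method, changes the UID, so the reordered-password example above would fail fast at deserialization rather than silently swap fields.]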
> >> >> -Doug >> >> >> >>> >>>> If we have zero tolerance for destabilization of the serialized form, >>>> then we should either prohibit serialization of _all_ lambdas, or >>>> somehow encode the method contents (a hash?) and detect changes at >>>> deserialization time. Prohibiting just capturing lambdas is a half >>>> measure. >>> >>> I always advocate security over tolerance - to do otherwise invites >>> CVEs (a >>> possibly familiar story). Capturing lambdas has been the focus of a >>> couple >>> recent emails of mine but indeed I do not think we should have any >>> tolerance for >>> any destabilizing of serialized lambdas. >>> >>> I think that calculating a non-changeable serialVersionUID might be a >>> good way >>> forward, if we can work out what must enter in to the calculation >>> (perhaps it's >>> simply the entire compilation unit which includes the lambda). >>> >>>> The EG agreed instead to be tolerant, acknowledging that in the >>>> presence of a destabilizing change, sometimes everything works >>>> perfectly well, and occasionally things will break. >>> >>> I'd say breakage is the least concern. But in any case I do not >>> agree, and >>> unless we can come to some solution, we will codify that disagreement >>> in a No >>> vote, since that's what it's for, after all. >> > > -- - DML From david.lloyd at redhat.com Tue Aug 6 10:04:50 2013 From: david.lloyd at redhat.com (David M. Lloyd) Date: Tue, 06 Aug 2013 12:04:50 -0500 Subject: Stability of lambda serialization In-Reply-To: <520127BC.9090705@redhat.com> References: <572125674.885842.1374600926640.JavaMail.root@redhat.com> <4FE8395D-2CDA-44A8-95F3-884F04899569@oracle.com> <206041618.5397062.1375726154541.JavaMail.root@redhat.com> <51FFF82C.50608@univ-mlv.fr> <520023EA.80906@redhat.com> <5200F781.8080501@redhat.com> <520109E9.4050304@cs.oswego.edu> <520120FD.3040004@redhat.com> <520127BC.9090705@redhat.com> Message-ID: <52012CB2.9050904@redhat.com> On 08/06/2013 11:43 AM, David M. 
Lloyd wrote: > On 08/06/2013 11:14 AM, David M. Lloyd wrote: >> On 08/06/2013 09:36 AM, Doug Lea wrote: >>> On 08/06/13 09:17, David M. Lloyd wrote: >>> >>>> Me: Reordering captured variables, reordering lambda incidence. The >>>> EG's stance >>>> is just a generalization. It's not a stance in any case: things which >>>> destabilize lambdas in terms of serializability are not a question of >>>> opinion, >>>> and it's bizarre to frame it that way. >>>> >>>>> What to do? EG: make a best effort, with documented caveats; you: >>>>> conservatively prohibit serialization of capturing lambdas; third >>>>> alternative: conservatively detect problems and break at >>>>> deserialization >>>> >>>> I'm OK with either "you" or "third alternative". >>> >>> I'm OK with 3rd alternative if some reasonably efficient >>> checksum/serial id ensuring breakage could be devised. >>> David, any ideas? >> >> It's a good idea. I can think of a few requirements offhand: >> >> * Generation of the hash would necessarily occur at compile time >> * The hash would have to be unique for each lambda within a class and/or >> compilation unit >> * The hash would have to be sensitive to any changes which would cause >> any indeterminism in how the lambda is resolved - this may extend to >> hashing even the bytecode of methods which include the lambda. This is >> the key/most complex concept to tackle to make this solution work. >> * The last step is to tag the UID value on to the serialized lambda >> representation. >> >> I don't think there is much more to it than this; the hardest part is >> determining what/how to hash. If it happens at compile time then >> resolution at run time (i.e. the more performance-sensitive context) >> should be the same kind of numerical comparison which is already done >> for serialVersionUID. 
> > Brian pointed out a couple things: > > * Such a scheme would have to be very strongly and clearly specified > * The scheme cannot depend on any particular non-spec compiler behavior > (i.e. the same source file should create the same hashes regardless of > compiler version or vendor) > > I suggested as a possible starting point a scheme which could create a > 64-bit hash based on a combination of: > > * Any captured variables' name and declaration order > * The declaration order of the lambda > * Information about the enclosing method: name and signature, maybe decl > order? (though it should be redundant wrt. the lambda decl order) > * The usual serialVersionUID calculation > > I would really appreciate anyone's thoughts as to the efficacy of this > approach and any potential weaknesses; in particular I'd like to hear if > anyone things this is a non-trivial change in terms of compilation and > runtime. > > In particular, it is not 100% clear how the calculation would work with > nested lambdas or lambdas nested in inner classes for example. For runtime it seems to me that this would largely consist of bundling the hash with the method handle information which can be passed to its serialized representation. The deserialization of the lambda could then hopefully just verify the hash against the local method handle and throw an exception if it has changed. -- - DML From brian.goetz at oracle.com Tue Aug 6 12:10:59 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Tue, 6 Aug 2013 12:10:59 -0700 Subject: Use of issue tracker Message-ID: There have been some questions raised over the use of the issue tracker hosted at java.net. Let me try and clarify how we've been working. We did not get an issue tracker until Oct of 2012. Without getting into the unfortunate details of why, let's just agree this was unfortunate. 
At the time, a significant fraction of issues had already been discussed on the EG list, and, despite a brief period when we tried to use the issue tracker, changing the way the EG had been working didn't stick. As a result, the issue tracker is kind of abandoned, and, while it still contains valid information, offers a non-representative snapshot of a small portion of the EG communication. Since that time, the mailing list has been public and archived, and the mailing list represents the official record of features proposed, discussed, accepted, or rejected. We've attempted to periodically capture "what's left" and "loose ends" to make up for the weaknesses of this approach to working. While this is less than ideal, we feel that this captures the spirit of the JCP 2.8 process, providing a permanent, public record of proposals and discussions of open issues. We also have a -comments list. The purpose of this list, which is post-only, is to provide a means to say "I have a concern about X." This list is also archived, and concerns raised there will be cataloged before the EG winds up its business. Because it is dramatically lower traffic, those seeking to 'read a comment into the record' may wish to use that mechanism. From dl at cs.oswego.edu Tue Aug 6 12:35:21 2013 From: dl at cs.oswego.edu (Doug Lea) Date: Tue, 06 Aug 2013 15:35:21 -0400 Subject: Stability of lambda serialization In-Reply-To: <52012CB2.9050904@redhat.com> References: <572125674.885842.1374600926640.JavaMail.root@redhat.com> <4FE8395D-2CDA-44A8-95F3-884F04899569@oracle.com> <206041618.5397062.1375726154541.JavaMail.root@redhat.com> <51FFF82C.50608@univ-mlv.fr> <520023EA.80906@redhat.com> <5200F781.8080501@redhat.com> <520109E9.4050304@cs.oswego.edu> <520120FD.3040004@redhat.com> <520127BC.9090705@redhat.com> <52012CB2.9050904@redhat.com> Message-ID: <52014FF9.9050503@cs.oswego.edu> On 08/06/13 13:04, David M. Lloyd wrote: > On 08/06/2013 11:43 AM, David M. 
Lloyd wrote: >> On 08/06/2013 11:14 AM, David M. Lloyd wrote: >>> On 08/06/2013 09:36 AM, Doug Lea wrote: >>>> On 08/06/13 09:17, David M. Lloyd wrote: >>>> >>>>> Me: Reordering captured variables, reordering lambda incidence. The >>>>> EG's stance >>>>> is just a generalization. It's not a stance in any case: things which >>>>> destabilize lambdas in terms of serializability are not a question of >>>>> opinion, >>>>> and it's bizarre to frame it that way. >>>>> >>>>>> What to do? EG: make a best effort, with documented caveats; you: >>>>>> conservatively prohibit serialization of capturing lambdas; third >>>>>> alternative: conservatively detect problems and break at >>>>>> deserialization >>>>> >>>>> I'm OK with either "you" or "third alternative". >>>> >>>> I'm OK with 3rd alternative if some reasonably efficient >>>> checksum/serial id ensuring breakage could be devised. >>>> David, any ideas? >>> >>> It's a good idea. I can think of a few requirements offhand: >>> >>> * Generation of the hash would necessarily occur at compile time >>> * The hash would have to be unique for each lambda within a class and/or >>> compilation unit >>> * The hash would have to be sensitive to any changes which would cause >>> any indeterminism in how the lambda is resolved - this may extend to >>> hashing even the bytecode of methods which include the lambda. This is >>> the key/most complex concept to tackle to make this solution work. >>> * The last step is to tag the UID value on to the serialized lambda >>> representation. >>> >>> I don't think there is much more to it than this; the hardest part is >>> determining what/how to hash. If it happens at compile time then >>> resolution at run time (i.e. the more performance-sensitive context) >>> should be the same kind of numerical comparison which is already done >>> for serialVersionUID. 
>> >> Brian pointed out a couple things: >> >> * Such a scheme would have to be very strongly and clearly specified >> * The scheme cannot depend on any particular non-spec compiler behavior >> (i.e. the same source file should create the same hashes regardless of >> compiler version or vendor) >> >> I suggested as a possible starting point a scheme which could create a >> 64-bit hash based on a combination of: How about just a hash of its actual string representation, plus its context (enclosing method etc). A little crazy but among the few simple and feasible ones I can think of. It means you blow up if you add a space. Fine: If you are going to draw the line somewhere, it might as well be here. Although at this point you wonder, why bother serializing. Just pass the string and invoke a compiler to parse... Probably not a lot slower. -Doug >> >> * Any captured variables' name and declaration order >> * The declaration order of the lambda >> * Information about the enclosing method: name and signature, maybe decl >> order? (though it should be redundant wrt. the lambda decl order) >> * The usual serialVersionUID calculation >> >> I would really appreciate anyone's thoughts as to the efficacy of this >> approach and any potential weaknesses; in particular I'd like to hear if >> anyone things this is a non-trivial change in terms of compilation and >> runtime. >> >> In particular, it is not 100% clear how the calculation would work with >> nested lambdas or lambdas nested in inner classes for example. > > For runtime it seems to me that this would largely consist of bundling the hash > with the method handle information which can be passed to its serialized > representation. The deserialization of the lambda could then hopefully just > verify the hash against the local method handle and throw an exception if it has > changed. From david.lloyd at redhat.com Tue Aug 6 13:06:16 2013 From: david.lloyd at redhat.com (David M. 
Lloyd) Date: Tue, 06 Aug 2013 15:06:16 -0500 Subject: Stability of lambda serialization In-Reply-To: <52014FF9.9050503@cs.oswego.edu> References: <572125674.885842.1374600926640.JavaMail.root@redhat.com> <4FE8395D-2CDA-44A8-95F3-884F04899569@oracle.com> <206041618.5397062.1375726154541.JavaMail.root@redhat.com> <51FFF82C.50608@univ-mlv.fr> <520023EA.80906@redhat.com> <5200F781.8080501@redhat.com> <520109E9.4050304@cs.oswego.edu> <520120FD.3040004@redhat.com> <520127BC.9090705@redhat.com> <52012CB2.9050904@redhat.com> <52014FF9.9050503@cs.oswego.edu> Message-ID: <52015738.90201@redhat.com> On 08/06/2013 02:35 PM, Doug Lea wrote: > On 08/06/13 13:04, David M. Lloyd wrote: >> On 08/06/2013 11:43 AM, David M. Lloyd wrote: >>> On 08/06/2013 11:14 AM, David M. Lloyd wrote: >>>> On 08/06/2013 09:36 AM, Doug Lea wrote: >>>>> On 08/06/13 09:17, David M. Lloyd wrote: >>>>> >>>>>> Me: Reordering captured variables, reordering lambda incidence. The >>>>>> EG's stance >>>>>> is just a generalization. It's not a stance in any case: things >>>>>> which >>>>>> destabilize lambdas in terms of serializability are not a question of >>>>>> opinion, >>>>>> and it's bizarre to frame it that way. >>>>>> >>>>>>> What to do? EG: make a best effort, with documented caveats; you: >>>>>>> conservatively prohibit serialization of capturing lambdas; third >>>>>>> alternative: conservatively detect problems and break at >>>>>>> deserialization >>>>>> >>>>>> I'm OK with either "you" or "third alternative". >>>>> >>>>> I'm OK with 3rd alternative if some reasonably efficient >>>>> checksum/serial id ensuring breakage could be devised. >>>>> David, any ideas? >>>> >>>> It's a good idea. 
I can think of a few requirements offhand: >>>> >>>> * Generation of the hash would necessarily occur at compile time >>>> * The hash would have to be unique for each lambda within a class >>>> and/or >>>> compilation unit >>>> * The hash would have to be sensitive to any changes which would cause >>>> any indeterminism in how the lambda is resolved - this may extend to >>>> hashing even the bytecode of methods which include the lambda. This is >>>> the key/most complex concept to tackle to make this solution work. >>>> * The last step is to tag the UID value on to the serialized lambda >>>> representation. >>>> >>>> I don't think there is much more to it than this; the hardest part is >>>> determining what/how to hash. If it happens at compile time then >>>> resolution at run time (i.e. the more performance-sensitive context) >>>> should be the same kind of numerical comparison which is already done >>>> for serialVersionUID. >>> >>> Brian pointed out a couple things: >>> >>> * Such a scheme would have to be very strongly and clearly specified >>> * The scheme cannot depend on any particular non-spec compiler behavior >>> (i.e. the same source file should create the same hashes regardless of >>> compiler version or vendor) >>> >>> I suggested as a possible starting point a scheme which could create a >>> 64-bit hash based on a combination of: > > How about just a hash of its actual string representation, > plus its context (enclosing method etc). A little crazy but among the > few simple and feasible ones I can think of. It means you blow up > if you add a space. Fine: If you are going to draw the line somewhere, > it might as well be here. That actually seems like a pretty reasonable and simple approach (though I'd say "original byte representation" given that transcoding might be lossy and things might get weird as a result). Add in to the context the declaration order sequence number and captured var names in order. 
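The scheme being sketched here -- a stable hash over the lambda's source text plus its context -- can be illustrated roughly as below. This is a hypothetical sketch, not anything javac implements: the class name, method names, and the choice of 64-bit FNV-1a are all invented for illustration, and a real scheme would need a precisely specified canonical form that is identical across compiler vendors and versions.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of the proposed stability hash: a 64-bit
// FNV-1a digest over the lambda's source text, its enclosing method
// signature, its declaration-order index, and its captured variable
// names in order. All names here are invented for this sketch.
public class LambdaStabilityHash {

    public static long hash(String lambdaSourceText,
                            String enclosingMethodSignature,
                            int declarationOrderIndex,
                            String... capturedVariableNames) {
        StringBuilder canonical = new StringBuilder();
        canonical.append(enclosingMethodSignature).append('\u0000');
        canonical.append(declarationOrderIndex).append('\u0000');
        canonical.append(lambdaSourceText).append('\u0000');
        for (String name : capturedVariableNames) {
            canonical.append(name).append('\u0000'); // order-sensitive, as proposed
        }
        return fnv1a64(canonical.toString().getBytes(StandardCharsets.UTF_8));
    }

    // Standard 64-bit FNV-1a over a byte sequence.
    private static long fnv1a64(byte[] bytes) {
        long h = 0xcbf29ce484222325L;
        for (byte b : bytes) {
            h ^= (b & 0xff);
            h *= 0x100000001b3L;
        }
        return h;
    }
}
```

Reordering the captured names, or changing the body text by even one space, changes the hash -- which is exactly the "blow up if you add a space" line Doug proposes drawing. Deserialization would then just compare the stored hash against the locally compiled one and fail fast on mismatch.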
> Although at this point you wonder, why bother serializing. > Just pass the string and invoke a compiler to parse... Probably > not a lot slower. Well bear in mind that we're talking about a simple numerical comparison - all the hashing would (should) be done at compile time, not at run time. Compiling on deserialize (while a nifty/intriguing idea, all practical concerns aside) will definitely be much slower. -- - DML From brian.goetz at oracle.com Tue Aug 6 13:23:37 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Tue, 6 Aug 2013 13:23:37 -0700 Subject: Stability of lambda serialization In-Reply-To: <52014FF9.9050503@cs.oswego.edu> References: <572125674.885842.1374600926640.JavaMail.root@redhat.com> <4FE8395D-2CDA-44A8-95F3-884F04899569@oracle.com> <206041618.5397062.1375726154541.JavaMail.root@redhat.com> <51FFF82C.50608@univ-mlv.fr> <520023EA.80906@redhat.com> <5200F781.8080501@redhat.com> <520109E9.4050304@cs.oswego.edu> <520120FD.3040004@redhat.com> <520127BC.9090705@redhat.com> <52012CB2.9050904@redhat.com> <52014FF9.9050503@cs.oswego.edu> Message-ID: Can't just be the representation of the lambda; needs to fold in the enclosing context as well. Otherwise: foo() { String x = ...; bar(() -> x.length()); } to foo() { File x = ...; bar(() -> x.length()); } will fool the hash. On Aug 6, 2013, at 12:35 PM, Doug Lea wrote: > On 08/06/13 13:04, David M. Lloyd wrote: >> On 08/06/2013 11:43 AM, David M. Lloyd wrote: >>> On 08/06/2013 11:14 AM, David M. Lloyd wrote: >>>> On 08/06/2013 09:36 AM, Doug Lea wrote: >>>>> On 08/06/13 09:17, David M. Lloyd wrote: >>>>> >>>>>> Me: Reordering captured variables, reordering lambda incidence. The >>>>>> EG's stance >>>>>> is just a generalization. It's not a stance in any case: things which >>>>>> destabilize lambdas in terms of serializability are not a question of >>>>>> opinion, >>>>>> and it's bizarre to frame it that way. >>>>>> >>>>>>> What to do? 
EG: make a best effort, with documented caveats; you: >>>>>>> conservatively prohibit serialization of capturing lambdas; third >>>>>>> alternative: conservatively detect problems and break at >>>>>>> deserialization >>>>>> >>>>>> I'm OK with either "you" or "third alternative". >>>>> >>>>> I'm OK with 3rd alternative if some reasonably efficient >>>>> checksum/serial id ensuring breakage could be devised. >>>>> David, any ideas? >>>> >>>> It's a good idea. I can think of a few requirements offhand: >>>> >>>> * Generation of the hash would necessarily occur at compile time >>>> * The hash would have to be unique for each lambda within a class and/or >>>> compilation unit >>>> * The hash would have to be sensitive to any changes which would cause >>>> any indeterminism in how the lambda is resolved - this may extend to >>>> hashing even the bytecode of methods which include the lambda. This is >>>> the key/most complex concept to tackle to make this solution work. >>>> * The last step is to tag the UID value on to the serialized lambda >>>> representation. >>>> >>>> I don't think there is much more to it than this; the hardest part is >>>> determining what/how to hash. If it happens at compile time then >>>> resolution at run time (i.e. the more performance-sensitive context) >>>> should be the same kind of numerical comparison which is already done >>>> for serialVersionUID. >>> >>> Brian pointed out a couple things: >>> >>> * Such a scheme would have to be very strongly and clearly specified >>> * The scheme cannot depend on any particular non-spec compiler behavior >>> (i.e. the same source file should create the same hashes regardless of >>> compiler version or vendor) >>> >>> I suggested as a possible starting point a scheme which could create a >>> 64-bit hash based on a combination of: > > How about just a hash of its actual string representation, > plus its context (enclosing method etc). 
A little crazy but among the > few simple and feasible ones I can think of. It means you blow up > if you add a space. Fine: If you are going to draw the line somewhere, > it might as well be here. > > Although at this point you wonder, why bother serializing. > Just pass the string and invoke a compiler to parse... Probably > not a lot slower. > > -Doug > > > > > >>> >>> * Any captured variables' name and declaration order >>> * The declaration order of the lambda >>> * Information about the enclosing method: name and signature, maybe decl >>> order? (though it should be redundant wrt. the lambda decl order) >>> * The usual serialVersionUID calculation >>> >>> I would really appreciate anyone's thoughts as to the efficacy of this >>> approach and any potential weaknesses; in particular I'd like to hear if >>> anyone things this is a non-trivial change in terms of compilation and >>> runtime. >>> >>> In particular, it is not 100% clear how the calculation would work with >>> nested lambdas or lambdas nested in inner classes for example. >> >> For runtime it seems to me that this would largely consist of bundling the hash >> with the method handle information which can be passed to its serialized >> representation. The deserialization of the lambda could then hopefully just >> verify the hash against the local method handle and throw an exception if it has >> changed. 
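Brian's foo() example can be made concrete. In the sketch below (all names invented), the two methods contain byte-for-byte identical lambda text, `() -> x.length()`, so a hash of the lambda's own text would match across the change, even though the captured variable switched from String to File and the meaning of `x.length()` changed with it -- hence the need to fold the enclosing context into the hash:

```java
import java.io.File;

// Two enclosing methods with textually identical lambdas. A stability
// hash computed only from the lambda's own source text could not tell
// these apart; folding in the enclosing method's signature and the
// captured variable's type does. All names here are invented.
public class EnclosingContextMatters {

    interface Sizer { long size(); }

    static long fooWithString() {
        String x = "hello";
        return bar(() -> x.length()); // String.length(): int, widened to long
    }

    static long fooWithFile() {
        File x = new File("hello");
        return bar(() -> x.length()); // File.length(): long
    }

    static long bar(Sizer s) { return s.size(); }
}
```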
> From andrey.breslav at jetbrains.com Tue Aug 6 21:14:23 2013 From: andrey.breslav at jetbrains.com (Andrey Breslav) Date: Wed, 7 Aug 2013 08:14:23 +0400 Subject: Stability of lambda serialization In-Reply-To: References: <572125674.885842.1374600926640.JavaMail.root@redhat.com> <4FE8395D-2CDA-44A8-95F3-884F04899569@oracle.com> <206041618.5397062.1375726154541.JavaMail.root@redhat.com> <51FFF82C.50608@univ-mlv.fr> <520023EA.80906@redhat.com> Message-ID: <1E24D531-D428-4DEC-8BF6-F613C7CB683D@jetbrains.com> > Named lambdas were included on the list of issues gathered at the recent meeting to consider for the next round. Indeed, named lambdas would address this problem once and for all -- and we all knew this when it was raised the previous several times it came up. We left room to pursue them later, but for the time being, the EG felt that the syntactic overhead did not pull its weight. The ordering issue will remain even if lambdas were named. Naming lambdas only makes the class name stable, but not the order, names of meaning of the captured fields. So, the initial issue discussed in this thread seems to be orthogonal to named lambdas. -- Andrey Breslav http://jetbrains.com Develop with pleasure! On Aug 6, 2013, at 03:52 , Brian Goetz wrote: > I understand your objection. This was a difficult issue, and one which occupied a frustratingly large amount of time for the expert group. This is an area where there are no good solutions, just tradeoffs between different levels of user surprise, stability, and complexity. Different parties may have different preferences due to their own subjective ordering of the undesirability of different failure modes, and therefore may reach different conclusions about which solutions are more or less desirable. The consensus of the EG has been -- with the sole exception of Red Hat -- that the current target is the least bad of the options proposed, and that's what we chose to pursue. 
> > Named lambdas were included on the list of issues gathered at the recent meeting to consider for the next round. Indeed, named lambdas would address this problem once and for all -- and we all knew this when it was raised the previous several times it came up. We left room to pursue them later, but for the time being, the EG felt that the syntactic overhead did not pull its weight. > > The idea of restricting lambda serialization to non-capturing lambdas gained zero traction when it was proposed, in part because it makes things more confusing for the user, who now has to reason about which lambdas are capturing (its not like C++, where capturing lambdas are syntactically distinct from non-capturing ones). > > It is unfortunate that Red Hat sees fit to threaten a No vote because an extensively discussed decision went in a direction they did not prefer. But the reality is, this issue *was* extensively discussed, Red Hat had many opportunities to attempt to win other EG members over to their preference, and the EG still -- after reopening the issue multiple times -- preferred the "weakly serializable" guarantees offered by the current model. > > On Aug 5, 2013, at 3:15 PM, David M. Lloyd wrote: > >> This was and is a bad decision. Some things serialize better than others. It makes far more sense to attempt to serialize as well as the things which serialize in the best possible way (i.e. fewest known issues). By simply requiring stable names for all serializable state, you achieve the best possible result. >> >> Justifying one bad solution because of another existing bad solution is bad engineering, and results in perpetuation of bad software. There's absolutely no reason to settle for this. There are simply no benefits to it other than the emotional comfort of being similar to something that exists, which is a foolish benchmark. If anonymous inner classes were not implemented in this manner, why would that make a difference to the way lambdas are serialized? 
>> >> The password example is deliberately chosen to be a simple expression of the potential problem. I imagine much more sophisticated problems resulting from modified sort order, information exposure from inadvertent capture, and so on. It's so much simpler to simply forbid serializing capturing lambdas, and much safer besides. On the other hand, I don't think we have really captured any compelling use cases for supporting serializable, capturing lambdas other than a lot of hand-wavy kinds of things. >> >> On 08/05/2013 02:08 PM, Remi Forax wrote: >>> You really want to serialize a password :) Funny idea. >>> >>> Anyway, we all know that serialization is broken, in many ways. >>> Usually, the only scenario that works is when the same code is used by >>> on both side of the wires, >>> trying to do something different is like playing the Russian roulette. >>> It may works for some time. >>> >>> If you change the code of an inner classes, you will have the same issue >>> as the one you describe >>> with lambdas. You may argue that lambdas should be better than inner >>> classes, >>> but given that even if we find a way to make the serialization stable to >>> this kind of change >>> there are still a lot of changes, like by example switching two lambdas >>> in the same method, >>> that makes the serialization not stable. >>> >>> So the EG had decided that serialization of lambdas should be not better >>> and not worst >>> than serialization of inner classes. >>> >>> R?mi >>> >>> >>> On 08/05/2013 08:09 PM, Scott Stark wrote: >>>> So Red Hat will go on the record to state that if serialization of >>>> lambdas is beyond the scope that DML suggested, we will vote no on the >>>> JSR. As a concluding remark on issue, consider an alternate form of >>>> the example given where the user and password variable types do not >>>> differ. In this situation, the reordering results in the password seen >>>> as the user. 
I can see that this will be the subject of yet another >>>> CVE somewhere down the road. >>>> >>>> >>>> Running testReadLambda after reorder... >>>> -> User secret1password <- -> Password bob <- >>>> >>>> >>>> >>>> ----- Original Message ----- >>>> From: "Paul Benedict" >>>> To: lambda-spec-observers at openjdk.java.net >>>> Cc: "Scott Stark" , >>>> lambda-spec-experts at openjdk.java.net >>>> Sent: Saturday, August 3, 2013 4:54:58 PM >>>> Subject: Re: Stability of lambda serialization >>>> >>>> I imagine this will mostly impact EE applications in a very negative way. >>>> It's one thing if my method signatures change, or I explicitly add some >>>> fields, but now adding/removing a local lambda will do that? >>>> Serialization >>>> errors are the worst thing in EE -- and this kind of just heaps on more >>>> misery. >>>> >>>> Paul >>>> >>>> >>>> On Sat, Aug 3, 2013 at 2:00 AM, Brian Goetz >>>> wrote: >>>> >>>>> This was discussed in the meeting after JVMLS this week. The consensus >>>>> was that, while *this particular* issue could be addressed, there is an >>>>> infinite spectrum of similar issues that cannot be addressed, and >>>>> that it >>>>> is preferable to draw a clean line about what the user can expect in >>>>> terms >>>>> of code changes destabilizing lambdas. >>>>> >>>>> That line is: If the code inside a method, or the method's signature, >>>>> changes *in any way*, lambdas captured in that method should be >>>>> considered >>>>> destabilized. Changes to other methods, or changes to the order of >>>>> methods, do not affect lambdas in an unchanged method. 
>>>>> >>>>> On Jul 23, 2013, at 10:35 AM, Scott Stark wrote: >>>>> >>>>>> Red Hat has a concern regarding how fragile the default serialization >>>>> behavior of lambda expressions is in the current reference >>>>> implementation, >>>>> currently: >>>>>> ironmaiden:OpenJDK starksm$ >>>>> /Library/Java/JavaVirtualMachines/jdk1.8.0.jdk/Contents/Home/bin/java >>>>> -version >>>>>> java version "1.8.0-ea" >>>>>> Java(TM) SE Runtime Environment (build 1.8.0-ea-b98) >>>>>> Java HotSpot(TM) 64-Bit Server VM (build 25.0-b40, mixed mode) >>>>>> >>>>>> The problem is that the serialized form of a lambda expression depends >>>>> on the order in which captured arguments are declared. The attached >>>>> simple >>>>> example demonstrates how easy it is for a trivial reordering of the >>>>> lambda >>>>> code block to result in an inability to deserialize a previously saved >>>>> expression. >>>>>> To produce this exception: >>>>>> 1. Run the serialization.AuthenticationContext.testWriteLambda method >>>>> with the lambda expression written as: >>>>>> Authenticator a = (Authenticator & Serializable) (String >>>>>> principal, >>>>> char[] pass) -> { >>>>>> // Run with p declared first when writing out the >>>>> /tmp/testWriteLambda.bin, then switch >>>>>> // to declare u first when running testReadLambda >>>>>> String p = "-> Password " + password + " <-"; >>>>>> String u = "-> User " + user + " <-"; >>>>>> return u + " " + p; >>>>>> }; >>>>>> 2. 
Change the lambda expression to: >>>>>> Authenticator a = (Authenticator & Serializable) (String >>>>>> principal, >>>>> char[] pass) -> { >>>>>> // Run with p declared first when writing out the >>>>> /tmp/testWriteLambda.bin, then switch >>>>>> // to declare u first when running testReadLambda >>>>>> String u = "-> User " + user + " <-"; >>>>>> String p = "-> Password " + password + " <-"; >>>>>> return u + " " + p; >>>>>> }; >>>>>> >>>>>> Recompile and run serialization.AuthenticationContext.testReadLambda to >>>>> produce: >>>>>> java.io.IOException: unexpected exception type >>>>>> at >>>>> java.io.ObjectStreamClass.throwMiscException(ObjectStreamClass.java:1538) >>>>> >>>>>> at >>>>> java.io.ObjectStreamClass.invokeReadResolve(ObjectStreamClass.java:1110) >>>>>> at >>>>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1807) >>>>> >>>>>> at >>>>> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348) >>>>>> at >>>>>> java.io.ObjectInputStream.readObject(ObjectInputStream.java:370) >>>>>> at >>>>> serialization.AuthenticationContext.testReadLambda(AuthenticationContext.java:34) >>>>> >>>>>> ... >>>>>> Caused by: java.lang.reflect.InvocationTargetException >>>>>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) >>>>>> at >>>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) >>>>> >>>>>> at >>>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) >>>>> >>>>>> at >>>>> java.lang.invoke.SerializedLambda.readResolve(SerializedLambda.java:222) >>>>>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) >>>>>> at >>>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) >>>>> >>>>>> at >>>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) >>>>> >>>>>> at >>>>> java.io.ObjectStreamClass.invokeReadResolve(ObjectStreamClass.java:1104) >>>>>> ... 
30 more >>>>>> Caused by: java.lang.IllegalArgumentException: Invalid lambda >>>>> deserialization >>>>>> at >>>>> serialization.AuthenticationContext.$deserializeLambda$(AuthenticationContext.java:1) >>>>> >>>>>> ... 40 more >>>>>> >>>>>> One does not see the same level of sensitivity to the ordering of the >>>>> serialization fields in a POJO as demonstrated by the >>>>> serialization.AuthenticationContext.testWritePOJO/testReadPOJO cases >>>>> where >>>>> one can reorder the TestPOJO.{user,password} fields without having >>>>> serialization fail. >>>>>> We would like to see at least that level of stability of the serialized >>>>> form of lambda expressions. >>>>>> >>>>> >>>> >>> >> >> >> -- >> - DML > From daniel.smith at oracle.com Thu Aug 8 17:19:30 2013 From: daniel.smith at oracle.com (Dan Smith) Date: Thu, 8 Aug 2013 18:19:30 -0600 Subject: Overload resolution simplification Message-ID: <71B94777-2E9D-4F9A-A7E4-86294BC5F107@oracle.com> We spent some time at the EG meeting last week talking about the overload resolution story in the presence of lambdas/method references (and, why not, type argument inference). There are a lot of tricky dependencies here, and the goal is to find a balance between expressivity and simplicity. The sense I got from the meeting is that, despite our efforts to refine the story (there have been a few iterations), we're still not there yet in terms of simplicity. In particular, I think what's crucial about the model I presented is that users can identify the difference between implicit lambdas that get type checked pre-overload-resolution and post-overload-resolution; the sanity check I got is that nobody will be able to make that distinction. A couple of days later, Maurizio pointed out that, as we've iterated on our libraries, we've largely abandoned the space of programs that requires some of the more complex overload disambiguation machinery. 
And looking more closely at those use cases, we agreed that we've probably been focusing too much on some atypical patterns.

So, let me propose a greatly simplified but probably not-very-noticeably less expressive approach:

Overload resolution will only check the arity of all implicit lambdas and will ignore overloaded method references. If the body of a lambda is important for disambiguation, it must have explicit parameter types.

Benefits of this approach:
- Very easy to understand -- it's mostly a syntactic distinction
- Consistent between all different patterns of overloading that were previously treated differently
- Facilitates a simple declaration-site warning check when method signatures conflict
- Encourages use of explicit lambdas -- clearly acknowledges that we can't solve all inference problems with implicit lambdas
- Avoids re-checking lambdas with different parameter types which means:
-- Typing of lambda bodies is easier for users to process
-- Implementations don't have to do speculative checking of arbitrary blocks of code
-- Bad theoretical complexity goes away

We've thought about it for a few days and think this is a much better scenario for users and more in line with the EG's expectations (based on feedback both this year and last).

Any questions/concerns?

---

Here's an example of something we would stop disambiguating:

interface I<T> {
    <R> R map(Function<T, R> f);
    int map(ToIntFunction<T> f);
    long map(ToLongFunction<T> f);
    double map(ToDoubleFunction<T> f);
}

someIofString.map(s -> s.length());

Declaration-site workaround: rename the methods.

Use-site workaround: explicit parameter type:
someIofString.map((String s) -> s.length());

---

Here's an example of something else we would stop disambiguating:

static void m(Function<String, Integer> f);
static void m(ToIntFunction<String> f);

m(x -> x.length() > 10 ?
5 : 10);

---

And here's something that we never could disambiguate in the first place (due to fundamental design constraints):

interface Comparators {
    <T, U extends Comparable<? super U>> Comparator<T> comparing(Function<T, U> f);
    <T> Comparator<T> comparing(ToIntFunction<T> f);
}

Comparator<String> cs = Comparators.comparing(s -> -s.length());

---

--Dan

From brian.goetz at oracle.com Fri Aug 9 09:14:04 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Fri, 09 Aug 2013 12:14:04 -0400 Subject: Overload resolution simplification In-Reply-To: <71B94777-2E9D-4F9A-A7E4-86294BC5F107@oracle.com> References: <71B94777-2E9D-4F9A-A7E4-86294BC5F107@oracle.com> Message-ID: <5205154C.7080209@oracle.com>

Note that in almost all the cases where we relied on fancy overload selection, we one-by-one accepted that it was better to mangle the names (flatMapToInt) anyway. The number of remaining cases is small, and the sensible cases that fall in the space between what was allowed with the more complex scheme and what is prohibited under this scheme is surprisingly small. Which suggests the return-on-complexity here for the more complex scheme is poor.

We were all a bit surprised that the new scheme works, but indeed it does seem to work pretty well.

In hindsight, this should not be surprising. Type inference and overloading are ever in opposition; many languages with type inference from day 1 prohibit overloading (except maybe on arity.) So for constructs like implicit lambdas, which require inference, it seems reasonable to give up something in overloading power to increase the range of cases where implicit lambdas can be used.

On 8/8/2013 8:19 PM, Dan Smith wrote:
> We spent some time at the EG meeting last week talking about the overload resolution story in the presence of lambdas/method references (and, why not, type argument inference). There are a lot of tricky dependencies here, and the goal is to find a balance between expressivity and simplicity.
> > The sense I got from the meeting is that, despite our efforts to refine the story (there have been a few iterations), we're still not there yet in terms of simplicity. In particular, I think what's crucial about the model I presented is that users can identify the difference between implicit lambdas that get type checked pre-overload-resolution and post-overload-resolution; the sanity check I got is that nobody will be able to make that distinction. > > A couple of days later, Maurizio pointed out that, as we've iterated on our libraries, we've largely abandoned the space of programs that requires some of the more complex overload disambiguation machinery. And looking more closely at those use cases, we agreed that we've probably been focusing too much on some atypical patterns. > > So, let me propose a greatly simplified but probably not-very-noticeably less expressive approach: > > Overload resolution will only check the arity of all implicit lambdas and will ignore overloaded method references. If the body of a lambda is important for disambiguation, it must have explicit parameter types. > > Benefits of this approach: > - Very easy to understand -- it's mostly a syntactic distinction > - Consistent between all different patterns of overloading that were previously treated differently > - Facilitates a simple declaration-site warning check when method signatures conflict > - Encourages use of explicit lambdas -- clearly acknowledges that we can't solve all inference problems with implicit lambdas > - Avoids re-checking lambdas with different parameter types which means: > -- Typing of lambda bodies is easier for users to process > -- Implementations don't have to do speculative checking of arbitrary blocks of code > -- Bad theoretical complexity goes away > > We've thought about it for a few days and think this is a much better scenario for users and more in line with the EG's expectations (based on feedback both this year and last). 
> > Any questions/concerns? > > --- > > Here's an example of something we would stop disambiguating: > > interface I { > R map(Function f); > int map(ToIntFunction f); > long map(ToLongFunction f); > double map(ToDoubleFunction f); > } > > someIofString.map(s -> s.length()); > > Declaration-site workaround: rename the methods. > > Use-site workaround: explicit parameter type: > someIofString.map((String s) -> s.length()); > > --- > > Here's an example of something else we would stop disambiguating: > > static void m(Function f); > static void m(ToIntFunction f); > > m(x -> x.length() > 10 ? 5 : 10); > > --- > > And here's something that we never could disambiguate in the first place (due to fundamental design constraints): > > interface Comparators { > > Comparator comparing(Function f); > Comparator comparing(ToIntFunction f); > } > > Comparator cs = Comparators.comparing(s -> -s.length()); > > --- > > ?Dan > From spullara at gmail.com Fri Aug 9 11:55:12 2013 From: spullara at gmail.com (Sam Pullara) Date: Fri, 9 Aug 2013 11:55:12 -0700 Subject: Overload resolution simplification In-Reply-To: <5205154C.7080209@oracle.com> References: <71B94777-2E9D-4F9A-A7E4-86294BC5F107@oracle.com> <5205154C.7080209@oracle.com> Message-ID: <697DB0F2-1CE2-420F-9835-24E06DBC4F18@gmail.com> Is there anything we can do about: ExecutorService.submit(object::voidReturningMethod) being ambiguous? Sam On Aug 9, 2013, at 9:14 AM, Brian Goetz wrote: > Note that in almost all the cases where we relied on fancy overload selection, we one-by-one accepted that it was better to mangle the names (flatMapToInt) anyway. The number of remaining cases is small, and the sensible cases that fall in the space between what was allowed with the more complex scheme and what is prohibited under this scheme is surprisingly small. Which suggests the return-on-complexity here for the more complex scheme is poor. 
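Under the arity-only rule Dan proposes, implicit lambdas still disambiguate overloads that differ in parameter count; only same-arity overloads like the map example need explicit parameter types. A small sketch with invented names:

```java
import java.util.function.BiFunction;
import java.util.function.Function;

// Overloads that differ in functional-interface arity: an implicit
// lambda's parameter count alone picks the right one, with no need to
// speculatively type-check the body. All names here are invented.
public class ArityOnlyResolution {

    static String apply(Function<String, String> f) {
        return f.apply("a");
    }

    static String apply(BiFunction<String, String, String> f) {
        return f.apply("a", "b");
    }

    static String demo() {
        String one = apply(s -> s + "!");    // 1-ary: only Function matches
        String two = apply((s, t) -> s + t); // 2-ary: only BiFunction matches
        return one + " " + two;
    }
}
```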
> > We were all a bit surprised that the new scheme works, but indeed it does seem to work pretty well. > > In hindsight, this should not be surprising. Type inference and overloading are ever in opposition; many languages with type inference from day 1 prohibit overloading (except maybe on arity.) So for constructs like implicit lambdas, which require inference, it seems reasonable to give up something in overloading power to increase the range of cases where implicit lambdas can be used. > > On 8/8/2013 8:19 PM, Dan Smith wrote: >> We spent some time at the EG meeting last week talking about the overload resolution story in the presence of lambdas/method references (and, why not, type argument inference). There are a lot of tricky dependencies here, and the goal is to find a balance between expressivity and simplicity. >> >> The sense I got from the meeting is that, despite our efforts to refine the story (there have been a few iterations), we're still not there yet in terms of simplicity. In particular, I think what's crucial about the model I presented is that users can identify the difference between implicit lambdas that get type checked pre-overload-resolution and post-overload-resolution; the sanity check I got is that nobody will be able to make that distinction. >> >> A couple of days later, Maurizio pointed out that, as we've iterated on our libraries, we've largely abandoned the space of programs that requires some of the more complex overload disambiguation machinery. And looking more closely at those use cases, we agreed that we've probably been focusing too much on some atypical patterns. >> >> So, let me propose a greatly simplified but probably not-very-noticeably less expressive approach: >> >> Overload resolution will only check the arity of all implicit lambdas and will ignore overloaded method references. If the body of a lambda is important for disambiguation, it must have explicit parameter types. 
>> >> Benefits of this approach: >> - Very easy to understand -- it's mostly a syntactic distinction >> - Consistent between all different patterns of overloading that were previously treated differently >> - Facilitates a simple declaration-site warning check when method signatures conflict >> - Encourages use of explicit lambdas -- clearly acknowledges that we can't solve all inference problems with implicit lambdas >> - Avoids re-checking lambdas with different parameter types which means: >> -- Typing of lambda bodies is easier for users to process >> -- Implementations don't have to do speculative checking of arbitrary blocks of code >> -- Bad theoretical complexity goes away >> >> We've thought about it for a few days and think this is a much better scenario for users and more in line with the EG's expectations (based on feedback both this year and last). >> >> Any questions/concerns? >> >> --- >> >> Here's an example of something we would stop disambiguating: >> >> interface I { >> R map(Function f); >> int map(ToIntFunction f); >> long map(ToLongFunction f); >> double map(ToDoubleFunction f); >> } >> >> someIofString.map(s -> s.length()); >> >> Declaration-site workaround: rename the methods. >> >> Use-site workaround: explicit parameter type: >> someIofString.map((String s) -> s.length()); >> >> --- >> >> Here's an example of something else we would stop disambiguating: >> >> static void m(Function f); >> static void m(ToIntFunction f); >> >> m(x -> x.length() > 10 ? 
5 : 10); >> >> --- >> >> And here's something that we never could disambiguate in the first place (due to fundamental design constraints): >> >> interface Comparators { >> > Comparator comparing(Function f); >> Comparator comparing(ToIntFunction f); >> } >> >> Comparator cs = Comparators.comparing(s -> -s.length()); >> >> --- >> >> ?Dan >> From daniel.smith at oracle.com Fri Aug 9 13:16:18 2013 From: daniel.smith at oracle.com (Dan Smith) Date: Fri, 9 Aug 2013 14:16:18 -0600 Subject: Overload resolution simplification In-Reply-To: <697DB0F2-1CE2-420F-9835-24E06DBC4F18@gmail.com> References: <71B94777-2E9D-4F9A-A7E4-86294BC5F107@oracle.com> <5205154C.7080209@oracle.com> <697DB0F2-1CE2-420F-9835-24E06DBC4F18@gmail.com> Message-ID: <0B3ACA62-1CF1-4C0B-8530-F26ECE662C56@oracle.com> If 'voidReturningMethod' is not overloaded, you're okay. (That's my clause about "will ignore overloaded method references".) Also, for Runnable vs. Callable, you're okay with a lambda expression, because 0-ary lambdas are always explicit (they have no parameter types to infer). If we had a case of something like FileFilter vs. Consumer, then an implicit lambda wouldn't always work. Possible we'll special-case a void-/value-returning block body, considering that part of the "shape" or "arity". But an expression lambda could be interpreted either way, depending on typing, so would be ambiguous. ?Dan On Aug 9, 2013, at 12:55 PM, Sam Pullara wrote: > Is there anything we can do about: > > ExecutorService.submit(object::voidReturningMethod) > > being ambiguous? > > Sam > > On Aug 9, 2013, at 9:14 AM, Brian Goetz wrote: > >> Note that in almost all the cases where we relied on fancy overload selection, we one-by-one accepted that it was better to mangle the names (flatMapToInt) anyway. The number of remaining cases is small, and the sensible cases that fall in the space between what was allowed with the more complex scheme and what is prohibited under this scheme is surprisingly small. 
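Dan's answer about submit(Runnable) vs. submit(Callable) can be sketched as follows (the class and helper names are invented; the cast on the method reference shows the explicit-target-type workaround that always resolves the choice):

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// ExecutorService.submit is overloaded for Runnable and Callable<T>.
// Sketch of the cases Dan describes; all other names are invented.
public class SubmitDisambiguation {

    static void voidReturning() { }

    static int runTasks() {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            // A 0-ary block lambda with no value return is only
            // Runnable-shaped, so there is nothing to infer or guess:
            pool.submit(() -> { voidReturning(); }).get();

            // A cast supplies an explicit target type, which resolves the
            // overload regardless of how the method reference is treated:
            pool.submit((Runnable) SubmitDisambiguation::voidReturning).get();

            // An explicitly typed variable picks submit(Callable<T>):
            Callable<Integer> task = () -> 42;
            return pool.submit(task).get();
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```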
Which suggests the return-on-complexity here for the more complex scheme is poor. >> >> We were all a bit surprised that the new scheme works, but indeed it does seem to work pretty well. >> >> In hindsight, this should not be surprising. Type inference and overloading are ever in opposition; many languages with type inference from day 1 prohibit overloading (except maybe on arity.) So for constructs like implicit lambdas, which require inference, it seems reasonable to give up something in overloading power to increase the range of cases where implicit lambdas can be used. >> >> On 8/8/2013 8:19 PM, Dan Smith wrote: >>> We spent some time at the EG meeting last week talking about the overload resolution story in the presence of lambdas/method references (and, why not, type argument inference). There are a lot of tricky dependencies here, and the goal is to find a balance between expressivity and simplicity. >>> >>> The sense I got from the meeting is that, despite our efforts to refine the story (there have been a few iterations), we're still not there yet in terms of simplicity. In particular, I think what's crucial about the model I presented is that users can identify the difference between implicit lambdas that get type checked pre-overload-resolution and post-overload-resolution; the sanity check I got is that nobody will be able to make that distinction. >>> >>> A couple of days later, Maurizio pointed out that, as we've iterated on our libraries, we've largely abandoned the space of programs that requires some of the more complex overload disambiguation machinery. And looking more closely at those use cases, we agreed that we've probably been focusing too much on some atypical patterns. >>> >>> So, let me propose a greatly simplified but probably not-very-noticeably less expressive approach: >>> >>> Overload resolution will only check the arity of all implicit lambdas and will ignore overloaded method references. 
If the body of a lambda is important for disambiguation, it must have explicit parameter types. >>> >>> Benefits of this approach: >>> - Very easy to understand -- it's mostly a syntactic distinction >>> - Consistent between all different patterns of overloading that were previously treated differently >>> - Facilitates a simple declaration-site warning check when method signatures conflict >>> - Encourages use of explicit lambdas -- clearly acknowledges that we can't solve all inference problems with implicit lambdas >>> - Avoids re-checking lambdas with different parameter types which means: >>> -- Typing of lambda bodies is easier for users to process >>> -- Implementations don't have to do speculative checking of arbitrary blocks of code >>> -- Bad theoretical complexity goes away >>> >>> We've thought about it for a few days and think this is a much better scenario for users and more in line with the EG's expectations (based on feedback both this year and last). >>> >>> Any questions/concerns? >>> >>> --- >>> >>> Here's an example of something we would stop disambiguating: >>> >>> interface I { >>> R map(Function f); >>> int map(ToIntFunction f); >>> long map(ToLongFunction f); >>> double map(ToDoubleFunction f); >>> } >>> >>> someIofString.map(s -> s.length()); >>> >>> Declaration-site workaround: rename the methods. >>> >>> Use-site workaround: explicit parameter type: >>> someIofString.map((String s) -> s.length()); >>> >>> --- >>> >>> Here's an example of something else we would stop disambiguating: >>> >>> static void m(Function f); >>> static void m(ToIntFunction f); >>> >>> m(x -> x.length() > 10 ? 
5 : 10); >>> >>> --- >>> >>> And here's something that we never could disambiguate in the first place (due to fundamental design constraints): >>> >>> interface Comparators { >>> > Comparator comparing(Function f); >>> Comparator comparing(ToIntFunction f); >>> } >>> >>> Comparator cs = Comparators.comparing(s -> -s.length()); >>> >>> --- >>> >>> --Dan >>> From forax at univ-mlv.fr Fri Aug 9 13:21:30 2013 From: forax at univ-mlv.fr (Remi Forax) Date: Fri, 09 Aug 2013 22:21:30 +0200 Subject: Overload resolution simplification In-Reply-To: <697DB0F2-1CE2-420F-9835-24E06DBC4F18@gmail.com> References: <71B94777-2E9D-4F9A-A7E4-86294BC5F107@oracle.com> <5205154C.7080209@oracle.com> <697DB0F2-1CE2-420F-9835-24E06DBC4F18@gmail.com> Message-ID: <52054F4A.8030907@univ-mlv.fr> On 08/09/2013 08:55 PM, Sam Pullara wrote: > Is there anything we can do about: > > ExecutorService.submit(object::voidReturningMethod) > > being ambiguous? > > Sam Yes, not being able to disambiguate between a lambda returning void and a lambda returning a value is a big hurdle; I have the very same issue in my code. Also, I have a nice parsing framework that uses type-specialised lambdas to avoid boxing, and it doesn't compile anymore. public IntStream parse(BufferedReader reader, ToIntFunction fun) { ... } public LongStream parse(BufferedReader reader, ToLongFunction fun) { ... } when called like this: parse(Integer::parseInt). Rémi > > On Aug 9, 2013, at 9:14 AM, Brian Goetz wrote: >> Note that in almost all the cases where we relied on fancy overload selection, we one-by-one accepted that it was better to mangle the names (flatMapToInt) anyway. The number of remaining cases is small, and the sensible cases that fall in the space between what was allowed with the more complex scheme and what is prohibited under this scheme is surprisingly small.
>> >> We were all a bit surprised that the new scheme works, but indeed it does seem to work pretty well. >> >> In hindsight, this should not be surprising. Type inference and overloading are ever in opposition; many languages with type inference from day 1 prohibit overloading (except maybe on arity.) So for constructs like implicit lambdas, which require inference, it seems reasonable to give up something in overloading power to increase the range of cases where implicit lambdas can be used. >> >> On 8/8/2013 8:19 PM, Dan Smith wrote: >>> We spent some time at the EG meeting last week talking about the overload resolution story in the presence of lambdas/method references (and, why not, type argument inference). There are a lot of tricky dependencies here, and the goal is to find a balance between expressivity and simplicity. >>> >>> The sense I got from the meeting is that, despite our efforts to refine the story (there have been a few iterations), we're still not there yet in terms of simplicity. In particular, I think what's crucial about the model I presented is that users can identify the difference between implicit lambdas that get type checked pre-overload-resolution and post-overload-resolution; the sanity check I got is that nobody will be able to make that distinction. >>> >>> A couple of days later, Maurizio pointed out that, as we've iterated on our libraries, we've largely abandoned the space of programs that requires some of the more complex overload disambiguation machinery. And looking more closely at those use cases, we agreed that we've probably been focusing too much on some atypical patterns. >>> >>> So, let me propose a greatly simplified but probably not-very-noticeably less expressive approach: >>> >>> Overload resolution will only check the arity of all implicit lambdas and will ignore overloaded method references. If the body of a lambda is important for disambiguation, it must have explicit parameter types. 
>>> >>> Benefits of this approach: >>> - Very easy to understand -- it's mostly a syntactic distinction >>> - Consistent between all different patterns of overloading that were previously treated differently >>> - Facilitates a simple declaration-site warning check when method signatures conflict >>> - Encourages use of explicit lambdas -- clearly acknowledges that we can't solve all inference problems with implicit lambdas >>> - Avoids re-checking lambdas with different parameter types which means: >>> -- Typing of lambda bodies is easier for users to process >>> -- Implementations don't have to do speculative checking of arbitrary blocks of code >>> -- Bad theoretical complexity goes away >>> >>> We've thought about it for a few days and think this is a much better scenario for users and more in line with the EG's expectations (based on feedback both this year and last). >>> >>> Any questions/concerns? >>> >>> --- >>> >>> Here's an example of something we would stop disambiguating: >>> >>> interface I { >>> R map(Function f); >>> int map(ToIntFunction f); >>> long map(ToLongFunction f); >>> double map(ToDoubleFunction f); >>> } >>> >>> someIofString.map(s -> s.length()); >>> >>> Declaration-site workaround: rename the methods. >>> >>> Use-site workaround: explicit parameter type: >>> someIofString.map((String s) -> s.length()); >>> >>> --- >>> >>> Here's an example of something else we would stop disambiguating: >>> >>> static void m(Function f); >>> static void m(ToIntFunction f); >>> >>> m(x -> x.length() > 10 ? 
5 : 10); >>> >>> --- >>> >>> And here's something that we never could disambiguate in the first place (due to fundamental design constraints): >>> >>> interface Comparators { >>> > Comparator comparing(Function f); >>> Comparator comparing(ToIntFunction f); >>> } >>> >>> Comparator cs = Comparators.comparing(s -> -s.length()); >>> >>> --- >>> >>> ?Dan >>> From daniel.smith at oracle.com Fri Aug 9 16:39:44 2013 From: daniel.smith at oracle.com (Dan Smith) Date: Fri, 9 Aug 2013 17:39:44 -0600 Subject: Overload resolution simplification In-Reply-To: <52054F4A.8030907@univ-mlv.fr> References: <71B94777-2E9D-4F9A-A7E4-86294BC5F107@oracle.com> <5205154C.7080209@oracle.com> <697DB0F2-1CE2-420F-9835-24E06DBC4F18@gmail.com> <52054F4A.8030907@univ-mlv.fr> Message-ID: <49E1E815-FCA9-4B42-83AC-77544DA1AE11@oracle.com> On Aug 9, 2013, at 2:21 PM, Remi Forax wrote: > Also I've a nice parsing framework that use type specialised lambda to avoid boxing that doesn't compile anymore. > > public IntStream parse(BufferedReader reader, ToIntFunction fun) { ... } > public LongStream parse(BufferedReader reader, ToLongFunction fun) { ... } > > when called like this: parse(Integer::parseInt). Thanks for the use case. The 'parse' method is essentially the same shape as the 'map' method that was discussed by the EG quite a bit, with the eventual conclusion that it would be clearer to give each method a different name (parseInts, parseLongs, etc.). http://mail.openjdk.java.net/pipermail/lambda-libs-spec-experts/2013-February/001417.html http://mail.openjdk.java.net/pipermail/lambda-libs-spec-experts/2013-March/001441.html http://mail.openjdk.java.net/pipermail/lambda-libs-spec-experts/2013-March/001458.html Doesn't mean that all other developers must follow our lead, but the fact that the EG tried it and then concluded that it didn't want overloading here is a strong argument that this is potentially a bad convention to follow. 
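The declaration-site workaround Dan recommends above (distinct names such as parseInts/parseLongs instead of overloads) can be sketched as follows. This is a minimal illustration, not Remi's actual framework; the class name and the single-line signatures are hypothetical, and generics are filled in since the archive stripped them:

```java
import java.util.function.ToIntFunction;
import java.util.function.ToLongFunction;

// Sketch of the declaration-site workaround: give each primitive-specialised
// method its own name, so a method reference like Integer::parseInt (which is
// itself overloaded - there is a (String, int) radix variant) has exactly one
// possible target and nothing is left to disambiguate.
public class ParseSketch {
    static int parseInts(String line, ToIntFunction<String> fun) {
        return fun.applyAsInt(line);   // primitive return, no boxing
    }

    static long parseLongs(String line, ToLongFunction<String> fun) {
        return fun.applyAsLong(line);
    }

    public static void main(String[] args) {
        System.out.println(parseInts("42", Integer::parseInt));
        System.out.println(parseLongs("9000000000", Long::parseLong));
    }
}
```

With unique method names, the overloaded-method-reference cliff Dan describes simply never comes into play.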
If somebody likes this convention anyway, then we made a special-case effort to support method references. Unfortunately, Integer::parseInt is overloaded and so outside of the set of supported method references. As I mentioned in the EG meeting, by drawing the line like this, it's great when it works, and annoying when it doesn't and you fall off of a cliff. We considered using arity (e.g., "is this overloaded with arity 1?"), but that just moves the line, rather than solving the problem. So, I don't love the cliff, but I don't have a good alternative, other than just not having any special treatment at all. ?Dan From andrey.breslav at jetbrains.com Sat Aug 10 00:07:55 2013 From: andrey.breslav at jetbrains.com (Andrey Breslav) Date: Sat, 10 Aug 2013 11:07:55 +0400 Subject: Overload resolution simplification In-Reply-To: <49E1E815-FCA9-4B42-83AC-77544DA1AE11@oracle.com> References: <71B94777-2E9D-4F9A-A7E4-86294BC5F107@oracle.com> <5205154C.7080209@oracle.com> <697DB0F2-1CE2-420F-9835-24E06DBC4F18@gmail.com> <52054F4A.8030907@univ-mlv.fr> <49E1E815-FCA9-4B42-83AC-77544DA1AE11@oracle.com> Message-ID: The case of overloaded method references worries me as well (lambdas are ok). Note that C# supports overloaded method references (method groups) as arguments and only as arguments. It seems that inference can disambiguate method references rather well if we stick to what Dan proposes about lambdas, because for a method reference there is no body to check. But maybe I'm missing something. Dan, could you post some problematic examples concerning method references? -- Best regards, Andrey Breslav On Aug 10, 2013, at 3:39 AM, Dan Smith wrote: > On Aug 9, 2013, at 2:21 PM, Remi Forax wrote: > >> Also I've a nice parsing framework that use type specialised lambda to avoid boxing that doesn't compile anymore. >> >> public IntStream parse(BufferedReader reader, ToIntFunction fun) { ... } >> public LongStream parse(BufferedReader reader, ToLongFunction fun) { ... 
} >> >> when called like this: parse(Integer::parseInt). > > Thanks for the use case. > > The 'parse' method is essentially the same shape as the 'map' method that was discussed by the EG quite a bit, with the eventual conclusion that it would be clearer to give each method a different name (parseInts, parseLongs, etc.). > > http://mail.openjdk.java.net/pipermail/lambda-libs-spec-experts/2013-February/001417.html > http://mail.openjdk.java.net/pipermail/lambda-libs-spec-experts/2013-March/001441.html > http://mail.openjdk.java.net/pipermail/lambda-libs-spec-experts/2013-March/001458.html > > Doesn't mean that all other developers must follow our lead, but the fact that the EG tried it and then concluded that it didn't want overloading here is a strong argument that this is potentially a bad convention to follow. > > If somebody likes this convention anyway, then we made a special-case effort to support method references. Unfortunately, Integer::parseInt is overloaded and so outside of the set of supported method references. As I mentioned in the EG meeting, by drawing the line like this, it's great when it works, and annoying when it doesn't and you fall off of a cliff. We considered using arity (e.g., "is this overloaded with arity 1?"), but that just moves the line, rather than solving the problem. > > So, I don't love the cliff, but I don't have a good alternative, other than just not having any special treatment at all. 
> > ?Dan From forax at univ-mlv.fr Sat Aug 10 04:36:58 2013 From: forax at univ-mlv.fr (Remi Forax) Date: Sat, 10 Aug 2013 13:36:58 +0200 Subject: Overload resolution simplification In-Reply-To: <71B94777-2E9D-4F9A-A7E4-86294BC5F107@oracle.com> References: <71B94777-2E9D-4F9A-A7E4-86294BC5F107@oracle.com> Message-ID: <520625DA.3080905@univ-mlv.fr> On 08/09/2013 02:19 AM, Dan Smith wrote: > We spent some time at the EG meeting last week talking about the overload resolution story in the presence of lambdas/method references (and, why not, type argument inference). There are a lot of tricky dependencies here, and the goal is to find a balance between expressivity and simplicity. > > The sense I got from the meeting is that, despite our efforts to refine the story (there have been a few iterations), we're still not there yet in terms of simplicity. In particular, I think what's crucial about the model I presented is that users can identify the difference between implicit lambdas that get type checked pre-overload-resolution and post-overload-resolution; the sanity check I got is that nobody will be able to make that distinction. > > A couple of days later, Maurizio pointed out that, as we've iterated on our libraries, we've largely abandoned the space of programs that requires some of the more complex overload disambiguation machinery. And looking more closely at those use cases, we agreed that we've probably been focusing too much on some atypical patterns. > > So, let me propose a greatly simplified but probably not-very-noticeably less expressive approach: > > Overload resolution will only check the arity of all implicit lambdas and will ignore overloaded method references. If the body of a lambda is important for disambiguation, it must have explicit parameter types. 
> > Benefits of this approach: > - Very easy to understand -- it's mostly a syntactic distinction > - Consistent between all different patterns of overloading that were previously treated differently > - Facilitates a simple declaration-site warning check when method signatures conflict > - Encourages use of explicit lambdas -- clearly acknowledges that we can't solve all inference problems with implicit lambdas > - Avoids re-checking lambdas with different parameter types which means: > -- Typing of lambda bodies is easier for users to process > -- Implementations don't have to do speculative checking of arbitrary blocks of code > -- Bad theoretical complexity goes away > > We've thought about it for a few days and think this is a much better scenario for users and more in line with the EG's expectations (based on feedback both this year and last). > > Any questions/concerns? Thinking a bit more about this: it changes a lot of assumptions we have used to make decisions in the past. For example, do we really want a special syntax for implicit lambdas with one parameter if we want to encourage explicit lambdas? Rémi From maurizio.cimadamore at oracle.com Sat Aug 10 06:56:35 2013 From: maurizio.cimadamore at oracle.com (Maurizio Cimadamore) Date: Sat, 10 Aug 2013 14:56:35 +0100 Subject: Overload resolution simplification In-Reply-To: References: <71B94777-2E9D-4F9A-A7E4-86294BC5F107@oracle.com> <5205154C.7080209@oracle.com> <697DB0F2-1CE2-420F-9835-24E06DBC4F18@gmail.com> <52054F4A.8030907@univ-mlv.fr> <49E1E815-FCA9-4B42-83AC-77544DA1AE11@oracle.com> Message-ID: <52064693.7090701@oracle.com> On 10/08/13 08:07, Andrey Breslav wrote: > The case of overloaded method references worries me as well (lambdas are ok). Note that C# supports overloaded method references (method groups) as arguments and only as arguments.
It seems that inference can disambiguate method references rather well if we stick to what Dan proposes about lambdas, because for a method reference there is no body to check. But maybe I'm missing something. I believe C# is very different w.r.t. Java when it comes to target-typing and overload resolution - as such C# is not subject to all the issues we have here with 'stuck' expression - i.e. expression such as lambda and/or method references that cannot be looked at by the compiler because some type information is missing and the compiler cannot safely go ahead and instantiate the inference variable that would make it possible for the compiler to go ahead. I think 'comparing' is a good example of what can go wrong; even if we added support for overloaded method references (which we had last week), that API cannot be compiled by passing in a method reference, as the inference variable that is keeping the method reference stuck also appears on the 'comparing' return type. Which is, IMHO, a much more subtle explanation than 'just don't use an overloaded method reference here'. If we could have a scheme that worked in all cases, then I'd be totally in favor of having a more complex scheme. But, because of Java legacy, I don't think such an approach exists here. The only incremental improvement I see viable here, one that has been discussed before, would be to add some logic to detect that all overloaded methods force the same choice on the implicit lambda parameter/overloaded mref; that would be enough to get past Remi example - but it doesn't scale too well to generic methods. Maurizio >> On Aug 9, 2013, at 2:21 PM, Remi Forax wrote: >> >>> Also I've a nice parsing framework that use type specialised lambda to avoid boxing that doesn't compile anymore. >>> >>> public IntStream parse(BufferedReader reader, ToIntFunction fun) { ... } >>> public LongStream parse(BufferedReader reader, ToLongFunction fun) { ... } >>> >>> when called like this: parse(Integer::parseInt). 
>> Thanks for the use case. >> >> The 'parse' method is essentially the same shape as the 'map' method that was discussed by the EG quite a bit, with the eventual conclusion that it would be clearer to give each method a different name (parseInts, parseLongs, etc.). >> >> http://mail.openjdk.java.net/pipermail/lambda-libs-spec-experts/2013-February/001417.html >> http://mail.openjdk.java.net/pipermail/lambda-libs-spec-experts/2013-March/001441.html >> http://mail.openjdk.java.net/pipermail/lambda-libs-spec-experts/2013-March/001458.html >> >> Doesn't mean that all other developers must follow our lead, but the fact that the EG tried it and then concluded that it didn't want overloading here is a strong argument that this is potentially a bad convention to follow. >> >> If somebody likes this convention anyway, then we made a special-case effort to support method references. Unfortunately, Integer::parseInt is overloaded and so outside of the set of supported method references. As I mentioned in the EG meeting, by drawing the line like this, it's great when it works, and annoying when it doesn't and you fall off of a cliff. We considered using arity (e.g., "is this overloaded with arity 1?"), but that just moves the line, rather than solving the problem. >> >> So, I don't love the cliff, but I don't have a good alternative, other than just not having any special treatment at all. >> >> ?Dan From brian.goetz at oracle.com Sat Aug 10 10:29:27 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Sat, 10 Aug 2013 13:29:27 -0400 Subject: Overload resolution simplification In-Reply-To: <520625DA.3080905@univ-mlv.fr> References: <71B94777-2E9D-4F9A-A7E4-86294BC5F107@oracle.com> <520625DA.3080905@univ-mlv.fr> Message-ID: <905DDF15-E061-40A8-A1D2-F1FEE554619B@oracle.com> What makes you think the goal is encouraging explicit lambdas? In the absence of overloading, inference works great. 
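Brian's point that inference works well once overloading is out of the picture can be illustrated with a minimal sketch (the class and method names here are hypothetical): a single, non-overloaded generic method lets the compiler infer everything, so the implicit lambda needs no parameter types.

```java
import java.util.function.Function;

public class NonOverloadedSketch {
    // One generic method, no overloads: T is inferred from the argument and
    // R from the lambda body, so an implicit lambda "just works".
    static <T, R> R transform(T value, Function<T, R> f) {
        return f.apply(value);
    }

    public static void main(String[] args) {
        // 's' is inferred as String; no explicit parameter type needed.
        Integer len = transform("hello", s -> s.length());
        System.out.println(len); // prints 5
    }
}
```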
What we're proposing here is to have less magic in the interplay of overload resolution and inference. Sent from my iPhone On Aug 10, 2013, at 7:36 AM, Remi Forax wrote: > On 08/09/2013 02:19 AM, Dan Smith wrote: >> We spent some time at the EG meeting last week talking about the overload resolution story in the presence of lambdas/method references (and, why not, type argument inference). There are a lot of tricky dependencies here, and the goal is to find a balance between expressivity and simplicity. >> >> The sense I got from the meeting is that, despite our efforts to refine the story (there have been a few iterations), we're still not there yet in terms of simplicity. In particular, I think what's crucial about the model I presented is that users can identify the difference between implicit lambdas that get type checked pre-overload-resolution and post-overload-resolution; the sanity check I got is that nobody will be able to make that distinction. >> >> A couple of days later, Maurizio pointed out that, as we've iterated on our libraries, we've largely abandoned the space of programs that requires some of the more complex overload disambiguation machinery. And looking more closely at those use cases, we agreed that we've probably been focusing too much on some atypical patterns. >> >> So, let me propose a greatly simplified but probably not-very-noticeably less expressive approach: >> >> Overload resolution will only check the arity of all implicit lambdas and will ignore overloaded method references. If the body of a lambda is important for disambiguation, it must have explicit parameter types.
>> >> Benefits of this approach: >> - Very easy to understand -- it's mostly a syntactic distinction >> - Consistent between all different patterns of overloading that were previously treated differently >> - Facilitates a simple declaration-site warning check when method signatures conflict >> - Encourages use of explicit lambdas -- clearly acknowledges that we can't solve all inference problems with implicit lambdas >> - Avoids re-checking lambdas with different parameter types which means: >> -- Typing of lambda bodies is easier for users to process >> -- Implementations don't have to do speculative checking of arbitrary blocks of code >> -- Bad theoretical complexity goes away >> >> We've thought about it for a few days and think this is a much better scenario for users and more in line with the EG's expectations (based on feedback both this year and last). >> >> Any questions/concerns? > > Thinking a bit more about this, > it change a lot of assumptions we have used to make decisions in the past, > by example do we really want a special syntax for implicit lambda with one parameter > if we want to encourage use of explicit lambdas ? > > R?mi > From forax at univ-mlv.fr Sat Aug 10 13:13:19 2013 From: forax at univ-mlv.fr (Remi Forax) Date: Sat, 10 Aug 2013 22:13:19 +0200 Subject: Overload resolution simplification In-Reply-To: <52064693.7090701@oracle.com> References: <71B94777-2E9D-4F9A-A7E4-86294BC5F107@oracle.com> <5205154C.7080209@oracle.com> <697DB0F2-1CE2-420F-9835-24E06DBC4F18@gmail.com> <52054F4A.8030907@univ-mlv.fr> <49E1E815-FCA9-4B42-83AC-77544DA1AE11@oracle.com> <52064693.7090701@oracle.com> Message-ID: <52069EDF.5030008@univ-mlv.fr> On 08/10/2013 03:56 PM, Maurizio Cimadamore wrote: > On 10/08/13 08:07, Andrey Breslav wrote: >> The case of overloaded method references worries me as well (lambdas >> are ok). Note that C# supports overloaded method references (method >> groups) as arguments and only as arguments. 
It seems that inference >> can disambiguate method references rather well if we stick to what >> Dan proposes about lambdas, because for a method reference there is >> no body to check. But maybe I'm missing something. > I believe C# is very different w.r.t. Java when it comes to > target-typing and overload resolution - as such C# is not subject to > all the issues we have here with 'stuck' expression - i.e. expression > such as lambda and/or method references that cannot be looked at by > the compiler because some type information is missing and the compiler > cannot safely go ahead and instantiate the inference variable that > would make it possible for the compiler to go ahead. Talking about C#: C# is also able to disambiguate whether a lambda is an Action (T => void) or a Func (T => U). And I would prefer that the compiler stop if there is a stuck expression instead of giving up too early. > > I think 'comparing' is a good example of what can go wrong; even if we > added support for overloaded method references (which we had last > week), that API cannot be compiled by passing in a method reference, > as the inference variable that is keeping the method reference stuck > also appears on the 'comparing' return type. Which is, IMHO, a much > more subtle explanation than 'just don't use an overloaded method > reference here'. Yes, we all agree about that. At the same time, overloaded methods already exist and they are not the issue. Correct me if I'm wrong, but the issue comes when you mix overloads and generic methods - so just say "don't do that". > > If we could have a scheme that worked in all cases, then I'd be > totally in favor of having a more complex scheme. But, because of Java > legacy, I don't think such an approach exists here.
> > The only incremental improvement I see viable here, one that has been > discussed before, would be to add some logic to detect that all > overloaded methods force the same choice on the implicit lambda > parameter/overloaded mref; that would be enough to get past Remi > example - but it doesn't scale too well to generic methods. For an implicit lambda, I see no issue if the compiler tries to typecheck the same lambda with different parameters, but in that case the compiler should not try to go ahead by creating a new context. Again, it means that if one typechecking of the lambda expression is stuck, the compiler should stop. I prefer this rationale because it avoids the combinatorial explosion while still allowing API designers to use overloads, as long as none of them are generic. > > Maurizio Rémi > >>> On Aug 9, 2013, at 2:21 PM, Remi Forax wrote: >>> >>>> Also I've a nice parsing framework that use type specialised lambda >>>> to avoid boxing that doesn't compile anymore. >>>> >>>> public IntStream parse(BufferedReader reader, ToIntFunction >>>> fun) { ... } >>>> public LongStream parse(BufferedReader reader, >>>> ToLongFunction fun) { ... } >>>> >>>> when called like this: parse(Integer::parseInt). >>> Thanks for the use case. >>> >>> The 'parse' method is essentially the same shape as the 'map' method >>> that was discussed by the EG quite a bit, with the eventual >>> conclusion that it would be clearer to give each method a different >>> name (parseInts, parseLongs, etc.).
>>> >>> http://mail.openjdk.java.net/pipermail/lambda-libs-spec-experts/2013-February/001417.html >>> >>> http://mail.openjdk.java.net/pipermail/lambda-libs-spec-experts/2013-March/001441.html >>> >>> http://mail.openjdk.java.net/pipermail/lambda-libs-spec-experts/2013-March/001458.html >>> >>> >>> Doesn't mean that all other developers must follow our lead, but the >>> fact that the EG tried it and then concluded that it didn't want >>> overloading here is a strong argument that this is potentially a bad >>> convention to follow. >>> >>> If somebody likes this convention anyway, then we made a >>> special-case effort to support method references. Unfortunately, >>> Integer::parseInt is overloaded and so outside of the set of >>> supported method references. As I mentioned in the EG meeting, by >>> drawing the line like this, it's great when it works, and annoying >>> when it doesn't and you fall off of a cliff. We considered using >>> arity (e.g., "is this overloaded with arity 1?"), but that just >>> moves the line, rather than solving the problem. >>> >>> So, I don't love the cliff, but I don't have a good alternative, >>> other than just not having any special treatment at all. 
>>> >>> ?Dan > From maurizio.cimadamore at oracle.com Sat Aug 10 14:10:13 2013 From: maurizio.cimadamore at oracle.com (maurizio cimadamore) Date: Sat, 10 Aug 2013 22:10:13 +0100 Subject: Overload resolution simplification In-Reply-To: <52069EDF.5030008@univ-mlv.fr> References: <71B94777-2E9D-4F9A-A7E4-86294BC5F107@oracle.com> <5205154C.7080209@oracle.com> <697DB0F2-1CE2-420F-9835-24E06DBC4F18@gmail.com> <52054F4A.8030907@univ-mlv.fr> <49E1E815-FCA9-4B42-83AC-77544DA1AE11@oracle.com> <52064693.7090701@oracle.com> <52069EDF.5030008@univ-mlv.fr> Message-ID: <5206AC35.8080701@oracle.com> On 10-Aug-13 9:13 PM, Remi Forax wrote: > For implicit lambda, I see no issue if the compiler tries to typecheck > the same lambda with different parameters but in that case, the > compiler should not try go ahead by creating a new context. Again, it > means that if one typechecking of the lambda expression is stuck, the > compiler should stop. I'm not following this. If you type-check a lambda with multiple parameters then you have combinatorial explosion. What am I missing? Maurizio From forax at univ-mlv.fr Sat Aug 10 14:52:03 2013 From: forax at univ-mlv.fr (Remi Forax) Date: Sat, 10 Aug 2013 23:52:03 +0200 Subject: Overload resolution simplification In-Reply-To: <5206AC35.8080701@oracle.com> References: <71B94777-2E9D-4F9A-A7E4-86294BC5F107@oracle.com> <5205154C.7080209@oracle.com> <697DB0F2-1CE2-420F-9835-24E06DBC4F18@gmail.com> <52054F4A.8030907@univ-mlv.fr> <49E1E815-FCA9-4B42-83AC-77544DA1AE11@oracle.com> <52064693.7090701@oracle.com> <52069EDF.5030008@univ-mlv.fr> <5206AC35.8080701@oracle.com> Message-ID: <5206B603.60309@univ-mlv.fr> On 08/10/2013 11:10 PM, maurizio cimadamore wrote: > On 10-Aug-13 9:13 PM, Remi Forax wrote: >> For implicit lambda, I see no issue if the compiler tries to >> typecheck the same lambda with different parameters but in that case, >> the compiler should not try go ahead by creating a new context. 
>> Again, it means that if one typechecking of the lambda expression is >> stuck, the compiler should stop. > I'm not following this. If you type-check a lambda with multiple > parameters then you have combinatorial explosion. What am I missing? You mean if you have overloads that are generics? > > Maurizio Rémi From forax at univ-mlv.fr Sat Aug 10 14:58:31 2013 From: forax at univ-mlv.fr (Remi Forax) Date: Sat, 10 Aug 2013 23:58:31 +0200 Subject: Overload resolution simplification In-Reply-To: <905DDF15-E061-40A8-A1D2-F1FEE554619B@oracle.com> References: <71B94777-2E9D-4F9A-A7E4-86294BC5F107@oracle.com> <520625DA.3080905@univ-mlv.fr> <905DDF15-E061-40A8-A1D2-F1FEE554619B@oracle.com> Message-ID: <5206B787.3040101@univ-mlv.fr> On 08/10/2013 07:29 PM, Brian Goetz wrote: > What makes you think the goal is encouraging explicit lambdas? Overloads are part of the Java legacy. Ignore them and you will see enterprise guidelines saying that only explicit lambdas should be used. Otherwise, it will break when someone adds an overload. > > In the absence of overloading, inference works great. What do you think about a language that has numerous features but does not let its users combine them? > What were proposing here is to have less magic in the interplay of overlaod resolution and inference. It's not less magic. It's no magic. People expect that lambdas will work reasonably well with overloads. Rémi > > Sent from my iPhone > > On Aug 10, 2013, at 7:36 AM, Remi Forax wrote: >> On 08/09/2013 02:19 AM, Dan Smith wrote: >>> We spent some time at the EG meeting last week talking about the overload resolution story in the presence of lambdas/method references (and, why not, type argument inference). There are a lot of tricky dependencies here, and the goal is to find a balance between expressivity and simplicity.
>>> >>> The sense I got from the meeting is that, despite our efforts to refine the story (there have been a few iterations), we're still not there yet in terms of simplicity. In particular, I think what's crucial about the model I presented is that users can identify the difference between implicit lambdas that get type checked pre-overload-resolution and post-overload-resolution; the sanity check I got is that nobody will be able to make that distinction. >>> >>> A couple of days later, Maurizio pointed out that, as we've iterated on our libraries, we've largely abandoned the space of programs that requires some of the more complex overload disambiguation machinery. And looking more closely at those use cases, we agreed that we've probably been focusing too much on some atypical patterns. >>> >>> So, let me propose a greatly simplified but probably not-very-noticeably less expressive approach: >>> >>> Overload resolution will only check the arity of all implicit lambdas and will ignore overloaded method references. If the body of a lambda is important for disambiguation, it must have explicit parameter types. 
>>> >>> Benefits of this approach: >>> - Very easy to understand -- it's mostly a syntactic distinction >>> - Consistent between all different patterns of overloading that were previously treated differently >>> - Facilitates a simple declaration-site warning check when method signatures conflict >>> - Encourages use of explicit lambdas -- clearly acknowledges that we can't solve all inference problems with implicit lambdas >>> - Avoids re-checking lambdas with different parameter types, which means: >>> -- Typing of lambda bodies is easier for users to process >>> -- Implementations don't have to do speculative checking of arbitrary blocks of code >>> -- Bad theoretical complexity goes away >>> >>> We've thought about it for a few days and think this is a much better scenario for users and more in line with the EG's expectations (based on feedback both this year and last). >>> >>> Any questions/concerns? >> Thinking a bit more about this: it changes a lot of assumptions we have used to make decisions in the past. For example, do we really want a special syntax for an implicit lambda with one parameter if we want to encourage the use of explicit lambdas? >> >> Rémi From maurizio.cimadamore at oracle.com Sun Aug 11 02:23:18 2013 From: maurizio.cimadamore at oracle.com (maurizio cimadamore) Date: Sun, 11 Aug 2013 10:23:18 +0100 Subject: Overload resolution simplification In-Reply-To: <5206B603.60309@univ-mlv.fr> References: <71B94777-2E9D-4F9A-A7E4-86294BC5F107@oracle.com> <5205154C.7080209@oracle.com> <697DB0F2-1CE2-420F-9835-24E06DBC4F18@gmail.com> <52054F4A.8030907@univ-mlv.fr> <49E1E815-FCA9-4B42-83AC-77544DA1AE11@oracle.com> <52064693.7090701@oracle.com> <52069EDF.5030008@univ-mlv.fr> <5206AC35.8080701@oracle.com> <5206B603.60309@univ-mlv.fr> Message-ID: <52075806.1000602@oracle.com> On 10-Aug-13 10:52 PM, Remi Forax wrote: > You mean if you have overloads that are generic?
No - in general; as soon as you start type-checking a lambda once per overload, I think it's unavoidable to end up in a combinatorial scenario for lambdas such as this: m(x->g(y->f(...))) where m, g, f are overloads. Maurizio From maurizio.cimadamore at oracle.com Sun Aug 11 02:50:54 2013 From: maurizio.cimadamore at oracle.com (maurizio cimadamore) Date: Sun, 11 Aug 2013 10:50:54 +0100 Subject: Overload resolution simplification In-Reply-To: <5206B787.3040101@univ-mlv.fr> References: <71B94777-2E9D-4F9A-A7E4-86294BC5F107@oracle.com> <520625DA.3080905@univ-mlv.fr> <905DDF15-E061-40A8-A1D2-F1FEE554619B@oracle.com> <5206B787.3040101@univ-mlv.fr> Message-ID: <52075E7E.1040904@oracle.com> On 10-Aug-13 10:58 PM, Remi Forax wrote: > It's not less magic. It's no magic. This is a bit unfair. I think there's still plenty of magic going on. Note that the path that led us there is that we were uncomfortable having the compiler pick one method over another because of some weird error in some implicit lambdas. Now - if you have something like: m(Function f) m(IntFunction f) And a lambda like m(x->1) it could be trivial to see that the second method is the one you want. On the other hand, consider the following overload of _non generic_ methods: m(Function f) m(IntFunction f) If we start accepting this (overload between non-generic methods - so everything is ok, no?) - then we open up a can of worms in which an error in one speculative type-check will cause the method not to be applicable - i.e.: m(x->x.length()) //no member length() in Integer, so the first method is selected m(x->x.intValue()) //no member intValue() in String, so the second method is selected This was what the compiler was doing - and the EG expressed some concerns with this - and for a good reason, I think. A seemingly innocuous refactoring of a lambda body can trigger a change in overload resolution of the enclosing method call.
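The brittleness just described is easy to see in a compilable sketch. Under the arity-only rule Dan proposed (and which was eventually adopted), an implicit lambda handed to two same-arity overloads is simply ambiguous, and explicit parameter types do the disambiguating. The method name m and the overload pair below are illustrative, echoing the example above, with generic signatures assumed since the original message lost its angle brackets:

```java
import java.util.function.Function;
import java.util.function.IntFunction;

public class OverloadDemo {
    static String m(Function<String, Integer> f) {
        return "Function: " + f.apply("hello");
    }

    static String m(IntFunction<String> f) {
        return "IntFunction: " + f.apply(42);
    }

    public static void main(String[] args) {
        // An implicit lambda, m(x -> x.length()), would be ambiguous here:
        // both overloads accept a one-argument lambda, and under the
        // arity-only rule the body is never consulted to break the tie.
        // Explicit parameter types select an overload up front:
        System.out.println(m((String x) -> x.length())); // Function overload
        System.out.println(m((int x) -> "#" + x));       // IntFunction overload
    }
}
```

With `(String x)` only the Function overload is applicable; with `(int x)` only the IntFunction one is, so resolution never has to speculate about lambda bodies.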
In other words, even when there's an overload between non-generic methods, it gets very muddy as soon as you consider all the consequences of 'adding more magic'. Suddenly you have to start thinking about errors in the lambda body to reason about method applicability, and that's brittle (in fact, even though an earlier version of flatMap was able to disambiguate because of this, we decided NOT to rely on this as too magic/brittle). So, the only meaningful question here is - where should we draw the line? I see three options: 1) Allow non-generic overloads, provided each overload forces the same choice on lambda parameter types 2) Expand on 1, implementing more complex logic that would also work on generic methods 3) Disallow implicit lambdas when overloads of the same arity are available I think 2 was also considered too complex, as the analysis would have to treat inference variables that depend on the return type in a different way (i.e. Comparator.comparing won't work) - and that's surprising. So that's mostly a dead end. The choice is between 1 and 3. As a result of the current simplification, we are in 3. Now, I'm not totally against 1 - I think it's a legitimate position, but there are things about that approach that worry me: *) inconsistencies between non-generic vs. generic methods *) hard to parse when parameter types are ordered in different ways with different SAM types (see my earlier example on this list) *) will miss ambiguities when passing a lambda to _semantically_ unrelated overloads - e.g. m(Function) m(Predicate) This example passes the rule that all overloads force the same choice on lambda parameters. Should we accept a method call like the following? m(x->foo(x)) Well, if we follow that scheme, the answer is yes - and the candidate is chosen depending on the return type of foo(String).
I think that this choice, while legitimate, is already 'too magic' - we have two very different targets there - with entirely different semantics; I think in such cases I would be more comfortable having a more explicit form of disambiguation at the call site. Last point - it looks to me that all cases in which we would prefer 1 over 3 have to do with primitive specialization of SAMs. On the one hand, the Java type-system features this split between primitive and reference types - so that's what we get; on the other hand, if (big if :-)) we had some form of primitive vs. reference unification, how many cases would be left that 1 could support and 3 cannot? Are we comfortable _permanently_ adding more complexity to an already very complex area of the language, complexity that could then be useless at best (or even fire back) if/when the type-system is made more uniform? Maurizio From brian.goetz at oracle.com Sun Aug 11 12:25:59 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Sun, 11 Aug 2013 15:25:59 -0400 Subject: Overload resolution simplification In-Reply-To: <5206B787.3040101@univ-mlv.fr> References: <71B94777-2E9D-4F9A-A7E4-86294BC5F107@oracle.com> <520625DA.3080905@univ-mlv.fr> <905DDF15-E061-40A8-A1D2-F1FEE554619B@oracle.com> <5206B787.3040101@univ-mlv.fr> Message-ID: <5207E547.9060806@oracle.com> >> What makes you think the goal is encouraging explicit lambdas? > > Overloads are part of the Java legacy. Ignore them and you will see > enterprise guidelines saying that only explicit lambdas should be used. > Otherwise, it will break when someone adds an overload. Overloads are part of the legacy, but the number of genuine SAM-SAM conflicts with existing APIs is pretty small -- I think Executor.submit() is the #1 case, and there are only a handful of others. What we're trying to discourage is new, designed-for-lambdas APIs from getting out of hand with overloads.
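Brian's #1 case is the submit overloads on ExecutorService: submit(Runnable) and submit(Callable<T>) take functional interfaces of the same arity. A small compilable sketch of how call sites sort this out; the cast in the second call is the "go explicit" move for bodies that could fit either shape:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SubmitDemo {
    public static void main(String[] args) throws Exception {
        ExecutorService exec = Executors.newSingleThreadExecutor();
        List<String> items = new ArrayList<>();
        items.add("a");

        // "done" is not a valid statement on its own, so this body is only
        // value-compatible: the lambda can only be a Callable<String>.
        Future<String> f = exec.submit(() -> "done");
        System.out.println(f.get());

        // items.remove(0) is both a statement and a value-returning
        // expression, so the body fits Runnable and Callable alike;
        // the cast states which overload is meant.
        Future<String> g = exec.submit((Callable<String>) () -> items.remove(0));
        System.out.println(g.get());

        exec.shutdown();
    }
}
```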
Overloading and type inference work against each other, so if you want to design an API for type inference, you need to dial back on the overloading. (Which we've gradually done throughout the course of designing the java.util.stream API, for a host of reasons.) So: - For new APIs, go easier on the overloading when there are possible SAM conflicts; - For old APIs that have such conflicts, go explicit. Maurizio's latest patch also includes a lint warning for overloads that are asking for trouble, so hopefully API designers will have a way of catching those early. From daniel.smith at oracle.com Mon Aug 12 11:27:40 2013 From: daniel.smith at oracle.com (Dan Smith) Date: Mon, 12 Aug 2013 12:27:40 -0600 Subject: Overload resolution simplification In-Reply-To: <5206B787.3040101@univ-mlv.fr> References: <71B94777-2E9D-4F9A-A7E4-86294BC5F107@oracle.com> <520625DA.3080905@univ-mlv.fr> <905DDF15-E061-40A8-A1D2-F1FEE554619B@oracle.com> <5206B787.3040101@univ-mlv.fr> Message-ID: <4B651074-DA88-4B45-BD61-F9C326181383@oracle.com> On Aug 10, 2013, at 3:58 PM, Remi Forax wrote: >> What we're proposing here is to have less magic in the interplay of overload resolution and inference. > > It's not less magic. It's no magic. > People expect that lambdas will work reasonably well with overloads. A small comment; generally, I pretty much endorse everything Maurizio had to say. People expect target typing of combinators to work; they also want overloading on functional interface types to work. ("Work" here means "support implicit lambdas".) You can't do both at the same time, given our prime directive: overload resolution is context-independent. So we worked hard to make the compiler smart enough to pick which feature to support on a case-by-case basis.
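Target typing of combinators, the feature that wins under the simplification, is what lets an implicit lambda pick up its parameter type through a generic factory method. A minimal sketch (not from the thread; the sort/comparing pairing is just a convenient illustration):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class CombinatorDemo {
    public static void main(String[] args) {
        List<String> words = new ArrayList<>(Arrays.asList("pear", "fig", "banana"));
        // The target type Comparator<? super String> flows backwards through
        // Comparator.comparing, so the parameter s of the implicit lambda is
        // inferred as String without ever being written down:
        words.sort(Comparator.comparing(s -> s.length()));
        System.out.println(words); // [fig, pear, banana]
    }
}
```

Because sort is not overloaded, this inference is context-independent; it is exactly the style of API the "dial back on overloading" advice is meant to protect.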
And we've found that, despite our best efforts to reduce complexity, it's still hard for the average user to understand; the typical outcome when it doesn't "just work" will be a complaint that the compiler should have known that the other feature is what was wanted here. We get a much simpler story if we just pick one of the features and stick to it (when inferring types of implicit lambdas). Practical experience has told us that target typing of combinators is a lot more useful than overloading on functional interface types. Hence, the bias towards making that the "magic" that we support. —Dan From brian.goetz at oracle.com Tue Aug 13 11:15:33 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Tue, 13 Aug 2013 14:15:33 -0400 Subject: Reader mail bag Message-ID: <520A77C5.3030704@oracle.com> As you may know, there is a separate -comments list which is intended to be used as a "suggestion box" (i.e., it's not for discussion, and hence not subscribable or replyable.) This is the public's primary means for providing input to the EG. Most of these comments have already been discussed here. The following comments should be considered "read into the record". If folks want to discuss these, please start a separate thread. Sept 10 2012, Neal Gafter: http://mail.openjdk.java.net/pipermail/lambda-spec-comments/2012-September/000000.html Summary: "Please make the -comments list subscribe-able." Disposition: That would undermine a key value of the -comments list. Oct 8, 2012, Alex Buckley: http://mail.openjdk.java.net/pipermail/lambda-spec-comments/2012-October/000002.html Summary: Please explicitly note classfile version constraints for classfile changes. Disposition: Incorporated into spec drafts.
Oct 1, 2012, Neal Gafter: http://mail.openjdk.java.net/pipermail/lambda-spec-comments/2012-October/000001.html Summary: Clarifications about BGGA exception transparency Dec 11, 2012, Alex Buckley: http://mail.openjdk.java.net/pipermail/lambda-spec-comments/2012-December/000003.html Summary: Request clarification about annotation of lambda parameter formals Disposition: Lambda formals can have type annotations or parameter annotations; parameter annotations are propagated to the desugared implementation method. Jan 2, 2013, Stephen Colebourne: http://mail.openjdk.java.net/pipermail/lambda-spec-comments/2013-January/000004.html Summary: Extend @FunctionalInterface annotation to name primary method Disposition: Discussed, no EG consensus behind this proposal Jan 2, 2013, Stephen Colebourne: http://mail.openjdk.java.net/pipermail/lambda-spec-comments/2013-January/000005.html Summary: Suggestions for function type naming Disposition: Considered in EG discussions Jan 3, 2013, Alex Buckley: http://mail.openjdk.java.net/pipermail/lambda-spec-comments/2013-January/000006.html Summary: Grammar of constructor ref Disposition: Will be integrated into specification Jan 10, 2013, Michael Ernst: http://mail.openjdk.java.net/pipermail/lambda-spec-comments/2013-January/000009.html Summary: Please include explicit examples in 15.27 Disposition: Incorporated into specification Apr 1, 2013, Maurice Naftalin: http://mail.openjdk.java.net/pipermail/lambda-spec-comments/2013-April/000010.html Summary: Suggested wording tweak in 9.8 Disposition: Will incorporate From brian.goetz at oracle.com Tue Aug 13 11:34:32 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Tue, 13 Aug 2013 14:34:32 -0400 Subject: From the issue tracker Message-ID: <520A7C38.8010600@oracle.com> As of today, the following issues are still open on the JSR-335 issue tracker hosted at java.net: https://java.net/jira/browse/JSR_335-5 Summary: What should the specified toString behavior of lambdas be?
Disposition: no decision reached https://java.net/jira/browse/JSR_335-12 Summary: Concerns about serialization stability Disposition: Issue re-raised on EG list If anyone has (new) observations about either of these topics, please start a separate thread. From forax at univ-mlv.fr Tue Aug 13 14:03:14 2013 From: forax at univ-mlv.fr (Remi Forax) Date: Tue, 13 Aug 2013 23:03:14 +0200 Subject: Overload resolution simplification In-Reply-To: <52075806.1000602@oracle.com> References: <71B94777-2E9D-4F9A-A7E4-86294BC5F107@oracle.com> <5205154C.7080209@oracle.com> <697DB0F2-1CE2-420F-9835-24E06DBC4F18@gmail.com> <52054F4A.8030907@univ-mlv.fr> <49E1E815-FCA9-4B42-83AC-77544DA1AE11@oracle.com> <52064693.7090701@oracle.com> <52069EDF.5030008@univ-mlv.fr> <5206AC35.8080701@oracle.com> <5206B603.60309@univ-mlv.fr> <52075806.1000602@oracle.com> Message-ID: <520A9F12.1030200@univ-mlv.fr> On 08/11/2013 11:23 AM, maurizio cimadamore wrote: > On 10-Aug-13 10:52 PM, Remi Forax wrote: >> You mean if you have overloads that are generics ? > No - in general; as soon as you start type-checking a lambda once per > overload I think it's unavoidable to end up in a combinatorial > scenario for lambda such as this: > > m(x->g(y->f(...))) > > where m, g, f are overloads. Oops, so it means that we can't disambiguate lambdas by type-checking them.
So apart from the number of parameters of the lambda, we have no information :( > > Maurizio Rémi From forax at univ-mlv.fr Tue Aug 13 14:07:54 2013 From: forax at univ-mlv.fr (Remi Forax) Date: Tue, 13 Aug 2013 23:07:54 +0200 Subject: Overload resolution simplification In-Reply-To: <52075E7E.1040904@oracle.com> References: <71B94777-2E9D-4F9A-A7E4-86294BC5F107@oracle.com> <520625DA.3080905@univ-mlv.fr> <905DDF15-E061-40A8-A1D2-F1FEE554619B@oracle.com> <5206B787.3040101@univ-mlv.fr> <52075E7E.1040904@oracle.com> Message-ID: <520AA02A.2080509@univ-mlv.fr> On 08/11/2013 11:50 AM, maurizio cimadamore wrote: > On 10-Aug-13 10:58 PM, Remi Forax wrote: >> It's not less magic. It's no magic. > This is a bit unfair. I think there's still plenty of magic going on. > Note that the path that led us there is that we were uncomfortable in > having the compiler to pick one method over another because of some > weird error in some implicit lambdas. Now - if you have something like: > > m(Function f) > m(IntFunction f) > > And a lambda like > > m(x->1) > > it could be trivial to see that the second method is the one you want. > > On the other hand, consider the following overload of _non generic_ > methods: > > m(Function f) > m(IntFunction f) > > If we start accepting this (overload between non-generic methods - so > everything is ok, no?) - then we open up can of worms in which an > error in one speculative type-check will cause the method not to be > applicable - i.e. : > > m(x->x.length()) //no member length() in Integer, so the first method > is selected > m(x->x.intValue()) //no member intValue() in String, so the second > method is selected > > This was what the compiler was doing - and EG expressed some concerns > with this - and for a good reason I think. A seemingly innocuous > refactoring of a lambda body can trigger a change in overload > resolution of the enclosing method call.
> > In other words, even when there's an overload between non-generic > methods, it gets very muddy very soon as soon as you consider all > consequences of 'adding more magic'. Suddenly you have to start > thinking about errors in the lambda body to reason about method > applicability, and that's brittle (in fact, even though an earlier > version of flatMap was able to disambiguate because of this, we > decided NOT to rely on this as too magic/brittle). > > So, the only meaningful question here is - where should we draw the > line? I see three options: > > 1) Allow non-generic overloads, provided each overload forces same > choice on lambda parameter types > 2) Expand on 1a, implementing a more complex logic that would also > work on generic methods > 3) Disallow implicit lambdas when overloads of the same arity are > available > > I think 2 was considered also too complex, as the analysis would have > to treat inference variables that depend on the return type in a > different way (i.e. Comparator.comparing won't work) - and that's > surprising. So that's mostly a dead end. > > The choice is between 1 and 3. As a result of the current > simplification, we are in 3. Now, I'm not totally against 1 - I think > it's a legitimate position, but there are things about that approach > that worry me: > > *) inconsistencies between non-generic vs. generic methods > *) hard to parse when parameter types are ordered in different ways > with different SAM types (see my earlier example on this list) > *) will miss ambiguities when passing a lambda to _semantically_ > unrelated overloads - i.e. > > m(Function) > m(Predicate) > > This example passes the rule that all overloads force same choice on > lambda parameters. Should we accept a method call like the following? > > m(x->foo(x)) > > Well, if we follow that scheme, the answer is yes - and the candidate > is chosen depending on the return type of foo(String). 
I think that > this choice, while legitimate, is already 'too magic' - we have to > very different targets there - with entirely different semantics; I > think in such cases I would be more comfortable in having a more > explicit form of disambiguation at the call site. Well, I agree; just nitpicking about the semantics: in a perfect world, all overloads are semantically equivalent; otherwise you should not use overloads. > > Last point - it looks to me that all cases in which we would prefer 1 > over 3 have to do with primitive specialization of SAM. On the one > hand, the Java type-system features this split between primitive and > reference types - so that's what we get; on the other hand, if (big if > :-)) we had some form of primitive vs. reference unification, how many > cases would be left that 1 could support that 3 cannot? Are we > comfortable in _permanently_ adding more complexity to an already very > complex area of the language, that could then be useless (at best, or > even fire back) if/when the type-system is made more uniform? yes. > > Maurizio > > > > Rémi From forax at univ-mlv.fr Tue Aug 13 14:24:51 2013 From: forax at univ-mlv.fr (Remi Forax) Date: Tue, 13 Aug 2013 23:24:51 +0200 Subject: Overload resolution simplification In-Reply-To: <5207E547.9060806@oracle.com> References: <71B94777-2E9D-4F9A-A7E4-86294BC5F107@oracle.com> <520625DA.3080905@univ-mlv.fr> <905DDF15-E061-40A8-A1D2-F1FEE554619B@oracle.com> <5206B787.3040101@univ-mlv.fr> <5207E547.9060806@oracle.com> Message-ID: <520AA423.4090709@univ-mlv.fr> On 08/11/2013 09:25 PM, Brian Goetz wrote: >>> What makes you think the goal is encouraging explicit lambdas? >> >> Overloads are part of the java legacy. Ignoring them and you will see >> enterprise guidelines saying that only explicit lambdas should be used. >> Otherwise, it will break when someone will add an overload.
> Overloads are part of the legacy, but the number of genuine SAM-SAM > conflicts with existing APIs are pretty small -- I think > Executor.submit() is the #1 case, and there are only a handful of others. > > What we're trying to discourage is new, designed-for-lambdas APIs from > getting out of hand with overloads. Overloading and type inference > work against each other, so if you want to design an API for type > inference, you need to dial back on the overloading. (Which we've > gradually done throughout the course of designing the java.util.stream > API, for a host of reasons.) > > So: > - For new APIs, go easier on the overloading when there are possible > SAM conflicts; > - For old APIs that have such conflicts, go explicit. > > Maurizio's latest patch also includes a lint warning for overloads > that are asking for trouble, so hopefully API designers will have a > way of catching those early. Here, we're talking about lambdas, and I think I mostly agree; but we have the same issue with method references, and there is no way to go explicit. Rémi From forax at univ-mlv.fr Wed Aug 14 04:59:46 2013 From: forax at univ-mlv.fr (Remi Forax) Date: Wed, 14 Aug 2013 13:59:46 +0200 Subject: Overload resolution simplification In-Reply-To: References: <71B94777-2E9D-4F9A-A7E4-86294BC5F107@oracle.com> <520625DA.3080905@univ-mlv.fr> <905DDF15-E061-40A8-A1D2-F1FEE554619B@oracle.com> <5206B787.3040101@univ-mlv.fr> <52075E7E.1040904@oracle.com> Message-ID: <520B7132.5090608@univ-mlv.fr> On 08/14/2013 10:03 AM, Ali Ebrahimi wrote: > Sorry, Maurizio, I can't agree on this. > I think we should be consistent in all cases. So if we want > unification why we add primitive specialization of SAMs and Streams. > May be they are hacks as mangling method names to overcome compiler's > weaknesses and will be deprecated in future. Binary compatibility of such mangling will hamper many possible futures. > So better we have a clear view from future and make decisions based on > that.
We have a not-totally-clear view of the future, but one possible future is to teach the VM that Integer is a value type (more precisely, that Integers constructed using Integer.valueOf will be value types). In this possible future, we hope to do type specialization, so it seems wise not to commit ourselves to a particular type specialization scheme now. As for why there are specialized Streams for int, long, etc.: one reason is clearly that the VM is not able to see the whole pipeline as a unit of compilation (this may change in the future); another reason is that operations like sum or average have a meaning only on a primitive stream, not on an object stream. > And one thing that I don't get is why java can't go C#'s way in this case? See Maurizio's answer. > > > Regards, > Ali Ebrahimi cheers, Rémi From brian.goetz at oracle.com Fri Aug 16 10:47:52 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Fri, 16 Aug 2013 13:47:52 -0400 Subject: Serializable lambdas -- where we are, how we got here Message-ID: <520E65C8.4010600@oracle.com> Several concerns have recently been (re)raised about the stability of serializable lambdas. This attempts to provide an inventory of where we are and how we got here. There were some who initially (wishfully) suggested that it would be best to declare serialization a mistake and not make lambdas serializable at all. While this was a very tempting target, ultimately this conflicted with another decision we made: that of using nominal function types (functional interfaces) to type lambdas. For example, imagine: interface SerializablePredicate extends Predicate, Serializable { } If the user does: SerializablePredicate p = s -> false; or SerializablePredicate p = String::isEmpty; It would violate the principle of least surprise for the resulting objects (whose lambda heritage should be invisible to anyone who later touches them) not to be serializable. Hence began our slide down the slippery slope.
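The round trip Brian describes is easy to reproduce in a self-contained sketch (the SerPredicate name is made up for the demo, and the generic parameters lost from the message above are restored as assumptions). javac gives a lambda or method reference targeting a Serializable interface a writeReplace method that produces a java.lang.invoke.SerializedLambda, and deserialization routes back through the capturing class, so this works as long as the same classes are present on both sides:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.function.Predicate;

public class LambdaSerDemo {
    interface SerPredicate<T> extends Predicate<T>, Serializable {}

    static Predicate<String> roundTrip(SerPredicate<String> p) throws Exception {
        // Serialize the lambda, then deserialize it in the same runtime,
        // where the capturing class is guaranteed to be present.
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(p);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            @SuppressWarnings("unchecked")
            Predicate<String> q = (Predicate<String>) in.readObject();
            return q;
        }
    }

    public static void main(String[] args) throws Exception {
        Predicate<String> q = roundTrip(String::isEmpty);
        System.out.println(q.test(""));   // true
        System.out.println(q.test("x"));  // false
    }
}
```

Recompiling a changed enclosing method between the write and the read is what destabilizes this round trip, which is the fragility the rest of the message inventories.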
An intrinsic challenge of serialization is, when confronted with different class files at deserialization time than were present at serialization time, to make a good-faith effort to figure out what to do. For classes, the default behavior (in the absence of an explicit serial version UID) is to consider any change to the class signatures to invalidate existing serialized forms, but in the presence of a serial version UID, to attempt to deal gracefully with added or removed fields. Inherent in this is the assumption that if the *name* and *signature* of something haven't changed, its semantics haven't, either. If you change the meaning of a field or a method, but not its name, you're out of luck. Anonymous classes are less forgiving than nominal classes, because (a) their names are generated at compile time and may change if the source changes "too much", and (b) their field names / constructor signature may change based on changes in method bodies even if the class and method signatures don't change. This problem has been with us since 1997. There are two possible failure modes that come out of this: Type 1) An instance may fail to deserialize, due to changes that have nothing to do with the object being serialized; Type 2) An instance may deserialize successfully, but may be bound to the *wrong* implementation due to bad luck. Still, many users successfully deal with serialization and anonymous classes by following a simple rule: have the same bits on both sides of the wire. In reality, the situation is more forgiving than that: if you recompile the same source with the same compiler, things still work -- and users fundamentally expect this to be the case. And the same is true for "lightly modified" versions of the same sources (adding comments, adding debugging statements, etc.) Lambdas are similar to anonymous classes in some ways, and we were aware of these failure modes at the time we first discussed serialization of lambdas.
Obviously we would have preferred to prevent these failures if possible, but all the approaches explored were either too restrictive or incomplete. Restrictions that were explored and rejected include: - No serializable lambdas at all - Only serialize static or unbound method refs - Only serialize named, non-capturing lambdas The various hash-the-world options that have been suggested (hash the source or bytecode) are too weird, too brittle, too hard to specify, and will result in users being confounded by, say, recompiling what they perceive as identical sources with an identical compiler and still getting runtime failures, violating (reasonable) user expectations. (It would be almost better to generate a *random* name on every compilation, but we're not going to do that.) In the absence of being able to make it perfect, having exactly the same drawbacks as an existing mechanism, which users are familiar with and have learned to work around, was deemed better than making it imperfect in yet another way. That said, if there's a possibility to reduce type-2 failures without undermining the usability of serialization or the simplicity of the user model, we're willing to continue to explore these (despite the extreme lateness of the hour). At the recent EG meeting, we specifically discussed whether it would be worthwhile to try and address recovering from capture-order issues. This *is* tractable (subject to the same caveats with nominal classes -- that same-name means same-meaning). But the sense of the room then was that this doesn't help enough, because there is still the name-induced stability issue, and that fixing one without the other just encourages users to think that they can make arbitrary code changes and expect serialization stability, and makes it even more surprising when we get a failure due to, say, adding a new lambda to a method.
However, if we felt we were likely to do named lambdas later, then this approach could close half the problem now and we could close the other half of the problem later. One possibility that has not yet been discussed is to issue a lint warning for serializable lambdas/method refs that are subject to stability issues. Here's where we are: - We're not revisiting the decisions about what lambdas and method references should be serializable. This has been reopened several times with no change in consensus, and no new information has come to light that would change the decision. - "Just like inner classes" is a local maximum. Better not to ask the user to create a new mental model than to require a new one that is just as flawed but in different ways. However, we already make some departures from inner class treatment, so this is a more "spirit of the rule" thing than a "letter of the rule." If we can do *much* better, great, but "slightly better but different" is worse. - We might be able to revisit some translation decisions if they result in significant improvements to stability without cost to usability, but we are almost, if not completely, out of time. - We're open to adding more lint warnings at compile time. Stay tuned for a specific proposal. From forax at univ-mlv.fr Fri Aug 16 13:56:16 2013 From: forax at univ-mlv.fr (Remi Forax) Date: Fri, 16 Aug 2013 22:56:16 +0200 Subject: Serializable lambdas -- where we are, how we got here In-Reply-To: <520E65C8.4010600@oracle.com> References: <520E65C8.4010600@oracle.com> Message-ID: <520E91F0.8000201@univ-mlv.fr> On 08/16/2013 07:47 PM, Brian Goetz wrote: > Several concerns have been recently (re)raised again about the > stability of serializable lambdas. This attempts to provide an > inventory of where we are and how we got here. > > There were some who initially (wishfully) suggested that it would be > best to declare serialization a mistake and not make lambdas > serializable at all.
While this was a very tempting target, > ultimately this conflicted with another decision we made: that of > using nominal function types (functional interfaces) to type lambdas. > > For example, imagine: > > interface SerializablePredicate > extends Predicate, Serializable { } > > If the user does: > > SerializablePredicate p = s -> false; > or > SerializablePredicate p = String::isEmpty; > > It would violate the principle of least surprise that the resulting > objects (whose lambda-heritage should be invisible to anyone who later > touches it) to not be serializable. Hence begun our slide down the > slippery slope. > > An intrinsic challenge of serialization is that, when confronted with > different class files at deserialization time than were present at > serialization time, to make a good-faith effort to figure out what to > do. For classes, the default behavior (in the absence of an explicit > serial version UID) is to consider any change to the class signatures > to invalidate existing serialized forms, but in the presence of a > serial version UID, to attempt to deal gracefully with added or > removed fields. Inherent in this is the assumption that if the *name* > and *signature* of something hasn't changed, its semantics haven't, > either. If you change the meaning of a field or a method, but not its > name, you're out of luck. > > Anonymous classes are less forgiving than nominal classes, because (a) > their names are generated at compile time and may change if the source > changes "too much", and (b) their field names / constructor signature > may change based on changes in method bodies even if the class and > method signatures don't change. This problem has been with us since > 1997. 
> There are two possible failure modes that come out of this:
> Type 1) An instance may fail to deserialize, due to changes that have
> nothing to do with the object being serialized;
> Type 2) An instance may deserialize successfully, but may be bound to the
> *wrong* implementation due to bad luck.
>
> Still, many users successfully deal with serialization and anonymous
> classes by following a simple rule: have the same bits on both sides of the
> wire. In reality, the situation is more forgiving than that: if you
> recompile the same source with the same compiler, things still work -- and
> users fundamentally expect this to be the case. And the same is true for
> "lightly modified" versions of the same sources (adding comments, adding
> debugging statements, etc.)
>
> Lambdas are similar to anonymous classes in some ways, and we were aware of
> these failure modes at the time we first discussed serialization of
> lambdas. Obviously we would have preferred to prevent these failures if
> possible, but all the approaches explored were either too restrictive or
> incomplete. Restrictions that were explored and rejected include:
> - No serializable lambdas at all
> - Only serialize static or unbound method refs
> - Only serialize named, non-capturing lambdas
>
> The various hash-the-world options that have been suggested (hash the
> source or bytecode) are too weird, too brittle, too hard to specify, and
> will result in users being confounded by, say, recompiling what they
> perceive as identical sources with an identical compiler and still getting
> runtime failures, violating (reasonable) user expectations. (It would be
> almost better to generate a *random* name on every compilation, but we're
> not going to do that.)
>
> In the absence of being able to make it perfect, having exactly the same
> drawbacks as an existing mechanism, which users are familiar with and have
> learned to work around, was deemed better than making it imperfect in yet a
> new way.
>
> That said, if there's a possibility to reduce type-2 failures without
> undermining the usability of serialization or the simplicity of the user
> model, we're willing to continue to explore these (despite the extreme
> lateness of the hour).
>
> At the recent EG meeting, we specifically discussed whether it would be
> worthwhile to try to address recovering from capture-order issues. This
> *is* tractable (subject to the same caveats as with nominal classes -- that
> same-name means same-meaning). But the sense of the room then was that
> this doesn't help enough, because there is still the name-induced stability
> issue, and that fixing one without the other just encourages users to think
> that they can make arbitrary code changes and expect serialization
> stability, and makes it even more surprising when we get a failure due to,
> say, adding a new lambda to a method. However, if we felt we were likely
> to do named lambdas later, then this approach could close half the problem
> now and we could close the other half of the problem later.
>
> One possibility that has not yet been discussed is to issue a lint warning
> for serializable lambdas/method refs that are subject to stability issues.
>
> Here's where we are:
> - We're not revisiting the decisions about what lambdas and method
> references should be serializable. This has been reopened several times
> with no change in consensus, and no new information has come to light that
> would change the decision.
> - "Just like inner classes" is a local maximum. Better to not ask the user
> to create a new mental model than to require a new one that is just as
> flawed but in different ways. However, we already make some departures
> from inner class treatment, so this is more a "spirit of the rule" thing
> than a "letter of the rule." If we can do *much* better, great, but
> "slightly better but different" is worse.
> - We might be able to revisit some translation decisions if they result in
> significant improvements to stability without cost to usability, but we
> are almost, if not completely, out of time.
> - We're open to adding more lint warnings at compile time.
>
> Stay tuned for a specific proposal.

So you want a lint warning saying serialization sucks :)

You want a warning when a lambda/method ref captures local variables; it
would be logical to have the same warning for inner classes too. But in that
case you will raise warnings in already written and valid code. Not a good
idea, IMO.

Rémi

From brian.goetz at oracle.com Mon Aug 19 08:37:08 2013
From: brian.goetz at oracle.com (Brian Goetz)
Date: Mon, 19 Aug 2013 11:37:08 -0400
Subject: One final stab at improving lambda serialization
Message-ID: <52123BA4.4020700@oracle.com>

*Background*

The fundamental challenge with serialization is that the code that defined a
class at serialization time may have changed by the time deserialization
happens. Serialization is defined to be tolerant of change to a certain
extent, and admits a degree of customization to allow additional flexibility.

For ordinary classes, there are three lines of defense:

* serialVersionUID
* serialization hooks
* default schema evolution

The serial version UID for the target class must match exactly. By default,
serialization uses a serial version UID which is a hash of the class's
signatures. So this default approach means "any significant change to the
structure of the class (adding new methods, changing method or field
signatures, etc.) renders serialized forms invalid". It is common practice
to explicitly assign a serial version UID to a class, thereby disabling this
mechanism.

Classes that expect to evolve over time may use readObject/writeObject and/or
readResolve/writeReplace to customize the mapping between object state and
bytestream.
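As a concrete illustration of these hooks, here is a minimal sketch (class names invented for the example) of the serialization-proxy idiom: `writeReplace` substitutes a stable proxy into the stream on the way out, and the proxy's `readResolve` rebuilds the real object on the way back in.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class ProxyDemo {
    static final class Point implements Serializable {
        private static final long serialVersionUID = 1L; // pin the stream version
        final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }

        // writeReplace: substitute a stable proxy into the stream.
        private Object writeReplace() { return new Ser(x, y); }

        private static final class Ser implements Serializable {
            private static final long serialVersionUID = 1L;
            final int x, y;
            Ser(int x, int y) { this.x = x; this.y = y; }
            // readResolve: rebuild the real object on deserialization.
            private Object readResolve() { return new Point(x, y); }
        }
    }

    public static void main(String[] args) throws Exception {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(new Point(3, 4));
        }
        Point p;
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            p = (Point) in.readObject(); // the proxy resolves back to a Point
        }
        System.out.println(p.x + "," + p.y); // 3,4
    }
}
```

Only the proxy's shape is baked into the stream, so the real class is free to evolve.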
If classes do not use this mechanism, serialization uses a default schema
evolution mechanism to adjust for changes in fields between serialization and
deserialization time; fields that are present in the bytestream but not in
the target class are ignored, and fields that are present in the target class
but not the bytestream get default values (zero, null, etc.)

Anonymous classes follow the same approach and have access to the same
mechanisms (serialVersionUID, read/writeObject, etc.), but they have two
additional sources of instability:

* The name is generated as EnclosingClass$nnn. Any change to the set of
  anonymous classes in the enclosing class may cause sequence numbers to
  change.
* The number and type of fields (which appear in bytecode but not source
  code) are generated based on the set of captured values. Any change to the
  set or order of captured values can cause these signatures to change (in an
  unspecified way).

If the signatures remain stable, anonymous classes can use serialization
hooks to customize the serialized form, just like named classes.

The EG has observed that users have largely learned to deal with the problems
of serialization of inner classes, either by (a) not doing it, or (b)
ensuring that essentially the same bits are present on both sides of the
pipe, preventing skew from causing instability in either class names or
signatures.

The EG has set, as a minimum bar, that lambda serialization be "at least as
good as" anonymous class serialization. (This is not a high bar.) Further,
the EG has concluded that gratuitous deviations from anonymous class
serialization are undesirable, because, if users have to deal with an
imperfect scheme, having them deal with something that is basically the same
as an imperfect scheme they've already gotten used to is preferable to
dealing with a new and different scheme.
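Serializable lambdas lean on exactly the hook machinery described above: the compiler emits a `writeReplace` on the lambda class that returns a `java.lang.invoke.SerializedLambda` describing the lambda. A small reflective peek (helper names are mine; the reflective trick is a common one, not part of any spec) makes the serialized form visible:

```java
import java.io.Serializable;
import java.lang.invoke.SerializedLambda;
import java.lang.reflect.Method;
import java.util.function.Predicate;

public class PeekSerializedForm {
    interface SerPredicate<T> extends Predicate<T>, Serializable {}

    // A serializable lambda's class carries a compiler-generated
    // writeReplace() returning java.lang.invoke.SerializedLambda.
    static SerializedLambda peek(Object lambda) throws Exception {
        Method writeReplace = lambda.getClass().getDeclaredMethod("writeReplace");
        writeReplace.setAccessible(true);
        return (SerializedLambda) writeReplace.invoke(lambda);
    }

    public static void main(String[] args) throws Exception {
        String prefix = "abc";
        SerPredicate<String> p = s -> s.startsWith(prefix); // captures `prefix`
        SerializedLambda sl = peek(p);
        System.out.println(sl.getImplMethodName());   // the desugared lambda$... method
        System.out.println(sl.getCapturedArgCount()); // 1: the captured prefix
    }
}
```

The `getImplMethodName()` string is precisely the name whose stability the rest of this message worries about.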
Further, the EG has rejected the idea of arbitrarily restricting access to
serialization just because it is dangerous; users who have learned to use it
safely should not be unduly encumbered.

*Failure modes*

For anonymous classes, one of two things will happen when attempting to
deserialize after things have changed "too much":

1. A deserialization failure due to either the name or signature not
   matching, resulting in NoSuchMethodError, IncompatibleClassChangeError,
   etc.
2. Deserializing to the wrong thing, without any evidence of error.

Obviously, a type-2 failure is far worse than a type-1 failure, because no
error is raised and an unintended computation is performed. Here are two
examples of changes that are behaviorally compatible but which will result in
type-2 failures. The first has to do with order-of-declaration.

Old code:

    Runnable r1 = new Runnable() {
        public void run() { System.out.println("one"); }
    };
    Runnable r2 = new Runnable() {
        public void run() { System.out.println("two"); }
    };

New code:

    Runnable r2 = new Runnable() {
        public void run() { System.out.println("two"); }
    };
    Runnable r1 = new Runnable() {
        public void run() { System.out.println("one"); }
    };

Result: deserialized r1 (across skew) prints "two".

This fails because in both cases we get classes called Foo$1 and Foo$2, but
in the old code these correspond to r1 and r2, while in the new code they
correspond to r2 and r1.

The other failure has to do with order-of-capture.

Old code:

    String s1 = "foo";
    String s2 = "bar";
    Runnable r = new Runnable() {
        public void run() { foo(s1, s2); }
    };

New code:

    String s1 = "foo";
    String s2 = "bar";
    Runnable r = new Runnable() {
        public void run() { String s = s2; foo(s1, s); }
    };

Result: on deserialization, s1 and s2 are effectively swapped.

This fails because the order of arguments in the implicitly generated
constructor of the inner class changes due to the order in which the compiler
encounters captured variables.
If the reordered variables were of different types, this would cause a type-1
failure, but if they are the same type, it causes a type-2 failure.

*User expectations*

While experienced users are quick to state the "same bits on both sides" rule
for reliable deserialization, a bit of investigation reveals that user
expectations are actually higher than that. For example, if the compiler
generated a /random/ name for each lambda at compile time, then recompiling
the same source with the same compiler, and using the result for
deserialization, would fail. This is too restrictive; user expectations are
not tied to "same bits", but to a vaguer notion of "I compiled essentially
the same source with essentially the same compiler, and therefore didn't
change anything significant." For example, users would balk if adding a
comment or changing whitespace were to affect deserialization. Users likely
expect (in part, due to the behavior of anonymous classes) that changes to
code that doesn't affect the lambda directly or indirectly (e.g., adding or
removing a debugging println) also will not affect the serialized form.

In the absence of the user being able to explicitly name the lambda /and/ its
captures (as C++ does), there is no perfect solution. Instead, our goal can
only be to minimize type-2 failures while not unduly creating type-1 failures
when "no significant code change" happened. This means we have to put a
stake in the ground as to what constitutes "significant" code change. The
de-facto (and likely accidental) definition of "significant" used by inner
classes here is:

* Adding, removing, or reordering inner class instances earlier in the
  source file;
* Changes to the number, order, or type of captured arguments

This permits changes to code that has nothing to do with inner classes, and
many common refactorings, as long as they do not affect the order of inner
class instances or their captures.
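Both "significant" triggers above can be observed directly with a little reflection; this sketch (class name mine) shows the order-sensitive class names and the synthetic constructor whose parameters mirror the captured values:

```java
public class InnerClassSkew {
    public static void main(String[] args) {
        String s1 = "foo";
        int n = 42;
        Runnable r1 = new Runnable() {
            public void run() { System.out.println("one"); }
        };
        Runnable r2 = new Runnable() {
            public void run() { System.out.println(s1 + n); } // captures s1, n
        };

        // (a) Anonymous class names are assigned by order of appearance in
        //     the source, so reordering r1/r2 would swap these:
        System.out.println(r1.getClass().getName()); // InnerClassSkew$1
        System.out.println(r2.getClass().getName()); // InnerClassSkew$2

        // (b) The synthetic constructor's parameter list mirrors the captured
        //     values; with javac this prints the captures in encounter order.
        for (Class<?> t : r2.getClass().getDeclaredConstructors()[0].getParameterTypes())
            System.out.println(t.getName());
    }
}
```

Reordering the two declarations, or the two captures, changes exactly the names and signatures that the serialized form depends on.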
*Current Lambda behavior*

Lambda serialization currently behaves very similarly to anonymous class
serialization. Where anonymous classes have stable method names but unstable
class names, lambdas are the dual: unstable method names but stable class
names. But since both are used together, the resulting naming stability is
largely the same.

We do one thing to increase naming stability for lambdas: we hash the name
and signature of the enclosing method into the lambda name. This insulates
lambda naming from the addition, removal, or reordering of methods within a
class file, but naming stability remains sensitive to the order of lambdas
within the method. Order-of-capture issues are largely similar to those of
inner classes.

Lambda bodies are desugared to methods named in the following form:
lambda$/mmm/$/nnn/, where /mmm/ is a hash of the method name and signature,
and /nnn/ is a sequence number among the lambdas that have the same /mmm/
hash.

Because lambdas are instantiated via invokedynamic rather than by invoking a
constructor directly, there is also slightly more leniency toward changes to
the /types/ of captured arguments; changing a captured argument from, say,
String to Object would be a breaking change for anonymous classes (it changes
the constructor signature) but not for lambdas. This leniency is largely an
accidental artifact of translation, rather than a deliberate design decision.

*Possible improvements*

We can start by recognizing the role of the hash of the enclosing method in
the lambda method name. This reduces the set of lambdas that could collide
from "all the lambdas in the file" to "all the lambdas in the method," which
reduces the set of changes that cause both type-1 and type-2 errors.

An additional observation is that there is a tension between trying to
/recover from/ skew (rather than simply trying to detect it and fail
deserialization) and complexity.
So I think we should focus primarily on detecting skew and failing
deserialization (turning type-2 failures into type-1), while at the same time
not unduly increasing the set of changes that cause type-1 errors, with the
goal of settling on an informal guideline of what constitutes "too much"
change.

We can do this by increasing the number of things that affect the /mmm/ hash,
effectively constructing the lambda-equivalent of the serial version UID.
The more context we add to this hash, the smaller the set of lambdas that
hash to the same bucket gets, which reduces the space of possible collisions.
The following are possible candidates for inclusion, along with examples of
code that illustrate dependence on each item.

Item: names of captured arguments

    Old code:  int x = ...  f(() -> x);
    New code:  int y = ...  f(() -> y);

Effect: including the names of captured arguments in the hash would cause
rename-refactors of captured arguments to be considered a
serialization-breaking change.

Rationale: while alpha-renaming is generally considered to be
semantic-preserving, serialization has always keyed off of names (such as
field names) as clues to developer intent. It seems reasonable to say "if
you change the names involved, we have to assume a semantic change occurred."
We cannot tell whether a name change is a simple alpha-rename or the capture
of a completely different variable, so this errs on the safe side.

Item: types of captured arguments

    Old code:  String x = ...  f(() -> x);
    New code:  Object x = ...  f(() -> x);

Rationale: it seems reasonable to say that, if you capture arguments of a
different type, you've made a semantic change.

Item: order of captured arguments

    Old code:  () -> { int a = f(x); int b = g(y); return h(a,b); };
    New code:  () -> { int b = g(y); int a = f(x); return h(a,b); };

Effect: changing the order of capture would become a type-1 failure rather
than possibly a type-2 failure.

Rationale: since we cannot detect whether the ordering change is semantically
meaningful or not, it is best to be conservative and say: a change to capture
order is likely a semantic change.

Item: variable assignment target (if present)

    Old code:  Runnable r1 = Foo::f;  Runnable r2 = Foo::g;
    New code:  Runnable r2 = Foo::g;  Runnable r1 = Foo::f;

Effect: including the variable target name would render this reordering
recoverable and correct.

Rationale: if the user has gone to the effort of providing a name, we can use
it as a hint to the meaning of the lambda.

    Old code:  Runnable r = Foo::f;
    New code:  Runnable runnable = Foo::f;

Effect: including the variable target name would render this change
(previously recoverable and correct) a deserialization failure.

Rationale: if the user has changed the name, it seems reasonable to treat
that as possibly meaning something else.

Item: target type

    Old code:  Predicate<String> p = String::isEmpty;
    New code:  Function<String, Boolean> p = String::isEmpty;

Effect: including the target type reduces the space of potential sequence
number collisions.

Rationale: if you've changed the target type, it is a different lambda.

This list is not exhaustive, and there are others we might consider. (For
example, for lambdas that appear in method invocation context rather than
assignment context, we might include the hash of the invoked method name and
signature, or even the parameter index or name. This is where it starts to
exhibit diminishing returns and increasing brittleness.)

Taken in total, the effect is:

* All order-of-capture issues become type-1 failures rather than type-2
  failures (modulo hash collisions).
* Order-of-declaration issues are still present, but they are dramatically
  reduced, turning many type-2 failures into type-1 failures.
* Some new type-1 failures are introduced, mostly those deriving from
  rename-refactors.

The remaining type-2 failures could be dealt with if we added named lambdas
in the future. (They are also prevented if users always assign lambdas to
local variables whose names are unique within the method; in this way, the
local-variable trick becomes a sort of poor-man's named lambda.)
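To make the proposal concrete, here is a purely hypothetical sketch of such a context-enriched name (all names and the hashing choice are mine, not javac's implementation): fold the enclosing method, target type, capture names/types, and assignment target into one hash, so that unrelated edits rarely land two serializable lambdas in the same name bucket.

```java
import java.util.Arrays;
import java.util.List;

public class LambdaKeys {
    // Hypothetical "serialization UID for a lambda": hash all the stability-
    // relevant context into the mangled method name.
    static String lambdaKey(String enclosingMethod, String targetType,
                            List<String> captureNames, List<String> captureTypes,
                            String assignmentTarget) {
        String context = String.join(";",
                enclosingMethod,
                targetType,
                String.join(",", captureNames),
                String.join(",", captureTypes),
                assignmentTarget == null ? "" : assignmentTarget);
        return String.format("lambda$%08x$", context.hashCode());
    }

    public static void main(String[] args) {
        String before = lambdaKey("test()V", "Runnable",
                Arrays.asList("x"), Arrays.asList("int"), "r1");
        String renamed = lambdaKey("test()V", "Runnable",
                Arrays.asList("y"), Arrays.asList("int"), "r1");
        // A rename-refactor of a capture now changes the name, so stale
        // serialized forms fail to deserialize (type-1) instead of silently
        // rebinding (type-2).
        System.out.println(before.equals(renamed)); // false
    }
}
```

The sequence number /nnn/ would still be needed for the (now much rarer) lambdas whose full context collides; that is exactly the "sequence number != 1" condition the proposed lint warning keys on.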
We can reduce the probability of collision further by using a different (and
simpler) scheme for non-serializable lambdas (lambda$nnn), so that
serializable lambdas can only accidentally collide with each other.

However, there are some transformations which we will still not be able to
avoid under this scheme. For example:

    Old code:  Supplier<Integer> s = foo ? () -> 1 : () -> 2;
    New code:  Supplier<Integer> s = !foo ? () -> 2 : () -> 1;

This change is behaviorally compatible but could result in a type-2 failure,
since both lambdas have the same target type, capture arity, etc.

However^2, we can still detect this risk and warn the user. If for any
/mmm/ we issue more than one sequence number /nnn/, we are at risk for a
type-2 failure, and can issue a lint warning in that case, suggesting the
user refactor to something more stable. (Who knows what that diagnostic
message will look like.) With all the hash information above, it seems
likely that the number of potentially colliding lambdas will be small enough
that this warning would not come along too often.

The impact of this change on the implementation is surprisingly small. It
does not affect the serialized form (java.lang.invoke.SerializedLambda) or
the generated deserialization code ($deserialize$). It only affects the code
that generates the lambda method name, which needs access to a small
additional bit of information -- the assignment target name. Similarly,
detecting the condition required for the warning is easy -- "sequence number
!= 1".

Qualitatively, the result is still similar in feel to inner classes -- you
can make "irrelevant" changes, but we make no heroic attempts to recover from
things like changes in capture order -- and we do a better job of detecting
them (and, if you follow some coding discipline, you can avoid them
entirely).

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/lambda-spec-experts/attachments/20130819/48ba02a4/attachment-0001.html

From brian.goetz at oracle.com Mon Aug 19 08:58:30 2013
From: brian.goetz at oracle.com (Brian Goetz)
Date: Mon, 19 Aug 2013 11:58:30 -0400
Subject: One final stab at improving lambda serialization
In-Reply-To: <52123BA4.4020700@oracle.com>
References: <52123BA4.4020700@oracle.com>
Message-ID: <521240A6.6010604@oracle.com>

Those of you receiving this in plain text (which probably is everyone on the
-observers lists) will have some trouble deciphering it. Mailman helpfully
put the attachment here, but not in a very useful format:

http://mail.openjdk.java.net/pipermail/lambda-libs-spec-experts/attachments/20130819/48ba02a4/attachment-0001.html

I've uploaded a more readable copy to:

http://cr.openjdk.java.net/~briangoetz/eg-attachments/lambda-serialization.html

On 8/19/2013 11:37 AM, Brian Goetz wrote:
> *Background*
>
> The fundamental challenge with serialization is that the code that
> defined a class at serialization time may have changed by the time
> deserialization happens. Serialization is defined to be tolerant of
> change to a certain extent, and admits a degree of customization to
> allow additional flexibility.
>
> ...

From dl at cs.oswego.edu Mon Aug 19 10:43:50 2013
From: dl at cs.oswego.edu (Doug Lea)
Date: Mon, 19 Aug 2013 13:43:50 -0400
Subject: One final stab at improving lambda serialization
In-Reply-To: <52123BA4.4020700@oracle.com>
References: <52123BA4.4020700@oracle.com>
Message-ID: <52125956.9040200@cs.oswego.edu>

This is the sort of scheme I had in mind in my reply to David Lloyd (that
might have in part led to this effort). And the details seem reasonable to
me. Looks good to go, assuming no unexpected implementation snags.
-Doug On 08/19/2013 11:37 AM, Brian Goetz wrote: > *Background* > > The fundamental challenge with serialization is that the code that defined a > class at serialization time may have changed by the time deserialization > happens. Serialization is defined to be tolerant of change to a certain extent, > and admits a degree of customization to allow additional flexibility. > > For ordinary classes, there are three lines of defense: > > * serialVersionUID > * serialization hooks > * default schema evolution > > Serial version UID for the target class must match exactly. By default, > serialization uses a serial version UID which is a hash of the classes > signatures. So this default approach means "any significant change to the > structure of the class (adding new methods, changing method or field signatures, > etc) renders serialized forms invalid". It is a common practice to explicitly > assign a serial version UID to a class, thereby disabling this mechanism. > > Classes that expect to evolve over time may use readObject/writeObject and/or > readResolve/writeReplace to customize the mapping between object state and > bytestream. If classes do not use this mechanism, serialization uses a default > schema evolution mechanism to adjust for changes in fields between serialization > and deserialization time; fields that are present in the bytestream but not in > the target class are ignored, and fields that are present in the target class > but not the bytestream get default values (zero, null, etc.) > > Anonymous classes follow the same approach and have access to the same > mechanisms (serialVersionUID, read/writeObject, etc), but they have two > additional sources of instability: > > * The name is generated as EnclosingClass$nnn. Any change to the set of > anonymous classes in the enclosing class may cause sequence numbers to change. > * The number and type of fields (appears in bytecode but not source code) are > generated based on the set of captured values. 
Any change to the set or > order of captured values can cause these signatures to change (in an > unspecified way). > > If the signatures remain stable, anonymous classes can use serialization hooks > to customize the serialized form, just like named classes. > > The EG has observed that users have largely learned to deal with the problems of > serialization of inner classes, either by (a) don't do it, or (b) ensure that > essentially the same bits are present on both sides of the pipe, preventing skew > from causing instability in either class names or signatures. > > The EG has set, as a minimum bar, that lambda serialization be "at least as good > as" anonymous class serialization. (This is not a high bar.) Further, the EG > has concluded that gratuitous deviations from anonymous class serialization are > undesirable, because, if users have to deal with an imperfect scheme, having > them deal with something that is basically the same as an imperfect scheme > they've already gotten used to is preferable to dealing with a new and > different scheme. > > Further, the EG has rejected the idea of arbitrarily restricting access to > serialization just because it is dangerous; users who have learned to use it > safely should not be unduly encumbered. > > *Failure modes > * > > For anonymous classes, one of two things will happen when attempting to > deserialize after things have changed "too much": > > 1. A deserialization failure due to either the name or signature not matching, > resulting in NoSuchMethodError, IncompatibleClassChangeError, etc. > 2. Deserializing to the wrong thing, without any evidence of error. > > Obviously, a type-2 failure is far worse than a type-1 failure, because no error > is raised and an unintended computation is performed. Here are two examples of > changes that are behaviorally compatible but which will result in type-2 > failures. The first has to do with order-of-declaration. 
> > *Old code** > * *New code** > * *Result** > * > Runnable r1 = new Runnable() { > void run() { > System.out.println("one"); > } > }; > Runnable r2 = new Runnable() { > void run() { > System.out.println("two"); > } > }; > Runnable r2 = new Runnable() { > void run() { > System.out.println("two"); > } > }; > Runnable r1 = new Runnable() { > void run() { > System.out.println("one"); > } > }; > Deserialized r1 (across skew) prints "two". > > This fails because in both cases, we get classes called Foo$1 and Foo$2, but in > the old code, these correspond to r1 and r2, but in the new code, these > correspond to r2 and r1. > > The other failure has to do with order-of-capture. > > *Old code** > * *New code** > * *Result** > * > String s1 = "foo"; > String s2 = "bar"; > Runnable r = new Runnable() { > void run() { > foo(s1, s2); > } > }; > > String s1 = "foo"; > String s2 = "bar"; > Runnable r = new Runnable() { > void run() { > String s = s2; > foo(s1, s); > } > }; > On deserialization, s1 and s2 are effectively swapped. > > This fails because the order of arguments in the implicitly generated > constructor of the inner class changes due to the order in which the compiler > encounters captured variables. If the reordered variables were of different > types, this would cause a type-1 failure, but if they are the same type, it > causes a type-2 failure. > > *User expectations* > > While experienced users are quick to state the "same bits on both sides" rule > for reliable deserialization, a bit of investigation reveals that user > expectations are actually higher than that. For example, if the compiler > generated a /random/ name for each lambda at compile time, then recompiling the > same source with the same compiler, and using the result for deserialization, > would fail. 
This is too restrictive; user expectations are not tied to "same > bits", but to a vaguer notion of "I compiled essentially the same source with > essentially the same compiler, and therefore didn't change anything > significant." For example, users would balk if adding a comment or changing > whitespace were to affect deserialization. Users likely expect (in part, due to > behavior of anonymous classes) changes to code that doesn't affect the lambda > directly or indirectly (e.g., add or remove a debugging println) also would not > affect the serialized form. > > In the absence of the user being able to explicitly name the lambda /and/ its > captures (as C++ does), there is no perfect solution. Instead, our goal can > only be to minimize type-2 failures while not unduly creating type-1 failures > when "no significant code change" happened. This means we have to put a stake > in the ground as to what constitutes "significant" code change. > > The de-facto (and likely accidental) definition of "significant" used by inner > classes here is: > > * Adding, removing, or reordering inner class instances earlier in the source > file; > * Changes to the number, order, or type of captured arguments > > This permits changes to code that has nothing to do with inner classes, and many > common refactorings as long as they do not affect the order of inner class > instances or their captures. > > *Current Lambda behavior* > > Lambda serialization currently behaves very similarly to anonymous class > serialization. Where anonymous classes have stable method names but unstable > class names, lambdas are the dual; unstable method names but stable class > names. But since both are used together, the resulting naming stability is > largely the same. > > We do one thing to increase naming stability for lambdas: we hash the name and > signature of the enclosing method in the lambda name. 
This insulates lambda > naming from the addition, removal, or reordering of methods within a class file, > but naming stability remains sensitive to the order of lambdas within the > method. Similarly, order-of-capture issues are largely similar to inner classes. > > Lambdas bodies are desugared to methods named in the following form: > lambda$/mmm/$/nnn/, where /mmm/ is a hash of the method name and signature, and > /nnn/ is a sequence number of lambdas that have the same /mmm/ hash. > > Because lambdas are instantiated via invokedynamic rather than invoking a > constructor directly, there is also slightly more leniency to changes to the > /types/ of captured argument; changing a captured argument from, say, String to > Object, would be a breaking change for anonymous classes (it changes the > constructor signature) but not for lambdas. This leniency is largely an > accidental artifact of translation, rather than a deliberate design decision. > > *Possible improvements* > > We can start by recognizing the role of the hash of the enclosing method in the > lambda method name. This reduces the set of lambdas that could collide from > "all the lambdas in the file" to "all the lambdas in the method." This reduces > the set of changes that cause both type-1 and type-2 errors. > > An additional observation is that there is a tension between trying to /recover > from/ skew (rather than simply trying to detect it, and failing deserialization) > and complexity. So I think we should focus primarily on detecting skew and > failing deserialization (turning type-2 failures into type-1) while at the same > time not unduly increasing the set of changes that cause type-1 errors, with the > goal of settling on an informal guideline of what constitutes "too much" change. > > We can do this by increasing the number of things that affect the /mmm/ hash, > effectively constructing the lambda-equivalent of the serialization version > UID. 
The more context we add to this hash, the smaller the set of lambdas that
> hash to the same bucket gets, which reduces the space of possible collisions.
> The following table shows possible candidates for inclusion, along with examples
> of code that illustrate dependence on each item.
>
> Item: Names of captured arguments
>   Old code:  int x = ...   f(() -> x);
>   New code:  int y = ...   f(() -> y);
>   Effect: Including the names of captured arguments in the hash would cause
>   rename-refactors of captured arguments to be considered a
>   serialization-breaking change.
>   Rationale: While alpha-renaming is generally considered to be
>   semantic-preserving, serialization has always keyed off of names (such as
>   field names) as clues to developer intent. It seems reasonable to say "If you
>   change the names involved, we have to assume a semantic change occurred." We
>   cannot tell whether a name change is a simple alpha-rename or the capture of
>   a completely different variable, so this errs on the safe side.
>
> Item: Types of captured arguments
>   Old code:  String x = ...   f(() -> x);
>   New code:  Object x = ...   f(() -> x);
>   Rationale: It seems reasonable to say that, if you capture arguments of a
>   different type, you've made a semantic change.
>
> Item: Order of captured arguments
>   Old code:  () -> { int a = f(x); int b = g(y); return h(a,b); };
>   New code:  () -> { int b = g(y); int a = f(x); return h(a,b); };
>   Effect: Changing the order of capture would become a type-1 failure rather
>   than possibly a type-2 failure.
>   Rationale: Since we cannot detect whether the ordering change is semantically
>   meaningful, it is best to be conservative and say: a change to capture order
>   is likely a semantic change.
>
> Item: Variable assignment target (if present)
>   Old code:  Runnable r1 = Foo::f;   Runnable r2 = Foo::g;
>   New code:  Runnable r2 = Foo::g;   Runnable r1 = Foo::f;
>   Effect: Including the variable target name would render this reordering
>   recoverable and correct.
>   Rationale: If the user has gone to the effort of providing a name, we can use
>   it as a hint to the meaning of the lambda.
>
>   Old code:  Runnable r = Foo::f;
>   New code:  Runnable runnable = Foo::f;
>   Effect: Including the variable target name would render this change
>   (previously recoverable and correct) a deserialization failure.
>   Rationale: If the user has changed the name, it seems reasonable to treat
>   that as possibly meaning something else.
>
> Item: Target type
>   Old code:  Predicate<String> p = String::isEmpty;
>   New code:  Function<String, Boolean> p = String::isEmpty;
>   Effect: Including the target type reduces the space of potential sequence
>   number collisions.
>   Rationale: If you've changed the target type, it is a different lambda.
>
> This list is not exhaustive, and there are others we might consider. (For
> example, for lambdas that appear in method invocation context rather than
> assignment context, we might include the hash of the invoked method name and
> signature, or even the parameter index or name. This is where it starts to
> exhibit diminishing returns and increasing brittleness.)
>
> Taken in total, the effect is:
>
> * All order-of-capture issues become type-1 failures, rather than type-2
> failures (modulo hash collisions).
> * Order-of-declaration issues are still present, but they are dramatically
> reduced, turning many type-2 failures into type-1 failures.
> * Some new type-1 failures are introduced, mostly those deriving from
> rename-refactors.
>
> The remaining type-2 failures could be dealt with if we added named lambdas in
> the future. (They are also prevented if users always assign lambdas to local
> variables whose names are unique within the method; in this way, the
> local-variable trick becomes a sort of poor-man's named lambda.)
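[Editor's note: the "poor-man's named lambda" discipline suggested above can be sketched as follows. The Task interface and variable names are illustrative, and the stability benefit only materializes if the proposed name-hashing scheme is adopted.]

```java
import java.io.Serializable;

public class NamedLambdas {
    // A serializable functional interface, so the lambdas below are
    // candidates for the serialization concerns in this thread.
    interface Task extends Runnable, Serializable {}

    public static void main(String[] args) {
        // Discipline: give each serializable lambda its own uniquely named
        // local variable. Under the proposed scheme, the assignment target
        // name would feed the name hash, so reordering these declarations
        // would no longer silently swap their serialized identities.
        Task logStart = () -> System.out.println("start");
        Task logStop  = () -> System.out.println("stop");
        logStart.run();
        logStop.run();
    }
}
```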
>
> We can reduce the probability of collision further by using a different (and
> simpler) scheme for non-serializable lambdas (lambda$nnn), so that serializable
> lambdas can only accidentally collide with each other.
>
> However, there are some transformations which we will still not be able to avoid
> under this scheme. For example:
>
>   Old code:  Supplier<Integer> s = foo ? () -> 1 : () -> 2;
>   New code:  Supplier<Integer> s = !foo ? () -> 2 : () -> 1;
>   Result: This change is behaviorally compatible but could result in a type-2
>   failure, since both lambdas have the same target type, capture arity, etc.
>
> However^2, we can still detect this risk and warn the user. If for any /mmm/
> we issue more than one sequence number /nnn/, we are at risk of a type-2
> failure, and can issue a lint warning in that case, suggesting the user refactor
> to something more stable. (Who knows what that diagnostic message will look
> like.) With all the hash information above, it seems likely that the number of
> potentially colliding lambdas will be small enough that this warning would not
> come up too often.
>
> The impact of this change on the implementation is surprisingly small. It does
> not affect the serialized form (java.lang.invoke.SerializedLambda) or the
> generated deserialization code ($deserialize$). It only affects the code which
> generates the lambda method name, which needs access to a small additional bit
> of information -- the assignment target name. Similarly, detecting the
> condition required for the warning is easy -- "sequence number != 1".
>
> Qualitatively, the result is still similar in feel to inner classes -- you can
> make "irrelevant" changes but we make no heroic attempts to recover from things
> like changes in capture order -- but we do a better job of detecting them (and,
> if you follow some coding discipline, you can avoid them entirely.)
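[Editor's note: for concreteness, here is a minimal round trip through the serialization machinery under discussion. It succeeds because the same classes are present on both sides of the stream; the thread is about what happens when they are not. The class name is illustrative.]

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.function.Function;

public class LambdaRoundTrip {
    public static void main(String[] args) throws Exception {
        // An intersection cast makes the method reference serializable;
        // the compiler routes serialization through SerializedLambda and
        // a generated $deserializeLambda$ method in this class.
        Function<String, Integer> f =
            (Function<String, Integer> & Serializable) String::length;

        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
            out.writeObject(f);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray()))) {
            @SuppressWarnings("unchecked")
            Function<String, Integer> g = (Function<String, Integer>) in.readObject();
            // Same class file on both sides, so the round trip succeeds.
            System.out.println(g.apply("lambda"));
        }
    }
}
```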
> > From andrey.breslav at jetbrains.com Wed Aug 21 04:17:11 2013 From: andrey.breslav at jetbrains.com (Andrey Breslav) Date: Wed, 21 Aug 2013 15:17:11 +0400 Subject: One final stab at improving lambda serialization In-Reply-To: <52123BA4.4020700@oracle.com> References: <52123BA4.4020700@oracle.com> Message-ID: <1B173070-3CCE-4CA8-9A76-293F8FA1A1E7@jetbrains.com> > Qualitatively, the result is still similar in feel to inner classes -- you can make "irrelevant" changes but we make no heroic attempts to recover from things like changes in capture order -- but we do a better job of detecting them (and, if you follow some coding discipline, you can avoid them entirely.) > I think it is a good balance. I'll be glad to see it implemented this way. -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/lambda-spec-experts/attachments/20130821/532f43d9/attachment.html From Vladimir.Zakharov at gs.com Wed Aug 21 19:25:39 2013 From: Vladimir.Zakharov at gs.com (Zakharov, Vladimir) Date: Wed, 21 Aug 2013 22:25:39 -0400 Subject: One final stab at improving lambda serialization In-Reply-To: <52125956.9040200@cs.oswego.edu> References: <52123BA4.4020700@oracle.com> <52125956.9040200@cs.oswego.edu> Message-ID: I agree. Looks like this proposal strikes just the right balance: it is not being unnecessarily (and unexpectedly) restrictive while avoiding "silent" failures. I am supportive of serialization behavior implemented as described below. Thank you, Vlad The Goldman Sachs Group, Inc. All rights reserved. See http://www.gs.com/disclaimer/global_email for important risk disclosures, conflicts of interest and other terms and conditions relating to this e-mail and your reliance on information contained in it. This message may contain confidential or privileged information. If you are not the intended recipient, please advise us immediately and delete this message. 
See http://www.gs.com/disclaimer/email for further information on confidentiality and the risks of non-secure electronic communication. If you cannot access these links, please notify us by reply message and we will send the contents to you.

-----Original Message-----
From: lambda-libs-spec-experts-bounces at openjdk.java.net [mailto:lambda-libs-spec-experts-bounces at openjdk.java.net] On Behalf Of Doug Lea
Sent: Monday, August 19, 2013 1:44 PM
To: Brian Goetz
Cc: lambda-libs-spec-experts at openjdk.java.net; lambda-spec-experts at openjdk.java.net
Subject: Re: One final stab at improving lambda serialization

This is the sort of scheme I had in mind in my reply to David Lloyd (that might have in part led to this effort). And the details seem reasonable to me. Looks good to go, assuming no unexpected implementation snags.

-Doug

On 08/19/2013 11:37 AM, Brian Goetz wrote:
> *Background*
>
> The fundamental challenge with serialization is that the code that defined a
> class at serialization time may have changed by the time deserialization
> happens. Serialization is defined to be tolerant of change to a certain extent,
> and admits a degree of customization to allow additional flexibility.
>
> For ordinary classes, there are three lines of defense:
>
> * serialVersionUID
> * serialization hooks
> * default schema evolution
>
> The serial version UID for the target class must match exactly. By default,
> serialization uses a serial version UID which is a hash of the class's
> signatures. So this default approach means "any significant change to the
> structure of the class (adding new methods, changing method or field signatures,
> etc.) renders serialized forms invalid". It is common practice to explicitly
> assign a serial version UID to a class, thereby disabling this mechanism.
>
> Classes that expect to evolve over time may use readObject/writeObject and/or
> readResolve/writeReplace to customize the mapping between object state and
> bytestream.
If classes do not use this mechanism, serialization uses a default > schema evolution mechanism to adjust for changes in fields between serialization > and deserialization time; fields that are present in the bytestream but not in > the target class are ignored, and fields that are present in the target class > but not the bytestream get default values (zero, null, etc.) > > Anonymous classes follow the same approach and have access to the same > mechanisms (serialVersionUID, read/writeObject, etc), but they have two > additional sources of instability: > > * The name is generated as EnclosingClass$nnn. Any change to the set of > anonymous classes in the enclosing class may cause sequence numbers to change. > * The number and type of fields (appears in bytecode but not source code) are > generated based on the set of captured values. Any change to the set or > order of captured values can cause these signatures to change (in an > unspecified way). > > If the signatures remain stable, anonymous classes can use serialization hooks > to customize the serialized form, just like named classes. > > The EG has observed that users have largely learned to deal with the problems of > serialization of inner classes, either by (a) don't do it, or (b) ensure that > essentially the same bits are present on both sides of the pipe, preventing skew > from causing instability in either class names or signatures. > > The EG has set, as a minimum bar, that lambda serialization be "at least as good > as" anonymous class serialization. (This is not a high bar.) Further, the EG > has concluded that gratuitous deviations from anonymous class serialization are > undesirable, because, if users have to deal with an imperfect scheme, having > them deal with something that is basically the same as an imperfect scheme > they've already gotten used to is preferable to dealing with a new and > different scheme. 
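[Editor's note: the first of the three lines of defense described above can be inspected directly via java.io.ObjectStreamClass. The sketch below (class names are illustrative) contrasts a default, structurally hashed serial version UID with an explicitly pinned one.]

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

public class SvuidDemo {
    // No explicit serialVersionUID: serialization derives one by hashing
    // the class's structure (name, interfaces, member signatures), so
    // adding a method here would change the computed value and break
    // existing streams.
    static class Point implements Serializable {
        int x, y;
    }

    // Explicit serialVersionUID: pins the stream version and opts out of
    // the structural hash, so adding a method is no longer a breaking change.
    static class PinnedPoint implements Serializable {
        private static final long serialVersionUID = 1L;
        int x, y;
    }

    public static void main(String[] args) {
        System.out.println("computed="
            + ObjectStreamClass.lookup(Point.class).getSerialVersionUID());
        System.out.println("pinned="
            + ObjectStreamClass.lookup(PinnedPoint.class).getSerialVersionUID());
    }
}
```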
>
> Further, the EG has rejected the idea of arbitrarily restricting access to
> serialization just because it is dangerous; users who have learned to use it
> safely should not be unduly encumbered.
>
> *Failure modes*
>
> For anonymous classes, one of two things will happen when attempting to
> deserialize after things have changed "too much":
>
> 1. A deserialization failure due to either the name or signature not matching,
> resulting in NoSuchMethodError, IncompatibleClassChangeError, etc.
> 2. Deserializing to the wrong thing, without any evidence of error.
>
> Obviously, a type-2 failure is far worse than a type-1 failure, because no error
> is raised and an unintended computation is performed. Here are two examples of
> changes that are behaviorally compatible but which will result in type-2
> failures. The first has to do with order-of-declaration.
>
>   Old code:
>     Runnable r1 = new Runnable() {
>         public void run() { System.out.println("one"); }
>     };
>     Runnable r2 = new Runnable() {
>         public void run() { System.out.println("two"); }
>     };
>   New code:
>     Runnable r2 = new Runnable() {
>         public void run() { System.out.println("two"); }
>     };
>     Runnable r1 = new Runnable() {
>         public void run() { System.out.println("one"); }
>     };
>   Result: Deserialized r1 (across skew) prints "two".
>
> This fails because in both cases we get classes called Foo$1 and Foo$2, but in
> the old code these correspond to r1 and r2, while in the new code they
> correspond to r2 and r1.
>
> The other failure has to do with order-of-capture.
>
>   Old code:
>     String s1 = "foo";
>     String s2 = "bar";
>     Runnable r = new Runnable() {
>         public void run() { foo(s1, s2); }
>     };
>   New code:
>     String s1 = "foo";
>     String s2 = "bar";
>     Runnable r = new Runnable() {
>         public void run() { String s = s2; foo(s1, s); }
>     };
>   Result: On deserialization, s1 and s2 are effectively swapped.
> > This fails because the order of arguments in the implicitly generated > constructor of the inner class changes due to the order in which the compiler > encounters captured variables. If the reordered variables were of different > types, this would cause a type-1 failure, but if they are the same type, it > causes a type-2 failure. > > *User expectations* > > While experienced users are quick to state the "same bits on both sides" rule > for reliable deserialization, a bit of investigation reveals that user > expectations are actually higher than that. For example, if the compiler > generated a /random/ name for each lambda at compile time, then recompiling the > same source with the same compiler, and using the result for deserialization, > would fail. This is too restrictive; user expectations are not tied to "same > bits", but to a vaguer notion of "I compiled essentially the same source with > essentially the same compiler, and therefore didn't change anything > significant." For example, users would balk if adding a comment or changing > whitespace were to affect deserialization. Users likely expect (in part, due to > behavior of anonymous classes) changes to code that doesn't affect the lambda > directly or indirectly (e.g., add or remove a debugging println) also would not > affect the serialized form. > > In the absence of the user being able to explicitly name the lambda /and/ its > captures (as C++ does), there is no perfect solution. Instead, our goal can > only be to minimize type-2 failures while not unduly creating type-1 failures > when "no significant code change" happened. This means we have to put a stake > in the ground as to what constitutes "significant" code change. 
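[Editor's note: the implicitly generated constructor mentioned above can be observed with reflection. In this sketch (class name illustrative), the anonymous class's constructor parameters mirror the captured values, in the order the compiler encountered them; the synthesized signature is a javac implementation detail.]

```java
import java.lang.reflect.Constructor;

public class CaptureOrder {
    public static void main(String[] args) {
        String s1 = "foo";
        int n = 42;
        Runnable r = new Runnable() {
            public void run() { System.out.println(s1 + n); }
        };
        // The anonymous class's constructor signature is synthesized from
        // the captured values; reordering or retyping the captures in the
        // body changes it, which is the instability discussed above.
        for (Constructor<?> c : r.getClass().getDeclaredConstructors()) {
            for (Class<?> p : c.getParameterTypes()) {
                System.out.println(p.getName());
            }
        }
    }
}
```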
>
> The de-facto (and likely accidental) definition of "significant" used by inner
> classes here is:
>
> * Adding, removing, or reordering inner class instances earlier in the source
> file;
> * Changes to the number, order, or type of captured arguments
>
> This permits changes to code that has nothing to do with inner classes, and many
> common refactorings, as long as they do not affect the order of inner class
> instances or their captures.
>
> *Current Lambda behavior*
>
> Lambda serialization currently behaves very similarly to anonymous class
> serialization. Where anonymous classes have stable method names but unstable
> class names, lambdas are the dual: unstable method names but stable class
> names. But since both are used together, the resulting naming stability is
> largely the same.
>
> We do one thing to increase naming stability for lambdas: we hash the name and
> signature of the enclosing method into the lambda name. This insulates lambda
> naming from the addition, removal, or reordering of methods within a class file,
> but naming stability remains sensitive to the order of lambdas within the
> method. Order-of-capture issues are likewise similar to those of inner classes.
>
> Lambda bodies are desugared to methods named in the following form:
> lambda$/mmm/$/nnn/, where /mmm/ is a hash of the method name and signature, and
> /nnn/ is a sequence number among lambdas that have the same /mmm/ hash.
>
> Because lambdas are instantiated via invokedynamic rather than by invoking a
> constructor directly, there is also slightly more leniency toward changes to the
> /types/ of captured arguments; changing a captured argument from, say, String to
> Object would be a breaking change for anonymous classes (it changes the
> constructor signature) but not for lambdas. This leniency is largely an
> accidental artifact of translation rather than a deliberate design decision.
>
> *Possible improvements*
>
> We can start by recognizing the role of the hash of the enclosing method in the
> lambda method name. This reduces the set of lambdas that could collide from
> "all the lambdas in the file" to "all the lambdas in the method," which in turn
> reduces the set of changes that cause both type-1 and type-2 errors.
>
> An additional observation is that there is a tension between trying to /recover
> from/ skew (rather than simply trying to detect it and failing deserialization)
> and complexity. So I think we should focus primarily on detecting skew and
> failing deserialization (turning type-2 failures into type-1) while at the same
> time not unduly increasing the set of changes that cause type-1 errors, with the
> goal of settling on an informal guideline of what constitutes "too much" change.
>
> We can do this by increasing the number of things that affect the /mmm/ hash,
> effectively constructing the lambda equivalent of the serial version UID.
> The more context we add to this hash, the smaller the set of lambdas that
> hash to the same bucket gets, which reduces the space of possible collisions.
> The following table shows possible candidates for inclusion, along with examples
> of code that illustrate dependence on each item.
>
> Item: Names of captured arguments
>   Old code:  int x = ...   f(() -> x);
>   New code:  int y = ...   f(() -> y);
>   Effect: Including the names of captured arguments in the hash would cause
>   rename-refactors of captured arguments to be considered a
>   serialization-breaking change.
>   Rationale: While alpha-renaming is generally considered to be
>   semantic-preserving, serialization has always keyed off of names (such as
>   field names) as clues to developer intent. It seems reasonable to say "If you
>   change the names involved, we have to assume a semantic change occurred." We
>   cannot tell whether a name change is a simple alpha-rename or the capture of
>   a completely different variable, so this errs on the safe side.
>
> Item: Types of captured arguments
>   Old code:  String x = ...   f(() -> x);
>   New code:  Object x = ...   f(() -> x);
>   Rationale: It seems reasonable to say that, if you capture arguments of a
>   different type, you've made a semantic change.
>
> Item: Order of captured arguments
>   Old code:  () -> { int a = f(x); int b = g(y); return h(a,b); };
>   New code:  () -> { int b = g(y); int a = f(x); return h(a,b); };
>   Effect: Changing the order of capture would become a type-1 failure rather
>   than possibly a type-2 failure.
>   Rationale: Since we cannot detect whether the ordering change is semantically
>   meaningful, it is best to be conservative and say: a change to capture order
>   is likely a semantic change.
>
> Item: Variable assignment target (if present)
>   Old code:  Runnable r1 = Foo::f;   Runnable r2 = Foo::g;
>   New code:  Runnable r2 = Foo::g;   Runnable r1 = Foo::f;
>   Effect: Including the variable target name would render this reordering
>   recoverable and correct.
>   Rationale: If the user has gone to the effort of providing a name, we can use
>   it as a hint to the meaning of the lambda.
>
>   Old code:  Runnable r = Foo::f;
>   New code:  Runnable runnable = Foo::f;
>   Effect: Including the variable target name would render this change
>   (previously recoverable and correct) a deserialization failure.
>   Rationale: If the user has changed the name, it seems reasonable to treat
>   that as possibly meaning something else.
>
> Item: Target type
>   Old code:  Predicate<String> p = String::isEmpty;
>   New code:  Function<String, Boolean> p = String::isEmpty;
>   Effect: Including the target type reduces the space of potential sequence
>   number collisions.
>   Rationale: If you've changed the target type, it is a different lambda.
>
> This list is not exhaustive, and there are others we might consider. (For
> example, for lambdas that appear in method invocation context rather than
> assignment context, we might include the hash of the invoked method name and
> signature, or even the parameter index or name.
This is where it starts to
> exhibit diminishing returns and increasing brittleness.)
>
> Taken in total, the effect is:
>
> * All order-of-capture issues become type-1 failures, rather than type-2
> failures (modulo hash collisions).
> * Order-of-declaration issues are still present, but they are dramatically
> reduced, turning many type-2 failures into type-1 failures.
> * Some new type-1 failures are introduced, mostly those deriving from
> rename-refactors.
>
> The remaining type-2 failures could be dealt with if we added named lambdas in
> the future. (They are also prevented if users always assign lambdas to local
> variables whose names are unique within the method; in this way, the
> local-variable trick becomes a sort of poor-man's named lambda.)
>
> We can reduce the probability of collision further by using a different (and
> simpler) scheme for non-serializable lambdas (lambda$nnn), so that serializable
> lambdas can only accidentally collide with each other.
>
> However, there are some transformations which we will still not be able to avoid
> under this scheme. For example:
>
>   Old code:  Supplier<Integer> s = foo ? () -> 1 : () -> 2;
>   New code:  Supplier<Integer> s = !foo ? () -> 2 : () -> 1;
>   Result: This change is behaviorally compatible but could result in a type-2
>   failure, since both lambdas have the same target type, capture arity, etc.
>
> However^2, we can still detect this risk and warn the user. If for any /mmm/
> we issue more than one sequence number /nnn/, we are at risk of a type-2
> failure, and can issue a lint warning in that case, suggesting the user refactor
> to something more stable. (Who knows what that diagnostic message will look
> like.) With all the hash information above, it seems likely that the number of
> potentially colliding lambdas will be small enough that this warning would not
> come up too often.
>
> The impact of this change on the implementation is surprisingly small.
It does
> not affect the serialized form (java.lang.invoke.SerializedLambda) or the
> generated deserialization code ($deserialize$). It only affects the code which
> generates the lambda method name, which needs access to a small additional bit
> of information -- the assignment target name. Similarly, detecting the
> condition required for the warning is easy -- "sequence number != 1".
>
> Qualitatively, the result is still similar in feel to inner classes -- you can
> make "irrelevant" changes but we make no heroic attempts to recover from things
> like changes in capture order -- but we do a better job of detecting them (and,
> if you follow some coding discipline, you can avoid them entirely.)

From stephan.herrmann at berlin.de Thu Aug 22 15:36:33 2013
From: stephan.herrmann at berlin.de (Stephan Herrmann)
Date: Fri, 23 Aug 2013 00:36:33 +0200
Subject: abstract method in indirect superclass
In-Reply-To: References: <51D95F4B.2070708@berlin.de>
Message-ID: <52169271.2000508@berlin.de>

On 07/12/2013 08:59 PM, Dan Smith wrote:
> I immediately thought of this bug:
> http://bugs.sun.com/view_bug.do?bug_id=8010681

Is the following example related, or covered by some previous discussion? Sorry, if this is off-topic for this list:

class A {}
class B extends A {}
class C extends B {}

abstract class Test1<T1> {
    abstract void foo(T1 param1);
}
abstract class Test2<T2 extends B> extends Test1<T2> {
    @Override void foo(B param2) { System.out.println("2 " + param2); }
}
public class Test3 extends Test2<C> {
    @Override void foo(C param3) { System.out.println("3 " + param3); }
}

Here Test3 has two *independent* overrides of the same root, yet none of the compilers complains. If the override in Test2 cannot be undone, as Dan says, how can we override the same method Test1.foo again? Of course, Test3.foo(C) cannot override Test2.foo(B), but shouldn't it then be interpreted as an entirely new method, not overriding anything? Note that bridges for these unrelated methods do override each other.
The outcome of this snippet is determined by this accidental overriding of bridges:

Test1<C> t = new Test3(); // either raw Test1 or Test1<C>, same byte code produced
t.foo(new C()); // invokes Test3.foo via its bridge, which overrides the bridge in Test2

This looks scary to me...

Stephan

From daniel.smith at oracle.com Mon Aug 26 14:54:26 2013
From: daniel.smith at oracle.com (Dan Smith)
Date: Mon, 26 Aug 2013 15:54:26 -0600
Subject: Overload resolution simplification
In-Reply-To: <71B94777-2E9D-4F9A-A7E4-86294BC5F107@oracle.com>
References: <71B94777-2E9D-4F9A-A7E4-86294BC5F107@oracle.com>
Message-ID: 

Recall that I outlined a simplification to overload resolution a few weeks ago. There has been some useful discussion about the implications; I'd like now to finalize the decision. I do not sense general pushback from EG members, but please speak up if at this point you're uncomfortable with following the approach I presented.

There's been a long discussion on the lambda-spec-observers list related to this, which Maurizio has been kind enough to provide an expert voice in.

Those who dislike the proposed overloading behavior generally express discomfort with some of the compiler's limitations:
1) can't disambiguate based on parameter usage in the lambda body
2) can't see "obvious" information from return context and check lambdas early
3) can't see "obvious" matching parameter types and check lambdas early

Of course, there are trade-offs between simplicity and expressiveness, and the proposal sides with simplicity in each case. We can examine them more closely:

1) I've gotten pretty definitive feedback from the EG that we do not want overload resolution to depend on how a lambda parameter is (or is not) used in a lambda body. Too many Bad Things happen: simple refactorings can break things, users struggle to understand multiple typings of the same expression, lots of technical problems.

2) The canonical example here is Comparators.comparing.
Java has always made overload resolution choices independent of context, and we do not propose to change that basic design principle now. Thus, there is simply no way to choose between overloads when the only clue for disambiguating is buried in an implicit lambda body that can't yet be type-checked. (To be clear: _this is not new_. The overloading limitation has always been there, and no concrete proposal explored by this JSR has done away with it.)

3) This one is more subtle. The idea is that, in principle, we could carve out exceptions for (1) and (2), but still rely on implicit lambdas for overload disambiguation if neither one applies. I proposed an analysis at the August 1 meeting that would do this: carving out (1) by raising an error if an attempt was made to type the same lambda with multiple sets of parameter types, and carving out (2) by distinguishing between invocation-context-dependent lambdas and invocation-context-independent lambdas. The feedback I got is that we needed something simpler. Hence, the simplified proposal that defers all implicit lambda typing until after overload resolution.

Is the sacrifice of expressiveness important? Zhong Yu described three reasonable overloading patterns, to give a sense of what is being lost [1]:

> 1. primitive vs boxed
>
> map( T -> Integer )
> map( T -> int )
>
> 2. flat map
>
> Future<S> then( T -> S );
> Future<S> then( T -> Future<S> );
>
> 3. void vs Void
>
> Future<S> then( T -> S ); // where S=Void
> Future<Void> then( T -> void );
> // in the 2nd version, the lambda body does not need to return (Void)null

I argued in my initial mail that our experience with the Stream APIs suggests to us that, even without the language restriction, it's often helpful to provide disambiguating names anyway. Otherwise, the distinctions might be too subtle.
(For example, we renamed 'map' to 'mapToInt', etc., _without_ being required to do so by the compiler, in part because it provided useful information about otherwise-subtle behavior.) We also found in the Stream API that deceptively-similar patterns can arise in cases in which (1) or (2) apply, and so i) it's not obvious when overloaded declarations like this are "okay", and ii) it's hard to avoid sometimes having to use a set of distinct names for a (conceptually) single overloaded operation, leading to naming inconsistencies. (For example, 'flatMap' and 'comparing' had to be renamed. And then it seemed silly to have a 'map' that didn't match 'flatMap'.) One suggestion in lambda-spec-observers was that we could support examples like these three, while avoiding some of the complexity of a more general approach to (3), by checking implicit lambdas "early" if all remaining overloads agree on the parameter types for the lambdas (and never when inference variables appear in parameter types). While reducing some complexity, this still has the two problems I described in the previous paragraph. Maurizio (Aug 19) had some additional critiques [2]: > the restriction that I've seen popping up more frequently is: only do > type-checking if _all_ overloads agree on the implicit lambda parameter > types, correct? There would be no combinatorial explosion then. > > This is generally a good strategy (in fact the one I've been proposing > since the beginning as a possible incremental update). But it's not > issue free. It requires global reasoning for the user. To answer the > question: will this lambda be checked during overload (so that most > specific will get better treatment)? You really have to work hard, look > at all overloads, and see if they agree on the lambda parameter type. 
As > I was trying to explain last week, this is way harder than it looks on > the surface, as those target types can be nested within other generic > types, type-variable declarations can be subtly swapped, you can have > wildcards which require their own handling. All this _will_ make things > more complex for the user (in the current model a lambda is type-checked > during overload only if it has explicit parameters - rather easy to see, > no?). > > Another problem is that the approach will create an asymmetry between > generic and non-generic method overloads --- Last, there's been some confusion about how method references fit into this story, so I thought I'd try to clarify. We put method references in two categories: "exact" and "inexact" (trying out this terminology; feel free to suggest alternatives). Exact method references always refer to the same method, with the same arity and parameter/return types, independent of context. To meet this standard, they must not be: - Overloaded (some other method exists in the same type with the same name) - Generic (declares type parameters) - Varargs (This design is informed by the fact that a high percentage of method declarations meet this standard. The remaining cases might eventually be handled with some combination of i) stronger inference, and/or ii) explicit parameter type syntax.) (A method reference like Foo::m is a special case -- it could be an unbound instance method reference or a static method reference; but since there's only one method, we can determine the arity of this reference in a context-independent way by seeing whether the method is declared static or not.) Exact method references are analogous to explicit lambdas: the "meaning" of the expression is context-independent, even though its type (is it a Function? a Predicate?) is not. During overload resolution, exact method references can be used to disambiguate, like explicit lambdas. 
Inference constraints can be produced from the referenced method's parameter and return types. Most-specific logic that prefers ToIntFunction over Function can be employed. Inexact method references behave like implicit lambdas. The only thing checked during overload resolution is that they have the right "shape" (an arity check -- but note that, unlike implicit lambdas, an inexact method reference can support multiple arities, so we test whether any possible referenced declaration has an appropriate arity). If an inexact method reference is passed as an argument to an overloaded method where multiple targeted functional interfaces have a compatible arity, an ambiguity generally occurs, unless some other invocation argument disambiguates. The "meaning" of the invocation, like the meaning of a lambda body, remains unknown until a set of concrete parameter types can be provided by a target type. While we don't have the benefit of two different syntaxes to visually distinguish the two forms, the goal is for the same rules that apply to implicit/explicit lambdas to also apply to method references. Hopefully this makes reasoning about overloading behavior in the presence of method references tractable for programmers. --- Conclusion: the discussion (and time to mull over and experiment with things) has not dissuaded us Oracle folks from thinking the proposed path is a good idea. Nor do I sense substantial pushback from the EG, while at the same time this solves some of the hard complexity problems that the EG was uncomfortable with. The prototype (in the OpenJDK Lambda repository) seems to be working. We have a much cleaner story to tell users. And, finally, this conservative path leaves us room to change our mind and add extra power in a later version, if needed. So it looks like all signs point to adopting this plan. Please chime in if you feel like there's anything we've overlooked... 
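[Editor's note: the exact-method-reference behavior described above can be sketched as follows. Class and method names are illustrative; the most-specific preference for a primitive-returning descriptor matches the rules described in the thread as they shipped in Java 8.]

```java
import java.util.function.Function;
import java.util.function.ToIntFunction;

public class ExactRefs {
    static void m(Function<String, Integer> f) {
        System.out.println("boxed: " + f.apply("abc"));
    }
    static void m(ToIntFunction<String> f) {
        System.out.println("primitive: " + f.applyAsInt("abc"));
    }

    // Not overloaded, not generic, not varargs: an "exact" method reference
    // whose arity and types are known independent of context.
    static int len(String s) {
        return s.length();
    }

    public static void main(String[] args) {
        // Because the reference is exact and len returns a primitive int,
        // most-specific logic prefers the ToIntFunction overload.
        m(ExactRefs::len);
    }
}
```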
-Dan [1] http://mail.openjdk.java.net/pipermail/lambda-spec-observers/2013-August/000422.html [2] http://mail.openjdk.java.net/pipermail/lambda-spec-observers/2013-August/000474.html On Aug 8, 2013, at 6:19 PM, Dan Smith wrote: > We spent some time at the EG meeting last week talking about the overload resolution story in the presence of lambdas/method references (and, why not, type argument inference). There are a lot of tricky dependencies here, and the goal is to find a balance between expressivity and simplicity. > > The sense I got from the meeting is that, despite our efforts to refine the story (there have been a few iterations), we're still not there yet in terms of simplicity. In particular, I think what's crucial about the model I presented is that users can identify the difference between implicit lambdas that get type checked pre-overload-resolution and post-overload-resolution; the sanity check I got is that nobody will be able to make that distinction. > > A couple of days later, Maurizio pointed out that, as we've iterated on our libraries, we've largely abandoned the space of programs that requires some of the more complex overload disambiguation machinery. And looking more closely at those use cases, we agreed that we've probably been focusing too much on some atypical patterns. > > So, let me propose a greatly simplified but probably not-very-noticeably less expressive approach: > > Overload resolution will only check the arity of all implicit lambdas and will ignore overloaded method references. If the body of a lambda is important for disambiguation, it must have explicit parameter types. 
> > Benefits of this approach: > - Very easy to understand -- it's mostly a syntactic distinction > - Consistent between all different patterns of overloading that were previously treated differently > - Facilitates a simple declaration-site warning check when method signatures conflict > - Encourages use of explicit lambdas -- clearly acknowledges that we can't solve all inference problems with implicit lambdas > - Avoids re-checking lambdas with different parameter types which means: > -- Typing of lambda bodies is easier for users to process > -- Implementations don't have to do speculative checking of arbitrary blocks of code > -- Bad theoretical complexity goes away > > We've thought about it for a few days and think this is a much better scenario for users and more in line with the EG's expectations (based on feedback both this year and last). > > Any questions/concerns? > > --- > > Here's an example of something we would stop disambiguating: > > interface I<T> { > <R> R map(Function<T, R> f); > int map(ToIntFunction<T> f); > long map(ToLongFunction<T> f); > double map(ToDoubleFunction<T> f); > } > > someIofString.map(s -> s.length()); > > Declaration-site workaround: rename the methods. > > Use-site workaround: explicit parameter type: > someIofString.map((String s) -> s.length()); > > --- > > Here's an example of something else we would stop disambiguating: > > static void m(Function<String, Integer> f); > static void m(ToIntFunction<String> f); > > m(x -> x.length() > 10 ? 
5 : 10); > > --- > > And here's something that we never could disambiguate in the first place (due to fundamental design constraints): > > interface Comparators { > <T, U extends Comparable<? super U>> Comparator<T> comparing(Function<T, U> f); > <T> Comparator<T> comparing(ToIntFunction<T> f); > } > > Comparator<String> cs = Comparators.comparing(s -> -s.length()); > > --- > > -Dan From daniel.smith at oracle.com Mon Aug 26 15:36:06 2013 From: daniel.smith at oracle.com (Dan Smith) Date: Mon, 26 Aug 2013 16:36:06 -0600 Subject: abstract method in indirect superclass In-Reply-To: <52169271.2000508@berlin.de> References: <51D95F4B.2070708@berlin.de> <52169271.2000508@berlin.de> Message-ID: On Aug 22, 2013, at 4:36 PM, Stephan Herrmann wrote: > On 07/12/2013 08:59 PM, Dan Smith wrote: >> I immediately thought of this bug: >> http://bugs.sun.com/view_bug.do?bug_id=8010681 > > Is the following example related, or covered by some previous > discussion? Sorry, if this is off-topic for this list: > > class A {} > class B extends A {} > class C extends B {} > > abstract class Test1<T1 extends A> { > abstract void foo(T1 param1); > } > > abstract class Test2<T2 extends B> extends Test1<T2> { > @Override > void foo(B param2) { > System.out.println("2 " + param2); > } > } > > public class Test3 extends Test2<C> { > @Override > void foo(C param3) { > System.out.println("3 " + param3); > } > } > > Here Test3 has two *independent* overrides of the same root, > yet none of the compilers complains. > If the override in Test2 cannot be undone, as Dan says, > how can we override the same method Test1.foo again? > > Of course, Test3.foo(C) cannot override Test2.foo(B), > but shouldn't it then be interpreted as an entirely > new method, not overriding anything? > > Note that bridges for these unrelated methods > do override each other. 
The outcome of this snippet > is determined by this accidental overriding of bridges: > Test1<C> t = new Test3(); // either raw Test1 or Test1<C>, same byte code produced > t.foo(new C()); // invokes Test3.foo via its bridge, which overrides the bridge in Test2 > > This looks scary to me... Similar, yes, and weird, yes, but here I don't see a problem. When you invoke Test1.foo, the behavior is to execute Test3.foo, which makes sense because Test3.foo overrides Test1.foo, and it's the "bottom-most" method to do so. On the other hand, Test3.foo _does not_ override Test2.foo, and, in fact, if you try to invoke Test2.foo, that's what will execute. (There is no bridge in Test3 that overrides foo(B).) So everything seems to be consistent... (It's true that 15.12.4.4 doesn't account for generics/erasure/bridges in its description of the behavior of invocation, so this is all a little fuzzy.) -Dan From david.lloyd at redhat.com Tue Aug 27 14:01:35 2013 From: david.lloyd at redhat.com (David M. Lloyd) Date: Tue, 27 Aug 2013 16:01:35 -0500 Subject: One final stab at improving lambda serialization In-Reply-To: <52123BA4.4020700@oracle.com> References: <52123BA4.4020700@oracle.com> Message-ID: <521D13AF.1000005@redhat.com> On 08/19/2013 10:37 AM, Brian Goetz wrote: > [...] > The other failure has to do with order-of-capture. > > *Old code* / *New code* / *Result*: > String s1 = "foo"; > String s2 = "bar"; > Runnable r = new Runnable() { > void run() { > foo(s1, s2); > } > }; > > String s1 = "foo"; > String s2 = "bar"; > Runnable r = new Runnable() { > void run() { > String s = s2; > foo(s1, s); > } > }; > On deserialization, s1 and s2 are effectively swapped. > > This fails because the order of arguments in the implicitly generated > constructor of the inner class changes due to the order in which the > compiler encounters captured variables. 
If the reordered variables were > of different types, this would cause a type-1 failure, but if they are > the same type, it causes a type-2 failure. This appears to not strictly be accurate. In my tests I've found that yes the constructor parameter ordering may indeed differ (based on this or (potentially) even any other compiler-specific factor), however since the constructor is not actually used during deserialize (the mandatory no-arg constructor of the first non-serializable superclass is used instead), this difference is irrelevant (to this particular problem). The serialization values are stored by field name in the serialization stream and filled in via reflection; every Java compiler I have access to names these fields e.g. "val$s1" and "val$s2" with the local variable name stored in the field name. From my testing, this case seems like it would work "as expected" (and, given the apparent complete ubiquity of compiler implementation, probably ought to just be formalized in at least the serialization spec by now, if not the JLS itself, but that's really a tangential topic). Also worth noting is that the outer class instance lives in a field called "this$0". Anyway because this has always "just worked", I think that the definition of "significant [change]" used later on in your proposal should probably be changed to removing the "order" (and even "number") of captured arguments; the former is handled as above, and the latter works predictably by simply extrapolating the serialization spec rules about adding/removing fields to captured variables. This is the basis of my earlier gripes about only using capture order (i.e. an array, essentially) for captures instead of names (i.e. a map like serialization does). More upcoming... -- - DML From david.lloyd at redhat.com Tue Aug 27 15:21:06 2013 From: david.lloyd at redhat.com (David M. 
Lloyd) Date: Tue, 27 Aug 2013 17:21:06 -0500 Subject: One final stab at improving lambda serialization In-Reply-To: <521D13AF.1000005@redhat.com> References: <52123BA4.4020700@oracle.com> <521D13AF.1000005@redhat.com> Message-ID: <521D2652.7000800@redhat.com> Just want to clarify that this correction applies to the background information regarding anonymous classes, *not* to the actual proposal itself (specifically the "The de-facto (and likely accidental) definition of "significant" used by inner classes here is:..." section under "User expectations"). I think I just about stopped Brian's heart. :-) On 08/27/2013 04:01 PM, David M. Lloyd wrote: > On 08/19/2013 10:37 AM, Brian Goetz wrote: > > [...] >> The other failure has to do with order-of-capture. >> >> *Old code** >> * *New code** >> * *Result** >> * >> String s1 = "foo"; >> String s2 = "bar"; >> Runnable r = new Runnable() { >> void run() { >> foo(s1, s2); >> } >> }; >> >> String s1 = "foo"; >> String s2 = "bar"; >> Runnable r = new Runnable() { >> void run() { >> String s = s2; >> foo(s1, s); >> } >> }; >> On deserialization, s1 and s2 are effectively swapped. >> >> This fails because the order of arguments in the implicitly generated >> constructor of the inner class changes due to the order in which the >> compiler encounters captured variables. If the reordered variables were >> of different types, this would cause a type-1 failure, but if they are >> the same type, it causes a type-2 failure. > > This appears to not strictly be accurate. In my tests I've found that > yes the constructor parameter ordering may indeed differ (based on this > or (potentially) even any other compiler-specific factor), however since > the constructor is not actually used during deserialize (the mandatory > no-arg constructor of the first non-serializable superclass is used > instead), this difference is irrelevant (to this particular problem). 
> The serialization values are stored by field name in the serialization > stream and filled in via reflection; every Java compiler I have access > to names these fields e.g. "val$s1" and "val$s2" with the local variable > name stored in the field name. From my testing, this case seems like it > would work "as expected" (and, given the apparent complete ubiquity of > compiler implementation, probably ought to just be formalized in at > least the serialization spec by now, if not the JLS itself, but that's > really a tangential topic). > > Also worth noting is that the outer class instance lives in a field > called "this$0". > > Anyway because this has always "just worked", I think that the > definition of "significant [change]" used later on in your proposal > should probably be changed to removing the "order" (and even "number") > of captured arguments; the former is handled as above, and the latter > works predictably by simply extrapolating the serialization spec rules > about adding/removing fields to captured variables. This is the basis > of my earlier gripes about only using capture order (i.e. an array, > essentially) for captures instead of names (i.e. a map like > serialization does). > > More upcoming... > -- > - DML > -- - DML From david.lloyd at redhat.com Thu Aug 29 12:26:18 2013 From: david.lloyd at redhat.com (David M. Lloyd) Date: Thu, 29 Aug 2013 14:26:18 -0500 Subject: One final stab at improving lambda serialization In-Reply-To: <521D13AF.1000005@redhat.com> References: <52123BA4.4020700@oracle.com> <521D13AF.1000005@redhat.com> Message-ID: <521FA05A.7070900@redhat.com> On 08/27/2013 04:01 PM, David M. Lloyd wrote: > On 08/19/2013 10:37 AM, Brian Goetz wrote: > > [...] >> The other failure has to do with order-of-capture. 
>> >> *Old code** >> * *New code** >> * *Result** >> * >> String s1 = "foo"; >> String s2 = "bar"; >> Runnable r = new Runnable() { >> void run() { >> foo(s1, s2); >> } >> }; >> >> String s1 = "foo"; >> String s2 = "bar"; >> Runnable r = new Runnable() { >> void run() { >> String s = s2; >> foo(s1, s); >> } >> }; >> On deserialization, s1 and s2 are effectively swapped. >> >> This fails because the order of arguments in the implicitly generated >> constructor of the inner class changes due to the order in which the >> compiler encounters captured variables. If the reordered variables were >> of different types, this would cause a type-1 failure, but if they are >> the same type, it causes a type-2 failure. > > This appears to not strictly be accurate. In my tests I've found that > yes the constructor parameter ordering may indeed differ (based on this > or (potentially) even any other compiler-specific factor), however since > the constructor is not actually used during deserialize (the mandatory > no-arg constructor of the first non-serializable superclass is used > instead), this difference is irrelevant (to this particular problem). > The serialization values are stored by field name in the serialization > stream and filled in via reflection; every Java compiler I have access > to names these fields e.g. "val$s1" and "val$s2" with the local variable > name stored in the field name. From my testing, this case seems like it > would work "as expected" (and, given the apparent complete ubiquity of > compiler implementation, probably ought to just be formalized in at > least the serialization spec by now, if not the JLS itself, but that's > really a tangential topic). > > Also worth noting is that the outer class instance lives in a field > called "this$0". 
> > Anyway because this has always "just worked", I think that the > definition of "significant [change]" used later on in your proposal > should probably be changed to removing the "order" (and even "number") > of captured arguments; the former is handled as above, and the latter > works predictably by simply extrapolating the serialization spec rules > about adding/removing fields to captured variables. This is the basis > of my earlier gripes about only using capture order (i.e. an array, > essentially) for captures instead of names (i.e. a map like > serialization does). After internal review, and apart from this background factual clarification, Red Hat is satisfied that the proposed improvements will avoid the vast majority of problems that we are concerned about. -- - DML From forax at univ-mlv.fr Sat Aug 31 04:12:18 2013 From: forax at univ-mlv.fr (Remi Forax) Date: Sat, 31 Aug 2013 13:12:18 +0200 Subject: Overload resolution simplification In-Reply-To: References: <71B94777-2E9D-4F9A-A7E4-86294BC5F107@oracle.com> Message-ID: <5221CF92.5020100@univ-mlv.fr> I disagree that the actual overload resolution is independent of the context: the choice of the most specific method is, but the choice of applicable methods is not. So when there are several overloads and an implicitly typed lambda, we can first try to select a subset of all overloads (like we currently select applicable methods) and then use the fact that the inferred signature must be the same for all applicable overloads. If we do that, I think it solves the problem we have with comparing(). cheers, Rémi On 08/26/2013 11:54 PM, Dan Smith wrote: > Recall that I outlined a simplification to overload resolution a few weeks ago. There has been some useful discussion about the implications; I'd like now to finalize the decision. I do not sense a general pushback from EG members, but please speak up if at this point you're uncomfortable with following the approach I presented. 
> > There's been a long discussion on the lambda-spec-observers list related to this, which Maurizio has been kind enough to provide an expert voice in. Those who dislike the proposed overloading behavior generally express discomfort with some of the compiler's limitations: > 1) can't disambiguate based on parameter usage in the lambda body > 2) can't see "obvious" information from return context and check lambdas early > 3) can't see "obvious" matching parameter types and check lambdas early > > Of course, there are trade-offs between simplicity and expressiveness, and the proposal sides with simplicity in each case. We can examine them more closely: > > 1) I've gotten pretty definitive feedback from the EG that we do not want overload resolution to depend on how a lambda parameter is (or is not) used in a lambda body. Too many Bad Things happen: simple refactorings can break things, users struggle to understand multiple typings of the same expression, lots of technical problems. > > 2) The canonical example here is Comparators.comparing. Java has always made overload resolution choices independent of context, and we do not propose to change that basic design principle now. Thus, there is simply no way to choose between overloads when the only clue for disambiguating is buried in an implicit lambda body that can't yet be type-checked. (To be clear: _this is not new_. The overloading limitation has always been there, and no concrete proposal explored by this JSR has done away with it.) > > 3) This one is more subtle. The idea is that, in principle, we could carve out exceptions for (1) and (2), but still rely on implicit lambdas for overload disambiguation if neither one applies. 
I proposed an analysis at the August 1 meeting that would do this: carving out (1) by raising an error if an attempt was made to type the same lambda with multiple sets of parameter types, and carving out (2) by distinguishing between invocation-context-dependent lambdas and invocation-context-independent lambdas. The feedback I got is that we needed something simpler. Hence, the simplified proposal that defers all implicit lambda typing until after overload resolution. > > Is the sacrifice of expressiveness important? Zhong Yu described three reasonable overloading patterns, to give a sense of what is being lost [1]: > >> 1. primitive vs boxed >> >> map( T -> Integer ) >> map( T -> int ) >> >> 2. flat map >> >> Future<T> >> Future<S> then( T -> S ); >> Future<S> then( T -> Future<S> ); >> >> 3. void vs Void >> >> Future<S> then( T -> S ); // where S=Void >> Future<Void> then( T -> void ); >> // in the 2nd version, the lambda body does not need to return (Void)null > I argued in my initial mail that our experience with the Stream APIs suggests to us that, even without the language restriction, it's often helpful to provide disambiguating names anyway. Otherwise, the distinctions might be too subtle. (For example, we renamed 'map' to 'mapToInt', etc., _without_ being required to do so by the compiler, in part because it provided useful information about otherwise-subtle behavior.) > > We also found in the Stream API that deceptively-similar patterns can arise in cases in which (1) or (2) apply, and so i) it's not obvious when overloaded declarations like this are "okay", and ii) it's hard to avoid sometimes having to use a set of distinct names for a (conceptually) single overloaded operation, leading to naming inconsistencies. (For example, 'flatMap' and 'comparing' had to be renamed. And then it seemed silly to have a 'map' that didn't match 'flatMap'.) 
> > One suggestion in lambda-spec-observers was that we could support examples like these three, while avoiding some of the complexity of a more general approach to (3), by checking implicit lambdas "early" if all remaining overloads agree on the parameter types for the lambdas (and never when inference variables appear in parameter types). While reducing some complexity, this still has the two problems I described in the previous paragraph. > > Maurizio (Aug 19) had some additional critiques [2]: > >> the restriction that I've seen popping up more frequently is: only do >> type-checking if _all_ overloads agree on the implicit lambda parameter >> types, correct? There would be no combinatorial explosion then. >> >> This is generally a good strategy (in fact the one I've been proposing >> since the beginning as a possible incremental update). But it's not >> issue free. It requires global reasoning for the user. To answer the >> question: will this lambda be checked during overload (so that most >> specific will get better treatment)? You really have to work hard, look >> at all overloads, and see if they agree on the lambda parameter type. As >> I was trying to explain last week, this is way harder than it looks on >> the surface, as those target types can be nested within other generic >> types, type-variable declarations can be subtly swapped, you can have >> wildcards which require their own handling. All this _will_ make things >> more complex for the user (in the current model a lambda is type-checked >> during overload only if it has explicit parameters - rather easy to see, >> no?). >> >> Another problem is that the approach will create an asymmetry between >> generic and non-generic method overloads > --- > > Last, there's been some confusion about how method references fit into this story, so I thought I'd try to clarify. > > We put method references in two categories: "exact" and "inexact" (trying out this terminology; feel free to suggest alternatives). 
> > Exact method references always refer to the same method, with the same arity and parameter/return types, independent of context. To meet this standard, they must not be: > - Overloaded (some other method exists in the same type with the same name) > - Generic (declares type parameters) > - Varargs > > (This design is informed by the fact that a high percentage of method declarations meet this standard. The remaining cases might eventually be handled with some combination of i) stronger inference, and/or ii) explicit parameter type syntax.) > > (A method reference like Foo::m is a special case -- it could be an unbound instance method reference or a static method reference; but since there's only one method, we can determine the arity of this reference in a context-independent way by seeing whether the method is declared static or not.) > > Exact method references are analogous to explicit lambdas: the "meaning" of the expression is context-independent, even though its type (is it a Function? a Predicate?) is not. > > During overload resolution, exact method references can be used to disambiguate, like explicit lambdas. Inference constraints can be produced from the referenced method's parameter and return types. Most-specific logic that prefers ToIntFunction over Function can be employed. > > Inexact method references behave like implicit lambdas. The only thing checked during overload resolution is that they have the right "shape" (an arity check -- but note that, unlike implicit lambdas, an inexact method reference can support multiple arities, so we test whether any possible referenced declaration has an appropriate arity). If an inexact method reference is passed as an argument to an overloaded method where multiple targeted functional interfaces have a compatible arity, an ambiguity generally occurs, unless some other invocation argument disambiguates. 
The "meaning" of the invocation, like the meaning of a lambda body, remains unknown until a set of concrete parameter types can be provided by a target type. > > While we don't have the benefit of two different syntaxes to visually distinguish the two forms, the goal is for the same rules that apply to implicit/explicit lambdas to also apply to method references. Hopefully this makes reasoning about overloading behavior in the presence of method references tractable for programmers. > > --- > > Conclusion: the discussion (and time to mull over and experiment with things) has not dissuaded us Oracle folks from thinking the proposed path is a good idea. Nor do I sense substantial pushback from the EG, while at the same time this solves some of the hard complexity problems that the EG was uncomfortable with. The prototype (in the OpenJDK Lambda repository) seems to be working. We have a much cleaner story to tell users. And, finally, this conservative path leaves us room to change our mind and add extra power in a later version, if needed. So it looks like all signs point to adopting this plan. > > Please chime in if you feel like there's anything we've overlooked... > > -Dan > > [1] http://mail.openjdk.java.net/pipermail/lambda-spec-observers/2013-August/000422.html > [2] http://mail.openjdk.java.net/pipermail/lambda-spec-observers/2013-August/000474.html > > On Aug 8, 2013, at 6:19 PM, Dan Smith wrote: > >> We spent some time at the EG meeting last week talking about the overload resolution story in the presence of lambdas/method references (and, why not, type argument inference). There are a lot of tricky dependencies here, and the goal is to find a balance between expressivity and simplicity. >> >> The sense I got from the meeting is that, despite our efforts to refine the story (there have been a few iterations), we're still not there yet in terms of simplicity. 
In particular, I think what's crucial about the model I presented is that users can identify the difference between implicit lambdas that get type checked pre-overload-resolution and post-overload-resolution; the sanity check I got is that nobody will be able to make that distinction. >> >> A couple of days later, Maurizio pointed out that, as we've iterated on our libraries, we've largely abandoned the space of programs that requires some of the more complex overload disambiguation machinery. And looking more closely at those use cases, we agreed that we've probably been focusing too much on some atypical patterns. >> >> So, let me propose a greatly simplified but probably not-very-noticeably less expressive approach: >> >> Overload resolution will only check the arity of all implicit lambdas and will ignore overloaded method references. If the body of a lambda is important for disambiguation, it must have explicit parameter types. >> >> Benefits of this approach: >> - Very easy to understand -- it's mostly a syntactic distinction >> - Consistent between all different patterns of overloading that were previously treated differently >> - Facilitates a simple declaration-site warning check when method signatures conflict >> - Encourages use of explicit lambdas -- clearly acknowledges that we can't solve all inference problems with implicit lambdas >> - Avoids re-checking lambdas with different parameter types which means: >> -- Typing of lambda bodies is easier for users to process >> -- Implementations don't have to do speculative checking of arbitrary blocks of code >> -- Bad theoretical complexity goes away >> >> We've thought about it for a few days and think this is a much better scenario for users and more in line with the EG's expectations (based on feedback both this year and last). >> >> Any questions/concerns? 
>> >> --- >> >> Here's an example of something we would stop disambiguating: >> >> interface I<T> { >> <R> R map(Function<T, R> f); >> int map(ToIntFunction<T> f); >> long map(ToLongFunction<T> f); >> double map(ToDoubleFunction<T> f); >> } >> >> someIofString.map(s -> s.length()); >> >> Declaration-site workaround: rename the methods. >> >> Use-site workaround: explicit parameter type: >> someIofString.map((String s) -> s.length()); >> >> --- >> >> Here's an example of something else we would stop disambiguating: >> >> static void m(Function<String, Integer> f); >> static void m(ToIntFunction<String> f); >> >> m(x -> x.length() > 10 ? 5 : 10); >> >> --- >> >> And here's something that we never could disambiguate in the first place (due to fundamental design constraints): >> >> interface Comparators { >> <T, U extends Comparable<? super U>> Comparator<T> comparing(Function<T, U> f); >> <T> Comparator<T> comparing(ToIntFunction<T> f); >> } >> >> Comparator<String> cs = Comparators.comparing(s -> -s.length()); >> >> --- >> >> -Dan
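As a footnote to the 'comparing' example: the declaration-site workaround (renaming) is what the java.util.Comparator API itself settled on, with distinct 'comparing' and 'comparingInt' methods. A minimal sketch with simplified, hypothetical stand-in declarations:

```java
import java.util.Comparator;
import java.util.function.Function;
import java.util.function.ToIntFunction;

public class ComparingDemo {
    // Simplified stand-ins for the overloads in the example above; had both
    // been named "comparing", the implicit lambda in main() would be
    // ambiguous under the adopted rules (same arity, body not consulted).
    static <T, U extends Comparable<? super U>> Comparator<T> comparing(Function<T, U> f) {
        return (a, b) -> f.apply(a).compareTo(f.apply(b));
    }
    static <T> Comparator<T> comparingInt(ToIntFunction<T> f) {
        return (a, b) -> Integer.compare(f.applyAsInt(a), f.applyAsInt(b));
    }

    public static void main(String[] args) {
        // Declaration-site workaround: with distinct names, the implicit
        // lambda resolves fine (only its arity is checked).
        Comparator<String> byNegLength = comparingInt(s -> -s.length());
        // Use-site workaround: an explicit parameter type would also let the
        // compiler disambiguate between same-named overloads.
        Comparator<String> byLength = comparing((String s) -> s.length());
        System.out.println(byNegLength.compare("aa", "b"));  // negative: "aa" sorts first
        System.out.println(byLength.compare("aa", "b"));     // positive: "b" sorts first
    }
}
```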