From brian.goetz at oracle.com Mon Dec 7 20:55:51 2015
From: brian.goetz at oracle.com (Brian Goetz)
Date: Mon, 7 Dec 2015 15:55:51 -0500
Subject: Welcome to valhalla-spec-experts
Message-ID: <5665F257.2070007@oracle.com>

Welcome to valhalla-spec-experts! This list is an OpenJDK-hosted precursor to (hopefully) an eventual JCP Expert Group. The initial membership includes representation from Oracle, IBM, JetBrains, Red Hat, and Google, as well as individuals from the OpenJDK and Scala communities.

Project Valhalla echoes the approach of Project Lambda, which started with a single, simple-sounding idea (add lambda expressions to the Java language) -- but the consequences of adding this "simple feature" were anything but simple, sending ripples throughout the JVM, language, and libraries. In addition to the many technical challenges, the overriding stewardship challenge was to find the natural scope for these ripples; damping them too aggressively would result in a key feature looking "nailed onto the side", but at the same time, we needed to avoid getting too carried away by the excitement of adding cool new features "because we can" -- because otherwise the project would have exceeded its delivery and complexity budgets.

Project Valhalla's core feature is adding user-definable value types to the JVM type system (http://cr.openjdk.java.net/~jrose/values/values-0.html). The consequences of adding this simple-sounding feature, however, are deep and far-ranging, rippling into the language type system, generics, the inheritance model, the bytecode set, and the core library APIs.

To illustrate: if we had value types, but couldn't instantiate generic types with a value (i.e., no ArrayList<int> without appealing to boxing), this would be pretty useless; if boxing were a good enough solution here, we wouldn't even have bothered with value types. So pulling on the "value types" string, we get the need to generify over both references and values.
And pulling on that string some more, we discover that our existing Collections library has methods (e.g., Map.get()) which bake in assumptions that "T is always an Object", leading us to a reexamination of these core APIs. Which in turn leads us to exploring additional features for API migration, which becomes more relevant as our core APIs approach being old enough to drink.

Similarly, as we add value types, there are some core classes (e.g., LocalDateTime, Optional) which always wanted to be values, and would benefit tremendously from being so. It therefore becomes desirable to migrate these to be value types, in a way that's both binary and source compatible with existing clients and subclasses.

We've been exploring the bounds of this problem space for more than a year now, and we think we're approaching a reasonable understanding of the scope and breadth of the various design tradeoffs. Over the next few weeks, I'm going to pick some initial issues for discussion and try to frame them so that their overlap with other aspects is minimized. Stay tuned!

From brian.goetz at oracle.com Wed Dec 16 17:18:27 2015
From: brian.goetz at oracle.com (Brian Goetz)
Date: Wed, 16 Dec 2015 12:18:27 -0500
Subject: Migrating methods in Collections
Message-ID: <56719CE3.1030600@oracle.com>

I'd like to start with the "What about Collections" discussion. While this topic is inextricably linked to all the other topics (value types, generic specialization, nullability, etc), to avoid getting overwhelmed, let's start by trying to keep our focus on the Collections API and the process of migrating it into an anyfied world.

So, I'll offer an ahead-of-time warning: there's going to be lots here that you will want to question, and to many of those questions I'm going to answer "OK, but let's come back to that later", because I'd like to stay focused for the time being on the Collection-related issues.
(Let's also try to create separate threads for separate discussion topics, though I know this is harder to remember to do.) Most of these are already prototyped in some form in the Valhalla repos (which at least demonstrates that they are plausible.)

Background Material
-------------------

The last "State of the Specialization" (http://cr.openjdk.java.net/~briangoetz/valhalla/specialization.html, Dec 2014) outlines an early approach towards the specialization challenges and features. My talk at JVMLS 2015 (https://www.youtube.com/watch?v=uNgAFSUXuwc) outlines the progress that had been made between last December and August. The State of the Values (http://cr.openjdk.java.net/~jrose/values/values-0.html) outlines the basic model for value types. If you've not read/viewed these, that's a good place to start.

Core Goals
----------

The key requirement that drove many of the design choices in Java 5 when adding generics is *gradual migration compatibility*. This means it must be possible to evolve a non-generic class to be generic in a manner that is binary-compatible and source-compatible for both clients and subtypes. So an existing client or subtype should continue to work whether it is recompiled or not, and generified or not, and it should be equally practical to generify subtypes and clients at the same time as a library class, or later, or never.

We adopt similar goals for "anyfication" -- anyfying a class should be binary- and source-compatible for clients and subclasses, and it should be possible to anyfy a generic class without requiring that either its clients or subclasses be anyfied, and whether or not they are recompiled.

Basic Model, Ultra-summarized
-----------------------------

Just as the requirement of gradual migration compatibility was a significant design constraint in Java 5 (and which pushed us strongly towards erasure), this requirement is going to be a significant constraint here.
Essentially, this means that parameterizations like ArrayList<String> need to continue to be erased (otherwise they'd not be binary compatible), but parameterizations like ArrayList<int> need to be reified (since we cannot erase int to Object without boxing, and if boxing were good enough, we wouldn't be bothering at all.) This pushes us towards an interpretation of anyfied generics that is erased over reference instantiations and reified over value instantiations.

At the same time, there is code that is valid under the assumption that T <: Object that would not be valid if T can take on 'int'. These include domain assumptions (e.g., assignment to null) and identity-based assumptions (a T can be used as a monitor lock.) Accordingly, we cannot simply reinterpret existing reference-generic code to be any-generic; we need some indication from the user that they are opting into the broadened genericity domain (and therefore willing to accept the limitations of this broadened domain.) Our working model for this is to annotate the declaration of a type variable with an "any" modifier (e.g., "class Foo<any T>"). We sometimes use the abbreviation 'tvar' to describe a type variable, and 'avar' to describe an any-type variable (similarly, 'ivar' for an inference type variable.)

Conceptually, the enhanced generics model is simple:

 - Non-anyfied generic code means exactly the same thing in Valhalla as it does in Java 5-9;
 - A reference instantiation of an anyfied generic class means exactly the same thing in Valhalla as it does under erased generics;
 - A class or method tvar declaration can be annotated with 'any', in which case it can range over value types (including primitives) as well as reference types. There are some restrictions on what you can do to an avar-typed variable to reflect the fact that values may not be nullable, have no monitor locks, that they don't participate in a subtyping relationship with Object (instead, they have a boxing conversion to Object), and that a V[] is not a subtype of Object[];
 - There is a new wildcard type, Foo<any>, which is a supertype of both value and reference instantiations of Foo.

The enhanced model embraces erasure more explicitly than the current model.

Migration Challenges
--------------------

Ideally, we could just sprinkle 'any' over our codebase, and be done -- and hopefully for many classes, this is all that will be required. But for some classes, there may be migration challenges, which come in two forms:

 - A method signature is fundamentally incompatible with anyfication (such as Map.get(), which returns null when the mapping is not present, which makes no sense if V=int);
 - The existing implementation appeals to assumptions of object-ness, and needs to be adjusted.

For the purposes of this memo, I'm going to restrict myself to the first category; we'll come back to the second category later.

Here's a more-or-less complete list of methods in Collections that may need some sort of migration help. A blank cell indicates "same as above."

Class       Method                                           Issues
----------  -----------------------------------------------  ------
Collection  contains(Object)                                 Assumes all Rs are castable to Object without boxing.
            remove(Object)
            removeAll(Collection<?>)                         Should take Collection<any>. Collection<?> not a supertype of all instantiations.
            retainAll(Collection<?>)
            containsAll(Collection<?>)
            toArray()                                        Returns Object[], which is not a supertype of V[] for any value type V.
            toArray(T[])                                     Not strictly problematic, but relies on runtime checking that E <: T and on reflection for implementation.
List        remove(int)                                      Not strictly problematic, but if remove(Object) becomes remove(T), will be a confusing overload.
            indexOf(Object)                                  Same as contains(Object). Also, would be better as a lambda-consuming method, now that we have lambdas.
            lastIndexOf(Object)
Map         containsKey(Object)                              Same as contains(Object)
            containsValue(Object)
            remove(Object)
            put(K,V)                                         Returns what was there before, or null if there was no mapping. Not strictly problematic, but doesn't project so well to non-nullable Vs.
            putIfAbsent, replace, computeIfAbsent
            get(K)                                           Uses null to signal no mapping
            getOrDefault(Object, V)                          Same as contains(Object)
Queue       poll(), peek()                                   Uses null to signal no element
Deque       poll(), peek()
            pollFirst(), pollLast()
            peekFirst(), peekLast()
            removeFirstOccurrence(), removeLastOccurrence()  Same as remove(Object)

It has often been suggested that "maybe it is time for Collections 2.0"; some would like to declare this to be the end of the road for existing collections. However, redoing collections is only part of the battle; the Collection types are riddled throughout the JDK and other libraries, so we would need a mechanism for migrating all the methods that consume/dispense collections, and if we were to build new collections, saddling them with compatibility with old collections would be a big stone to hang around their neck. So I think this path is less appealing than it might first appear. However, the techniques described here allow us to fix some of the errors of the past.

Partial Methods
---------------

Our approach is guided by the following observation: not all methods declared in a generic class Foo need be members of all instantiations of Foo; it is reasonable for some members to be restricted to certain instantiations. In particular, the reference instantiation of Foo is an important distinguished instantiation. If a method m() in Foo is a member of all instantiations of Foo, we say m() is a *total* method. Otherwise we call it a *restricted* or *partial* method.
For each method in an existing generic class to be anyfied, it could be made a total method in the new anyfied class, or it could be restricted to reference instantiations -- and either approach is fully compatible with existing clients and subclasses. This provides us a migration path to "leave behind" certain problematic methods (making them members only of reference instantiations), and also to add new methods as long as there is a default implementation at least for reference instantiations.

As a simple illustration of this approach, let's say that we want to effectively rename the method List.remove(int) to removeAt(int). Clearly we cannot simply take away remove(int); existing clients would fail to link. Similarly we cannot simply add a new method removeAt(int) without a default; then existing subclasses would fail to compile. What we do here is (using a strawman syntax) add a new total method, with a ref default, and demote the existing method to a partial method on ref instantiations:

    interface List<any T> {
        // A new, total method
        void removeAt(int i);

        // demote the old method to ref-only
        // Just a strawman syntax
        void remove(int i);

        // give the new method a default in terms of the old, for ref
        default void removeAt(int i) { remove(i); }
    }

Now:

 - Clients of reference instantiations of List can invoke either remove(int) or removeAt(int), and both will do the same thing; this allows existing clients to (if they want) migrate away from the old method completely, since removeAt(int) is total, or keep using the old method, at their choice (IDEs can also provide migration help);
 - New clients of value instantiations such as List<int> can only use removeAt(int) -- they won't even *see* remove(int) when they ask their IDE to auto-complete -- and this is not a compatibility issue as there were no existing clients of List<int>;
 - Existing (ref) subclasses of List still see the same set of abstract methods, and therefore continue to work exactly as before;
 - When anyfying a subclass of List, now you have to provide a new implementation for removeAt(int), but this is OK because *you* have decided to anyfy your class, and it is reasonable to expect some possible code changes in this situation;
 - Further, AbstractList can insulate most subclasses from this change.

This gives us at least two tactics for dealing with problematic methods:

 - Migration: leave the old method in the "ref layer", and create a new total method with a ref-default in the "any layer", as with the removeAt example above;
 - Abandonment: leave the old method in the ref layer, and do nothing else, which may be suitable if an alternate idiom already exists (e.g., we could abandon removeAll because now we can express removeAll(c) as removeIf(c::contains), without adding any new method.)

With the controlled-migration option, the primary impediment is that many of the good names are already taken.

These two tactics get us much of the way there, but not all the way there. I'm going to close this note with where these tactics get us, and then open a separate note for some of the options for the remaining cases.

The following table shows some possibilities. There's plenty of time to bikeshed on the names.

Class       Method                                           Possible Approaches
----------  -----------------------------------------------  -------------------
Collection  contains(Object)                                 Migrate to containsElement(E)
            remove(Object)                                   Migrate to removeElement(E)
            removeAll(Collection<?>)                         Migrate to removeElements(Collection). Alternately, abandon in favor of existing removeIf(Predicate).
            retainAll(Collection<?>)                         Same
            containsAll(Collection<?>)                       Migrate to containsElements(Collection).
            toArray()                                        Nothing good yet ...
            toArray(T[])                                     Leave as is, or abandon in favor of new method toArray(IntFunction), like Stream has.
List        remove(int)                                      Leave as is, or migrate to removeAt(int).
            indexOf(Object)                                  Migrate to Optional-bearing findFirst(Predicate)
            lastIndexOf(Object)                              Migrate to findLast(Predicate)
Map         containsKey(Object)                              Migrate to hasKey(K)
            containsValue(Object)                            Migrate to hasValue(V)
            remove(Object)                                   Migrate to removeMapping(K)
            put(K,V)                                         Leave as is; accept that we cannot distinguish between "nothing was there before" and "default value was there before" (as is true with null today.)
            get(K)                                           Migrate to one (or all) of: Optional<V> map(K), mapOrElse(K, V), tryMap(K, Consumer<V>)
            getOrDefault(Object, V)                          Migrate to mapOrElse, as above
Queue       poll(), peek()                                   Migrate to tryPoll(Consumer) *or* optional-bearing method
Deque       poll(), peek()
            pollFirst(), pollLast()
            peekFirst(), peekLast()
            removeFirstOccurrence(), removeLastOccurrence()  Migrate to predicate-accepting method, or simply migrate to new name

Notes:

For Collection.contains() and remove(), while it may be tempting to try and narrow the argument type from Object to T, this is likely problematic. In the following case, we can't express something that seems pretty natural:

    Collection<Animal> animals = ...
    Collection<Dog> dogs = ...
    Dog d = ...

    // Can't say these
    for (Animal a : animals)
        if (dogs.contains(a)) ...
    animals.removeIf(a -> dogs.contains(a))

It's actually pretty convenient for contains/remove to accept anything that might be a member of the collection, but Object is not the top type we're looking for here (since values require boxing to convert to Object.) So neither the status quo (accept Object) nor recasting to a T-accepting method is all that great. (But see next memo for an alternative.)

It's not clear whether all the other Object-accepting methods need the same treatment, or if migration to an E-accepting method is acceptable there.

For removeAll/retainAll, we probably just want to abandon in favor of the more powerful removeIf added in 8. (Kevin/Louis -- would be good to get stats on actual usage of removeAll/retainAll.)
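
As a rough illustration of the abandonment option for removeAll/retainAll, here is a sketch in today's Java that expresses both in terms of removeIf. The class name RemoveIfSketch and the helper names removeAllVia and retainAllVia are made up for this example; they are not proposed API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

public class RemoveIfSketch {
    // Same effect as c.removeAll(other), written with removeIf.
    static <E> boolean removeAllVia(Collection<E> c, Collection<?> other) {
        return c.removeIf(other::contains);
    }

    // Same effect as c.retainAll(other), written with removeIf.
    static <E> boolean retainAllVia(Collection<E> c, Collection<?> other) {
        return c.removeIf(e -> !other.contains(e));
    }

    public static void main(String[] args) {
        List<Integer> a = new ArrayList<>(Arrays.asList(1, 2, 3, 4, 5));
        removeAllVia(a, Arrays.asList(2, 4));
        System.out.println(a); // [1, 3, 5]

        List<Integer> b = new ArrayList<>(Arrays.asList(1, 2, 3, 4, 5));
        retainAllVia(b, Arrays.asList(2, 4));
        System.out.println(b); // [2, 4]
    }
}
```

Both helpers lean on contains() accepting Object, which is exactly the property that does not carry over directly to value instantiations.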
For Map.get(), it really sucks that the good name is taken but the existing signature is terminally polluted with nullness (even for non-null-supporting maps.) We can migrate to multiple new methods (most of which have defaults in terms of the others): Optional map(K k) V mapOrElse(K k, V defaultV) boolean tryMap(K k, Consumer) With Optional as a value type, there is essentially no cost (either footprint or invocation overhead) for using Optional over a naked reference. So the first sig is not as problematic as it might look. The second has an obvious default in terms of the first. The Optional version, though, has an issue: if the map can contain null values, there's no way to express that. However, I think its OK if this method throws on null values -- the other three get-like methods (the two here plus legacy get()) can all express null values. (So null-lovers don't get to enjoy the Optional goodness. More motivation for them to give up on their nulls!) From brian.goetz at oracle.com Wed Dec 16 17:21:35 2015 From: brian.goetz at oracle.com (Brian Goetz) Date: Wed, 16 Dec 2015 12:21:35 -0500 Subject: Migrating methods in Collections Message-ID: <56719D9F.1060406@oracle.com> [ sending again with HTML enabled so the tables don't get mangled ] I'd like to start with the "What about Collections" discussion. While this topic is inextricably linked to all the other topics (value types, generic specialization, nullability, etc), to avoid getting overwhelmed, let's start by trying to keep our focus on the Collections API and the process of migrating it into an anyfied world. So, I'll offer an ahead-of-time warning: there's going to be lots here that you will want to question, and to many of those questions I'm going to answer "OK, but let's come back to that later", because I'd like to stay focused for the time being on the Collection-related issues. 
(Let's also try to create separate threads for separate discussion topics, though I know this is harder to remember to do.) Most of these are already prototyped in some form in the Valhalla repos (which at least demonstrates that they are plausible.) BackgroundMaterial ------------------- The last "State of the Specialization" (http://cr.openjdk.java.net/~briangoetz/valhalla/specialization.html, Dec 2014) outlines an early approach towards the specialization challenges and features. My talk at JVMLS 2015 (https://www.youtube.com/watch?v=uNgAFSUXuwc) outlines the progress that had been made between last December and August. The State of the Values (http://cr.openjdk.java.net/~jrose/values/values-0.html) outlines the basic model for value types. If you've not read/viewed these, that's a good place to start. Core Goals ---------- The key requirement that drove many of the design choices in Java 5 when adding generics is *gradual migration compatibility*. This means it must be possible to evolve a non-generic class to be generic in a manner that is binary-compatible and source-compatible for both clients and subtypes. So an existing client or subtype should continue to work whether it is recompiled or not, and generified or not, and it should be equally practical to generify subtypes and clients at the same time as a library class, or later, or never. We adopt similar goals for "anyfication" -- anyfying a class should be binary- and source-compatible for clients and subclasses, and it should be possible to anyfy a generic class without requiring that either its clients or subclasses be anyfied, and whether or not they are recompiled. Basic Model, Ultra-summarized ----------------------------- Just as the requirement of gradual migration compatibility was a significant design constraint in Java 5 (and which pushed us strongly towards erasure), this requirement is going to be a significant constraint here. 
Essentially, this means that parameterizations like ArrayList need to continue to be erased (otherwise they'd not be binary compatible), but parameterizations like ArrayList need to be reified (since we cannot erase int to Object without boxing, and if boxing were good enough, we wouldn't be bothering at all.) This pushes us towards an interpretation of anyfied generics that is erased over reference instantiations and reified over value instantiations. At the same time, there is code that is valid under the assumption that T <: Object that would not be valid if T can take on 'int'. These include domain assumptions (e.g., assignment to null) and identity-based assumptions (a T can be used as a monitor lock.) Accordingly, we cannot simply reinterpret existing reference-generic code to be any-generic; we need some indication from the user that they are opting into the broadened genericity domain (and therefore willing to accept the limitations of this broadened domain.) Our working model for this is to annotate the declaration of a type variable with an "any" modifier (e.g., "class Foo"). We sometimes use the abbreviation 'tvar' to describe a type variable, and 'avar' to describe an any-type variable (similarly, 'ivar' for an inference type variable.) Conceptually, the enhanced generics model is simple: - Non-anyfied generic code means exactly the same thing in Valhalla as it does in Java 5-9; - A reference instantiation of an anyfied generic class means exactly the same thing in Valhalla as it does under erased generics; - A class or method tvar declaration can be annotated with 'any', in which case it can range over value types (including primitives) as well as reference types. 
There are some restrictions on what you can do to an avar-typed variable to reflect the fact that values may not be nullable, have no monitor locks, that they don't participate in a subtyping relationship with Object (instead, they have a boxing conversion to Object), and that a V[] is not a subtype of Object[]; - There is a new wildcard type, Foo, which is a supertype of both value and reference instantiations of Foo. The enhanced model embraces erasure more explicitly than the current model. Migration Challenges -------------------- Ideally, we could just sprinkle 'any' over our codebase, and be done -- and hopefully for many classes, this is all that will be required. But for some classes, there may be migration challenges, which come in two forms: - A method signature is fundamentally incompatible with anyfication (such as Map.get(), which returns null when the mapping is not present, which makes no sense if V=int); - The existing implementation appeals to assumptions of object-ness, and needs to be adjusted. For the purposes of this memo, I'm going to restrict myself to the first category; we'll come back to the second category later. Here's a more-or-less complete list of methods in Collections that may need some sort of migration help. A blank cell indicates "same as above." *Class** * *Method** * *Issues** * Collection contains(Object) Assumes all Rs are castable to Object without boxing. remove(Object) removeAll(Collection) Should take Collection. Collection not a supertype of all instantiations. retainAll(Collection) containsAll(Collection) toArray() Returns Object[], which is not a supertype of V[] for any value type V. toArray(T[]) Not strictly problematic, but relies on runtime checking that E <: T and on reflection for implementation. List remove(int) Not strictly problematic, but if remove(Object) becomes remove(T), will be a confusing overload. indexOf(Object) Same as contains(Object). 
Also, would be better as a lambda-consuming method, now that we have lambdas. lastIndexOf(Object) Map containsKey(Object) Same as contains(Object) containsValue(Object) remove(Object) put(K,V) Returns what was there before, or null if there was no mapping. Not strictly problematic, but doesn't project so well to non-nullable Vs. putIfAbsent, replace, computeIfAbsent get(K) Uses null to signal no mapping getOrDefault(Object, V) Same as contains(Object) Queue poll(),peek() Uses null to signal no element Deque poll(), peek() pollFirst(), pollLast() peekFirst(), peekLast() removeFirstOccurrence(), removeLastOccurrence() Same as remove(Object) It has often been suggested that "maybe it is time for Collections 2.0"; some would like to declare this to be the end of the road for existing collections. However, redoing collections is only part of the battle; the Collection types are riddled throughout the JDK and other libraries, so we would need a mechanism for migrating all the methods that consume/dispense collections, and if we were to build new collections, saddling them with compatibility with old collections would be a big stone to hang around their neck. So I think this path is less appealing that it might first appear. However, the techniques described here allow us to fix some of the errors of the past. Partial Methods --------------- Our approach is guided by the following observation: not all methods declared in a generic class Foo need be members of all instantiations of Foo; it is reasonable for some members to be restricted to certain instantiations. In particular, the instantiation Foo is an important distinguished instantiation. If a method m() in Foo is a member of all instantiations of Foo, we say m() is a *total* method. Otherwise we call it a *restricted* or *partial* method. 
For each method in an existing generic class to be anyfied, it could be made a total method in the new anyfied class, or it could be restricted to reference instantiations -- and either approach is fully compatible with existing clients and subclasses. This provides us a migration path to "leave behind" certain problematic methods (making them members only of reference instantiations), and also to add new methods as long as there is a default implementation at least for reference instantiations. As a simple illustration of this approach, let's say that we want to effectively rename the method List.remove(int) to removeAt(int). Clearly we cannot simply take away remove(int); existing clients would fail to link. Similarly we cannot simply add a new method removeAt(int) without a default; then existing subclasses would fail to compile. What we do here is (using a strawman syntax) add a new total method, with a ref default, and demote the existing method to a partial method on ref instantiations: interface List { // A new, total method void removeAt(int i); // demote the old method to ref-only // Just a strawman syntax void remove(int i); // give the new method a default in terms of the old, for ref default void removeAt(int i) { remove(i); } } Now: - Clients of List can invoke either remove(int) or removeAt(int), and both will do the same thing; this allows existing clients to (if they want) migrate away from the old method completely, since removeAt(int) is total, or keep using the old method, at their choice (IDEs can also provide migration help); - New clients of List can only use removeAt(int) -- they won't even *see* remove(int) when they ask their IDE to auto-complete -- and this is not a compatibility issue as there were no existing clients of List; - Existing (ref) subclasses of List still see the same set of abstract methods, and therefore continue to work exactly as before; - When anyfying a subclass of List, now you have to provide a new implementation for 
removeAt(int), but this is OK because *you* have decided to anyfy your class, and it is reasonable to expect some possible code changes in this situation; - Further, AbstractList can insulate most subclasses from this change. This gives us at least two tactics for dealing with problematic methods: - Migration: leave the old method in the "ref layer", and create a new total method with a ref-default in the "any layer", as with the removeAt example above; - Abandonment: leave the old method in the ref layer, and do nothing else, which may be suitable if an alternate idiom already exists (e.g., we could abandon removeAll because now we can express removeAll(c) as removeIf(c::contains), without adding any new method.) With the controlled-migration option, the primary impediment is that many of the good names are already taken. These two tactics get us much of the way there, but not all the way there. I'm going to close this note with where these tactics get us, and then open a separate note for some of the options for the remaining cases. The following table shows some possibilities. There's plenty of time to bikeshed on the names. *Class** * *Method** * *Possible Approaches** * Collection contains(Object) Migrate to containsElement(E) remove(Object) Migrate to removeElement(E) removeAll(Collection) Migrate to removeElements(Collection). Alternately, abandon in favor of existing removeIf(Predicate). retainAll(Collection) Same containsAll(Collection) Migrate to containsElements(Collection). toArray() Nothing good yet ... toArray(T[]) Leave as is, or abandon in favor of new method toArray(IntFunction), like Stream has. List remove(int) Leave as is, or migrate to removeAt(int). 
indexOf(Object) Migrate to Optional-bearing findFirst(Predicate) lastIndexOf(Object) Migrate to findLast(Predicate) Map containsKey(Object) Migrate to hasKey(K) containsValue(Object) Migrate to hasValue(V) remove(Object) Migrate to removeMapping(K) put(K,V) Leave as is; accept that we cannot distinguish between "nothing was there before" and "default value was there before" (as is true with null today.) get(K) Migrate to one (or all) of: Optional map(K) mapOrElse(K, V) tryMap(K, Consumer) getOrDefault(Object, V) Migrate to mapOrElse, as above Queue poll(), peek() Migrate to tryPoll(Consumer) *or* optional-bearing method Deque poll(), peek() pollFirst(), pollLast() peekFirst(), peekLast() removeFirstOccurrence(), removeLastOccurrence() Migrate to predicate-accepting method, or simply migrate to new name Notes: For Collection.contains() and remove(), while it may be tempting to try and narrow the argument type from Object to T, this is likely problematic. In the following case, we can't express something that seems pretty natural: Collection animals = ... Collection dogs = ... Dog d = ... // Can't say these for (Animal a : animals) if (dogs.contains(a)) ... animals.removeIf(a -> dogs.contains(a)) Its actually pretty convenient for contains/remove to accept anything that might be a member of the collection, but Object is not the top type we're looking for here (since values require boxing to convert to Object.) So neither the status quo (accept Object) nor recasting to a T-accepting method is all that great. (But see next memo for an alternative.) Its not clear whether all the other Object-accepting methods need the same treatment, or if migration to an E-accepting method is acceptable there. For removeAll/retainAll, we probably just want to abandon in favor of the more powerful removeIf added in 8. (Kevin/Louis -- would be good to get stats on actual usage of removeAll/retainAll.) 
For Map.get(), it really sucks that the good name is taken but the existing signature is terminally polluted with nullness (even for non-null-supporting maps.) We can migrate to multiple new methods (most of which have defaults in terms of the others): Optional map(K k) V mapOrElse(K k, V defaultV) boolean tryMap(K k, Consumer) With Optional as a value type, there is essentially no cost (either footprint or invocation overhead) for using Optional over a naked reference. So the first sig is not as problematic as it might look. The second has an obvious default in terms of the first. The Optional version, though, has an issue: if the map can contain null values, there's no way to express that. However, I think its OK if this method throws on null values -- the other three get-like methods (the two here plus legacy get()) can all express null values. (So null-lovers don't get to enjoy the Optional goodness. More motivation for them to give up on their nulls!) From brian.goetz at oracle.com Wed Dec 16 17:23:43 2015 From: brian.goetz at oracle.com (Brian Goetz) Date: Wed, 16 Dec 2015 12:23:43 -0500 Subject: Migrating methods in Collections Message-ID: <56719E1F.2050504@oracle.com> [ sending again with HTML enabled so the tables don't get mangled ... and again ] I'd like to start with the "What about Collections" discussion. While this topic is inextricably linked to all the other topics (value types, generic specialization, nullability, etc), to avoid getting overwhelmed, let's start by trying to keep our focus on the Collections API and the process of migrating it into an anyfied world. So, I'll offer an ahead-of-time warning: there's going to be lots here that you will want to question, and to many of those questions I'm going to answer "OK, but let's come back to that later", because I'd like to stay focused for the time being on the Collection-related issues. 
(Let's also try to create separate threads for separate discussion topics, though I know this is harder to remember to do.) Most of these are already prototyped in some form in the Valhalla repos (which at least demonstrates that they are plausible.) BackgroundMaterial ------------------- The last "State of the Specialization" (http://cr.openjdk.java.net/~briangoetz/valhalla/specialization.html, Dec 2014) outlines an early approach towards the specialization challenges and features. My talk at JVMLS 2015 (https://www.youtube.com/watch?v=uNgAFSUXuwc) outlines the progress that had been made between last December and August. The State of the Values (http://cr.openjdk.java.net/~jrose/values/values-0.html) outlines the basic model for value types. If you've not read/viewed these, that's a good place to start. Core Goals ---------- The key requirement that drove many of the design choices in Java 5 when adding generics is *gradual migration compatibility*. This means it must be possible to evolve a non-generic class to be generic in a manner that is binary-compatible and source-compatible for both clients and subtypes. So an existing client or subtype should continue to work whether it is recompiled or not, and generified or not, and it should be equally practical to generify subtypes and clients at the same time as a library class, or later, or never. We adopt similar goals for "anyfication" -- anyfying a class should be binary- and source-compatible for clients and subclasses, and it should be possible to anyfy a generic class without requiring that either its clients or subclasses be anyfied, and whether or not they are recompiled. Basic Model, Ultra-summarized ----------------------------- Just as the requirement of gradual migration compatibility was a significant design constraint in Java 5 (and which pushed us strongly towards erasure), this requirement is going to be a significant constraint here. 
Essentially, this means that parameterizations like ArrayList<String> need to continue to be erased (otherwise they'd not be binary compatible), but parameterizations like ArrayList<int> need to be reified (since we cannot erase int to Object without boxing, and if boxing were good enough, we wouldn't be bothering at all.) This pushes us towards an interpretation of anyfied generics that is erased over reference instantiations and reified over value instantiations.

At the same time, there is code that is valid under the assumption that T <: Object that would not be valid if T can take on 'int'. These include domain assumptions (e.g., assignment to null) and identity-based assumptions (a T can be used as a monitor lock.) Accordingly, we cannot simply reinterpret existing reference-generic code to be any-generic; we need some indication from the user that they are opting into the broadened genericity domain (and are therefore willing to accept the limitations of this broadened domain.)

Our working model for this is to annotate the declaration of a type variable with an "any" modifier (e.g., "class Foo<any T>"). We sometimes use the abbreviation 'tvar' to describe a type variable, and 'avar' to describe an any-type variable (similarly, 'ivar' for an inference type variable.)

Conceptually, the enhanced generics model is simple:

 - Non-anyfied generic code means exactly the same thing in Valhalla as it does in Java 5-9;
 - A reference instantiation of an anyfied generic class means exactly the same thing in Valhalla as it does under erased generics;
 - A class or method tvar declaration can be annotated with 'any', in which case it can range over value types (including primitives) as well as reference types.
There are some restrictions on what you can do to an avar-typed variable to reflect the fact that values may not be nullable, have no monitor locks, that they don't participate in a subtyping relationship with Object (instead, they have a boxing conversion to Object), and that a V[] is not a subtype of Object[]; - There is a new wildcard type, Foo, which is a supertype of both value and reference instantiations of Foo. The enhanced model embraces erasure more explicitly than the current model. Migration Challenges -------------------- Ideally, we could just sprinkle 'any' over our codebase, and be done -- and hopefully for many classes, this is all that will be required. But for some classes, there may be migration challenges, which come in two forms: - A method signature is fundamentally incompatible with anyfication (such as Map.get(), which returns null when the mapping is not present, which makes no sense if V=int); - The existing implementation appeals to assumptions of object-ness, and needs to be adjusted. For the purposes of this memo, I'm going to restrict myself to the first category; we'll come back to the second category later. Here's a more-or-less complete list of methods in Collections that may need some sort of migration help. A blank cell indicates "same as above." *Class** * *Method** * *Issues** * Collection contains(Object) Assumes all Rs are castable to Object without boxing. remove(Object) removeAll(Collection) Should take Collection. Collection not a supertype of all instantiations. retainAll(Collection) containsAll(Collection) toArray() Returns Object[], which is not a supertype of V[] for any value type V. toArray(T[]) Not strictly problematic, but relies on runtime checking that E <: T and on reflection for implementation. List remove(int) Not strictly problematic, but if remove(Object) becomes remove(T), will be a confusing overload. indexOf(Object) Same as contains(Object). 
Also, would be better as a lambda-consuming method, now that we have lambdas. lastIndexOf(Object) Map containsKey(Object) Same as contains(Object) containsValue(Object) remove(Object) put(K,V) Returns what was there before, or null if there was no mapping. Not strictly problematic, but doesn't project so well to non-nullable Vs. putIfAbsent, replace, computeIfAbsent get(K) Uses null to signal no mapping getOrDefault(Object, V) Same as contains(Object) Queue poll(),peek() Uses null to signal no element Deque poll(), peek() pollFirst(), pollLast() peekFirst(), peekLast() removeFirstOccurrence(), removeLastOccurrence() Same as remove(Object) It has often been suggested that "maybe it is time for Collections 2.0"; some would like to declare this to be the end of the road for existing collections. However, redoing collections is only part of the battle; the Collection types are riddled throughout the JDK and other libraries, so we would need a mechanism for migrating all the methods that consume/dispense collections, and if we were to build new collections, saddling them with compatibility with old collections would be a big stone to hang around their neck. So I think this path is less appealing that it might first appear. However, the techniques described here allow us to fix some of the errors of the past. Partial Methods --------------- Our approach is guided by the following observation: not all methods declared in a generic class Foo need be members of all instantiations of Foo; it is reasonable for some members to be restricted to certain instantiations. In particular, the instantiation Foo is an important distinguished instantiation. If a method m() in Foo is a member of all instantiations of Foo, we say m() is a *total* method. Otherwise we call it a *restricted* or *partial* method. 
For each method in an existing generic class to be anyfied, it could be made a total method in the new anyfied class, or it could be restricted to reference instantiations -- and either approach is fully compatible with existing clients and subclasses. This provides us a migration path to "leave behind" certain problematic methods (making them members only of reference instantiations), and also to add new methods as long as there is a default implementation at least for reference instantiations.

As a simple illustration of this approach, let's say that we want to effectively rename the method List.remove(int) to removeAt(int). Clearly we cannot simply take away remove(int); existing clients would fail to link. Similarly we cannot simply add a new method removeAt(int) without a default; then existing subclasses would fail to compile. What we do here is (using a strawman syntax) add a new total method, with a ref default, and demote the existing method to a partial method on ref instantiations:

    interface List<any T> {
        // A new, total method
        void removeAt(int i);

        // demote the old method to ref-only
        // Just a strawman syntax
        <where ref T>
        void remove(int i);

        // give the new method a default in terms of the old, for ref
        <where ref T>
        default void removeAt(int i) { remove(i); }
    }

Now:

 - Clients of reference instantiations of List can invoke either remove(int) or removeAt(int), and both will do the same thing; this allows existing clients to (if they want) migrate away from the old method completely, since removeAt(int) is total, or keep using the old method, at their choice (IDEs can also provide migration help);
 - New clients of value instantiations of List can only use removeAt(int) -- they won't even *see* remove(int) when they ask their IDE to auto-complete -- and this is not a compatibility issue as there were no existing clients of value instantiations;
 - Existing (ref) subclasses of List still see the same set of abstract methods, and therefore continue to work exactly as before;
 - When anyfying a subclass of List, now you have to provide a new implementation for
removeAt(int), but this is OK because *you* have decided to anyfy your class, and it is reasonable to expect some possible code changes in this situation; - Further, AbstractList can insulate most subclasses from this change. This gives us at least two tactics for dealing with problematic methods: - Migration: leave the old method in the "ref layer", and create a new total method with a ref-default in the "any layer", as with the removeAt example above; - Abandonment: leave the old method in the ref layer, and do nothing else, which may be suitable if an alternate idiom already exists (e.g., we could abandon removeAll because now we can express removeAll(c) as removeIf(c::contains), without adding any new method.) With the controlled-migration option, the primary impediment is that many of the good names are already taken. These two tactics get us much of the way there, but not all the way there. I'm going to close this note with where these tactics get us, and then open a separate note for some of the options for the remaining cases. The following table shows some possibilities. There's plenty of time to bikeshed on the names. *Class** * *Method** * *Possible Approaches** * Collection contains(Object) Migrate to containsElement(E) remove(Object) Migrate to removeElement(E) removeAll(Collection) Migrate to removeElements(Collection). Alternately, abandon in favor of existing removeIf(Predicate). retainAll(Collection) Same containsAll(Collection) Migrate to containsElements(Collection). toArray() Nothing good yet ... toArray(T[]) Leave as is, or abandon in favor of new method toArray(IntFunction), like Stream has. List remove(int) Leave as is, or migrate to removeAt(int). 
      indexOf(Object)                                   Migrate to Optional-bearing findFirst(Predicate)
      lastIndexOf(Object)                               Migrate to findLast(Predicate)
Map   containsKey(Object)                               Migrate to hasKey(K)
      containsValue(Object)                             Migrate to hasValue(V)
      remove(Object)                                    Migrate to removeMapping(K)
      put(K,V)                                          Leave as is; accept that we cannot distinguish between "nothing was there before" and "default value was there before" (as is true with null today.)
      get(K)                                            Migrate to one (or all) of: Optional map(K), mapOrElse(K, V), tryMap(K, Consumer)
      getOrDefault(Object, V)                           Migrate to mapOrElse, as above
Queue poll(), peek()                                    Migrate to tryPoll(Consumer) *or* optional-bearing method
Deque poll(), peek()
      pollFirst(), pollLast()
      peekFirst(), peekLast()
      removeFirstOccurrence(), removeLastOccurrence()   Migrate to predicate-accepting method, or simply migrate to new name

Notes:

For Collection.contains() and remove(), while it may be tempting to try and narrow the argument type from Object to T, this is likely problematic. In the following case, we can't express something that seems pretty natural:

    Collection<Animal> animals = ...
    Collection<Dog> dogs = ...
    Dog d = ...

    // Can't say these
    for (Animal a : animals)
        if (dogs.contains(a)) ...
    animals.removeIf(a -> dogs.contains(a))

It's actually pretty convenient for contains/remove to accept anything that might be a member of the collection, but Object is not the top type we're looking for here (since values require boxing to convert to Object.) So neither the status quo (accept Object) nor recasting to a T-accepting method is all that great. (But see next memo for an alternative.) It's not clear whether all the other Object-accepting methods need the same treatment, or if migration to an E-accepting method is acceptable there.

For removeAll/retainAll, we probably just want to abandon in favor of the more powerful removeIf added in 8. (Kevin/Louis -- would be good to get stats on actual usage of removeAll/retainAll.)
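As a rough sketch of the "abandon removeAll/retainAll" option in today's Java: both can be written with the existing removeIf. The helper names removeAllVia and retainAllVia and the wrapper class are hypothetical, chosen only for this illustration; they are not proposed API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

public class RemoveIfSketch {
    // Same effect as target.removeAll(c), expressed with removeIf.
    static <E> boolean removeAllVia(Collection<E> target, Collection<?> c) {
        return target.removeIf(c::contains);
    }

    // Same effect as target.retainAll(c): drop everything NOT in c.
    static <E> boolean retainAllVia(Collection<E> target, Collection<?> c) {
        return target.removeIf(e -> !c.contains(e));
    }

    public static void main(String[] args) {
        List<String> a = new ArrayList<>(Arrays.asList("a", "b", "c", "d"));
        removeAllVia(a, Arrays.asList("b", "d"));
        System.out.println(a); // prints [a, c]

        List<String> b = new ArrayList<>(Arrays.asList("a", "b", "c", "d"));
        retainAllVia(b, Arrays.asList("b", "d"));
        System.out.println(b); // prints [b, d]
    }
}
```

Note that the sketch still leans on contains(Object) for the argument collection, so it sidesteps rather than solves the contains() question raised in the notes.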
For Map.get(), it really sucks that the good name is taken but the existing signature is terminally polluted with nullness (even for non-null-supporting maps.) We can migrate to multiple new methods (most of which have defaults in terms of the others): Optional map(K k) V mapOrElse(K k, V defaultV) boolean tryMap(K k, Consumer) With Optional as a value type, there is essentially no cost (either footprint or invocation overhead) for using Optional over a naked reference. So the first sig is not as problematic as it might look. The second has an obvious default in terms of the first. The Optional version, though, has an issue: if the map can contain null values, there's no way to express that. However, I think its OK if this method throws on null values -- the other three get-like methods (the two here plus legacy get()) can all express null values. (So null-lovers don't get to enjoy the Optional goodness. More motivation for them to give up on their nulls!) -------------- next part -------------- An HTML attachment was scrubbed... URL: From brian.goetz at oracle.com Wed Dec 16 19:43:04 2015 From: brian.goetz at oracle.com (Brian Goetz) Date: Wed, 16 Dec 2015 14:43:04 -0500 Subject: Migrating methods in Collections In-Reply-To: <56719E1F.2050504@oracle.com> References: <56719E1F.2050504@oracle.com> Message-ID: <5671BEC8.8010508@oracle.com> The previous memo outlined a tactic for effectively "migrating" a method in a current generic class to a related but different signature in an any-generic class, while retaining source and binary compatibility with existing clients and subclasses, and outlined the list of possibly problematic methods in the Collections framework. The effectiveness of this tactic on this set of methods ranges from "slam-dunk" to "seems like it could work" to "doesn't help." It would be nice to have one hammer that pounds down all the nails, but I'm not sure there is one. 
This memo outlines a range of complementary tactics that might, in combination with the above approach, enable us to cover the waterfront.

I'll start with the method Collection.contains(Object). It might at first seem that we want the signature here to be contains(E), but this gets in the way of cases like:

    dogs.contains(animal)

or of converting dogs::contains to a Predicate (which is what filter/removeIf would want.) Note that this has little to do directly with value types; value types are invariant. But if we want a single contains() method that ranges over any E, it needs to accommodate variance for reference instantiations while not falling back on Object as a top type.

One technique for doing so would be to introduce contravariant inference variables. Then we could write contains as:

    <U super E> boolean contains(U u)

This has three main downsides:

 - Even though this works, it's still not that obvious.
 - It's a *lot* of work in the spec and compiler; it pushes on all the fragile bits.
 - If that weren't enough, it is a theoretical minefield. Papers like http://www.cis.upenn.edu/~bcpierce/papers/variance.pdf show that adding contravariance to certain type systems results in subtyping becoming undecidable.

On the other hand, I'm sure many library writers would jump for joy to have this in the toolbox; the lack of contravariant tvars seems a notable inconsistency in the language. (But let's not kid ourselves about the costs.)

It just so happens that this construct works out; it is binary- and source-compatible to make a non-generic method generic, as long as the erasure of the signature remains the same. So changing contains() or remove() as above would not cause subclasses or clients to fail to either link or recompile (in the subclass recompilation case, it would be reinterpreted as a raw override, which is allowed and compatible.) And we'd end up with a total method that does the right thing both for refs and values.
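The erasure argument can be checked, in a limited way, with today's Java. The sketch below is an assumption-laden illustration: Java does not currently allow a lower bound on a method type variable, so an unbounded U stands in for the super-bounded one, and the classes Before and After are hypothetical stand-ins for a collection class before and after the change. It shows only that both versions expose the same erased parameter list, contains(Object), which is what existing call sites link against.

```java
import java.lang.reflect.Method;
import java.util.Arrays;

public class ErasureSketch {
    // Stand-in for the existing, non-generic signature.
    static class Before {
        public boolean contains(Object o) { return o != null; }
    }

    // Stand-in for the generified signature; an unbounded U erases to Object.
    static class After {
        public <U> boolean contains(U u) { return u != null; }
    }

    // Looks up contains(Object) reflectively and reports its erased parameter types.
    static String erasedParameterTypes(Class<?> c) {
        try {
            Method m = c.getMethod("contains", Object.class);
            return Arrays.toString(m.getParameterTypes());
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException("no contains(Object) on " + c, e);
        }
    }

    public static void main(String[] args) {
        String before = erasedParameterTypes(Before.class);
        String after = erasedParameterTypes(After.class);
        System.out.println(before);
        System.out.println(after);
        System.out.println(before.equals(after)); // prints true
    }
}
```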
Ignoring the costs and risks, this technique applies to a number of the methods in our rogue's gallery, including toArray(), for which we didn't yet have a solution:

    <U super E> U[] toArray()

This is compatible with existing clients who are expecting an Object[] to come back from toArray() on a collection of reference types, and collapses to V[] for any value type, so Collection<int>.toArray() returns int[]. (We might still want an unchecked warning if the compiler infers U != Object for reference E, but that's a separate and easily handled consideration.)

Let's call this technique "superation" (yes, it's an intentional (disgusting) pun. See http://beta.merriam-webster.com/dictionary/suppurate. And think about that the next time you pass a "Super 8" motel on the highway.)

With this in our toolbox, the strategy matrix becomes:

*Class*     *Method*                   *Possible Approaches*
Collection  contains(Object)           Superate to contains(U)
            remove(Object)
            removeAll(Collection)      Abandon in favor of existing removeIf(Predicate).
            retainAll(Collection)
            containsAll(Collection)    Migrate to containsElements(Collection), or abandon.
            toArray()                  Superate to U[] toArray()
            toArray(T[])               Leave as is, superate, or abandon.
List        remove(int)                Migrate to removeAt(int).
            indexOf(Object)            Migrate to Optional-bearing findFirst(Predicate)
            lastIndexOf(Object)
Map         containsKey(Object)        Superate
            containsValue(Object)      Superate
            remove(Object)             Superate
            put(K,V)                   Leave as is
            get(K)                     Migrate to one (or all) of: Optional map(K), mapOrElse(K, V), tryMap(K, Consumer)
            getOrDefault(Object, V)    Migrate to mapOrElse, or superate
Queue       poll(), peek()             Migrate to tryPoll(Consumer) *or* optional-bearing method
Deque       poll(), peek()
            pollFirst(), pollLast()
            peekFirst(), peekLast()
            removeFirstOccurrence(), removeLastOccurrence()   Migrate to predicate-accepting method, or superate

This is a more satisfying matrix; not only does everything have an acceptable strategy, but some have more than one, and the user impact of superating a method is lower (users might just not notice), so the perception is that fewer methods are affected.

Still, super-bounded tvars are a big hammer for such a small foe. Maybe there's an alternate approach that has the effect of superation but doesn't need such a big hammer.

Here's one possibility. We already have a notion of partial methods. We could have a pair of methods, the first restricted to reference instantiations and the second to value instantiations:

    Object[] toArray()
    E[] toArray()

both of which are reasonable signatures for their restricted domains. Unfortunately, the natural interpretation of this pair of methods is that the first is a member of the reference instantiations of Collection, and the second is a member of the value instantiations, but there is *no* toArray() method that is a member of all instantiations of Collection! This means that code that is generic in any-T would not see a toArray() method at all. That's a problem (though not as enormous as it initially sounds; there are possible mitigating techniques.)

However, it is not unreasonable for the compiler to recognize this situation and deal with it. Suppose I have some code generic in any-T:

    <any T> void foo(Collection<T> c) {
        T[] arr = c.toArray();
    }

Now, the compiler doesn't know whether c is a collection of refs or values, but it knows it's one or the other (ref T and val T form a partition of any T).
So it could (and in some cases, has to anyway) do type checking by parts -- it can typecheck the above assignment under the assumption of ref T, and do it again under the assumption of val T, and if both succeed (and something else, see below), accept the method invocation as valid. (In this case, the ref-T fork should result in an unchecked warning, meaning that the merged checking also yields an unchecked warning.) The "something else" part is: when doing overload resolution by parts, both branches must resolve to overloads that are erasure-equivalent to each other. Which is true for toArray() (and for all the cases for which superation would work.) Now, this is a lot of handwaving, and it doesn't even really describe how we think partial methods should actually work (I'd like to get rid of the where-val-T slices entirely, this is a separate discussion.) But its a sketch of an option that achieves the positive result of superation without engaging the complexity of superation. -------------- next part -------------- An HTML attachment was scrubbed... URL: From dl at cs.oswego.edu Fri Dec 18 15:50:07 2015 From: dl at cs.oswego.edu (Doug Lea) Date: Fri, 18 Dec 2015 10:50:07 -0500 Subject: Migrating methods in Collections In-Reply-To: <56719E1F.2050504@oracle.com> References: <56719E1F.2050504@oracle.com> Message-ID: <56742B2F.3000809@cs.oswego.edu> Before commenting on details, a few preliminaries. 1. It seems unresolved in the current State of Values doc whether value types will have user-definable equals() methods. I think that this needs to be settled soon: If value types don't allow overriding equals, and if the implementation is "has same type and bits", then some of the problems you note almost disappear. For example c.contains(x) could be automatically translated into "false" if c is a collection of a different val type than x, or x is a ref type or null. Which also happens to catch all the type-problematic cases. 
I'm not sure how a compiler would know to do this though. (The complementary case where c is Collection (non-any), and x is a possibly-boxable value generalizes how Collection currently acts, which doesn't seem to need any change.) 2. It seems irresponsible to spend so much effort on Collections without also somehow addressing 32bit size/index limitations. Yes? 3. Similarly for value-like (aka fluent-immutable, aka persistent) collection methods, possibly in sub or super interfaces, or just extension methods in Collection. As in: Collection adding(T x); Collection removing(T x); (In other words, if collections support values, users will also expect value-like collections/APIs.) Existing collections might just clone-then-mutate, but others (like HAMTs that we don't support in part for lack of API) would do something cheaper. (Default implementations seem possible, but only via messy reflection.) Sorry that (2) and (3) are almost out of scope of this discussion, but merely "almost" -- they seem to interact at least a little. -Doug From brian.goetz at oracle.com Fri Dec 18 16:55:12 2015 From: brian.goetz at oracle.com (Brian Goetz) Date: Fri, 18 Dec 2015 11:55:12 -0500 Subject: 64 bit collections, and API migration in general (was Re: Migrating methods in Collections) In-Reply-To: <56742B2F.3000809@cs.oswego.edu> References: <56719E1F.2050504@oracle.com> <56742B2F.3000809@cs.oswego.edu> Message-ID: <56743A70.1000401@oracle.com> > 2. It seems irresponsible to spend so much effort on > Collections without also somehow addressing 32bit size/index > limitations. Yes? I think that's really a separate question. While everything said so far is Collection-specific, it's not really "so much effort on Collections" as much as "so much effort to ensure that legacy libraries can be compatibly anyfied", and that Collections is the poster child for that effort. 
(If we can't migrate Collections, that's evidence that we're still lacking in linguistic tools for supporting the transition to anyfied generics.) So I'll interpret your question as: "These are nice migration tools for migrating erased libraries to anyfied, but there are other migrations we'd like to perform on these aging libraries, please don't forget about them?" The migration in question is whether we can compatibly migrate methods like: get(int index) to get(long index) For API migration, there is a 2x2 compatibility matrix: {source,binary} x {client,subclass}. The hard quadrant of this is almost always "binary compatibility for subclasses"; the others can usually be handled by some combination of bridge methods, defaults, and compiler fu. Essentially, the nasty case comes about when you have some combination of A extends B extends C where some of these have not been recompiled, and someone ends up overriding a bridge instead of the real method, and you can end up invoking the wrong method. I'll provide more details soon, but let's come back to this under the more general topic of signature migration -- which we're going to need in order for Optional and friends to become values anyway. OK? From brian.goetz at oracle.com Fri Dec 18 16:55:18 2015 From: brian.goetz at oracle.com (Brian Goetz) Date: Fri, 18 Dec 2015 11:55:18 -0500 Subject: Equality (was: Re: Migrating methods in Collections) In-Reply-To: <56742B2F.3000809@cs.oswego.edu> References: <56719E1F.2050504@oracle.com> <56742B2F.3000809@cs.oswego.edu> Message-ID: <56743A76.6000007@oracle.com> > 1. It seems unresolved in the current State of Values doc > whether value types will have user-definable equals() > methods. I think that this needs to be settled soon: > > If value types don't allow overriding equals, and if the implementation > is "has same type and bits", then some of the problems you note > almost disappear. 
For example c.contains(x) could be automatically > translated into "false" if c is a collection of a different val type > than x, or x is a ref type or null. Which also happens to catch all > the type-problematic cases. I'm not sure how a compiler would know > to do this though. Here's the current thinking on the tools for equality: - The bytecode set will provide sort of 'vcmpeq' instruction, whose behavior is a componentwise recursive comparison (int fields with icmp, value fields with vcmp, etc). - The == operator in the language will correspond to vcmpeq - The default (whether provided by javac or VM) implementation of equals(V) for value types will do an == comparison - Users can override equals(V) The motivation for allowing overriding equals is the same as for objects. Obvious examples include Decimal(1.0) and Decimal(1.00), and Tuple[String,String] that both contain [ foo, bar ] but use different String instances to do so. On the signature of equality, equals() has potentially the same issue as contains(), where you might want to accept a broader set of comparands. Still figuring out the options there. From dl at cs.oswego.edu Fri Dec 18 17:41:24 2015 From: dl at cs.oswego.edu (Doug Lea) Date: Fri, 18 Dec 2015 12:41:24 -0500 Subject: Equality In-Reply-To: <56743A76.6000007@oracle.com> References: <56719E1F.2050504@oracle.com> <56742B2F.3000809@cs.oswego.edu> <56743A76.6000007@oracle.com> Message-ID: <56744544.1050603@cs.oswego.edu> On 12/18/2015 11:55 AM, Brian Goetz wrote: > Here's the current thinking on the tools for equality: > > - The bytecode set will provide sort of 'vcmpeq' instruction, whose behavior > is a componentwise recursive comparison (int fields with icmp, value fields with > vcmp, etc). 
> - The == operator in the language will correspond to vcmpeq > - The default (whether provided by javac or VM) implementation of equals(V) > for value types will do an == comparison > - Users can override equals(V) > > The motivation for allowing overriding equals is the same as for objects. > Obvious examples include Decimal(1.0) and Decimal(1.00), and > Tuple[String,String] that both contain [ foo, bar ] but use different String > instances to do so. > > On the signature of equality, equals() has potentially the same issue as > contains(), where you might want to accept a broader set of comparands. Still > figuring out the options there. Limiting value type V to only override equals(V x) seems to have the same simplifying impact on Collection.contains and others. Yes? -Doug From brian.goetz at oracle.com Fri Dec 18 17:48:35 2015 From: brian.goetz at oracle.com (Brian Goetz) Date: Fri, 18 Dec 2015 12:48:35 -0500 Subject: Equality In-Reply-To: <56744544.1050603@cs.oswego.edu> References: <56719E1F.2050504@oracle.com> <56742B2F.3000809@cs.oswego.edu> <56743A76.6000007@oracle.com> <56744544.1050603@cs.oswego.edu> Message-ID: <567446F3.7000904@oracle.com> > Limiting value type V to only override equals(V x) seems to have > the same simplifying impact on Collection.contains and others. Yes? In our attempt on Collections, we found that contains(V), while prettier, might not be enough. In particular, it wasn't enough for providing the skeletal implementation of removeAll(Collection) in AbstractCollection, which looks like: Iterator it = iterator(); while (it.hasNext()) { if (c.contains(it.next())) { it.remove(); modified =true; } } Here, c is a Collection, so its contains method would be contains(capture(? extends E)), and it.next() returns a E, so the compiler doesn't like it. If I found this idiom in the first few minutes of trying to port collections, I'm guessing it will occur elsewhere too. 
So perhaps what this says is we are going to get pushed in the other direction -- that we'll want to superate equals(). -------------- next part -------------- An HTML attachment was scrubbed... URL: From dl at cs.oswego.edu Fri Dec 18 20:21:54 2015 From: dl at cs.oswego.edu (Doug Lea) Date: Fri, 18 Dec 2015 15:21:54 -0500 Subject: Equality In-Reply-To: <567446F3.7000904@oracle.com> References: <56719E1F.2050504@oracle.com> <56742B2F.3000809@cs.oswego.edu> <56743A76.6000007@oracle.com> <56744544.1050603@cs.oswego.edu> <567446F3.7000904@oracle.com> Message-ID: <56746AE2.9070403@cs.oswego.edu> On 12/18/2015 12:48 PM, Brian Goetz wrote: > So perhaps what this says is we are going to get pushed in the other direction > -- that we'll want to superate equals(). > What if plain-equals and value-equals are independently overridable, with defaults: boolean equals(any x) { return (x instanceof ThisClass) && equalValue(x); } boolean equalValue(ThisClass x) { ... check bit equality ... } (The equalValue method could be named "equals" too, but doing so is too confusing for now.) Where operator== calls equalValue, not plain equals. The effects cascade to many of the Collection methods. -Doug From brian.goetz at oracle.com Fri Dec 18 21:05:49 2015 From: brian.goetz at oracle.com (Brian Goetz) Date: Fri, 18 Dec 2015 16:05:49 -0500 Subject: Equality In-Reply-To: <56746AE2.9070403@cs.oswego.edu> References: <56719E1F.2050504@oracle.com> <56742B2F.3000809@cs.oswego.edu> <56743A76.6000007@oracle.com> <56744544.1050603@cs.oswego.edu> <567446F3.7000904@oracle.com> <56746AE2.9070403@cs.oswego.edu> Message-ID: <5674752D.8050800@oracle.com> Good thought. 
Let me give it a nudge in a direction: we'd like to get away from approaches that amount to partitioning on ref/val, and instead steer towards "here's the universal version, and here's the special-case version for refs that 'overrides' the universal version", where overriding could include things like perturbing the signature (as covariant overrides do.) Recasting in this light: boolean equals(Something o) { ... } // total method boolean equals(Object o) { ... } // ref-override On 12/18/2015 3:21 PM, Doug Lea wrote: > On 12/18/2015 12:48 PM, Brian Goetz wrote: > >> So perhaps what this says is we are going to get pushed in the other >> direction >> -- that we'll want to superate equals(). >> > > What if plain-equals and value-equals are independently overridable, > with defaults: > boolean equals(any x) { return (x instanceof ThisClass) && > equalValue(x); } > boolean equalValue(ThisClass x) { ... check bit equality ... } > > (The equalValue method could be named "equals" too, but doing so > is too confusing for now.) > Where operator== calls equalValue, not plain equals. > > The effects cascade to many of the Collection methods. > > -Doug > > From john.r.rose at oracle.com Sat Dec 19 10:13:07 2015 From: john.r.rose at oracle.com (John Rose) Date: Sat, 19 Dec 2015 02:13:07 -0800 Subject: Equality (was: Re: Migrating methods in Collections) In-Reply-To: <56743A76.6000007@oracle.com> References: <56719E1F.2050504@oracle.com> <56742B2F.3000809@cs.oswego.edu> <56743A76.6000007@oracle.com> Message-ID: <314B3C96-C984-4512-A155-DFF3AB4C5F67@oracle.com> On Dec 18, 2015, at 8:55 AM, Brian Goetz wrote: > >> 1. It seems unresolved in the current State of Values doc >> whether value types will have user-definable equals() >> methods. I think that this needs to be settled soon: >> >> If value types don't allow overriding equals, and if the implementation >> is "has same type and bits", then some of the problems you note >> almost disappear. 
For example c.contains(x) could be automatically >> translated into "false" if c is a collection of a different val type >> than x, or x is a ref type or null. Which also happens to catch all >> the type-problematic cases. I'm not sure how a compiler would know >> to do this though. > > Here's the current thinking on the tools for equality: > > - The bytecode set will provide sort of 'vcmpeq' instruction, whose behavior is a componentwise recursive comparison (int fields with icmp, value fields with vcmp, etc). > - The == operator in the language will correspond to vcmpeq > - The default (whether provided by javac or VM) implementation of equals(V) for value types will do an == comparison > - Users can override equals(V) "Codes like a class" => You can override equals. There's not really an open question here. Forbidding overrides to equals would be crippling. "Works like an int" => operator== might be logically adjusted to the value type semantics. (Reminder "Codes like a class, works like an int" is the slogan which best captures in a few words what we are trying to do with value types.) With that as context, I think we have two plausible, logically consistent options: Option 1. (POR, as Brian points out) operator== is hardwired to bitwise comparison (ignoring padding, never calling equals methods) Option 2. operator== is an alias for equals, and vcmpeq is accessible but available under a different name (isSameAs). The choice here must balance two competing influences. "Codes like a class" means that, internally within the implementation of a value type, uses of operator== must be "dumb" approximations to "true" equality. Indeed, probably most occurrences of operator== on references other than null are of the form "p == q || p != null && p.equals(q)". Bad language choice here, IMO That legacy meaning of operator== pushes us towards Option 1. 
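To make the "legacy meaning of operator==" concrete, here is a minimal sketch in today's Java of the reference idiom quoted above, "p == q || p != null && p.equals(q)". The class and method names (ReferenceEquality, sameOrEqual) are hypothetical and chosen only for illustration. The helper behaves like the existing java.util.Objects.equals: identity is only a fast path, and equals() gives the real answer.

```java
public class ReferenceEquality {
    // Hypothetical helper spelling out the idiom from the thread:
    // identity (==) is a cheap approximation, equals() is the real test.
    static boolean sameOrEqual(Object p, Object q) {
        return p == q || (p != null && p.equals(q));
    }

    public static void main(String[] args) {
        // new String(...) always creates a distinct object, so == is false here.
        String a = new String("valhalla");
        String b = new String("valhalla");
        System.out.println(a == b);                  // false: different identities
        System.out.println(sameOrEqual(a, b));       // true: same contents
        System.out.println(sameOrEqual(null, null)); // true: identity fast path
    }
}
```

Under Option 1, operator== on a value type would keep playing the role of the first operand of that disjunction (a bitwise check), and callers who want "true" equality would still go through equals.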
"Works like an int" means that, externally when people use a value type, as if it were a primitive, will just say "v == w" and not even dream that "v.equals(w)" is a thing. Exactly zero occurrences of operator== on non-references are backed up by calls to equals, and users will be surprised if a value type give incomplete answers to v==w. This practicality pushes us towards Option 2. But, if you think about it, it also pushes us towards well-controlled behavior for other operators. If I can write "v == w", what should I expect from "v < w" (if they are comparable)? Does this roll us all the way down the slippery slope to operator overloading? It had better not. There are two obvious places we could stop rolling towards (uncontrolled) operator overloading. First, only "overload" operators which are *already* common to both primitives and references. That means == and !=, and nothing else. Second, retroactively add interfaces to Byte, Boolean, Integer, Long, Float, etc., which reify all the relevant operators as named method calls. And then allow value types to overload those named methods, wiring operator uses into those methods (but continuing to hardwire the primitives to the appropriate bytecodes). I think the POR (Option 1) is reasonable, unless/until we discover evidence to the contrary as we work with generics over primitives. Finally, note that operator overloading is not just an academic or esthetic concern, because enhanced generics demand some sort of unified view of types. When we write a generic method over a type parameter , we expect the method to operate correctly over all valid bindings of T. Today, since T ranges only over references, we can assume that code that touches T will do the Option 1 dance of "v == w || v.equals(w)", across all T, even value types. 
Tomorrow, when T ranges over primitives, references, and values, there will be a little more pressure to "rationalize" the behavior of op<=, op+, op*, etc., so that they operate correctly over all valid bindings of T. I say "a little more" but experiment will show whether it is significant. If so, we will want to re-interpret op<=, op+, etc., as interface calls, and write generics using bounds like (getting op< <= == != >= >), (getting op+ etc.), and so on. The conservative thing to do, which might be right in the end, is to require all new code that uses etc. to always use method-call syntax on values of type T, and bring primitives into consistency by retroactively assigning the methods in Comparable, etc. (Perhaps only in generic code?) Later on we can reconsider whether rehabilitating the various infix operators (as sugar for those methods) is worth doing. The thing we must *not* do is get to a place where primitives can *only* be operated on via operators like op< op+, but values and references can *only* be operated on via method invocation. One of the two sides has to change so as to overlap (at least for generic code) with the other. -- John From john.r.rose at oracle.com Sat Dec 19 10:17:12 2015 From: john.r.rose at oracle.com (John Rose) Date: Sat, 19 Dec 2015 02:17:12 -0800 Subject: Equality In-Reply-To: <56744544.1050603@cs.oswego.edu> References: <56719E1F.2050504@oracle.com> <56742B2F.3000809@cs.oswego.edu> <56743A76.6000007@oracle.com> <56744544.1050603@cs.oswego.edu> Message-ID: On Dec 18, 2015, at 9:41 AM, Doug Lea

wrote: > > Limiting value type V to only override equals(V x) seems to have > the same simplifying impact on Collection.contains and others. Yes? Maybe. But, Brian has written about edge cases where the symmetry of Object.equals interacts badly with ad hoc code in collections. Basically, if you write this.equals(x) you might expect to type x:(? extends V) but if you write x.equals(this) you might expect to have to say x:(? super V). After shaking things around, you give up and say x:any, which today is x:Object. In the new world we can maybe get away with non-variant equals(V x), and allow an escape hatch to the legacy world via boxing and the old equals(Object x). From john.r.rose at oracle.com Sat Dec 19 10:23:14 2015 From: john.r.rose at oracle.com (John Rose) Date: Sat, 19 Dec 2015 02:23:14 -0800 Subject: Equality In-Reply-To: <567446F3.7000904@oracle.com> References: <56719E1F.2050504@oracle.com> <56742B2F.3000809@cs.oswego.edu> <56743A76.6000007@oracle.com> <56744544.1050603@cs.oswego.edu> <567446F3.7000904@oracle.com> Message-ID: <8E87393D-17B8-4951-AE12-16A2B0A19BEC@oracle.com> On Dec 18, 2015, at 9:48 AM, Brian Goetz wrote: > > So perhaps what this says is we are going to get pushed in the other direction -- that we'll want to superate equals(). Because equals is symmetric, coders use it in both directions, and the net result is you probably end up seeing both directions at once. To put it another way, it is removeAll(Collection<?>) because, even if equals has a "direction", you don't know in which order the operands are passed to equals, within an implementation of removeAll. -- John
From john.r.rose at oracle.com Sat Dec 19 10:26:43 2015 From: john.r.rose at oracle.com (John Rose) Date: Sat, 19 Dec 2015 02:26:43 -0800 Subject: Equality In-Reply-To: <56746AE2.9070403@cs.oswego.edu> References: <56719E1F.2050504@oracle.com> <56742B2F.3000809@cs.oswego.edu> <56743A76.6000007@oracle.com> <56744544.1050603@cs.oswego.edu> <567446F3.7000904@oracle.com> <56746AE2.9070403@cs.oswego.edu> Message-ID: <9FA0A474-0CE4-4F13-BE46-5C4515D2CD94@oracle.com> On Dec 18, 2015, at 12:21 PM, Doug Lea

wrote: > > On 12/18/2015 12:48 PM, Brian Goetz wrote: > >> So perhaps what this says is we are going to get pushed in the other direction >> -- that we'll want to superate equals(). >> > > What if plain-equals and value-equals are independently overridable, > with defaults: > boolean equals(any x) { return (x instanceof ThisClass) && equalValue(x); } > boolean equalValue(ThisClass x) { ... check bit equality ... } (And of course in that case make "equals" be "final" above value types.) > (The equalValue method could be named "equals" too, but doing so > is too confusing for now.) > Where operator== calls equalValue, not plain equals. That would go with Option 2 in my previous mail. It's pretty and logical, but not clear yet if it is needed. > The effects cascade to many of the Collection methods. A big problem with collection methods is the effect of the symmetry of equals in the presence of subtype polymorphism. You can't tell which operand of equals is going to be a subtype of the other. -- John From john.r.rose at oracle.com Sat Dec 19 10:29:23 2015 From: john.r.rose at oracle.com (John Rose) Date: Sat, 19 Dec 2015 02:29:23 -0800 Subject: Equality In-Reply-To: <5674752D.8050800@oracle.com> References: <56719E1F.2050504@oracle.com> <56742B2F.3000809@cs.oswego.edu> <56743A76.6000007@oracle.com> <56744544.1050603@cs.oswego.edu> <567446F3.7000904@oracle.com> <56746AE2.9070403@cs.oswego.edu> <5674752D.8050800@oracle.com> Message-ID: On Dec 18, 2015, at 1:05 PM, Brian Goetz wrote: > Recasting in this light: > > boolean equals(Something o) { ... } // total method > > > boolean equals(Object o) { ... } // ref-override I don't get this bit: Wouldn't you need to say in order to put the constraint on the type that "hurts" equals? And in that case, of course, you don't need at all, since you know if ThisType is ref or val. -- John
From dl at cs.oswego.edu Sat Dec 19 16:02:21 2015 From: dl at cs.oswego.edu (Doug Lea) Date: Sat, 19 Dec 2015 11:02:21 -0500 Subject: Equality In-Reply-To: <9FA0A474-0CE4-4F13-BE46-5C4515D2CD94@oracle.com> References: <56719E1F.2050504@oracle.com> <56742B2F.3000809@cs.oswego.edu> <56743A76.6000007@oracle.com> <56744544.1050603@cs.oswego.edu> <567446F3.7000904@oracle.com> <56746AE2.9070403@cs.oswego.edu> <9FA0A474-0CE4-4F13-BE46-5C4515D2CD94@oracle.com> Message-ID: <56757F8D.2030304@cs.oswego.edu> Here's an alternative that evades (or allows to be postponed) John's slippery-slope concerns about overloading "==": Anyfy equals, and adjust default implementation of boolean T.equals(any x) for type (ref or val) T to the intrinsified, otherwise inexpressible: if (bitwiseEqual(x, this)) return true; // pointer equality if ref if (!(x instanceof T)) return false; if (method T.equals(T) is not defined) return false; return equals((T)x); // call specialized override (Various optimizations may be possible.) Any class or val type could still override this, and/or introduce its T.equals(T) specialized override. It might be challenging for people to write correct, symmetric equals methods that span refs and vals, but not impossible (which is OK since it should be rare). I still think that doing something like this removes the need to specially deal with Collection.contains and related methods.
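The default described in this message cannot be written generically in current Java, but its shape can be approximated for one concrete class. The following is only a sketch under stated assumptions: Point is a hypothetical ordinary reference class standing in for T, the identity test stands in for bitwiseEqual, and the "is T.equals(T) defined" check is omitted because this class always defines it.

```java
public final class Point {
    private final int x;
    private final int y;

    public Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    // Plays the role of the specialized override T.equals(T).
    public boolean equals(Point other) {
        return other != null && x == other.x && y == other.y;
    }

    // Plays the role of the proposed default T.equals(any x).
    @Override
    public boolean equals(Object o) {
        if (o == this) return true;               // stand-in for bitwiseEqual(x, this)
        if (!(o instanceof Point)) return false;  // wrong type (or null): never equal
        return equals((Point) o);                 // overload resolution picks equals(Point)
    }

    @Override
    public int hashCode() {
        return 31 * x + y;
    }

    public static void main(String[] args) {
        Object a = new Point(1, 2);
        Object b = new Point(1, 2);
        System.out.println(a.equals(b));      // true: falls through to equals(Point)
        System.out.println(a.equals("1,2"));  // false: rejected by the instanceof test
    }
}
```

Because the statically typed call a.equals(b) goes through equals(Object), a collection holding such elements could keep calling the general form and still reach the specialized comparison.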
-Doug From brian.goetz at oracle.com Sat Dec 19 16:38:37 2015 From: brian.goetz at oracle.com (Brian Goetz) Date: Sat, 19 Dec 2015 11:38:37 -0500 Subject: Equality In-Reply-To: <56757F8D.2030304@cs.oswego.edu> References: <56719E1F.2050504@oracle.com> <56742B2F.3000809@cs.oswego.edu> <56743A76.6000007@oracle.com> <56744544.1050603@cs.oswego.edu> <567446F3.7000904@oracle.com> <56746AE2.9070403@cs.oswego.edu> <9FA0A474-0CE4-4F13-BE46-5C4515D2CD94@oracle.com> <56757F8D.2030304@cs.oswego.edu> Message-ID: <5675880D.4060503@oracle.com> > Anyfy equals, and adjust default implementation of boolean T.equals(any x) I think we may be talking past each other (while basically saying the same thing from opposite directions.) "any" is not a type; it is a modifier that affects the domain of type variables. So we don't have (and I think we don't want) a meaning for equals(any x). But what we do want, is a way of expressing "I'll take anything which could be on the other side of the == operator with me." For refs, that's Object; for a value type V, that's just V. Where we had gotten with the / superation idiom (not suggesting either of these syntaxes is great) is being able to express: - If T is value, then T, else the erasure of T (usually object) ** I'll write this as Sup for short. The convenient thing about Sup is that it conveniently collapses to Object in the places where we want Object, so we could define contains/remove as contains(Sup) and contains will always bottom out at equals(), so equals() similarly needs to be equals(Sup) If this is a valid approach (and I think its the best one we've got so far), then we're looking for how to spell Sup (in all of: type system, language syntax, bytecode descriptors.) > I still think that doing something like this removes the need > to specially deal with Collection.contains and related methods. I don't see it yet; those signatures are still currently contains(Object), which isn't appropriate for value types. 
So we have to do *something*. ** There's a lot of sloppiness in the ref/val distinction, which is going to need to be cleaned up. Sometimes when we say ref/val, we mean "erased/reified". Sometimes we mean "polymorphic/monomorphic". Sometimes we mean "nullable/non-nullable." From dl at cs.oswego.edu Sat Dec 19 18:45:41 2015 From: dl at cs.oswego.edu (Doug Lea) Date: Sat, 19 Dec 2015 13:45:41 -0500 Subject: Equality In-Reply-To: <5675880D.4060503@oracle.com> References: <56719E1F.2050504@oracle.com> <56742B2F.3000809@cs.oswego.edu> <56743A76.6000007@oracle.com> <56744544.1050603@cs.oswego.edu> <567446F3.7000904@oracle.com> <56746AE2.9070403@cs.oswego.edu> <9FA0A474-0CE4-4F13-BE46-5C4515D2CD94@oracle.com> <56757F8D.2030304@cs.oswego.edu> <5675880D.4060503@oracle.com> Message-ID: <5675A5D5.6060208@cs.oswego.edu> On 12/19/2015 11:38 AM, Brian Goetz wrote: >> Anyfy equals, and adjust default implementation of boolean T.equals(any x) > "any" is not a type; it is a modifier that affects the domain of type > variables. OK. I should have first asked whether there are plans to allow forms like: boolean equals(T x) including as sugar for some superation-like construction. It would be nice to further abbreviate as "(any x)" but probably syntactically impossible. But still convenient in pseudocode discussion. > So we don't have (and I think we don't want) a meaning for > equals(any x). The equals method is special because it is the only defined Object method that can interact with Values world, and vice versa. So making it parametric across them seems necessary, even if it requires some special JVM magic. >> I still think that doing something like this removes the need >> to specially deal with Collection.contains and related methods. > > I don't see it yet; those signatures are still currently contains(Object), which > isn't appropriate for value types. So we have to do *something*. Right. 
I agree that the signature must compatibly change, but not necessarily that anything else does. -Doug From brian.goetz at oracle.com Sat Dec 19 22:20:48 2015 From: brian.goetz at oracle.com (Brian Goetz) Date: Sat, 19 Dec 2015 17:20:48 -0500 Subject: Equality In-Reply-To: <5675A5D5.6060208@cs.oswego.edu> References: <56719E1F.2050504@oracle.com> <56742B2F.3000809@cs.oswego.edu> <56743A76.6000007@oracle.com> <56744544.1050603@cs.oswego.edu> <567446F3.7000904@oracle.com> <56746AE2.9070403@cs.oswego.edu> <9FA0A474-0CE4-4F13-BE46-5C4515D2CD94@oracle.com> <56757F8D.2030304@cs.oswego.edu> <5675880D.4060503@oracle.com> <5675A5D5.6060208@cs.oswego.edu> Message-ID: <5675D840.7090109@oracle.com> > OK. I should have first asked whether there are plans to allow forms > like: > boolean equals(T x) > including as sugar for some superation-like construction. Yes. That's totally valid (any is a modifier for a type variable declaration, whether introduced at the class or method level.) It's worth noting that there's some degree of extra dispatch cost for generic instance methods (a separate topic, but something to keep in mind) which is a slight negative for going generic on such a critical method as equals(). But yes, that's valid. > Right. I agree that the signature must compatibly change, > but not necessarily that anything else does. OK, we're on the same page. I think we're coming at the "superate" concept from different directions, but both roads lead there. From ron at paralleluniverse.co Sun Dec 20 17:40:40 2015 From: ron at paralleluniverse.co (Ron Pressler) Date: Sun, 20 Dec 2015 19:40:40 +0200 Subject: Migrating methods in Collections Message-ID: (Sorry if this message doesn't appear in the same thread; I didn't get the older messages on the list). I'd like to suggest a slightly different approach to the containers migration issue (I'm not discussing equality). The partial-method idea seems a potential source of confusion to me.
Unlike techniques for manual specialization (e.g. bitfields for ArrayList), here we're talking about added complexity which directly affects any interaction with a generic class -- not just its implementation. It is unencapsulated complexity, so I think it deserves careful consideration. I have a couple of ideas, each can be used in isolation or in combination with the other, which may (or may not) be simpler: 1. We can simply not specialize the signatures of public collection methods (say, if [T] is the boxed-type of T, the signature of Map.get(Object) will be [V] get(Object)). The JVM's ability to avoid boxing might be good enough for this to yield the performance we want. New methods can, of course, be added. This approach can be taken in addition to or instead of superation. 2. If methods are to be removed (as in made partial), instead of magically disappearing them at the call site based on usage, perhaps we should consider hiding them by source-code version (not from the class file, of course, only hiding them in javac)? This is an explicit decision to break source compatibility, but it has two mitigating factors: 1/ javac conveniently has a source level (which, I hear, will also result in hiding new methods starting with Java 9) and 2/ Java already breaks source compatibility from time to time. I had quite a few classes that didn't compile under 8 because 8 changed the name resolution rules wrt static imports (or, more precisely, made them conform to the JLS, whereas they hadn't in prior versions). It took me some time to figure out what was wrong, but hidden methods would be able to give much better error messages. Also, the superation idea seems very interesting, but I don't understand how it would work for contains/remove(Object), as contains needs to be able to accept both super- *and* subtypes of T (as in, animals.contains(dog)). I believe its type --
like that of equals() -- should be contains(T x) Ron -------------- next part -------------- An HTML attachment was scrubbed... URL: From dl at cs.oswego.edu Sun Dec 20 22:05:32 2015 From: dl at cs.oswego.edu (Doug Lea) Date: Sun, 20 Dec 2015 17:05:32 -0500 Subject: Migrating methods in Collections In-Reply-To: References: Message-ID: <5677262C.1010005@cs.oswego.edu> On 12/20/2015 12:40 PM, Ron Pressler wrote: > 1. We can simply not specialize the signatures of public collection methods > (say, if [T] is the boxed-type of T, the signature of Map.get(Object) will > be [V] get(Object)). One would think that the boxing of T would be an implementation of Optional, which would be incompatibly different as a signature. Although I'm not exactly sure how this would work given the compromises defining Optional necessary to get it into jdk8. -Doug From brian.goetz at oracle.com Mon Dec 21 16:20:35 2015 From: brian.goetz at oracle.com (Brian Goetz) Date: Mon, 21 Dec 2015 11:20:35 -0500 Subject: Migrating methods in Collections In-Reply-To: References: Message-ID: <567826D3.7050504@oracle.com> Totally fair to ask "have we missed a simpler choice?" But sadly I think we haven't (though I don't fully understand your second idea, I'm interested to hear more.) > here we?re talking added complexity which directly > affects any interaction with a generic class ? not just its > implementation. I think this is either not true, or overstated (depending on what you mean by "any interaction".) The only new thing that this adds (and its not really that new) is the fact that the set of members of a parameterized type depends on the parameterization. For example, List might have members removeAt(int) // new total method remove(int) // legacy method but List would only have removeAt(int) // new total method But, this sort of dependency isn't even new; from the client perspective, the signature of add in List is add(String), whereas its signature in List is add(Number). 
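The point that member signatures already depend on the parameterization can be seen in today's Java. The sketch below is illustrative only; the class name ParameterizedMembers and its methods are hypothetical. It also shows why a separately named removeAt(int) is attractive: on a list of Integer, the existing remove(int) and remove(Object) overloads are easy to confuse.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ParameterizedMembers {
    // Same method name, but the visible signature follows the type argument.
    static int addDependsOnTypeArgument() {
        List<String> strings = new ArrayList<>();
        strings.add("a");            // seen by the client as add(String)
        List<Number> numbers = new ArrayList<>();
        numbers.add(1);              // seen as add(Number); the int is boxed to Integer
        numbers.add(2.5);            // also add(Number); the double is boxed to Double
        // strings.add(1);           // would not compile: add(String) rejects an int
        return strings.size() + numbers.size();
    }

    static List<Integer> removeByIndex() {
        List<Integer> xs = new ArrayList<>(Arrays.asList(1, 2, 3));
        xs.remove(1);                // remove(int index): drops the element at index 1
        return xs;                   // [1, 3]
    }

    static List<Integer> removeByValue() {
        List<Integer> xs = new ArrayList<>(Arrays.asList(1, 2, 3));
        xs.remove(Integer.valueOf(1)); // remove(Object o): drops the element equal to 1
        return xs;                     // [2, 3]
    }

    public static void main(String[] args) {
        System.out.println(addDependsOnTypeArgument()); // 3
        System.out.println(removeByIndex());            // [1, 3]
        System.out.println(removeByValue());            // [2, 3]
    }
}
```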
What's new is that some methods won't appear in some parameterizations. I don't think this rises to the level of "added complexity" (arguably it is even "reduced complexity"?); the user will see this as a context-dependent set of methods when they hit ctrl-space in their IDE. Just as the type signatures are already specialized to the type of the receiver in this context, methods that are not applicable will now be filtered. (Note also that when we migrate a reference-specific method to a new method, the new method is not value-specific, it's total, so it can be positioned as "the old method has been deprecated in favor of this new, more flexible method.") > 1. We can simply not specialize the signatures of public collection > methods (say, if [T] is the boxed-type of T, the signature of > Map.get(Object) will be [V] get(Object)). The JVM's ability to > avoid boxing might be good enough for this to yield the performance we > want. New methods can, of course, be added. This approach can be taken > in addition to or instead of superation. Yes, this was something we considered early on. There are several issues: - Is our box elision going to be good enough? - Nullity - Transparency Elision. Deciding to not specialize the signatures means that we're relying on box elision in the VM being good enough so that boxes are elided "almost all the time." Sadly, I do not think this is going to be the case. There are certainly reasons why box elision could be better with value-boxes (we can be more hostile to their identity, and therefore more freely elide them) than the existing wrappers, but if I have a deep chain of calls that are passing a boxed value through a library (common), there is a real risk of fall-off-the-cliff behavior when we hit our various inlining limits, and can't see that both ends of the chain prefer the unboxed variant.
Further, the most important box types -- Integer and friends -- are already deeply polluted with identity (want to bet that no program ever locks on one?) So I think this one goes in the "boy, it would be nice" column, but I don't think its something we can bet the farm on. Nullity. Even if elision were perfect, Map.get is still fundamentally unrescuable, because it uses null as a return to signal non-presence. (Forcing all values to be nullable is a non-starter.) This means that we may never be able to elide the boxing in Map.get(), which would cripple map performance -- non-starter. So some sort of migration strategy is needed for Map.get() anyway -- and in fact, the "peeling" technique was invented in the context of "what about Map.get", and the rest was mostly an exploration of whether we needed any additional hammers beyond that. Transparency. Even if the above two were not issues, I think having box types (or worse, Object) show up in signatures when the user is expecting something involving T is a visible wart that the users will notice. (Users would reasonably expect a List to have methods that truck in int, not Integer, and not Object.) For these reasons, I think *some* intrusion into the API is unavoidable. The work that's gone into this draft is aimed at trying to balance compatibility with the current API (in both letter and spirit) with minimizing the warts perceived by future clients of the anyfied APIs. (Future *implementors* will experience warts, such as having to implement both flavors of remove. However, these are migration-specific warts; as new libraries are written that don't have be migrated from ref-generics, these won't even show up.) > 2. If methods are to be removed (as in made partial), instead of > magically disappearing them at the call site based on usage, perhaps we > should consider hiding them by source-code version (not from the class > file, of course, only hiding them in javac)? 
This is an explicit > decision to break source compatibility, but it has two mitigating > factors: 1/ javac conveniently has a source level (which, I hear, will > also result in hiding new methods starting with Java 9) and 2/ Java > already breaks source compatibility from time to time. I had quite a few > classes that didn't compile under 8 because 8 changed the name > resolution rules wrt static imports (or, more precisely, made them > conform to the JLS, whereas they hadn't in prior versions). It took me > some time to figure out what was wrong, but hidden methods would be able > to give much better error messages. I'm not sure I'm following what problem you're trying to solve here? (This sounds a little like the tricks we did with default methods when compiling with the jdk8 compiler in -source 7 mode, where we didn't consider default methods to be members of the class for some purposes when viewed from 7 code?) Can you elaborate? > Also, the superation idea seems very interesting, but I don't understand > how it would work for contains/remove(Object), as contains needs to be > able to accept both super- /and/ subtypes of T (as in, > animals.contains(dog)). Yeah, this is what I meant by "Even though this works, it's still not that obvious." If you have animals.contains(dog) where boolean contains(U) then inference concludes U=Animal, so everything is fine. (The constraints: U :> E, E=Animal, Dog <: U). But as I said, it's not obvious. (Dan likens it to F-bounds; for most people, the best they can do is learn "this is the idiom", rather than truly understand it.) Hence, this is a downside of this approach -- that even smart people will look at it and scratch their heads. > I believe its type -- like that of equals() -- > should be contains(T x) Maybe! But I think there's also a bit of Stockholm Syndrome in that thinking, that derives from a pre-generics notion of the world.
In a generic world, you can use the type system to exclude the "obviously stupid" candidates, such as those that are known not to be either a subtype or a supertype of the type in question. Secondarily, there's a contingent reason why I'm nervous about such a fundamental method as Object.equals() being defined as a generic method -- when you follow the details of how any-generic methods are implemented, the invocation cost is unavoidably higher. For new code, this is probably acceptable, but for the cornerstone of the castle, it doesn't seem to be. The technique hinted at at the end of my mail is an attempt to get the benefits of superation while not having to reach for either the big contravariance hammer or the generic method hammer. The result would be a single, non-generic method whose signature collapses to equals(Object) for reference types and equals(V) for value types. (All the animals.contains(dog) examples only show up when there's variance, and value types are monomorphic, so they don't have to deal with superclasses or subclasses showing up.) If we can make this work, this seems preferable to any of the options explored previously. From brian.goetz at oracle.com Mon Dec 21 16:24:33 2015 From: brian.goetz at oracle.com (Brian Goetz) Date: Mon, 21 Dec 2015 11:24:33 -0500 Subject: Migrating methods in Collections In-Reply-To: <5677262C.1010005@cs.oswego.edu> References: <5677262C.1010005@cs.oswego.edu> Message-ID: <567827C1.9010900@oracle.com> > One would think that the boxing of T would be an implementation of > Optional, which would be incompatibly different as a signature. Right, that's the thinking towards "migrate map()V to one or more other methods." The existing map() is irretrievably null-polluted; write some new value-friendly methods. One of the forms uses Optional; this assumes we can migrate Optional to be a value type in Valhalla (requiring additional migration tools, along the lines alluded to when you brought up collection index sizes.)
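The "null-polluted" complaint about map lookups can be sketched with the current API. This is a minimal illustration, not the migration the thread proposes: find is a hypothetical helper (not a JDK method), and Optional here is today's reference type rather than a value type. Note that the helper still cannot tell "absent" apart from "present but mapped to null".

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class OptionalLookup {
    // Hypothetical Optional-returning lookup layered on the existing Map.get.
    static <K, V> Optional<V> find(Map<K, V> map, K key) {
        return Optional.ofNullable(map.get(key));
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("a", 1);
        counts.put("b", null);  // HashMap permits null values

        // Map.get uses null for two different situations:
        System.out.println(counts.get("b"));        // null: key present, value is null
        System.out.println(counts.get("missing"));  // null: key absent

        // The Optional form makes absence explicit at the call site.
        System.out.println(find(counts, "a").isPresent());        // true
        System.out.println(find(counts, "missing").isPresent());  // false
        System.out.println(find(counts, "missing").orElse(0));    // 0
    }
}
```

With a non-nullable value type as V, a null return would not even be expressible, which is why some replacement for the null-returning form is needed.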
> Although I'm not exactly sure how this would work given the compromises > defining Optional necessary to get it into jdk8. Right. As a reference type, Optional is a box, and so while more expressive than Integer, is no lighter. As a value type, different story. From dl at cs.oswego.edu Mon Dec 21 16:50:56 2015 From: dl at cs.oswego.edu (Doug Lea) Date: Mon, 21 Dec 2015 11:50:56 -0500 Subject: Equality In-Reply-To: <5675D840.7090109@oracle.com> References: <56719E1F.2050504@oracle.com> <56742B2F.3000809@cs.oswego.edu> <56743A76.6000007@oracle.com> <56744544.1050603@cs.oswego.edu> <567446F3.7000904@oracle.com> <56746AE2.9070403@cs.oswego.edu> <9FA0A474-0CE4-4F13-BE46-5C4515D2CD94@oracle.com> <56757F8D.2030304@cs.oswego.edu> <5675880D.4060503@oracle.com> <5675A5D5.6060208@cs.oswego.edu> <5675D840.7090109@oracle.com> Message-ID: <56782DF0.20800@cs.oswego.edu> On 12/19/2015 05:20 PM, Brian Goetz wrote: >> Right. I agree that the signature must compatibly change, >> but not necessarily that anything else does. > > OK, we're on the same page. Where I think this page says that interfaces with existing methods accepting Object args can in principle be anyfied without strictly requiring (but usually strongly encouraging) implementation class rework. There is a way to enable/translate analogs of Object methods, in particular equals(). We wouldn't normally recommend blanket anyfication of interfaces, but Collections is the main one that everyone hopes will be somehow doable. The full story on this has a few more quirks though. It gets uncomfortable to cope with synchronized(obj), Object.wait, and Object.notify: Semantically, synchronized(val) and notify would be no-ops, and wait would block forever. Which would be OK, because no sensible general-purpose implementation of say, collection.contains would use any of these. And, as John almost noted, compareTo/Comparable needs treatment similar to my hybrid version of equals. 
-Doug From brian.goetz at oracle.com Mon Dec 21 17:13:36 2015 From: brian.goetz at oracle.com (Brian Goetz) Date: Mon, 21 Dec 2015 12:13:36 -0500 Subject: Equality In-Reply-To: <56782DF0.20800@cs.oswego.edu> References: <56719E1F.2050504@oracle.com> <56742B2F.3000809@cs.oswego.edu> <56743A76.6000007@oracle.com> <56744544.1050603@cs.oswego.edu> <567446F3.7000904@oracle.com> <56746AE2.9070403@cs.oswego.edu> <9FA0A474-0CE4-4F13-BE46-5C4515D2CD94@oracle.com> <56757F8D.2030304@cs.oswego.edu> <5675880D.4060503@oracle.com> <5675A5D5.6060208@cs.oswego.edu> <5675D840.7090109@oracle.com> <56782DF0.20800@cs.oswego.edu> Message-ID: <56783340.2010309@oracle.com> > Where I think this page says that interfaces with existing methods > accepting Object args can in principle be anyfied without strictly > requiring (but usually strongly encouraging) implementation class rework. I want to draw a strict distinction between "migrating APIs" and "migrating implementations." The messages so far have been deliberately restricted to migrating APIs, the linguistic tools available for doing so, and their impact on the Collection APIs. > There is a way to enable/translate analogs of Object methods, > in particular equals(). I was hoping to treat the Object methods in a separate discussion. It does connect to this discussion in that contains() will inevitably appeal to equals(), but it also pulls in a zillion other things (are values objects? can value implement interfaces? is there a base type for values? what is the relationship between the base type for values and for objects? is there a top type? etc.) I think the upshot here is that the problem of "what is the signature of equals" is essentially the same as for contains/remove (contains inevitably calls equals), so we should be mindful that whatever discussions we have for collections are also going to impact equals, and ideally the same hammer pounds down both nails. 
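The coupling noted here, that contains() ends up calling equals(), is observable in current collections. A minimal sketch, assuming a hypothetical element class Name whose equals ignores case:

```java
import java.util.ArrayList;
import java.util.List;

public class ContainsUsesEquals {
    // Hypothetical element type with a user-defined notion of equality.
    static final class Name {
        final String value;

        Name(String value) {
            this.value = value;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Name && value.equalsIgnoreCase(((Name) o).value);
        }

        @Override
        public int hashCode() {
            return value.toLowerCase().hashCode();
        }
    }

    static boolean demo() {
        List<Name> names = new ArrayList<>();
        names.add(new Name("doug"));
        // contains(Object) finds the element only because Name.equals says so.
        return names.contains(new Name("DOUG"));
    }

    public static void main(String[] args) {
        System.out.println(demo()); // true
    }
}
```

Whatever parameter type equals ends up with for value types therefore constrains what contains and remove can accept.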
> We wouldn't normally recommend blanket anyfication of interfaces, > but Collections is the main one that everyone hopes will be somehow > doable. Collections is important both because it's fundamental, and because it's the canary -- if we can't anyfy Collections, there's a good chance that our tools are still insufficient for migrating other real-world libraries. > The full story on this has a few more quirks though. This is what I was getting at above, with "let's treat the implementation part of the problem separately." There is a pile of idioms that show up in this kind of code whose semantics get fuzzy when a type variable straddles references and values -- comparison to null, comparison to other objects (particularly 'this', which shows up in AbstractList.toString), synchronization, assignment to null, instanceof/cast, array creation. My initial porting exercise of Collections leads me to conclude that the tools needed for migrating the APIs and the tools needed for migrating the code are mostly decoupled. Since the API changes are more visible, I thought it sensible to start there. > It gets uncomfortable to cope with synchronized(obj), > Object.wait, and Object.notify: Semantically, synchronized(val) > and notify would be no-ops, and wait would block forever. This is one approach (the "permanently locked" object approach that John described in an earlier Value Objects proposal), but there are others. Let's come back to this. From ron at paralleluniverse.co Mon Dec 21 17:18:18 2015 From: ron at paralleluniverse.co (Ron Pressler) Date: Mon, 21 Dec 2015 09:18:18 -0800 (PST) Subject: Migrating methods in Collections In-Reply-To: <567826D3.7050504@oracle.com> References: <567826D3.7050504@oracle.com> Message-ID: <6C85E814-EE54-4DAF-92F8-650D01E21D18@paralleluniverse.co> > On Dec 21, 2015, at 6:20 PM, Brian Goetz wrote: > > >> >> >> 2.
If methods are to be removed (as in made partial), instead of >> magically disappearing them at the call site based on usage, perhaps we >> should consider hiding them by source-code version (not from the class >> file, of course, only hiding them in javac)? This is an explicit >> decision to break source compatibility, but it has two mitigating >> factors: 1/ javac conveniently has a source level (which, I hear, will >> also result in hiding new methods starting with Java 9) and 2/ Java >> already breaks source compatibility from time to time. I had quite a few >> classes that didn't compile under 8 because 8 changed the name >> resolution rules wrt static imports (or, more precisely, made them >> conform to the JLS, whereas they hadn't in prior versions). It took me >> some time to figure out what was wrong, but hidden methods would be able >> to give much better error messages. >> > > > > > I'm not sure I'm following what problem you're trying to solve here? > (This sounds a little like the tricks we did with default methods when > compiling with the jdk8 compiler in -source 7 mode, where we didn't > consider default methods to be members of the class for some purposes > when viewed from 7 code?) Can you elaborate? > Sure. Instead of demoting, say, remove(int) to a partial method, simply hide it from all source level 10 code, which will only be able to access removeAt, even on a List (the method will still be in the class, of course). Cons: breaks source compatibility (but not binary compatibility) in a more major way than ever before, but Java has mechanisms to deal with that (source level), and automatic migration tools should be easy. Pros: less strange than partial methods; simpler to implement; a more general (albeit crude) migration mechanism, or, rather binary-compatible source-deprecation mechanism. Now, it is a dramatic break, but Valhalla is quite dramatic anyway.
Partial methods are a migration measure (we wouldn't have needed them had the APIs been designed with values in mind, right?) but they'll stay a part of the language forever, and they don't have the general usefulness of default methods (unless there are non-migration reasons to make use of partial methods that make sense in Java). > Yeah, this is what I meant by "Even though this works, its still not > that obvious." > > > inference concludes U=Animal, so everything is fine. > Well, it's obvious now :) From brian.goetz at oracle.com Mon Dec 21 18:03:42 2015 From: brian.goetz at oracle.com (Brian Goetz) Date: Mon, 21 Dec 2015 13:03:42 -0500 Subject: Migrating methods in Collections In-Reply-To: <6C85E814-EE54-4DAF-92F8-650D01E21D18@paralleluniverse.co> References: <567826D3.7050504@oracle.com> <6C85E814-EE54-4DAF-92F8-650D01E21D18@paralleluniverse.co> Message-ID: <56783EFE.2070004@oracle.com> > Instead of demoting, say, remove(int) to a partial method, simply > hide it from all source level 10 code, which will only be able to access > removeAt, even on a List (the method will still be in the class, > of course). Cons: breaks source compatibility (but not binary > compatibility) in a more major way than ever before, but Java has > mechanisms to deal with that (source level), and automatic migration > tools should be easy. Pros: less strange than partial methods; simpler > to implement; a more general (albeit crude) migration mechanism, or, > rather binary-compatible source-deprecation mechanism. I think people would be pretty ticked off if Map.get() just went away. I think the response would be: "Those idiots decided to change their libraries for their own reasons, I have no intention of ever specializing my Map, and yet I have to change my code anyway."
Secondarily, while we might plan to do this to Collections in version N, other generic libraries (including other JDK libraries) might wait until N+3 to anyfy their own. Realistically this means we'd be forced to expose whatever versioning mechanism we use here for general use -- which seems at least as potentially confusing (and open to misuse) as partial methods. While a method-grained versioning mechanism seems like it might solve a lot of problems (for example, we wouldn't have needed to do default methods), so far, we've not seen any satisfactory theory that we'd want to consider building on -- there have been many attempts in the academic literature but I think method versioning in object-oriented systems is still an unsolved problem. So I'm wary this could degenerate into something far worse than partial methods -- a bad versioning system. Separately, I think the distaste for partial methods may also be a little bit of an allergic reaction to the deliberately-bad syntax we're using. I'll share a caricature of a past interaction on this topic (with someone on this list, actually) that illustrates the power of implicit syntactic biases:
Him: This where-clause thing is totally confusing and will be completely foreign to Java developers! Augh!
Me: What if I wrote it like this instead: boolean remove(Collection this, int index) Is that less confusing? (oh, and BTW this builds on the *existing Java 8 syntax* that is already there for explicit receiver parameters, which we added so they can be annotated.)
Him: That's so much better! Then it's clear that the restriction is just part of the method signature. And if there is more than one partial method called foo(), it's clear from this that they are distinct overloads.
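For readers who have not run into it, the "existing Java 8 syntax" referred to here is the explicit receiver parameter: an instance method may name its receiver as a first formal parameter called this, mainly so that the receiver's type can carry a type annotation. A minimal sketch of what is legal today follows; ReceiverDemo, IntBag, ReadOnly and removeAt are hypothetical names chosen for illustration. The partial-method variant, where the receiver type is narrowed to one specialization, is not valid Java and is deliberately not shown.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Target;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo; shows only syntax that already compiles on Java 8+.
class ReceiverDemo {
    // A type-use annotation, so it may appear on the receiver's type.
    @Target(ElementType.TYPE_USE)
    @interface ReadOnly {}

    static class IntBag {
        private final List<Integer> items = new ArrayList<>();

        void add(int value) { items.add(value); }

        // Explicit receiver parameter: 'this' is declared, but callers do not pass it.
        boolean removeAt(IntBag this, int index) {
            if (index < 0 || index >= items.size()) return false;
            items.remove(index);   // int argument, so this is List.remove(int)
            return true;
        }

        // The receiver's type can be annotated, which is why the syntax exists.
        int size(@ReadOnly IntBag this) { return items.size(); }
    }

    public static void main(String[] args) {
        IntBag bag = new IntBag();
        bag.add(10);
        bag.add(20);
        // Call sites are unchanged: removeAt takes one argument.
        System.out.println(bag.removeAt(0) + " " + bag.size());
    }
}
```

In current Java the receiver's declared type must be the enclosing class itself, so the syntax carries no restriction; the proposal discussed in this thread would, under that reading, let the declared receiver type narrow where the method applies.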
Now, I don't want to devolve into premature syntax bikeshedding, but my point is: I don't think it is the concept that is fundamentally confusing, it's just that we will (in addition to convincing ourselves that the model is sound, which is the task currently in front of us) then additionally have to fit it into a syntactic expression that makes sense to Java users. (Coming up with a good syntactic form is also hard, so I want to first ensure we have a sound theoretical model before taking on unnecessary additional work.) > Now, it is a dramatic break, but Valhalla is quite dramatic anyway. > Partial methods are a migration measure (we wouldn't have needed them > had the APIs been designed with values in mind, right?) but they'll stay > a part of the language forever, and they don't have the general > usefulness of default methods (unless there are non-migration reasons to > make use of partial methods that make sense in Java). Mostly, but not entirely. Partial methods also allow you to do this: interface List { default long sum() { ... } } which is not strictly related to migration. (Personally, I don't love this as a feature, because it's weaker than it first appears (think: "The Expression Problem"), and when you try to shore up these weaknesses with a more powerful slicing mechanism like it starts to get more complex -- but this form of partial method is also part of the current best approach we've got for being able to replace IntStream with Stream, which is easier in some ways, and harder in others, than Collections.) However (picking up the above syntactic form), would you find these signatures terribly confusing? interface Stream { ...
int sum(Stream this); long sum(Stream this); double sum(Stream this); } From ron at paralleluniverse.co Mon Dec 21 20:50:21 2015 From: ron at paralleluniverse.co (Ron Pressler) Date: Mon, 21 Dec 2015 12:50:21 -0800 (PST) Subject: Migrating methods in Collections In-Reply-To: <56783EFE.2070004@oracle.com> References: <567826D3.7050504@oracle.com> <6C85E814-EE54-4DAF-92F8-650D01E21D18@paralleluniverse.co> <56783EFE.2070004@oracle.com> Message-ID: <1164377D-E173-44EB-97D7-60FF1FE1EA77@paralleluniverse.co> > On Dec 21, 2015, at 8:03 PM, Brian Goetz wrote: > > I think people would be pretty ticked off if Map.get() just went away. > I think the response would be: "Those idiots decided to change their > libraries for their own reasons, I have no intention of ever > specializing my Map, and yet I have to change my code anyway." > But those same people would also consume an API which returns a List and then find those same methods gone anyway. My suggestion would probably require a tool (a javac plugin?) to migrate sources, though. Go has "go fix", but Java has had such a tool for a long time (I think James Gosling worked on it). I know I'm making my suggestion sound even scarier, but I think it beats adding a type system trick for the purpose of source compatibility (more on that later). > Secondarily, while we might plan to do this to Collections in version N, > other generic libraries (including other JDK libraries) might wait until > N+3 to anyfy their own. Realistically this means we'd be forced to > expose whatever versioning mechanism we use here for general use -- > which seems at least as potentially confusing (and open to misuse) as > partial methods. > I don't know if it would be as confusing (I don't think it would), but it may possibly also be used to solve the 64-bit index problem. > While a method-grained versioning mechanism seems like > it might solve a lot of problems (for example, we wouldn't have needed
> to do default methods), so far, we've not seen any satisfactory theory > that we'd want to consider building on -- there have been many attempts > in the academic literature but I think method versioning in object > oriented systems is still an unsolved problem. So I'm wary this could > degenerate into something far worse than partial methods -- a bad > versioning system. > Default methods were also necessary for binary compatibility. I'm talking of something much simpler (like @AvailableUpTo(9)). > Now, I don't want to devolve into premature syntax bikeshedding, but my > point is: I don't think the it is the concept that is fundamentally > confusing, its just that we will (in addition to convincing ourselves > that the model is sound, which is the task currently in front of us) > then additionally have to fit it into a syntactic expression that makes > sense to Java users. (Coming up with a good syntactic form is also > hard, so I want to first ensure we have a sound theoretical model before > taking on unnecessary additional work.) > Maybe. But partial methods are a clever deprecation mechanism that's built into the type system. Not that I categorically oppose type-system cleverness (superation seems great), but source compatibility -- which Java doesn't always preserve and has a good mechanism to manage anyway -- doesn't seem like a good enough reason. >> > > > > Partial methods also allow you to do this: > > > interface List { > default long sum() { ... } > } > > > which is not strictly related to migration. (Personally, I don't love > this as a feature, because it's weaker than it first appears (think: > "The Expression Problem"), and when you try to shore up these weaknesses > with a more powerful slicing mechanism like it > starts to get more complex > Exactly. > -- but this form of partial method is also > part of the current best approach we've got for being able to replace
> IntStream with Stream, which is easier in some ways, and harder in > others, than Collections.) > Is the goal to somehow make IntStream into Stream or to deprecate IntStream? If the latter, I also see no reason why sum (but not other sensible operations) must be part of Stream. In any event, a more general solution would be extension methods (I am not proposing we add those). > > > However (picking up the above syntactic form), would you find these > signatures terribly confusing? > > > interface Stream { > ... > int sum(Stream this); > long sum(Stream this); > double sum(Stream this); > } > What about other numeric types? Maybe BigInteger sum (Stream this) too? And what if users would be able to add their own numeric value types? It's a weird way to add what are essentially extension methods, and on the wrong side of the expression problem as you noted. If, OTOH, we'd have a "numeric" interface on value types and integers (as I think John alluded to), that might make things better. Also, it's not so much a question of confusion as of "does it fit with the feel of Java?" Ron From john.r.rose at oracle.com Mon Dec 21 22:33:57 2015 From: john.r.rose at oracle.com (John Rose) Date: Mon, 21 Dec 2015 14:33:57 -0800 Subject: Equality In-Reply-To: <56782DF0.20800@cs.oswego.edu> References: <56719E1F.2050504@oracle.com> <56742B2F.3000809@cs.oswego.edu> <56743A76.6000007@oracle.com> <56744544.1050603@cs.oswego.edu> <567446F3.7000904@oracle.com> <56746AE2.9070403@cs.oswego.edu> <9FA0A474-0CE4-4F13-BE46-5C4515D2CD94@oracle.com> <56757F8D.2030304@cs.oswego.edu> <5675880D.4060503@oracle.com> <5675A5D5.6060208@cs.oswego.edu> <5675D840.7090109@oracle.com> <56782DF0.20800@cs.oswego.edu> Message-ID: <5FCA6B8E-57DC-4F7A-8361-FF1D0A50F4FD@oracle.com> On Dec 21, 2015, at 8:50 AM, Doug Lea

wrote: > > On 12/19/2015 05:20 PM, Brian Goetz wrote: > >>> Right. I agree that the signature must compatibly change, >>> but not necessarily that anything else does. >> >> OK, we're on the same page. > > Where I think this page says that interfaces with existing methods > accepting Object args can in principle be anyfied without strictly > requiring (but usually strongly encouraging) implementation class rework. > There is a way to enable/translate analogs of Object methods, > in particular equals(). I think we can do this. There seem to be more than enough tools at our disposal: Ability to define static intrinsics as needed, freedom to factor tricky code into default methods on interfaces or shared supers (Object, ValueType), reflection (as a last resort), right to appeal to lifted value semantics on auto-boxes, close correspondence between generic any-fied code and specialized code. (It would be inefficient at this point to work out all the implementation details over email, as Brian notes. That work is best done by prototyping. So I'm holding back for now.) > We wouldn't normally recommend blanket anyfication of interfaces, > but Collections is the main one that everyone hopes will be somehow > doable. Ability to retrofit is a big goal for us, of course. > The full story on this has a few more quirks though. > It gets uncomfortable to cope with synchronized(obj), > Object.wait, and Object.notify: Semantically, synchronized(val) > and notify would be no-ops, and wait would block forever. > Which would be OK, because no sensible general-purpose implementation > of say, collection.contains would use any of these. Yes. Those semantics are appropriate for objects which are immutable, for which the "write lock" will never become available. > And, as John almost noted, compareTo/Comparable needs > treatment similar to my hybrid version of equals. Two events like that certainly call for generalization. 
Algebraists will be eager to suggest other any-fied relations, so we want to support open-ended extension mechanisms. This is one reason value types are envisioned to interoperate with interfaces. On Dec 21, 2015, at 9:13 AM, Brian Goetz wrote: > > This is what I was getting at above, with "let's treat the implementation part of the problem separately." There are a pile of idioms that show up in this kind of code whose semantics gets fuzzy when a type variable straddles references and values -- comparison to null, comparison to other objects (particularly 'this', which shows up in AbstractList.toString), synchronization, assignment to null, instanceof/cast, array creation.) Add reflection to that list. Also auto-boxing (which happens when you cross over to Object). As I said above, I'm very optimistic that we can shape the details of these things so that they are quite useful. For me the most important guiding principle is lifting all value semantics to value boxes, while allowing those boxes to be "heisenboxes" (value-based, aggressively identity-agnostic). Those rules usually assign workable semantics for reference-like operations on values ("as if boxed"). The actual physical cost of boxing can be waved away, either by saying "the JIT can optimize it" or (more aggressively) by lowering the semantics into the value bytecodes. In both cases, the source code looks the same, as if there is autoboxing happening wherever needed, but the user doesn't need to care where. > My initial porting exercise of Collections leads me to conclude that the tools needed for migrating the APIs and the tools needed for migrating the code are mostly decoupled. Since the API changes are more visible, I thought it sensible to start there. > > > It gets uncomfortable to cope with synchronized(obj), > > Object.wait, and Object.notify: Semantically, synchronized(val) > > and notify would be no-ops, and wait would block forever. 
> > This is one approach (the "permanently locked" object approach that John described in an earlier Value Objects proposal), but there are others. Let's come back to this. (The perma-lock semantics works nicely for frozen arrays. You want to be able to lock a frozen array very quickly in order to read it safely, if you are processing a mix of frozen and under-lock-mutable arrays. But a fail-fast semantics is friendlier for other use cases, where we want to deprecate collections that stupidly lock on their elements. This deserves another thread. My important point right now is optimism: We seem to have more than enough tactics to create a decent design.) -- John From john.r.rose at oracle.com Mon Dec 21 23:16:08 2015 From: john.r.rose at oracle.com (John Rose) Date: Mon, 21 Dec 2015 15:16:08 -0800 Subject: 64 bit collections, and API migration in general (was Re: Migrating methods in Collections) In-Reply-To: <56743A70.1000401@oracle.com> References: <56719E1F.2050504@oracle.com> <56742B2F.3000809@cs.oswego.edu> <56743A70.1000401@oracle.com> Message-ID: <9F488F38-7803-48D6-B663-2D18BDFDB42E@oracle.com> On Dec 18, 2015, at 8:55 AM, Brian Goetz wrote: > >> 2. It seems irresponsible to spend so much effort on >> Collections without also somehow addressing 32bit size/index >> limitations. Yes? Yes it would be, and we are aware of this question, and expecting to address it later in the light of prior design choices. > I think that's really a separate question. While everything said so far is Collection-specific, it's not really "so much effort on Collections" as much as "so much effort to ensure that legacy libraries can be compatibly anyfied", and that Collections is the poster child for that effort. (If we can't migrate Collections, that's evidence that we're still lacking in linguistic tools for supporting the transition to anyfied generics.)
> > So I'll interpret your question as: "These are nice migration tools for migrating erased libraries to anyfied, but there are other migrations we'd like to perform on these aging libraries, please don't forget about them?" > > The migration in question is whether we can compatibly migrate methods like: > > get(int index) > to > get(long index) "Two instances" is always a suspected hiding place for "more than one instance". For example, an APL-like matrix can be viewed as a linear (ravel-able) collection indexed by int-pairs, which are neither ints nor longs. I am assuming (until proven wrong) that the move from int to long should be part of a larger move from int to Index, where Index is an any-fied generic parameter. The practical effect of this concern, at the present moment, is that we should not only look at APIs which mention "Object" as suspects for any-fication, but also APIs which use "int" as an index. An "int" index is a candidate for replacement with some type "Index". It is *not* (IMO) likely to be a good candidate for ad hoc introduction of sibling "long" overloadings. -- John From brian.goetz at oracle.com Mon Dec 21 23:55:06 2015 From: brian.goetz at oracle.com (Brian Goetz) Date: Mon, 21 Dec 2015 18:55:06 -0500 Subject: Migrating methods in Collections In-Reply-To: <1164377D-E173-44EB-97D7-60FF1FE1EA77@paralleluniverse.co> References: <567826D3.7050504@oracle.com> <6C85E814-EE54-4DAF-92F8-650D01E21D18@paralleluniverse.co> <56783EFE.2070004@oracle.com> <1164377D-E173-44EB-97D7-60FF1FE1EA77@paralleluniverse.co> Message-ID: >> On Dec 21, 2015, at 8:03 PM, Brian Goetz wrote: >> I think people would be pretty ticked off if Map.get() just went away. >> I think the response would be: "Those idiots decided to change their >> libraries for their own reasons, I have no intention of ever >> specializing my Map, and yet I have to change my code anyway."
> > But those same people would also consume an API which returns a List and then find those same methods gone anyway. There's a big difference, though. The difference is, there is no code today that uses List. So no existing code will break, because anyfying a class or a client is an explicit operation (on the class side, you're explicitly making a tvar "any"; on the client side, you're using a List which by definition didn't work before). Breaking existing code is far worse than "If you want to upgrade your List to be List, you'll have to adjust a few other things too." As a general rule, the pain of migrating should go to those who want to migrate, and should not fall on those who don't want to migrate. So existing Map code that uses reference types and wants to keep using reference types should be able to completely ignore the changes to the API. > Although my suggestion would probably require a tool (a javac plugin?) to migrate sources, though. Go has "go fix", but Java has had such a tool for a long time (I think James Gosling worked on it). I know I'm making my suggestion sound even scarier, but I think it beats adding a type system trick for the purpose of source compatibility (more on that later). I have hopes that the IDEs will provide some sort of "migrate to new collections" transform. The goal there is to reduce the pain of "I wanted to migrate to use List", but even if this were a one-button thing, I am not sure I'd want to impose it on people who have existing codebases that they have no plans of upgrading to values. Another mitigating factor is that the new methods are total. That means you can migrate your code to be "any-collections-ready" without actually using any of the anyfied classes, and without changing the semantics of anything. Which gives us a path to eventually deprecating the old methods -- though realistically it would probably be a VERY long time before we removed them.
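As an aside on why a separate total name such as removeAt is attractive: in today's Java, List.remove(int) and List.remove(Object) already sit uncomfortably close together on a list of boxed Integers. The sketch below uses only existing java.util calls; RemoveOverloadDemo and its static helpers removeAt and removeElement are hypothetical stand-ins, not the proposed API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo of the remove(int) / remove(Object) overload pair.
class RemoveOverloadDemo {
    // Stand-in for a total, index-based method: meaningful for every element type.
    static <T> T removeAt(List<T> list, int index) {
        return list.remove(index);              // binds to List.remove(int)
    }

    // Element-based removal, again meaningful for every element type.
    static <T> boolean removeElement(List<T> list, T element) {
        return list.remove((Object) element);   // binds to remove(Object)
    }

    static List<Integer> fresh() {
        return new ArrayList<>(Arrays.asList(10, 20, 30));
    }

    public static void main(String[] args) {
        List<Integer> byIndex = fresh();
        byIndex.remove(1);                       // int argument: removes the element at index 1
        System.out.println(byIndex);

        List<Integer> byElement = fresh();
        byElement.remove(Integer.valueOf(10));   // Integer argument: removes the element 10
        System.out.println(byElement);

        List<Integer> viaHelpers = fresh();
        removeAt(viaHelpers, 0);
        removeElement(viaHelpers, 30);
        System.out.println(viaHelpers);
    }
}
```

Code written against the two unambiguous names behaves the same whether or not the list is ever specialized, which is the sense in which one can become "any-collections-ready" ahead of time.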
> Is the goal to somehow make IntStream into Stream or to deprecate IntStream? If the latter, I also see no reason why sum (but not other sensible operations) must be part of Stream. In any event, a more general solution would be extension methods (I am not proposing we add those). Something slightly more ambitious. I'd like to deprecate {Int,Long,Double}Stream, but allow Stream to respond to all methods currently supported by IntStream. This provides a path to getting rid of the manual specializations (probably faster than the legacy collection methods) because Stream would be just as good as the old IntStream. From brian.goetz at oracle.com Tue Dec 22 00:02:14 2015 From: brian.goetz at oracle.com (Brian Goetz) Date: Mon, 21 Dec 2015 19:02:14 -0500 Subject: Migrating methods in Collections In-Reply-To: <1164377D-E173-44EB-97D7-60FF1FE1EA77@paralleluniverse.co> References: <567826D3.7050504@oracle.com> <6C85E814-EE54-4DAF-92F8-650D01E21D18@paralleluniverse.co> <56783EFE.2070004@oracle.com> <1164377D-E173-44EB-97D7-60FF1FE1EA77@paralleluniverse.co> Message-ID: <1F3C7D5B-19A6-46D3-AF8E-FE166EA94A15@oracle.com> > What about other numeric types? Maybe > > BigInteger sum (Stream this) > > too? And what if users would be able to add their own numeric value types? It's a weird way to add what are essentially extension methods, and on the wrong side of the expression problem as you noted. If, OTOH, we'd have a "numeric" interface on value types and integers (as I think John alluded to), that might make things better. BTW, this is exactly what I meant by "trying to make the feature more powerful by adding new axes of slicing." It's possible, but it has tradeoffs -- most specifically, it pushes the obligation to do "most specific" testing to runtime, especially when multiple type variables are involved.
It's not out of the question, but I'd like to start with the default position that a receiver selector type for a partial method should be a reifiable runtime type (e.g., Foo, Foo, but not Foo) -- this is a stable position that supports the must-have use cases and also has a clear and simple runtime implementation. From ron at paralleluniverse.co Tue Dec 22 09:36:24 2015 From: ron at paralleluniverse.co (Ron Pressler) Date: Tue, 22 Dec 2015 01:36:24 -0800 (PST) Subject: Migrating methods in Collections In-Reply-To: References: <567826D3.7050504@oracle.com> <6C85E814-EE54-4DAF-92F8-650D01E21D18@paralleluniverse.co> <56783EFE.2070004@oracle.com> <1164377D-E173-44EB-97D7-60FF1FE1EA77@paralleluniverse.co> Message-ID: <460CA192-08FB-4084-A555-7F005025F13E@paralleluniverse.co> > > On Dec 22, 2015, at 1:55 AM, Brian Goetz wrote: > > > As a general rule, the pain of migrating should go to those who want to migrate, and should not fall on those who don't want to migrate. So existing Map code that uses reference types and wants to keep using reference types, should be able to completely ignore the changes to the API. > > > Another mitigating factor is that the new methods are total. That means, you can migrate your code to be "any-collections-ready" without actually using any of the anyfied classes, and without changing the semantics of anything. Which gives us a path to eventually deprecating the old methods -- though realistically it would probably be a VERY long time before we removed them. > > > I agree with everything, but it all comes down to the following question: does backwards _source_ compatibility alone, something that Java has good coping mechanisms for (javac source level, IDE/tool support) justify the addition of a feature that is not trivial and not very general (certainly not as general as extension methods)?
We're talking about receiver type-matching that is finer-grained than a class, something that feels foreign (and "un-simple") in OOP. > Something slightly more ambitious. I'd like to deprecate {Int,Long,Double}Stream, but allow Stream to respond to all methods currently supported by IntStream. This provides a path to getting rid of the manual specializations (probably faster than the legacy collection methods) because Stream would be just as good as the old IntStream. > > > But couldn't it be just as good a replacement even if some of the methods were plain static methods, something Java developers are quite familiar with? It will require code migration either way. Yes, it won't have the same fluent-API, but neither will other methods that people will come up with. Is hand-specializing the _public interface_ (I have no qualms with hand-specializing hidden implementation) a necessary enough feature to justify non-class-based receiver-type-matching? It feels like a new and unfamiliar form of ad-hoc almost-but-not-quite extension methods (sadly, actual extension methods won't solve the problem). If anything, backwards source compatibility is a stronger argument, as it is very important (though, IMO, not important enough to justify this). Default methods had both the urgency -- binary compatibility -- and the generality. It seems to me that partial methods have neither. I'm not saying they're not a cool feature or that they don't solve the problem, but they don't feel very blue-collar. Anyway, I've said my piece on this matter :)
URL: From ron at paralleluniverse.co Tue Dec 22 11:58:34 2015 From: ron at paralleluniverse.co (Ron Pressler) Date: Tue, 22 Dec 2015 03:58:34 -0800 (PST) Subject: Migrating methods in Collections In-Reply-To: <460CA192-08FB-4084-A555-7F005025F13E@paralleluniverse.co> References: <567826D3.7050504@oracle.com> <6C85E814-EE54-4DAF-92F8-650D01E21D18@paralleluniverse.co> <56783EFE.2070004@oracle.com> <1164377D-E173-44EB-97D7-60FF1FE1EA77@paralleluniverse.co> <460CA192-08FB-4084-A555-7F005025F13E@paralleluniverse.co> Message-ID: P.S. 1. While language-level extension methods (no JVM involvement) won?t solve the deprecated-method problem, they do solve the Stream.sum problem, and quite elegantly. That is an orthogonal feature which could be added in a later version, and in a compatible way (static methods could be turned into extension methods) with a simple new static-import statement. Instead of one not-too-general and a little strange solution, we can have two general and completely orthogonal solutions, which are also probably simpler to implement, and, IMO easier to understand. Yes, there?s a big price to pay, but it is stacked against a big chunk of complexity which may be avoided. 2. The method-hiding solution that I suggest might also serve as a nice part of the solution to the 64-bit (or John?s generalized) indexing problem. The int-indexed methods will simply be hidden from Java 10 code. Ron > On Dec 22, 2015, at 11:36 AM, Ron Pressler wrote: > > >> >> On Dec 22, 2015, at 1:55 AM, Brian Goetz wrote: >> >> >> As a general rule, the pain of migrating should go to those who want to migrate, and should not fall on those who don?t want to migrate. So existing Map code that uses reference types and wants to keep using reference types, should be able to completely ignore the changes to the API. >> >> > >> Another mitigating factor is that the new methods are total. That means, you can migrate your code to be ?any-collections-ready? 
without actually using any of the anyfied classes, and without changing the semantics of anything. Which gives us a path to eventually deprecating the old methods ? though realistically it would probably be a VERY long time before we removed them. >> >> >> > > I agree with everything, but it all comes down to the following question: does backwards _source_ compatibility alone, something that Java has good coping mechanisms for (javac source level, IDE/tool support) justify the addition of a feature that is not trivial and not very general (certainly not as general as extension methods)? We?re talking about receiver type-matching that is finer-grained than a class, something that feels foreign (and ?un-simple") in OOP. > > > >> Something slightly more ambitious. I?d like to deprecate {Int,Long,Double}Stream, but allow Stream to respond to all methods currently supported by IntStream. This provides a path to getting rid of the manual specializations (probably faster than the legacy collection methods) because Stream would be just as good as the old IntStream. >> >> >> > > But couldn?t it be just as good a replacement even if some of the methods were plain static methods, something Java developers are quite familiar with? It will require code migration either way. Yes, it won?t have the same fluent-API, but neither will other methods that people will come up with. Is hand-specializing the _public interface_ (I have no qualms with hand-specializing hidden implementation) a necessary enough feature to justify non-class-based receiver-type-matching? It feels like a new and unfamiliar form of ad-hoc almost-but-not-quite extension methods (sadly, actual extension methods won?t solve the problem). If anything, backwards source compatibility is a stronger argument, as it is very important (though, IMO, not important enough to justify this). > > > Default methods had both the urgency ? binary compatibility ? and the generality. 
> It seems to me that partial methods have neither. I'm not saying they're not a cool feature or that they don't solve the problem, but they don't feel very blue-collar.
>
> Anyway, I've said my piece on this matter :)

From dl at cs.oswego.edu Tue Dec 22 14:01:06 2015
From: dl at cs.oswego.edu (Doug Lea)
Date: Tue, 22 Dec 2015 09:01:06 -0500
Subject: Equality
In-Reply-To: <5FCA6B8E-57DC-4F7A-8361-FF1D0A50F4FD@oracle.com>
References: <56719E1F.2050504@oracle.com> <56742B2F.3000809@cs.oswego.edu> <56743A76.6000007@oracle.com> <56744544.1050603@cs.oswego.edu> <567446F3.7000904@oracle.com> <56746AE2.9070403@cs.oswego.edu> <9FA0A474-0CE4-4F13-BE46-5C4515D2CD94@oracle.com> <56757F8D.2030304@cs.oswego.edu> <5675880D.4060503@oracle.com> <5675A5D5.6060208@cs.oswego.edu> <5675D840.7090109@oracle.com> <56782DF0.20800@cs.oswego.edu> <5FCA6B8E-57DC-4F7A-8361-FF1D0A50F4FD@oracle.com>
Message-ID: <567957A2.4090900@cs.oswego.edu>

On 12/21/2015 05:33 PM, John Rose wrote:
> Two events like that certainly call for generalization.
> Algebraists will be eager to suggest other any-fied relations,
> so we want to support open-ended extension mechanisms.
> This is one reason value types are envisioned to interoperate
> with interfaces.
>

I'm not sure about generalization. There's only the beginnings of academic work on static analysis and validation of functional properties (like those below pasted from something else I had around). In the meantime, probably the best we could do is add annotations (like @Symmetric) that would have to be trusted in order to be effective. Or, nearer term, focus only on equals and compareTo.

for function f, predicate p, and valid arguments a, b, c:

  Idempotent:    f(a) == f(a)
  Deterministic: if (a == b) then f(a) == f(b)
  Injective:     if (a != b) then f(a) != f(b)
  Commutative:   f(a, b) == f(b, a)
  Associative:   f(f(a, b), c) == f(a, f(b, c))
  Monotonic:     if (a <= b) then f(a) <= f(b)
  Reflexive:     p(a, a)
  Irreflexive:   !p(a, a)
  Symmetric:     if (a == b) then p(a, b) == p(b, a)
  Antisymmetric: if (a != b) then p(a, b) != p(b, a)
  Transitive:    if (p(a, b) and p(b, c)) then p(a, c)

-Doug

From dl at cs.oswego.edu Tue Dec 22 14:18:41 2015
From: dl at cs.oswego.edu (Doug Lea)
Date: Tue, 22 Dec 2015 09:18:41 -0500
Subject: Equality
In-Reply-To: <5FCA6B8E-57DC-4F7A-8361-FF1D0A50F4FD@oracle.com>
References: <56719E1F.2050504@oracle.com> <56742B2F.3000809@cs.oswego.edu> <56743A76.6000007@oracle.com> <56744544.1050603@cs.oswego.edu> <567446F3.7000904@oracle.com> <56746AE2.9070403@cs.oswego.edu> <9FA0A474-0CE4-4F13-BE46-5C4515D2CD94@oracle.com> <56757F8D.2030304@cs.oswego.edu> <5675880D.4060503@oracle.com> <5675A5D5.6060208@cs.oswego.edu> <5675D840.7090109@oracle.com> <56782DF0.20800@cs.oswego.edu> <5FCA6B8E-57DC-4F7A-8361-FF1D0A50F4FD@oracle.com>
Message-ID: <56795BC1.8000307@cs.oswego.edu>

I should have noted...

On 12/21/2015 05:33 PM, John Rose wrote:
> On Dec 21, 2015, at 8:50 AM, Doug Lea wrote:
>> And, as John almost noted, compareTo/Comparable needs
>> treatment similar to my hybrid version of equals.
>

Well, "needs" is too strong. It would be disappointing if values could not be Comparable, but all java.util APIs related to sorted-orders allow a separate Comparator, so would still be usable.

-Doug

From brian.goetz at oracle.com Tue Dec 22 18:30:56 2015
From: brian.goetz at oracle.com (Brian Goetz)
Date: Tue, 22 Dec 2015 13:30:56 -0500
Subject: Migrating methods in Collections
In-Reply-To:
References: <567826D3.7050504@oracle.com> <6C85E814-EE54-4DAF-92F8-650D01E21D18@paralleluniverse.co> <56783EFE.2070004@oracle.com> <1164377D-E173-44EB-97D7-60FF1FE1EA77@paralleluniverse.co> <460CA192-08FB-4084-A555-7F005025F13E@paralleluniverse.co>
Message-ID: <567996E0.5090607@oracle.com>

> 1. While language-level extension methods (no JVM involvement) won't
> solve the deprecated-method problem, they do solve the Stream.sum
> problem, and quite elegantly.

Sort of -- but only if you buy the partiality. In order for this to work, you have to (as .NET does) inject methods into *types*, not into *classes*. This is just a partial method in disguise! In order for this to work, you have to be comfortable with the idea that the set of methods on a given receiver is dependent on the parameterization of the receiver. And you clearly are, since you don't think extension methods are too complex.

FTR, we considered and rejected use-site extension methods in 8, for a philosophical reason that is still equally valid here: API developers should own their API. What we rejected is the use-site aspect of it; the part we actually liked (but didn't have enough motivation to embrace in 8) was the partiality.
Just as default methods support the after-the-fact aspect of API extension that use-site extension would (without the transparency risks), partial methods (including partial defaults) support the specialized-receiver aspect of extension methods that we like but don't yet have.

From ron at paralleluniverse.co Tue Dec 22 19:09:18 2015
From: ron at paralleluniverse.co (Ron Pressler)
Date: Tue, 22 Dec 2015 11:09:18 -0800 (PST)
Subject: Migrating methods in Collections
In-Reply-To: <567996E0.5090607@oracle.com>
References: <567826D3.7050504@oracle.com> <6C85E814-EE54-4DAF-92F8-650D01E21D18@paralleluniverse.co> <56783EFE.2070004@oracle.com> <1164377D-E173-44EB-97D7-60FF1FE1EA77@paralleluniverse.co> <460CA192-08FB-4084-A555-7F005025F13E@paralleluniverse.co> <567996E0.5090607@oracle.com>
Message-ID:

> FTR, we considered and rejected use-site extension methods in 8, for a
> philosophical reason that is still equally valid here: API developers
> should own their API. What we rejected is the use-site aspect of it;
> the part we actually liked (but didn't have enough motivation to embrace
> in 8) was the partiality. Just as default methods support the
> after-the-fact aspect of API extension that use-site extension would
> (without the transparency risks), partial methods (including partial
> defaults) support the specialized-receiver aspect of extension methods
> that we like but don't yet have.
>

Well, I'm not suggesting we add extension methods, only if fluent APIs are that important, and even then we don't really need extension methods if we (rightly) want people to own their API, just a threading operator, a-la Clojure:

    stream => sum()  (compiled to sum(stream))

BTW, for sum specifically to be fluent it requires neither approach. I think that

    stream.reduce(sum())

is just as good as

    stream.sum()

In fact, I prefer it, because it feels very general.

But anyway, if partiality is something we want for its own sake, then partial methods FTW!
:)

From brian.goetz at oracle.com Tue Dec 22 20:01:20 2015
From: brian.goetz at oracle.com (Brian Goetz)
Date: Tue, 22 Dec 2015 15:01:20 -0500
Subject: Migrating methods in Collections
In-Reply-To: <5671BEC8.8010508@oracle.com>
References: <56719E1F.2050504@oracle.com> <5671BEC8.8010508@oracle.com>
Message-ID: <5679AC10.50109@oracle.com>

Returning back to the short-term goal of migrating the Collections API...

I realize that a certain degree of "tell me the story again for X?" is needed to meaningfully respond, but I'd like to bring the focus back to the core APIs (and secondarily the impact on anyfying generic APIs in general.)

I think Map.get offers us an existence proof that *some* degree of API evolution is needed here. So I don't think there's a credible "do nothing" approach. Still, minimizing the impact seems a reasonable goal.

We have a set of credible tools for evolving the APIs -- migrating ref-specific methods to more suitable total alternatives (remove -> removeAt), abandoning some methods to "reference purgatory" where there is already a suitable total alternative (e.g., abandoning removeAll in favor of the existing removeIf added in 8), and some form of "superation". Plenty of additional work is needed to find a way to express superation in a way that's not painfully ugly, but we can address that after the concept has proven itself.

So, I propose we return to the topic of Collection APIs. In addition to making collections safe for values, we also have some limited opportunity to fix errors in the API, within reason, if there are any that rise to the appropriate threshold. (More intrusive migration -- such as busting the 32-bit index limitations -- is likely to be possible with additional linguistic tools, but I'd like to save that for another round.)

Further, let's focus (for now) on the perspective of clients, not implementors.
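
From a client's point of view, the "abandon removeAll in favor of the existing removeIf" tactic mentioned above can be sketched with the Java 8 API as it exists today. This is only an illustration on reference collections; the class and method names (RemoveIfSketch, viaRemoveAll, viaRemoveIf) are made up for the example and do not come from the thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

public class RemoveIfSketch {
    // Legacy form: removeAll(Collection<?>) takes a collection of anything.
    public static List<Integer> viaRemoveAll(List<Integer> source, Collection<?> unwanted) {
        List<Integer> copy = new ArrayList<>(source);
        copy.removeAll(unwanted);
        return copy;
    }

    // Predicate form: removeIf(Predicate<? super E>), added in Java 8.
    // The membership test is supplied as a method reference instead.
    public static List<Integer> viaRemoveIf(List<Integer> source, Collection<?> unwanted) {
        List<Integer> copy = new ArrayList<>(source);
        copy.removeIf(unwanted::contains);
        return copy;
    }

    public static void main(String[] args) {
        List<Integer> source = Arrays.asList(1, 2, 3, 4);
        List<Integer> unwanted = Arrays.asList(2, 4);
        System.out.println(viaRemoveAll(source, unwanted)); // prints [1, 3]
        System.out.println(viaRemoveIf(source, unwanted));  // prints [1, 3]
    }
}
```

Both calls leave the same elements behind, which is why a client that stops calling removeAll loses nothing as long as removeIf stays available.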

Under any of the proposed changes:
 - existing binary clients will continue to work unchanged
 - existing sources can be recompiled and will work without change
 - if a client is anyfied (either by anyfying a generic method, or by naming a value instantiation of a generic class), then it's possible some method calls will no longer exist. However, since the client is changing the source, this is not a compatibility issue, it is only a migration burden (and one that can be mitigated to some degree by tooling.)

On 12/16/2015 2:43 PM, Brian Goetz wrote:
> The previous memo outlined a tactic for effectively "migrating" a
> method in a current generic class to a related but different signature
> in an any-generic class, while retaining source and binary
> compatibility with existing clients and subclasses, and outlined the
> list of possibly problematic methods in the Collections framework.
>
> The effectiveness of this tactic on this set of methods ranges from
> "slam-dunk" to "seems like it could work" to "doesn't help." It would
> be nice to have one hammer that pounds down all the nails, but I'm not
> sure there is one. This memo outlines a range of complementary
> tactics that might, in combination with the above approach, enable us
> to cover the waterfront.
>
> I'll start with the method Collection.contains(Object). It might at
> first seem that we want the signature here to be contains(E), but this
> gets in the way of cases like:
>
>     dogs.contains(animal)
>
> or of converting dogs::contains to a Predicate<Animal> (which
> is what filter/removeIf would want.)
>
> Note that this has little to do directly with value types; value types
> are invariant. But if we want a single contains() method that ranges
> over any E, it needs to accommodate variance for reference
> instantiations while not falling back on Object as a top type.
>
> One technique for doing so would be to introduce contravariant
> inference variables.
> Then we could write contains as:
>
>     <U super E> boolean contains(U u)
>
> This has three main downsides:
>  - Even though this works, it's still not that obvious.
>  - It's a *lot* of work in the spec and compiler; it pushes on all the
>    fragile bits.
>  - If that weren't enough, it is a theoretical minefield. Papers like
>    http://www.cis.upenn.edu/~bcpierce/papers/variance.pdf show that
>    adding contravariance to certain type systems results in subtyping
>    becoming undecidable.
>
> On the other hand, I'm sure many library writers would jump for joy to
> have this in the toolbox; the lack of contravariant tvars seems a
> notable inconsistency in the language. (But let's not kid ourselves
> about the costs.)
>
> It just so happens that this construct works out; it is binary- and
> source- compatible to make a non-generic method generic, as long as
> the erasure of the signature remains the same. So changing contains()
> or remove() as above would not cause subclasses or clients to fail to
> either link or recompile (in the subclass recompilation case, it would
> be reinterpreted as a raw override, which is allowed and compatible.)
> And we'd end up with a total method that does the right thing both for
> refs and values.
>
> Ignoring the costs and risks, this technique applies to a number of
> the methods in our rogue's gallery, including toArray(), for which we
> didn't yet have a solution:
>
>     <U super E> U[] toArray()
>
> This is compatible with existing clients who are expecting an Object[]
> to come back from toArray() on a collection of reference types, and
> collapses to V[] for any value type, so Collection<int>.toArray()
> returns int[]. (We might still want an unchecked warning if the
> compiler infers U != Object for reference E, but that's a separate and
> easily handled consideration.)
>
> Let's call this technique "superation" (yes, it's an intentional
> (disgusting) pun. See
> http://beta.merriam-webster.com/dictionary/suppurate.
> And think about that the next time you pass a "Super 8" motel on the
> highway.) With this in our toolbox, the strategy matrix becomes:
>
> Class       Method                                           Possible Approaches
> ----------  -----------------------------------------------  ---------------------------------------------------------
> Collection  contains(Object)                                 Superate to contains(U)
>             remove(Object)
>             removeAll(Collection)                            Abandon in favor of existing removeIf(Predicate).
>             retainAll(Collection)
>             containsAll(Collection)                          Migrate to containsElements(Collection), or abandon.
>             toArray()                                        Superate to U[] toArray()
>             toArray(T[])                                     Leave as is, superate, or abandon.
> List        remove(int)                                      Migrate to removeAt(int).
>             indexOf(Object)                                  Migrate to Optional-bearing findFirst(Predicate)
>             lastIndexOf(Object)
> Map         containsKey(Object)                              Superate
>             containsValue(Object)                            Superate
>             remove(Object)                                   Superate
>             put(K,V)                                         Leave as is
>             get(K)                                           Migrate to one (or all) of:
>                                                                Optional map(K)
>                                                                mapOrElse(K, V)
>                                                                tryMap(K, Consumer)
>             getOrDefault(Object, V)                          Migrate to mapOrElse, or superate
> Queue       poll(), peek()                                   Migrate to tryPoll(Consumer) *or* optional-bearing method
> Deque       poll(), peek()
>             pollFirst(), pollLast()
>             peekFirst(), peekLast()
>             removeFirstOccurrence(), removeLastOccurrence()  Migrate to predicate-accepting method, or superate
>
> This is a more satisfying matrix; not only does everything have an
> acceptable strategy, but some have more than one, and the user impact
> of superating a method is lower (users might just not notice), so the
> perception is that fewer methods are affected. Still, super-bounded
> tvars are a big hammer for such a small foe. Maybe there's an
> alternate approach that has the effect of superation but doesn't need
> such a big hammer.
>
> Here's one possibility. We already have a notion of partial methods.
> We could have a pair of methods
>
>     Object[] toArray()
>
>     E[] toArray()
>
> both of which are reasonable signatures for their restricted domains.
> Unfortunately, the natural interpretation of this pair of methods is
> that the first is a member of Collection<ref T>, and the second is a
> member of Collection<val T>, but there is *no* toArray() method that
> is a member of Collection<any T>! This means that code that is
> generic in any-T would not see a toArray() method at all. That's a
> problem (though not as enormous as it initially sounds, there are
> possible mitigating techniques.)
>
> However, it is not unreasonable for the compiler to recognize this
> situation and deal. Suppose I have some code generic in any-T:
>
>     <any T> void foo(Collection<T> c) {
>         T[] arr = c.toArray();
>     }
>
> Now, the compiler doesn't know whether c is a collection of refs or
> values, but it knows it's one or the other (ref T and val T form a
> partition of any T). So it could (and in some cases, has to anyway)
> do type checking by parts -- it can typecheck the above assignment
> under the assumption of ref T, and do it again under the assumption of
> val T, and if both succeed (and something else, see below), accept the
> method invocation as valid. (In this case, the ref-T fork should
> result in an unchecked warning, meaning that the merged checking also
> yields an unchecked warning.)
>
> The "something else" part is: when doing overload resolution by parts,
> both branches must resolve to overloads that are erasure-equivalent to
> each other. Which is true for toArray() (and for all the cases for
> which superation would work.)
>
> Now, this is a lot of handwaving, and it doesn't even really describe
> how we think partial methods should actually work (I'd like to get rid
> of the where-val-T slices entirely, this is a separate discussion.)
> But it's a sketch of an option that achieves the positive result of
> superation without engaging the complexity of superation.
>

From dl at cs.oswego.edu Wed Dec 23 14:10:45 2015
From: dl at cs.oswego.edu (Doug Lea)
Date: Wed, 23 Dec 2015 09:10:45 -0500
Subject: Migrating methods in Collections
In-Reply-To: <5679AC10.50109@oracle.com>
References: <56719E1F.2050504@oracle.com> <5671BEC8.8010508@oracle.com> <5679AC10.50109@oracle.com>
Message-ID: <567AAB65.6070508@cs.oswego.edu>

On 12/22/2015 03:01 PM, Brian Goetz wrote:
> Returning back to the short-term goal of migrating the Collections API...

If methods accepting Object arguments can be anyfied, this removes some methods from your problem table: Collection.{contains(Object), remove(Object)}, List.{indexOf(Object), lastIndexOf(Object)} and Map.{containsKey(Object), containsValue(Object), remove(Object)}.

I realize that there are still a bunch of unresolved issues in pulling this off. But ignoring them for now...

One natural follow-on question is that if we can anyfy contains, why can't we do so for containsAll(Collection)? And similarly for removeAll(Collection), retainAll(Collection). In other words, is this or some variant allowed?

    boolean containsAll(Collection)

If so, the main remaining questions surround optionality of results, that I'll answer separately. But there are still others, List.remove(int index) and Collection.toArray()

Doing nothing about List.remove(index) seems to be a legal option. No existing code will encounter an ambiguity that is not already present due to autoboxing (for List<Integer>). New code using or implementing List<int> will need some way to disambiguate. But I think that some syntax will be needed to allow anyway. It might be nice to introduce a method removeAt to reduce the need to use this syntax, but doesn't seem necessary?

About the two Collection toArray() methods:

The no-arg version must return Object[]. I don't see how anyfying (in any way) can guarantee compatible results.
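
One way existing code can depend on the "must return Object[]" contract of the no-arg toArray() is sketched below, using ArrayList on current Java. The class and method names (ToArrayContract, runtimeArrayClass, storeForeignElement) are illustrative only, and the scenario is an assumption about the kind of client this contract protects, not something spelled out in the thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ToArrayContract {
    // Reports the runtime class of the array produced by no-arg toArray().
    public static Class<?> runtimeArrayClass(List<String> list) {
        return list.toArray().getClass();
    }

    // Stores a non-String element into the returned array. This succeeds only
    // because the array really is an Object[]; a String[] at runtime would
    // throw ArrayStoreException on the assignment.
    public static Object storeForeignElement(List<String> list) {
        Object[] snapshot = list.toArray();
        snapshot[0] = Integer.valueOf(42);
        return snapshot[0];
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>(Arrays.asList("a", "b"));
        System.out.println(runtimeArrayClass(names) == Object[].class); // prints true
        System.out.println(storeForeignElement(names));                 // prints 42
    }
}
```

A client like storeForeignElement would break if an anyfied toArray() started handing back a more specific array type for reference instantiations, which is one reading of the compatibility concern raised here.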
The <T> T[] toArray(T[] array) version has worse problems: most current implementations use reflection if the argument array is not big enough (because there is no syntax for "new T[n]"). I don't see offhand how to compatibly mangle reflective code. Plus, the spec explicitly says that if the array is too large, a null is appended to elements. Null is of course not a legal value for non-ref types.

I don't see a good alternative to leaving both forms of toArray as-is, and to box results -- requiring that even custom non-ref implementations do so. But this suggests that we should find some other way (possibly in a utility class) to create a val-type array of elements in a val-type collection.

-Doug

From brian.goetz at oracle.com Wed Dec 23 15:52:25 2015
From: brian.goetz at oracle.com (Brian Goetz)
Date: Wed, 23 Dec 2015 10:52:25 -0500
Subject: Migrating methods in Collections
In-Reply-To: <567AAB65.6070508@cs.oswego.edu>
References: <56719E1F.2050504@oracle.com> <5671BEC8.8010508@oracle.com> <5679AC10.50109@oracle.com> <567AAB65.6070508@cs.oswego.edu>
Message-ID: <567AC339.3000805@oracle.com>

Some good thoughts, and some wishful thinking.

tl;dr Summary:
 - I think it's a stretch to say that equals() and contains() can be anyfied while still accepting Object. I think there are linguistic solutions so that existing Object-accepting code can continue to run unchanged for reference instantiations, and that the signatures can be generally rescued, but I think that's something different than "methods accepting Objects can be anyfied."
 - toArray() is indeed a problem. I believe that the same tools for rescuing equals() can also probably be applied towards toArray().

> If methods accepting Object arguments can be anyfied, this removes
> some methods from your problem table: Collection.{contains(Object),
> remove(Object)}, List.{indexOf(Object), lastIndexOf(Object)} and
> Map.{containsKey(Object), containsValue(Object), remove(Object)}.
>
> I realize that there are still a bunch of unresolved issues
> in pulling this off. But ignoring them for now...

I agree the same solution should work for all these methods. But I don't think we'll get to the point where the signature of equals() or contains() simply accepts Object. Several major concerns:

 - Boxing. If these methods accept Object, there is going to be some degree of boxing that we can't eliminate. Whether this is "some" or "a lot", I can't imagine getting it down to the point where we're comfortable.
 - Intrusion. Do we really want to ask authors to deal with Object in Complex.equals()? I would think these methods would want to start with a V and go from there, not have to reason about "if it's anything other than a boxed V, forget about it, otherwise cast and unbox." This is not logic we want the user to have to write for each of these methods.

What we want, I think, is for the signature of those methods to be:

 - x(Object) // for reference instantiations
 - x(T)      // for value instantiations

That Object is the erasure of T is a powerful connection we can hang our hat on here.

I think there are at least three linguistic approaches to rescuing these methods:

 - contravariant type args ()
 - some sort of peeling that treats x(T) and x(Object) as separate methods, but usually defaults/bridges one of them, so you just have to implement the appropriate one
 - some way of expressing a signature that means "T when a value, or Object when a reference"

All of these have cons, but we've got a long enough list to suggest that there is *a* solution here, and maybe there's a better one if we pull on that string some more.

So let's assume there's *some* way to write equals/contains/etc so the right things happen. Your list above stands, except that there's still some degree of migration.

> One natural follow-on question is that if we can anyfy contains, why
> can't we do so for containsAll(Collection)? And similarly for
> removeAll(Collection), retainAll(Collection).
> In other words,
> is this or some variant allowed?
>     boolean containsAll(Collection)

Good thought! Gavin and I bashed our heads against this one for a while about a year ago.

First, note that we only have three such methods: remove/retain/containsAll. And we can "retire" two of them as being inferior to removeIf. Which means there's just one method here to rescue.

If we have vars, I think we can do the same trick. But the other tricks don't work as well, because of a (sensible but frustrating) limitation of old generics interop -- if you have a method with generics:

    void foo(T t)
    void moo(Foo f)

you can do a "raw override"

    void foo(Object t) // acceptable raw override
    void moo(Foo f)    // acceptable raw override

and that's fine, but you can't do the same with a wildcard:

    void moo(Foo f)    // not OK

So this wouldn't be source-compatible for existing subclasses of Collection. However, it's possible that the third variant in our candidate list above -- which amounts to some way of writing the dependent type "if T is erased, then the bound of T, otherwise T" -- might be able to get us here. Or not. If this is the worst of our problems, we have already won.

> If so, the main remaining questions surround optionality of results,
> that I'll answer separately.

Right, there's a real space of API design here.

> Doing nothing about List.remove(index) seems to be a legal option.

Yes, that's a legal option (just as today, you can overload foo(T) and foo(String)). Not sure if it *should* be a legal option (at the very least, the compiler should warn you of this, as it should also probably with overloads that fail to follow a meet rule.)

> No
> existing code will encounter an ambiguity that is not already present
> due to autoboxing (for List<Integer>). New code using or implementing
> List<int> will need some way to disambiguate. But I think that some
> syntax will be needed to allow anyway. It might be nice to introduce
> method removeAt to reduce need to use this syntax, but doesn't seem
> necessary?

Can you expand on what you might want for disambiguation here?

> About the two Collection toArray() methods:
>
> The no-arg version must return Object[]. I don't see how anyfying (in
> any way) can guarantee compatible results. The <T> T[] toArray(T[]
> array) version has worse problems: most current implementations use
> reflection if the argument array is not big enough (because there is
> no syntax for "new T[n]"). I don't see offhand how to compatibly
> mangle reflective code. Plus, the spec explicitly says that if the
> array is too large, a null is appended to elements. Null is of course
> not a legal value for non-ref types.

I think "null" can be compatibly replaced with "the default value for the type", which is the same as "null" for all existing code. So that's not a blocker.

Reflection is harder, but it's quite possible that this will come out in the "specialization wash". If we can have an anyfied version of Arrays.copyOf -- which seems doable -- then I think that problem goes away too.

That said, maybe the second version of toArray() should be abandoned in the ref layer for compatibility only, and we should add the new total method

    T[] toArray(IntFunction<T[]> generator)

as we did with streams. (I think we should introduce this method regardless, actually, for all the reasons that came up when we were discussing it for streams. This is not a method we could have (credibly) had in 1.2, but with lambdas in the language, it's kind of a no-brainer.)

> I don't see a good alternative to leaving both forms of toArray as-is,
> and to box results -- requiring that even custom non-ref
> implementations do so. But this suggests that we should find some
> other way (possibly in a utility class) to create a val-type array of
> elements in a val-type collection.

Speaking only about *signatures* now, I think the same techniques that allow us to rescue contains(Object) may do the same for toArray().

 - U[] toArray() could work;
 - peeling into separate Object[] toArray() for ref / T[] toArray() for val could work;
 - expressing the dependent type (T.erased ? T.bound : T) would also work.

From dl at cs.oswego.edu Wed Dec 23 16:13:26 2015
From: dl at cs.oswego.edu (Doug Lea)
Date: Wed, 23 Dec 2015 11:13:26 -0500
Subject: Migrating methods in Collections
In-Reply-To: <567AC339.3000805@oracle.com>
References: <56719E1F.2050504@oracle.com> <5671BEC8.8010508@oracle.com> <5679AC10.50109@oracle.com> <567AAB65.6070508@cs.oswego.edu> <567AC339.3000805@oracle.com>
Message-ID: <567AC826.9060206@cs.oswego.edu>

Isolating one small issue for now:

On 12/23/2015 10:52 AM, Brian Goetz wrote:
>> Doing nothing about List.remove(index) seems to be a legal option.
>
> Yes, that's a legal option (just as today, you can overload foo(T) and
> foo(String)). Not sure if it *should* be a legal option (at the very least, the
> compiler should warn you of this, as it should also probably with overloads that
> fail to follow a meet rule.)
>
>> No
>> existing code will encounter an ambiguity that is not already present
>> due to autoboxing (for List<Integer>). New code using or implementing
>> List<int> will need some way to disambiguate. But I think that some
>> syntax will be needed to allow anyway. It might be nice to introduce
>> method removeAt to reduce need to use this syntax, but doesn't seem
>> necessary?
>
> Can you expand on what you might want for disambiguation here?
>

Not sure; possibly nothing considering that users already (since jdk5) live with this compiling without warning:

import java.util.*;
public class RemoveInteger {
    public static void main(String[] args) {
        List<Integer> c = new ArrayList<Integer>();
        c.add(1);
        c.remove(1); // binds to remove(int index), not remove(Object)
        System.out.println(c);
    }
}

Running ...

Exception in thread "main" java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
    at java.util.ArrayList.rangeCheck(ArrayList.java:659)
    at java.util.ArrayList.remove(ArrayList.java:495)
    at RemoveInteger.main(RemoveInteger.java:8)

From peter.levart at gmail.com Wed Dec 23 16:30:55 2015
From: peter.levart at gmail.com (Peter Levart)
Date: Wed, 23 Dec 2015 17:30:55 +0100
Subject: Equality
In-Reply-To: <5675880D.4060503@oracle.com>
References: <56719E1F.2050504@oracle.com> <56742B2F.3000809@cs.oswego.edu> <56743A76.6000007@oracle.com> <56744544.1050603@cs.oswego.edu> <567446F3.7000904@oracle.com> <56746AE2.9070403@cs.oswego.edu> <9FA0A474-0CE4-4F13-BE46-5C4515D2CD94@oracle.com> <56757F8D.2030304@cs.oswego.edu> <5675880D.4060503@oracle.com>
Message-ID: <567ACC3F.1030303@gmail.com>

Hi,

Sometimes I miss a feature in generic type declarations where one could refer to a type variable of a super type without repeating it in the declaration of the generic type. For example, currently one has to write:

interface Snapshooter<E, C extends Collection<E>> {
    C snapshot();
    boolean addElement(E e);
}

Snapshooter<String, List<String>> strings = ...;
List<String> snapshot1 = strings.snapshot();
strings.addElement("abc");
List<String> snapshot2 = strings.snapshot();

What if E could be implicitly declared like:

interface Snapshooter<C extends Collection<E>> {
    C snapshot();
    boolean addElement(E e);
}

Snapshooter<List<String>> strings = ...;

Which would be similar to declaring:

interface Snapshooter<C extends Collection<?>> { ... }

...with added benefit that one could refer to E from Collection<E>.

I don't know if this could be soundly incorporated into the language type system, but if it could be, and if interfaces could also be implemented by value types, then...

On 12/19/2015 05:38 PM, Brian Goetz wrote:
>> Anyfy equals, and adjust default implementation of boolean
>> T.equals(any x)
>
> I think we may be talking past each other (while basically saying the
> same thing from opposite directions.)
>
> "any" is not a type; it is a modifier that affects the domain of type
> variables.
> So we don't have (and I think we don't want) a meaning for
> equals(any x). But what we do want, is a way of expressing "I'll take
> anything which could be on the other side of the == operator with me."
> For refs, that's Object; for a value type V, that's just V.
>
> Where we had gotten with the / superation idiom (not
> suggesting either of these syntaxes is great) is being able to express:
>
>  - If T is value, then T, else the erasure of T (usually object) **
>
> I'll write this as Sup for short. The convenient thing about
> Sup is that it conveniently collapses to Object in the places where
> we want Object, so we could define contains/remove as
>
>     contains(Sup)
>
> and contains will always bottom out at equals(), so equals() similarly
> needs to be
>
>     equals(Sup)
>
> If this is a valid approach (and I think it's the best one we've got so
> far), then we're looking for how to spell Sup (in all of: type
> system, language syntax, bytecode descriptors.)

....there could be a special interface:

public interface Any> {
    boolean equals(T x);
}

...implemented by Object:

public class Object implements Any