Migrating methods in Collections
Brian Goetz
brian.goetz at oracle.com
Tue Dec 22 20:01:20 UTC 2015
Returning to the short-term goal of migrating the Collections
API... I realize that a certain degree of "tell me the story again for
X?" is needed to meaningfully respond, but I'd like to bring the focus
back to the core APIs (and secondarily the impact on anyfying generic
APIs in general.)
I think Map.get offers us an existence proof that *some* degree of API
evolution is needed here. So I don't think there's a credible "do
nothing" approach. Still, minimizing the impact seems a reasonable goal.
We have a set of credible tools for evolving the APIs -- migrating
ref-specific methods to more suitable total alternatives (remove ->
removeAt), abandoning some methods to "reference purgatory" where there
is already a suitable total alternative (e.g., abandoning removeAll in
favor of the existing removeIf added in 8), and some form of
"superation". Plenty of additional work is needed to find a way to
express superation in a way that's not painfully ugly, but we can address
that after the concept has proven itself.
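To make the removeAll -> removeIf part of that concrete, here's a small
Java 8 sketch (the collection contents are made up):

    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.HashSet;
    import java.util.List;
    import java.util.Set;

    class RemoveIfDemo {
        public static void main(String[] args) {
            List<String> names = new ArrayList<>(Arrays.asList("a", "b", "c"));
            Set<String> unwanted = new HashSet<>(Arrays.asList("b"));

            // Ref-specific signature: removeAll(Collection<?>) leans on equals(Object).
            // names.removeAll(unwanted);

            // Total alternative added in 8: removeIf(Predicate<? super E>).
            names.removeIf(unwanted::contains);
            System.out.println(names);   // [a, c]
        }
    }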
So, I propose we return to the topic of Collection APIs. In addition to
making collections safe for values, we also have some limited
opportunity to fix errors in the API, within reason, if there are any
that rise to the appropriate threshold. (More intrusive migration --
such as busting the 32-bit index limitations -- is likely to be possible
with additional linguistic tools, but I'd like to save that for another
round.) Further, let's focus (for now) on the perspective of clients,
not implementors.
Under any of the proposed changes:
- existing binary clients will continue to work unchanged
- existing sources can be recompiled and will work without change
- if a client is anyfied (either by anyfying a generic method, or by
naming a value instantiation of a generic class), then it's possible that
some method calls will no longer exist. However, since the client is
changing the source, this is not a compatibility issue; it is only a
migration burden (and one that can be mitigated to some degree by tooling).
On 12/16/2015 2:43 PM, Brian Goetz wrote:
> The previous memo outlined a tactic for effectively "migrating" a
> method in a current generic class to a related but different signature
> in an any-generic class, while retaining source and binary
> compatibility with existing clients and subclasses, and outlined the
> list of possibly problematic methods in the Collections framework.
>
> The effectiveness of this tactic on this set of methods ranges from
> "slam-dunk" to "seems like it could work" to "doesn't help." It would
> be nice to have one hammer that pounds down all the nails, but I'm not
> sure there is one. This memo outlines a range of complementary
> tactics that might, in combination with the above approach, enable us
> to cover the waterfront.
>
> I'll start with the method Collection.contains(Object). It might at
> first seem that we want the signature here to be contains(E), but this
> gets in the way of cases like:
>
> dogs.contains(animal)
>
> or of converting dogs::contains to a Predicate<? super Animal> (which
> is what filter/removeIf would want.)
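> To make the variance point concrete, here is a small self-contained
> sketch (the Animal/Dog types are made up for illustration):
>
>     import java.util.ArrayList;
>     import java.util.Collection;
>     import java.util.List;
>
>     class ContainsVarianceDemo {
>         static class Animal {}
>         static class Dog extends Animal {}
>
>         public static void main(String[] args) {
>             Collection<Dog> dogs = new ArrayList<>();
>             Animal animal = new Dog();
>
>             // Compiles today because contains takes Object; with contains(E),
>             // the argument would have to be a Dog, not an Animal.
>             boolean found = dogs.contains(animal);
>
>             // contains(Object) also lets a method reference satisfy the
>             // Predicate<? super Animal> that filter/removeIf want.
>             List<Animal> animals = new ArrayList<>();
>             animals.removeIf(dogs::contains);
>
>             System.out.println(found + " " + animals);
>         }
>     }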
>
> Note that this has little to do directly with value types; value types
> are invariant. But if we want a single contains() method that ranges
> over any E, it needs to accommodate variance for reference
> instantiations while not falling back on Object as a top type.
>
> One technique for doing so would be to introduce contravariant
> inference variables. Then we could write contains as:
>
> <U super E> boolean contains(U u)
>
> This has three main downsides:
> - Even though this works, it's still not that obvious.
> - It's a *lot* of work in the spec and compiler; it pushes on all the
> fragile bits.
> - If that weren't enough, it is a theoretical minefield. Papers like
> http://www.cis.upenn.edu/~bcpierce/papers/variance.pdf show that
> adding contravariance to certain type systems results in subtyping
> becoming undecidable.
>
> On the other hand, I'm sure many library writers would jump for joy to
> have this in the toolbox; the lack of contravariant tvars seems a
> notable inconsistency in the language. (But let's not kid ourselves
> about the costs.)
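> For contrast, today only wildcards can carry a lower bound; here is a
> quick sketch of the existing workaround (method names made up):
>
>     import java.util.ArrayList;
>     import java.util.Arrays;
>     import java.util.List;
>
>     class LowerBoundDemo {
>         // Lower bound on a wildcard: expressible today.
>         static <T> void copyInto(List<? super T> dst, List<T> src) {
>             dst.addAll(src);
>         }
>
>         // But there is no way to declare <U super T> and then use U in,
>         // say, a return position -- the inconsistency noted above.
>
>         public static void main(String[] args) {
>             List<Object> sink = new ArrayList<>();
>             copyInto(sink, Arrays.asList("a", "b"));
>             System.out.println(sink);
>         }
>     }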
>
> It just so happens that this construct works out; it is binary- and
> source-compatible to make a non-generic method generic, as long as
> the erasure of the signature remains the same. So changing contains()
> or remove() as above would not cause subclasses or clients to fail to
> either link or recompile (in the subclass recompilation case, it would
> be reinterpreted as a raw override, which is allowed and compatible.)
> And we'd end up with a total method that does the right thing both for
> refs and values.
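> To see the erasure argument concretely (using an ordinary type variable,
> since <U super E> is not expressible in today's Java, and made-up
> interface names):
>
>     interface OldCollectionLike {
>         boolean contains(Object o);      // original, non-generic signature
>     }
>
>     interface NewCollectionLike {
>         <U> boolean contains(U u);       // generified; erasure is still contains(Object)
>     }
>
>     class MySet implements NewCollectionLike {
>         // A subclass written against the old signature still overrides the
>         // erased signature, so it links and recompiles (with an unchecked
>         // warning about the raw override).
>         @Override
>         public boolean contains(Object o) {
>             return false;
>         }
>     }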
>
> Ignoring the costs and risks, this technique applies to a number of
> the methods in our rogue's gallery, including toArray(), for which we
> didn't yet have a solution:
>
> <U super E> U[] toArray()
>
> This is compatible with existing clients who are expecting an Object[]
> to come back from toArray() on a collection of reference types, and
> collapses to V[] for any value type, so Collection<int>.toArray()
> returns int[]. (We might still want an unchecked warning if the
> compiler infers U != Object for reference E, but that's a separate and
> easily handled consideration.)
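> For reference, the existing client code that must keep working looks
> like this (a trivial made-up example):
>
>     import java.util.Arrays;
>     import java.util.List;
>
>     class ToArrayDemo {
>         public static void main(String[] args) {
>             List<String> names = Arrays.asList("a", "b");
>
>             // Existing clients of reference instantiations are written
>             // against Object[]; a superated toArray() would still infer
>             // U = Object here.
>             Object[] arr = names.toArray();
>             System.out.println(arr.length);
>         }
>     }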
>
> Let's call this technique "superation" (yes, it's an intentional
> (disgusting) pun. See
> http://beta.merriam-webster.com/dictionary/suppurate. And think about
> that the next time you pass a "Super 8" motel on the highway.) With
> this in our toolbox, the strategy matrix becomes:
>
>
>
> Class / Method                        Possible Approaches
> ---------------------------------------------------------------------------
> Collection
>   contains(Object)                    Superate to <U super E> contains(U)
>   remove(Object)                      (same as above)
>   removeAll(Collection<?>)            Abandon in favor of existing
>                                       removeIf(Predicate<? super E>)
>   retainAll(Collection<?>)            (same as above)
>   containsAll(Collection<?>)          Migrate to
>                                       containsElements(Collection<? extends E>),
>                                       or abandon
>   toArray()                           Superate to <U super E> U[] toArray()
>   toArray(T[])                        Leave as is, superate, or abandon
> List
>   remove(int)                         Migrate to removeAt(int)
>   indexOf(Object)                     Migrate to Optional-bearing
>                                       findFirst(Predicate<? super E>)
>   lastIndexOf(Object)                 (same as above)
> Map
>   containsKey(Object)                 Superate
>   containsValue(Object)               Superate
>   remove(Object)                      Superate
>   put(K,V)                            Leave as is
>   get(K)                              Migrate to one (or all) of:
>                                         Optional<V> map(K)
>                                         mapOrElse(K, V)
>                                         tryMap(K, Consumer<? super V>)
>   getOrDefault(Object, V)             Migrate to mapOrElse, or superate
> Queue
>   poll(), peek()                      Migrate to tryPoll(Consumer) *or*
>                                       optional-bearing method
> Deque
>   poll(), peek()                      (same as above)
>   pollFirst(), pollLast()             (same as above)
>   peekFirst(), peekLast()             (same as above)
>   removeFirstOccurrence(),            Migrate to predicate-accepting method,
>   removeLastOccurrence()              or superate
>
>
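> As a rough sketch of the proposed Map accessor shapes (the names come
> from the matrix above; the bodies, and tryMap's boolean return, are my
> guesses, and they gloss over null-value subtleties):
>
>     import java.util.Map;
>     import java.util.Optional;
>     import java.util.function.Consumer;
>
>     class MapAccessorSketch {
>         // Optional<V> map(K)
>         static <K, V> Optional<V> map(Map<K, V> m, K key) {
>             return Optional.ofNullable(m.get(key));
>         }
>
>         // mapOrElse(K, V)
>         static <K, V> V mapOrElse(Map<K, V> m, K key, V fallback) {
>             V v = m.get(key);
>             return (v != null || m.containsKey(key)) ? v : fallback;
>         }
>
>         // tryMap(K, Consumer<? super V>)
>         static <K, V> boolean tryMap(Map<K, V> m, K key,
>                                      Consumer<? super V> action) {
>             V v = m.get(key);
>             if (v == null && !m.containsKey(key)) {
>                 return false;
>             }
>             action.accept(v);
>             return true;
>         }
>     }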
> This is a more satisfying matrix; not only does everything have an
> acceptable strategy, but some have more than one, and the user impact
> of superating a method is lower (users might just not notice), so the
> perception is that fewer methods are affected. Still, super-bounded
> tvars are a big hammer for such a small foe. Maybe there's an
> alternate approach that has the effect of superation but doesn't need
> such a big hammer.
>
>
> Here's one possibility. We already have a notion of partial methods.
> We could have a pair of methods
>
> <where ref E>
> Object[] toArray()
>
> <where val E>
> E[] toArray()
>
> both of which are reasonable signatures for their restricted domains.
> Unfortunately, the natural interpretation of this pair of methods is
> that the first is a member of Collection<ref T>, and the second is a
> member of Collection<val T>, but there is *no* toArray() method that
> is a member of Collection<any T>! This means that code that is
> generic in any-T would not see a toArray() method at all. That's a
> problem (though not as enormous as it initially sounds; there are
> possible mitigating techniques).
>
> However, it is not unreasonable for the compiler to recognize this
> situation and deal. Suppose I have some code generic in any-T:
>
> <any T> void foo(Collection<T> c) {
>     T[] arr = c.toArray();
> }
>
> Now, the compiler doesn't know whether c is a collection of refs or
> values, but it knows it's one or the other (ref T and val T form a
> partition of any T). So it could (and in some cases, has to anyway)
> do type checking by parts -- it can typecheck the above assignment
> under the assumption of ref T, and do it again under the assumption of
> val T, and if both succeed (and something else, see below), accept the
> method invocation as valid. (In this case, the ref-T fork should
> result in an unchecked warning, meaning that the merged checking also
> yields an unchecked warning.)
>
> The "something else" part is: when doing overload resolution by parts,
> both branches must resolve to overloads that are erasure-equivalent to
> each other. Which is true for toArray() (and for all the cases for
> which superation would work.)
>
> Now, this is a lot of handwaving, and it doesn't even really describe
> how we think partial methods should actually work (I'd like to get rid
> of the where-val-T slices entirely, but this is a separate discussion.)
> But it's a sketch of an option that achieves the positive result of
> superation without engaging the complexity of superation.
>
>