Migrating methods in Collections

Wed Dec 16 19:43:04 UTC 2015

The previous memo outlined a tactic for effectively "migrating" a method 
in a current generic class to a related but different signature in an 
any-generic class, while retaining source and binary compatibility with 
existing clients and subclasses, and outlined the list of possibly 
problematic methods in the Collections framework.

The effectiveness of this tactic on this set of methods ranges from 
"slam-dunk" to "seems like it could work" to "doesn't help."  It would 
be nice to have one hammer that pounds down all the nails, but I'm not 
sure there is one.  This memo outlines a range of complementary tactics 
that might, in combination with the above approach, enable us to cover 
the waterfront.

I'll start with the method Collection.contains(Object).  It might at 
first seem that we want the signature here to be contains(E), but this 
gets in the way of cases like:

     dogs.contains(animal)

or of converting dogs::contains to a Predicate<? super Animal>(which is 
what filter/removeIf would want.)

Note that this has little to do directly with value types; value types 
are invariant.  But if we want a single contains() method that ranges 
over any E, it needs to accomodate variance for reference instantiations 
while not falling back on Object as a top type.

One technique for doing so would be to introduce contravariant inference 
variables.  Then we could write contains as:

     <U super E> boolean contains(U u)

This has three main downsides:
  - Even though this works, its still not that obvious.
  - It's a *lot* of work in the spec and compiler; it pushes on all the 
fragilebits.
  - If that weren't enough, it is a theoretical minefield. Papers like 
http://www.cis.upenn.edu/~bcpierce/papers/variance.pdf show that adding 
contravariance to certain type systems result in subtyping becoming 
undecidable.

On the other hand, I'm sure many library writers would jump for joy to 
have this in the toolbox; the lack of contravariant tvars seems a 
notable inconsistency in the language.  (But let's not kid ourselves 
about the costs.)

It just so happens that this construct works out; it is binary- and 
source- compatible to make a non-generic method generic, as long as the 
erasure of the signature remains the same.  So changing contains() or 
remove() as above would not cause subclasses or clients to fail to 
either link or recompile (in the subclass recompilation case, it would 
be reinterpreted as a raw override, which is allowed and compatible.)  
And we'd end up with a total method that does the right thing both for 
refs and values.

Ignoring the costs and risks, this technique applies to a number of the 
methods in our rogue's gallery, including toArray(), for which we didn't 
yet have a solution:

     <U super E> U[] toArray()

This is compatible with existing clients who are expecting an Object[] 
to come back from toArray() on a collection of reference types, and 
collapses to V[] for any value type, so Collection<int>.toArray() 
returns int[].  (We might still want an unchecked warning if the 
compiler infers U != Object for reference E, but that's a separate and 
easily handled consideration.)

Let's call this technique "superation" (yes, its an intentional 
(disgusting) pun.  See 
http://beta.merriam-webster.com/dictionary/suppurate.  And think about 
that the next time you pass a "Super 8" motel on the highway.)  With 
this in our toolbox, the strategy matrix becomes:

*Class**
* 	*Method**
* 	*Possible Approaches**
*
Collection
	contains(Object)
	Superateto <U super E> contains(U)

	remove(Object)

	removeAll(Collection<?>)
	Abandon in favor of existing removeIf(Predicate<? super E>).

	retainAll(Collection<?>)

	containsAll(Collection<?>)
	Migrate to containsElements(Collection<? extends E>), or abandon.

	toArray()
	Superate to <U super E> U[] toArray()

	toArray(T[])
	Leave as is, superate, or abandon.
List
	remove(int)
	Migrate to removeAt(int).

	indexOf(Object)
	Migrate to Optional-bearing findFirst(Predicate<? super E>)

	lastIndexOf(Object)

Map
	containsKey(Object)
	Superate

	containsValue(Object)
	Superate

	remove(Object)
	Superate

	put(K,V)
	Leave as is

	get(K)
	Migrate to one (or all) of:
Optional<V> map(K)
mapOrElse(K, V)
tryMap(K, Consumer<? super V>)

	getOrDefault(Object, V)
	Migrate to mapOrElse, or superate
Queue
	poll(), peek() 	Migrate to tryPoll(Consumer) *or* optional-bearing method
Deque
	poll(), peek()

	pollFirst(), pollLast()

	peekFirst(), peekLast()

	removeFirstOccurrence(), removeLastOccurrence()
	Migrate to predicate-accepting method, or superate

This is amore satisfying matrix; not only does everything have an 
acceptable strategy, but some have more than one, and the user impact of 
superating a method is lower (users might just not notice), so the 
perception is that fewer methods are affected.  Still, super-bounded 
tvars are a big hammer for such a small foe.  Maybe there's an alternate 
approach that has the effect of superation but doesn't need such a big 
hammer.

Here's one possibility.  We already have a notion of partial methods.  
We could have a pair of methods

     <where ref E>
     Object[] toArray()

     <where val E>
     E[] toArray()

both of which are reasonable signatures for their restricted domains.  
Unfortunately, the natural interpretation of this pair of methods is 
that the first is a member of Collection<ref T>, and the second is a 
member of Collection<val T>, but there is *no* toArray() method that is 
a member of Collection<any T>!  This means that code that is generic in 
any-T would not see a toArray() method at all.  That's a problem (though 
not as enormous as it initially sounds, there are possible mitigating 
techniques.)

However, it is not unreasonable for the compiler to recognize this 
situation and deal.  Suppose I have some code generic in any-T:

     <any T> void foo(Collection<T> c) {
         T[] arr = c.toArray();
     }

Now, the compiler doesn't know whether c is a collection of refs or 
values, but it knows it's one or the other (ref T and val T form a 
partition of any T).  So it could (and in some cases, has to anyway) do 
type checking by parts -- it can typecheck the above assignment under 
the assumption of ref T, and do it again under the assumption of val T, 
and if both succeed (and something else, see below), accept the method 
invocation as valid.  (In this case, the ref-T fork should result in an 
unchecked warning, meaning that the merged checking also yields an 
unchecked warning.)

The "something else" part is: when doing overload resolution by parts, 
both branches must resolve to overloads that are erasure-equivalent to 
each other.  Which is true for toArray() (and for all the cases for 
which superation would work.)

Now, this is a lot of handwaving, and it doesn't even really describe 
how we think partial methods should actually work (I'd like to get rid 
of the where-val-T slices entirely, this is a separate discussion.)  But 
its a sketch of an option that achieves the positive result of 
superation without engaging the complexity of superation.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openjdk.java.net/pipermail/valhalla-spec-experts/attachments/20151216/27037b17/attachment.html>