Migrating methods in Collections

Wed Dec 23 15:52:25 UTC 2015

Some good thoughts, and some wishful thinking.

tl;dr Summary:
  - I think it's a stretch to say that equals() and contains() can be 
anyfied while still accepting Object.  I think there are linguistic 
solutions so that existing Object-accepting code can continue to run 
unchanged for reference instantiations, and that the signatures can be 
generally rescued, but I think that's something different than "methods 
accepting Objects can be anyfied."
  - toArray() is indeed a problem.  I believe that the same tools for 
rescuing equals() can also probably be applied towards toArray().

> If methods accepting Object arguments can be anyfied, this removes
> some methods from your problem table: Collection.{contains(Object),
> remove(Object)}, List.{indexOf(Object), lastIndexOf(Object)} and
> Map.{containsKey(Object), containsValue(Object), remove(Object)}.
>
> I realize that there are still a bunch of unresolved issues
> in pulling this off. But ignoring them for now...

I agree the same solution should work for all these methods.  But I 
don't think we'll get to the point where the signature of equals() or 
contains() simply accepts Object.  Several major concerns:

  - Boxing.  If these methods accept Object, there is going to be some 
degree of boxing that we can't eliminate.  Whether this is "some" or "a 
lot", I can't imagine getting it down to the point where we're 
comfortable.

  - Intrusion.  Do we really want to ask authors to deal with Object in 
Complex.equals()?  I would think these methods would want to start with 
a V and go from there, not have to reason about "if it's anything other 
than a boxed V, forget about it, otherwise cast and unbox."  This is not 
logic we want the user to have to write for each of these methods.
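
For reference, this is roughly the boilerplate that an Object-accepting 
equals() demands today. It is only a sketch in current Java: Complex is 
written as an ordinary final class standing in for a value type, and its 
fields and the demo class are my own illustration.

```java
// Stand-in for a value type; in today's Java it is just a final class.
final class Complex {
    final double re;
    final double im;

    Complex(double re, double im) {
        this.re = re;
        this.im = im;
    }

    @Override
    public boolean equals(Object o) {
        // "If it's anything other than a Complex, forget about it..."
        if (!(o instanceof Complex)) {
            return false;
        }
        // "...otherwise cast", and only then compare the actual state.
        Complex other = (Complex) o;
        return Double.compare(re, other.re) == 0
            && Double.compare(im, other.im) == 0;
    }

    @Override
    public int hashCode() {
        return 31 * Double.hashCode(re) + Double.hashCode(im);
    }
}

class ComplexEqualsDemo {
    public static void main(String[] args) {
        System.out.println(new Complex(1, 2).equals(new Complex(1, 2))); // true
        System.out.println(new Complex(1, 2).equals("1+2i"));            // false
    }
}
```

Only the final comparison is about Complex; everything before it exists 
because the parameter is Object.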

What we want, I think, is for the signature of those methods to be:
  - x(Object) // for reference instantiations
  - x(T)      // for value instantiations

That Object is the erasure of T is a powerful connection we can hang our 
hat on here.  I think there are at least three linguistic approaches to 
rescuing these methods:

  - contravariant type args (<U super T>)
  - some sort of peeling that treats x(T) and x(Object) as separate 
methods, but usually defaults/bridges one of them, so you just have to 
implement the appropriate one
  - some way of expressing a signature that means "T when a value, or 
Object when a reference"

All of these have cons, but we've got a long enough list to suggest that 
there is *a* solution here, and maybe there's a better one if we pull on 
that string some more.
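
To make the second ("peeling") option a bit more concrete, here is a 
rough approximation in today's Java. It is not the proposed language 
feature: Bag, containsTyped(), elementType() and StringBag are 
hypothetical names, and the explicit Class token is only there because 
current Java has no other way to test the argument against T. The point 
is just the shape: the implementor writes the x(T) flavor and the 
x(Object) flavor is bridged.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical interface, for illustration only.
interface Bag<T> {
    // The "x(T)" flavor: the only method an implementor has to write.
    boolean containsTyped(T element);

    // Assumed runtime type token, needed by the bridge below.
    Class<T> elementType();

    // The "x(Object)" flavor, supplied as a default bridge.
    default boolean contains(Object o) {
        Class<T> type = elementType();
        return type.isInstance(o) && containsTyped(type.cast(o));
    }
}

final class StringBag implements Bag<String> {
    private final List<String> items;

    StringBag(String... items) {
        this.items = Arrays.asList(items);
    }

    @Override
    public boolean containsTyped(String element) {
        return items.contains(element);
    }

    @Override
    public Class<String> elementType() {
        return String.class;
    }
}

class PeelingDemo {
    public static void main(String[] args) {
        Bag<String> bag = new StringBag("a", "b");
        System.out.println(bag.contains("a")); // true
        System.out.println(bag.contains(42));  // false
    }
}
```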

So let's assume there's *some* way to write equals/contains/etc so the 
right things happen.  Your list above stands, except that there's still 
some degree of migration.

> One natural follow-on question is that if we can anyfy contains, why
> can't we do so for containsAll(Collection<?>)? And similarly for
> removeAll(Collection<?>), retainAll(Collection<?>). In other words,
> is this or some variant allowed?
>    <any T> boolean containsAll(Collection<T>)

Good thought!  Gavin and I bashed our heads against this one for a while 
about a year ago.

First, note that we only have three such methods: 
remove/retain/containsAll.  And we can "retire" two of them (removeAll 
and retainAll) as being inferior to removeIf.  Which means there's just 
one method here to rescue.

If we have <U super T> vars, I think we can do the same trick.  But the 
other tricks don't work as well, because of a (sensible but frustrating) 
limitation of old generics interop -- if you have a method with generics:

     void foo(T t)
     void moo(Foo<T> f)

you can do a "raw override"

     void foo(Object t)  // acceptable raw override
     void moo(Foo f)     // acceptable raw override

and that's fine, but you can't do the same with a wildcard:

     void moo(Foo<?> f)  // not OK

So this wouldn't be source-compatible for existing subclasses of 
Collection.
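
A small sketch of that limitation in current Java, assuming the type 
variable is declared on the method (as it would be for the proposed 
containsAll). Foo, Base and RawSub are placeholder names, and the methods 
return String rather than void only so the dispatch is observable.

```java
// Placeholder generic type, standing in for Collection.
class Foo<T> {
}

class Base {
    <T> String foo(T t) {
        return "Base.foo";
    }

    <T> String moo(Foo<T> f) {
        return "Base.moo";
    }
}

class RawSub extends Base {
    // Acceptable: the signature is the erasure of Base.foo.
    @Override
    String foo(Object t) {
        return "RawSub.foo";
    }

    // Acceptable: the signature is the erasure of Base.moo.
    @Override
    @SuppressWarnings("rawtypes")
    String moo(Foo f) {
        return "RawSub.moo";
    }
}

// Not acceptable: moo(Foo<?>) has the same erasure as Base.moo but is
// neither the same signature nor its erasure, so the compiler reports a
// name clash instead of treating it as an override.
//
// class WildcardSub extends Base {
//     String moo(Foo<?> f) { return "WildcardSub.moo"; }
// }

class RawOverrideDemo {
    public static void main(String[] args) {
        Base b = new RawSub();
        System.out.println(b.foo("x"));               // RawSub.foo
        System.out.println(b.moo(new Foo<String>())); // RawSub.moo
    }
}
```

Existing Collection subclasses are in the position of the commented-out 
class: they already declare containsAll(Collection<?>).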

However, it's possible that the third variant in our candidate list above 
-- which amounts to some way of writing the dependent type "if T is 
erased, then the bound of T, otherwise T" -- might be able to get us 
here.  Or not.  If this is the worst of our problems, we have already won.

> If so, the main remaining questions surround optionality of results,
> that I'll answer separately.

Right, there's a real space of API design here.

> Doing nothing about List.remove(index) seems to be a legal option.

Yes, that's a legal option (just as today, you can overload foo(T) and 
foo(String)).  Not sure if it *should* be a legal option (at the very 
least, the compiler should warn you about this, as it probably should 
for overloads that fail to follow a meet rule.)

> No
> existing code will encounter an ambiguity that is not already present
> due to autoboxing (for List<Integer>). New code using or implementing
> List<int> will need some way to disambiguate. But I think that some
> syntax will be needed to allow anyway.  It might be nice to introduce
> method removeAt to reduce the need to use this syntax, but doesn't seem
> necessary?

Can you expand on what you might want for disambiguation here?
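
For comparison, this is how the existing autoboxing ambiguity on 
List<Integer> is disambiguated today: the static type of the argument 
picks the overload. The helper names are mine; the List calls are the 
standard ones.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class RemoveOverloadDemo {
    // An int argument selects List.remove(int index).
    static List<Integer> removeByIndex(List<Integer> src, int index) {
        List<Integer> copy = new ArrayList<>(src);
        copy.remove(index);
        return copy;
    }

    // An Integer argument selects List.remove(Object o).
    static List<Integer> removeByValue(List<Integer> src, int value) {
        List<Integer> copy = new ArrayList<>(src);
        copy.remove(Integer.valueOf(value));
        return copy;
    }

    public static void main(String[] args) {
        List<Integer> list = Arrays.asList(10, 20, 30);
        System.out.println(removeByIndex(list, 1));  // [10, 30]
        System.out.println(removeByValue(list, 10)); // [20, 30]
    }
}
```

With List<int> there is no boxed form to fall back on, which is where 
the question of new disambiguation syntax (or a removeAt) comes from.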

> About the two Collection toArray() methods:
>
> The no-arg version must return Object[]. I don't see how anyfying (in
> any way) can guarantee compatible results.  The <T> T[] toArray(T[]
> array) version has worse problems: most current implementations use
> reflection if the argument array is not big enough (because there is
> no syntax for "new T[n]").  I don't see offhand how to compatibly
> mangle reflective code.  Plus, the spec explicitly says that if the
> array is too large, a null is appended to elements. Null is of course
> not a legal value for non-ref types.

I think "null" can be compatibly replaced with "the default value for 
the type", which is the same as "null" for all existing code.  So that's 
not a blocker.  Reflection is harder, but it's quite possible that this 
will come out in the "specialization wash".  If we can have an anyfied 
version of Arrays.copyOf -- which seems doable -- then I think that 
problem goes away too.
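
As a reminder of the current behavior being discussed, here is what 
ArrayList does today with an oversized and an undersized argument array. 
The demo class and helper are illustrative; the behavior is the one 
specified for Collection.toArray(T[]).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ToArrayPaddingDemo {
    static String[] copyInto(List<String> src, String[] target) {
        return src.toArray(target);
    }

    public static void main(String[] args) {
        List<String> list = new ArrayList<>(Arrays.asList("a", "b"));

        // Oversized array: it is reused, and the slot right after the
        // last element is set to null.
        String[] big = {"x", "x", "x", "x"};
        System.out.println(Arrays.toString(copyInto(list, big))); // [a, b, null, x]

        // Undersized array: a new array of the same runtime type is
        // allocated, which is where reflection comes in today.
        String[] small = new String[0];
        String[] fresh = copyInto(list, small);
        System.out.println(fresh != small);            // true
        System.out.println(Arrays.toString(fresh));    // [a, b]
    }
}
```

For a value instantiation, the null written after the last element 
would become the default value of the element type.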

That said, maybe the second version of toArray() should be left in the 
ref layer for compatibility only, and we should add the new total method

     T[] toArray(IntFunction<T[]> generator)

as we did with streams.  (I think we should introduce this method 
regardless, actually, for all the reasons that came up when we were 
discussing it for streams.  This is not a method we could have 
(credibly) had in 1.2, but with lambdas in the language, it's kind of a 
no-brainer.)
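
A minimal sketch of what the generator-based form buys us, written as a 
static helper so it runs on current Java. GeneratorToArray and its 
toArray method are hypothetical names, not an existing Collection 
method; the only library precedent assumed is Stream.toArray(IntFunction).

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.function.IntFunction;

class GeneratorToArray {
    // Hypothetical helper: the caller supplies the array constructor, so
    // no reflection and no null padding are needed.
    static <T> T[] toArray(Collection<? extends T> src, IntFunction<T[]> generator) {
        T[] out = generator.apply(src.size());
        int i = 0;
        for (T element : src) {
            out[i++] = element;
        }
        return out;
    }

    public static void main(String[] args) {
        String[] result = toArray(Arrays.asList("a", "b", "c"), String[]::new);
        System.out.println(Arrays.toString(result)); // [a, b, c]
    }
}
```

The array is always exactly the right size, so the method is total: 
there is no too-small or too-large case to specify.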

> I don't see a good alternative to leaving both forms of toArray as-is,
> and to box results -- requiring that even custom non-ref
> implementations do so. But this suggests that we should find some
> other way (possibly in a utility class) to create a val-type array of
> elements in a val-type collection.

Speaking only about *signatures* now, I think the same techniques that 
allow us to rescue contains(Object) may do the same for toArray().

  - <U super T> U[] toArray() could work;
  - peeling into separate Object[] toArray() for ref / T[] toArray() for 
val could work;
  - expressing the dependent type (T.erased ? T.bound : T) would also work.