Migrating methods in Collections
Brian Goetz
brian.goetz at oracle.com
Wed Dec 23 15:52:25 UTC 2015
Some good thoughts, and some wishful thinking.
tl;dr Summary:
- I think its a stretch to say that equals() and contains() can be
anyfied while still accepting Object. I think there are linguistic
solutions so that existing Object-accepting code can continue to run
unchanged for reference instantiations, and that the signatures can be
generally rescued, but I think that's something different than "methods
accepting Objects can be anyfied."
- toArray() is indeed a problem. I believe that the same tools for
rescuing equals() can also probably be applied towards toArray().
> If methods accepting Object arguments can be anyfied, this removes
> some methods from your problem table: Collection.{contains(Object),
> remove(Object)}, List.{indexOf(Object), lastIndexOf(Object)} and
> Map.{containsKey(Object), containsValue(Object), remove(Object)}.
>
> I realize that there are still a bunch of unresolved issues
> in pulling this off. But ignoring them for now...
I agree the same solution should work for all these methods. But I
don't think we'll get to the point where the signature of equals() or
contains() simply accepts Object. Several major concerns:
- Boxing. If these methods accept Object, there is going to be some
degree of boxing that we can't eliminate. Whether this is "some" or "a
lot", I can't imagine getting it down to the point where we're
comfortable.
- Intrusion. Do we really want to ask authors to deal with Object in
Complex.equals()? I would think these methods would want to start with
a V and go from there, not have to reason about "if its anything other
than a boxed V, forget about it, otherwise cast and unbox." This is not
logic we want the user to have to write for each of these methods.
What we want, I think, is for the signature of those methods to be:
- x(Object) // for reference instantiations
- x(T) // for value instantiations
That Object is the erasure of T is a powerful connection we can hang our
hat on here. I think there are at least three linguistic approaches to
rescuing these methods:
- contravariant type args (<U super T>)
- some sort of peeling that treats x(T) and x(Object) as separate
methods, but usually defaults/bridges one of them, so you just have to
implement the appropriate one
- some way of expressing a signature that means "T when a value, or
Object when a reference"
All of these have cons, but we've got a long enough list to suggest that
there is *a* solution here, and maybe there's a better one if we pull on
that string some more.
So let's assume there's *some* way to write equals/contains/etc so the
right things happen. Your list above stands, except that there's still
some degree of migration.
> One natural follow-on question is that if we can anyfy contains, why
> can't we do so for containsAll(Collection<?>)? And similarly for
> removeAll(Collection<?>), retainAll(Collection<?>). In other words,
> is this or some variant allowed?
> <any T> boolean containsAll(Collection<T>)
Good thought! Gavin and I bashed our heads against this one for a while
about a year ago.
First, note that we only have three such methods:
remove/retain/containsAll. And we can "retire" two of them as being
inferior to removeIf. Which means there's just one method here to rescue.
If we have <U super T> vars, I think we can do the same trick. But the
other tricks don't work as well, because of a (sensible but frustrating)
limitation of old generics interop -- if you have a method with generics:
void foo(T t)
void moo(Foo<T> f)
you can do a "raw override"
void foo(Object t) // acceptable raw override
void moo(Foo f) // acceptable raw override
and that's fine, but you can't do the same with a wildcard:
void moo(Foo<?> f) // not OK
So this wouldn't be source-compatible for existing subclasses of
Collection.
However, its possible that the third variant in our candidate list above
-- which amounts to some way of writing the dependent type "if T is
erased, then the bound of T, otherwise T" -- might be able to get us
here. Or not. If this is the worst of our problems, we have already won.
> If so, the main remaining questions surround optionality of results,
> that I'll answer separately.
Right, there's a real space of API design here.
> Doing nothing about List.remove(index) seems to be legal option.
Yes, that's a legal option (just as today, you can overload foo(T) and
foo(String)). Not sure if it *should* be a legal option (at the very
least, the compiler should warn you of this, as it should also probably
with overloads that fail to follow a meet rule.)
> No
> existing code will encounter an ambiguity that is not already present
> due to autoboxing (for List<Integer>). New code using or implementing
> List<int> will need some way to disambiguate. But I think that some
> syntax will be needed to allow anyway. It might be nice introduce
> method removeAt to reduce need to use this syntax, but doesn't seem
> necessary?
Can you expand on what you might want for disambiguation here?
> About the two Collection toArray() methods:
>
> The no-arg version must return Object[]. I don't see how anyfying (in
> any way) can guarantee compatible results. The <T> T[] toArray(T[]
> array) version has worse problems: most current implementations use
> reflection if the argument array is not big enough (because there is
> no syntax for "new T[n]"). I don't see offhand how to compatibly
> mangle reflective code. Plus, the spec explicitly says that if the
> array is too large, a null is appended to elements. Null is of course
> not a legal value for non-ref types.
I think "null" can be compatibly replaced with "the default value for
the type", which is the same as "null" for all existing code. So that's
not a blocker. Reflection is harder, but its quite possible that this
will come out in the "specialization wash". If we can have an anyfied
version of Arrays.copyOf -- which seems doable -- then I think that
problem goes away too.
That said, maybe the second version of toArray() should be abandoned in
the ref layer for compatibility only, and we should add the new total method
T[] toArray(IntFunction<T[]> generator)
as we did with streams. (I think we should introduce this method
regardless, actually, for all the reasons that came up when we were
discussing it for streams. This is not a method we could have
(credibly) had in 1.2, but with lambdas in the language, its kind of a
no brainer.)
> I don't see a good alternative to leaving both forms of toArray as-is,
> and to box results -- requiring that even custom non-ref
> implementations do so. But this suggests that we should find some
> other way (possibly in a utility class) to create a val-type array of
> elements in a val-type collection.
Speaking only about *signatures* now, I think the same techniques that
allow us to rescue contains(Object) may do the same for toArray().
- <U super T> U[] toArray() could work;
- peeling into separate Object[] toArray() for ref / T[] toArray() for
val could work;
- expressing the dependent type (T.erased ? T.bound : T) would also work.
More information about the valhalla-spec-experts
mailing list