Migrating methods in Collections
Brian Goetz
brian.goetz at oracle.com
Wed Dec 16 17:18:27 UTC 2015
I'd like to start with the "What about Collections" discussion. While
this topic is inextricably linked to all the other topics (value types,
generic specialization, nullability, etc), to avoid getting overwhelmed,
let's start by trying to keep our focus on the Collections API and the
process of migrating it into an anyfied world. So, I'll offer an
ahead-of-time warning: there's going to be lots here that you will want
to question, and to many of those questions I'm going to answer "OK, but
let's come back to that later", because I'd like to stay focused for the
time being on the Collection-related issues. (Let's also try to create
separate threads for separate discussion topics, though I know this is
harder to remember to do.)
Most of these are already prototyped in some form in the Valhalla repos
(which at least demonstrates that they are plausible.)
BackgroundMaterial
-------------------
The last "State of the Specialization"
(http://cr.openjdk.java.net/~briangoetz/valhalla/specialization.html,
Dec 2014) outlines an early approach towards the specialization
challenges and features. My talk at JVMLS 2015
(https://www.youtube.com/watch?v=uNgAFSUXuwc) outlines the progress that
had been made between last December and August. The State of the Values
(http://cr.openjdk.java.net/~jrose/values/values-0.html) outlines the
basic model for value types. If you've not read/viewed these, that's a
good place to start.
Core Goals
----------
The key requirement that drove many of the design choices in Java 5 when
adding generics is *gradual migration compatibility*. This means it
must be possible to evolve a non-generic class to be generic in a manner
that is binary-compatible and source-compatible for both clients and
subtypes. So an existing client or subtype should continue to work
whether it is recompiled or not, and generified or not, and it should be
equally practical to generify subtypes and clients at the same time as a
library class, or later, or never.
We adopt similar goals for "anyfication" -- anyfying a class should be
binary- and source-compatible for clients and subclasses, and it should
be possible to anyfy a generic class without requiring that either its
clients or subclasses be anyfied, and whether or not they are recompiled.
Basic Model, Ultra-summarized
-----------------------------
Just as the requirement of gradual migration compatibility was a
significant design constraint in Java 5 (and which pushed us strongly
towards erasure), this requirement is going to be a significant
constraint here. Essentially, this means that parameterizations like
ArrayList<String> need to continue to be erased (otherwise they'd not be
binary compatible), but parameterizations like ArrayList<int> need to be
reified (since we cannot erase int to Object without boxing, and if
boxing were good enough, we wouldn't be bothering at all.) This pushes
us towards an interpretation of anyfied generics that is erased over
reference instantiations and reified over value instantiations.
At the same time, there is code that is valid under the assumption that
T <: Object that would not be valid if T can take on 'int'. These
include domain assumptions (e.g., assignment to null) and identity-based
assumptions (a T can be used as a monitor lock.) Accordingly, we cannot
simply reinterpret existing reference-generic code to be any-generic; we
need some indication from the user that they are opting into the
broadened genericity domain (and therefore willing to accept the
limitations of this broadened domain.) Our working model for this is to
annotate the declaration of a type variable with an "any" modifier
(e.g., "class Foo<any T>"). We sometimes use the abbreviation 'tvar' to
describe a type variable, and 'avar' to describe an any-type variable
(similarly, 'ivar' for an inference type variable.)
Conceptually, the enhanced generics model is simple:
- Non-anyfied generic code means exactly the same thing in Valhalla as
it does in Java 5-9;
- A reference instantiation of an anyfied generic class means exactly
the same thing in Valhalla as it does under erased generics;
- A class or method tvar declaration can be annotated with 'any', in
which case it can range over value types (including primitives) as well
as reference types. There are some restrictions on what you can do to
an avar-typed variable to reflect the fact that values may not be
nullable, have no monitor locks, that they don't participate in a
subtyping relationship with Object (instead, they have a boxing
conversion to Object), and that a V[] is not a subtype of Object[];
- There is a new wildcard type, Foo<any>, which is a supertype of both
value and reference instantiations of Foo.
The enhanced model embraces erasure more explicitly than the current model.
Migration Challenges
--------------------
Ideally, we could just sprinkle 'any' over our codebase, and be done --
and hopefully for many classes, this is all that will be required. But
for some classes, there may be migration challenges, which come in two
forms:
- A method signature is fundamentally incompatible with anyfication
(such as Map.get(), which returns null when the mapping is not present,
which makes no sense if V=int);
- The existing implementation appeals to assumptions of object-ness,
and needs to be adjusted.
For the purposes of this memo, I'm going to restrict myself to the first
category; we'll come back to the second category later. Here's a
more-or-less complete list of methods in Collections that may need some
sort of migration help. A blank cell indicates "same as above."
*Class**
* *Method**
* *Issues**
*
Collection
contains(Object)
Assumes all Rs are castable to Object without boxing.
remove(Object)
removeAll(Collection<?>)
Should take Collection<? extends E>. Collection<?> not a supertype of
all instantiations.
retainAll(Collection<?>)
containsAll(Collection<?>)
toArray()
Returns Object[], which is not a supertype of V[] for any value type V.
toArray(T[])
Not strictly problematic, but relies on runtime checking that E <: T
and on reflection for implementation.
List
remove(int)
Not strictly problematic, but if remove(Object) becomes remove(T), will
be a confusing overload.
indexOf(Object)
Same as contains(Object). Also, would be better as a lambda-consuming
method, now that we have lambdas.
lastIndexOf(Object)
Map
containsKey(Object)
Same as contains(Object)
containsValue(Object)
remove(Object)
put(K,V)
Returns what was there before, or null if there was no mapping. Not
strictly problematic, but doesn't project so well to non-nullable Vs.
putIfAbsent, replace, computeIfAbsent
get(K)
Uses null to signal no mapping
getOrDefault(Object, V)
Same as contains(Object)
Queue
poll(),peek() Uses null to signal no element
Deque
poll(), peek()
pollFirst(), pollLast()
peekFirst(), peekLast()
removeFirstOccurrence(), removeLastOccurrence()
Same as remove(Object)
It has often been suggested that "maybe it is time for Collections 2.0";
some would like to declare this to be the end of the road for existing
collections. However, redoing collections is only part of the battle;
the Collection types are riddled throughout the JDK and other libraries,
so we would need a mechanism for migrating all the methods that
consume/dispense collections, and if we were to build new collections,
saddling them with compatibility with old collections would be a big
stone to hang around their neck. So I think this path is less appealing
that it might first appear. However, the techniques described here
allow us to fix some of the errors of the past.
Partial Methods
---------------
Our approach is guided by the following observation: not all methods
declared in a generic class Foo<any T> need be members of all
instantiations of Foo; it is reasonable for some members to be
restricted to certain instantiations. In particular, the instantiation
Foo<all type vars instantiated with erased ref> is an important
distinguished instantiation.
If a method m() in Foo<any T> is a member of all instantiations of
Foo<T>, we say m() is a *total* method. Otherwise we call it a
*restricted* or *partial* method. For each method in an existing
generic class to be anyfied, it could be made a total method in the new
anyfied class, or it could be restricted to reference instantiations --
and either approach is fully compatible with existing clients and
subclasses. This provides us a migration path to "leave behind" certain
problematic methods (making them members only of reference
instantiations), and also to add new methods as long as there is a
default implementation at least for reference instantiations.
As a simple illustration of this approach, let's say that we want to
effectively rename the method List.remove(int) to removeAt(int).
Clearly we cannot simply take away remove(int); existing clients would
fail to link. Similarly we cannot simply add a new method removeAt(int)
without a default; then existing subclasses would fail to compile.
What we do here is (using a strawman syntax) add a new total method,
with a ref default, and demote the existing method to a partial method
on ref instantiations:
interface List<any T> {
// A new, total method
void removeAt(int i);
// demote the old method to ref-only
<where ref T> // Just a strawman syntax
void remove(int i);
// give the new method a default in terms of the old, for ref
<where ref T>
default void removeAt(int i) { remove(i); }
}
Now:
- Clients of List<ref T> can invoke either remove(int) or
removeAt(int), and both will do the same thing; this allows existing
clients to (if they want) migrate away from the old method completely,
since removeAt(int) is total, or keep using the old method, at their
choice (IDEs can also provide migration help);
- New clients of List<any T> can only use removeAt(int) -- they won't
even *see* remove(int) when they ask their IDE to auto-complete -- and
this is not a compatibility issue as there were no existing clients of
List<any T>;
- Existing (ref) subclasses of List<T> still see the same set of
abstract methods, and therefore continue to work exactly as before;
- When anyfying a subclass of List<T>, now you have to provide a new
implementation for removeAt(int), but this is OK because *you* have
decided to anyfy your class, and it is reasonable to expect some
possible code changes in this situation;
- Further, AbstractList can insulate most subclasses from this change.
This gives us at least two tactics for dealing with problematic methods:
- Migration: leave the old method in the "ref layer", and create a new
total method with a ref-default in the "any layer", as with the removeAt
example above;
- Abandonment: leave the old method in the ref layer, and do nothing
else, which may be suitable if an alternate idiom already exists (e.g.,
we could abandon removeAll because now we can express removeAll(c) as
removeIf(c::contains), without adding any new method.)
With the controlled-migration option, the primary impediment is that
many of the good names are already taken.
These two tactics get us much of the way there, but not all the way
there. I'm going to close this note with where these tactics get us,
and then open a separate note for some of the options for the remaining
cases. The following table shows some possibilities. There's plenty of
time to bikeshed on the names.
*Class**
* *Method**
* *Possible Approaches**
*
Collection
contains(Object)
Migrate to containsElement(E)
remove(Object)
Migrate to removeElement(E)
removeAll(Collection<?>)
Migrate to removeElements(Collection<? extends T>).
Alternately, abandon in favor of existing removeIf(Predicate<? super E>).
retainAll(Collection<?>)
Same
containsAll(Collection<?>)
Migrate to containsElements(Collection<? extends E>).
toArray()
Nothing good yet ...
toArray(T[])
Leave as is, or abandon in favor of new method
toArray(IntFunction<E[]>), like Stream has.
List
remove(int)
Leave as is, or migrate to removeAt(int).
indexOf(Object)
Migrate to Optional-bearing findFirst(Predicate<? super E>)
lastIndexOf(Object)
Migrate to findLast(Predicate<? super E>)
Map
containsKey(Object)
Migrate to hasKey(K)
containsValue(Object)
Migrate to hasValue(V)
remove(Object)
Migrate to removeMapping(K)
put(K,V)
Leave as is; accept that we cannot distinguish between "nothing was
there before" and "default value was there before" (as is true with null
today.)
get(K)
Migrate to one (or all) of:
Optional<V> map(K)
mapOrElse(K, V)
tryMap(K, Consumer<? super V>)
getOrDefault(Object, V)
Migrate to mapOrElse, as above
Queue
poll(), peek() Migrate to tryPoll(Consumer) *or* optional-bearing method
Deque
poll(), peek()
pollFirst(), pollLast()
peekFirst(), peekLast()
removeFirstOccurrence(), removeLastOccurrence()
Migrate to predicate-accepting method, or simply migrate to new name
Notes:
For Collection.contains() and remove(), while it may be tempting to try
and narrow the argument type from Object to T, this is likely
problematic. In the following case, we can't express something that
seems pretty natural:
Collection<Animal> animals = ...
Collection<Dog> dogs = ...
Dog d = ...
// Can't say these
for (Animal a : animals)
if (dogs.contains(a)) ...
animals.removeIf(a -> dogs.contains(a))
Its actually pretty convenient for contains/remove to accept anything
that might be a member of the collection, but Object is not the top type
we're looking for here (since values require boxing to convert to
Object.) So neither the status quo (accept Object) nor recasting to a
T-accepting method is all that great. (But see next memo for an
alternative.) Its not clear whether all the other Object-accepting
methods need the same treatment, or if migration to an E-accepting
method is acceptable there.
For removeAll/retainAll, we probably just want to abandon in favor of
the more powerful removeIf added in 8. (Kevin/Louis -- would be good to
get stats on actual usage of removeAll/retainAll.)
For Map.get(), it really sucks that the good name is taken but the
existing signature is terminally polluted with nullness (even for
non-null-supporting maps.) We can migrate to multiple new methods (most
of which have defaults in terms of the others):
Optional<V> map(K k)
V mapOrElse(K k, V defaultV)
boolean tryMap(K k, Consumer<? super V>)
With Optional as a value type, there is essentially no cost (either
footprint or invocation overhead) for using Optional over a naked
reference. So the first sig is not as problematic as it might look.
The second has an obvious default in terms of the first. The Optional
version, though, has an issue: if the map can contain null values,
there's no way to express that. However, I think its OK if this method
throws on null values -- the other three get-like methods (the two here
plus legacy get()) can all express null values. (So null-lovers don't
get to enjoy the Optional goodness. More motivation for them to give up
on their nulls!)
More information about the valhalla-spec-experts
mailing list