Array covariance
Brian Goetz
brian.goetz at oracle.com
Thu Oct 25 19:04:30 UTC 2018
Since the Burlington meeting, Simms has been prodding me to offer an
alternative to supporting array covariance for value arrays. John and I
kicked around the following idea yesterday. At this point, I’d call it a
half-baked sketch, but so far it seems promising.
Why covariance at all?
First, let’s talk about why arrays in Java are covariant in the first
place. And the reason is, unless you have either generics or covariant
arrays, you can’t write methods that operate on all (or almost all)
array types, like Arrays.sort(). With covariance, we can write
|void sort(Object[], Comparator) { … } |
and we can sort any (reference) array. I wasn’t in the room, but I
suspect that array covariance was something that was grudgingly accepted
because it was the only way to write code that worked on arbitrary arrays.
Now, imagine if we had (invariant) generics in Java 1.0. Then we would
have written
|<T> void sort(T[], Comparator<T>) { … } |
This makes more sense from a user-model perspective, but what do we
erase |T[]| to? Unless we also had covariant arrays, we’d not be able to
erase |T[]| to |Object[]|, but there may be a smaller hammer we can use
than covariant arrays for this.
Covariance isn’t bad in itself, but the limitations of Java 1.0 arrays
belie the problem — layout polymorphism. Covariance among arrays of
Object subtypes is fine because they all have the same layout, but
primitive arrays were left out from the beginning. We could extend
covariance to value/primitive arrays — we’ve prove its possible — but
there’s a cost, which is largely: we take what is a clean and
predictable performance model for array access, and pollute it with the
possibility of megamorphic access sites.
What else is missing about arrays?
Just as primitives are outside of the generic type system (we can’t say
|ArrayList<int>|, yet), so are arrays. There’s no way to express “any
array” in the type system; methods like |System.arraycopy| are forced to
use |Object| as a proxy, and then dynamically ensure that what’s passed
is actually an array:
|void arraycopy(Object sourceArray, int sourceStart, Object destArray,
int destStart, int length) |
It would be much nicer if we could say |AnyArray| instead of |Object|
here; it would be saying what we mean, and would allow us to move some
type checking from runtime to compilation time. (It also might give us a
path to not having to write nine versions of |Arrays.fill()|.)
Similarly, there are lots of operations on arrays that are painful,
ad-hoc, and require VM intrinsification to perform well, like
|Arrays.copyOf()|.
(Plus, there’s the usual litany of array problems — no final elements,
no volatile elements, fixed layout, etc.)
Arrays are primitive
The reason for having arrays in the language and VM is the same reason
we have primitives — they are a compromise of OO principles to gain a
practical performance model. There’s few good things about arrays, but
one of them is they have simple, transparent cost models in time and
space. Polluting |Object[]| with layout-polymorphism compromises the
cost model.
The analogy we should be thinking of is:
array is-to collection as primitive is-to object
That is, arrays are the ground types that our higher-typed programs
bottom out in (or specialize to), not necessarily the types we want in
people’s faces.
What would arrays look like in Valhalla?
Obviously, we’d like for value arrays to be supported at least as well
as other arrays. Right now, without array covariance, we’re back in the
same situation the Java 1.0 designers were — that if you want to write
|Arrays.sort()|, you need covariance. Currently we have no way to extend
the various nine-way method sets, even if we were willing to write a
tenth method.
But, in Valhalla, we don’t want 9- or 10-way method sets — we want a
single method. We want for |Arrays.fill()|, for example, to be something
like:
|<any T> void fill(T[] array, T element) |
and not have to have eight siblings to clean up the primitive cases.
But, even this syntax is paying homage to the irregularity of
array-as-primitive. I think what we’d really want is something like
|interface NativeArray<any T> { int length(); T get(int index); void
set(int index, T val); Class<T> componentType(); } |
and then we’d retrofit |Foo[]| to implement |NativeArray<Foo>|. (We’d
likely make NA a restricted interface, that is only implemented by
native arrays, for now.) Then, our single fill method could be (in LW100):
|<any T> void fill(NativeArray<T> array, T element) |
(Impatient readers will want to point out that we could generify over
the index parameter as well. One impossible problem at a time.)
And similarly, arraycopy can be:
|void arraycopy(NativeArray src, int srcStart, NativeArray dest, int
destStart, len) |
Bold claim: In such a world, /there is no need for arrays to be
covariant/. If you want to operate on more than one kind of array, use
generics. If you want covariance, say |NativeArray<? extends T>|.
Getting there in steps
Suppose we start in steps; let’s start with an erased interface,
suitable for later generification:
|interface NativeArray { int length(); Object get(int index); void
set(int index, Object val); Class<?> componentType(); } |
We make NA a restricted interface (reject any classes that explicitly
try to implement it at class load time, as if it were sealed), and
inject it as a super type into all existing array types (including
primitives and values.) The language overloads the |[]| operators to
bind to the |get| and |set| methods and continues to simulate the
synthetic |length| field. Now, someone who wants to write just one
version of |Arrays.fill| can do so:
|void fill(NativeArray a, Object element) { for (int i=0; i<a.length;
i++) a[i] = element; } |
Note that this works for all arrays now — values, primitives, object,
with some boxing. OK, progress.
One obvious wart in the above is that we’ve taken a step back in terms
of type safety. We’d like for NA to be generic in its element type, but
we can’t yet use values or primitives as type parameters. Suppose we
said instead:
|interface NativeArray<T> { int length(); T get(int index); void set(int
index, T val); Class<T> componentType(); } |
and treated |Foo[]| (for reference Foo) as implementing |NA<Foo>|, and
for values and primitives, treat |V[]| as implementing |NA<V.box>|. This
makes our erased generics story work for:
|<T> fill(NativeArray<T> a, T element) |
but lays down some potholes for migration to specialized generics, as
when we get to Valhalla, we’d like for |V[]| to implement |Array<V>|,
not |Array<V.box>|. So, this is a hole to fill, but it seems a promising
direction.
The result is that we have monomorphic type profiles at |aaload| sites,
and potentially polymorphic profiles at |invokeinterface
NativeArray.get| sites — which is more justifiable than polluting aaload
profiles. And when we get to full specialization, many of these
polymorphic profiles may flatten out (as we can safely speculate on
|T[]| at specialized |NativeArray<T>| sites.)
Overloading
The |NA| story works nicely with existing overload rules too. If we
currently have:
|fill(int[], int) fill(Object[], Object) |
we can add in an overload
|<T> fill(NativeArray<T>, T) |
Existing source and binary code that wants one of the original methods
will continue to get it, as |Object[]| is more specific than
|NativeArray<? extends Object>| and so will be selected in preference to
the NA version. So the new method would only be selected by value
instantiations, since none of the existing versions are applicable.
(Over time, we might want to remove some of the hand-specialized
versions, and turn them into bridges for the specialized generic
version, reducing us from 9 methods to one method with 8 bridges.)
Translation
The real stumbling block is: what do we do with |T[]| in generic APIs.
Currently, we erase these to array-of-erasure-of-T, so usually
|Object[]|. This is one of the things pushing us towards covariance.
But, can we change this translation? What if we erased |T[]| to
NativeArray?
Obviously, we’d have a binary compatibility issue, that we’d have to
fill in with bridges. And these bridges might conflict with actual
overloads that take |NativeArray|:
|void m(T[] array) { … } void m(NativeArray array) { … } |
but, maybe since there’s no existing code that uses NA, that’s OK.
This triggers all the issues discussed recently regarding migration —
there’s a 2x2 matrix of { source, binary } x { client, subclass }
compatibility.
When used in argument position, passing an |Object[]| where a NA is
expected is fine, since |Object[]| is a subtype of NA. The problem would
come when |T[]| is used in return position (or as a field.) But, these
are fairly rare, and implementations (at least in the JDK) are well behaved:
|// Stream<T> T[] toArray(IntFunction<T[]> generator) // Arrays <T> T[]
copyOf(T[] array) |
So erasing to NA, with bridges, and performing the same generic type
checks we do (perhaps with some extra casts) may be a viable path for
managing the source compatibility aspects.
OK, so if we erase |T[]| to NativeArray, what does that buy us? It
/almost/ means we can write our |Arrays.*| methods with an extra overload:
|<T> void fill(NativeArray<T>, T) |
But, we don’t have generics over values yet, hrm. So this doesn’t quite
get us to our Arrays.* methods, dang.
OK, let’s put this idea on hold for a bit.
Zag before you zig
We could write an erased version:
|void fill(NativeArray, Object) |
which will catch all the flat value arrays, but we lose the compile-time
type checking that the element type is correct.
But, we may be able to pull a move here. What if we were to allow
any-vars (in L10) /as long as the resulting signature was invariant in
T/? Then we could say (with erased generics!)
|<any T> void fill(NativeArray<T>, T.box element) |
because this would erase to |(NativeArray, Object)V|.In other words, a
small down payment on specialized generics (where it supports
compile-time type checking, but wouldn’t actually affect any bytecode
generation), might allow us to write the generic-over-arrays code we want.
So, the compiler would type-check that the passed value is an instance
of (or convertible to) the box type for which the passed array is an
array, and then erase the array to NativeArray and erase the element to
its box. This would even work for |int[]| arrays.
Then, in L100, we migrate NativeArray to be a true specializable
interface, and we migrate the |fill| method to
|<any T> void fill(NativeArray<T>, T element) |
Summary
There are lots of holes to fill in, but:
* We need a suitable translation target for |T[]| in specialized
generics regardless. One path is full covariance (including for
primitive arrays); another is to migrate the relatively few methods
that truck in |T[]| to a different translation.
* We are going to need some tools for signature migration no matter
what. I’ve outlined two under separate cover; one is minty-bridges
(where a method/field says “I am willing to respond to this
alternate descriptor”), and the other is a technique for migrating
signatures so that if subclasses override the old signature, we have
a way to restore order. We are going to need these to get to L100
regardless.
* We can borrow some bits from the future to provide a programming
model for value arrays that is not tied to covariance, and that
heals the rift between separate array types, which currently are
completely unrelated. It also moves megamorphic call sites to where
they belong — |invokeinterface|.
More information about the valhalla-spec-observers
mailing list