Array covariance
Remi Forax
forax at univ-mlv.fr
Thu Oct 25 19:46:58 UTC 2018
> De: "Brian Goetz" <brian.goetz at oracle.com>
> À: "valhalla-spec-experts" <valhalla-spec-experts at openjdk.java.net>
> Envoyé: Jeudi 25 Octobre 2018 21:04:30
> Objet: Array covariance
> Since the Burlington meeting, Simms has been prodding me to offer an alternative
> to supporting array covariance for value arrays. John and I kicked around the
> following idea yesterday. At this point, I’d call it a half-baked sketch, but
> so far it seems promising.
> Why covariance at all?
> First, let’s talk about why arrays in Java are covariant in the first place. And
> the reason is, unless you have either generics or covariant arrays, you can’t
> write methods that operate on all (or almost all) array types, like
> Arrays.sort(). With covariance, we can write
> void sort(Object[], Comparator) { … }
> and we can sort any (reference) array. I wasn’t in the room, but I suspect that
> array covariance was something that was grudgingly accepted because it was the
> only way to write code that worked on arbitrary arrays.
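A quick runnable sketch of that point (class and method names illustrative): with covariance, one sort method covers every reference array, and the commented-out lines show where primitive arrays fall out of the model.

    import java.util.Arrays;
    import java.util.Comparator;

    class CovarianceDemo {
        // One method for all reference arrays, thanks to covariance.
        static void sortAll(Object[] a, Comparator<Object> cmp) {
            Arrays.sort(a, cmp);
        }

        public static void main(String[] args) {
            String[] names = { "b", "a" };
            sortAll(names, Comparator.comparing(Object::toString)); // ok: String[] <: Object[]
            System.out.println(Arrays.toString(names));             // [a, b]
            // int[] nums = { 2, 1 };
            // sortAll(nums, ...);   // does not compile: int[] is not an Object[]
        }
    }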
> Now, imagine if we had (invariant) generics in Java 1.0. Then we would have
> written
> <T> void sort(T[], Comparator<T>) { … }
> This makes more sense from a user-model perspective, but what do we erase T[]
> to? Unless we also had covariant arrays, we’d not be able to erase T[] to
> Object[], but there may be a smaller hammer we can use than covariant arrays
> for this.
> Covariance isn’t bad in itself, but the limitations of Java 1.0 arrays belie the
> problem — layout polymorphism. Covariance among arrays of Object subtypes is
> fine because they all have the same layout, but primitive arrays were left out
> from the beginning. We could extend covariance to value/primitive arrays —
> we’ve proved it’s possible — but there’s a cost, which is largely: we take what
> is a clean and predictable performance model for array access, and pollute it
> with the possibility of megamorphic access sites.
It's not megamorphic, because there is common code to get the component size for any kind of array, value type arrays and object type arrays alike; arraycopy already does that.
The issue with a megamorphic method call is that you lose inlining. Here, the access is slower, but you are not going to go dark the way you do with a method call.
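A sketch of that point in today's Java, assuming the input is an array (the size table is illustrative; a real VM reads the element size from the array's class metadata with straight-line code, not a virtual call):

    // Component size is derivable from the array's class; no virtual dispatch needed.
    static int elementSize(Object array) {
        Class<?> ct = array.getClass().getComponentType();
        if (ct == byte.class  || ct == boolean.class) return 1;
        if (ct == short.class || ct == char.class)    return 2;
        if (ct == int.class   || ct == float.class)   return 4;
        if (ct == long.class  || ct == double.class)  return 8;
        return 8; // references (and flattened values) take their size from the layout
    }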
> What else is missing about arrays?
> Just as primitives are outside of the generic type system (we can’t say
> ArrayList<int> , yet), so are arrays. There’s no way to express “any array” in
> the type system; methods like System.arraycopy are forced to use Object as a
> proxy, and then dynamically ensure that what’s passed is actually an array:
> void arraycopy(Object sourceArray, int sourceStart,
>                Object destArray, int destStart, int length)
> It would be much nicer if we could say AnyArray instead of Object here; it would
> be saying what we mean, and would allow us to move some type checking from
> runtime to compilation time. (It also might give us a path to not having to
> write nine versions of Arrays.fill().)
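That dynamic check is easy to demonstrate today (a minimal sketch):

    class ArrayCopyCheckDemo {
        public static void main(String[] args) {
            Object notAnArray = "hello";
            try {
                System.arraycopy(notAnArray, 0, new int[1], 0, 1);
            } catch (ArrayStoreException e) {
                // the mistake is caught at run time, not at compile time
                System.out.println("rejected only at run time");
            }
        }
    }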
> Similarly, there are lots of operations on arrays that are painful, ad-hoc, and
> require VM intrinsification to perform well, like Arrays.copyOf().
> (Plus, there’s the usual litany of array problems — no final elements, no
> volatile elements, fixed layout, etc.)
> Arrays are primitive
> The reason for having arrays in the language and VM is the same reason we have
> primitives — they are a compromise of OO principles to gain a practical
> performance model. There are few good things about arrays, but one of them is that
> they have simple, transparent cost models in time and space. Polluting Object[]
> with layout-polymorphism compromises the cost model.
Note that you have exactly the same issue with fields and inheritance. If you have

    class A<any T> {
        T field;
    }
    class B<any T> extends A<T> {
        String s;
    }

then B<String>.s and B<long>.s may not be at the same offset in memory, so some getfield opcodes are layout-polymorphic too.
> The analogy we should be thinking of is:
>> array is-to collection as primitive is-to object
> That is, arrays are the ground types that our higher-typed programs bottom out
> in (or specialize to), not necessarily the types we want in people’s faces.
> What would arrays look like in Valhalla?
> Obviously, we’d like for value arrays to be supported at least as well as other
> arrays. Right now, without array covariance, we’re back in the same situation
> the Java 1.0 designers were in — that if you want to write Arrays.sort(), you
> need covariance. Currently we have no way to extend the various nine-way method
> sets, even if we were willing to write a tenth method.
> But, in Valhalla, we don’t want 9- or 10-way method sets — we want a single
> method. We want Arrays.fill(), for example, to be something like:
> <any T> void fill(T[] array, T element)
> and not have to have eight siblings to clean up the primitive cases.
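For reference, the nine-way set in java.util.Arrays that this single method would collapse:

    void fill(boolean[] a, boolean val)    void fill(byte[] a, byte val)
    void fill(char[] a, char val)          void fill(short[] a, short val)
    void fill(int[] a, int val)            void fill(long[] a, long val)
    void fill(float[] a, float val)        void fill(double[] a, double val)
    void fill(Object[] a, Object val)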
> But, even this syntax is paying homage to the irregularity of
> array-as-primitive. I think what we’d really want is something like
> interface NativeArray<any T> {
>     int length();
>     T get(int index);
>     void set(int index, T val);
>     Class<T> componentType();
> }
> and then we’d retrofit Foo[] to implement NativeArray<Foo>. (We’d likely make
> NA a restricted interface, one that is only implemented by native arrays, for now.)
> Then, our single fill method could be (in LW100):
> <any T> void fill(NativeArray<T> array, T element)
> (Impatient readers will want to point out that we could generify over the index
> parameter as well. One impossible problem at a time.)
> And similarly, arraycopy can be:
> void arraycopy(NativeArray src, int srcStart,
>                NativeArray dest, int destStart, int len)
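The shape of that interface can be approximated today with core reflection; a hand-rolled stand-in (the name AnyArray and the whole class are hypothetical, and the real NativeArray would be VM-injected with none of the reflective overhead):

    import java.lang.reflect.Array;

    final class AnyArray<T> {
        private final Object array;  // any array: Object[], int[], a value array...

        AnyArray(Object array) {
            if (!array.getClass().isArray())
                throw new IllegalArgumentException("not an array: " + array);
            this.array = array;
        }

        int length()               { return Array.getLength(array); }
        @SuppressWarnings("unchecked")
        T get(int index)           { return (T) Array.get(array, index); }  // boxes primitives
        void set(int index, T val) { Array.set(array, index, val); }        // unboxes as needed
        Class<?> componentType()   { return array.getClass().getComponentType(); }
    }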
> Bold claim: In such a world, there is no need for arrays to be covariant. If
> you want to operate on more than one kind of array, use generics. If you want
> covariance, say NativeArray<? extends T>.
> Getting there in steps
> Suppose we go in steps; let’s start with an erased interface, suitable for
> later generification:
> interface NativeArray {
>     int length();
>     Object get(int index);
>     void set(int index, Object val);
>     Class<?> componentType();
> }
> We make NA a restricted interface (reject any classes that explicitly try to
> implement it at class load time, as if it were sealed), and inject it as a
> supertype into all existing array types (including primitives and values.) The
> language overloads the [] operators to bind to the get and set methods and
> continues to simulate the synthetic length field. Now, someone who wants to
> write just one version of Arrays.fill can do so:
> void fill(NativeArray a, Object element) {
>     for (int i = 0; i < a.length; i++)
>         a[i] = element;
> }
> Note that this works for all arrays now — values, primitives, objects, with some
> boxing. OK, progress.
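A runnable approximation of that erased fill, substituting Object for the not-yet-existing NativeArray and java.lang.reflect.Array for the overloaded [] operators:

    import java.lang.reflect.Array;

    class FillDemo {
        // One fill for every array type, with boxing at the boundary.
        static void fill(Object a, Object element) {
            for (int i = 0, n = Array.getLength(a); i < n; i++)
                Array.set(a, i, element);   // unboxes for primitive arrays
        }

        public static void main(String[] args) {
            int[] ints = new int[3];
            fill(ints, 42);                 // primitives work, via boxing
            String[] strings = new String[3];
            fill(strings, "x");
            System.out.println(ints[0] + " " + strings[0]);  // prints: 42 x
        }
    }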
> One obvious wart in the above is that we’ve taken a step back in terms of type
> safety. We’d like for NA to be generic in its element type, but we can’t yet
> use values or primitives as type parameters. Suppose we said instead:
> interface NativeArray<T> {
>     int length();
>     T get(int index);
>     void set(int index, T val);
>     Class<T> componentType();
> }
> and treated Foo[] (for reference Foo) as implementing NA<Foo>, and for values
> and primitives, treat V[] as implementing NA<V.box>. This makes our erased
> generics story work for:
> <T> void fill(NativeArray<T> a, T element)
> but lays down some potholes for migration to specialized generics, as when we
> get to Valhalla, we’d like for V[] to implement Array<V>, not Array<V.box>.
> So, this is a hole to fill, but it seems a promising direction.
> The result is that we have monomorphic type profiles at aaload sites, and
> potentially polymorphic profiles at invokeinterface NativeArray.get sites —
> which is more justifiable than polluting aaload profiles. And when we get to
> full specialization, many of these polymorphic profiles may flatten out (as we
> can safely speculate on T[] at specialized NativeArray<T> sites.)
> Overloading
> The NA story works nicely with existing overload rules too. If we currently
> have:
> fill(int[], int)
> fill(Object[], Object)
> we can add in an overload
> <T> fill(NativeArray<T>, T)
> Existing source and binary code that wants one of the original methods will
> continue to get it, as Object[] is more specific than NativeArray<? extends
> Object> and so will be selected in preference to the NA version. So the new
> method would only be selected by value instantiations, since none of the
> existing versions are applicable.
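The specificity rule doing the work here is ordinary overload resolution, observable today with any subtype/supertype pair; in this illustrative sketch String stands in for Object[] and CharSequence for NativeArray:

    class OverloadDemo {
        static String pick(String s)       { return "specific"; }
        static String pick(CharSequence s) { return "general"; }

        public static void main(String[] args) {
            System.out.println(pick("hello"));                 // specific: existing callers keep their method
            System.out.println(pick((CharSequence) "hello"));  // general: only the new static type selects it
        }
    }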
> (Over time, we might want to remove some of the hand-specialized versions, and
> turn them into bridges for the specialized generic version, reducing us from 9
> methods to one method with 8 bridges.)
Or add fill() as an instance method on NativeArray with the right signature, and spread the news that people should use the instance methods instead of the static methods of java.util.Arrays to get full performance.
Only the methods toString(), equals() and hashCode() have compatibility issues on an array.
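Those three are stuck because arrays inherit the identity-based versions from Object, and existing code may depend on that. A quick illustration:

    import java.util.Arrays;

    class ArrayIdentityDemo {
        public static void main(String[] args) {
            int[] a = { 1, 2, 3 };
            System.out.println(a.equals(a.clone()));  // false: identity, not contents
            System.out.println(a.toString());         // something like [I@1b6d3586
            System.out.println(Arrays.toString(a));   // [1, 2, 3], the library workaround
        }
    }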
> Translation
> The real stumbling block is: what do we do with T[] in generic APIs. Currently,
> we erase these to array-of-erasure-of-T, so usually Object[]. This is one of
> the things pushing us towards covariance. But, can we change this translation?
> What if we erased T[] to NativeArray?
> Obviously, we’d have a binary compatibility issue, that we’d have to fill in
> with bridges. And these bridges might conflict with actual overloads that take
> NativeArray:
> void m(T[] array) { … }
> void m(NativeArray array) { … }
> but, maybe since there’s no existing code that uses NA, that’s OK.
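Bridges of this flavor are already how erased generics preserve binary compatibility; a sketch of the familiar case, where javac emits a synthetic Object-taking method:

    class Box implements Comparable<Box> {
        @Override
        public int compareTo(Box other) { return 0; }

        // javac also emits, behind the scenes, the erased bridge:
        //   public int compareTo(Object other) { return compareTo((Box) other); }
        // Erasing T[] to NativeArray would call for analogous array-taking bridges.
    }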
> This triggers all the issues discussed recently regarding migration — there’s a
> 2x2 matrix of { source, binary } x { client, subclass } compatibility.
> When used in argument position, passing an Object[] where a NA is expected is
> fine, since Object[] is a subtype of NA. The problem would come when T[] is
> used in return position (or as a field.) But, these are fairly rare, and
> implementations (at least in the JDK) are well behaved:
> // Stream<T>
> T[] toArray(IntFunction<T[]> generator)
> // Arrays
> <T> T[] copyOf(T[] original, int newLength)
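Both are well behaved in that the runtime type of the returned array is supplied by the caller, for example:

    import java.util.Arrays;
    import java.util.stream.Stream;

    class ReturnPositionDemo {
        public static void main(String[] args) {
            String[] out  = Stream.of("a", "b").toArray(String[]::new);  // runtime type is String[]
            String[] copy = Arrays.copyOf(out, 4);                       // still String[], padded with nulls
            System.out.println(copy.length);                             // 4
        }
    }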
> So erasing to NA, with bridges, and performing the same generic type checks we
> do (perhaps with some extra casts) may be a viable path for managing the source
> compatibility aspects.
Class.getTypeParameters() being the black sheep!
> OK, so if we erase T[] to NativeArray, what does that buy us? It almost means we
> can write our Arrays.* methods with an extra overload:
> <T> void fill(NativeArray<T>, T)
> But, we don’t have generics over values yet, hrm. So this doesn’t quite get us
> to our Arrays.* methods, dang.
[...]
> Summary
> There are lots of holes to fill in, but:
> * We need a suitable translation target for T[] in specialized generics
> regardless. One path is full covariance (including for primitive arrays);
> another is to migrate the relatively few methods that truck in T[] to a
> different translation.
> * We are going to need some tools for signature migration no matter what. I’ve
> outlined two under separate cover; one is minty-bridges (where a method/field
> says “I am willing to respond to this alternate descriptor”), and the other is
> a technique for migrating signatures so that if subclasses override the old
> signature, we have a way to restore order. We are going to need these to get to
> L100 regardless.
> * We can borrow some bits from the future to provide a programming model for
> value arrays that is not tied to covariance, and that heals the rift between
> separate array types, which currently are completely unrelated. It also moves
> megamorphic call sites to where they belong — invokeinterface .
>
Transforming a polymorphic-layout issue into a megamorphic call issue seems like a regression to me, not a win.
Rémi