Array covariance

Thu Oct 25 19:04:30 UTC 2018

Since the Burlington meeting, Simms has been prodding me to offer an 
alternative to supporting array covariance for value arrays. John and I 
kicked around the following idea yesterday. At this point, I’d call it a 
half-baked sketch, but so far it seems promising.

        Why covariance at all?

First, let’s talk about why arrays in Java are covariant in the first 
place. And the reason is, unless you have either generics or covariant 
arrays, you can’t write methods that operate on all (or almost all) 
array types, like Arrays.sort(). With covariance, we can write

|void sort(Object[], Comparator) { … } |

and we can sort any (reference) array. I wasn’t in the room, but I 
suspect that array covariance was something that was grudgingly accepted 
because it was the only way to write code that worked on arbitrary arrays.

Now, imagine if we had (invariant) generics in Java 1.0. Then we would 
have written

|<T> void sort(T[], Comparator<T>) { … } |

This makes more sense from a user-model perspective, but what do we 
erase |T[]| to? Unless we also had covariant arrays, we’d not be able to 
erase |T[]| to |Object[]|, but there may be a smaller hammer we can use 
than covariant arrays for this.

Covariance isn’t bad in itself, but the limitations of Java 1.0 arrays 
belie the problem — layout polymorphism. Covariance among arrays of 
Object subtypes is fine because they all have the same layout, but 
primitive arrays were left out from the beginning. We could extend 
covariance to value/primitive arrays — we’ve prove its possible — but 
there’s a cost, which is largely: we take what is a clean and 
predictable performance model for array access, and pollute it with the 
possibility of megamorphic access sites.

        What else is missing about arrays?

Just as primitives are outside of the generic type system (we can’t say 
|ArrayList<int>|, yet), so are arrays. There’s no way to express “any 
array” in the type system; methods like |System.arraycopy| are forced to 
use |Object| as a proxy, and then dynamically ensure that what’s passed 
is actually an array:

|void arraycopy(Object sourceArray, int sourceStart, Object destArray, 
int destStart, int length) |

Similarly, there are lots of operations on arrays that are painful, 
ad-hoc, and require VM intrinsification to perform well, like 
|Arrays.copyOf()|.

(Plus, there’s the usual litany of array problems — no final elements, 
no volatile elements, fixed layout, etc.)

        Arrays are primitive

The reason for having arrays in the language and VM is the same reason 
we have primitives — they are a compromise of OO principles to gain a 
practical performance model. There’s few good things about arrays, but 
one of them is they have simple, transparent cost models in time and 
space. Polluting |Object[]| with layout-polymorphism compromises the 
cost model.

The analogy we should be thinking of is:

    array is-to collection as primitive is-to object

That is, arrays are the ground types that our higher-typed programs 
bottom out in (or specialize to), not necessarily the types we want in 
people’s faces.

        What would arrays look like in Valhalla?

Obviously, we’d like for value arrays to be supported at least as well 
as other arrays. Right now, without array covariance, we’re back in the 
same situation the Java 1.0 designers were — that if you want to write 
|Arrays.sort()|, you need covariance. Currently we have no way to extend 
the various nine-way method sets, even if we were willing to write a 
tenth method.

But, in Valhalla, we don’t want 9- or 10-way method sets — we want a 
single method. We want for |Arrays.fill()|, for example, to be something 
like:

|<any T> void fill(T[] array, T element) |

and not have to have eight siblings to clean up the primitive cases.

But, even this syntax is paying homage to the irregularity of 
array-as-primitive. I think what we’d really want is something like

|interface NativeArray<any T> { int length(); T get(int index); void 
set(int index, T val); Class<T> componentType(); } |

and then we’d retrofit |Foo[]| to implement |NativeArray<Foo>|. (We’d 
likely make NA a restricted interface, that is only implemented by 
native arrays, for now.) Then, our single fill method could be (in LW100):

|<any T> void fill(NativeArray<T> array, T element) |

(Impatient readers will want to point out that we could generify over 
the index parameter as well. One impossible problem at a time.)

And similarly, arraycopy can be:

|void arraycopy(NativeArray src, int srcStart, NativeArray dest, int 
destStart, len) |

Bold claim: In such a world, /there is no need for arrays to be 
covariant/. If you want to operate on more than one kind of array, use 
generics. If you want covariance, say |NativeArray<? extends T>|.

        Getting there in steps

Suppose we start in steps; let’s start with an erased interface, 
suitable for later generification:

|interface NativeArray { int length(); Object get(int index); void 
set(int index, Object val); Class<?> componentType(); } |

|void fill(NativeArray a, Object element) { for (int i=0; i<a.length; 
i++) a[i] = element; } |

Note that this works for all arrays now — values, primitives, object, 
with some boxing. OK, progress.

One obvious wart in the above is that we’ve taken a step back in terms 
of type safety. We’d like for NA to be generic in its element type, but 
we can’t yet use values or primitives as type parameters. Suppose we 
said instead:

|interface NativeArray<T> { int length(); T get(int index); void set(int 
index, T val); Class<T> componentType(); } |

|<T> fill(NativeArray<T> a, T element) |

        Overloading

The |NA| story works nicely with existing overload rules too. If we 
currently have:

|fill(int[], int) fill(Object[], Object) |

we can add in an overload

|<T> fill(NativeArray<T>, T) |

Existing source and binary code that wants one of the original methods 
will continue to get it, as |Object[]| is more specific than 
|NativeArray<? extends Object>| and so will be selected in preference to 
the NA version. So the new method would only be selected by value 
instantiations, since none of the existing versions are applicable.

(Over time, we might want to remove some of the hand-specialized 
versions, and turn them into bridges for the specialized generic 
version, reducing us from 9 methods to one method with 8 bridges.)

        Translation

Obviously, we’d have a binary compatibility issue, that we’d have to 
fill in with bridges. And these bridges might conflict with actual 
overloads that take |NativeArray|:

|void m(T[] array) { … } void m(NativeArray array) { … } |

but, maybe since there’s no existing code that uses NA, that’s OK.

This triggers all the issues discussed recently regarding migration — 
there’s a 2x2 matrix of { source, binary } x { client, subclass } 
compatibility.

|// Stream<T> T[] toArray(IntFunction<T[]> generator) // Arrays <T> T[] 
copyOf(T[] array) |

So erasing to NA, with bridges, and performing the same generic type 
checks we do (perhaps with some extra casts) may be a viable path for 
managing the source compatibility aspects.

OK, so if we erase |T[]| to NativeArray, what does that buy us? It 
/almost/ means we can write our |Arrays.*| methods with an extra overload:

|<T> void fill(NativeArray<T>, T) |

But, we don’t have generics over values yet, hrm. So this doesn’t quite 
get us to our Arrays.* methods, dang.

OK, let’s put this idea on hold for a bit.

        Zag before you zig

We could write an erased version:

|void fill(NativeArray, Object) |

which will catch all the flat value arrays, but we lose the compile-time 
type checking that the element type is correct.

But, we may be able to pull a move here. What if we were to allow 
any-vars (in L10) /as long as the resulting signature was invariant in 
T/? Then we could say (with erased generics!)

|<any T> void fill(NativeArray<T>, T.box element) |

because this would erase to |(NativeArray, Object)V|.In other words, a 
small down payment on specialized generics (where it supports 
compile-time type checking, but wouldn’t actually affect any bytecode 
generation), might allow us to write the generic-over-arrays code we want.

So, the compiler would type-check that the passed value is an instance 
of (or convertible to) the box type for which the passed array is an 
array, and then erase the array to NativeArray and erase the element to 
its box. This would even work for |int[]| arrays.

Then, in L100, we migrate NativeArray to be a true specializable 
interface, and we migrate the |fill| method to

|<any T> void fill(NativeArray<T>, T element) |

        Summary

There are lots of holes to fill in, but:

  * We need a suitable translation target for |T[]| in specialized
    generics regardless. One path is full covariance (including for
    primitive arrays); another is to migrate the relatively few methods
    that truck in |T[]| to a different translation.
  * We are going to need some tools for signature migration no matter
    what. I’ve outlined two under separate cover; one is minty-bridges
    (where a method/field says “I am willing to respond to this
    alternate descriptor”), and the other is a technique for migrating
    signatures so that if subclasses override the old signature, we have
    a way to restore order. We are going to need these to get to L100
    regardless.
  * We can borrow some bits from the future to provide a programming
    model for value arrays that is not tied to covariance, and that
    heals the rift between separate array types, which currently are
    completely unrelated. It also moves megamorphic call sites to where
    they belong — |invokeinterface|.