[vector] why 2-arrays are universal for scatter and gather

Thu Aug 23 21:51:52 UTC 2018

TL;DR: a[i][j] where a is scalar and i,j are vector is a
desirable universal addressing mode for the Vector API,
in addition to a[i] where a is a primitive array.

As is well known, Java has only one-dimensional arrays.
Java simulates higher dimensions by collecting the base
addresses of sub-arrays (rows, etc.) into one-dimensional
arrays of references to those sub-arrays.  Thus, indexing
a multi-dimensional array in Java consists of first obtaining
a sub-array, then indexing further into that sub-array.

Today's Vector API cannot express such a multi-step
indexing process for the simple reason that Vectors of
references do not yet exist.  Such vectors don't exist
because they would require the GC to trace references
through all occurrences of such vectors, including those
in hardware registers.  We simply haven't had time to
do this yet.  We might some day, but other use cases
are more urgent.  Anyway, there's no way to express
(directly) the vector equivalent of a[i][j] or a[i][j][k].

We can readily express the vector equivalent of a[i]
as long as the base address 'a' is scalar, not vector.
We can use gather and scatter instructions to load
and store from such an a[i] where i is a vector of
indexes.  I think there's a 1-1 mapping on x86,
if an addressing mode can be formed which expresses
the sum of the base address 'a' plus the array offset
(12 or 16 or some such), and if the indexes in the
vector can be scaled appropriately.

In JIT-centric terms, a single base address 'a' can
be combined into a vector of indexes with suitable
instruction selection, and without requiring reference
tracking through vectors.

What happens if the user want vectorized access
a[i][j] to a Java 2-dimensional array?  The initial
access a[i] would produce a vector of sub-array
references, but this can't be expressed.  The
obvious and useful alternative is to supply a
single vector primitive which takes *two* index
vectors, as well as a 2-D array 'a' (of primitives)
and performs both steps in one go.

The 2-D API call could go like this:

int[][] a = …;
var i = (some vector expression);
var j = (some other vector expression);
var x = i.getElement(a, i, j);

Why repeat i here?  The first occurrence
specifies the shape that both i and j must
have, while the second occurrences of i
and the sole occurrences of j supply the
actual lane data within the shape.
Alternatively, the second i could be dropped,
but it reads better this way.

(Alternatively, the leading vector could be
specified as a vector of the same shape as the
*result* vector.  That may have some technical
advantage, as well as allowing a better match-up
between the type of the 2-D array and the result.)

From the JIT's point of view, it needs to
perform two gather operations back to
back, with no GC safepoint in between.
The  first gather operation loads raw
references from a using the lane values
in i, creating a temporary vector of raw
references.  The temp vector is then
combined with the j vector to perform
vectorized indexing.  It is then used
to form a second gather operation on
the vector of effective addresses, yielding
the desired value.

The first (scalar) indexing and second
(vector) indexing operations need to do
range checking of course.  Vectorized
range check requires a gather on the
length fields of the sub-array objects
into a second temp vector register,
plus a compare between that temp
and j, plus a boolean reduce-and-test.

Note that there are three lane sizes in
play here:  The i and j vectors are ints,
as is the temp length vector.  The temp
reference vector is either ints (for compressed
oops) or longs (for regular machine words),
but it can be allocated as a vector of longs,
since in any case 64-bit arithmetic is needed
to form the final set of effective addresses.
The result vector is, in principle, a vector
of anything from byte to double, but in
practice will have 32- or 64-bit lanes.

All of this seems doable, but it leads to
the question of whether it's just an incremental
benefit, to be followed by ever-increasing labor
adding API points for 3-D and higher rank arrays.

The answer is no; it stops at 2-D.  The reason
for this ties back to the original reason we cannot
do as Java does and decompose a 2-D array
references into two 1-D references:  We cannot
denote the intermediate vector of sub-array
references.

A Java programmer who needs vectorized access
to a 3-D (or higher) array has an option to reduce
the access to a 2-D form, by flattening all but the
last two dimensions of the application array.

For example:

int[][][] a3 = (some voxels in Java form);
int[][] a3viewAs2 = new int[a3.length * a3[0].length][];
… copy subarray refs from a3 t oa3viewAs2 …
var x = i.getElement(a3viewAs2, computeIJFlat(i, j), k);

Here the a3 is assumed to be a rectangular array.
More irregular shapes are easy to handle, although
they require more complex logic to fold together the
leading indexes, where computeIJFlat is a simple
linear function.

The point we have got to is that 2-D scatter/gather
is a universal way to handle vectorized access to
any number of Java arrays simultaneously, while
1-D scatter/gather is inherently limited to a single
Java array.  A programmer who can use only 1-D
scatter/gather must copy all the data into one array,
with all the disadvantages that includes.  A programmer
who can use 2-D scatter/gather can keep the data
in whatever collection of arrays is useful, and
can perform any access to any or all of them,
as long as a flattened index of sub-arrays is
created.  It's already available, in the normal
case of Java 2-D arrays, but can easily be
created (without data copying) in other cases.

So 2-D scatter/gather is far more general than
1-D, and not badly limited compared to higher ranks.

— John