RFR JDK-8223347 Integration of Vector API (Incubator): Java API, implementation, and tests

Wed May 6 03:01:50 UTC 2020

On May 1, 2020, at 1:36 PM, Paul Sandoz <paul.sandoz at oracle.com> wrote:
> 
> Hi Remi,
> 
> Thanks for the feedback. I am gonna need to go over this with John. Some comments inline.
> 
> 
>> On Apr 20, 2020, at 6:31 AM, Remi Forax <forax at univ-mlv.fr> wrote:
>> 
>> Hi Paul,
>> about the API part (not the implementation),
>> there are location where the same concept is described with a different names which doesn't help to understand how thing work
>> - vector length <=> vector lane count
>> - vector shape <=> vector bits size
>> - element size <=> lane size
>> 
>> "size" should never be used (VectorSpecies.elementSize() by example ) because it's never clear if it's the byte size or the bits size, using byteSize and bitsSize as used several times in the API should be used everywhere.
> 
> I agree with being explicit here e.g. s/Vector.elementSize/Vector.elementBitSize and other usages.

The name “size” follows the precedent of the API point
Integer.SIZE, which is the size of an `int` in bits.  Although
bit-size and byte-size is an important thing to keep track of,
in the case of the Vector type, it lives primarily in registers,
like `int`, and only secondarily in memory (as `int[]` lives
primarily in memory).  When you need the size of a datum
in a register, you appeal to its bit count, not its byte count.

When you store in memory, then you might want its byte
size but for vectors that is an edge case.

So I don’t think we need bitSize, as a more explicit term for
size, for vectors and their lanes.  It is enough to say “size”.

> 
> I would prefer to consistently use length in methods to refer to the lane count, it matches more closely to an array and for thinking about bounds checking.

Yes, in this way (contrary to what I just said), a Vector is like
an array, and so it just has a length.  It’s a small array, and it
lives in a register, and it doesn’t support good random access,
but it has a length like an array.

> 
> How about on Vector:
> 
> /**
> * Returns number of vector lanes ({@code VLENGTH}), commonly referred to as the lane count.
> *
> * @return the number of vector lanes
> */
> public abstract int length();

Yes.  Although laneCount would be a runner-up for naming this API point.
But a Vector is enough like an array or String that giving it a length
is the least confusing choice.  With documentation that says somewhere
that a vector looks like a little array formed from its lanes, so its length
as a vector is the count of its lanes.

Also, the semi-formal symbol {@code VLENGTH} is employed throughout
the javadoc, and would have to change to something even uglier like
{@code VLANECOUNT} if we got rid of Vector::length.

> And then change VectorShape.laneCount to VectorShape.laneLength.
> 
> ?

That sounds wrong, as if it were reporting the length of a lane somewhere.

Lane-count is correct.  As the javadoc explains clearly, a vector
has a certain number of lanes.

> 
> 
>> 
>> Initially, having VectorSpecies and VectorShape confuse me, it's not the shape of the vector but only it's bits size,
>> renaming VectorShape to VectorBitsSize will help, i think. 
>> 
> 
> Shape is quite a fundamental concept specified and used in the API, representing the information capacity of a vector.
> 
> I think it would be misleading to refer to it only as VectorBitSize and unwieldy to change the specification accordingly. In this sense it serves as a useful abstraction to talk about vectors of the same bit size, and operations that change that size or not (such as shape-changing or shape-invariant).

More comments:

Shape also plays a critical role in the miscibility of vectors.
Two vectors can line up to perform a lane-wise operation if
and only if they have the same shape.  This is fundamental.
If we were to make these rules more “clever”, allowing
cross-shape lanewise operations, we’d give up on our current
performance levels, because the necessary dynamic checks
would put sand in the gears of the optimizer.

There are present and future hardware platforms, and software
abstractions, for which shape is more than just “the number of
my bits”.  The max-bits shape is a dynamically determined
shape which is not the same as any statically determined shape.
In the future, shapes may be distinguished according to whether
the involved vectors have intrinsic masks or not.  In the near
future, we might like to take the option of making synthetic
shapes, such as “a pair of 2-lane doubles of VLENGTH=4”,
which are often seen in other vector developer toolkits.

> 
>> You have the same values defined at several places in the API (sometimes with different names), usually it's aliases but it makes the API bigger that it should.
>> I think all the reference to the element type, vector byte size, vector bit size, element byte size, element byte size, vector lane count should not appear on Vector because they are accessible from Species,
>> and to go from a vector to the corresponding species, species() is enough.
>> 
> 
> But it also makes it less usable, esp. for length().

The following is not an obviously or universally correct principle:
“There should only be one way to do a given task.”  It is surely
not universally true for the Vector API.  When we point out that
“API point X is a duplicate of API point Y”, we may or may not
be willing to follow that up with “so get rid of API point X since
the programmer can reach for Y”.  If it’s our favorite utility
method X, we will defend it vigorously.  If it’s one we can’t
imagine using, we will suggest that it be voted off the island.
Only the most pure of purists will try to prove that some API
has the minimum number of points, and rest happily only
when that is the case.  When Remi objects to Vector::length
he is perhaps being such a purist, but very few programmers
like to live on such spartan foundations, in practice.

So utility methods like Vector::length, Vector::broadcast,
Vector::maskAll, Vector::toDoubleArray, etc. help make
common coding tasks easier.  The most important thing
to make easy is not reducing keystrokes (code golf) but
rather making code as clear as possible.  SKI combinators
are a minimal API for Turing computations, but even if
I loved to code with them, my code would not be reviewable
for correctness, and even I couldn’t debug it as well,
without some “convenience combinators” available.
Likewise, for vectors, it will be so common to loop over
the lanes of a vector, that adding just the tokens
“.species()” to every loop head will make those loops
harder to read and prove correct.  In short:  Code
as you like, but if you expect to review or debug it,
your code had better not have much extra “noise”
in it. This motivates convenience methods.  Most
of those in the Vector API have been placed there
because they made somebody’s job easier writing
clear Vector API code.

A second reason for the convenience methods, in
the Vector API, is discoverability.  We put add, sub,
mul, div, min, max, eq, le on Vector not because
that’s the only possible place for their unique
instantiation, but because they are friendly signs,
on the front door of the API (Vector) that it is
open for business.  When you noodle around with
a Vector in an API, you instantly see a familiar
looking set of operations.  You are not overwhelmed
by hyperbolic arc-cosine, and on the other hand
you are not left with an room empty of everything
but abstract sculptures (the architectural version
of having only higher-order functions).

When you explore Vector::add, you find that it does
about what you think it should, and how it should.
You also discover the special shed (VectorOperations)
where all the other tools are kept, and observe its
connection to those previously-scary abstract
sculptures (the lanewise and reduction higher order
methods).

If the purist succeeded in putting everything in one
place, the initial experience of working with Vector
would be very uninformative, more like playing a
role-playing exploration game than getting work
done on a “batteries included” platform.

I think the above reasons are also why IntStream::sum
is such a good idea.  Anyway, simple stuff should be easy to
read, and complicated stuff should be discoverable from
simple stuff.  That why we repeat ourselves, because we
think it will help users.

> 
> 
>> You have also the same methods defined at several places, on VectorSpecies and as a static method taking a species as parameter, 
>> all the methods of VectorSpecies that takes or return a vector/mask/shuffle should be static in vector/mask/shuffle.
>> 
> 
> It’s a little more nuanced, but I agree there is some duplication e.g. VectorSpecies.fromByteArray compared to the richer set of methods defined on [E]Vector. I think we can remove that method from VectorSpecies. Then I think we are dealing with the more “erased/reflective” and long carrier accepting methods for general operation without appealing to E.

Yes, I’m OK with removing that; it’s more like a leftover from
previous rounds.

OTOH, the static factories for methods annoy me for various
reasons.  I hope we can eventually find a better notation.
Putting methods on VectorSpecies or on Vector itself sometimes
is a viable alternative, sometimes not.  Vector::broadcast is
a static factory in disguise, which justifies its redundant
existence in terms of usability and discoverability.  But to
do a better job on more API points, we might have to wait
for specialized generics to make better moves.  I’m content
to wait…

> 
> 
>> I think Binary and Associative should be merged to one VectorOperation, given that the difference is subtle and the whole point of this API is that if the hardware do not provide a way to reduce a binary op, it can be done in Java. 
> 
> Not quite true, ARM/x86 implementations do provide intrinsic implementations (leveraging a sequence of instructions) for some reduction operations e.g. min/max. I cannot recall there current state of the ARM/x86 implementations for other reductive operations but the plan is to optimize all of them at some point. Further, hardware may evolve in the future.

It’s also useful, in an IDE, to have a static type check on the
VectorOperation token passed to a reduction operation.
You don’t want to accidentally reduce by SUB or DIV,
and Associative allows the type-checker to stop it.

> 
>> 
>> I have no idea what a Conversion.rangeType() is ?
>> 
> 
> It’s the element type of the resulting output vector,
> 
> Doc on VectorOperators.Operator:
> 
> /**
> * Reports the special return type of this operator.
> * If this operator is a boolean, returns {@code boolean.class}.
> * If this operator is a {@code Conversion},
> * returns its {@linkplain Conversion#rangeType range type}.
> *
> * Otherwise, the operator's return value always has
> * whatever type was given as an input, and this method
> * returns {@code Object.class} to denote that fact.
> * @return the special return type, or {@code Object.class} if none
> */
> Class<?> rangeType();

Vector operators are static operation symbols, some of which
have little accessor methods to describe themselves.  The range
type of a conversion is like `int` in C++ `operator int()`.
It’s spelled as a type, not with characters.

> 
> 
>> More specially:
>> - VectorSpecies.loopbound() is not used in the example ?
> 
> In the package-info?  Yes, we should update that.

+1  We should also make sure it optimizes.

> 
> 
>> - VectorSpecies.arrayType()/genericElementType() are more for implementers than users , no ? at least arrayType should be arrayElementType.
> 
> I don’t think we need to expose genericElementType in the API.

+1  I wish we had Class.wrapperType, Class.primitiveType also.

> I suspect arrayType may have been added before it was added to Class in 12 for the constable support.  I think we can remove it.

Yep.  Class.arrayType is a nice $0.25 payoff of long technical
debt.  (I wonder how much that would compound with interest
over 25 years?)

> 
>> - VectorSpecies.withLanes() => withElementType()
>> 
>> - VectorMask.check* are implementer methods ?
> 
> The intent is to provide abstractions so the developer can perform “reflective" checks on vector, mask, or shuffle.

Yes.  Also the Vector API intentionally allows some flexibility
in coding with strong typing (IntVector, etc.) and weak typing
(Vector<Integer>, etc.).  We think we can narrow that gap
with specialized generics, but for now it’s important to allow
both sides to be inhabited.  When bridging the gap, you need
casts, and there are lots of operations in the API which help
with this—make it easier to read when a transition is being
made.  In particular, the check methods allow an intermediate
value which may have lost static type information to be
re-equipped with that static type information.  Maybe they
will be less important in the future, but I think even in the
future it will be useful to be able to assert that two vectors
are of the same shape.  (Remember that vector shape is
always a constraint on correct mixing of vectors, so it’s
something the programmer needs help with sometimes.)

> 
> 
>> - VectorMask.equal(VectorMask) => equiv
>> 
> 
> Because it’s too close to the name “equals”?

Should have been eq, I guess.  Perhaps it’s a leftover from
before we had eq/ne/le/lt/gt/ge.

> 
> 
>> - VectorShuffle.check* are implementer methods ?
>> - VectorShuffle.vectorSpecies() => VectorShuffle.species()  (as in Vector and VectorMask)
> 
> Ok.

It’s hair-splitting, but a VectorShuffle, because it isn’t a vector,
doesn’t have a species.  It operates on vectors of a common species.
So, while we have Vector::species, we have (at a larger conceptual
distance) VectorShuffle::vectorSpecies and VectorMask::vectorSpecies.
The shuffles and masks know which species of vector they apply
to, but that’s “their species” indirectly.  Who was Babe Ruth’s team?
His kids didn’t have the same team, but they had their father’s team,
only indirectly.

> 
> 
>> - VectorShuffle.toVector() should return a IntVector
> 
> No, it returns a vector according to its species.  Note the constraint that shuffle source indexes are always in the range of -VLENGTH to VLENGTH - 1.

This is a sore point in the API.  A VectorShuffle isn’t a vector
in part because its lanes contain source indexes (lane pointers)
instead of vector data.  What’s the implementation type?  We
don’t know exactly, could be byte, could be int, could be some
clever union type superimposed on the vectors being operated on.

It gets worse when the vector being operated on is too long 
and too narrow (at the same time) for its lanes to be able to
encode all the necessary values.  A vector of 512 lanes each
of 8 bits is out of luck in this sense.  In fact, it needs 10 bits
(not 9) per source index in order to represent the “negative
conjugate” (invalid) source index values, which are required
for various common operations.

What to do?  I’m not sure, but the answer probably has to do
with treating VectorShuffle::toVector as a contracting and/or
expanding operation, complete with part numbers.  Or it could
be treated as producing an unstated companion type, which
must be queried reflectively.  For now we’ve papered over
the problems, but the paper won’t hold forever.

> 
> 
>> - VectorShuffle.laneIsValid(), it seems to be an operation on Vector, not VectorShuffle.
>> 
> 
> No, its specific to shuffle for masking out exceptional indexes (those < 0).

(“negative conjugates”.  A term I just made up today.  Not tired of it yet.)

See the javadoc for what they do and how they work.  In short,
we always add exactly one bit to the necessary set of bits for
representing a lane index.  The scheme works for VLENGTH
which is not a power of two.  And it gives various commonly
seen idioms for masking, merging, and error checking.

> 
> 
>> - IntVector.max, why there is no binary version that takes a mask ? It's documented but not why it's not available.
>> 
> 
> It's to reduce the number of methods. On Vector:
> 
> * This is not a full-service named operation like
> * {@link #add(Vector) add()}.  A masked version of
> * this operation is not directly available
> * but may be obtained via the masked version of
> * {@code lanewise}.  Subclasses define an additional
> * scalar-broadcast overloading of this method.
> 

We should probably document “full service”.  (Briefly!)
Joe Darcy also mentioned this.

> 
>> 
>> and i'm also not a big fans of the method that returns a long instead of a vector and only works on 64 bits values.
>> 
> 
> Can you provide more details?

The API points which work with double and long, in the weakly
typed API, use those types as “top for primitive”.  Any primitive
will fit without loss in either a long or double (sometimes both),
just as any Vector will fit in a Vector<?>, without loss.  If you
just need to broadcast zero, or get back a count, then putting it
into a long is a really cheap form of boxing, and easy to read
and prove correct.  (And the redundant API point that helps
you tells you where to find the “real stuff” that’s going on,
so you can learn about it if you need to.)

I think these API points might need a little more documentation,
to make it clear that they are doing *nothing* more than the
corresponding strongly-typed API points, plus (to make up
the primitive type differences) a cheap type conversion and/or
error check.  (Type conversion on output, error check on input,
seems to be the most useful choice here.  It’s an oddly reversed
occurrence of Postel’s Law.)  In particular, if I ask an IntVector
to sum up its lanes, I don’t expect that the long version of the
result will carry any more bits of information than the strongly
typed int version (which truncates overflow).

HTH

— John