[vector] lane-wise operations considered as constants
John Rose
john.r.rose at oracle.com
Thu Jun 20 00:22:05 UTC 2019
In the vector-unstable branch most of the lane-wise
operations are demoted to simple constants in the
VectorOperators class (which is almost, but not quite,
an enum or the union of several enums).
So we are removing these methods (and many like them):
FloatVector atan2(Vector<Float> v);
FloatVector atan2(float s);
FloatVector atan2(Vector<Float> v, VectorMask<Float> m);
FloatVector atan2(float s, VectorMask<Float> m);
Instead of those four we have, also in FloatVector:
FloatVector lanewise(VectorOperators.Binary op, Vector<Float> v);
FloatVector lanewise(VectorOperators.Binary op, float s);
FloatVector lanewise(VectorOperators.Binary op, Vector<Float> v, VectorMask<Float> m);
FloatVector lanewise(VectorOperators.Binary op, float s, VectorMask<Float> m);
plus a lone constant:
VectorOperators.Binary ATAN2; // public static final in VectorOperators
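For concreteness, a call site changes like this (a sketch only; I am
spelling the package as jdk.incubator.vector, and exact names on the
vector-unstable branch may differ):

  import jdk.incubator.vector.FloatVector;
  import jdk.incubator.vector.VectorOperators;

  // before (removed):  FloatVector r = x.atan2(y);
  // after, via the generic entry point plus an operator constant:
  FloatVector r  = x.lanewise(VectorOperators.ATAN2, y);     // vector operand
  FloatVector r2 = x.lanewise(VectorOperators.ATAN2, 0.5f);  // scalar broadcast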
This would be a clear lose if we just did it for atan2, but in fact
we are doing it for about 90 methods, so it's a net gain.
A few methods keep their previous forms (with four overloadings),
as well as a new constant. These are ADD, SUB, MUL, and DIV.
We can call these "full service" methods, since you can code all
day long with them without resorting to the lanewise(op…) syntax.
The unary operators ABS and NEG are also retained as regular
methods, in both unmasked and masked overloadings.
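In code, the full-service methods read the way you would hope
(again a sketch; x, y are FloatVectors and m is a VectorMask<Float>):

  FloatVector sum  = x.add(y);            // vector + vector
  FloatVector axpy = x.mul(2.0f).add(y);  // scalar broadcast overloading
  FloatVector part = x.add(y, m);         // masked overloading
  FloatVector mag  = x.abs();             // ABS as a regular method
  FloatVector flip = x.neg(m);            // NEG, masked overloading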
A few more methods keep *some* of the previous overloadings,
supporting constant broadcasts but not masking. These
methods are SQRT, POW, and FMA. If you want fma(float,FloatVector)
or fma(FloatVector,float), you need to call lanewise(), but you get
fma(float,float) and fma(FloatVector,FloatVector).
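So the FMA call sites come out like this (a sketch; I am assuming the
convention that x.fma(y, z) computes x*y+z lane-wise, which the text
above does not pin down):

  FloatVector p = x.fma(y, z);        // fma(FloatVector,FloatVector): x*y + z
  FloatVector q = x.fma(2.0f, 1.0f);  // fma(float,float): x*2.0f + 1.0f
  // the mixed shapes go through lanewise():
  FloatVector s = x.lanewise(VectorOperators.FMA, y, 1.0f);  // fma(FloatVector,float)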
Similar choices are made for non-floating bitwise types, adding
nice (but non-masking) methods for AND, OR, and NOT.
Also, every vector type retains nice (but non-masking) methods
for MIN, MAX, EQ, and LT, with optional scalar broadcast of the
second operand. (So that's two overloadings for each of those.)
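For example (a sketch over IntVector; I am assuming EQ and LT surface
as comparisons returning lane masks, which is how the rest of the API
treats comparisons):

  IntVector lo = a.min(b);          // vector second operand
  IntVector hi = a.max(0);          // scalar broadcast second operand
  VectorMask<Integer> z = a.eq(0);  // lane mask: which lanes are zero
  VectorMask<Integer> n = a.lt(b);  // lane mask: which lanes compare less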
These are all arbitrary choices. But the principles behind these
choices are:
1. The few most common operations should have real methods
overloaded with a full suite of broadcast and/or masking options.
2. A few more operations (almost as common) should have real
methods, but not with all the options.
3. Masking is the first option to drop. Broadcast is the second
option to drop.
4. The majority of operations (including truly rare ones like SINH)
only take up one API point, a fixed constant in VectorOperators.
5. Type-specific operations (bitwise AND, FMA) should get at least
one or two real methods in their specific types (FloatVector), so that
their presence is discoverable in the context of the typed vector.
Principle 5 is why SQRT, POW, FMA (one each of a unary, binary,
and ternary) show up in FloatVector, but not equally plausible ones
like SIN or ATAN2. The idea is that the presence of one makes it
easier to discover similar ones in VectorOperators. In all cases,
the javadoc of the "nice" method points at the operator constant
and the lanewise() method that matches it.
So far so good. (Or so bad, if you prefer wide, flat, diffuse APIs.)
Breaking out the operations in a quasi-enum opens up a new
possibility (as I think I noted in a previous message). It becomes
simpler to add a few more operations if you just add a new constant.
That's easier than adding a suite of four more methods, with
their own special optimizations.
So, in order to exercise this thesis, and because I found I was
missing them when I tried to code, I added the following
additional operations:
AND_NOT(a,b) := (a & ~b) (called ANDC2 now, which I think is the wrong name)
BITWISE_BLEND(a,b,c) := (cbit ? bbit : abit), applied bit by bit (masked bitwise merge)
FIRST_NONZERO(a,b) := (BITS(a)!=0 ? a : b) (where BITS unmasks -0.0)
The first two are only for bitwise (non-FP) types. They are
all commonly seen in algorithms. They correspond to basic
machine operations which may be composed from
other (already-existing) operations, but the composed
expression is (a) hard to read, (b) error prone, and (c)
hard to optimize. This motivates a special name.
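Written out as scalar reference semantics (this is the intended
per-lane meaning, not the branch's implementation):

  static int andNot(int a, int b)              { return a & ~b; }
  static int bitwiseBlend(int a, int b, int c) { return (a & ~c) | (b & c); } // cbit ? bbit : abit
  static int firstNonzero(int a, int b)        { return a != 0 ? a : b; }
  // for float lanes, FIRST_NONZERO tests the raw bits, so -0.0f counts as nonzero:
  static float firstNonzero(float a, float b) {
      return Float.floatToRawIntBits(a) != 0 ? a : b;
  }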
FIRST_NONZERO has some extra points of merit:
1. It provides an easy-to-use way to combine vectors without
the ceremony of forming "the obvious mask" (see the sketch after this list).
2. In cases where the vectors have provably disjoint support
(their non-zero lanes are disjoint), it can be strength reduced
to the unsafe operation OR_UNCHECKED (which does a bitwise
OR even if the vector holds floating point values). This is a
favorable move on some CPUs.
3. The reduction based on FIRST_NONZERO is the right way to
ask the CPU to give you the first non-default lane value in a vector.
(Masked SUM is the runner-up, but it's verbose, and tricky to get right,
and hard to optimize; see (a)/(b)/(c) above.)
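To illustrate point 1, here is the hand-rolled version next to the
named operator (a sketch; a, b are IntVectors, and blend(b, m) takes
b's lane wherever the mask is set):

  // by hand: form "the obvious mask", then blend
  VectorMask<Integer> zero = a.eq(0);
  IntVector r1 = a.blend(b, zero);  // take b only where a's lane is zero
  // (for floats this is already subtly wrong: eq(0.0f) also matches -0.0f)
  // with the named operator:
  IntVector r2 = a.lanewise(VectorOperators.FIRST_NONZERO, b);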
I added a fourth new operation which I'm less sure about, called
ZOMO (acronym for "zero or minus one"). It combines a test for
the default value (==0) with the use of "all one bits" as a handy
representation for "true". (Why handy? Because you can turn around
and use it as a bitwise mask.)
ZOMO(a) := (BITS(a) == 0 ? 0 : -1) (replace non-zero lanes with all-ones)
This is a building block for FIRST_NONZERO: you first build
a mask of all-ones using ZOMO, and then use BITWISE_BLEND
to combine the results (but you have to sneak up on a floating
point vector to apply BITWISE_BLEND). So:
FIRST_NONZERO(a,b) := BITWISE_BLEND(b,a,ZOMO(a))
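Checked scalar-wise, reusing the reference definitions above
(a sketch over int lanes):

  static int zomo(int a) { return a == 0 ? 0 : -1; }
  // FIRST_NONZERO(a,b) == BITWISE_BLEND(b, a, ZOMO(a)):
  //   a != 0 => mask is all-ones => blend picks a
  //   a == 0 => mask is all-zero => blend picks b
  static int firstNonzeroViaBlend(int a, int b) {
      return bitwiseBlend(b, a, zomo(a));
  }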
This suggests that BITWISE_BLEND could be supported
for floating point numbers, by gating only on the sign bit:
"FLOATWISE_BLEND"(a,b,c) := (c < 0 ? b : a)
=> FLOAT_FIRST_NONZERO(a,b) := FLOATWISE_BLEND(b,a,ZOMO(a))
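Scalar-wise that variant looks like this (a sketch; note that "c < 0"
above must be read as a raw sign-bit test, as in hardware blends, since
the all-ones pattern from ZOMO is a NaN and would fail an ordinary
floating-point comparison):

  static float floatwiseBlend(float a, float b, float c) {
      return Float.floatToRawIntBits(c) < 0 ? b : a;  // gate on the sign bit only
  }
  static float zomoF(float a) {
      // nonzero bits => all-ones pattern (a NaN with its sign bit set)
      return Float.floatToRawIntBits(a) == 0 ? 0.0f : Float.intBitsToFloat(-1);
  }
  static float floatFirstNonzero(float a, float b) {
      return floatwiseBlend(b, a, zomoF(a));
  }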
Top-level point: We have some room to add a few more crucial
operators. Next-level point: The above additions are reasonable
candidates.
— John