Vector.shiftER, Vector.shiftEL not working as expected.

Wed Oct 11 18:50:08 UTC 2017

> On 10 Oct 2017, at 17:57, John Rose <john.r.rose at oracle.com> wrote:
> 
> On Oct 10, 2017, at 12:06 PM, Paul Sandoz <paul.sandoz at oracle.com> wrote:
>> 
>> IIUC, those instructions are logical shift operations on elements and are not a lane-wise shift. I am guessing that some form of permute would be used for the latter.
> 
> I think we need to be scrupulous to distinguish between the operations
> which are inside-the-lane maps of scalar operations, and the operations
> which move data across lanes.  The former are usually more efficient
> than the latter, and it is always a muddle when terminology confuses
> the two.
> 
> The scalar in-lane operations should be named the same as the "lifted"
> lane-wise operations:  add, sub, mul, … & also shift, rotate, etc.
> 
> The cross-lane operations need their own separate style of API point.
> For example, names like "shuffle" and "permute" clearly apply to lane
> structure, and cannot be confused with lifted elemental operations.
> 
> (Unfortunately, even the word "lane-wise" is tricky; I think of "lane-wise"
> as "a lifted scalar operating within each lane", but I think you used it
> above in the opposite sense.  Not sure how to pick clear terms here.)
> 

Yes, you can clearly tell I am struggling with the terminology.
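
To make the two senses concrete, here is a minimal sketch that models a vector as a plain int array, with array index i standing for lane i. The class and method names are hypothetical and are not part of the Vector API; the zero fill for vacated lanes is also an assumption.

```java
import java.util.Arrays;

// Plain-array model of a vector: array index i stands for lane i.
// All names here are hypothetical and not part of the Vector API.
final class LaneTerminologyDemo {

    // In-lane (lifted scalar) shift: the bits inside each lane are shifted
    // left by `bits`; no data moves between lanes.
    static int[] shiftBitsInLane(int[] v, int bits) {
        int[] r = new int[v.length];
        for (int i = 0; i < v.length; i++) {
            r[i] = v[i] << bits;
        }
        return r;
    }

    // Cross-lane shift: whole elements move `lanes` positions towards higher
    // lane indexes; vacated lanes are filled with zero (an assumption).
    static int[] shiftElementsAcrossLanes(int[] v, int lanes) {
        if (lanes < 0) {
            throw new IllegalArgumentException("lanes must be non-negative");
        }
        int[] r = new int[v.length];
        for (int i = lanes; i < v.length; i++) {
            r[i] = v[i - lanes];
        }
        return r;
    }

    public static void main(String[] args) {
        int[] v = {1, 2, 3, 4};
        System.out.println(Arrays.toString(shiftBitsInLane(v, 1)));          // [2, 4, 6, 8]
        System.out.println(Arrays.toString(shiftElementsAcrossLanes(v, 1))); // [0, 1, 2, 3]
    }
}
```

The first method never looks at a neighbouring lane, while the second does nothing but move elements between lanes.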

> It is a happy accident that in a few cases bitwise operations can ignore
> lane boundaries (xor, and, ior).  But in most cases confusing the two
> sets of operations will just muddle our discussions.  For this reason
> ambiguous phrases like "elemental shift" set my teeth on edge:  I
> immediately feel lost as to whether we are talking in-lane or cross-lane
> semantics.  Put another way, the term "elemental" makes it clear that we
> are talking about elements, but doesn't help us understand whether
> we are talking about the values inside the elements (in-lane work)
> or "outside the lane" motion of the elements among themselves
> (cross-lane work).  It won't always help to just use Intel mnemonics or
> conventions, since we are trying to document a portable semantics.
> 
> The load and store operations *almost* have the same happy accident
> as xor, of not caring about lane structure, *except* for the order of elements.
> For that, Java needs to impose a convention, even if it seems to conflict
> with the way the hardware documents the numbering of lanes.
> Lane zero has to mean the lowest-numbered array element, or we will
> have endless troubles with portability.
> 

+1, that needs to be the mental model developers have in their heads when using the API.

> This also means that it is risky, and probably counterproductive, for the
> Java API to try to expose a notion of "left" and "right" lane directions across
> the whole vector, even if the hardware documentation talks about such
> things.  (It can because it commits to a platform-specific byte order.  But
> Java can't; or if it does, the byte order has to be customizable as in NIO.)
> 

Yes, this can cause much confusion. If we are clear about how the order of elements in the vector relates to their order in the source, then we could use the terms forwards and backwards, and maybe use negative/positive argument values rather than expressing direction in a method name (although explicit methods may be better for optimisation purposes?).

> Note that shape abstraction makes whole-vector operations less useful
> for many purposes.  If you don't know the size of your vector, cross-lane
> operators like shuffle are pretty hard to use.  (Not impossible, of course.)

That suggests we might need a common set of general factory methods for creating Shuffle instances, such as rotate and reverse.
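
As a rough sketch of what such factories could look like, the following models a shuffle as an int array of source lane indexes, so the factories need only the lane count and never a concrete vector size. The names rotateShuffle, reverseShuffle and apply are hypothetical, not existing API.

```java
import java.util.Arrays;

// A shuffle is modelled as an int[]: result lane i reads source lane idx[i].
// All names are hypothetical and not part of the Vector API.
final class ShuffleFactoryDemo {

    // Shuffle that moves the element in lane j to lane (j + distance) mod lanes.
    static int[] rotateShuffle(int lanes, int distance) {
        int[] idx = new int[lanes];
        for (int i = 0; i < lanes; i++) {
            idx[i] = Math.floorMod(i - distance, lanes);
        }
        return idx;
    }

    // Shuffle that reverses the order of the lanes.
    static int[] reverseShuffle(int lanes) {
        int[] idx = new int[lanes];
        for (int i = 0; i < lanes; i++) {
            idx[i] = lanes - 1 - i;
        }
        return idx;
    }

    // Apply a shuffle to a vector modelled as an array.
    static int[] apply(int[] v, int[] shuffle) {
        int[] r = new int[v.length];
        for (int i = 0; i < v.length; i++) {
            r[i] = v[shuffle[i]];
        }
        return r;
    }

    public static void main(String[] args) {
        int[] v = {10, 20, 30, 40};
        System.out.println(Arrays.toString(apply(v, rotateShuffle(v.length, 1)))); // [40, 10, 20, 30]
        System.out.println(Arrays.toString(apply(v, reverseShuffle(v.length))));   // [40, 30, 20, 10]
    }
}
```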

I was pondering whether we should ascribe semantics to a negative Shuffle index, meaning that some default value is applied; the cross-lane forwards/backwards methods could then be specified in terms of a swizzle and a shuffle.

Rather than:

  Vector<E, S> rotateEL(int i); // rotate elements left
  Vector<E, S> rotateER(int i); // rotate elements right
  Vector<E, S> shiftEL(int i);  // shift elements left
  Vector<E, S> shiftER(int i);  // shift elements right

We could have:

  Vector<E, S> cycle/rotateElements(int laneDistance);
  Vector<E, S> copyElements(int laneDistance);
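
A minimal sketch of how these two methods might be specified, again modelling a vector as an int array: a negative shuffle index selects a default value, and the sign of laneDistance gives the direction (positive moves elements towards higher lane indexes). The helper shuffleWithDefault, the zero default and the sign convention are all assumptions made for illustration.

```java
import java.util.Arrays;

// Array model of the proposed methods. A negative shuffle index selects a
// default value. All names and conventions here are assumptions.
final class ElementMotionDemo {

    // Result lane i reads v[indexes[i]], or defaultValue if the index is negative.
    static int[] shuffleWithDefault(int[] v, int[] indexes, int defaultValue) {
        int[] r = new int[v.length];
        for (int i = 0; i < v.length; i++) {
            r[i] = indexes[i] < 0 ? defaultValue : v[indexes[i]];
        }
        return r;
    }

    // Move elements by laneDistance lanes; lanes with no source get the default (0).
    static int[] copyElements(int[] v, int laneDistance) {
        int n = v.length;
        int[] idx = new int[n];
        for (int i = 0; i < n; i++) {
            int src = i - laneDistance;
            idx[i] = (src >= 0 && src < n) ? src : -1;
        }
        return shuffleWithDefault(v, idx, 0);
    }

    // Move elements by laneDistance lanes, wrapping around the ends of the vector.
    static int[] rotateElements(int[] v, int laneDistance) {
        int n = v.length;
        int[] idx = new int[n];
        for (int i = 0; i < n; i++) {
            idx[i] = Math.floorMod(i - laneDistance, n);
        }
        return shuffleWithDefault(v, idx, 0);
    }

    public static void main(String[] args) {
        int[] v = {1, 2, 3, 4};
        System.out.println(Arrays.toString(copyElements(v, 1)));    // [0, 1, 2, 3]
        System.out.println(Arrays.toString(copyElements(v, -1)));   // [2, 3, 4, 0]
        System.out.println(Arrays.toString(rotateElements(v, 1)));  // [4, 1, 2, 3]
        System.out.println(Arrays.toString(rotateElements(v, -1))); // [2, 3, 4, 1]
    }
}
```

With a signed distance, neither method name needs to mention "left" or "right"; the only convention to document is which lane index counts as forwards.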

> Keeping programmers away from the hard-to-use cross-lane operations
> is another reason to give them names which cannot be confused with the
> more commonly used in-lane operations.
> 
> To summarize:  Always make it clear whether operations are within
> lanes or not (i.e., across the lanes of a whole vector); use natural
> terms for lifted in-lane operations; use a separate vocabulary for
> cross-lane (whole vector) operations.
> 
> — John
> 
> P.S.  A challenge:  Extend the API so that nearest-neighbor computations
> are supported, within some limited distance, allowing stencils to be
> programmed.   Do so without exposing vector sizes.  As in the case
> of partial vector loop cleanups, this probably requires some sort of
> vector-shape abstraction that "mixes in" contextual access to neighbors
> which may be in a nearby vector.  Perhaps this is most naturally used
> inside a Stream of two-vector context items, although we can't optimize
> that very well yet.
> 

For partial vector loop cleanups I was thinking that a method producing a mask given the array length would be useful, e.g.:

// return a mask in which only the first (length % lanes) elements are set to true
Mask<…> Species.tailMask(int length)
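
A sketch of how such a mask might be used, with the mask modelled as a boolean array and the lane count passed explicitly (in the real API it would presumably come from the species). The names tailMask and addOne and the explicit lanes parameter are hypothetical.

```java
import java.util.Arrays;

// Array model of a tail mask for loop cleanup. The mask is a boolean[] and
// the lane count is an explicit parameter; both are assumptions.
final class TailMaskDemo {

    // Mask in which only the first (length % lanes) elements are true.
    static boolean[] tailMask(int length, int lanes) {
        boolean[] m = new boolean[lanes];
        int tail = length % lanes;
        for (int i = 0; i < tail; i++) {
            m[i] = true;
        }
        return m;
    }

    // Adds 1 to every element: full "vectors" first, then one masked step.
    static void addOne(int[] a, int lanes) {
        int i = 0;
        int bound = a.length - a.length % lanes;
        for (; i < bound; i += lanes) {
            for (int l = 0; l < lanes; l++) {
                a[i + l] += 1; // stands in for an unmasked vector operation
            }
        }
        boolean[] mask = tailMask(a.length, lanes);
        for (int l = 0; l < lanes; l++) {
            if (mask[l]) {
                a[i + l] += 1; // stands in for a masked vector operation
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(tailMask(10, 4))); // [true, true, false, false]
        int[] a = new int[10];
        addOne(a, 4);
        System.out.println(Arrays.toString(a)); // ten ones
    }
}
```

When the length is a multiple of the lane count the mask is all false, so the cleanup step does nothing and needs no separate branch.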

Paul.