[vectorIntrinsics] reinterpret vs. reshape vs. cast

Mon Jun 3 21:07:26 UTC 2019

Hi John,
It’s an interesting proposal and, if I understand it correctly, you want the computation to proceed as in [1] and not as in [2]. To do that we limit the shape-changing APIs; the only ones now would be an explicit reinterpret(), or toArray() followed by fromArray().

(ps: please ignore my earlier reply)

[1]:
    |         |
    |         |
    |         |
    |         |
    |         |
    |         |
    |         |
    |         |
    |         |
    |         |

[2]:
    |         |
    |         |
    |         |
  |             |
  |             |
  |             |
    |         |
    |         |
  |             |
  |             |
       |   |
       |   |
  |             |
  |             |

Thanks,
Kishor

> -----Original Message-----
> From: Kharbas, Kishor
> Sent: Monday, June 3, 2019 1:58 PM
> To: John Rose <john.r.rose at oracle.com>; panama-dev at openjdk.java.net
> Cc: Kharbas, Kishor <kishor.kharbas at intel.com>
> Subject: RE: [vectorIntrinsics] reinterpret vs. reshape vs. cast
> 
> |         |
> |         |
> |         |
> |         |
> |         |
> |         |
> |         |
> |         |
> 
>     |         |
>     |         |
>     |         |
>     |         |
>     |         |
> 
> > -----Original Message-----
> > From: panama-dev [mailto:panama-dev-bounces at openjdk.java.net] On
> > Behalf Of John Rose
> > Sent: Thursday, May 30, 2019 10:48 AM
> > To: panama-dev at openjdk.java.net
> > Subject: [vectorIntrinsics] reinterpret vs. reshape vs. cast
> >
> > I have been thinking about the API points that convert vectors between
> > the various species.
> > The existing points are a good "stab" at what are the basics, but they
> > need reconsideration, especially now that we have decent
> > implementations and know (more directly) what are the underlying
> > "physics" of the Vector API.
> >
> > So, I want to get rid of reshape, merging it into reinterpret.  The
> > two methods are not different enough (AFAICT) to warrant parallel
> > implementations.
> >
> > In a private version of the branch, I have rewritten reshape as an
> > alias of reinterpret, but without the extra <F> variable:
> >
> >     public abstract <F> Vector<F> reinterpret(VectorSpecies<F> s);
> >
> >     /** Use reinterpret. */
> >     @Deprecated
> >     public final Vector<E> reshape(VectorSpecies<E> s) {
> >         s.check(elementType());  // verify same E
> >         return reinterpret(s);
> >     }
> >
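The role of the extra "check" call can be sketched in plain Java. This is only a minimal model, assuming a hypothetical SpeciesSketch class standing in for VectorSpecies (the class, its fields, and the exception type are assumptions, not the real API). It shows how an unchecked cast can make the static type lie, and how a run-time check on the element type catches it.

```java
/** Hypothetical stand-in for a vector species; NOT the real VectorSpecies API. */
final class SpeciesSketch<E> {
    private final Class<E> elementType;
    private final int bitSize;

    SpeciesSketch(Class<E> elementType, int bitSize) {
        this.elementType = elementType;
        this.bitSize = bitSize;
    }

    Class<E> elementType() { return elementType; }
    int bitSize() { return bitSize; }

    /** Verifies at run time that this species really has the claimed element type. */
    <F> SpeciesSketch<F> check(Class<F> expected) {
        if (!elementType.equals(expected)) {
            throw new ClassCastException(
                "species of " + elementType.getName() + " used as " + expected.getName());
        }
        // Safe now: the element types were just compared at run time.
        @SuppressWarnings("unchecked")
        SpeciesSketch<F> checked = (SpeciesSketch<F>) this;
        return checked;
    }

    public static void main(String[] args) {
        SpeciesSketch<Integer> ints = new SpeciesSketch<>(Integer.class, 128);
        // An unchecked cast compiles (with a warning) but makes the static type wrong.
        @SuppressWarnings("unchecked")
        SpeciesSketch<Float> lie = (SpeciesSketch<Float>) (SpeciesSketch<?>) ints;
        try {
            lie.check(Float.class);
        } catch (ClassCastException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```

Without the check, the mismatch would only surface later, far from the cast that caused it.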
> > After various constant-folding operations, it still comes out the
> > same, as a call to VI.reinterpret.
> >
> > The "check" call is an extra runtime check which ensures that, in
> > fact, the species has the element type E as claimed by the static
> > type system.
> > Because Java allows unchecked casts (and we use them) I've sprinkled
> > such "check" calls wherever I think the static type system might need
> > a little run-time help.
> >
> > So much for reshape.  Now let's talk about the hard problem at the
> > center of all of this:  The unpredictable resizing of vectors.  Most
> > vector workloads choose a key vector size and use it for nearly all
> > operations.  Hardware sometimes prefers to work with a single size at
> > a time, and human beings prefer to reason about constant information
> > contents, rather than having to re-derive a size for every vector
> > sub-expression.
> >
> > This means I think the Vector API should have a clearer policy of
> > shape-invariance.
> > If you start with shape S_128_BIT and do a bunch of vector operations,
> > you should end up with the same shape, unless you intentionally select
> > an operation that is documented to change a shape.  This is all the
> > more important in portable, shape-agnostic code, where you don't know
> > the size of the preferred shape.  Having that suddenly change to a
> > non-preferred shape would be a headache.
> >
> > Therefore, I want to make "shape-changing"
> > methods a special category, that is clearly called out in the javadoc,
> > and easy for the user to recognize in the user's code.
> >
> > What would non-reshaping methods be?
> > Well, anything lane-wise that preserves the element type (ETYPE) is
> > obviously shape invariant.  Also shuffles, which are not lane-wise but
> > keep VLENGTH and ETYPE constant, are shape invariant.  Operations with
> > masks and shuffles are pretty much always shape invariant.
> >
> > In-place reinterpret casts are shape invariant, even as they
> > completely redraw lane types and lane boundaries.
> >
> > Now we come to lane-wise value casts.  These are sometimes
> > indispensable but will change the shape of the vector if (as is often
> > the case) the cast changes the bit-size of the ETYPE.
> > The implementation code for this (in the JIT) shows how disruptive
> > this is to implement.
> > And I think it's equally disruptive to users.
> > What happens if you need to convert from byte to int, but your
> > preferred byte species cannot scale upward by 4x to an int species of
> > another supported shape?  The API requires you to find out in advance
> > whether the larger shape exists to hold the int species for the cast
> > result.  If you can't find one, you need to radically recast your
> > computation.  This is a portability anti-pattern.
> >
> > Here's what I think is a better way, more portable, easier to
> > implement, and easier for users to manage:
> >
> > Introduce a cast operation which is shape invariant.  Something like this:
> >
> > /** Converts this vector lane-wise to a new vector
> >   * of a lane-type F, keeping underlying shape constant.
> >   *  … */
> > <F> Vector<F> convert(Conversion<E,F> conv, int part);
> >
> > Rather than rely on Class<F> to denote the conversion implicitly the
> > user selects a specific conversion operation from a suitable
> > repertoire (TBD).  The conversion "knows"
> > its domain and range types, and therefore also "knows"
> > whether it will expand or contract the vector.  Byte to int expands,
> > while int to byte contracts, and so on.
> >
> > (Several ISAs refer to this phenomenon as "unpack"
> > and "pack".  There's also "zip" and "unzip" in SVE which has a related
> > function, and I think AVX has two-vector shuffles that can do the same.  Zip
> > could be useful for zero-filling or sign-filling expansion, while
> > unzip could be useful for extraction.  There are also mask driven,
> > variable motion, APL-like compress and expand operations on the table
> > which are at least related and maybe useful as implementation tools.
> > There's lots more to be said about implementation, but we can defer
> > that for later.)
> >
> > When a conversion is expanding, in order to retain shape invariance,
> > it is necessary for the conversion to produce 2 (or 4 or 8) output
> > vectors.  We can try to hide this fact in the name of simplicity for
> > the user, but it just makes the code shape-shifty, which (IMO) hurts
> > the user's ability to reason about the rest of the code.
> >
> > (If there are intrinsic or synthetic shapes that can hold all of
> > the bits, that's fine, but it's still a shape-shifting operation,
> > which I propose we avoid in today's design.  When we add synthetic
> > multi-vectors, after Valhalla, we can re-introduce shape-shifting code
> > that is portable.  But we can't do that today if the number of shapes
> > is a dynamic property.  We need synthetic multi-vector shapes to build
> > out a shape-fluid user experience.  Can't do that today.)
> >
> > OK, so we have a byte-to-int cast that expands from one input vector
> > to four output vectors.  How does the user keep them all straight?  I
> > think a good answer is to add the "part" parameter, noted above.
> >
> > The part parameter is present in all lane-wise shape changing operations.
> > Zero is always a valid argument.
> > For an operation which expands by a factor of N, the valid range of
> > part numbers is [0..N-1].
> > The meaning is simple:  It selects which "part"
> > of the output to return.  It is *not* a lane index (and I don't want
> > to generalize in that direction).
> > A user doing byte-to-int conversion will simply know that there are
> > four parts to deal with, and work that into the algorithm.
> >
> > (There are at least three ways to deal with parts:
> > Use part 0 only, which means the input vector only uses 25% of its
> > lanes.  You load 25% of the input bytes at a time and then expand part
> > 0 to a 100%-sized vector of the same shape.
> > Or, use a little 4-way loop over the parts, disposing of each part separately.
> > Or, unroll the little loop by hand, using 4 temps for parts 0..3.
> > Depends on the application.)
> >
> > If the part number is out of range, you get an array index exception.
> > The pseudo-code of the reference implementation can pretend that the
> > conversion operation produces a tiny array of 4 output vectors, and
> > then the part number is an index into that array.
> >
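The "tiny array of output vectors" model for an expanding conversion can be sketched with plain arrays standing in for vectors. This is a minimal sketch under stated assumptions: the class and method names are hypothetical, lanes are modeled as array elements, the conversion is a sign-extending byte-to-int cast, and part p is assumed to cover the p-th contiguous quarter of the input lanes.

```java
/** Scalar model of a shape-invariant byte-to-int conversion; names are hypothetical. */
final class ExpandPartSketch {
    /** Expansion factor for byte to int: 4. */
    static final int N = Integer.BYTES / Byte.BYTES;

    /** Returns the requested part of the expanded result, with the same bit size as the input. */
    static int[] convertBytesToInts(byte[] lanes, int part) {
        if (lanes.length % N != 0) {
            throw new IllegalArgumentException("lane count must be a multiple of " + N);
        }
        int outLanes = lanes.length / N;  // e.g. 16 byte lanes -> 4 int lanes per part
        // Pretend the conversion produces an array of N output vectors.
        int[][] parts = new int[N][outLanes];
        for (int i = 0; i < lanes.length; i++) {
            parts[i / outLanes][i % outLanes] = lanes[i];  // widening cast sign-extends
        }
        // An out-of-range part number fails here with ArrayIndexOutOfBoundsException.
        return parts[part];
    }

    /** The "little 4-way loop over the parts": sums every byte lane via int lanes. */
    static long sumAll(byte[] lanes) {
        long sum = 0;
        for (int part = 0; part < N; part++) {
            for (int v : convertBytesToInts(lanes, part)) {
                sum += v;
            }
        }
        return sum;
    }
}
```

A real implementation would of course not materialize all N parts just to return one; the array is only the reference model for what each part number means.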
> > If the conversion doesn't change shape, then zero is the only part
> > number you can ever use.
> > That could be defaulted, but it's not worth the extra overloaded API
> > point IMO.
> >
> > Now, if the conversion contracts, we don't strictly need a part
> > number; we can use a convention which is that the output bits are
> > placed at the beginning of the output vector.
> > But we also want a part number here.  Again, zero means "just throw
> > away everything except the beginning of the computation."  But
> > non-zero part numbers mean "place the output into another part of the
> > output vector."
> > Why would we do this?
> > Because conversions sometimes come in pairs, and we need to be able to
> > track lane values through multiple conversions, sometimes.  To do this
> > sanely, I propose that contracting operations have a part number
> > parameter which "steers" the lane values in a way compatible with the
> > inverse conversion, with the same part number.  This means that
> > contracting with a non-zero part means that the output is placed in a
> > (zero-filled) vector at lane VLENGTH*part/N, where N is the
> > contraction factor.  This means that immediately following with the
> > inverse conversion, using the same part number, will reproduce the
> > original input.
> > That makes these methods much easier to reason about.  (Maybe a
> > contracting part number should be either negative or zero, to provide
> > an extra error check.  There's no burden on the user in adding a minus
> > sign to the method call, and it will better document what's going on.)
> >
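The part-steered contraction and its inverse can be modeled the same way, with arrays standing in for vectors. Again this is only a sketch: the names are hypothetical, the shape is assumed to hold 4 int lanes or 16 byte lanes, and part numbers are taken as non-negative here (the negative-part convention floated above is not modeled).

```java
/** Scalar model of int-to-byte contraction with a part number; names are hypothetical. */
final class ContractPartSketch {
    /** Contraction factor for int to byte: 4. */
    static final int N = Integer.BYTES / Byte.BYTES;

    /** Narrows int lanes to bytes, placed at lane VLENGTH*part/N of a zero-filled vector. */
    static byte[] contract(int[] lanes, int part) {
        if (part < 0 || part >= N) throw new ArrayIndexOutOfBoundsException(part);
        int outLanes = lanes.length * N;      // same shape: 4 int lanes -> 16 byte lanes
        byte[] out = new byte[outLanes];      // Java arrays start zero-filled
        int origin = outLanes * part / N;     // VLENGTH * part / N
        for (int i = 0; i < lanes.length; i++) {
            out[origin + i] = (byte) lanes[i];  // truncating cast
        }
        return out;
    }

    /** Inverse direction: widens the byte lanes of the given part back to ints. */
    static int[] expand(byte[] lanes, int part) {
        if (part < 0 || part >= N) throw new ArrayIndexOutOfBoundsException(part);
        int outLanes = lanes.length / N;
        int origin = lanes.length * part / N;
        int[] out = new int[outLanes];
        for (int i = 0; i < outLanes; i++) {
            out[i] = lanes[origin + i];         // sign-extending
        }
        return out;
    }
}
```

In this model, expand(contract(v, p), p) gives back v for any valid p, provided every lane value fits in a byte; values outside that range are truncated by the narrowing cast and cannot round-trip.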
> > I think this "part" idea has legs, and provides a decent way to deal
> > with multi-part results.
> >
> > A beneficial side effect of keeping shape invariance as a principle is
> > that we can concentrate the JIT code on using one register type at a
> > time, and do the fancy footwork for operations like byte to int
> > conversion (with size changes) in Java code where it belongs, rather
> > than in JIT code.  I hope to retire the existing cast intrinsic, which
> > is a highly complex 5-phase instruction selection problem, replacing
> > it with a suite of smaller more flexible primitives, some to do shape
> > changing without conversion and others to do conversion without shape
> > changing.
> >
> > — John