[vectorIntrinsics] reinterpret vs. reshape vs. cast

Mon Jun 3 20:57:48 UTC 2019

|         |
|         |
|         |
|         |
|         |
|         |
|         |
|         |

    |         |
    |         |
    |         |
    |         |
    |         |

> -----Original Message-----
> From: panama-dev [mailto:panama-dev-bounces at openjdk.java.net] On
> Behalf Of John Rose
> Sent: Thursday, May 30, 2019 10:48 AM
> To: panama-dev at openjdk.java.net
> Subject: [vectorIntrinsics] reinterpret vs. reshape vs. cast
> 
> I have been thinking about the API points that convert vectors between the
> various species.
> The existing points are a good "stab" at what are the basics, but they need
> reconsideration, especially now that we have decent implementations and
> know (more directly) what are the underlying "physics" of the Vector API.
> 
> So, I want to get rid of reshape, merging it into reinterpret.  The two
> methods are not different enough (AFAICT) to warrant parallel
> implementations.
> 
> In a private version of the branch, I have rewritten reshape as an alias of
> reinterpret, but without the extra <F> variable:
> 
>     public abstract <F> Vector<F> reinterpret(VectorSpecies<F> s);
> 
>     /** Use reinterpret. */
>     @Deprecated
>     public final Vector<E> reshape(VectorSpecies<E> s) {
>         s.check(elementType());  // verify same E
>         return reinterpret(s);
>     }
> 
> After various constant-folding operations, it still comes out the same, as a call
> to VI.reinterpret.
> 
> The "check" call is an extra runtime check which ensures that, in fact, the
> species has the element type E as claimed by the static type system.
> Because Java allows unchecked casts (and we use them) I've sprinkled such
> "check" calls wherever I think the static type system might need a little run-
> time help.
> 
> So much for reshape.  Now let's talk about the hard problem at the center of
> all of this:  The unpredictable resizing of vectors.  Most vector workloads
> choose a key vector size and use it for nearly all operations.  Hardware
> sometimes prefers to work with a single size at a time, and human beings
> prefer to reason about constant information contents, rather than having to
> re-derive a size for every vector sub-expression.
> 
> This means I think the Vector API should have a clearer policy of shape-
> invariance.
> If you start with shape S_128_BIT and do a bunch of vector operations, you
> should end up with the same shape, unless you intentionally select a
> operation that is documented to change a shape.  This is all the more
> important in portable shape agnostic code, where you don't know the size of
> the preferred shape.  Having that suddenly change to a non-preferred shape
> would be a headache.
> 
> Therefore, I want to make "shape-changing"
> methods a special category, that is clearly called out in the javadoc, and easy
> for the user to recognize in the user's code.
> 
> What would non-reshaping methods be?
> Well, anything lane-wise that preserves
> the element type (ETYPE) is obviously shape invariant.  Also shuffles, which
> are not lane-wise but keep VLENGTH and ETYPE constant, are shape
> invariant.  Operations with masks and shuffles are pretty much always shape
> invariant.
> 
> In-place reinterpret casts are shape invariant, even as they completely
> redraw lane types and lane boundaries.
> 
> Now we come to lane-wise value casts.  These are sometimes indispensible
> but will change the shape of the vector if (as is often the case) the cast
> changes the bit-size of the ETYPE.
> The implementation code for this (in the JIT) shows how disruptive this to
> implement.
> And I think it's equally disruptive to users.
> What happens if you need to convert from byte to int, but your preferred
> byte species cannot scale upward by 4x to an int species of another
> supported shape?  The API requires you to find out in advance whether the
> larger shape exists to hold the int species for the cast result.  If you can't find
> one, you need to radically recast your computation.  This is a portability anti-
> pattern.
> 
> Here's what I think is a better way, more portable, easier to implement, and
> easier for users to manage:
> 
> Introduce a cast operation which is shape invariant.  Something like this:
> 
> /** Converts this vector lane-wise to a new vector
>   * of a lane-type F, keeping underlying shape constant.
>   *  … */
> <F> Vector<F> convert(Conversion<E,F> conv, int part);
> 
> Rather than rely on Class<F> to denote the conversion implicitly the user
> selects a specific conversion operation from a suitable repertoire (TBD).  The
> conversion "knows"
> its domain and range types, and therefore also "knows"
> whether it will expand or contract the vector.  Byte to int expands, while int
> to byte contracts, and so on.
> 
> (Several ISAs refer to this phenomenon as "unpack"
> and "pack".  There's also "zip" and "unzip" in SVE which has a related
> function, and I AVX has two-vector shuffles that can do the same.  Zip could
> be useful for zero-filling or sign-filling expansion, while unzip could be useful
> for extraction.  There are also mask driven, variable motion, APL-like
> compress and expand operations on the table which are at least related and
> maybe useful as implementation tools.  There's lots more to be said about
> implementation, but we can defer that for later.)
> 
> When a conversion is expanding, in order to retain shape invariance, it is
> necessary for the conversion to produce 2 (or 4 or 8) output vectors.  We can
> try to hide this fact in the name of simplicity for the user, but it just makes
> the code shape-shifty, which (IMO) hurts the user's ability to reason about
> the rest of the code.
> 
> (If we there are intrinsic or synthetic shapes that can hold all of the bits,
> that's fine, but it's still a shape-shifting operation, which I propose we avoid
> in today's design.  When we add synthetic multi-vectors, after Valhalla, we
> can re-introduce shape-shifting code that is portable.  But we can't do that
> today if the number of shapes is a dynamic property.  We need synthetic
> multi-vector shapes to build out a shape-fluid user experience.  Can't do that
> today.)
> 
> OK, so we have a byte-to-int cast that expands from one input vector to four
> output vectors.  How does the user keep the all straight?  I think a good
> answer is to add the "part" parameter, noted above.
> 
> The part parameter is present in all lane-wise shape changing operations.
> Zero is always a valid argument.
> For a operation which expands by a factor of N, the valid range of part
> numbers is [0..N-1].
> The meaning is simple:  It selects which "part"
> of the output to return.  It is *not* a lane index (and I don't want to
> generalize in that direction).
> A user doing byte-to-int conversion will simply know that there are four parts
> to deal with, and work that into the algorithm.
> 
> (There are at least three ways to deal with parts:
> Use part 0 only, which means the input vector only uses 25% of its lanes.  You
> load 25% of the input bytes at a time and then expand part 0 to a 100%-sized
> vector of the same shape.
> Or, use a little 4-way loop over the parts, disposing of each part separately.
> Or, unroll the little loop by hand, using 4 temps for parts 0..3.  Depends on
> the
> application.)
> 
> If the part number is out of range, you get an array index exception.  The
> pseudo-code of the reference implementation can pretend that the
> conversion operation produces a tiny array of 4 output vectors, and then the
> part number is an index into that array.
> 
> If the conversion doesn't change shape, then zero is the only part number
> you can ever use.
> That could be defaulted, but it's not worth the extra overloaded API point
> IMO.
> 
> Now, if the conversion contracts, we don't strictly need a part number; we
> can use a convention which is that the output bits are placed at the beginning
> of the output vector.
> But we also want a part number here.  Again, zero means "just throw away
> everything except the beginning of the computation."  But non-zero part
> numbers mean "place the output into another part of the output vector.
> Why would we do this?
> Because conversions sometimes come in pairs, and we need to be able to
> track lane values through multiple conversions, sometimes.  To do this
> sanely, I propose that contracting operations have a part number parameter
> which "steers" the lane values in away compatible with the inverse
> conversion, with the same part number.  This means that contracting with a
> non-zero part means that the output is placed in a (zero-filled) vector at lane
> VLENGTH*part/N, where N is the contraction factor.  This means that
> immediately following with an inverse, and the same part number, will
> reproduce the original input.
> That makes these methods much easier to
> reason about.  (Maybe a contracting part number should be either negative
> or zero, to provide an extra error check.  There's no burden on the user to
> adding a minus sign to the method call, and it will better document what's
> going on.)
> 
> I think this "part" idea has legs, and provides a decent way to deal with multi-
> part results.
> 
> A beneficial side effect of keeping shape invariance as a principle is that we
> can concentrate the JIT code on using one register type at a time, and do the
> fancy footwork for operations like byte to int conversion (with size changes)
> in Java code where it belongs, rather than in JIT code.  I hope to retire the
> existing cast intrinsic, which is a highly complex 5-phase instruction selection
> problem, replacing it with a suite of smaller more flexible primitives, some to
> do shape changing without conversion and others to do conversion without
> shape changing.
> 
> — John