Vector.shiftER, Vector.shiftEL not working as expected.
John Rose
john.r.rose at oracle.com
Wed Oct 11 01:32:47 UTC 2017
On Oct 10, 2017, at 4:34 PM, Lupusoru, Razvan A <razvan.a.lupusoru at intel.com> wrote:
>
> You are correct, I misread specification for VPSLLD/VPSRLD. They indeed do shift within the lane instead of across lanes. So the updated mapping is as follows for AVX2:
> shiftL - VPSLLD
> shiftR - VPSRLD
> aShiftR - VPSRAD
> rotateEL - VPERMD*
> rotateER - VPERMD*
> shiftEL - VPERMD plus VPAND for zero masking.
> shiftER - similar as shiftEL. I may have to play with it to see best instruction sequence.
>
> * https://stackoverflow.com/questions/40805099/shuffle-avx-256-vector-elements-by-1-position-left-right-c-intrinsics
One handy thing about the AVX conventions is that "VP" usually means "in-lane"
(unless the "P" is the start of "PERM", oops).
IIRC that P stands for "partitioned", a word which makes it clear we are
talking about a scalar operation lifted across a set of lanes. I think the
classic term "elemental" refers to this also (as in "Fortran elemental
operations"). I would be happy to use the term "elemental" for the
Java API, as long as we agree to avoid uses of the term in phrases
that use "elemental" to talk about cross-lane operations.
For cross-lane operations, "shuffle" and "permute" are well-understood
terms. Cross-lane operations are really communication operations,
data movement between two localities. They are more short-range
than loads and stores, but they are not as direct as in-lane scalar
operations.
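To make the distinction concrete, here is a minimal sketch that uses plain
Java int arrays as stand-ins for vectors. The class and method names
(LaneKinds, shiftLeftInLane, permute) are invented for illustration and
are not part of any proposed API. The first method is elemental: each lane
is computed from the same lane of the input. The second is a cross-lane
permutation in the gather style, where output lane i reads input lane
index[i].

```java
import java.util.Arrays;

// Illustration only: arrays stand in for vectors, names are hypothetical.
final class LaneKinds {
    // Elemental (in-lane): every lane is shifted independently;
    // no data moves from one lane to another.
    static int[] shiftLeftInLane(int[] lanes, int bits) {
        int[] out = new int[lanes.length];
        for (int i = 0; i < lanes.length; i++) {
            out[i] = lanes[i] << bits;
        }
        return out;
    }

    // Cross-lane: output lane i is a copy of input lane index[i];
    // lane contents are moved, never modified.
    static int[] permute(int[] lanes, int[] index) {
        int[] out = new int[lanes.length];
        for (int i = 0; i < lanes.length; i++) {
            out[i] = lanes[index[i]];
        }
        return out;
    }

    public static void main(String[] args) {
        int[] v = {1, 2, 3, 4};
        // In-lane shift by one bit: prints [2, 4, 6, 8]
        System.out.println(Arrays.toString(shiftLeftInLane(v, 1)));
        // Rotation expressed as a permutation: prints [4, 1, 2, 3]
        System.out.println(Arrays.toString(permute(v, new int[] {3, 0, 1, 2})));
    }
}
```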
In the above list, I think it is dangerous (as I argued in the previous
message) to overload the terms "rotate" and "shift" for both in-lane
elementals and cross-lane element movement. In particular, the
"L" in "shiftEL" makes my head explode when I think about explaining
which direction is "left" to a Java programmer who is working with
arrays of scalars.
IIUC, "shiftEL" and "shiftER" above are lane movement instructions, not
elementals, and therefore should be named something like "moveAfter"
and "moveBefore", where "after" means "towards higher memory
addresses". I want the terms we pick to make it really clear, somehow,
that we are talking specifically about memory order, and not some hazy
idea of arithmetic bit-order (or some even more hazy idea of "high" or "left"
which may or may not be bit-order and/or memory order and/or the
opposite of one of them).
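As a sketch of what memory-order naming could look like, the following
uses int arrays in place of vectors, with a higher array index standing
for a higher memory address. The names moveAfter and moveBefore are only
the suggestions from this message, the class name LaneMoves is made up,
and zero-filling the vacated lanes is an assumption borrowed from the
"VPERMD plus VPAND" description quoted above. The count n is assumed to
be non-negative.

```java
// Illustration only: arrays stand in for vectors, names are hypothetical.
final class LaneMoves {
    // Moves every lane n positions toward higher indices (higher
    // addresses). The n lowest lanes are vacated and left as zero.
    static int[] moveAfter(int[] lanes, int n) {
        int[] out = new int[lanes.length];
        for (int i = n; i < lanes.length; i++) {
            out[i] = lanes[i - n];
        }
        return out;
    }

    // Moves every lane n positions toward lower indices (lower
    // addresses). The n highest lanes are vacated and left as zero.
    static int[] moveBefore(int[] lanes, int n) {
        int[] out = new int[lanes.length];
        for (int i = 0; i + n < lanes.length; i++) {
            out[i] = lanes[i + n];
        }
        return out;
    }
}
```

With lanes {1, 2, 3, 4}, moveAfter by 1 gives {0, 1, 2, 3} and moveBefore
by 1 gives {2, 3, 4, 0}. Neither result depends on any notion of "left"
or "right".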
Also, when we get to rebracketing operations, we will need to make
it crystal clear how large lanes are split into small ones, and vice
versa. The NIO concepts of endian-ness will probably help here,
and I expect that the endian-ness will be an *explicit* boolean
parameter in the API. If you want less shuffling, then you will
not flip a coin or hope somebody else set the default right,
but rather ask the vector shape for its preferred endian-ness,
just as you ask the shape for its lane count.
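The existing NIO classes already show how an explicit byte order
determines the result of splitting a large lane into small ones. The
sketch below is an illustration, not a proposed API: the class and method
names (Rebracket, intsToBytes) are made up, and it takes a
java.nio.ByteOrder where the text above speaks of a boolean. Asking a
vector shape for its preferred order is not modeled here; the closest
existing analogue is ByteOrder.nativeOrder().

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Illustration only: arrays stand in for vectors, names are hypothetical.
final class Rebracket {
    // Splits each 32-bit lane into four 8-bit lanes, laid out in memory
    // order according to the byte order the caller passes explicitly.
    static byte[] intsToBytes(int[] lanes, ByteOrder order) {
        ByteBuffer buf = ByteBuffer.allocate(lanes.length * Integer.BYTES)
                                   .order(order);
        for (int lane : lanes) {
            buf.putInt(lane);
        }
        return buf.array();
    }
}
```

For the single lane 0x01020304, ByteOrder.BIG_ENDIAN yields the bytes
{1, 2, 3, 4} and ByteOrder.LITTLE_ENDIAN yields {4, 3, 2, 1}. Passing
ByteOrder.nativeOrder() selects whichever of the two the platform uses.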
Here's some personal background…
Circa Y2K, we went through a gruesomely painful multi-year exercise
of getting most of the endian-ness bugs out of HotSpot, when we ported
it to SPARC (our first BE platform). I was project lead for that, fighting
uphill every week against LE assumptions tacitly embedded in the
early versions of HotSpot, and in the minds of its authors. Code might
locally make sense talking about the "left" and "right" part of a datum,
and elsewhere about the "first" and "second" part of a structure.
Even worse, some other bits of code would talk about "high" and "low"
parts: And you had to guess whether that meant arithmetically
high (aka "left"), or at a higher address in memory (aka "second").
Something would go wrong in the port, often where the more ambiguous
terms were being used loosely, and someone would tweak it until
the bug went away; 25% of the time this meant another part would
break and have to be tweaked. The end state was that, in some
cases, halves of data would be swapped for no logical reason,
other than because we were afraid to find out where the root
cause was. Well, it works now, but don't look too closely at the
conventions for 64-bit registers in the C2 JIT.
Now you can see why I'm being picky about seemingly harmless
names. I don't think the Java Vector API is fated to crawl through the
endian tarpit the way HotSpot did.
— John