[vectorIntrinsics] RFR: 8287289: Gather/Scatter with Index Vector
John R Rose
jrose at openjdk.java.net
Wed Jun 1 17:25:58 UTC 2022
On Wed, 25 May 2022 08:35:53 GMT, Joshua Zhu <jzhu at openjdk.org> wrote:
> When I assist engineers to apply VectorAPI in real business scenarios, I realize that Gather/Scatter APIs depend on indexMap residing in memory.
>
> When an index map is acquired by vector operations, it is represented by IntVector. To do Gather/Scatter operations, an extra integer array must be introduced and an explicit vector store is also required ahead of each Gather/Scatter. Furthermore, the redundant memory store may cause a performance penalty.
>
> Hence I submit this change for discussion. I propose to provide Gather/Scatter API supporting index vector. This patch only includes the change for Gather API.
> It passed the jtreg tests for VectorAPI.
This is a very nice experiment. I agree that the scatter/gather APIs need polishing; I think of them as proofs of concept, in their present form.
It is probably useful to look towards Panama as a source for the best types for these operations, in their final form. (Hat tip to Paul Sandoz for this idea.) If we can figure out how to do scatter-gather over memory segments, that would support Java arrays (perhaps via convenience methods) and also support more general native programming scenarios, including ones which are not limited to 31-bit indexes into the Java heap.
Therefore, I think a useful primitive looks like this:
    FloatVector fromArray(
        VectorSpecies<Float> species,
        MemorySegment a,
        IntVector indexVector);

    FloatVector fromArray(
        VectorSpecies<Float> species,
        MemorySegment a,
        LongVector indexVector);
Note the overload. With Panama now available, Java has *two* natural index types, `int` and `long`. I think both are important. I know hardware usually supports both.
In particular, supporting `long` means that (with safety restrictions removed appropriately) scatter and gather can use full memory addresses, if the right "magic" memory segment is available.
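To make the intended semantics of the two overloads concrete, here is a minimal scalar model. It is only a sketch: plain arrays stand in for the vectors and for the memory segment, and the names `GatherModel` and `gather` are hypothetical, not part of the Vector API or of the patch under review. Each result lane is simply the element selected by the corresponding index lane.

```java
import java.util.Objects;

// Hypothetical scalar model of gather; arrays stand in for vectors and for the segment.
final class GatherModel {
    // int-indexed gather: result[i] = a[indexLanes[i]], with the usual array bounds check.
    static float[] gather(float[] a, int[] indexLanes) {
        float[] result = new float[indexLanes.length];
        for (int i = 0; i < result.length; i++) {
            result[i] = a[indexLanes[i]];
        }
        return result;
    }

    // long-indexed overload. A Java array cannot exceed int range, so this model
    // bounds-checks each long index and narrows it; a real memory segment would
    // not need the narrowing step.
    static float[] gather(float[] a, long[] indexLanes) {
        float[] result = new float[indexLanes.length];
        for (int i = 0; i < result.length; i++) {
            long checked = Objects.checkIndex(indexLanes[i], (long) a.length);
            result[i] = a[(int) checked];
        }
        return result;
    }
}
```

The point of the overload is that the caller picks the index width that matches the data; the per-lane meaning is the same in both cases.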
As a side note, I'd also like to see uniform support of subword types for scatter and gather. This is messy, but part of the value of Java is tastefully hiding such messes.
The mess looks like this: A gather of subword types would require a word-wise gather plus lanewise shifting, followed by a contraction. It would probably also have to be done by parts, since the intermediates have lane size of 32 bits.
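The three steps above can be sketched in scalar form for `byte` lanes. This is an illustration under stated assumptions, not the JIT's code shape: memory is modeled as little-endian 32-bit words in an `int[]`, the class and method names are hypothetical, and `intLanes` stands for the number of 32-bit lanes available for the intermediates, which is why the work is done by parts.

```java
// Hypothetical scalar model of a byte gather built from a word-wise gather.
final class SubwordGatherModel {
    // Pack bytes into little-endian 32-bit words (length assumed to be a multiple of 4).
    static int[] packLittleEndian(byte[] bytes) {
        int[] words = new int[bytes.length / 4];
        for (int i = 0; i < bytes.length; i++) {
            words[i >>> 2] |= (bytes[i] & 0xFF) << ((i & 3) * 8);
        }
        return words;
    }

    // Gather bytes at the given byte indexes, intLanes lanes at a time.
    static byte[] gatherBytes(int[] words, int[] byteIndexes, int intLanes) {
        byte[] result = new byte[byteIndexes.length];
        for (int part = 0; part < byteIndexes.length; part += intLanes) {
            int n = Math.min(intLanes, byteIndexes.length - part);
            int[] wide = new int[n];
            // Step 1: word-wise gather of the 32-bit word containing each byte.
            for (int i = 0; i < n; i++) {
                wide[i] = words[byteIndexes[part + i] >>> 2];
            }
            // Step 2: lanewise shift moves the wanted byte to the low end of its lane.
            for (int i = 0; i < n; i++) {
                wide[i] >>>= (byteIndexes[part + i] & 3) * 8;
            }
            // Step 3: contraction narrows each 32-bit lane to a byte lane.
            for (int i = 0; i < n; i++) {
                result[part + i] = (byte) wide[i];
            }
        }
        return result;
    }
}
```

In a vectorized version each inner loop would be a single lanewise operation, and the outer loop corresponds to doing the gather "by parts".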
It's much worse for scatter. A scatter of subword types should probably just call an out-of-line assembly routine. That is, in my opinion, the VectorIntrinsics API should not refuse "trouble making" scatters and gathers, but the JIT should use runtime support routines for them.
The runtime support can use hand-crafted code to do a best-efforts scatter, and then serialize. That's probably faster than having the VI layer refuse to intrinsify, since the code shape emitted by the JIT will still be vectorized (except for the out-call to hand assembly). Over time, some or all of the hand-assembly might migrate into the JIT's emitted code. We are still adjusting that boundary for the `System.arraycopy` intrinsics, and this is a similar case I think.
-------------
PR: https://git.openjdk.java.net/panama-vector/pull/201