RFR: 8318650: Optimized subword gather for x86 targets. [v7]
Jatin Bhateja
jbhateja at openjdk.org
Mon Nov 20 05:38:31 UTC 2023
On Mon, 20 Nov 2023 01:34:57 GMT, Xiaohong Gong <xgong at openjdk.org> wrote:
> > > BTW, I have two questions:
> > >
> > > 1. An intrinsic that accepts a vector as the index, like the non-subword gather, would be more beneficial in real applications. Please see [8287289: Gather/Scatter with Index Vector panama-vector#201](https://github.com/openjdk/panama-vector/pull/201).
> > > 2. Do you plan to add a similar optimization for subword scatter in the future?
> > >
> > > Thanks, Xiaohong
> >
> >
> > I agree, the proposal looks reasonable to me. However, since the x86 ISA has no direct sub-word gather instruction, we will always need to pass the index array to the inline expander. The existing interface already provides for passing both the index array and the index vector.
>
> So in the x86 backend implementation, are the indexes finally stored into a vector register? As I understand it, it looks that way. If so, an alternative might be to 1) make the intrinsic accept an index vector, as for non-subword types, and 2) call such load-gather intrinsics several times in the Java implementation of the subword gather (e.g. 4 load-gathers for a byte gather with int indexes). That way the complex operations move to the Java side, and the compiler only needs to cover a single load-gather operation. This may unify subword and non-subword gathers on the compiler/intrinsics side.
Maybe it was not clear from my previous comments: for x86 and other targets which do not support a direct sub-word gather, backends will need an index array. For the other cases there are two options: either a target-specific lowering of the gather IR (or extending the inline expander to emit a specialized IR), or accepting the penalty of multiple index vector loads if that still wins over the existing fallback implementation. In addition, the lane size incompatibility between the gather vector lanes and the index lanes will pose challenges for masked gather operations.
On the other hand, since the patch already demonstrates a performance gain, other target backends can be implemented along the same lines.
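
To make the two shapes discussed above concrete, here is a minimal scalar sketch in plain Java. It is not the Vector API and not the code in this patch; the class name, method names, and parameters are all hypothetical. The first method models a byte gather driven by an index array. The second models the "multiple index vector loads" alternative, where each group of `indexLanes` int indexes stands in for one index vector, and the byte-lane mask has to be sliced per group because of the lane size mismatch. It assumes `lanes` is a multiple of `indexLanes`.

```java
import java.util.Arrays;

// Hypothetical scalar model of a masked subword (byte) gather; illustration only.
class SubwordGatherSketch {

    // Index-array form: out[i] = src[base + indexMap[mapOffset + i]] for set lanes, 0 otherwise.
    static byte[] gatherBytes(byte[] src, int base, int[] indexMap, int mapOffset,
                              boolean[] mask, int lanes) {
        byte[] out = new byte[lanes];
        for (int i = 0; i < lanes; i++) {
            if (mask[i]) {
                out[i] = src[base + indexMap[mapOffset + i]];
            }
        }
        return out;
    }

    // Index-vector form: one simulated index vector load per group of indexLanes lanes.
    // Assumes lanes is a multiple of indexLanes (e.g. 16 byte lanes, 4 int index lanes).
    static byte[] gatherBytesChunked(byte[] src, int base, int[] indexMap, int mapOffset,
                                     boolean[] mask, int lanes, int indexLanes) {
        byte[] out = new byte[lanes];
        for (int chunk = 0; chunk < lanes; chunk += indexLanes) {
            // Stand-in for loading one int index vector from the index map.
            int[] indexVector = Arrays.copyOfRange(
                    indexMap, mapOffset + chunk, mapOffset + chunk + indexLanes);
            // The byte-lane mask must be narrowed to the lanes covered by this index vector.
            boolean[] maskSlice = Arrays.copyOfRange(mask, chunk, chunk + indexLanes);
            for (int j = 0; j < indexLanes; j++) {
                if (maskSlice[j]) {
                    out[chunk + j] = src[base + indexVector[j]];
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] src = {10, 20, 30, 40, 50, 60, 70, 80};
        int[] indexMap = {7, 0, 3, 3, 1, 6, 2, 5};
        boolean[] mask = {true, false, true, true, false, true, true, true};
        byte[] a = gatherBytes(src, 0, indexMap, 0, mask, 8);
        byte[] b = gatherBytesChunked(src, 0, indexMap, 0, mask, 8, 4);
        System.out.println(Arrays.toString(a));
        System.out.println(Arrays.equals(a, b));
    }
}
```

Both methods compute the same result; the difference is only in how many index loads and mask slices are needed per gather, which is the cost being weighed above.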
-------------
PR Comment: https://git.openjdk.org/jdk/pull/16354#issuecomment-1818256345
More information about the core-libs-dev
mailing list