RFR: 8351623: VectorAPI: Add SVE implementation of subword gather load operation
Fei Gao
fgao at openjdk.org
Wed Jul 16 14:53:50 UTC 2025
On Wed, 16 Jul 2025 06:44:13 GMT, Xiaohong Gong <xgong at openjdk.org> wrote:
> * case-2: 2 times of gather and merge
>
> * Can be refined. But the `LoadVectorGatherNode` should be changed to accept 2 `index` vectors.
> * case-3: 4 times of gather and merge (only for byte)
>
> * Can be refined. We can implement it just like:
> step-1: `v1 = gather1 + gather2 + 2 * uzp1` // merging the first and second gather-loads
> step-2: `v2 = gather3 + gather4 + 2 * uzp1` // merging the third and fourth gather-loads
> step-3: `v3 = slice (v2, v2)`, `v = or(v1, v3)` // do the final merging
> We have to change `LoadVectorGatherNode` as well. At least making it accept 2 `index` vectors.
>
> In summary, `LoadVectorGatherNode` will be more complex than before. But the good thing is, giving it one more `index` input is ok. I'm not sure whether this is applicable to other architectures such as RVV. But I can try with this change. Do you have a better idea? Thanks!
@XiaohongGong thanks for your reply.
This idea generally looks good to me.
For case-2, we have
gather1 + gather2 + uzp1:
[0a 0a 0a 0a ... 0a 0a 0a 0a]
[0b 0b 0b 0b ... 0b 0b 0b 0b]
uzp1.H =>
[bb bb bb bb ... aa aa aa aa]
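The case-2 merge above can be modelled in a few lines of Python. This is only a sketch: vectors are plain lists of halfword elements with index 0 as the lowest element (the diagrams above print the highest element on the left), and the helper names `gather_short` and `gather_pair_merge` are hypothetical, not HotSpot code.

```python
def uzp1(lo, hi):
    """Model of UZP1: concatenate lo then hi, keep the even-indexed elements."""
    return (lo + hi)[0::2]

def gather_short(values):
    """Hypothetical gather result for shorts: each 32-bit lane holds one
    loaded halfword followed by a zero halfword (zero extension)."""
    out = []
    for v in values:
        out += [v, 0]
    return out

def gather_pair_merge(first, second):
    """gather1 + gather2 + uzp1.H: drop the zero halfwords and pack both results."""
    return uzp1(gather_short(first), gather_short(second))

# Four 32-bit lanes per gather, so eight packed shorts after the merge.
print(gather_pair_merge([1, 2, 3, 4], [5, 6, 7, 8]))
```

The elements of the first gather land in the low half of the result and those of the second gather in the high half, matching `[bb bb bb bb ... aa aa aa aa]`.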
Can we improve `case-3` by following the pattern of `case-2`?
step-1: v1 = gather1 + gather2 + uzp1
[000a 000a 000a 000a … 000a 000a 000a 000a]
[000b 000b 000b 000b … 000b 000b 000b 000b]
uzp1.H => [0b0b 0b0b 0b0b 0b0b … 0a0a 0a0a 0a0a 0a0a]
step-2: v2 = gather3 + gather4 + uzp1
[000c 000c 000c 000c … 000c 000c 000c 000c]
[000d 000d 000d 000d … 000d 000d 000d 000d]
uzp1.H => [0d0d 0d0d 0d0d 0d0d … 0c0c 0c0c 0c0c 0c0c]
step-3: v3 = uzp1 (v1, v2)
[0b0b 0b0b 0b0b 0b0b … 0a0a 0a0a 0a0a 0a0a]
[0d0d 0d0d 0d0d 0d0d … 0c0c 0c0c 0c0c 0c0c]
uzp1.B => [dddd dddd cccc cccc … bbbb bbbb aaaa aaaa]
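To sanity-check the three steps, here is a small byte-level model in Python. It is a sketch under assumptions: vectors are lists of bytes with index 0 as the lowest byte, each gather leaves one loaded byte in the low byte of every 32-bit lane, and the names `gather_byte` and `merge_four_gathers` are made up for illustration.

```python
def uzp1(lo, hi, esize):
    """Model of UZP1 on byte lists: view lo:hi as esize-byte elements
    and keep the even-indexed elements."""
    data = lo + hi
    elems = [data[i:i + esize] for i in range(0, len(data), esize)]
    return [b for e in elems[0::2] for b in e]

def gather_byte(tag, lanes):
    """Hypothetical gather result for bytes: one loaded byte per 32-bit lane,
    zero-extended (three zero bytes above it)."""
    out = []
    for i in range(lanes):
        out += [f"{tag}{i}", 0, 0, 0]
    return out

def merge_four_gathers(lanes=4):
    a, b, c, d = (gather_byte(t, lanes) for t in "abcd")
    v1 = uzp1(a, b, 2)       # step-1: gather1 + gather2 + uzp1.H
    v2 = uzp1(c, d, 2)       # step-2: gather3 + gather4 + uzp1.H
    return uzp1(v1, v2, 1)   # step-3: uzp1.B on the two partial results

print(merge_four_gathers())
```

With four lanes per gather, the result holds the `a` bytes lowest, then `b`, `c` and `d`, which corresponds to `[dddd dddd cccc cccc … bbbb bbbb aaaa aaaa]` when printed with the highest element first.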
Then we can consistently define the semantics of `LoadVectorGatherNode` as `gather1 + gather2 + uzp1.H` for both cases, which would make the backend much cleaner. WDYT?
-------------
PR Comment: https://git.openjdk.org/jdk/pull/26236#issuecomment-3078968856