RFR: 8351623: VectorAPI: Add SVE implementation of subword gather load operation

Xiaohong Gong xgong at openjdk.org
Thu Jul 17 01:23:51 UTC 2025


On Wed, 16 Jul 2025 14:49:19 GMT, Fei Gao <fgao at openjdk.org> wrote:

> > * case-2: 2 times of gather and merge
> >   
> >   * Can be refined. But the `LoadVectorGatherNode` should be changed to accept 2 `index` vectors.
> > * case-3: 4 times of gather and merge (only for byte)
> >   
> >   * Can be refined. We can implement it just like:
> >     step-1:  `v1 = gather1 + gather2 + 2 * uzp1`                        // merging the first and second gather-loads
> >     step-2:  `v2 = gather3 + gather4 + 2 * uzp1`                        // merging the third and fourth gather-loads
> >     step-3:  `v3 = slice (v2, v2)`,   `v = or(v1, v3)`                     // do the final merging
> >     We have to change `LoadVectorGatherNode` as well. At least making it accept 2 `index` vectors.
> > 
> > In summary, `LoadVectorGatherNode` will be more complex than before, but the good thing is that giving it one more `index` input is OK. I'm not sure whether this is applicable to other architectures such as RVV, but I can try this change. Do you have a better idea? Thanks!
> 
> @XiaohongGong thanks for your reply.
> 
> This idea generally looks good to me.
> 
> For case-2, we have
> 
> ```
> gather1 + gather2 + uzp1:
> [0a 0a 0a 0a 0a 0a 0a 0a]
> [0b 0b 0b 0b 0b 0b 0b 0b]
> uzp1.H  => 
> [bb bb bb bb aa aa aa aa]
> ```
> 
> Can we improve `case-3` by following the pattern of `case-2`?
> 
> ```
> step-1:  v1 = gather1 + gather2 + uzp1 
> [000a 000a 000a 000a 000a 000a 000a 000a]
> [000b 000b 000b 000b 000b 000b 000b 000b]
> uzp1.H => [0b0b 0b0b 0b0b 0b0b 0a0a 0a0a 0a0a 0a0a]
> 
> step-2:  v2 = gather3 + gather4 + uzp1 
> [000c 000c 000c 000c 000c 000c 000c 000c]
> [000d 000d 000d 000d 000d 000d 000d 000d]
> uzp1.H => [0d0d 0d0d 0d0d 0d0d 0c0c 0c0c 0c0c 0c0c]
> 
> step-3:  v3 = uzp1 (v1, v2)
> [0b0b 0b0b 0b0b 0b0b 0a0a 0a0a 0a0a 0a0a]
> [0d0d 0d0d 0d0d 0d0d 0c0c 0c0c 0c0c 0c0c]
> uzp1.B => [dddd dddd cccc cccc bbbb bbbb aaaa aaaa]
> ```
> 
> Then we can also consistently define the semantics of `LoadVectorGatherNode` as `gather1 + gather2 + uzp1.H`, which would make the backend much cleaner. WDYT?

Thanks! We can write a macro-assembler helper for that. Regarding the definition of `LoadVectorGatherNode`, we'd better keep the vector type as it is for byte and short vectors, because the SVE gather-load instruction needs the element type information. Additionally, the layout of the result vector should match the vector type, right?
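
To sanity-check the lane order of the proposed merge, here is a small Python model of the case-2 and case-3 sequences quoted above. It is only a sketch under stated assumptions, not the HotSpot implementation: the names `uzp1`, `gather_into_words`, and `gather_subword` are hypothetical, a 256-bit vector is assumed (as in the diagrams), and a vector is modelled as a little-endian list of bytes with index 0 as the lowest byte. `uzp1` follows the usual UZP1 behaviour of concatenating the even-numbered elements of the first and second source.

```python
VECTOR_BYTES = 32  # assumption: 256-bit vector, as in the diagrams above


def uzp1(zn, zm, esize):
    """Model of UZP1: even-numbered esize-byte elements of zn, then of zm."""
    def even_elements(z):
        out = []
        for i in range(0, len(z), 2 * esize):
            out.extend(z[i:i + esize])
        return out
    return even_elements(zn) + even_elements(zm)


def gather_into_words(memory, indices, esize):
    """Model of one gather: each element is zero-extended into a 32-bit lane."""
    out = []
    for idx in indices:
        element = memory[idx * esize:(idx + 1) * esize]
        out.extend(element + [0] * (4 - esize))
    return out


def gather_subword(memory, indices, esize):
    """case-2 (esize == 2): two gathers + uzp1.H.
    case-3 (esize == 1): four gathers + two uzp1.H + one uzp1.B."""
    lanes = VECTOR_BYTES // 4  # number of 32-bit lanes filled by one gather
    parts = [gather_into_words(memory, indices[i:i + lanes], esize)
             for i in range(0, len(indices), lanes)]
    # Every pair of gathers is merged the same way: gather1 + gather2 + uzp1.H
    pairs = [uzp1(parts[i], parts[i + 1], 2) for i in range(0, len(parts), 2)]
    if esize == 2:
        return pairs[0]
    return uzp1(pairs[0], pairs[1], 1)  # final uzp1.B for bytes


memory = list(range(256))
byte_indices = [(7 * i + 3) % 200 for i in range(32)]
short_indices = [(5 * i + 1) % 100 for i in range(16)]

bytes_result = gather_subword(memory, byte_indices, 1)
shorts_result = gather_subword(memory, short_indices, 2)
```

In this model the merged result has its elements in the same order as the index array for both shorts and bytes, which is the layout the byte and short vector types expect.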

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26236#issuecomment-3082052940


More information about the hotspot-compiler-dev mailing list