RFR: 8351623: VectorAPI: Add SVE implementation of subword gather load operation

Fei Gao fgao at openjdk.org
Tue Jul 15 16:02:39 UTC 2025


On Mon, 14 Jul 2025 10:10:47 GMT, Xiaohong Gong <xgong at openjdk.org> wrote:

>> Hi @Bhavana-Kilambi, @fg1417, could you please help take a look at this PR? BTW, since the vector register size of my SVE machine is 128-bit, could you please help test the correctness on an SVE machine with a larger vector size (e.g. 512-bit vector size)? Thanks a lot in advance!
>
>> Hi @XiaohongGong, thank you for doing this. As for testing, we can currently only test on 256-bit SVE machines (we no longer have any 512-bit machines). We will get back to you with the results soon.
>
> Testing on 256-bit SVE machines is fine with me. Thanks so much for your help!

@XiaohongGong Please correct me if I’m missing something or got anything wrong.

Taking `short` on a `512-bit` machine as an example, these instructions would be generated:

// vgather
sve_dup vtmp, 0
sve_load_0 =>  [0a 0a 0a 0a 0a 0a 0a 0a 0a 0a 0a 0a 0a 0a 0a 0a]
sve_uzp1 with vtmp =>  [00 00 00 00 00 00 00 00 aa aa aa aa aa aa aa aa]

// vgather1
sve_dup vtmp, 0
sve_load_1 =>  [0b 0b 0b 0b 0b 0b 0b 0b 0b 0b 0b 0b 0b 0b 0b 0b]
sve_uzp1 with vtmp =>  [00 00 00 00 00 00 00 00 bb bb bb bb bb bb bb bb]
// Slice vgather1, vgather1
ext =>  [bb bb bb bb bb bb bb bb 00 00 00 00 00 00 00 00]

// Or vgather, vslice
sve_orr =>  [bb bb bb bb bb bb bb bb aa aa aa aa aa aa aa aa]

Actually, we can get the target result directly by applying `uzp1` to the outputs of `sve_load_0` and `sve_load_1`, like:

[0a 0a 0a 0a 0a 0a 0a 0a 0a 0a 0a 0a 0a 0a 0a 0a]
[0b 0b 0b 0b 0b 0b 0b 0b 0b 0b 0b 0b 0b 0b 0b 0b]
uzp1 => 
[bb bb bb bb bb bb bb bb aa aa aa aa aa aa aa aa]

If so, the current design of `LoadVectorGather` may not be sufficiently low-level to suit `AArch64`. WDYT?
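
For reference, the two sequences above can be compared with a small lane-level model. This is only a sketch under stated assumptions, not the code C2 emits: vectors are modeled as Python lists of 32 halfword lanes with lane 0 first (the diagrams above print the highest lane on the left), the helper names are hypothetical, the slice is modeled as an `ext` of `vgather1` with itself, and the element values are arbitrary.

```python
LANES = 32  # assumed: 512-bit vector / 16-bit short lanes


def gather_load(values):
    """Model one gather of shorts with int indices: each 32-bit container
    holds the short in its low halfword and zero in its high halfword."""
    out = []
    for v in values:
        out += [v, 0]
    return out


def uzp1(x, y):
    """Even-numbered lanes of x, followed by even-numbered lanes of y."""
    return x[0::2] + y[0::2]


def ext(x, y, n):
    """Take len(x) lanes from the concatenation x:y, starting at lane n."""
    return (x + y)[n:n + len(x)]


def orr(x, y):
    """Lane-wise bitwise OR."""
    return [p | q for p, q in zip(x, y)]


def current_sequence(a, b):
    """uzp1 with zero for each half, then slice and OR (the steps listed above)."""
    zero = [0] * LANES
    vgather = uzp1(gather_load(a), zero)
    vgather1 = uzp1(gather_load(b), zero)
    vslice = ext(vgather1, vgather1, LANES // 2)
    return orr(vgather, vslice)


def direct_sequence(a, b):
    """A single uzp1 applied to the two raw gather results."""
    return uzp1(gather_load(a), gather_load(b))


a = list(range(1, 17))      # 16 shorts from the first gather
b = list(range(101, 117))   # 16 shorts from the second gather
assert current_sequence(a, b) == direct_sequence(a, b) == a + b
```

In this model both sequences place the `a` elements in the low half and the `b` elements in the high half, so the single `uzp1` stands in for two `uzp1`, one slice and one `orr`.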

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26236#issuecomment-3074255909

