RFR: 8351623: VectorAPI: Add SVE implementation of subword gather load operation
Fei Gao
fgao at openjdk.org
Tue Jul 15 16:02:39 UTC 2025
On Mon, 14 Jul 2025 10:10:47 GMT, Xiaohong Gong <xgong at openjdk.org> wrote:
>> Hi @Bhavana-Kilambi, @fg1417, could you please help take a look at this PR? BTW, since the vector register size of my SVE machine is 128-bit, could you please help test the correctness on an SVE machine with a larger vector size (e.g. 512-bit)? Thanks a lot in advance!
>
>> Hi @XiaohongGong, thank you for doing this. As for testing, we can currently only test on 256-bit SVE machines (we no longer have any 512-bit machines). We will get back to you with the results soon.
>
> Testing on 256-bit SVE machines is fine with me. Thanks so much for your help!
@XiaohongGong Please correct me if I’m missing something or got anything wrong.
Taking `short` on a `512-bit` machine as an example, these instructions would be generated:
// vgather
sve_dup vtmp, 0
sve_load_0 => [0a 0a 0a 0a 0a 0a 0a 0a 0a 0a 0a 0a 0a 0a 0a 0a]
sve_uzp1 with vtmp => [00 00 00 00 00 00 00 00 aa aa aa aa aa aa aa aa]
// vgather1
sve_dup vtmp, 0
sve_load_1 => [0b 0b 0b 0b 0b 0b 0b 0b 0b 0b 0b 0b 0b 0b 0b 0b]
sve_uzp1 with vtmp => [00 00 00 00 00 00 00 00 bb bb bb bb bb bb bb bb]
// Slice vgather1, vgather1
ext => [bb bb bb bb bb bb bb bb 00 00 00 00 00 00 00 00]
// Or vgather, vslice
sve_orr => [bb bb bb bb bb bb bb bb aa aa aa aa aa aa aa aa]
Actually, we can get the target result directly by applying `uzp1` to the outputs of `sve_load_0` and `sve_load_1`, like this:
[0a 0a 0a 0a 0a 0a 0a 0a 0a 0a 0a 0a 0a 0a 0a 0a]
[0b 0b 0b 0b 0b 0b 0b 0b 0b 0b 0b 0b 0b 0b 0b 0b]
uzp1 =>
[bb bb bb bb bb bb bb bb aa aa aa aa aa aa aa aa]
If so, the current design of `LoadVectorGather` may not be sufficiently low-level to suit `AArch64`. WDYT?
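To make the comparison above easier to check, here is a minimal lane-level sketch in plain Java. It is only a model, not HotSpot code: arrays stand in for 512-bit registers with 16-bit lanes (index 0 is the lowest lane), and the class and helper names (`Uzp1Sketch`, `sliceLowToHigh`, `viaSliceAndOr`, `viaSingleUzp1`) are hypothetical. The slice step is modeled only by its net effect in the diagram (low half moved to the high half, low half zeroed).

```java
import java.util.Arrays;

// Lane-level model only: short[] stands in for a 512-bit register of 16-bit lanes.
class Uzp1Sketch {
    static final int LANES = 32; // 512 bits / 16-bit lanes

    // Models UZP1 on 16-bit lanes: even-indexed lanes of n fill the low half
    // of the result, even-indexed lanes of m fill the high half.
    static short[] uzp1(short[] n, short[] m) {
        short[] r = new short[LANES];
        for (int i = 0; i < LANES / 2; i++) {
            r[i] = n[2 * i];
            r[LANES / 2 + i] = m[2 * i];
        }
        return r;
    }

    // Models the net effect of the slice step: low half moves to the high half,
    // and the low half becomes zero.
    static short[] sliceLowToHigh(short[] v) {
        short[] r = new short[LANES];
        System.arraycopy(v, 0, r, LANES / 2, LANES / 2);
        return r;
    }

    // Lane-wise bitwise OR.
    static short[] orr(short[] a, short[] b) {
        short[] r = new short[LANES];
        for (int i = 0; i < LANES; i++) {
            r[i] = (short) (a[i] | b[i]);
        }
        return r;
    }

    // Sequence from the first listing: two uzp1 with zero, slice, then or.
    static short[] viaSliceAndOr(short[] load0, short[] load1) {
        short[] zero = new short[LANES];
        short[] vgather = uzp1(load0, zero);
        short[] vgather1 = uzp1(load1, zero);
        return orr(vgather, sliceLowToHigh(vgather1));
    }

    // Proposed alternative: a single uzp1 over both gather-load results.
    static short[] viaSingleUzp1(short[] load0, short[] load1) {
        return uzp1(load0, load1);
    }

    public static void main(String[] args) {
        // Each gather load leaves one short in the low half of a 32-bit
        // container, i.e. data in even 16-bit lanes and zero in odd lanes.
        short[] load0 = new short[LANES];
        short[] load1 = new short[LANES];
        for (int i = 0; i < LANES / 2; i++) {
            load0[2 * i] = (short) (0xA00 + i);
            load1[2 * i] = (short) (0xB00 + i);
        }
        System.out.println(Arrays.equals(
                viaSliceAndOr(load0, load1), viaSingleUzp1(load0, load1)));
    }
}
```

In this model both paths produce the same lanes, which matches the diagrams; whether the same holds for partial vectors or other vector lengths would still need checking against the actual implementation.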
-------------
PR Comment: https://git.openjdk.org/jdk/pull/26236#issuecomment-3074255909