RFR: 8351623: VectorAPI: Add SVE implementation of subword gather load operation

Bhavana Kilambi bkilambi at openjdk.org
Mon Jul 14 13:36:48 UTC 2025


On Thu, 10 Jul 2025 07:04:44 GMT, Xiaohong Gong <xgong at openjdk.org> wrote:

> This is a follow-up patch of [1], which aims at implementing the subword gather load APIs for AArch64 SVE platform.
> 
> ### Background
> Vector gather load APIs load values from memory addresses calculated by adding a base pointer to integer indices. SVE provides native gather load instructions for `byte`/`short` types using `int` vectors for indices. The vector size for a gather-load instruction is determined by the index vector (i.e. `int` elements). Hence, the total size is `32 * elem_num` bits, where `elem_num` is the number of loaded elements in the vector register.
> 
> ### Implementation
> 
> #### Challenges
> Due to size differences between `int` indices (32-bit) and `byte`/`short` data (8/16-bit), operations must be split across multiple vector registers based on the target SVE vector register size constraints.
> 
> For a 512-bit SVE machine, loading a `byte` vector with different vector species requires different approaches:
> - SPECIES_64: Single operation with mask (8 elements, 256-bit)
> - SPECIES_128: Single operation, full register (16 elements, 512-bit)
> - SPECIES_256: Two operations + merge (32 elements, 1024-bit)
> - SPECIES_512/MAX: Four operations + merge (64 elements, 2048-bit)
> 
> Use `ByteVector.SPECIES_512` as an example:
> - It contains 64 elements, so the index vector size should be `64 * 32` bits, which is 4 times the SVE vector register size.
> - It requires 4 vector gather-loads to finish the whole operation.
> 
> 
> byte[] arr = [a, a, a, a, ..., a, b, b, b, b, ..., b, c, c, c, c, ..., c, d, d, d, d, ..., d, ...]
> int[] idx = [0, 1, 2, 3, ..., 63, ...]
> 
> 4 gather-load:
> idx_v1 = [15 14 13 ... 1 0]    gather_v1 = [... 0000 0000 0000 0000 aaaa aaaa aaaa aaaa]
> idx_v2 = [31 30 29 ... 17 16]  gather_v2 = [... 0000 0000 0000 0000 bbbb bbbb bbbb bbbb]
> idx_v3 = [47 46 45 ... 33 32]  gather_v3 = [... 0000 0000 0000 0000 cccc cccc cccc cccc]
> idx_v4 = [63 62 61 ... 49 48]  gather_v4 = [... 0000 0000 0000 0000 dddd dddd dddd dddd]
> merge: v = [dddd dddd dddd dddd cccc cccc cccc cccc bbbb bbbb bbbb bbbb aaaa aaaa aaaa aaaa]
> 
> 
> #### Solution
> The implementation simplifies backend complexity by defining each gather load IR to handle one vector gather-load operation, with multiple IRs generated in the compiler mid-end.
> 
> Here are the main changes:
> - Enhanced IR generation with architecture-specific patterns based on `gather_scatter_needs_vector_index()` matcher.
> - Added `VectorSliceNode` for result merging.
> - Added `VectorMaskWidenNode` for mask splitting and type conversion fo...
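
For reference, the splitting arithmetic in the quoted description can be modelled in plain scalar Java. This is only a sketch under stated assumptions: the class and method names (`SubwordGatherSketch`, `gatherOps`, `gather`) are hypothetical and do not appear in the patch, and the code imitates the lane layout of the example rather than the IR that the compiler generates.

```java
// Scalar model of the subword gather split described above.
// All names here are hypothetical; nothing is taken from the patch itself.
final class SubwordGatherSketch {
    // Gather indices are int elements, so each loaded element costs 32 index bits.
    static final int INDEX_BITS = 32;

    // Number of gather-loads needed when every load consumes at most one
    // full vector register of int indices.
    static int gatherOps(int elemNum, int registerBits) {
        int lanesPerOp = registerBits / INDEX_BITS;
        return (elemNum + lanesPerOp - 1) / lanesPerOp; // ceiling division
    }

    // Emulates the split: each outer iteration stands for one gather-load
    // and fills the next slice of the result, which plays the role of the merge.
    static byte[] gather(byte[] arr, int base, int[] idx, int elemNum, int registerBits) {
        int lanesPerOp = registerBits / INDEX_BITS;
        int ops = gatherOps(elemNum, registerBits);
        byte[] result = new byte[elemNum];
        for (int op = 0; op < ops; op++) {
            int start = op * lanesPerOp;
            int end = Math.min(start + lanesPerOp, elemNum);
            for (int lane = start; lane < end; lane++) {
                result[lane] = arr[base + idx[lane]];
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // byte element counts for SPECIES_64, _128, _256 and _512 on a 512-bit machine
        for (int elemNum : new int[] {8, 16, 32, 64}) {
            System.out.println(elemNum + " elements -> "
                    + gatherOps(elemNum, 512) + " gather-load(s)");
        }
    }
}
```

With a 512-bit register this gives 1, 1, 2 and 4 gather-loads for 8, 16, 32 and 64 `byte` elements, matching the species list in the description. The 8-element case still takes one load, but only half of the index lanes are active, which is where the mask comes in.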

src/hotspot/cpu/aarch64/aarch64_vector_ad.m4 line 352:

> 350:   // SVE requires vector indices for gather-load/scatter-store operations
> 351:   // on all data types.
> 352:   bool Matcher::gather_scatter_needs_vector_index(BasicType bt) {

There's already a function that tests for `UseSVE > 0` here - https://github.com/openjdk/jdk/blob/bcd86d575fe0682a234228c18b0c2e817d3816da/src/hotspot/cpu/aarch64/matcher_aarch64.hpp#L36

Can it be reused?

src/hotspot/cpu/aarch64/aarch64_vector_ad.m4 line 3430:

> 3428: 
> 3429: instruct vslice_neon(vReg dst, vReg src1, vReg src2, immI index) %{
> 3430:    predicate(VM_Version::use_neon_for_vector(Matcher::vector_length_in_bytes(n)));

nit: indentation. I think there are 3 spaces here. Same with the SVE version below.

src/hotspot/cpu/aarch64/aarch64_vector_ad.m4 line 3434:

> 3432:    format %{ "vslice_neon $dst, $src1, $src2, $index" %}
> 3433:    ins_encode %{
> 3434:     uint length_in_bytes = Matcher::vector_length_in_bytes(this);

nit: indentation. Two spaces.

src/hotspot/cpu/aarch64/aarch64_vector_ad.m4 line 3448:

> 3446:    format %{ "vslice_sve $dst_src1, $dst_src1, $src2, $index" %}
> 3447:    ins_encode %{
> 3448:     assert(UseSVE > 0, "must be sve");

nit: indentation. Two spaces.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26236#discussion_r2204954269
PR Review Comment: https://git.openjdk.org/jdk/pull/26236#discussion_r2204961131
PR Review Comment: https://git.openjdk.org/jdk/pull/26236#discussion_r2204958060
PR Review Comment: https://git.openjdk.org/jdk/pull/26236#discussion_r2204959807
