RFR: 8351623: VectorAPI: Add SVE implementation of subword gather load operation [v5]
Xiaohong Gong
xgong at openjdk.org
Wed Sep 24 09:59:11 UTC 2025
On Tue, 9 Sep 2025 07:30:18 GMT, Emanuel Peter <epeter at openjdk.org> wrote:
>>> Have you considered using `2x Cast + Concatenate` instead, and just matching that in the backend? I don't remember how to do the mere Concat, but it should be possible via the `unslice` or some other operation that concatenates two vectors.
>>
>> Would using `2x Cast + Concatenate` make the IRs and match rule more complex? A mere concatenate would be something like `vector slice` in the Vector API. It concatenates two vectors into one, with an index denoting the merging position, and it requires the two input vectors and the dst vector to have the same vector type. Hence, if we want to split this operation into cast and concatenate, the IRs would be (assuming the original type of `v1/v2` is `4-int`, so the result type should be `8-short`):
>> 1) Narrow two input vectors:
>> `v1 = VectorCast(v1) (4-short); v2 = VectorCast(v2) (4-short)`.
>> The number of lanes is unchanged while the element size is halved. Hence the vector length in bytes is halved as well.
>> 2) Resize `v1` and `v2` to double vector length. The higher bits are cleared:
>> `v1 = VectorReinterpret(v1) (8-short); v2 = VectorReinterpret(v2) (8-short)`.
>> 3) Concatenate `v1` and `v2` like slice. The position is the middle of the vector length.
>> `v = VectorSlice(v1, v2, 4) (8-short)`.
>>
>> If we want to merge these IRs in the backend, would the match rule be more complex? I will consider it.
>
> I'm not saying I know that this alternative would be better. I'm just worried about having extra IR nodes, and then optimizations are more complex / just don't work because we don't handle all nodes.
Hi @eme64, I tried my best to simplify the complex IR `VectorConcatenateAndNarrow`. To keep each IR simple enough, it can be split into IRs with the following pattern:

Here I used a new IR named `VectorSliceNode`, which corresponds to the Vector API slice operation. It will be added to C2 by PR https://github.com/openjdk/jdk/pull/24104 in the future. However, it does not seem easy to optimize such a complex IR pattern into a single SVE instruction (`uzp1`) with a match rule. In addition, the `VectorSlice` takes the same node as both of its inputs, so the rule cannot be matched because its input node `VectorReinterpret` is not single-use.
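For reference, here is a minimal scalar sketch in plain Java of why the whole pattern is expected to collapse into one `uzp1`. It assumes little-endian lane order and the `4-int` to `8-short` case from the discussion above. The class and method names (`NarrowConcatModel`, `narrowAndConcat`, `asShorts`, `uzp1`) are hypothetical helpers for illustration only; this is not C2 or Vector API code.

```java
import java.util.Arrays;

final class NarrowConcatModel {
    // Reference semantics: narrow every int lane to short, then put the lanes
    // of v1 in the lower half of the result and the lanes of v2 in the upper half.
    static short[] narrowAndConcat(int[] v1, int[] v2) {
        short[] dst = new short[v1.length + v2.length];
        for (int i = 0; i < v1.length; i++) {
            dst[i] = (short) v1[i];
        }
        for (int i = 0; i < v2.length; i++) {
            dst[v1.length + i] = (short) v2[i];
        }
        return dst;
    }

    // View an int vector as twice as many short lanes (assumption: little-endian,
    // so the low 16 bits of each int become the even-numbered short lane).
    static short[] asShorts(int[] v) {
        short[] r = new short[v.length * 2];
        for (int i = 0; i < v.length; i++) {
            r[2 * i] = (short) v[i];
            r[2 * i + 1] = (short) (v[i] >>> 16);
        }
        return r;
    }

    // Scalar model of uzp1 on halfword lanes: the even-numbered lanes of a
    // fill the lower half, the even-numbered lanes of b fill the upper half.
    static short[] uzp1(short[] a, short[] b) {
        int n = a.length;
        short[] dst = new short[n];
        for (int i = 0; i < n / 2; i++) {
            dst[i] = a[2 * i];
            dst[n / 2 + i] = b[2 * i];
        }
        return dst;
    }

    public static void main(String[] args) {
        int[] v1 = {0x00010002, -1, 70000, 4};
        int[] v2 = {5, 6, 0x7fff8000, 8};
        short[] expected = narrowAndConcat(v1, v2);
        short[] actual = uzp1(asShorts(v1), asShorts(v2));
        System.out.println(Arrays.equals(expected, actual)); // prints "true"
    }
}
```

In this model the truncating cast and the concatenation are both absorbed by picking the even halfword lanes, which is why a single node that the backend can map directly to `uzp1` is attractive.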
Hence, I think we still need to add a new IR. I have two ideas:
1) Add an IR like `VectorSlice` that accepts a single vector input. It is used to shift the element lanes.
```
e.g. src: abcd efgh idx: 4 -> dst: efgh 0000
```
This IR may overlap with `VectorSlice`, so I personally do not lean toward it.
2) Add a `VectorConcatenate` IR, which concatenates two vectors. The element basic type is unchanged, while the vector length is doubled.
```
e.g. src1: abcd src2: efgh -> dst: efgh abcd
```
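To make the two options easier to compare, here is a rough scalar sketch in plain Java that reproduces the two examples exactly as written, using strings as stand-ins for vectors (one character per lane, `0` for a cleared lane). The names `ProposedIrModel`, `laneShift`, and `concatenate` are hypothetical. The sketch keeps the textual lane order of the examples and does not assume which end is lane 0.

```java
final class ProposedIrModel {
    // Idea 1: single-input lane shift. Lanes from position idx onward move to
    // the front, and the vacated lanes are filled with zeros.
    static String laneShift(String src, int idx) {
        return src.substring(idx) + "0".repeat(idx);
    }

    // Idea 2: concatenate two vectors of the same element type into one vector
    // of double length, in the textual order used by the example.
    static String concatenate(String src1, String src2) {
        return src2 + src1;
    }

    public static void main(String[] args) {
        System.out.println(laneShift("abcdefgh", 4));    // prints "efgh0000"
        System.out.println(concatenate("abcd", "efgh")); // prints "efghabcd"
    }
}
```

Note that in this model idea 1 keeps the vector length unchanged, while idea 2 produces a result twice as long as each input without touching the element type.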
WDYT?
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/26236#discussion_r2375234209