RFR: 8351623: VectorAPI: Add SVE implementation of subword gather load operation [v5]

Vladimir Ivanov vlivanov at openjdk.org
Wed Sep 24 18:07:58 UTC 2025


On Wed, 24 Sep 2025 09:54:37 GMT, Xiaohong Gong <xgong at openjdk.org> wrote:

>> I'm not saying I know that this alternative would be better. I'm just worried about having extra IR nodes, and then optimizations are more complex / just don't work because we don't handle all nodes.
>
> Hi @eme64, I tried my best to simplify the complex IR of `VectorConcatenateAndNarrow`. To make each IR simple enough, it can be split into IRs with the following pattern:
> 
> ![Screenshot 2025-09-24 163340](https://github.com/user-attachments/assets/b0e3471a-4991-4c9b-8c6f-7df000672a15)
> 
> Here I used a new IR named `VectorSliceNode`, which corresponds to the Vector API slice operation. It will be added to C2 by PR https://github.com/openjdk/jdk/pull/24104 in the future. However, it does not seem easy to optimize such a complex IR pattern into a single SVE instruction (`uzp1`) with a match rule. In addition, `VectorSlice` takes the same node as both of its inputs, so the rule cannot be matched because its input node `VectorReinterpret` is not single-use.
> 
> Hence, I think we still need to add a new IR. I have two ideas:
> 1) Add an IR like `VectorSlice`, but one that accepts a single vector input. It is used to shift element lanes.
>    ``` 
>    e.g. src: abcd efgh   idx: 4         -> dst: efgh 0000
>    ```
>    This IR may overlap with `VectorSlice`, so I personally do not favor it.
> 2) Add a `VectorConcatenate` IR, which concatenates two vectors. The element basic type is unchanged, while the vector length is doubled.
>     ```
>     e.g. src1: abcd   src2: efgh     -> dst: efgh abcd
>     ```
> WDYT?
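
For reference, the two examples in the quoted proposal can be reproduced with a small model that treats each letter as one lane. This is only a sketch: the class and method names are hypothetical, strings stand in for vectors, and the order of the halves in `concatenate` simply follows the example as written (the message does not say which end is lane 0).

```java
// Hypothetical model of the two proposed IR shapes; strings stand in for vectors.
final class LaneOpsSketch {
    // Idea 1: single-input lane shift. Lanes from idx onward move to the front,
    // and the vacated lanes are filled with '0', as in "abcd efgh" -> "efgh 0000".
    static String shiftLanes(String src, int idx) {
        StringBuilder dst = new StringBuilder(src.substring(idx));
        while (dst.length() < src.length()) {
            dst.append('0');
        }
        return dst.toString();
    }

    // Idea 2: concatenation. The element type is unchanged and the length doubles.
    // The written order (src2 first) mirrors the quoted example; it is an assumption
    // about notation, not a statement about lane numbering.
    static String concatenate(String src1, String src2) {
        return src2 + src1;
    }

    public static void main(String[] args) {
        System.out.println(shiftLanes("abcdefgh", 4));
        System.out.println(concatenate("abcd", "efgh"));
    }
}
```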

I started looking at the PR, and it looks appealing to simplify the VM intrinsics and lift more code into Java. In other words, the subword gather operation can be coded as a composition of operations on int vectors. Have you considered that?
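
To make the idea concrete, here is a scalar sketch of a byte gather expressed as int-sized gathers followed by narrowing and concatenation. It is plain Java rather than Vector API code, and everything in it (the class name, the helper names, the 4-lane int width) is an assumption made for illustration, not code from the PR.

```java
// Scalar model: a byte gather built from int-lane gathers, narrowing, and concatenation.
final class SubwordGatherSketch {
    // Assumed number of int lanes in one vector (e.g. a 128-bit vector).
    static final int INT_LANES = 4;

    // Models one int-vector gather: each lane loads a byte and widens it to int.
    static int[] gatherAsInts(byte[] base, int offset, int[] indexMap, int mapOffset) {
        int[] lanes = new int[INT_LANES];
        for (int i = 0; i < INT_LANES; i++) {
            lanes[i] = base[offset + indexMap[mapOffset + i]];
        }
        return lanes;
    }

    // Models narrowing an int vector to bytes and placing it into its slot of the result,
    // which is the "concatenate" step.
    static void narrowInto(int[] lanes, byte[] result, int resultOffset) {
        for (int i = 0; i < lanes.length; i++) {
            result[resultOffset + i] = (byte) lanes[i];
        }
    }

    // The composed operation: one int gather per group of INT_LANES indexes.
    static byte[] gatherBytes(byte[] base, int offset, int[] indexMap) {
        if (indexMap.length % INT_LANES != 0) {
            throw new IllegalArgumentException("index count must be a multiple of " + INT_LANES);
        }
        byte[] result = new byte[indexMap.length];
        for (int part = 0; part < indexMap.length / INT_LANES; part++) {
            int[] ints = gatherAsInts(base, offset, indexMap, part * INT_LANES);
            narrowInto(ints, result, part * INT_LANES);
        }
        return result;
    }

    public static void main(String[] args) {
        byte[] base = {10, 20, 30, 40, 50, 60, 70, 80};
        int[] indexMap = {7, 0, 3, 3, 1, 6, 2, 5};
        System.out.println(java.util.Arrays.toString(gatherBytes(base, 0, indexMap)));
    }
}
```

In this model the narrowing and concatenation steps are ordinary array writes; in compiled code they are the part that would need to map onto an efficient instruction sequence.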

It doesn't solve the problem of how to reliably match a complex graph to a single instruction, though. The matcher favors a tree representation, but there are multiple ways to work around that. Personally, I'd prefer to address it separately.

For now, a dedicated node to concatenate vectors looks appropriate (please note there's the existing PackNode et al.).
It can be either exposed through a VM intrinsic or substituted for a well-known complex IR shape during IGVN (like the one you depicted). The nice thing is that it'll uniformly cover all usages, irrespective of whether they come from the Vector API implementation itself or from user code.

In the context of the Vector API, the plan was to expose generic element rearranges/shuffles through the API, but then enable various strength reductions to optimize well-known/popular shapes. Packing multiple vectors fits that effort perfectly.
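
As a rough illustration of that kind of strength reduction, the sketch below recognizes one well-known shuffle shape (select every even element from the concatenation of two inputs, the shape a `uzp1`-style instruction produces) and routes it to a dedicated pack routine instead of the generic path. It is a scalar model in plain Java, not C2 or Vector API code; the class, the method names, and the convention that indexes address the concatenation of `a` followed by `b` are all assumptions made for the example.

```java
// Scalar model of recognizing a popular shuffle shape and replacing it with a cheaper pack.
final class ShuffleShapeSketch {
    // True when the shuffle selects elements 0, 2, 4, ... of the concatenated inputs.
    static boolean isEvenElementPack(int[] indexes) {
        if (indexes.length == 0) {
            return false;
        }
        for (int i = 0; i < indexes.length; i++) {
            if (indexes[i] != 2 * i) {
                return false;
            }
        }
        return true;
    }

    // Generic two-input rearrange: index k < a.length selects a[k], otherwise b[k - a.length].
    static int[] genericRearrange(int[] a, int[] b, int[] indexes) {
        int[] out = new int[indexes.length];
        for (int i = 0; i < indexes.length; i++) {
            int k = indexes[i];
            out[i] = k < a.length ? a[k] : b[k - a.length];
        }
        return out;
    }

    // Dedicated pack: even elements of a, then even elements of b.
    static int[] packEven(int[] a, int[] b) {
        int n = a.length;
        int[] out = new int[n];
        for (int i = 0; i < n; i++) {
            out[i] = i < n / 2 ? a[2 * i] : b[2 * i - n];
        }
        return out;
    }

    // Entry point: use the specialized form when the shape is recognized.
    // Assumes equal, even input lengths for the specialized path.
    static int[] rearrange(int[] a, int[] b, int[] indexes) {
        boolean packable = a.length == b.length && a.length % 2 == 0
                && indexes.length == a.length && isEvenElementPack(indexes);
        return packable ? packEven(a, b) : genericRearrange(a, b, indexes);
    }
}
```

Both paths produce the same result for the recognized shape, which is what makes the substitution safe regardless of whether the shuffle came from library code or user code.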

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26236#discussion_r2376644298


More information about the hotspot-compiler-dev mailing list