RFR: 8351623: VectorAPI: Add SVE implementation of subword gather load operation [v5]

Thu Sep 25 22:04:08 UTC 2025

On Thu, 25 Sep 2025 21:39:56 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote:

>>> I started looking at the PR and it looks appealing to simplify VM intrinsics and lift more code into Java. In other words, subword gather operation can be coded as a composition of operations on int vectors. Have you considered that?
>> 
>> Thanks so much for looking at this PR! Yes, I think we could move this op generation to the Java level for the subword gather operation, and I considered this when I started working on this task. However, this may break current backend implementation for other architectures like X86. I'm not sure whether moving to Java will be also friendly for non-SVE arches. Per my understanding, subword gather depends much more on the backend solution.
>> 
>>> For now, a dedicated node to concatenate vectors looks appropriate (please note there's existing PackNode et al).
>>> It can be either exposed through a VM intrinsic or substituted for a well-known complex IR shape during IGVN (like the one you depicted). The nice thing is it'll uniformly cover all usages irrespective of whether they come from the Vector API implementation itself or from user code.
>>>
>>> In the context of the Vector API, the plan was to expose generic element rearranges/shuffles through the API, but then enable various strength reductions to optimize well-known/popular shapes. Packing multiple vectors perfectly fits that effort.
>> 
>> Thanks for your input on the IR choice. I agree that adding such a vector concatenate node to C2 makes sense. And if we decide to move the complex implementation to Java-level, we'd better also add such an API for vector concatenate, right?
>
>> if we decide to move the complex implementation to Java-level, we'd better also add such an API for vector concatenate, right?
> 
> There's already a generic shuffle operation (rearrange). But there are precedents where more specific operations became part of the API for convenience reasons (e.g., slice/unslice). So, a dedicated operation for vector concatenation may be well-justified.
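To illustrate the "concatenation is just a well-known rearrange shape" point, here is a rough scalar model, not Vector API or C2 code. Lanes are plain Java arrays, and every name (`rearrange2`, `concatLowHalvesShuffle`, `isConcatLowHalves`) is hypothetical. The model assumes a two-input shuffle where an index in `[0, n)` selects from the first vector and an index in `[n, 2n)` selects from the second. That convention is an assumption of the sketch and not necessarily how `VectorShuffle` encodes a second source.

```java
import java.util.Arrays;

/**
 * Scalar model of a generic two-input lane shuffle and of vector
 * concatenation as one specific shape of it. Names are hypothetical.
 */
final class ConcatAsShuffle {

    // Generic two-input rearrange (assumed index convention):
    // index i in [0, n) picks a[i], index i in [n, 2n) picks b[i - n].
    static int[] rearrange2(int[] a, int[] b, int[] shuffle) {
        int n = a.length;
        int[] out = new int[n];
        for (int lane = 0; lane < n; lane++) {
            int idx = shuffle[lane];
            out[lane] = idx < n ? a[idx] : b[idx - n];
        }
        return out;
    }

    // Shuffle that concatenates the low halves: a[0..n/2) followed by b[0..n/2).
    static int[] concatLowHalvesShuffle(int n) {
        int[] s = new int[n];
        for (int lane = 0; lane < n; lane++) {
            s[lane] = lane < n / 2 ? lane : n + (lane - n / 2);
        }
        return s;
    }

    // The kind of pattern match a compiler could use to strength-reduce
    // a generic shuffle into a dedicated concatenate operation.
    static boolean isConcatLowHalves(int[] shuffle) {
        return Arrays.equals(shuffle, concatLowHalvesShuffle(shuffle.length));
    }

    public static void main(String[] args) {
        int[] a = {1, 2, 3, 4};
        int[] b = {5, 6, 7, 8};
        int[] s = concatLowHalvesShuffle(4);                          // {0, 1, 4, 5}
        System.out.println(Arrays.toString(rearrange2(a, b, s)));     // [1, 2, 5, 6]
        System.out.println(isConcatLowHalves(s));                     // true
        System.out.println(isConcatLowHalves(new int[] {0, 1, 2, 3})); // false
    }
}
```

In the same model, `slice(origin, v1)` is the shuffle `origin, origin + 1, ..., origin + n - 1`. A dedicated concatenate operation gives the same result as the generic shuffle; what it adds is a shape that is trivial to recognize and map to a cheap instruction.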

> However, this may break current backend implementation for other architectures like X86. I'm not sure whether moving to Java will be also friendly for non-SVE arches. Per my understanding, subword gather depends much more on the backend solution.

IMO that's a clear sign that the current abstraction is way too ad hoc and platform-specific. The x86 ISA lacks native support for subword gathers, so the operation is emulated with hand-written assembly. A less performant implementation that relies on a uniform, cross-platform VM interface would be a clear winner.
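A rough scalar model of the "subword gather as a composition of operations on int vectors" idea from the first quoted comment may help here. It is a sketch under stated assumptions, not the PR's implementation: lanes are modeled as Java arrays, and all helpers (`gatherIntoIntLanes`, `narrowToBytes`, `concat`, `gatherBytesComposed`) are hypothetical. It shows that a byte gather over `n` lanes equals several int-lane gathers (each loaded byte widened into a 32-bit lane), each narrowed back to bytes and then concatenated. How each step maps to SVE or x86 instructions is not shown.

```java
import java.util.Arrays;

/** Scalar model of a byte gather built from int-lane gathers. Names are hypothetical. */
final class SubwordGatherModel {

    // Reference semantics: one byte lane per index.
    static byte[] gatherBytes(byte[] src, int base, int[] idx) {
        byte[] out = new byte[idx.length];
        for (int i = 0; i < idx.length; i++) {
            out[i] = src[base + idx[i]];
        }
        return out;
    }

    // Step 1: gather intLanes bytes (indices idx[from..from+intLanes)) into
    // 32-bit lanes; Java's byte-to-int conversion sign-extends each value.
    static int[] gatherIntoIntLanes(byte[] src, int base, int[] idx, int from, int intLanes) {
        int[] lanes = new int[intLanes];
        for (int j = 0; j < intLanes; j++) {
            lanes[j] = src[base + idx[from + j]];
        }
        return lanes;
    }

    // Step 2: narrow each 32-bit lane back to 8 bits by truncation.
    static byte[] narrowToBytes(int[] lanes) {
        byte[] out = new byte[lanes.length];
        for (int j = 0; j < lanes.length; j++) {
            out[j] = (byte) lanes[j];
        }
        return out;
    }

    // Step 3: concatenate the narrowed parts in order.
    static byte[] concat(byte[][] parts) {
        int total = 0;
        for (byte[] part : parts) total += part.length;
        byte[] out = new byte[total];
        int pos = 0;
        for (byte[] part : parts) {
            System.arraycopy(part, 0, out, pos, part.length);
            pos += part.length;
        }
        return out;
    }

    static byte[] gatherBytesComposed(byte[] src, int base, int[] idx, int intLanes) {
        if (idx.length % intLanes != 0) {
            throw new IllegalArgumentException("byte lanes must be a multiple of int lanes");
        }
        int nParts = idx.length / intLanes;
        byte[][] parts = new byte[nParts][];
        for (int p = 0; p < nParts; p++) {
            parts[p] = narrowToBytes(gatherIntoIntLanes(src, base, idx, p * intLanes, intLanes));
        }
        return concat(parts);
    }

    public static void main(String[] args) {
        byte[] src = new byte[64];
        for (int i = 0; i < src.length; i++) src[i] = (byte) (i * 7 - 100);
        int[] idx = {3, 60, 17, 0, 42, 8, 8, 31, 5, 63, 12, 27, 50, 1, 44, 9};
        // 16 byte lanes vs. 4 int lanes, as in a 128-bit vector.
        byte[] direct = gatherBytes(src, 0, idx);
        byte[] composed = gatherBytesComposed(src, 0, idx, 4);
        System.out.println(Arrays.equals(direct, composed)); // true
    }
}
```

Only the int-lane gather step needs hardware gather support here. Narrowing and concatenation are ordinary lane operations that a uniform interface could express on any platform.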

The PR, as it is now, introduces a new IR representation which complicates things even more. Instead, I'd encourage you to work on a uniform API even if x86 won't be immediately migrated.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26236#discussion_r2380418706