RFR: 8351623: VectorAPI: Add SVE implementation of subword gather load operation

Thu Jul 17 09:04:49 UTC 2025

On Thu, 17 Jul 2025 02:41:20 GMT, Xiaohong Gong <xgong at openjdk.org> wrote:

> Thanks! Regarding the definition of `LoadVectorGatherNode`, we'd better keep the vector type as it is for byte and short vectors. The SVE vector gather-load instruction needs the type information. Additionally, the vector layout of the result should match the vector type, right? We can handle this easily in a pure backend implementation, but it does not seem easy at the mid-end IR level. BTW, `uzp1` is an SVE-specific instruction, so we'd better define a common IR for it, which would also be useful for other platforms that want to support the subword gather API, right?

That makes sense to me. Thanks for your explanation!

> Maybe I can define the vector type of `LoadVectorGatherNode` as an int vector type for subword types. An additional flag is necessary to denote whether it is a byte or a short load. The node only performs the gather operation (without any truncation). Then define an IR like `VectorConcateNode` to merge the gather results. For cases where only one gather is needed, we can just return a type cast node like `VectorCastI2X`. This seems to make the IR more common and the code cleaner.
> 
> The implementation would look like:
> 
> * case-1 one gather:
>   
>   * `gather (bt: int)` + `cast (bt: byte|short)`
> * case-2 two gathers:
>   
>   * step-1: `gather1 (bt: int)` + `gather2 (bt: int)` + `concate(gather1, gather2) (bt: short)`
>   * step-2: `cast (bt: byte)` // just for byte vectors
> * case-3 four gathers:
>   
>   * step-1: `gather1 (bt: int)` + `gather2 (bt: int)` + `concate(gather1, gather2) (bt: short)`
>   * step-2: `gather3 (bt: int)` + `gather4 (bt: int)` + `concate(gather3, gather4) (bt: short)`
>   * step-3: `concate (bt: byte)`
> 
> Or more commonly:
> 
> * case-1 one gather:
>   
>   * `gather (bt: int)` + `cast (bt: byte|short)`
> * case-2 two gathers:
>   
>   * step-1: `gather1 (bt: int)` + `gather2 (bt: int)` + `concate(gather1, gather2) (bt: byte|short)`
> * case-3 four gathers:
>   
>   * step-1: `gather1 (bt: int)` + `gather2 (bt: int)` + `gather3 (bt: int)` + `gather4 (bt: int)`
>   * step-2: `concate(gather1, gather2, gather3, gather4) (bt: byte|short)`
> 
> From the IR level, which one do you think is better?

I like this idea! The first one looks better: its two-input `concate` provides lower-level, finer-grained semantics, which lets us define fewer IR node types while supporting more scenarios.
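
To make the data flow of the first option concrete, here is a minimal scalar sketch in plain Java of case-3 (four gathers) for a byte vector. It only models the lane values, not the actual C2 nodes or the SVE code generation. All names (`SubwordGatherSketch`, `gatherInt`, `concatToShort`, `concatToByte`, `gatherBytesFourWay`) are hypothetical, and it assumes that `concate` narrows each input lane to half its width and places the lanes of the first input before those of the second.

```java
import java.util.Arrays;

// Hypothetical scalar model of the proposed IR shape; not HotSpot code.
final class SubwordGatherSketch {

    // Models `gather (bt: int)`: one byte element is loaded per int lane,
    // sign-extended, with no truncation.
    static int[] gatherInt(byte[] src, int[] indices, int from, int lanes) {
        int[] out = new int[lanes];
        for (int i = 0; i < lanes; i++) {
            out[i] = src[indices[from + i]];
        }
        return out;
    }

    // Models `concate(a, b) (bt: short)`: narrow each int lane to 16 bits
    // and place the lanes of `a` before the lanes of `b` (assumed layout).
    static short[] concatToShort(int[] a, int[] b) {
        short[] out = new short[a.length + b.length];
        for (int i = 0; i < a.length; i++) {
            out[i] = (short) a[i];
        }
        for (int i = 0; i < b.length; i++) {
            out[a.length + i] = (short) b[i];
        }
        return out;
    }

    // Models `concate(a, b) (bt: byte)`: the same step one level down,
    // narrowing 16-bit lanes to 8 bits.
    static byte[] concatToByte(short[] a, short[] b) {
        byte[] out = new byte[a.length + b.length];
        for (int i = 0; i < a.length; i++) {
            out[i] = (byte) a[i];
        }
        for (int i = 0; i < b.length; i++) {
            out[a.length + i] = (byte) b[i];
        }
        return out;
    }

    // case-3: four int gathers, two short concats, one byte concat.
    static byte[] gatherBytesFourWay(byte[] src, int[] indices) {
        if (indices.length % 4 != 0) {
            throw new IllegalArgumentException("index count must be a multiple of 4");
        }
        int lanes = indices.length / 4; // int lanes per gather
        int[] g1 = gatherInt(src, indices, 0, lanes);
        int[] g2 = gatherInt(src, indices, lanes, lanes);
        int[] g3 = gatherInt(src, indices, 2 * lanes, lanes);
        int[] g4 = gatherInt(src, indices, 3 * lanes, lanes);
        short[] s12 = concatToShort(g1, g2); // step-1
        short[] s34 = concatToShort(g3, g4); // step-2
        return concatToByte(s12, s34);       // step-3
    }

    public static void main(String[] args) {
        byte[] src = {10, -20, 30, -40, 50, -60, 70, -80};
        int[] indices = {7, 0, 3, 3, 1, 6, 2, 5, 4, 4, 0, 7, 6, 1, 5, 2};
        // Each output lane i should equal src[indices[i]].
        System.out.println(Arrays.toString(gatherBytesFourWay(src, indices)));
    }
}
```

In this model the same two-input narrowing step is reused at both levels, and case-2 falls out as a single `concatToShort` followed by an optional cast, which is the reuse I had in mind above.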

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26236#issuecomment-3083240544