RFR: 8351623: VectorAPI: Add SVE implementation of subword gather load operation [v6]

Xiaohong Gong xgong at openjdk.org
Mon Oct 20 02:49:14 UTC 2025


On Thu, 16 Oct 2025 03:11:27 GMT, Xiaohong Gong <xgong at openjdk.org> wrote:

> > I suspect it's likely more complex overall adding a slice operation to mask, that is really only needed for a specific case. (A more general operation would be compress/expand of the mask bits, but i don't believe there are hardware instructions for such operations on mask registers.)
> 
> Yes, I agree with you. Personally, I’d prefer not to introduce such APIs for a vector mask.
> 

Hi @PaulSandoz, how about we make it a private method on the mask and implement it with `Vector.slice()` as before? My only concern is the performance of this method, since we have to convert the mask to a vector and convert it back after the slice.
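
To make the round trip concrete, here is a minimal scalar sketch of the three steps (mask to vector, slice, vector back to mask). It uses plain Java arrays rather than the Vector API, and the class and method names (`MaskSliceSketch`, `sliceMask`, etc.) are hypothetical, chosen only for illustration; the range check on `origin` is also an assumption.

```java
import java.util.Arrays;

// Scalar model only: boolean[] stands in for a VectorMask, int[] for a Vector.
final class MaskSliceSketch {
    // Step 1, mask -> vector: true lanes become -1 (all bits set), false lanes 0.
    static int[] toVector(boolean[] mask) {
        int[] v = new int[mask.length];
        for (int i = 0; i < mask.length; i++) {
            v[i] = mask[i] ? -1 : 0;
        }
        return v;
    }

    // Step 2, slice: lane i of the result takes lane i + origin of the input;
    // lanes that would read past the end stay zero.
    static int[] slice(int[] v, int origin) {
        int[] r = new int[v.length];
        for (int i = 0; i + origin < v.length; i++) {
            r[i] = v[i + origin];
        }
        return r;
    }

    // Step 3, vector -> mask: a lane is set when its value is non-zero.
    static boolean[] toMask(int[] v) {
        boolean[] m = new boolean[v.length];
        for (int i = 0; i < v.length; i++) {
            m[i] = v[i] != 0;
        }
        return m;
    }

    // The private helper under discussion, modeled as the three steps chained.
    // The bounds check is an assumption made for this sketch.
    static boolean[] sliceMask(boolean[] mask, int origin) {
        if (origin < 0 || origin > mask.length) {
            throw new IndexOutOfBoundsException("origin: " + origin);
        }
        return toMask(slice(toVector(mask), origin));
    }

    public static void main(String[] args) {
        boolean[] m = {true, false, true, true};
        System.out.println(Arrays.toString(sliceMask(m, 2))); // [true, true, false, false]
    }
}
```

The two conversions in `sliceMask` are the extra work I am worried about; only the middle step does the actual lane movement.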

>
> > In my view adding a part parameter is a compromise and seems less complex than requiring N index vectors, and it fits with a general pattern we have around parts of the vector. It moves the specialized operation requirements on the mask into the area where it is needed rather than trying to generalize in a manner that i don't think is appropriate in the mask API.
> 
> Yeah, it sounds reasonable that an API can finish a simple task and then choose to move the results to a different part of a vector based on an offset. Since `loadWithMap` is used as a VM interface, we have to add checks for the passed `origin` against the vector length. Besides, we have to support the same cross-lane shift for other vector types like int/long/double. I will prepare a prototype for this. Thanks for your inputs @PaulSandoz .

The `origin` passed to the HotSpot compiler is required to be a constant; otherwise the cross-lane shift becomes much more complex. If the passed `origin` is not a constant, intrinsification of the whole gather-load API will fail and fall back to Java, which is a performance risk. If the compiler accepted it as a variable, the implementation would be just the same as `Vector.slice()/unslice()`, which call the vector rearrange and blend APIs. Hence, I'd prefer not to move this part of the operation into the compiler. WDYT @PaulSandoz?
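
For reference, a rough scalar sketch of what a variable-`origin` cross-lane shift looks like when expressed as a rearrange followed by a blend. It uses plain Java arrays, not the Vector API or the actual JDK implementation, and the names (`CrossLaneShiftSketch`, `rearrange`, `blend`, `unslice`) as well as the zero background and the `origin` range check are assumptions made for illustration.

```java
import java.util.Arrays;

// Scalar model only: int[] stands in for a vector, boolean[] for a mask.
final class CrossLaneShiftSketch {
    // rearrange: result lane i takes the source lane selected by indexes[i].
    static int[] rearrange(int[] src, int[] indexes) {
        int[] r = new int[src.length];
        for (int i = 0; i < src.length; i++) {
            r[i] = src[indexes[i]];
        }
        return r;
    }

    // blend: result lane i takes b[i] where the mask is set, otherwise a[i].
    static int[] blend(int[] a, int[] b, boolean[] mask) {
        int[] r = new int[a.length];
        for (int i = 0; i < a.length; i++) {
            r[i] = mask[i] ? b[i] : a[i];
        }
        return r;
    }

    // Shift lanes up by a variable origin: lane i receives v[i - origin],
    // and the lanes below origin are filled with zero.
    static int[] unslice(int[] v, int origin) {
        int n = v.length;
        // Check of origin against the vector length (assumed range: 0 <= origin < n).
        if (origin < 0 || origin >= n) {
            throw new IndexOutOfBoundsException("origin: " + origin);
        }
        int[] indexes = new int[n];
        boolean[] keep = new boolean[n];
        for (int i = 0; i < n; i++) {
            indexes[i] = Math.floorMod(i - origin, n); // rotated lane index
            keep[i] = i >= origin;                     // lanes holding shifted data
        }
        return blend(new int[n], rearrange(v, indexes), keep);
    }

    public static void main(String[] args) {
        int[] v = {1, 2, 3, 4};
        System.out.println(Arrays.toString(unslice(v, 1))); // [0, 1, 2, 3]
    }
}
```

With a variable `origin`, both the index vector and the blend mask have to be computed at run time, which is the extra complexity I would rather keep out of the compiler.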

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26236#issuecomment-3420337535


More information about the hotspot-compiler-dev mailing list