RFR: 8367292: VectorAPI: Optimize VectorMask.fromLong/toLong() for SVE [v3]

Tue Oct 28 10:23:07 UTC 2025

On Tue, 28 Oct 2025 09:43:03 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> Yes, the IR changes you pointed out above are right.
>> 
>> The major performance uplift comes from the existing optimization `VectorStoreMask (VectorLoadMask v) => v`. As you know, `VectorLoadMask` is generated by APIs such as `VectorMask.fromArray()`. With this change, `VectorMask.fromLong()` also generates this IR, so the mask conversions (V->P and P->V) between these APIs can be eliminated.
>> 
>> Another performance uplift comes from more flexible vector register allocation. Previously, the same vector register was fixed for different instructions; now the choice is left to the register allocator (RA). In this case, that can break unexpected data dependences across loop iterations.
>
> @XiaohongGong If this is only about `VectorStoreMask (VectorLoadMask v) => v`, why not solve the issue with an `Ideal` optimization? Would that be an alternative?

`VectorStoreMask (VectorLoadMask v) => v` already exists in C2. Splitting `VectorLongToMask` and `VectorMaskToLong` lets us reuse this transformation, and that is why performance improves: redundant mask conversions are optimized out in some cases.
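To illustrate why splitting helps, here is a toy model in plain Java (not HotSpot code; C2 is C++). It assumes, for illustration only, that `fromLong()` expands into a long-to-vector step followed by `VectorLoadMask`, and that `toLong()` expands into `VectorStoreMask` followed by a vector-to-long step. `LongToVec` and `VecToLong` are hypothetical stand-ins for those split halves, not real C2 node names. Once both APIs share `LoadMask`/`StoreMask`, the single existing rule removes the V->P and P->V conversions in a `fromLong(...).toLong()` round trip.

```java
// Toy IR model (hypothetical; not C2's real node classes).
public class MaskIrSketch {
    interface Node {}
    record Leaf(String name) implements Node {}      // some input value
    record LongToVec(Node bits) implements Node {}   // hypothetical: long bits -> boolean vector
    record VecToLong(Node vec) implements Node {}    // hypothetical: boolean vector -> long bits
    record LoadMask(Node vec) implements Node {}     // boolean vector -> predicate (V->P)
    record StoreMask(Node mask) implements Node {}   // predicate -> boolean vector (P->V)

    // Bottom-up rewrite applying the existing identity rule:
    // StoreMask(LoadMask(v)) => v
    static Node simplify(Node n) {
        if (n instanceof LongToVec x) return new LongToVec(simplify(x.bits()));
        if (n instanceof VecToLong x) return new VecToLong(simplify(x.vec()));
        if (n instanceof LoadMask x) return new LoadMask(simplify(x.vec()));
        if (n instanceof StoreMask x) {
            Node in = simplify(x.mask());
            if (in instanceof LoadMask lm) return lm.vec(); // conversion pair folds away
            return new StoreMask(in);
        }
        return n; // Leaf
    }

    // Counts mask conversions (LoadMask / StoreMask) in a tree.
    static int conversions(Node n) {
        if (n instanceof LoadMask x) return 1 + conversions(x.vec());
        if (n instanceof StoreMask x) return 1 + conversions(x.mask());
        if (n instanceof LongToVec x) return conversions(x.bits());
        if (n instanceof VecToLong x) return conversions(x.vec());
        return 0;
    }

    // Assumed split shape of mask.toLong() applied to VectorMask.fromLong(bits).
    static Node roundTrip(Node bits) {
        Node mask = new LoadMask(new LongToVec(bits)); // fromLong
        return new VecToLong(new StoreMask(mask));     // toLong
    }

    public static void main(String[] args) {
        Node before = roundTrip(new Leaf("bits"));
        Node after = simplify(before);
        System.out.println(conversions(before) + " -> " + conversions(after)); // prints "2 -> 0"
    }
}
```

A `StoreMask` whose input was not produced by `LoadMask` is left untouched, so the rule only fires when both sides of the conversion are visible in the IR, which is what the split makes possible.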

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27481#discussion_r2468900531