RFR: 8367292: VectorAPI: Optimize VectorMask.fromLong/toLong() for SVE [v2]

Thu Oct 23 10:53:05 UTC 2025

On Thu, 23 Oct 2025 07:31:20 GMT, Xiaohong Gong <xgong at openjdk.org> wrote:

>> src/hotspot/cpu/aarch64/aarch64_vector.ad line 405:
>> 
>>> 403:         return true;
>>> 404:     }
>>> 405:   }
>> 
>> The name suggests that if you return false here, then it is still ok to use a predicate instruction.
>> The name suggests that if you return true, then you must use a predicate instruction.
>> 
>> But then your comment for `Op_VectorLongToMask` and `Op_VectorMaskToLong` seems to suggest that we return false and do not want a predicate instruction to be used, but a packed vector instead.
>> 
>> So now I'm a bit confused.
>> 
>> I'm also wondering:
>> Since there are two options (mask in packed vector vs predicate), does the availability of one always imply the availability of the other? Or could some platform have only one, and another platform only the other?
>> 
>> And: can you please explain the `if (vt->isa_vectmask() == nullptr) {` check, also for the other platforms?
>
>> The name suggests that if you return false here, then it is still ok to use a predicate instruction. The name suggests that if you return true, then you must use a predicate instruction.
>> 
>> But then your comment for `Op_VectorLongToMask` and `Op_VectorMaskToLong` seems to suggest that we return false and do not want a predicate instruction to be used, but a packed vector instead.
>> 
>> So now I'm a bit confused.
> 
> The type of a vector mask differs depending on whether the architecture supports the predicate feature (please see my detailed answer below). Hence, for some vector operations, the expected input mask register/layout is different. Please note that there are two kinds of layout for a mask stored in a **vector register**: 1) a packed layout with 8-bit element width, or 2) an unpacked layout with 8/16/32/64-bit element width according to the vector type. Data-related mask operations like `VectorBlend` use 2), while bit-related mask operations like `VectorMaskTrueCount, VectorMaskFirstTrue, toLong, fromLong, ...` use 1), because the implementation is more efficient.
> 
> My intention is to use this function to guide which IR is generated for a vector mask operation. Before this patch, the mid-end made the distinction by just checking the type of the vector mask, assuming that predicate instructions are generated for a predicate type and vector instructions for a vector type. However, as I mentioned in this PR, some mask operations might not have native predicate instructions on predicate architectures. Instead, they are implemented with the same vector instructions as on NEON. We then have to do the mask layout conversion inside codegen, which is inefficient. Generating the same IR pattern as for NEON is more efficient.
> 
> So, if this function returns false, it means the input/output mask for the specified opcode must be stored in a vector register with the packed layout, even if the architecture supports the predicate feature. This is decided by the IR's implementation.
> 
>> 
>> I'm also wondering: Since there are two options (mask in packed vector vs predicate), does the availability of one always imply the availability of the other? Or could some platform have only one, and another platform only the other?
>> 
> 
> There are three kinds of options for a mask: 1) packed vector with 8-bit element size, 2) unpacked vector with 8/16/32/64-bit element size, and 3) predicate.
> 
> 1) The packed vect...

Thanks for all the explanations! Do you think some of that could be moved to code comments? I think that would be quite helpful.
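
To make the layout distinction from the explanation concrete, here is a minimal sketch in plain Java (it is not HotSpot code and does not use the Vector API). It models the two vector-register layouts for a mask over int lanes and the conversion between them. The class and method names are hypothetical, and the lane encodings (1 for a set lane in the packed layout, all ones in the unpacked layout) are assumptions made for illustration only.

```java
import java.util.Arrays;

public class MaskLayoutSketch {
    // Packed layout: one 8-bit element per lane, whatever the vector's element type.
    // Assumption: a set lane is stored as 1, a clear lane as 0. Requires lanes <= 64.
    static byte[] packedFromLong(long bits, int lanes) {
        byte[] packed = new byte[lanes];
        for (int i = 0; i < lanes; i++) {
            packed[i] = (byte) ((bits >>> i) & 1L);
        }
        return packed;
    }

    // Collect one bit per lane from the packed layout (lane i becomes bit i).
    static long packedToLong(byte[] packed) {
        long bits = 0L;
        for (int i = 0; i < packed.length; i++) {
            if (packed[i] != 0) {
                bits |= 1L << i;
            }
        }
        return bits;
    }

    // Unpacked layout for int lanes: one 32-bit element per lane.
    // Assumption: a set lane is all ones (-1), a clear lane is 0.
    static int[] unpackIntLanes(byte[] packed) {
        int[] unpacked = new int[packed.length];
        for (int i = 0; i < packed.length; i++) {
            unpacked[i] = packed[i] != 0 ? -1 : 0;
        }
        return unpacked;
    }

    // The layout conversion that a bit-related operation would need
    // if it were handed an unpacked mask.
    static byte[] pack(int[] unpacked) {
        byte[] packed = new byte[unpacked.length];
        for (int i = 0; i < unpacked.length; i++) {
            packed[i] = (byte) (unpacked[i] != 0 ? 1 : 0);
        }
        return packed;
    }

    public static void main(String[] args) {
        long bits = 0b1011L;
        byte[] packed = packedFromLong(bits, 4);
        int[] unpacked = unpackIntLanes(packed);
        System.out.println(Arrays.toString(packed));              // [1, 1, 0, 1]
        System.out.println(Arrays.toString(unpacked));            // [-1, -1, 0, -1]
        System.out.println(packedToLong(pack(unpacked)) == bits); // true
    }
}
```

In this model, `packedToLong` works directly on the packed layout, while an unpacked mask first needs the extra `pack` step. That extra step stands in for the layout conversion inside codegen that the explanation describes as inefficient.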

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27481#discussion_r2454705672