RFR: 8367292: VectorAPI: Optimize VectorMask.fromLong/toLong() for SVE [v2]

Thu Oct 23 07:35:15 UTC 2025

On Thu, 23 Oct 2025 05:51:21 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

> The name suggests that if you return false here, then it is still ok to use a predicate instruction. The name suggests that if your return true, then you must use a predicate instruction.
> 
> But then your comment for `Op_VectorLongToMask` and `Op_VectorMaskToLong` seems to suggest that we return false and do not want that a predicate instruction is used, but instead a packed vector.
> 
> So now I'm a bit confused.

The type of a vector mask differs depending on whether the architecture supports the predicate feature (please see my detailed answer below). Hence, for some vector operations, the expected input mask register/layout is different. Please note that there are two kinds of layout for a mask that is stored in a **vector register**: 1) a packed layout with 8-bit element width, or 2) an unpacked layout with 8/16/32/64-bit element width according to the vector type. Data-related mask operations like `VectorBlend` use 2), while bit-related mask operations like `VectorMaskTrueCount`, `VectorMaskFirstTrue`, `toLong`, `fromLong`, ... use 1), because the implementation is more efficient.
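
To make the two vector-register layouts concrete, here is a minimal C++ sketch. It is only a model, not HotSpot code: the names `PackedMask`, `unpack_to_int_lanes` and `packed_mask_to_long` are made up for illustration, and the all-ones encoding of a true lane in the unpacked layout is an assumption of the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Layout 1 (packed): one byte per lane, 0 = false, 1 = true,
// independent of the element type of the vector the mask belongs to.
using PackedMask = std::vector<uint8_t>;

// Layout 2 (unpacked), shown here for 32-bit lanes: each mask lane is as wide
// as the data element. Assumption: a true lane is all ones, a false lane is 0.
std::vector<uint32_t> unpack_to_int_lanes(const PackedMask& packed) {
  std::vector<uint32_t> lanes;
  lanes.reserve(packed.size());
  for (uint8_t b : packed) {
    lanes.push_back(b != 0 ? 0xFFFFFFFFu : 0u);
  }
  return lanes;
}

// A bit-related operation (toLong) working directly on the packed layout:
// bit i of the result is set when lane i is true.
uint64_t packed_mask_to_long(const PackedMask& packed) {
  assert(packed.size() <= 64);
  uint64_t bits = 0;
  for (std::size_t i = 0; i < packed.size(); ++i) {
    if (packed[i] != 0) {
      bits |= uint64_t{1} << i;
    }
  }
  return bits;
}
```

The point of the sketch is that `packed_mask_to_long` never needs the lane width of the data vector, which is why the packed layout is the natural input for this kind of operation.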

My intention is to use this function to guide which IR is generated for a vector mask operation. Before this patch, the mid-end made the distinction just by checking the type of the vector mask: it assumed that predicate instructions are generated for a predicate type, and vector instructions for a vector type. However, as I mentioned in this PR, some mask operations might not have native predicate instructions on predicate architectures. Instead, they are implemented with the same vector instructions as on NEON. In that case we have to do the mask layout conversion inside codegen, which is inefficient. Generating the same IR pattern as for NEON is more efficient.

So, if this function returns false, it means the input/output mask for the specified opcode needs to be kept in a vector register with the packed layout, even if the architecture supports the predicate feature. This is decided by the implementation of the IR.
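
The intended contract can be sketched roughly as follows. This is a simplified model under stated assumptions, not the code in the patch: `MaskOp`, `MaskForm`, `prefers_predicate` and `expected_mask_form` are hypothetical names, and which opcodes return false is hard-coded here purely for illustration.

```cpp
#include <cassert>

// Hypothetical stand-ins for the C2 opcodes mentioned in this thread.
enum class MaskOp { VectorBlend, VectorMaskToLong, VectorLongToMask };

// Where the mask operand of an operation is expected to live.
enum class MaskForm { Predicate, PackedVector, UnpackedVector };

// Hypothetical model of the query: true means the operation takes or
// produces its mask in a predicate register.
bool prefers_predicate(MaskOp op, bool arch_has_predicates) {
  if (!arch_has_predicates) {
    return false;
  }
  switch (op) {
    case MaskOp::VectorMaskToLong:
    case MaskOp::VectorLongToMask:
      // Assumed for this sketch: no native predicate implementation, so the
      // operation is implemented with vector instructions on a packed mask.
      return false;
    default:
      return true;
  }
}

// The mask form the mid-end would generate IR for.
MaskForm expected_mask_form(MaskOp op, bool arch_has_predicates) {
  if (prefers_predicate(op, arch_has_predicates)) {
    return MaskForm::Predicate;
  }
  switch (op) {
    case MaskOp::VectorMaskToLong:
    case MaskOp::VectorLongToMask:
      return MaskForm::PackedVector;    // bit-related operations
    default:
      return MaskForm::UnpackedVector;  // data-related operations
  }
}
```

In this model, `expected_mask_form(MaskOp::VectorMaskToLong, true)` yields `PackedVector` even though the architecture has predicates, which is the case a false return value is meant to express.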

> 
> I'm also wondering: Since there are two options (mask in packed vector vs predicate), does the availability of one always imply the availability of the other? Or could some platform have only one, and another platform only the other?
> 

There are three kinds of options for a mask: 1) packed vector with 8-bit element size, 2) unpacked vector with 8/16/32/64-bit element size, and 3) predicate.

1) The packed vector with 8-bit layout is a temporary state of the mask which exists on all architectures. For example, it is the result of a `LoadVector` from a boolean array.
2) The unpacked vector with 8/16/32/64-bit layout is a real vector mask which is unpacked from 1). The `VectorLoadMask` IR is used to implement the unpack operation. It exists on platforms that do not support the predicate feature (e.g. NEON, SSE/AVX1/AVX2).
3) The predicate layout is a real vector mask which is also converted from 1). The `VectorLoadMask` IR is used to implement the conversion. It exists only on platforms that support the predicate feature (e.g. SVE, AVX-512, RVV).
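
As a rough model of how 2) and 3) are both derived from 1), the sketch below imitates the role of `VectorLoadMask` in plain C++. It is not HotSpot code: `vector_load_mask`, `PredicateMask` and the other names are hypothetical, the unpacked form is shown for 32-bit lanes only, and the predicate is simplified to one bit per lane, which is not how every predicate architecture encodes it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <variant>
#include <vector>

using PackedMask = std::vector<uint8_t>;     // kind 1: one 0/1 byte per lane
using UnpackedMask = std::vector<uint32_t>;  // kind 2: shown for 32-bit lanes

// Kind 3, simplified for this sketch to one bit per lane.
struct PredicateMask {
  uint64_t bits;
  std::size_t lanes;
};

using RealMask = std::variant<UnpackedMask, PredicateMask>;

// Model of the conversion: the packed form is only the starting point, and the
// "real" mask is a predicate or an unpacked vector depending on the platform.
RealMask vector_load_mask(const PackedMask& packed, bool arch_has_predicates) {
  if (arch_has_predicates) {
    assert(packed.size() <= 64);
    PredicateMask p{0, packed.size()};
    for (std::size_t i = 0; i < packed.size(); ++i) {
      if (packed[i] != 0) {
        p.bits |= uint64_t{1} << i;
      }
    }
    return p;
  }
  UnpackedMask lanes;
  lanes.reserve(packed.size());
  for (uint8_t b : packed) {
    // Assumption of this sketch: a true lane is all ones.
    lanes.push_back(b != 0 ? 0xFFFFFFFFu : 0u);
  }
  return lanes;
}
```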

> And: can you please explain the `if (vt->isa_vectmask() == nullptr) {` check, also for the other platforms?

`if (vt->isa_vectmask() == nullptr)` means that the current vector mask's type is a normal `TypeVect` instead of a `TypeVectMask`. The type of a vector mask is defined based on whether the current architecture supports the predicate feature, since on such architectures the register for a mask is different from a vector register. If the arch supports the predicate feature (such as Arm SVE, AVX-512, and RVV), then the type will be defined as `TypeVectMask`; otherwise it is the normal `TypeVect` (e.g. Arm NEON, X86 SSE/AVX1/AVX2). Please see the definition here: https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/type.cpp#L2444-L2451.
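
The shape of that check can be illustrated with a stripped-down model. The two structs below only borrow the names `TypeVect` and `TypeVectMask`; they are not the real definitions from `type.hpp`, and `mask_lives_in_vector_register` is a hypothetical helper added for the example.

```cpp
#include <cassert>

struct TypeVectMask;

// Simplified model: an ordinary vector type is not a predicate mask type,
// so the query returns nullptr.
struct TypeVect {
  virtual ~TypeVect() = default;
  virtual const TypeVectMask* isa_vectmask() const { return nullptr; }
};

// Simplified model: the predicate mask type answers the query with itself.
struct TypeVectMask : TypeVect {
  const TypeVectMask* isa_vectmask() const override { return this; }
};

// Hypothetical helper: true when the mask is typed as a plain vector, i.e. it
// is held in a vector register rather than in a predicate register.
bool mask_lives_in_vector_register(const TypeVect* vt) {
  return vt->isa_vectmask() == nullptr;
}
```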

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27481#discussion_r2454193088