RFR: 8367292: VectorAPI: Optimize VectorMask.fromLong/toLong() for SVE [v2]
    Xiaohong Gong 
    xgong at openjdk.org
       
    Fri Oct 24 01:49:03 UTC 2025
    
    
  
On Thu, 23 Oct 2025 10:50:29 GMT, Emanuel Peter <epeter at openjdk.org> wrote:
>>> The name suggests that if you return false here, then it is still ok to use a predicate instruction. The name suggests that if you return true, then you must use a predicate instruction.
>>> 
>>> But then your comment for `Op_VectorLongToMask` and `Op_VectorMaskToLong` seems to suggest that when we return false, we do not want a predicate instruction to be used, but a packed vector instead.
>>> 
>>> So now I'm a bit confused.
>> 
>> The type of a vector mask differs depending on whether the architecture supports the predicate feature (please see my detailed answer below). Hence, for some vector operations, the expected input mask register/layout is different. Please note that there are two kinds of layout for a mask if it is stored in a **vector register**. It might be 1) a packed layout with 8-bit element width, or 2) an unpacked layout with 8/16/32/64-bit element width according to the vector type. For data-related mask operations like `VectorBlend`, it is 2), while for bit-related mask operations like `VectorMaskTrueCount`, `VectorMaskFirstTrue`, `toLong`, `fromLong`, ..., it is 1), because the implementation is more efficient.
>> 
>> My intention is to use this function to guide which IR is generated for a vector mask operation. Before this patch, the mid-end made this distinction by just checking the type of the vector mask, as it assumed that predicate instructions would be generated for a predicate type and vector instructions for a vector type. However, as I mentioned in this PR, some mask operations might not have native predicate instructions on predicate architectures. Instead, they are implemented with the same vector instructions as on NEON. We then have to do the mask layout conversion inside codegen, which is inefficient. Generating the same IR pattern as for NEON is more efficient.
>> 
>> So, if this function returns false, it means the input/output mask for the specified opcode must be stored in a vector register with the packed layout, even if the architecture supports the predicate feature. This is decided by the IR's implementation.
>> 
>>> 
>>> I'm also wondering: Since there are two options (mask in packed vector vs predicate), does the availability of one always imply the availability of the other? Or could some platform have only one, and another platform only the other?
>>> 
>> 
>> There are three kinds of options for a mask: 1) packed vector with 8-bit element size, 2) unpacked vector with 8/16/32/64-bit element size, and 3) pred...
>
> Thanks for all the explanations! Do you think some of that could be moved to code comments? I think that would be quite helpful.
Maybe I can add brief comments before this method, as before?
BTW, some comments were added in the code generation of these two IRs. Would you mind checking the changes in `C2_Macroassembler_aarch64.cpp|hpp` and seeing whether the comments are helpful? Thanks so much!
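
For reference, the two vector-register layouts from the quoted explanation can be modeled with a small scalar Java sketch. This is only an illustration, not the HotSpot or SVE code: the class and method names are hypothetical, and the lane encodings (all-ones `int` lanes for the unpacked layout, 0/1 bytes for the packed layout) are assumptions made for the example. It shows why a bit-related operation like `toLong`/`fromLong` maps naturally onto the packed layout: each lane contributes exactly one bit.

```java
public class MaskLayouts {
    // Unpacked layout (assumed encoding): one lane per element at the
    // element's own width, here 32-bit ints; -1 means set, 0 means unset.
    static int[] toUnpacked(boolean[] mask) {
        int[] lanes = new int[mask.length];
        for (int i = 0; i < mask.length; i++) {
            lanes[i] = mask[i] ? -1 : 0;
        }
        return lanes;
    }

    // Packed layout (assumed encoding): one byte per lane regardless of
    // the element type; 1 means set, 0 means unset.
    static byte[] toPacked(boolean[] mask) {
        byte[] lanes = new byte[mask.length];
        for (int i = 0; i < mask.length; i++) {
            lanes[i] = (byte) (mask[i] ? 1 : 0);
        }
        return lanes;
    }

    // Scalar model of toLong on the packed layout: lane i becomes bit i.
    static long packedToLong(byte[] packed) {
        long bits = 0L;
        for (int i = 0; i < packed.length; i++) {
            bits |= (long) (packed[i] & 1) << i;
        }
        return bits;
    }

    // Scalar model of fromLong producing the packed layout: bit i becomes lane i.
    static byte[] longToPacked(long bits, int laneCount) {
        byte[] packed = new byte[laneCount];
        for (int i = 0; i < laneCount; i++) {
            packed[i] = (byte) ((bits >>> i) & 1L);
        }
        return packed;
    }

    public static void main(String[] args) {
        boolean[] mask = {true, false, true, true};
        long bits = packedToLong(toPacked(mask));
        System.out.println("toLong bits: " + Long.toBinaryString(bits));
        System.out.println("round trip ok: "
                + java.util.Arrays.equals(longToPacked(bits, mask.length), toPacked(mask)));
    }
}
```

Starting from the unpacked layout instead, each lane would first have to be narrowed to a byte before the bits can be gathered, which is the kind of extra layout conversion the explanation above wants to keep out of codegen.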
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/27481#discussion_r2458327157
    
    