RFR: 8348868: AArch64: Add backend support for SelectFromTwoVector [v13]

Thu Jul 10 09:55:44 UTC 2025

On Thu, 10 Jul 2025 03:15:24 GMT, Xiaohong Gong <xgong at openjdk.org> wrote:

>> src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 2919:
>> 
>>> 2917:       ins(tmp, D, src2, 1, 0);
>>> 2918:       tbl(dst, size1, tmp, 1, dst);
>>> 2919:     }
>> 
>> Would it be better to wrap this part in a helper function, since the code is much the same as lines 2885-2898?
>
> These two functions can be refined to be clearer. Here is my version:
> 
> void C2_MacroAssembler::select_from_two_vectors_neon(FloatRegister dst, FloatRegister src1,
>                                                      FloatRegister src2, FloatRegister index,
>                                                      FloatRegister tmp, unsigned length_in_bytes) {
>   assert_different_registers(dst, src1, src2, tmp);
>   SIMD_Arrangement size = length_in_bytes == 16 ? T16B : T8B;
> 
>   if (length_in_bytes == 16) {
>     assert(UseSVE <= 1, "sve must be <= 1");
>     // If the vector length is 16B, then use the Neon "tbl" instruction with two vector table
>     tbl(dst, size, src1, 2, index);
>   } else { // vector length == 8
>     assert(UseSVE == 0, "must be Neon only");
>     // We need to fit both the source vectors (src1, src2) in a 128-bit register because the
>     // Neon "tbl" instruction supports only looking up 16B vectors. We then use the Neon "tbl"
>     // instruction with one vector lookup
>     ins(tmp, D, src1, 0, 0);
>     ins(tmp, D, src2, 1, 0);
>     tbl(dst, size, tmp, 1, index);
>   }
> }
> 
> void C2_MacroAssembler::select_from_two_vectors_sve(FloatRegister dst, FloatRegister src1,
>                                                     FloatRegister src2, FloatRegister index,
>                                                     FloatRegister tmp, BasicType bt,
>                                                     unsigned length_in_bytes) {
>   assert_different_registers(dst, src1, src2, index, tmp);
>   SIMD_RegVariant T = elemType_to_regVariant(bt);
>   if (length_in_bytes == 8) {
>     assert(UseSVE >= 1, "must be");
>     ins(tmp, D, src1, 0, 0);
>     ins(tmp, D, src2, 1, 0);
>     sve_tbl(dst, T, tmp, index);
>   } else {
>     assert(UseSVE == 2 && length_in_bytes == MaxVectorSize, "must be");
>     sve_tbl(dst, T, src1, src2, index);
>   }
> }
> 
> void C2_MacroAssembler::select_from_two_vectors(FloatRegister dst, FloatRegister src1,
>                                                 FloatRegister src2, FloatRegister index,
>                                                 FloatRegister tmp, BasicType bt,
>                                                 unsigned length_in_bytes) {
> 
>   assert_different_registers(dst, src1, src2, index, tmp);
> 
>   if (UseSVE == 2 || (UseSVE == 1 && length_in_bytes == 8)) {
>     select_from_two_vectors_sve(dst, src1, src2, index, tmp, bt, length_in_bytes);
>     return;
>   }
> 
>   // The only BasicTypes that can reach here are T_SHORT, T_BYTE, T_INT and T_FLOAT
>   assert(bt != T_DOUBLE ...

Thanks a lot for your suggestion, @XiaohongGong. I will try it, see how it looks, and get back to you.
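
For reference, the 8-byte Neon path in the quoted suggestion (two `ins` followed by a single-table `tbl`) can be modeled in scalar C++ roughly as below. This is only an illustrative sketch, not code from the PR: the names `Vec8`, `Vec16`, `concat_halves`, `tbl_one_table` and `select_from_two_vectors_8b_model` are hypothetical, and the model is byte-granular only, so it says nothing about how wider element types are handled. It assumes the usual Neon `tbl` behavior where an out-of-range index byte yields 0.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical scalar stand-ins for a 64-bit and a 128-bit vector register.
using Vec8  = std::array<uint8_t, 8>;
using Vec16 = std::array<uint8_t, 16>;

// Models "ins(tmp, D, src1, 0, 0); ins(tmp, D, src2, 1, 0);":
// src1 goes to the low 64 bits of tmp, src2 to the high 64 bits.
Vec16 concat_halves(const Vec8& src1, const Vec8& src2) {
  Vec16 tmp{};
  for (std::size_t i = 0; i < 8; i++) {
    tmp[i]     = src1[i];
    tmp[i + 8] = src2[i];
  }
  return tmp;
}

// Models a single-register "tbl" with the 8B arrangement:
// each index byte selects a byte of the 16-byte table,
// and an index outside [0, 15] produces 0.
Vec8 tbl_one_table(const Vec16& table, const Vec8& index) {
  Vec8 dst{};
  for (std::size_t i = 0; i < 8; i++) {
    dst[i] = index[i] < table.size() ? table[index[i]] : 0;
  }
  return dst;
}

// Byte-level model of the 8-byte case: indices 0..7 pick from src1,
// indices 8..15 pick from src2.
Vec8 select_from_two_vectors_8b_model(const Vec8& src1, const Vec8& src2,
                                      const Vec8& index) {
  return tbl_one_table(concat_halves(src1, src2), index);
}
```

The point of the model is just to show why both 64-bit sources have to be packed into one 128-bit register first: the lookup then sees a single 16-byte table.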

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23570#discussion_r2197182232