RFR: 8348868: AArch64: Add backend support for SelectFromTwoVector [v4]
Bhavana Kilambi
bkilambi at openjdk.org
Wed Jun 25 08:09:35 UTC 2025
On Wed, 25 Jun 2025 06:28:55 GMT, Xiaohong Gong <xgong at openjdk.org> wrote:
>> src/hotspot/cpu/aarch64/aarch64_vector.ad line 262:
>>
>>> 260: (UseSVE == 2 && length_in_bytes > 8 && length_in_bytes < MaxVectorSize )) {
>>> 261: return false;
>>> 262: }
>>
>> How about:
>>
>> case Op_SelectFromTwoVector:
>> // The "tbl" instruction for two vector table is supported only in Neon and SVE2. Return
>> // false if vector length > 16B but supported SVE version < 2.
>> //
>> // Additionally, this operation is disabled for doubles and longs on machines with SVE < 2.
>> // Instead, the default VectorRearrange + VectorBlend is generated as the performance of
>> // the default pattern is slightly better.
>> if (UseSVE < 2 && (type2aelembytes(bt) == 8 || length_in_bytes > 16)) {
>> return false;
>> }
>>
>> // As the SVE2 "tbl" instruction is unpredicated and partial operations cannot be generated
>> // using masks, we currently disable this operation on machines where length_in_bytes <
>> // MaxVectorSize with the only exception of 8B vector length.
>> if (UseSVE == 2 && length_in_bytes > 8 && length_in_bytes < MaxVectorSize) {
>> return false;
>> }
>>
>> break;
>
> Maybe the NEON `tbl` can also be generated for SVE2 when `length_in_bytes == 16 && length_in_bytes < MaxVectorSize`. This is a special partial case for SVE2. In summary, the match rules' predicates would be:
> 1) NEON: UseSVE < 2 || (length_in_bytes < 16 || length_in_bytes < MaxVectorSize)
> 2) SVE: UseSVE == 2 && (length_in_bytes >= 16 && length_in_bytes == MaxVectorSize)
>
> This would make the predicate and the code here more complex. The advantage is that this op with a 128-bit vector shape would also be intrinsified on an SVE2 machine with 256-bit or larger vectors. It's not a blocker, and whether to change it is up to you. We can also revisit this part once 256-bit SVE2 machines exist.
Thanks @XiaohongGong. The case you mention needs an SVE2 machine with MaxVectorSize >= 32B, which is not currently available. I think it's better to revisit these cases once functioning hardware is available. Shall I add a comment here as a reminder to revisit this when such hardware is available?
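
For reference, the two checks from the suggested `Op_SelectFromTwoVector` case can be modelled outside HotSpot roughly as below. This is only a sketch: `select_from_two_vector_supported` is a hypothetical helper, and `use_sve`, `max_vector_size` and `elem_bytes` are plain parameters standing in for `UseSVE`, `MaxVectorSize` and `type2aelembytes(bt)`. It is not the code in `aarch64_vector.ad`.

```cpp
// Hypothetical standalone model of the two checks quoted above.
// All inputs are plain ints; in HotSpot they would come from the
// UseSVE and MaxVectorSize flags and from type2aelembytes(bt).
static bool select_from_two_vector_supported(int use_sve,
                                             int max_vector_size,
                                             int elem_bytes,
                                             int length_in_bytes) {
  // Without SVE2: reject 8-byte elements (doubles and longs) and any
  // vector longer than 16B.
  if (use_sve < 2 && (elem_bytes == 8 || length_in_bytes > 16)) {
    return false;
  }
  // With SVE2: reject partial vectors (shorter than MaxVectorSize),
  // except for the 8B vector length.
  if (use_sve == 2 && length_in_bytes > 8 && length_in_bytes < max_vector_size) {
    return false;
  }
  return true;
}
```

With this model, `select_from_two_vector_supported(2, 32, 4, 16)` returns false. That is the 128-bit shape on a 256-bit SVE2 machine mentioned above, which stays disabled for now.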
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/23570#discussion_r2166075939