RFR: 8348868: AArch64: Add backend support for SelectFromTwoVector [v7]
Xiaohong Gong
xgong at openjdk.org
Fri Jun 27 03:17:46 UTC 2025
On Thu, 26 Jun 2025 14:20:50 GMT, Bhavana Kilambi <bkilambi at openjdk.org> wrote:
>> This patch adds aarch64 backend support for the SelectFromTwoVector operation, which was recently introduced in the Vector API.
>>
>> It implements this operation using the two-table vector lookup instruction "tbl", which is available only in Neon and SVE2.
>>
>> For 128-bit vector length: the Neon "tbl" instruction is generated if UseSVE < 2, and the SVE2 "tbl" instruction is generated if UseSVE == 2.
>>
>> For > 128-bit vector length: Currently there are no machines that have a vector length > 128-bit and support SVE2. On machines with a vector length > 128-bit and UseSVE < 2, this operation is not supported: the inline expander fails and lowered IR is generated instead, consisting of two rearrange operations and one blend operation.
>>
>> This patch also adds a boolean "need_load_shuffle" to the inline expander for this operation to test whether the platform requires a VectorLoadShuffle operation to be generated. Without this, the lowered IR was not being generated on aarch64 and the performance was quite poor.
>>
>> Performance numbers with this patch on a 128-bit, SVE2-supporting machine are shown below:
>>
>>
>> Benchmark                                    (size)   Mode  Cnt   Gain
>> SelectFromBenchmark.selectFromByteVector       1024  thrpt    9   1.43
>> SelectFromBenchmark.selectFromByteVector       2048  thrpt    9   1.48
>> SelectFromBenchmark.selectFromDoubleVector     1024  thrpt    9  68.55
>> SelectFromBenchmark.selectFromDoubleVector     2048  thrpt    9  72.07
>> SelectFromBenchmark.selectFromFloatVector      1024  thrpt    9   1.69
>> SelectFromBenchmark.selectFromFloatVector      2048  thrpt    9   1.52
>> SelectFromBenchmark.selectFromIntVector        1024  thrpt    9   1.50
>> SelectFromBenchmark.selectFromIntVector        2048  thrpt    9   1.52
>> SelectFromBenchmark.selectFromLongVector       1024  thrpt    9  85.38
>> SelectFromBenchmark.selectFromLongVector       2048  thrpt    9  80.93
>> SelectFromBenchmark.selectFromShortVector      1024  thrpt    9   1.48
>> SelectFromBenchmark.selectFromShortVector      2048  thrpt    9   1.49
>>
>>
>> The Gain column is the ratio of throughput (thrpt) with this patch to that of the master branch after applying the changes in the inline expander.
>
> Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision:
>
> Addressed review comments
LGTM except for some minor code style issues. Thanks so much for the update!
src/hotspot/cpu/aarch64/aarch64_vector_ad.m4 line 5168:
> 5166: define(`SELECT_FROM_TWO_VECTORS_NEON', `
> 5167: instruct vselect_from_two_vectors_Neon_$1_$2(vReg dst, vReg_V$1 src1, vReg_V$2 src2,
> 5168: vReg index, vReg tmp1) %{
Suggestion:
vReg index, vReg tmp) %{
src/hotspot/cpu/aarch64/aarch64_vector_ad.m4 line 5177:
> 5175: uint length_in_bytes = Matcher::vector_length_in_bytes(this);
> 5176: __ select_from_two_vectors_Neon($dst$$FloatRegister, $src1$$FloatRegister,
> 5177: $src2$$FloatRegister,$index$$FloatRegister,
Suggestion:
$src2$$FloatRegister, $index$$FloatRegister,
src/hotspot/cpu/aarch64/aarch64_vector_ad.m4 line 5189:
> 5187: define(`SELECT_FROM_TWO_VECTORS_SVE', `
> 5188: instruct vselect_from_two_vectors_SVE_$1_$2(vReg dst, vReg_V$1 src1, vReg_V$2 src2,
> 5189: vReg index, vReg tmp1) %{
Suggestion:
vReg index, vReg tmp) %{
src/hotspot/cpu/aarch64/aarch64_vector_ad.m4 line 5198:
> 5196: uint length_in_bytes = Matcher::vector_length_in_bytes(this);
> 5197: __ select_from_two_vectors_SVE($dst$$FloatRegister, $src1$$FloatRegister,
> 5198: $src2$$FloatRegister,$index$$FloatRegister,
Suggestion:
$src2$$FloatRegister, $index$$FloatRegister,
src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 2859:
> 2857: void C2_MacroAssembler::select_from_two_vectors_Neon(FloatRegister dst, FloatRegister src1,
> 2858: FloatRegister src2, FloatRegister index,
> 2859: FloatRegister tmp1, BasicType bt, bool isQ) {
Suggestion:
FloatRegister tmp, BasicType bt, bool isQ) {
src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 2927:
> 2925: void C2_MacroAssembler::select_from_two_vectors_SVE(FloatRegister dst, FloatRegister src1,
> 2926: FloatRegister src2, FloatRegister index,
> 2927: FloatRegister tmp1, BasicType bt,
Suggestion:
FloatRegister tmp, BasicType bt,
src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 2938:
> 2936: } else { // UseSVE == 2 and vector_length_in_bytes > 8
> 2937: assert(UseSVE == 2, "must be sve2");
> 2938: sve_tbl(dst, size, src1, 2, index);
Suggestion:
assert(UseSVE == 2, "must be sve2");
sve_tbl(dst, size, src1, 2, index);
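
For readers following the quoted description, here is a minimal scalar sketch of what a two-table lookup computes, and of the fallback lowering (two rearranges plus one blend) used when no suitable "tbl" is available. This is not the HotSpot code. The names kLanes, Vec, Idx, select_from_two_vectors, rearrange and select_lowered are hypothetical, and the sketch assumes every index is already in the range [0, 2 * kLanes).

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical scalar model, for illustration only (not HotSpot code).
constexpr std::size_t kLanes = 4;
using Vec = std::array<int32_t, kLanes>;
using Idx = std::array<std::size_t, kLanes>;

// Direct form: treat src1 followed by src2 as a single table with
// 2 * kLanes entries and pick one entry per lane.
// Assumption: every index is already in [0, 2 * kLanes).
Vec select_from_two_vectors(const Idx& index, const Vec& src1, const Vec& src2) {
  Vec dst{};
  for (std::size_t i = 0; i < kLanes; i++) {
    std::size_t j = index[i];
    assert(j < 2 * kLanes);
    dst[i] = (j < kLanes) ? src1[j] : src2[j - kLanes];
  }
  return dst;
}

// Single-source rearrange: lane i takes src[index[i] mod kLanes].
Vec rearrange(const Vec& src, const Idx& index) {
  Vec dst{};
  for (std::size_t i = 0; i < kLanes; i++) {
    dst[i] = src[index[i] % kLanes];
  }
  return dst;
}

// Lowered form: rearrange each source separately, then blend the two
// results depending on which source each index refers to.
Vec select_lowered(const Idx& index, const Vec& src1, const Vec& src2) {
  Vec r1 = rearrange(src1, index);
  Vec r2 = rearrange(src2, index);
  Vec dst{};
  for (std::size_t i = 0; i < kLanes; i++) {
    dst[i] = (index[i] < kLanes) ? r1[i] : r2[i];
  }
  return dst;
}
```

Under the stated assumption on the index range, both functions return the same result for any inputs; the direct form corresponds to one lookup, while the lowered form needs three vector operations.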
-------------
Marked as reviewed by xgong (Committer).
PR Review: https://git.openjdk.org/jdk/pull/23570#pullrequestreview-2964548012
PR Review Comment: https://git.openjdk.org/jdk/pull/23570#discussion_r2170627995
PR Review Comment: https://git.openjdk.org/jdk/pull/23570#discussion_r2170626178
PR Review Comment: https://git.openjdk.org/jdk/pull/23570#discussion_r2170630105
PR Review Comment: https://git.openjdk.org/jdk/pull/23570#discussion_r2170629730
PR Review Comment: https://git.openjdk.org/jdk/pull/23570#discussion_r2170631738
PR Review Comment: https://git.openjdk.org/jdk/pull/23570#discussion_r2170638885
PR Review Comment: https://git.openjdk.org/jdk/pull/23570#discussion_r2170638159