RFR: 8359419: AArch64: Relax min vector length to 32-bit for short vectors [v3]
Xiaohong Gong
xgong at openjdk.org
Wed Jul 9 01:26:44 UTC 2025
On Tue, 8 Jul 2025 10:33:50 GMT, Fei Gao <fgao at openjdk.org> wrote:
>>> > > > > https://github.com/openjdk/jdk/blob/f2d2eef988c57cc9f6194a8fd5b2b422035ee68f/test/hotspot/jtreg/compiler/vectorization/runner/ArrayTypeConvertTest.java#L388-L392
>>> > > >
>>> > > >
>>> > > > Actually I didn't change the min vector size for `char` vectors in this patch. Relaxing `short` vectors to 32-bit is to support the vector cast for Vector API, and there is no `char` species in it. Do you think it's better to do the same change for `char` as well? This will just benefit auto-vectorization.
>>> > >
>>> > >
>>> > > Hi @XiaohongGong thanks for asking. In many auto-vectorization cases involving `char`, the vector elements are represented using `T_SHORT` as the `BasicType`, rather than `T_CHAR`.
>>> > > This is because, in Java, operands of subword types are always promoted to `int` before any arithmetic operation. As a result, when handling a node like `ConvD2I`, we don’t initially know its actual subword type. Later, the SuperWord phase propagates a narrowed integer type backward to help determine the correct subword type. See:
>>> > > https://github.com/openjdk/jdk/blob/f2d2eef988c57cc9f6194a8fd5b2b422035ee68f/src/hotspot/share/opto/superword.cpp#L2551-L2558
>>> > >
>>> > > Since SuperWord assigns `T_SHORT` to `StoreC` early on
>>> > > https://github.com/openjdk/jdk/blob/f2d2eef988c57cc9f6194a8fd5b2b422035ee68f/src/hotspot/share/opto/superword.cpp#L2646-L2650
>>> > >
>>> > > the entire propagation chain tends to use `T_SHORT` as well.
>>> > > This applies to most operations, with the exception of a few like `RShiftI`, `Abs`, and `ReverseBytesI`, which are handled separately.
>>> > > So your change already benefits many char-related vectorization cases like `convertDoubleToChar` above. That’s why we can safely relax the IR condition mentioned earlier.
>>> >
>>> >
>>> > Thanks for your input! It's really helpful to me. Does this mean SLP always uses `T_SHORT` for char vectors? If so, is it safe to not consider `T_CHAR` for vector IRs in the backend?
>>>
>>> No, we don't always use `T_SHORT` for char vectors. As mentioned earlier, for operations like `RShiftI`, `Abs`, and `ReverseBytesI`, the compiler needs to preserve the higher-order bits of the first operand. Therefore, SuperWord still needs to assign them precise subword types. See:
>>>
>>> https://github.com/openjdk/jdk/blob/f2d2eef988c57cc9f6194a8fd5b2b422035ee68f/src/hotspot/share/opto/superword.cpp#L2583-L2589
>>
>> Yes, I see. Thanks! What I mean is for cases that SLP will use the sub...
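
To make the point about subword types concrete, here is a minimal Java sketch (not code from the patch or from HotSpot; the class and method names are made up for illustration). It shows that for an operation such as add, the low 16 bits of the result are the same whether the lane is treated as `short` or as `char`, while for a right shift they differ because the shift depends on the higher-order bits of the promoted `int`. That is, as far as I understand it, why SuperWord can use `T_SHORT` for most of the chain but needs the precise type for `RShiftI`.

```java
public class SubwordTypeDemo {
    // Low 16 bits of an int, i.e. what a 16-bit vector lane would hold.
    static int low16(int v) {
        return v & 0xFFFF;
    }

    // Add: the operand is promoted to int, then narrowed back to 16 bits.
    static int addAsShort(short s) {
        return low16((short) (s + 1));
    }

    static int addAsChar(char c) {
        return low16((char) (c + 1));
    }

    // Right shift: short is sign-extended and char is zero-extended before
    // the shift, so the bits shifted in from the top differ.
    static int shiftAsShort(short s) {
        return low16((short) (s >> 1));
    }

    static int shiftAsChar(char c) {
        return low16((char) (c >> 1));
    }

    public static void main(String[] args) {
        short s = (short) 0xFFFE; // same 16-bit pattern as c
        char c = (char) 0xFFFE;
        System.out.printf("add:   short=0x%04X char=0x%04X%n", addAsShort(s), addAsChar(c));
        System.out.printf("shift: short=0x%04X char=0x%04X%n", shiftAsShort(s), shiftAsChar(c));
    }
}
```

With the bit pattern `0xFFFE`, both adds produce `0xFFFF`, while the shifts produce `0xFFFF` for `short` and `0x7FFF` for `char`.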
@fg1417, there is a performance regression for `D -> S` on NEON with SLP, so I've disabled that case in the latest change. Here is the performance data for the JMH `TypeVectorOperations` benchmark on Grace (the 128-bit SVE machine) and N1 (NEON), respectively:
Grace:
Benchmark                                 COUNT  Mode  Unit   Before      After       Ratio
TypeVectorOperationsSuperWord.convertD2S  512    avgt  ns/op  155.667433  123.222497  1.26
TypeVectorOperationsSuperWord.convertD2S  2048   avgt  ns/op  622.262384  489.336020  1.27
TypeVectorOperationsSuperWord.convertL2S  512    avgt  ns/op  93.173939   63.557134   1.46
TypeVectorOperationsSuperWord.convertL2S  2048   avgt  ns/op  365.287938  239.726941  1.52
TypeVectorOperationsSuperWord.convertS2D  512    avgt  ns/op  157.096344  147.560047  1.06
TypeVectorOperationsSuperWord.convertS2D  2048   avgt  ns/op  627.039963  614.748559  1.01
TypeVectorOperationsSuperWord.convertS2L  512    avgt  ns/op  111.752970  108.629240  1.02
TypeVectorOperationsSuperWord.convertS2L  2048   avgt  ns/op  441.312737  441.088523  1.00
N1:
Benchmark                                 COUNT  Mode  Unit   Before      After       Ratio
TypeVectorOperationsSuperWord.convertD2S  512    avgt  ns/op  215.353528  214.769884  1.00
TypeVectorOperationsSuperWord.convertD2S  2048   avgt  ns/op  958.428871  952.922855  1.00
TypeVectorOperationsSuperWord.convertL2S  512    avgt  ns/op  158.000190  142.647209  1.10
TypeVectorOperationsSuperWord.convertL2S  2048   avgt  ns/op  612.525835  532.023419  1.15
TypeVectorOperationsSuperWord.convertS2D  512    avgt  ns/op  209.993363  210.466401  0.99
TypeVectorOperationsSuperWord.convertS2D  2048   avgt  ns/op  819.181052  803.601170  1.01
TypeVectorOperationsSuperWord.convertS2L  512    avgt  ns/op  217.848273  182.680450  1.19
TypeVectorOperationsSuperWord.convertS2L  2048   avgt  ns/op  858.031089  695.502377  1.23
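
For readers unfamiliar with these cases, the loops being measured are roughly of the following shape. This is only a sketch under my own assumptions, not the actual `TypeVectorOperations` source; the class name `ConversionKernels` and the method signatures are hypothetical, chosen to mirror the benchmark names in the tables.

```java
public class ConversionKernels {
    // D -> S: narrowing double to short (goes through int, then truncates to 16 bits).
    static void convertD2S(double[] in, short[] out) {
        for (int i = 0; i < in.length; i++) {
            out[i] = (short) in[i];
        }
    }

    // L -> S: narrowing long to short (keeps the low 16 bits).
    static void convertL2S(long[] in, short[] out) {
        for (int i = 0; i < in.length; i++) {
            out[i] = (short) in[i];
        }
    }

    // S -> D: widening short to double.
    static void convertS2D(short[] in, double[] out) {
        for (int i = 0; i < in.length; i++) {
            out[i] = in[i];
        }
    }

    // S -> L: widening short to long (sign-extending).
    static void convertS2L(short[] in, long[] out) {
        for (int i = 0; i < in.length; i++) {
            out[i] = in[i];
        }
    }

    public static void main(String[] args) {
        double[] d = {3.7, -1.5, 70000.0};
        short[] s = new short[d.length];
        convertD2S(d, s);
        System.out.println(java.util.Arrays.toString(s));
    }
}
```

Each short lane is 16 bits, so two shorts fit in a 32-bit vector. Assuming a 128-bit vector register, converting to or from a 64-bit type handles two elements at a time, which is why the 32-bit minimum for short vectors matters for these loops.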
-------------
PR Comment: https://git.openjdk.org/jdk/pull/26057#issuecomment-3050738693