RFR: 8359419: AArch64: Relax min vector length to 32-bit for short vectors [v3]

Xiaohong Gong xgong at openjdk.org
Wed Jul 9 01:26:44 UTC 2025


On Tue, 8 Jul 2025 10:33:50 GMT, Fei Gao <fgao at openjdk.org> wrote:

>>> > > > > https://github.com/openjdk/jdk/blob/f2d2eef988c57cc9f6194a8fd5b2b422035ee68f/test/hotspot/jtreg/compiler/vectorization/runner/ArrayTypeConvertTest.java#L388-L392
>>> > > > 
>>> > > > 
>>> > > > Actually I didn't change the min vector size for `char` vectors in this patch. Relaxing `short` vectors to 32-bit is meant to support vector casts in the Vector API, which has no `char` species. Do you think it's better to make the same change for `char` as well? That would only benefit auto-vectorization.
>>> > > 
>>> > > 
>>> > > Hi @XiaohongGong thanks for asking. In many auto-vectorization cases involving `char`, the vector elements are represented using `T_SHORT` as the `BasicType`, rather than `T_CHAR`.
>>> > > This is because, in Java, operands of subword types are always promoted to `int` before any arithmetic operation. As a result, when handling a node like `ConvD2I`, we don’t initially know its actual subword type. Later, the SuperWord phase propagates a narrowed integer type backward to help determine the correct subword type. See:
>>> > > https://github.com/openjdk/jdk/blob/f2d2eef988c57cc9f6194a8fd5b2b422035ee68f/src/hotspot/share/opto/superword.cpp#L2551-L2558
>>> > > 
>>> > > Since SuperWord assigns `T_SHORT` to `StoreC` early on
>>> > > https://github.com/openjdk/jdk/blob/f2d2eef988c57cc9f6194a8fd5b2b422035ee68f/src/hotspot/share/opto/superword.cpp#L2646-L2650
>>> > > 
>>> > > the entire propagation chain tends to use `T_SHORT` as well.
>>> > > This applies to most operations, with the exception of a few like `RShiftI`, `Abs`, and `ReverseBytesI`, which are handled separately.
>>> > > So your change already benefits many char-related vectorization cases like `convertDoubleToChar` above. That’s why we can safely relax the IR condition mentioned earlier.
>>> > 
>>> > 
>>> > Thanks for your input! It's really helpful to me. Does this mean SLP always uses `T_SHORT` for char vectors? If so, is it safe for us not to consider `T_CHAR` in vector IRs in the backend?
>>> 
>>> No, we don't always use `T_SHORT` for char vectors. As mentioned earlier, for operations like `RShiftI`, `Abs`, and `ReverseBytesI`, the compiler needs to preserve the higher-order bits of the first operand. Therefore, SuperWord still needs to assign them precise subword types. See:
>>> 
>>> https://github.com/openjdk/jdk/blob/f2d2eef988c57cc9f6194a8fd5b2b422035ee68f/src/hotspot/share/opto/superword.cpp#L2583-L2589
>> 
>> Yes, I see. Thanks! What I mean is for cases that SLP will use the sub...
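
As an illustration of the distinction discussed in the quoted thread, here is a minimal Java sketch. The class and method names are hypothetical and it is not code from HotSpot or from this patch. It only shows the language-level fact behind the SuperWord behaviour: after promotion to `int`, an add followed by a 16-bit store gives the same low 16 bits for `short` and `char` inputs, while an arithmetic right shift does not, because it pulls the higher-order bits into the result.

```java
public class SubwordShiftDemo {
    // Both helpers take the value already promoted to int, as Java does
    // before any arithmetic on short or char operands.

    /** Low 16 bits of (x + 1): what a 16-bit store would keep. */
    public static int addThenNarrow(int promoted) {
        return (promoted + 1) & 0xFFFF;
    }

    /** Low 16 bits of (x >> 1): depends on the bits above bit 15. */
    public static int shiftThenNarrow(int promoted) {
        return (promoted >> 1) & 0xFFFF;
    }

    public static void main(String[] args) {
        short s = (short) 0x8000; // promoted with sign extension: 0xFFFF8000
        char c = (char) 0x8000;   // promoted with zero extension: 0x00008000

        // Same low 16 bits for the add, so one 16-bit lane type is enough.
        System.out.printf("add: short=%04x char=%04x%n",
                addThenNarrow(s), addThenNarrow(c));
        // Different low 16 bits for the shift, so signedness must be known.
        System.out.printf("shr: short=%04x char=%04x%n",
                shiftThenNarrow(s), shiftThenNarrow(c));
    }
}
```

Running `main` prints `8001` for both adds, but `c000` for the `short` shift and `4000` for the `char` shift.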

@fg1417, there is a performance regression for `D -> S` on NEON with SLP, so I've disabled that case in the latest change. Here is the performance data for the JMH benchmark `TypeVectorOperations` on Grace (the 128-bit SVE machine) and N1 (NEON), respectively. Ratio is Before/After, so values above 1.00 are improvements:

Grace:

Benchmark                                 COUNT  Mode  Unit   Before      After       Ratio
TypeVectorOperationsSuperWord.convertD2S  512    avgt  ns/op  155.667433  123.222497  1.26
TypeVectorOperationsSuperWord.convertD2S  2048   avgt  ns/op  622.262384  489.336020  1.27
TypeVectorOperationsSuperWord.convertL2S  512    avgt  ns/op  93.173939   63.557134   1.46
TypeVectorOperationsSuperWord.convertL2S  2048   avgt  ns/op  365.287938  239.726941  1.52
TypeVectorOperationsSuperWord.convertS2D  512    avgt  ns/op  157.096344  147.560047  1.06
TypeVectorOperationsSuperWord.convertS2D  2048   avgt  ns/op  627.039963  614.748559  1.01
TypeVectorOperationsSuperWord.convertS2L  512    avgt  ns/op  111.752970  108.629240  1.02
TypeVectorOperationsSuperWord.convertS2L  2048   avgt  ns/op  441.312737  441.088523  1.00

N1:

Benchmark                                 COUNT  Mode  Unit   Before      After       Ratio
TypeVectorOperationsSuperWord.convertD2S  512    avgt  ns/op  215.353528  214.769884  1.00
TypeVectorOperationsSuperWord.convertD2S  2048   avgt  ns/op  958.428871  952.922855  1.00
TypeVectorOperationsSuperWord.convertL2S  512    avgt  ns/op  158.000190  142.647209  1.10
TypeVectorOperationsSuperWord.convertL2S  2048   avgt  ns/op  612.525835  532.023419  1.15
TypeVectorOperationsSuperWord.convertS2D  512    avgt  ns/op  209.993363  210.466401  0.99
TypeVectorOperationsSuperWord.convertS2D  2048   avgt  ns/op  819.181052  803.601170  1.01
TypeVectorOperationsSuperWord.convertS2L  512    avgt  ns/op  217.848273  182.680450  1.19
TypeVectorOperationsSuperWord.convertS2L  2048   avgt  ns/op  858.031089  695.502377  1.23
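
For readers unfamiliar with these kernels, the following is a rough sketch of the kind of loop a `D -> S` conversion benchmark exercises, plus the lane arithmetic behind the 32-bit minimum. It is an assumption-laden illustration: the class and method names are hypothetical, and it is plain Java rather than the actual JMH source. With 128-bit vectors there are two 64-bit lanes, so the converted `short` results occupy 2 x 16 = 32 bits.

```java
import java.util.Arrays;

public class ConvertD2SSketch {
    /** Scalar double-to-short loop, the shape auto-vectorization targets. */
    public static void convertD2S(double[] src, short[] dst) {
        for (int i = 0; i < src.length; i++) {
            dst[i] = (short) src[i];
        }
    }

    /** Bits taken by the short results for one full vector of doubles. */
    public static int shortVectorBits(int vectorBits) {
        int lanes = vectorBits / Double.SIZE; // 64-bit lanes per vector
        return lanes * Short.SIZE;            // 16 bits per converted lane
    }

    public static void main(String[] args) {
        double[] src = {1.5, -2.5, 40000.0, -40000.0};
        short[] dst = new short[src.length];
        convertD2S(src, dst);
        System.out.println(Arrays.toString(dst)); // [1, -2, -25536, 25536]
        System.out.println(shortVectorBits(128)); // 32
    }
}
```

The same arithmetic applies to `L -> S`, since `long` lanes are also 64 bits wide.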

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26057#issuecomment-3050738693


More information about the hotspot-compiler-dev mailing list