RFR: 8342095: Add autovectorizer support for subword vector casts [v12]
Jasmine Karthikeyan
jkarthikeyan at openjdk.org
Mon May 12 03:11:54 UTC 2025
On Mon, 5 May 2025 13:51:29 GMT, Emanuel Peter <epeter at openjdk.org> wrote:
>> Thanks a lot for running the benchmark on your AVX512 machine! The results are very interesting. In the char cases it looks like we over-unroll the loop with SuperWord enabled even though we don't end up vectorizing it, so fixing that could solve the slowdown. Since you mentioned the unroll amount was 32x, it might be unrolling to fill a 512-bit vector with 16-bit chars (`512 / 16 = 32`).
>>
>>> Wait, but you seem to say that you want to support `casting to T_CHAR`. But is the issue not casting FROM char?
>>
>> You are correct, I think that is my mistake. It looks like casting to char is supported because stores to both short and char become `StoreC`, but casting from char isn't supported because we have no `VectorCastC2X` node. I'll update the bug to make it more accurate.
>>
>> I've also pushed a small commit to remove some extra whitespace and to make the benchmark run faster.
>
> @jaskarth Just checked the internal testing. Saw this failure with `-XX:UseAVX=1`:
>
>
> Failed IR Rules (2) of Methods (2)
> ----------------------------------
> 1) Method "public java.lang.Object[] compiler.loopopts.superword.TestCompatibleUseDefTypeSize.testByteToLong(byte[],long[])" - [Failed IR rules: 1]:
> * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={"_#V#VECTOR_CAST_B2L#_", "_@min(max_byte, max_long)", ">0"}, applyIfPlatform={}, applyIfPlatformOr={}, failOn={}, applyIfOr={"AlignVector", "false", "UseCompactObjectHeaders", "false"}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={"avx", "true"}, applyIfAnd={}, applyIfNot={})"
> > Phase "PrintIdeal":
> - counts: Graph contains wrong number of nodes:
> * Constraint 1: "(\\d+(\\s){2}(VectorCastB2X.*)+(\\s){2}===.*vector[A-Za-z]<J,2>)"
> - Failed comparison: [found] 0 > 0 [given]
> - No nodes matched!
>
> 2) Method "public java.lang.Object[] compiler.loopopts.superword.TestCompatibleUseDefTypeSize.testLongToByte(long[],byte[])" - [Failed IR rules: 1]:
> * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={"_#V#VECTOR_CAST_L2B#_", "_@min(max_long, max_byte)", ">0"}, applyIfPlatform={}, applyIfPlatformOr={}, failOn={}, applyIfOr={"AlignVector", "false", "UseCompactObjectHeaders", "false"}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={"avx", "true"}, applyIfAnd={}, applyIfNot={})"
> > Phase "PrintIdeal":
> - counts: Graph contains wrong number of nodes:
> * Constraint 1: "(\\d+(\\s){2}(VectorCastL2X.*)+(\\s){2}===.*vector[A-Za-z]<B,2>)"
> - Failed comparison: [found] 0 > 0 [given]
> - No nodes matched!
@eme64 Thanks for the testing results! It looks like byte<->long conversion isn't supported with AVX1, so I've pushed a small commit to make the test check for AVX2 in those cases instead.
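
For readers following along, here is a minimal sketch of the kind of loops the two failing rules cover. Only the method names and signatures come from the failure output above; the loop bodies, the `SubwordCastSketch` class, and the `lanes` helper are assumptions for illustration and are not copied from TestCompatibleUseDefTypeSize.

```java
import java.util.Arrays;

// Hypothetical sketch: the class name, loop bodies, and lanes() helper are
// illustrative assumptions, not code from the PR or the test.
class SubwordCastSketch {
    // Widening cast: each byte is sign-extended to a long.
    static Object[] testByteToLong(byte[] input, long[] output) {
        for (int i = 0; i < input.length; i++) {
            output[i] = input[i];
        }
        return new Object[] { input, output };
    }

    // Narrowing cast: each long is truncated to its low 8 bits.
    static Object[] testLongToByte(long[] input, byte[] output) {
        for (int i = 0; i < input.length; i++) {
            output[i] = (byte) input[i];
        }
        return new Object[] { input, output };
    }

    // Number of elements that fit in one vector of the given width (in bits).
    static int lanes(int vectorBits, int elementBits) {
        return vectorBits / elementBits;
    }

    public static void main(String[] args) {
        byte[] bytes = { -128, -1, 0, 1, 127 };
        long[] longs = new long[bytes.length];
        testByteToLong(bytes, longs);
        System.out.println(Arrays.toString(longs));   // [-128, -1, 0, 1, 127]

        long[] wide = { 0x1FFL, -1L, 256L, 127L };
        byte[] narrow = new byte[wide.length];
        testLongToByte(wide, narrow);
        System.out.println(Arrays.toString(narrow));  // [-1, -1, 0, 127]

        System.out.println(lanes(128, Long.SIZE));      // 2
        System.out.println(lanes(512, Character.SIZE)); // 32
    }
}
```

Assuming a 128-bit vector, `lanes(128, Long.SIZE)` gives the two long lanes that the `<J,2>` shape in the failed constraint refers to. `lanes(512, Character.SIZE)` is the same arithmetic as the 32x unroll estimate quoted above.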
-------------
PR Comment: https://git.openjdk.org/jdk/pull/23413#issuecomment-2870630280