RFR: 8342095: Add autovectorizer support for subword vector casts [v3]
Emanuel Peter
epeter at openjdk.org
Fri May 2 09:25:46 UTC 2025
On Fri, 2 May 2025 05:19:41 GMT, Jasmine Karthikeyan <jkarthikeyan at openjdk.org> wrote:
>> @jaskarth Let me know if there is anything we can help you with here :)
>
> @eme64 Thank you for the comments! I've updated the test and benchmark to be more exhaustive, and applied the suggested changes. For the benchmark, I got these results on my machine:
>
> Benchmark                   (SIZE)  Mode  Cnt  Baseline (ns/op)  Patch (ns/op)    Improvement
> VectorSubword.byteToChar      1024  avgt   12  252.954 ±  4.129   24.219 ± 0.453  10.4x
> VectorSubword.byteToInt       1024  avgt   12  194.707 ±  3.584   38.353 ± 0.637  5.07x
> VectorSubword.byteToLong      1024  avgt   12   73.645 ±  1.418   70.521 ± 0.470  no change
> VectorSubword.byteToShort     1024  avgt   12  252.647 ±  3.738   22.664 ± 0.449  11.1x
> VectorSubword.charToByte      1024  avgt   12  236.396 ±  3.893  228.710 ± 1.967  no change
> VectorSubword.charToInt       1024  avgt   12  179.673 ±  2.811  173.764 ± 1.150  no change
> VectorSubword.charToLong      1024  avgt   12  184.867 ±  3.079  177.999 ± 1.312  no change
> VectorSubword.charToShort     1024  avgt   12   24.385 ±  1.822   22.375 ± 1.980  no change
> VectorSubword.intToByte       1024  avgt   12  190.949 ±  1.475   49.376 ± 1.383  3.86x
> VectorSubword.intToChar       1024  avgt   12  182.862 ±  3.708   44.344 ± 4.513  4.12x
> VectorSubword.intToLong       1024  avgt   12   76.072 ±  1.153   73.382 ± 0.294  no change
> VectorSubword.intToShort      1024  avgt   12  184.362 ±  1.938   45.556 ± 3.323  4.04x
> VectorSubword.longToByte      1024  avgt   12  150.766 ±  3.475  146.651 ± 0.742  no change
> VectorSubword.longToChar      1024  avgt   12  121.764 ±  1.323  117.068 ± 1.891  no change
> VectorSubword.longToInt       1024  avgt   12   83.761 ±  2.140   82.084 ± 0.930  no change
> VectorSubword.longToShort     1024  avgt   12  132.293 ± 23.046  115.883 ± 0.834  +12.4%
> VectorSubword.shortToByte     1024  avgt   12  253.387 ±  5.972   27.591 ± 1.311  9.18x
> VectorSubword.shortToChar     1024  avgt   12   21.446 ±  1.914   20.608 ± 1.593  no change
> VectorSubword.shortToInt      1024  avgt   12  187.109 ±  3.372   36.818 ± 0.989  5.08x
> VectorSubword.shortToLong     1024  avgt   12   75.448 ±  0.930   72.835 ± 0.507  no change
>
> Interestingly, eve...
@jaskarth Anyway, I'm super happy that you are working on patching this hole :)
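Each VectorSubword.xToY row presumably times an element-wise conversion loop over arrays of SIZE elements. What follows is a minimal hypothetical sketch of two such kernels (byte to char and int to short). It is not the benchmark's actual source, and the class and method names are illustrative assumptions. Loops of this shape are the kind of code C2's SuperWord auto-vectorizer may turn into vector loads, a vector cast, and vector stores. The speedup helper shows how the Improvement column relates to the scores, e.g. 252.954 / 24.219 ≈ 10.4x for byteToChar.

```java
import java.util.Locale;

// Hypothetical sketch: illustrative only, not the actual VectorSubword
// benchmark source. Class and method names are assumptions.
class SubwordCastSketch {
    // byte -> char: each byte is sign-extended to int, then narrowed to
    // 16 bits, so (char) (byte) -1 becomes 0xFFFF.
    static void byteToChar(byte[] src, char[] dst) {
        for (int i = 0; i < src.length; i++) {
            dst[i] = (char) src[i];
        }
    }

    // int -> short: keeps only the low 16 bits of each element.
    static void intToShort(int[] src, short[] dst) {
        for (int i = 0; i < src.length; i++) {
            dst[i] = (short) src[i];
        }
    }

    // Improvement factor as used in the table: baseline time / patched time.
    static double speedup(double baselineNsPerOp, double patchNsPerOp) {
        return baselineNsPerOp / patchNsPerOp;
    }

    public static void main(String[] args) {
        byte[] bytes = {-1, 0, 65, 127};
        char[] chars = new char[bytes.length];
        byteToChar(bytes, chars);
        System.out.println((int) chars[0]); // 65535

        short[] shorts = new short[2];
        intToShort(new int[] {70000, 40000}, shorts);
        System.out.println(shorts[0] + " " + shorts[1]); // 4464 -25536

        // Locale.ROOT keeps the decimal separator a '.' on every system.
        System.out.println(String.format(Locale.ROOT, "%.1fx", speedup(252.954, 24.219))); // 10.4x
    }
}
```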
-------------
PR Comment: https://git.openjdk.org/jdk/pull/23413#issuecomment-2846773124
More information about the hotspot-compiler-dev mailing list