RFR: 8342095: Add autovectorizer support for subword vector casts [v3]

Emanuel Peter epeter at openjdk.org
Fri May 2 09:25:46 UTC 2025


On Fri, 2 May 2025 05:19:41 GMT, Jasmine Karthikeyan <jkarthikeyan at openjdk.org> wrote:

>> @jaskarth Let me know if there is anything we can help you with here :)
>
> @eme64 Thank you for the comments! I've updated the test and benchmark to be more exhaustive, and applied the suggested changes. For the benchmark, I got these results on my machine:
> 
>                                                   Baseline                    Patch
> Benchmark                  (SIZE)  Mode  Cnt    Score    Error  Units    Score   Error  Units  Improvement
> VectorSubword.byteToChar     1024  avgt   12  252.954 ±  4.129  ns/op   24.219 ± 0.453  ns/op  (10.4x)
> VectorSubword.byteToInt      1024  avgt   12  194.707 ±  3.584  ns/op   38.353 ± 0.637  ns/op  (5.07x)
> VectorSubword.byteToLong     1024  avgt   12   73.645 ±  1.418  ns/op   70.521 ± 0.470  ns/op  (no change)
> VectorSubword.byteToShort    1024  avgt   12  252.647 ±  3.738  ns/op   22.664 ± 0.449  ns/op  (11.1x)
> VectorSubword.charToByte     1024  avgt   12  236.396 ±  3.893  ns/op  228.710 ± 1.967  ns/op  (no change)
> VectorSubword.charToInt      1024  avgt   12  179.673 ±  2.811  ns/op  173.764 ± 1.150  ns/op  (no change)
> VectorSubword.charToLong     1024  avgt   12  184.867 ±  3.079  ns/op  177.999 ± 1.312  ns/op  (no change)
> VectorSubword.charToShort    1024  avgt   12   24.385 ±  1.822  ns/op   22.375 ± 1.980  ns/op  (no change)
> VectorSubword.intToByte      1024  avgt   12  190.949 ±  1.475  ns/op   49.376 ± 1.383  ns/op  (3.86x)
> VectorSubword.intToChar      1024  avgt   12  182.862 ±  3.708  ns/op   44.344 ± 4.513  ns/op  (4.12x)
> VectorSubword.intToLong      1024  avgt   12   76.072 ±  1.153  ns/op   73.382 ± 0.294  ns/op  (no change)
> VectorSubword.intToShort     1024  avgt   12  184.362 ±  1.938  ns/op   45.556 ± 3.323  ns/op  (4.04x)
> VectorSubword.longToByte     1024  avgt   12  150.766 ±  3.475  ns/op  146.651 ± 0.742  ns/op  (no change)
> VectorSubword.longToChar     1024  avgt   12  121.764 ±  1.323  ns/op  117.068 ± 1.891  ns/op  (no change)
> VectorSubword.longToInt      1024  avgt   12   83.761 ±  2.140  ns/op   82.084 ± 0.930  ns/op  (no change)
> VectorSubword.longToShort    1024  avgt   12  132.293 ± 23.046  ns/op  115.883 ± 0.834  ns/op  (+ 12.4%)
> VectorSubword.shortToByte    1024  avgt   12  253.387 ±  5.972  ns/op   27.591 ± 1.311  ns/op  (9.18x)
> VectorSubword.shortToChar    1024  avgt   12   21.446 ±  1.914  ns/op   20.608 ± 1.593  ns/op  (no change)
> VectorSubword.shortToInt     1024  avgt   12  187.109 ±  3.372  ns/op   36.818 ± 0.989  ns/op  (5.08x)
> VectorSubword.shortToLong    1024  avgt   12   75.448 ±  0.930  ns/op   72.835 ± 0.507  ns/op  (no change)
> 
> Interestingly, eve...

@jaskarth Anyway, I'm super happy that you are working on patching this hole :)
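
For anyone reading along without the PR open: the benchmark names above suggest simple element-wise cast loops between primitive arrays. Below is a minimal sketch of what two such kernels might look like. The class and method bodies are assumptions for illustration only, not code taken from the PR's `VectorSubword` benchmark, and whether a given loop actually gets vectorized depends on the platform and the JVM build.

```java
// Hypothetical sketch of subword cast loops; not the PR's benchmark code.
public class SubwordCastSketch {
    // Widening cast: each byte is sign-extended to an int.
    static void byteToInt(byte[] src, int[] dst) {
        for (int i = 0; i < src.length; i++) {
            dst[i] = src[i];
        }
    }

    // Narrowing cast: each int is truncated to its low 16 bits.
    static void intToShort(int[] src, short[] dst) {
        for (int i = 0; i < src.length; i++) {
            dst[i] = (short) src[i];
        }
    }

    public static void main(String[] args) {
        byte[] bytes = {-1, 0, 127, -128};
        int[] ints = new int[bytes.length];
        byteToInt(bytes, ints);

        short[] shorts = new short[ints.length];
        intToShort(ints, shorts);

        System.out.println(java.util.Arrays.toString(ints));
        System.out.println(java.util.Arrays.toString(shorts));
    }
}
```

Loops of this shape are the kind of code the "(SIZE) 1024" rows in the table would exercise, with one load, one cast, and one store per element.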

-------------

PR Comment: https://git.openjdk.org/jdk/pull/23413#issuecomment-2846773124


More information about the hotspot-compiler-dev mailing list