RFR: 8266054: VectorAPI rotate operation optimization

Paul Sandoz psandoz at openjdk.java.net
Tue Apr 27 18:45:34 UTC 2021


On Tue, 27 Apr 2021 17:56:04 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:

> Current VectorAPI Java side implementation expresses rotateLeft and rotateRight operation using following operations:-
> 
>     vec1 = lanewise(VectorOperators.LSHL, n)
>     vec2 = lanewise(VectorOperators.LSHR, n)
>     res = lanewise(VectorOperations.OR, vec1 , vec2)
> 
> This patch moves above handling from Java side to C2 compiler which facilitates dismantling the rotate operation if target ISA does not support a direct rotate instruction.
> 
> AVX512 added vector rotate instructions vpro[rl][v][dq] which operate over long and integer type vectors. For other cases (i.e. sub-word type vectors or for targets which do not support direct rotate operations )   instruction sequence comprising of vector SHIFT (LEFT/RIGHT) and vector OR is emitted.
> 
> Please find below the performance data for included JMH benchmark.
> Machine:  Intel(R) Xeon(R) Platinum 8280 CPU @ 2.70GHz (Cascadelake Server)
> 
> ``
> Benchmark | (TESTSIZE) | Baseline (ops/min) | Withopt (ops/min) | Gain %
> -- | -- | -- | -- | --
> RotateBenchmark.testRotateLeftI | 64 | 6813.033 | 6936.507 | 1.81
> RotateBenchmark.testRotateLeftI | 64 | 11549.524 | 12109.845 | 4.85
> RotateBenchmark.testRotateLeftI | 64 | 17780.166 | 18180.083 | 2.25
> RotateBenchmark.testRotateLeftI | 128 | 3355.631 | 3426.436 | 2.11
> RotateBenchmark.testRotateLeftI | 128 | 5925.534 | 6096.71 | 2.89
> RotateBenchmark.testRotateLeftI | 128 | 8847.645 | 8964.224 | 1.32
> RotateBenchmark.testRotateLeftI | 256 | 1704.192 | 1779.964 | 4.45
> RotateBenchmark.testRotateLeftI | 256 | 2919.158 | 3000.392 | 2.78
> RotateBenchmark.testRotateLeftI | 256 | 4386.134 | 4514.211 | 2.92
> RotateBenchmark.testRotateLeftL | 64 | 3419.193 | 3500.522 | 2.38
> RotateBenchmark.testRotateLeftL | 64 | 5982.87 | 6056.589 | 1.23
> RotateBenchmark.testRotateLeftL | 64 | 8829.172 | 8949.157 | 1.36
> RotateBenchmark.testRotateLeftL | 128 | 1684.745 | 1784.49 | 5.92
> RotateBenchmark.testRotateLeftL | 128 | 2942.226 | 2947.684 | 0.19
> RotateBenchmark.testRotateLeftL | 128 | 4385.488 | 4554.113 | 3.85
> RotateBenchmark.testRotateLeftL | 256 | 834.195 | 856.939 | 2.73
> RotateBenchmark.testRotateLeftL | 256 | 1463.802 | 1551.051 | 5.96
> RotateBenchmark.testRotateLeftL | 256 | 2187.005 | 2268.596 | 3.73
> RotateBenchmark.testRotateRightI | 64 | 6676.132 | 7077.587 | 6.01
> RotateBenchmark.testRotateRightI | 64 | 11452.825 | 11855.45 | 3.52
> RotateBenchmark.testRotateRightI | 64 | 17507.286 | 18164.757 | 3.76
> RotateBenchmark.testRotateRightI | 128 | 3412.394 | 3519.829 | 3.15
> RotateBenchmark.testRotateRightI | 128 | 5905.677 | 5875.698 | -0.51
> RotateBenchmark.testRotateRightI | 128 | 8745.95 | 8890.757 | 1.66
> RotateBenchmark.testRotateRightI | 256 | 1681.176 | 1708.54 | 1.63
> RotateBenchmark.testRotateRightI | 256 | 3004.008 | 3005.606 | 0.05
> RotateBenchmark.testRotateRightI | 256 | 4466.633 | 4548.232 | 1.83
> RotateBenchmark.testRotateRightL | 64 | 3361.499 | 3461.121 | 2.96
> RotateBenchmark.testRotateRightL | 64 | 5873.37 | 6072.209 | 3.39
> RotateBenchmark.testRotateRightL | 64 | 8820.284 | 9016.172 | 2.22
> RotateBenchmark.testRotateRightL | 128 | 1715.556 | 1720.67 | 0.30
> RotateBenchmark.testRotateRightL | 128 | 2957.337 | 3149.193 | 6.49
> RotateBenchmark.testRotateRightL | 128 | 4440.468 | 4473.221 | 0.74
> RotateBenchmark.testRotateRightL | 256 | 851.391 | 875.371 | 2.82
> RotateBenchmark.testRotateRightL | 256 | 1452.422 | 1522.942 | 4.86
> RotateBenchmark.testRotateRightL | 256 | 2208.454 | 2263.349 | 2.49
> 
>  ``

I noticed the tests are only updated for int and long, is that intentional? The HotSpot changes in some cases seem to imply all integral types are supported via the use of `is_integral_type`, contradicted by the use of `is_subword_type`.

I would recommend trying to leverage Integer/Long.rotateLeft/Right implementations. They are not available for byte/short, so lets add specific methods in those cases, that should make the Java op implementation clearer.

-------------

PR: https://git.openjdk.java.net/jdk/pull/3720


More information about the hotspot-compiler-dev mailing list