RFR: 8266054: VectorAPI rotate operation optimization [v3]

Paul Sandoz psandoz at openjdk.java.net
Mon May 3 16:57:55 UTC 2021


On Mon, 3 May 2021 06:51:29 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:

>> The current Vector API Java-side implementation expresses the rotateLeft and rotateRight operations using the following operations:
>> 
>>     vec1 = lanewise(VectorOperators.LSHL, n)
>>     vec2 = lanewise(VectorOperators.LSHR, -n)
>>     res = lanewise(VectorOperators.OR, vec1, vec2)
>> 
>> This patch moves the above handling from the Java side to the C2 compiler, which makes it possible to decompose the rotate operation when the target ISA does not support a direct rotate instruction.
>> 
>> AVX512 added the vector rotate instructions vpro[rl][v][dq], which operate on long and integer type vectors. For other cases (i.e. sub-word type vectors, or targets which do not support direct rotate operations), an instruction sequence comprising vector SHIFT (LEFT/RIGHT) and vector OR is emitted.
>> 
>> Please find below the performance data for the included JMH benchmark.
>> Machine: Intel(R) Xeon(R) Platinum 8280 CPU @ 2.70GHz (Cascade Lake server)
>> 
>> ``
>> 
>> Benchmark | (TESTSIZE) | Shift | Baseline (ops/ms) | With opt (ops/ms) | Gain %
>> -- | -- | -- | -- | -- | --
>> RotateBenchmark.testRotateLeftI | (size) | (shift) | 7384.747 | 7706.652 | 4.36
>> RotateBenchmark.testRotateLeftI | 64 | 11 | 3723.305 | 3816.968 | 2.52
>> RotateBenchmark.testRotateLeftI | 128 | 11 | 1811.521 | 1966.05 | 8.53
>> RotateBenchmark.testRotateLeftI | 256 | 11 | 7133.296 | 7715.047 | 8.16
>> RotateBenchmark.testRotateLeftI | 64 | 21 | 3612.144 | 3886.225 | 7.59
>> RotateBenchmark.testRotateLeftI | 128 | 21 | 1815.422 | 1962.753 | 8.12
>> RotateBenchmark.testRotateLeftI | 256 | 21 | 7216.353 | 7677.165 | 6.39
>> RotateBenchmark.testRotateLeftI | 64 | 31 | 3602.008 | 3892.297 | 8.06
>> RotateBenchmark.testRotateLeftI | 128 | 31 | 1882.163 | 1958.887 | 4.08
>> RotateBenchmark.testRotateLeftI | 256 | 31 | 11819.443 | 11912.864 | 0.79
>> RotateBenchmark.testRotateLeftI | 64 | 11 | 5978.475 | 6060.189 | 1.37
>> RotateBenchmark.testRotateLeftI | 128 | 11 | 2965.179 | 3060.969 | 3.23
>> RotateBenchmark.testRotateLeftI | 256 | 11 | 11479.579 | 11684.148 | 1.78
>> RotateBenchmark.testRotateLeftI | 64 | 21 | 5904.903 | 6094.409 | 3.21
>> RotateBenchmark.testRotateLeftI | 128 | 21 | 2969.879 | 3074.1 | 3.51
>> RotateBenchmark.testRotateLeftI | 256 | 21 | 11531.654 | 12155.954 | 5.41
>> RotateBenchmark.testRotateLeftI | 64 | 31 | 5730.918 | 6112.514 | 6.66
>> RotateBenchmark.testRotateLeftI | 128 | 31 | 2937.19 | 2976.297 | 1.33
>> RotateBenchmark.testRotateLeftI | 256 | 31 | 16159.184 | 16459.462 | 1.86
>> RotateBenchmark.testRotateLeftI | 64 | 11 | 8154.982 | 8396.089 | 2.96
>> RotateBenchmark.testRotateLeftI | 128 | 11 | 4142.224 | 4292.049 | 3.62
>> RotateBenchmark.testRotateLeftI | 256 | 11 | 15958.154 | 16163.518 | 1.29
>> RotateBenchmark.testRotateLeftI | 64 | 21 | 8098.805 | 8504.279 | 5.01
>> RotateBenchmark.testRotateLeftI | 128 | 21 | 4137.598 | 4314.868 | 4.28
>> RotateBenchmark.testRotateLeftI | 256 | 21 | 16201.666 | 15992.958 | -1.29
>> RotateBenchmark.testRotateLeftI | 64 | 31 | 8027.169 | 8484.379 | 5.70
>> RotateBenchmark.testRotateLeftI | 128 | 31 | 4146.29 | 4039.681 | -2.57
>> RotateBenchmark.testRotateLeftL | 256 | 31 | 3566.176 | 3805.248 | 6.70
>> RotateBenchmark.testRotateLeftL | 64 | 11 | 1820.219 | 1962.866 | 7.84
>> RotateBenchmark.testRotateLeftL | 128 | 11 | 917.085 | 1007.334 | 9.84
>> RotateBenchmark.testRotateLeftL | 256 | 11 | 3592.139 | 3973.698 | 10.62
>> RotateBenchmark.testRotateLeftL | 64 | 21 | 1827.63 | 1999.711 | 9.42
>> RotateBenchmark.testRotateLeftL | 128 | 21 | 907.104 | 1002.997 | 10.57
>> RotateBenchmark.testRotateLeftL | 256 | 21 | 3780.962 | 3873.489 | 2.45
>> RotateBenchmark.testRotateLeftL | 64 | 31 | 1830.121 | 1955.63 | 6.86
>> RotateBenchmark.testRotateLeftL | 128 | 31 | 891.411 | 982.138 | 10.18
>> RotateBenchmark.testRotateLeftL | 256 | 31 | 5890.544 | 6100.594 | 3.57
>> RotateBenchmark.testRotateLeftL | 64 | 11 | 2984.329 | 3021.971 | 1.26
>> RotateBenchmark.testRotateLeftL | 128 | 11 | 1485.109 | 1527.689 | 2.87
>> RotateBenchmark.testRotateLeftL | 256 | 11 | 5903.411 | 6083.775 | 3.06
>> RotateBenchmark.testRotateLeftL | 64 | 21 | 2925.37 | 3050.958 | 4.29
>> RotateBenchmark.testRotateLeftL | 128 | 21 | 1486.432 | 1537.155 | 3.41
>> RotateBenchmark.testRotateLeftL | 256 | 21 | 5853.721 | 6000.682 | 2.51
>> RotateBenchmark.testRotateLeftL | 64 | 31 | 2896.116 | 3072.783 | 6.10
>> RotateBenchmark.testRotateLeftL | 128 | 31 | 1483.132 | 1546.588 | 4.28
>> RotateBenchmark.testRotateLeftL | 256 | 31 | 8059.206 | 8218.047 | 1.97
>> RotateBenchmark.testRotateLeftL | 64 | 11 | 4022.416 | 4195.52 | 4.30
>> RotateBenchmark.testRotateLeftL | 128 | 11 | 2084.296 | 2068.238 | -0.77
>> RotateBenchmark.testRotateLeftL | 256 | 11 | 7971.832 | 8172.819 | 2.52
>> RotateBenchmark.testRotateLeftL | 64 | 21 | 4032.036 | 4344.469 | 7.75
>> RotateBenchmark.testRotateLeftL | 128 | 21 | 2068.957 | 2138.685 | 3.37
>> RotateBenchmark.testRotateLeftL | 256 | 21 | 8140.63 | 8003.283 | -1.69
>> RotateBenchmark.testRotateLeftL | 64 | 31 | 4088.621 | 4296.091 | 5.07
>> RotateBenchmark.testRotateLeftL | 128 | 31 | 2007.753 | 2088.455 | 4.02
>> RotateBenchmark.testRotateRightI | 256 | 31 | 7358.793 | 7548.976 | 2.58
>> RotateBenchmark.testRotateRightI | 64 | 11 | 3648.868 | 3897.47 | 6.81
>> RotateBenchmark.testRotateRightI | 128 | 11 | 1862.73 | 1969.964 | 5.76
>> RotateBenchmark.testRotateRightI | 256 | 11 | 7268.806 | 7790.588 | 7.18
>> RotateBenchmark.testRotateRightI | 64 | 21 | 3577.79 | 3979.675 | 11.23
>> RotateBenchmark.testRotateRightI | 128 | 21 | 1773.243 | 1921.088 | 8.34
>> RotateBenchmark.testRotateRightI | 256 | 21 | 7084.974 | 7609.912 | 7.41
>> RotateBenchmark.testRotateRightI | 64 | 31 | 3688.781 | 3909.65 | 5.99
>> RotateBenchmark.testRotateRightI | 128 | 31 | 1845.978 | 1928.316 | 4.46
>> RotateBenchmark.testRotateRightI | 256 | 31 | 11463.228 | 12179.833 | 6.25
>> RotateBenchmark.testRotateRightI | 64 | 11 | 5678.052 | 6028.573 | 6.17
>> RotateBenchmark.testRotateRightI | 128 | 11 | 2990.419 | 3070.409 | 2.67
>> RotateBenchmark.testRotateRightI | 256 | 11 | 11780.283 | 12105.261 | 2.76
>> RotateBenchmark.testRotateRightI | 64 | 21 | 5827.8 | 6020.208 | 3.30
>> RotateBenchmark.testRotateRightI | 128 | 21 | 2904.852 | 3047.154 | 4.90
>> RotateBenchmark.testRotateRightI | 256 | 21 | 11359.146 | 12060.401 | 6.17
>> RotateBenchmark.testRotateRightI | 64 | 31 | 5823.207 | 6079.82 | 4.41
>> RotateBenchmark.testRotateRightI | 128 | 31 | 2984.484 | 3045.719 | 2.05
>> RotateBenchmark.testRotateRightI | 256 | 31 | 16200.504 | 16376.475 | 1.09
>> RotateBenchmark.testRotateRightI | 64 | 11 | 8118.399 | 8315.407 | 2.43
>> RotateBenchmark.testRotateRightI | 128 | 11 | 4130.745 | 4092.588 | -0.92
>> RotateBenchmark.testRotateRightI | 256 | 11 | 15842.168 | 16469.119 | 3.96
>> RotateBenchmark.testRotateRightI | 64 | 21 | 7855.164 | 8188.913 | 4.25
>> RotateBenchmark.testRotateRightI | 128 | 21 | 4114.378 | 4035.56 | -1.92
>> RotateBenchmark.testRotateRightI | 256 | 21 | 15636.117 | 16289.632 | 4.18
>> RotateBenchmark.testRotateRightI | 64 | 31 | 8108.067 | 7996.517 | -1.38
>> RotateBenchmark.testRotateRightI | 128 | 31 | 3997.547 | 4153.58 | 3.90
>> RotateBenchmark.testRotateRightL | 256 | 31 | 3685.99 | 3814.384 | 3.48
>> RotateBenchmark.testRotateRightL | 64 | 11 | 1787.875 | 1916.541 | 7.20
>> RotateBenchmark.testRotateRightL | 128 | 11 | 940.141 | 990.383 | 5.34
>> RotateBenchmark.testRotateRightL | 256 | 11 | 3745.968 | 3920.667 | 4.66
>> RotateBenchmark.testRotateRightL | 64 | 21 | 1877.94 | 1998.072 | 6.40
>> RotateBenchmark.testRotateRightL | 128 | 21 | 933.536 | 1004.61 | 7.61
>> RotateBenchmark.testRotateRightL | 256 | 21 | 3744.763 | 3947.427 | 5.41
>> RotateBenchmark.testRotateRightL | 64 | 31 | 1864.818 | 1978.277 | 6.08
>> RotateBenchmark.testRotateRightL | 128 | 31 | 906.965 | 998.692 | 10.11
>> RotateBenchmark.testRotateRightL | 256 | 31 | 5910.469 | 6062.429 | 2.57
>> RotateBenchmark.testRotateRightL | 64 | 11 | 2914.64 | 3033.127 | 4.07
>> RotateBenchmark.testRotateRightL | 128 | 11 | 1491.344 | 1543.936 | 3.53
>> RotateBenchmark.testRotateRightL | 256 | 11 | 5801.818 | 6098.892 | 5.12
>> RotateBenchmark.testRotateRightL | 64 | 21 | 2881.328 | 3089.547 | 7.23
>> RotateBenchmark.testRotateRightL | 128 | 21 | 1485.969 | 1526.231 | 2.71
>> RotateBenchmark.testRotateRightL | 256 | 21 | 5783.495 | 5957.649 | 3.01
>> RotateBenchmark.testRotateRightL | 64 | 31 | 3008.182 | 3026.323 | 0.60
>> RotateBenchmark.testRotateRightL | 128 | 31 | 1464.566 | 1546.825 | 5.62
>> RotateBenchmark.testRotateRightL | 256 | 31 | 8208.124 | 8361.437 | 1.87
>> RotateBenchmark.testRotateRightL | 64 | 11 | 4062.465 | 4319.412 | 6.32
>> RotateBenchmark.testRotateRightL | 128 | 11 | 2029.995 | 2086.497 | 2.78
>> RotateBenchmark.testRotateRightL | 256 | 11 | 8183.789 | 8193.087 | 0.11
>> RotateBenchmark.testRotateRightL | 64 | 21 | 4092.686 | 4193.712 | 2.47
>> RotateBenchmark.testRotateRightL | 128 | 21 | 2036.854 | 2038.927 | 0.10
>> RotateBenchmark.testRotateRightL | 256 | 21 | 8155.015 | 8175.792 | 0.25
>> RotateBenchmark.testRotateRightL | 64 | 31 | 3960.629 | 4263.922 | 7.66
>> RotateBenchmark.testRotateRightL | 128 | 31 | 1996.862 | 2055.486 | 2.94
>> 
>> ``
>
> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
> 
>   8266054: Review comments resolution.
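
For reference, the shift-and-OR decomposition described in the quoted text can be modelled in scalar Java roughly as follows. This is only a sketch: the class and method names are hypothetical, and it illustrates the arithmetic rather than the code C2 actually emits.

```java
// Scalar model of the decomposition: rotate = (left shift) OR (unsigned right shift).
// Class and method names are hypothetical, chosen for illustration only.
final class RotateDecomposition {

    // Java masks int shift counts to their low 5 bits,
    // so a count of -n behaves like (32 - n) mod 32.
    static int rotateLeft(int a, int n) {
        return (a << n) | (a >>> -n);
    }

    static int rotateRight(int a, int n) {
        return (a >>> n) | (a << -n);
    }

    // long shift counts are masked to their low 6 bits,
    // so a count of -n behaves like (64 - n) mod 64.
    static long rotateLeft(long a, int n) {
        return (a << n) | (a >>> -n);
    }

    static long rotateRight(long a, int n) {
        return (a >>> n) | (a << -n);
    }

    public static void main(String[] args) {
        int a = 0x80000001;
        int[] shifts = {0, 1, 11, 21, 31, 32, 33};
        for (int n : shifts) {
            // Compare against the JDK's own scalar rotate.
            if (rotateLeft(a, n) != Integer.rotateLeft(a, n)
                    || rotateRight(a, n) != Integer.rotateRight(a, n)) {
                throw new AssertionError("mismatch for shift " + n);
            }
        }
        System.out.println("decomposition matches Integer.rotateLeft/rotateRight");
    }
}
```

The negated count relies on Java's masking of shift distances, which is why no explicit `32 - n` or `64 - n` appears.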

Testing-wise, can we reuse the `Kernel-Binary-*-op.template` files, so there is no need for separate templates?
Further, I think we need to test both the lane-wise method that accepts a vector and the method that accepts a broadcast scalar, since they go through different code paths. The broadcast method can use the primitive type rather than a cast to `int`, likely making it easier to reuse the binary templates.
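
To make the two paths concrete, here is a rough scalar reference model of what such tests would compare against. Plain `int[]` arrays stand in for vectors, and the class and method names are hypothetical; this is not the template-generated test code.

```java
// Scalar reference model for the two rotate forms a test would need to cover.
// Plain int[] arrays stand in for vectors; all names here are hypothetical.
final class RotateReferenceModel {

    // Vector-accepting form: every lane is rotated by its own count.
    static int[] rotateLeft(int[] a, int[] counts) {
        int[] r = new int[a.length];
        for (int i = 0; i < a.length; i++) {
            r[i] = Integer.rotateLeft(a[i], counts[i]);
        }
        return r;
    }

    // Broadcast-accepting form: one scalar count is applied to every lane.
    static int[] rotateLeft(int[] a, int count) {
        int[] r = new int[a.length];
        for (int i = 0; i < a.length; i++) {
            r[i] = Integer.rotateLeft(a[i], count);
        }
        return r;
    }
}
```

When every entry of `counts` holds the same value, the two forms must agree lane for lane, which gives a simple cross-check between the two code paths.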

It would be good if the scalar methods for rotating left/right were identical for the main code and tests. I prefer the code in the test methods.
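
As an illustration of the kind of shared scalar helper this refers to, a sub-word rotate could look roughly like the sketch below. It is an assumption for discussion, not the code currently in the main sources or in the tests, and the class name is hypothetical.

```java
// Hypothetical shared scalar helpers for sub-word rotates (byte and short).
// Not the JDK's actual code; a sketch of one way to write them.
final class SubWordRotate {

    static byte rotateLeft(byte a, int n) {
        int s = n & (Byte.SIZE - 1);   // reduce the count modulo 8
        int u = a & 0xFF;              // zero-extend so no sign bits leak in
        return (byte) ((u << s) | (u >>> (Byte.SIZE - s)));
    }

    static byte rotateRight(byte a, int n) {
        int s = n & (Byte.SIZE - 1);
        int u = a & 0xFF;
        return (byte) ((u >>> s) | (u << (Byte.SIZE - s)));
    }

    static short rotateLeft(short a, int n) {
        int s = n & (Short.SIZE - 1);  // reduce the count modulo 16
        int u = a & 0xFFFF;            // zero-extend to int
        return (short) ((u << s) | (u >>> (Short.SIZE - s)));
    }

    static short rotateRight(short a, int n) {
        int s = n & (Short.SIZE - 1);
        int u = a & 0xFFFF;
        return (short) ((u >>> s) | (u << (Short.SIZE - s)));
    }
}
```

Zero-extending before shifting matters here: without it, the sign bits of a negative `byte` or `short` would be shifted into the result after promotion to `int`.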

src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorOperators.java line 524:

> 522:     public static final /*bitwise*/ Binary LSHR = binary("LSHR", ">>>", VectorSupport.VECTOR_OP_URSHIFT, VO_SHIFT);
> 523:     /** Produce {@code rotateLeft(a,n)}.  Integral only. */
> 524:     public static final /*bitwise*/ Binary ROL = binary("ROL", "rotateLeft", VectorSupport.VECTOR_OP_LROTATE, VO_SHIFT | VO_SPECIAL);

I think we can remove the `VO_SPECIAL` flag on `ROL` and `ROR` now that rotation is handled uniformly?

-------------

PR: https://git.openjdk.java.net/jdk/pull/3720

