RFR: 8266054: VectorAPI rotate operation optimization [v4]

Jatin Bhateja jbhateja at openjdk.java.net
Fri May 7 18:31:15 UTC 2021


> The current Vector API Java-side implementation expresses the rotateLeft and rotateRight operations using the following operations:
> 
>     vec1 = lanewise(VectorOperators.LSHL, n)
>     vec2 = lanewise(VectorOperators.LSHR, n)
>     res = lanewise(VectorOperators.OR, vec1, vec2)
> 
> This patch moves the above handling from the Java side to the C2 compiler, which facilitates dismantling the rotate operation if the target ISA does not support a direct rotate instruction.
> 
> AVX512 added the vector rotate instructions vpro[rl][v][dq], which operate over long and integer type vectors. For other cases (i.e. sub-word type vectors, or targets which do not support direct rotate operations), an instruction sequence comprising vector SHIFT (LEFT/RIGHT) and vector OR is emitted.
> 
> Please find below the performance data for the included JMH benchmark.
> Machine: Intel(R) Xeon(R) Platinum 8280 CPU @ 2.70GHz (Cascade Lake server)
> 
> Benchmark | (TESTSIZE) | Shift | Baseline (ops/ms) | Withopt (ops/ms) | Gain %
> -- | -- | -- | -- | -- | --
> RotateBenchmark.testRotateLeftI | (size) | (shift) | 7384.747 | 7706.652 | 4.36
> RotateBenchmark.testRotateLeftI | 64 | 11 | 3723.305 | 3816.968 | 2.52
> RotateBenchmark.testRotateLeftI | 128 | 11 | 1811.521 | 1966.05 | 8.53
> RotateBenchmark.testRotateLeftI | 256 | 11 | 7133.296 | 7715.047 | 8.16
> RotateBenchmark.testRotateLeftI | 64 | 21 | 3612.144 | 3886.225 | 7.59
> RotateBenchmark.testRotateLeftI | 128 | 21 | 1815.422 | 1962.753 | 8.12
> RotateBenchmark.testRotateLeftI | 256 | 21 | 7216.353 | 7677.165 | 6.39
> RotateBenchmark.testRotateLeftI | 64 | 31 | 3602.008 | 3892.297 | 8.06
> RotateBenchmark.testRotateLeftI | 128 | 31 | 1882.163 | 1958.887 | 4.08
> RotateBenchmark.testRotateLeftI | 256 | 31 | 11819.443 | 11912.864 | 0.79
> RotateBenchmark.testRotateLeftI | 64 | 11 | 5978.475 | 6060.189 | 1.37
> RotateBenchmark.testRotateLeftI | 128 | 11 | 2965.179 | 3060.969 | 3.23
> RotateBenchmark.testRotateLeftI | 256 | 11 | 11479.579 | 11684.148 | 1.78
> RotateBenchmark.testRotateLeftI | 64 | 21 | 5904.903 | 6094.409 | 3.21
> RotateBenchmark.testRotateLeftI | 128 | 21 | 2969.879 | 3074.1 | 3.51
> RotateBenchmark.testRotateLeftI | 256 | 21 | 11531.654 | 12155.954 | 5.41
> RotateBenchmark.testRotateLeftI | 64 | 31 | 5730.918 | 6112.514 | 6.66
> RotateBenchmark.testRotateLeftI | 128 | 31 | 2937.19 | 2976.297 | 1.33
> RotateBenchmark.testRotateLeftI | 256 | 31 | 16159.184 | 16459.462 | 1.86
> RotateBenchmark.testRotateLeftI | 64 | 11 | 8154.982 | 8396.089 | 2.96
> RotateBenchmark.testRotateLeftI | 128 | 11 | 4142.224 | 4292.049 | 3.62
> RotateBenchmark.testRotateLeftI | 256 | 11 | 15958.154 | 16163.518 | 1.29
> RotateBenchmark.testRotateLeftI | 64 | 21 | 8098.805 | 8504.279 | 5.01
> RotateBenchmark.testRotateLeftI | 128 | 21 | 4137.598 | 4314.868 | 4.28
> RotateBenchmark.testRotateLeftI | 256 | 21 | 16201.666 | 15992.958 | -1.29
> RotateBenchmark.testRotateLeftI | 64 | 31 | 8027.169 | 8484.379 | 5.70
> RotateBenchmark.testRotateLeftI | 128 | 31 | 4146.29 | 4039.681 | -2.57
> RotateBenchmark.testRotateLeftL | 256 | 31 | 3566.176 | 3805.248 | 6.70
> RotateBenchmark.testRotateLeftL | 64 | 11 | 1820.219 | 1962.866 | 7.84
> RotateBenchmark.testRotateLeftL | 128 | 11 | 917.085 | 1007.334 | 9.84
> RotateBenchmark.testRotateLeftL | 256 | 11 | 3592.139 | 3973.698 | 10.62
> RotateBenchmark.testRotateLeftL | 64 | 21 | 1827.63 | 1999.711 | 9.42
> RotateBenchmark.testRotateLeftL | 128 | 21 | 907.104 | 1002.997 | 10.57
> RotateBenchmark.testRotateLeftL | 256 | 21 | 3780.962 | 3873.489 | 2.45
> RotateBenchmark.testRotateLeftL | 64 | 31 | 1830.121 | 1955.63 | 6.86
> RotateBenchmark.testRotateLeftL | 128 | 31 | 891.411 | 982.138 | 10.18
> RotateBenchmark.testRotateLeftL | 256 | 31 | 5890.544 | 6100.594 | 3.57
> RotateBenchmark.testRotateLeftL | 64 | 11 | 2984.329 | 3021.971 | 1.26
> RotateBenchmark.testRotateLeftL | 128 | 11 | 1485.109 | 1527.689 | 2.87
> RotateBenchmark.testRotateLeftL | 256 | 11 | 5903.411 | 6083.775 | 3.06
> RotateBenchmark.testRotateLeftL | 64 | 21 | 2925.37 | 3050.958 | 4.29
> RotateBenchmark.testRotateLeftL | 128 | 21 | 1486.432 | 1537.155 | 3.41
> RotateBenchmark.testRotateLeftL | 256 | 21 | 5853.721 | 6000.682 | 2.51
> RotateBenchmark.testRotateLeftL | 64 | 31 | 2896.116 | 3072.783 | 6.10
> RotateBenchmark.testRotateLeftL | 128 | 31 | 1483.132 | 1546.588 | 4.28
> RotateBenchmark.testRotateLeftL | 256 | 31 | 8059.206 | 8218.047 | 1.97
> RotateBenchmark.testRotateLeftL | 64 | 11 | 4022.416 | 4195.52 | 4.30
> RotateBenchmark.testRotateLeftL | 128 | 11 | 2084.296 | 2068.238 | -0.77
> RotateBenchmark.testRotateLeftL | 256 | 11 | 7971.832 | 8172.819 | 2.52
> RotateBenchmark.testRotateLeftL | 64 | 21 | 4032.036 | 4344.469 | 7.75
> RotateBenchmark.testRotateLeftL | 128 | 21 | 2068.957 | 2138.685 | 3.37
> RotateBenchmark.testRotateLeftL | 256 | 21 | 8140.63 | 8003.283 | -1.69
> RotateBenchmark.testRotateLeftL | 64 | 31 | 4088.621 | 4296.091 | 5.07
> RotateBenchmark.testRotateLeftL | 128 | 31 | 2007.753 | 2088.455 | 4.02
> RotateBenchmark.testRotateRightI | 256 | 31 | 7358.793 | 7548.976 | 2.58
> RotateBenchmark.testRotateRightI | 64 | 11 | 3648.868 | 3897.47 | 6.81
> RotateBenchmark.testRotateRightI | 128 | 11 | 1862.73 | 1969.964 | 5.76
> RotateBenchmark.testRotateRightI | 256 | 11 | 7268.806 | 7790.588 | 7.18
> RotateBenchmark.testRotateRightI | 64 | 21 | 3577.79 | 3979.675 | 11.23
> RotateBenchmark.testRotateRightI | 128 | 21 | 1773.243 | 1921.088 | 8.34
> RotateBenchmark.testRotateRightI | 256 | 21 | 7084.974 | 7609.912 | 7.41
> RotateBenchmark.testRotateRightI | 64 | 31 | 3688.781 | 3909.65 | 5.99
> RotateBenchmark.testRotateRightI | 128 | 31 | 1845.978 | 1928.316 | 4.46
> RotateBenchmark.testRotateRightI | 256 | 31 | 11463.228 | 12179.833 | 6.25
> RotateBenchmark.testRotateRightI | 64 | 11 | 5678.052 | 6028.573 | 6.17
> RotateBenchmark.testRotateRightI | 128 | 11 | 2990.419 | 3070.409 | 2.67
> RotateBenchmark.testRotateRightI | 256 | 11 | 11780.283 | 12105.261 | 2.76
> RotateBenchmark.testRotateRightI | 64 | 21 | 5827.8 | 6020.208 | 3.30
> RotateBenchmark.testRotateRightI | 128 | 21 | 2904.852 | 3047.154 | 4.90
> RotateBenchmark.testRotateRightI | 256 | 21 | 11359.146 | 12060.401 | 6.17
> RotateBenchmark.testRotateRightI | 64 | 31 | 5823.207 | 6079.82 | 4.41
> RotateBenchmark.testRotateRightI | 128 | 31 | 2984.484 | 3045.719 | 2.05
> RotateBenchmark.testRotateRightI | 256 | 31 | 16200.504 | 16376.475 | 1.09
> RotateBenchmark.testRotateRightI | 64 | 11 | 8118.399 | 8315.407 | 2.43
> RotateBenchmark.testRotateRightI | 128 | 11 | 4130.745 | 4092.588 | -0.92
> RotateBenchmark.testRotateRightI | 256 | 11 | 15842.168 | 16469.119 | 3.96
> RotateBenchmark.testRotateRightI | 64 | 21 | 7855.164 | 8188.913 | 4.25
> RotateBenchmark.testRotateRightI | 128 | 21 | 4114.378 | 4035.56 | -1.92
> RotateBenchmark.testRotateRightI | 256 | 21 | 15636.117 | 16289.632 | 4.18
> RotateBenchmark.testRotateRightI | 64 | 31 | 8108.067 | 7996.517 | -1.38
> RotateBenchmark.testRotateRightI | 128 | 31 | 3997.547 | 4153.58 | 3.90
> RotateBenchmark.testRotateRightL | 256 | 31 | 3685.99 | 3814.384 | 3.48
> RotateBenchmark.testRotateRightL | 64 | 11 | 1787.875 | 1916.541 | 7.20
> RotateBenchmark.testRotateRightL | 128 | 11 | 940.141 | 990.383 | 5.34
> RotateBenchmark.testRotateRightL | 256 | 11 | 3745.968 | 3920.667 | 4.66
> RotateBenchmark.testRotateRightL | 64 | 21 | 1877.94 | 1998.072 | 6.40
> RotateBenchmark.testRotateRightL | 128 | 21 | 933.536 | 1004.61 | 7.61
> RotateBenchmark.testRotateRightL | 256 | 21 | 3744.763 | 3947.427 | 5.41
> RotateBenchmark.testRotateRightL | 64 | 31 | 1864.818 | 1978.277 | 6.08
> RotateBenchmark.testRotateRightL | 128 | 31 | 906.965 | 998.692 | 10.11
> RotateBenchmark.testRotateRightL | 256 | 31 | 5910.469 | 6062.429 | 2.57
> RotateBenchmark.testRotateRightL | 64 | 11 | 2914.64 | 3033.127 | 4.07
> RotateBenchmark.testRotateRightL | 128 | 11 | 1491.344 | 1543.936 | 3.53
> RotateBenchmark.testRotateRightL | 256 | 11 | 5801.818 | 6098.892 | 5.12
> RotateBenchmark.testRotateRightL | 64 | 21 | 2881.328 | 3089.547 | 7.23
> RotateBenchmark.testRotateRightL | 128 | 21 | 1485.969 | 1526.231 | 2.71
> RotateBenchmark.testRotateRightL | 256 | 21 | 5783.495 | 5957.649 | 3.01
> RotateBenchmark.testRotateRightL | 64 | 31 | 3008.182 | 3026.323 | 0.60
> RotateBenchmark.testRotateRightL | 128 | 31 | 1464.566 | 1546.825 | 5.62
> RotateBenchmark.testRotateRightL | 256 | 31 | 8208.124 | 8361.437 | 1.87
> RotateBenchmark.testRotateRightL | 64 | 11 | 4062.465 | 4319.412 | 6.32
> RotateBenchmark.testRotateRightL | 128 | 11 | 2029.995 | 2086.497 | 2.78
> RotateBenchmark.testRotateRightL | 256 | 11 | 8183.789 | 8193.087 | 0.11
> RotateBenchmark.testRotateRightL | 64 | 21 | 4092.686 | 4193.712 | 2.47
> RotateBenchmark.testRotateRightL | 128 | 21 | 2036.854 | 2038.927 | 0.10
> RotateBenchmark.testRotateRightL | 256 | 21 | 8155.015 | 8175.792 | 0.25
> RotateBenchmark.testRotateRightL | 64 | 31 | 3960.629 | 4263.922 | 7.66
> RotateBenchmark.testRotateRightL | 128 | 31 | 1996.862 | 2055.486 | 2.94
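
The Gain % column appears to be the relative change in throughput, (Withopt - Baseline) / Baseline * 100; the first data row reproduces under that assumption. A minimal Java check follows (the class and method names are illustrative and not part of the patch):

```java
import java.util.Locale;

// Illustrative helper, not from the patch: recomputes the "Gain %" column
// under the assumption that it is the relative throughput change.
final class GainSketch {
    static double gainPercent(double baselineOpsPerMs, double withOptOpsPerMs) {
        return (withOptOpsPerMs - baselineOpsPerMs) / baselineOpsPerMs * 100.0;
    }

    public static void main(String[] args) {
        // First data row of the table: 7384.747 -> 7706.652, listed as 4.36.
        System.out.printf(Locale.ROOT, "%.2f%n", gainPercent(7384.747, 7706.652));
        // A regression row: 16201.666 -> 15992.958, listed as -1.29.
        System.out.printf(Locale.ROOT, "%.2f%n", gainPercent(16201.666, 15992.958));
    }
}
```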

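For reference, the shift-and-OR decomposition that the quoted description says is emitted when no direct rotate instruction is available can be sketched in scalar Java. This only illustrates the per-lane arithmetic and is not the patch's code: the class and method names are hypothetical, the loop stands in for a vector operation, and the sketch assumes the right-shift count is the complement of the rotate count modulo the lane width.

```java
// Hypothetical illustration: rotate-left expressed as two shifts and an OR.
final class RotateSketch {
    // int lanes: rotate every element left by n bits.
    static int[] rotateLeftLanes(int[] lanes, int n) {
        int s = n & 31;                      // rotate counts wrap at the lane width
        int[] res = new int[lanes.length];
        for (int i = 0; i < lanes.length; i++) {
            int hi = lanes[i] << s;          // bits moved toward the top
            int lo = lanes[i] >>> (-s & 31); // bits that wrapped around to the bottom
            res[i] = hi | lo;
        }
        return res;
    }

    // Sub-word case (byte lane): widen to int, shift, then truncate back to 8 bits.
    static byte rotateLeftByte(byte b, int n) {
        int s = n & 7;
        int v = b & 0xFF;                    // zero-extend so >>> sees only 8 bits
        return (byte) ((v << s) | (v >>> (8 - s)));
    }

    public static void main(String[] args) {
        int[] in = {1, 0x80000001, -1, 0x12345678};
        int[] out = rotateLeftLanes(in, 11);
        for (int i = 0; i < in.length; i++) {
            // Compare against the JDK's scalar rotate as a cross-check.
            System.out.println(out[i] == Integer.rotateLeft(in[i], 11));
        }
    }
}
```

A right rotate by n is the same sequence with the two shift directions swapped.
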
Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:

  8266054: Review comments resolution.

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/3720/files
  - new: https://git.openjdk.java.net/jdk/pull/3720/files/f7945bff..8042aa23

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3720&range=03
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3720&range=02-03

  Stats: 2651 lines in 39 files changed: 2328 ins; 10 del; 313 mod
  Patch: https://git.openjdk.java.net/jdk/pull/3720.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/3720/head:pull/3720

PR: https://git.openjdk.java.net/jdk/pull/3720
