Integrated: 8266054: VectorAPI rotate operation optimization

Jatin Bhateja jbhateja at openjdk.java.net
Wed Jul 28 02:22:41 UTC 2021


On Tue, 27 Apr 2021 17:56:04 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:

> Current VectorAPI Java side implementation expresses rotateLeft and rotateRight operation using following operations:-
> 
>     vec1 = lanewise(VectorOperators.LSHL, n)
>     vec2 = lanewise(VectorOperators.LSHR, n)
>     res = lanewise(VectorOperations.OR, vec1 , vec2)
> 
> This patch moves above handling from Java side to C2 compiler which facilitates dismantling the rotate operation if target ISA does not support a direct rotate instruction.
> 
> AVX512 added vector rotate instructions vpro[rl][v][dq] which operate over long and integer type vectors. For other cases (i.e. sub-word type vectors or for targets which do not support direct rotate operations )   instruction sequence comprising of vector SHIFT (LEFT/RIGHT) and vector OR is emitted.
> 
> Please find below the performance data for included JMH benchmark.
> Machine:  Cascade Lake Server (Intel(R) Xeon(R) Platinum 8280 CPU @ 2.70GHz)
> 
> 
> <html xmlns:v="urn:schemas-microsoft-com:vml"
> xmlns:o="urn:schemas-microsoft-com:office:office"
> xmlns:x="urn:schemas-microsoft-com:office:excel"
> xmlns="http://www.w3.org/TR/REC-html40">
> 
> <head>
> 
> <meta name=ProgId content=Excel.Sheet>
> <meta name=Generator content="Microsoft Excel 15">
> <link id=Main-File rel=Main-File
> href="file:///C:/Users/jatinbha/AppData/Local/Temp/msohtmlclip1/01/clip.htm">
> <link rel=File-List
> href="file:///C:/Users/jatinbha/AppData/Local/Temp/msohtmlclip1/01/clip_filelist.xml">
> <style>
> 
> </style>
> </head>
> 
> <body link="#0563C1" vlink="#954F72">
> 
> 
> 
> Benchmark | (bits) | (shift) | (size) | Baseline Score (ops/ms) | With Opts (ops/ms) | Gain
> -- | -- | -- | -- | -- | -- | --
> RotateBenchmark.testRotateLeftB | 128 | 7 | 256 | 3939.136 | 3836.133 | 0.973851372
> RotateBenchmark.testRotateLeftB | 128 | 7 | 512 | 1984.231 | 1918.27 | 0.966757399
> RotateBenchmark.testRotateLeftB | 128 | 15 | 256 | 3925.165 | 4043.842 | 1.030234907
> RotateBenchmark.testRotateLeftB | 128 | 15 | 512 | 1962.723 | 1936.551 | 0.986665464
> RotateBenchmark.testRotateLeftB | 128 | 31 | 256 | 3945.6 | 3817.883 | 0.967630525
> RotateBenchmark.testRotateLeftB | 128 | 31 | 512 | 1944.458 | 1914.229 | 0.984453766
> RotateBenchmark.testRotateLeftB | 256 | 7 | 256 | 4612.149 | 4514.874 | 0.978908964
> RotateBenchmark.testRotateLeftB | 256 | 7 | 512 | 2296.252 | 2270.237 | 0.988670669
> RotateBenchmark.testRotateLeftB | 256 | 15 | 256 | 4576.628 | 4515.53 | 0.986649996
> RotateBenchmark.testRotateLeftB | 256 | 15 | 512 | 2288.278 | 2270.923 | 0.992415694
> RotateBenchmark.testRotateLeftB | 256 | 31 | 256 | 4624.243 | 4511.46 | 0.975610495
> RotateBenchmark.testRotateLeftB | 256 | 31 | 512 | 2305.459 | 2273.788 | 0.986262605
> RotateBenchmark.testRotateLeftB | 512 | 7 | 256 | 7748.283 | 7777.105 | 1.003719792
> RotateBenchmark.testRotateLeftB | 512 | 7 | 512 | 3906.214 | 3912.647 | 1.001646863
> RotateBenchmark.testRotateLeftB | 512 | 15 | 256 | 7764.653 | 7763.482 | 0.999849188
> RotateBenchmark.testRotateLeftB | 512 | 15 | 512 | 3916.061 | 3919.363 | 1.000843194
> RotateBenchmark.testRotateLeftB | 512 | 31 | 256 | 7779.754 | 7770.239 | 0.998776954
> RotateBenchmark.testRotateLeftB | 512 | 31 | 512 | 3916.471 | 3912.718 | 0.999041739
> RotateBenchmark.testRotateLeftI | 128 | 7 | 256 | 4043.39 | 13461.814 | 3.329338501
> RotateBenchmark.testRotateLeftI | 128 | 7 | 512 | 1996.217 | 6455.425 | 3.233829288
> RotateBenchmark.testRotateLeftI | 128 | 15 | 256 | 4028.614 | 13077.277 | 3.246098286
> RotateBenchmark.testRotateLeftI | 128 | 15 | 512 | 1997.612 | 6452.918 | 3.230315997
> RotateBenchmark.testRotateLeftI | 128 | 31 | 256 | 4123.357 | 13079.045 | 3.171940969
> RotateBenchmark.testRotateLeftI | 128 | 31 | 512 | 2003.356 | 6452.716 | 3.22095324
> RotateBenchmark.testRotateLeftI | 256 | 7 | 256 | 7666.949 | 25658.625 | 3.34665393
> RotateBenchmark.testRotateLeftI | 256 | 7 | 512 | 3855.826 | 12278.106 | 3.18429981
> RotateBenchmark.testRotateLeftI | 256 | 15 | 256 | 7670.901 | 24625.466 | 3.210244272
> RotateBenchmark.testRotateLeftI | 256 | 15 | 512 | 3765.786 | 12272.771 | 3.259019764
> RotateBenchmark.testRotateLeftI | 256 | 31 | 256 | 7660.599 | 25678.864 | 3.352069988
> RotateBenchmark.testRotateLeftI | 256 | 31 | 512 | 3773.401 | 12006.469 | 3.181869353
> RotateBenchmark.testRotateLeftI | 512 | 7 | 256 | 11900.948 | 31242.989 | 2.625252123
> RotateBenchmark.testRotateLeftI | 512 | 7 | 512 | 5830.878 | 15727.149 | 2.697217983
> RotateBenchmark.testRotateLeftI | 512 | 15 | 256 | 12171.847 | 33180.067 | 2.72596813
> RotateBenchmark.testRotateLeftI | 512 | 15 | 512 | 5830.544 | 16740.182 | 2.871118372
> RotateBenchmark.testRotateLeftI | 512 | 31 | 256 | 11909.553 | 31250.882 | 2.624018047
> RotateBenchmark.testRotateLeftI | 512 | 31 | 512 | 5846.747 | 15738.831 | 2.691895339
> RotateBenchmark.testRotateLeftL | 128 | 7 | 256 | 2047.243 | 6888.484 | 3.364761291
> RotateBenchmark.testRotateLeftL | 128 | 7 | 512 | 1005.029 | 3245.931 | 3.229688895
> RotateBenchmark.testRotateLeftL | 128 | 15 | 256 | 1996.921 | 6985.256 | 3.498013191
> RotateBenchmark.testRotateLeftL | 128 | 15 | 512 | 986.906 | 3217.778 | 3.260470602
> RotateBenchmark.testRotateLeftL | 128 | 31 | 256 | 1999.06 | 6977.672 | 3.490476524
> RotateBenchmark.testRotateLeftL | 128 | 31 | 512 | 987.258 | 3236.63 | 3.278403416
> RotateBenchmark.testRotateLeftL | 256 | 7 | 256 | 3752.412 | 12995.954 | 3.4633601
> RotateBenchmark.testRotateLeftL | 256 | 7 | 512 | 1824.093 | 5809.576 | 3.184912173
> RotateBenchmark.testRotateLeftL | 256 | 15 | 256 | 3759.99 | 13262.631 | 3.52730486
> RotateBenchmark.testRotateLeftL | 256 | 15 | 512 | 1823.393 | 5803.872 | 3.183006626
> RotateBenchmark.testRotateLeftL | 256 | 31 | 256 | 3757.134 | 13284.633 | 3.535842214
> RotateBenchmark.testRotateLeftL | 256 | 31 | 512 | 1822.192 | 5824.178 | 3.196248255
> RotateBenchmark.testRotateLeftL | 512 | 7 | 256 | 5794.005 | 15567.753 | 2.686872552
> RotateBenchmark.testRotateLeftL | 512 | 7 | 512 | 2969.393 | 7694.79 | 2.591368
> RotateBenchmark.testRotateLeftL | 512 | 15 | 256 | 5817.292 | 15726.597 | 2.703422314
> RotateBenchmark.testRotateLeftL | 512 | 15 | 512 | 2944.655 | 7664.954 | 2.603005785
> RotateBenchmark.testRotateLeftL | 512 | 31 | 256 | 5822.131 | 16718.64 | 2.871567129
> RotateBenchmark.testRotateLeftL | 512 | 31 | 512 | 2944.763 | 7657.814 | 2.600485676
> RotateBenchmark.testRotateLeftS | 128 | 7 | 256 | 8006.155 | 7976.701 | 0.99632108
> RotateBenchmark.testRotateLeftS | 128 | 7 | 512 | 4031.753 | 4003.43 | 0.992975016
> RotateBenchmark.testRotateLeftS | 128 | 15 | 256 | 8003.879 | 7952.752 | 0.993612222
> RotateBenchmark.testRotateLeftS | 128 | 15 | 512 | 4026.359 | 4014.757 | 0.997118488
> RotateBenchmark.testRotateLeftS | 128 | 31 | 256 | 8000.842 | 7995.733 | 0.999361442
> RotateBenchmark.testRotateLeftS | 128 | 31 | 512 | 4044.421 | 4007.426 | 0.990852832
> RotateBenchmark.testRotateLeftS | 256 | 7 | 256 | 15078.471 | 15034.395 | 0.997076892
> RotateBenchmark.testRotateLeftS | 256 | 7 | 512 | 7236.509 | 7620.334 | 1.053040078
> RotateBenchmark.testRotateLeftS | 256 | 15 | 256 | 15093.661 | 15024.17 | 0.995396014
> RotateBenchmark.testRotateLeftS | 256 | 15 | 512 | 7308.568 | 7724.381 | 1.056893909
> RotateBenchmark.testRotateLeftS | 256 | 31 | 256 | 15332.233 | 15432.113 | 1.006514381
> RotateBenchmark.testRotateLeftS | 256 | 31 | 512 | 7317.18 | 7626.679 | 1.042297579
> RotateBenchmark.testRotateLeftS | 512 | 7 | 256 | 24079.012 | 23939.263 | 0.994196232
> RotateBenchmark.testRotateLeftS | 512 | 7 | 512 | 11441.41 | 11921.21 | 1.041935391
> RotateBenchmark.testRotateLeftS | 512 | 15 | 256 | 23563.675 | 23590.959 | 1.001157884
> RotateBenchmark.testRotateLeftS | 512 | 15 | 512 | 11418.634 | 11949.391 | 1.046481654
> RotateBenchmark.testRotateLeftS | 512 | 31 | 256 | 24035.69 | 23595.385 | 0.9816812
> RotateBenchmark.testRotateLeftS | 512 | 31 | 512 | 11668.091 | 11899.536 | 1.019835721
> RotateBenchmark.testRotateRightB | 128 | 7 | 256 | 3852.421 | 3816.521 | 0.990681185
> RotateBenchmark.testRotateRightB | 128 | 7 | 512 | 1956.766 | 1923.638 | 0.983070025
> RotateBenchmark.testRotateRightB | 128 | 15 | 256 | 3899.136 | 4038.945 | 1.035856405
> RotateBenchmark.testRotateRightB | 128 | 15 | 512 | 1957.733 | 2030.973 | 1.037410617
> RotateBenchmark.testRotateRightB | 128 | 31 | 256 | 3902.5 | 4043.736 | 1.03619116
> RotateBenchmark.testRotateRightB | 128 | 31 | 512 | 1957.728 | 1920.434 | 0.980950367
> RotateBenchmark.testRotateRightB | 256 | 7 | 256 | 4565.887 | 4515.083 | 0.988873137
> RotateBenchmark.testRotateRightB | 256 | 7 | 512 | 2300.057 | 2278.065 | 0.990438498
> RotateBenchmark.testRotateRightB | 256 | 15 | 256 | 4570.754 | 4527.692 | 0.990578797
> RotateBenchmark.testRotateRightB | 256 | 15 | 512 | 2300.524 | 2268.659 | 0.986148808
> RotateBenchmark.testRotateRightB | 256 | 31 | 256 | 4577.569 | 4513.29 | 0.98595783
> RotateBenchmark.testRotateRightB | 256 | 31 | 512 | 2304.335 | 2273.178 | 0.986478962
> RotateBenchmark.testRotateRightB | 512 | 7 | 256 | 7772.483 | 7842.671 | 1.009030319
> RotateBenchmark.testRotateRightB | 512 | 7 | 512 | 3907.265 | 3917.325 | 1.002574691
> RotateBenchmark.testRotateRightB | 512 | 15 | 256 | 7855.653 | 7865.25 | 1.001221668
> RotateBenchmark.testRotateRightB | 512 | 15 | 512 | 3909.845 | 3976.813 | 1.017128045
> RotateBenchmark.testRotateRightB | 512 | 31 | 256 | 7746.765 | 7870.159 | 1.015928455
> RotateBenchmark.testRotateRightB | 512 | 31 | 512 | 3919.596 | 3981.934 | 1.01590419
> RotateBenchmark.testRotateRightI | 128 | 7 | 256 | 4125.151 | 13056.878 | 3.165187893
> RotateBenchmark.testRotateRightI | 128 | 7 | 512 | 2045.201 | 6501.447 | 3.17887924
> RotateBenchmark.testRotateRightI | 128 | 15 | 256 | 4111.736 | 13318.124 | 3.23905134
> RotateBenchmark.testRotateRightI | 128 | 15 | 512 | 2055.355 | 6497.289 | 3.161151723
> RotateBenchmark.testRotateRightI | 128 | 31 | 256 | 4109.353 | 13073.3 | 3.181352393
> RotateBenchmark.testRotateRightI | 128 | 31 | 512 | 2055.431 | 6463.902 | 3.14479153
> RotateBenchmark.testRotateRightI | 256 | 7 | 256 | 7804.976 | 24585.962 | 3.150036848
> RotateBenchmark.testRotateRightI | 256 | 7 | 512 | 3815.818 | 11985.145 | 3.140911071
> RotateBenchmark.testRotateRightI | 256 | 15 | 256 | 7644.977 | 25863.841 | 3.383115606
> RotateBenchmark.testRotateRightI | 256 | 15 | 512 | 3822.508 | 12280.58 | 3.212702236
> RotateBenchmark.testRotateRightI | 256 | 31 | 256 | 7709.635 | 25655.108 | 3.327668301
> RotateBenchmark.testRotateRightI | 256 | 31 | 512 | 3801.5 | 12271.65 | 3.228107326
> RotateBenchmark.testRotateRightI | 512 | 7 | 256 | 12223.711 | 31239.788 | 2.555671351
> RotateBenchmark.testRotateRightI | 512 | 7 | 512 | 5973.571 | 16740.852 | 2.802486486
> RotateBenchmark.testRotateRightI | 512 | 15 | 256 | 12205.47 | 31248.025 | 2.560165647
> RotateBenchmark.testRotateRightI | 512 | 15 | 512 | 5966.513 | 15728.168 | 2.6360737
> RotateBenchmark.testRotateRightI | 512 | 31 | 256 | 12209.405 | 33181.105 | 2.71766765
> RotateBenchmark.testRotateRightI | 512 | 31 | 512 | 5981.527 | 15727.496 | 2.629344647
> RotateBenchmark.testRotateRightL | 128 | 7 | 256 | 2054.509 | 6980.849 | 3.397818652
> RotateBenchmark.testRotateRightL | 128 | 7 | 512 | 997.375 | 3242.374 | 3.250907633
> RotateBenchmark.testRotateRightL | 128 | 15 | 256 | 2051.459 | 6892.389 | 3.359749817
> RotateBenchmark.testRotateRightL | 128 | 15 | 512 | 1002.906 | 3223.342 | 3.21400211
> RotateBenchmark.testRotateRightL | 128 | 31 | 256 | 2044.749 | 6984.157 | 3.415654929
> RotateBenchmark.testRotateRightL | 128 | 31 | 512 | 1004.273 | 3237.496 | 3.22372104
> RotateBenchmark.testRotateRightL | 256 | 7 | 256 | 3811.551 | 13347.75 | 3.501920872
> RotateBenchmark.testRotateRightL | 256 | 7 | 512 | 1892.883 | 5840.85 | 3.085689924
> RotateBenchmark.testRotateRightL | 256 | 15 | 256 | 3821.705 | 14034.823 | 3.672398314
> RotateBenchmark.testRotateRightL | 256 | 15 | 512 | 1799.193 | 5817.533 | 3.233412424
> RotateBenchmark.testRotateRightL | 256 | 31 | 256 | 3816.666 | 14022.31 | 3.673968327
> RotateBenchmark.testRotateRightL | 256 | 31 | 512 | 1796.649 | 5824.13 | 3.241662673
> RotateBenchmark.testRotateRightL | 512 | 7 | 256 | 5943.986 | 15586.254 | 2.622188881
> RotateBenchmark.testRotateRightL | 512 | 7 | 512 | 3022.686 | 7662.241 | 2.534911334
> RotateBenchmark.testRotateRightL | 512 | 15 | 256 | 5958.008 | 15726.859 | 2.639616966
> RotateBenchmark.testRotateRightL | 512 | 15 | 512 | 2998.469 | 7654.703 | 2.552870482
> RotateBenchmark.testRotateRightL | 512 | 31 | 256 | 5937.491 | 15741.207 | 2.651154671
> RotateBenchmark.testRotateRightL | 512 | 31 | 512 | 3014.699 | 7656.837 | 2.539834657
> RotateBenchmark.testRotateRightS | 128 | 7 | 256 | 8172.896 | 8003.474 | 0.979270261
> RotateBenchmark.testRotateRightS | 128 | 7 | 512 | 4111.074 | 4047.267 | 0.984479238
> RotateBenchmark.testRotateRightS | 128 | 15 | 256 | 8225.79 | 8040.421 | 0.9774649
> RotateBenchmark.testRotateRightS | 128 | 15 | 512 | 4129.801 | 4011.919 | 0.971455767
> RotateBenchmark.testRotateRightS | 128 | 31 | 256 | 8176.102 | 8052.686 | 0.984905276
> RotateBenchmark.testRotateRightS | 128 | 31 | 512 | 4117.735 | 4046.522 | 0.982705784
> RotateBenchmark.testRotateRightS | 256 | 7 | 256 | 15213.617 | 15169.51 | 0.997100821
> RotateBenchmark.testRotateRightS | 256 | 7 | 512 | 7530.289 | 7625.581 | 1.012654494
> RotateBenchmark.testRotateRightS | 256 | 15 | 256 | 15238.384 | 15069.978 | 0.988948566
> RotateBenchmark.testRotateRightS | 256 | 15 | 512 | 7275.098 | 7620.764 | 1.047513587
> RotateBenchmark.testRotateRightS | 256 | 31 | 256 | 15299.821 | 15043.765 | 0.983264118
> RotateBenchmark.testRotateRightS | 256 | 31 | 512 | 7273.028 | 7630.97 | 1.04921499
> RotateBenchmark.testRotateRightS | 512 | 7 | 256 | 23998.152 | 23920.046 | 0.996745333
> RotateBenchmark.testRotateRightS | 512 | 7 | 512 | 11582.679 | 11916.382 | 1.02881052
> RotateBenchmark.testRotateRightS | 512 | 15 | 256 | 23982.797 | 23434.756 | 0.977148579
> RotateBenchmark.testRotateRightS | 512 | 15 | 512 | 11629.806 | 11918.759 | 1.0248459
> RotateBenchmark.testRotateRightS | 512 | 31 | 256 | 23988.549 | 23475.629 | 0.978618132
> RotateBenchmark.testRotateRightS | 512 | 31 | 512 | 11650.146 | 11916.47 | 1.022860143
> 
> 
> 
> </body>
> 
> </html>

This pull request has now been integrated.

Changeset: d994b93e
Author:    Jatin Bhateja <jbhateja at openjdk.org>
URL:       https://git.openjdk.java.net/jdk/commit/d994b93e211d49af79212d765633ba3457365a08
Stats:     4438 lines in 57 files changed: 4219 ins; 58 del; 161 mod

8266054: VectorAPI rotate operation optimization

Reviewed-by: psandoz, sviswanathan

-------------

PR: https://git.openjdk.java.net/jdk/pull/3720


More information about the core-libs-dev mailing list