Re: RFR: 8266054: VectorAPI rotate operation optimization [v13]
On Tue, 20 Jul 2021 09:57:07 GMT, Jatin Bhateja <jbhateja@openjdk.org> wrote:
The current Vector API Java-side implementation expresses the rotateLeft and rotateRight operations using the following operations:

    vec1 = lanewise(VectorOperators.LSHL, n)
    vec2 = lanewise(VectorOperators.LSHR, n)
    res = lanewise(VectorOperators.OR, vec1, vec2)
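For readers who want the shift-and-OR decomposition spelled out, here is a minimal scalar sketch of what a single 32-bit lane computes. It is an illustration under assumptions, not the Vector API implementation: the class and method names are hypothetical, and the complementary shift count (32 - n), which the abbreviated snippet above does not show, is written out explicitly.

```java
// Sketch only: scalar model of one 32-bit lane. Names are hypothetical.
final class RotateViaShifts {
    // Rotate left = (x << n) OR (x >>> (32 - n)), with the count reduced modulo 32.
    static int rotateLeft(int x, int n) {
        n &= 31;
        // Java masks int shift counts to 5 bits, so for n == 0 this is x | x == x.
        return (x << n) | (x >>> (32 - n));
    }

    // Rotate right mirrors the above with the two shift directions swapped.
    static int rotateRight(int x, int n) {
        n &= 31;
        return (x >>> n) | (x << (32 - n));
    }

    public static void main(String[] args) {
        int x = 0x80000001;
        // Cross-check against the JDK's scalar rotate for the benchmark's shift counts.
        for (int n : new int[] {7, 15, 31}) {
            System.out.println(rotateLeft(x, n) == Integer.rotateLeft(x, n));
            System.out.println(rotateRight(x, n) == Integer.rotateRight(x, n));
        }
    }
}
```

A direct rotate instruction replaces these three operations per lane with one, which is the saving the patch targets for int and long vectors.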
This patch moves the above handling from the Java side to the C2 compiler, which facilitates dismantling the rotate operation if the target ISA does not support a direct rotate instruction.
AVX512 added vector rotate instructions vpro[rl][v][dq], which operate over long and integer type vectors. For other cases (i.e. sub-word type vectors, or targets which do not support direct rotate operations), an instruction sequence comprising vector SHIFT (LEFT/RIGHT) and vector OR is emitted.
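The sub-word case can be sketched in scalar form as well. The example below assumes that the rotate count wraps modulo the lane width (8 for byte, 16 for short) and that the lane is zero-extended before shifting. Both are assumptions made for illustration, and the class and method names are hypothetical rather than taken from the patch.

```java
// Sketch only: scalar model of a byte lane and a short lane. Names are hypothetical.
final class SubWordRotate {
    static byte rotateLeftByte(byte b, int n) {
        n &= 7;            // assumed: count wraps modulo the 8-bit lane width
        int v = b & 0xFF;  // zero-extend so sign bits do not leak into the OR
        return (byte) ((v << n) | (v >>> (8 - n)));
    }

    static short rotateLeftShort(short s, int n) {
        n &= 15;             // assumed: count wraps modulo the 16-bit lane width
        int v = s & 0xFFFF;  // zero-extend for the same reason as above
        return (short) ((v << n) | (v >>> (16 - n)));
    }

    public static void main(String[] args) {
        // Shift counts mirror the benchmark parameters (7, 15, 31).
        for (int n : new int[] {7, 15, 31}) {
            System.out.println(rotateLeftByte((byte) 0x81, n) + " " + rotateLeftShort((short) 0x8001, n));
        }
    }
}
```

Because no direct rotate instruction is used for these lane widths, the shift/OR sequence remains, which fits the roughly unchanged byte and short scores in the benchmark data.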
Please find below the performance data for the included JMH benchmark.
Machine: Cascade Lake Server (Intel(R) Xeon(R) Platinum 8280 CPU @ 2.70GHz)
Benchmark | (bits) | (shift) | (size) | Baseline Score (ops/ms) | With Opts (ops/ms) | Gain
-- | -- | -- | -- | -- | -- | --
RotateBenchmark.testRotateLeftB | 128 | 7 | 256 | 3939.136 | 3836.133 | 0.973851372
RotateBenchmark.testRotateLeftB | 128 | 7 | 512 | 1984.231 | 1918.27 | 0.966757399
RotateBenchmark.testRotateLeftB | 128 | 15 | 256 | 3925.165 | 4043.842 | 1.030234907
RotateBenchmark.testRotateLeftB | 128 | 15 | 512 | 1962.723 | 1936.551 | 0.986665464
RotateBenchmark.testRotateLeftB | 128 | 31 | 256 | 3945.6 | 3817.883 | 0.967630525
RotateBenchmark.testRotateLeftB | 128 | 31 | 512 | 1944.458 | 1914.229 | 0.984453766
RotateBenchmark.testRotateLeftB | 256 | 7 | 256 | 4612.149 | 4514.874 | 0.978908964
RotateBenchmark.testRotateLeftB | 256 | 7 | 512 | 2296.252 | 2270.237 | 0.988670669
RotateBenchmark.testRotateLeftB | 256 | 15 | 256 | 4576.628 | 4515.53 | 0.986649996
RotateBenchmark.testRotateLeftB | 256 | 15 | 512 | 2288.278 | 2270.923 | 0.992415694
RotateBenchmark.testRotateLeftB | 256 | 31 | 256 | 4624.243 | 4511.46 | 0.975610495
RotateBenchmark.testRotateLeftB | 256 | 31 | 512 | 2305.459 | 2273.788 | 0.986262605
RotateBenchmark.testRotateLeftB | 512 | 7 | 256 | 7748.283 | 7777.105 | 1.003719792
RotateBenchmark.testRotateLeftB | 512 | 7 | 512 | 3906.214 | 3912.647 | 1.001646863
RotateBenchmark.testRotateLeftB | 512 | 15 | 256 | 7764.653 | 7763.482 | 0.999849188
RotateBenchmark.testRotateLeftB | 512 | 15 | 512 | 3916.061 | 3919.363 | 1.000843194
RotateBenchmark.testRotateLeftB | 512 | 31 | 256 | 7779.754 | 7770.239 | 0.998776954
RotateBenchmark.testRotateLeftB | 512 | 31 | 512 | 3916.471 | 3912.718 | 0.999041739
RotateBenchmark.testRotateLeftI | 128 | 7 | 256 | 4043.39 | 13461.814 | 3.329338501
RotateBenchmark.testRotateLeftI | 128 | 7 | 512 | 1996.217 | 6455.425 | 3.233829288
RotateBenchmark.testRotateLeftI | 128 | 15 | 256 | 4028.614 | 13077.277 | 3.246098286
RotateBenchmark.testRotateLeftI | 128 | 15 | 512 | 1997.612 | 6452.918 | 3.230315997
RotateBenchmark.testRotateLeftI | 128 | 31 | 256 | 4123.357 | 13079.045 | 3.171940969
RotateBenchmark.testRotateLeftI | 128 | 31 | 512 | 2003.356 | 6452.716 | 3.22095324
RotateBenchmark.testRotateLeftI | 256 | 7 | 256 | 7666.949 | 25658.625 | 3.34665393
RotateBenchmark.testRotateLeftI | 256 | 7 | 512 | 3855.826 | 12278.106 | 3.18429981
RotateBenchmark.testRotateLeftI | 256 | 15 | 256 | 7670.901 | 24625.466 | 3.210244272
RotateBenchmark.testRotateLeftI | 256 | 15 | 512 | 3765.786 | 12272.771 | 3.259019764
RotateBenchmark.testRotateLeftI | 256 | 31 | 256 | 7660.599 | 25678.864 | 3.352069988
RotateBenchmark.testRotateLeftI | 256 | 31 | 512 | 3773.401 | 12006.469 | 3.181869353
RotateBenchmark.testRotateLeftI | 512 | 7 | 256 | 11900.948 | 31242.989 | 2.625252123
RotateBenchmark.testRotateLeftI | 512 | 7 | 512 | 5830.878 | 15727.149 | 2.697217983
RotateBenchmark.testRotateLeftI | 512 | 15 | 256 | 12171.847 | 33180.067 | 2.72596813
RotateBenchmark.testRotateLeftI | 512 | 15 | 512 | 5830.544 | 16740.182 | 2.871118372
RotateBenchmark.testRotateLeftI | 512 | 31 | 256 | 11909.553 | 31250.882 | 2.624018047
RotateBenchmark.testRotateLeftI | 512 | 31 | 512 | 5846.747 | 15738.831 | 2.691895339
RotateBenchmark.testRotateLeftL | 128 | 7 | 256 | 2047.243 | 6888.484 | 3.364761291
RotateBenchmark.testRotateLeftL | 128 | 7 | 512 | 1005.029 | 3245.931 | 3.229688895
RotateBenchmark.testRotateLeftL | 128 | 15 | 256 | 1996.921 | 6985.256 | 3.498013191
RotateBenchmark.testRotateLeftL | 128 | 15 | 512 | 986.906 | 3217.778 | 3.260470602
RotateBenchmark.testRotateLeftL | 128 | 31 | 256 | 1999.06 | 6977.672 | 3.490476524
RotateBenchmark.testRotateLeftL | 128 | 31 | 512 | 987.258 | 3236.63 | 3.278403416
RotateBenchmark.testRotateLeftL | 256 | 7 | 256 | 3752.412 | 12995.954 | 3.4633601
RotateBenchmark.testRotateLeftL | 256 | 7 | 512 | 1824.093 | 5809.576 | 3.184912173
RotateBenchmark.testRotateLeftL | 256 | 15 | 256 | 3759.99 | 13262.631 | 3.52730486
RotateBenchmark.testRotateLeftL | 256 | 15 | 512 | 1823.393 | 5803.872 | 3.183006626
RotateBenchmark.testRotateLeftL | 256 | 31 | 256 | 3757.134 | 13284.633 | 3.535842214
RotateBenchmark.testRotateLeftL | 256 | 31 | 512 | 1822.192 | 5824.178 | 3.196248255
RotateBenchmark.testRotateLeftL | 512 | 7 | 256 | 5794.005 | 15567.753 | 2.686872552
RotateBenchmark.testRotateLeftL | 512 | 7 | 512 | 2969.393 | 7694.79 | 2.591368
RotateBenchmark.testRotateLeftL | 512 | 15 | 256 | 5817.292 | 15726.597 | 2.703422314
RotateBenchmark.testRotateLeftL | 512 | 15 | 512 | 2944.655 | 7664.954 | 2.603005785
RotateBenchmark.testRotateLeftL | 512 | 31 | 256 | 5822.131 | 16718.64 | 2.871567129
RotateBenchmark.testRotateLeftL | 512 | 31 | 512 | 2944.763 | 7657.814 | 2.600485676
RotateBenchmark.testRotateLeftS | 128 | 7 | 256 | 8006.155 | 7976.701 | 0.99632108
RotateBenchmark.testRotateLeftS | 128 | 7 | 512 | 4031.753 | 4003.43 | 0.992975016
RotateBenchmark.testRotateLeftS | 128 | 15 | 256 | 8003.879 | 7952.752 | 0.993612222
RotateBenchmark.testRotateLeftS | 128 | 15 | 512 | 4026.359 | 4014.757 | 0.997118488
RotateBenchmark.testRotateLeftS | 128 | 31 | 256 | 8000.842 | 7995.733 | 0.999361442
RotateBenchmark.testRotateLeftS | 128 | 31 | 512 | 4044.421 | 4007.426 | 0.990852832
RotateBenchmark.testRotateLeftS | 256 | 7 | 256 | 15078.471 | 15034.395 | 0.997076892
RotateBenchmark.testRotateLeftS | 256 | 7 | 512 | 7236.509 | 7620.334 | 1.053040078
RotateBenchmark.testRotateLeftS | 256 | 15 | 256 | 15093.661 | 15024.17 | 0.995396014
RotateBenchmark.testRotateLeftS | 256 | 15 | 512 | 7308.568 | 7724.381 | 1.056893909
RotateBenchmark.testRotateLeftS | 256 | 31 | 256 | 15332.233 | 15432.113 | 1.006514381
RotateBenchmark.testRotateLeftS | 256 | 31 | 512 | 7317.18 | 7626.679 | 1.042297579
RotateBenchmark.testRotateLeftS | 512 | 7 | 256 | 24079.012 | 23939.263 | 0.994196232
RotateBenchmark.testRotateLeftS | 512 | 7 | 512 | 11441.41 | 11921.21 | 1.041935391
RotateBenchmark.testRotateLeftS | 512 | 15 | 256 | 23563.675 | 23590.959 | 1.001157884
RotateBenchmark.testRotateLeftS | 512 | 15 | 512 | 11418.634 | 11949.391 | 1.046481654
RotateBenchmark.testRotateLeftS | 512 | 31 | 256 | 24035.69 | 23595.385 | 0.9816812
RotateBenchmark.testRotateLeftS | 512 | 31 | 512 | 11668.091 | 11899.536 | 1.019835721
RotateBenchmark.testRotateRightB | 128 | 7 | 256 | 3852.421 | 3816.521 | 0.990681185
RotateBenchmark.testRotateRightB | 128 | 7 | 512 | 1956.766 | 1923.638 | 0.983070025
RotateBenchmark.testRotateRightB | 128 | 15 | 256 | 3899.136 | 4038.945 | 1.035856405
RotateBenchmark.testRotateRightB | 128 | 15 | 512 | 1957.733 | 2030.973 | 1.037410617
RotateBenchmark.testRotateRightB | 128 | 31 | 256 | 3902.5 | 4043.736 | 1.03619116
RotateBenchmark.testRotateRightB | 128 | 31 | 512 | 1957.728 | 1920.434 | 0.980950367
RotateBenchmark.testRotateRightB | 256 | 7 | 256 | 4565.887 | 4515.083 | 0.988873137
RotateBenchmark.testRotateRightB | 256 | 7 | 512 | 2300.057 | 2278.065 | 0.990438498
RotateBenchmark.testRotateRightB | 256 | 15 | 256 | 4570.754 | 4527.692 | 0.990578797
RotateBenchmark.testRotateRightB | 256 | 15 | 512 | 2300.524 | 2268.659 | 0.986148808
RotateBenchmark.testRotateRightB | 256 | 31 | 256 | 4577.569 | 4513.29 | 0.98595783
RotateBenchmark.testRotateRightB | 256 | 31 | 512 | 2304.335 | 2273.178 | 0.986478962
RotateBenchmark.testRotateRightB | 512 | 7 | 256 | 7772.483 | 7842.671 | 1.009030319
RotateBenchmark.testRotateRightB | 512 | 7 | 512 | 3907.265 | 3917.325 | 1.002574691
RotateBenchmark.testRotateRightB | 512 | 15 | 256 | 7855.653 | 7865.25 | 1.001221668
RotateBenchmark.testRotateRightB | 512 | 15 | 512 | 3909.845 | 3976.813 | 1.017128045
RotateBenchmark.testRotateRightB | 512 | 31 | 256 | 7746.765 | 7870.159 | 1.015928455
RotateBenchmark.testRotateRightB | 512 | 31 | 512 | 3919.596 | 3981.934 | 1.01590419
RotateBenchmark.testRotateRightI | 128 | 7 | 256 | 4125.151 | 13056.878 | 3.165187893
RotateBenchmark.testRotateRightI | 128 | 7 | 512 | 2045.201 | 6501.447 | 3.17887924
RotateBenchmark.testRotateRightI | 128 | 15 | 256 | 4111.736 | 13318.124 | 3.23905134
RotateBenchmark.testRotateRightI | 128 | 15 | 512 | 2055.355 | 6497.289 | 3.161151723
RotateBenchmark.testRotateRightI | 128 | 31 | 256 | 4109.353 | 13073.3 | 3.181352393
RotateBenchmark.testRotateRightI | 128 | 31 | 512 | 2055.431 | 6463.902 | 3.14479153
RotateBenchmark.testRotateRightI | 256 | 7 | 256 | 7804.976 | 24585.962 | 3.150036848
RotateBenchmark.testRotateRightI | 256 | 7 | 512 | 3815.818 | 11985.145 | 3.140911071
RotateBenchmark.testRotateRightI | 256 | 15 | 256 | 7644.977 | 25863.841 | 3.383115606
RotateBenchmark.testRotateRightI | 256 | 15 | 512 | 3822.508 | 12280.58 | 3.212702236
RotateBenchmark.testRotateRightI | 256 | 31 | 256 | 7709.635 | 25655.108 | 3.327668301
RotateBenchmark.testRotateRightI | 256 | 31 | 512 | 3801.5 | 12271.65 | 3.228107326
RotateBenchmark.testRotateRightI | 512 | 7 | 256 | 12223.711 | 31239.788 | 2.555671351
RotateBenchmark.testRotateRightI | 512 | 7 | 512 | 5973.571 | 16740.852 | 2.802486486
RotateBenchmark.testRotateRightI | 512 | 15 | 256 | 12205.47 | 31248.025 | 2.560165647
RotateBenchmark.testRotateRightI | 512 | 15 | 512 | 5966.513 | 15728.168 | 2.6360737
RotateBenchmark.testRotateRightI | 512 | 31 | 256 | 12209.405 | 33181.105 | 2.71766765
RotateBenchmark.testRotateRightI | 512 | 31 | 512 | 5981.527 | 15727.496 | 2.629344647
RotateBenchmark.testRotateRightL | 128 | 7 | 256 | 2054.509 | 6980.849 | 3.397818652
RotateBenchmark.testRotateRightL | 128 | 7 | 512 | 997.375 | 3242.374 | 3.250907633
RotateBenchmark.testRotateRightL | 128 | 15 | 256 | 2051.459 | 6892.389 | 3.359749817
RotateBenchmark.testRotateRightL | 128 | 15 | 512 | 1002.906 | 3223.342 | 3.21400211
RotateBenchmark.testRotateRightL | 128 | 31 | 256 | 2044.749 | 6984.157 | 3.415654929
RotateBenchmark.testRotateRightL | 128 | 31 | 512 | 1004.273 | 3237.496 | 3.22372104
RotateBenchmark.testRotateRightL | 256 | 7 | 256 | 3811.551 | 13347.75 | 3.501920872
RotateBenchmark.testRotateRightL | 256 | 7 | 512 | 1892.883 | 5840.85 | 3.085689924
RotateBenchmark.testRotateRightL | 256 | 15 | 256 | 3821.705 | 14034.823 | 3.672398314
RotateBenchmark.testRotateRightL | 256 | 15 | 512 | 1799.193 | 5817.533 | 3.233412424
RotateBenchmark.testRotateRightL | 256 | 31 | 256 | 3816.666 | 14022.31 | 3.673968327
RotateBenchmark.testRotateRightL | 256 | 31 | 512 | 1796.649 | 5824.13 | 3.241662673
RotateBenchmark.testRotateRightL | 512 | 7 | 256 | 5943.986 | 15586.254 | 2.622188881
RotateBenchmark.testRotateRightL | 512 | 7 | 512 | 3022.686 | 7662.241 | 2.534911334
RotateBenchmark.testRotateRightL | 512 | 15 | 256 | 5958.008 | 15726.859 | 2.639616966
RotateBenchmark.testRotateRightL | 512 | 15 | 512 | 2998.469 | 7654.703 | 2.552870482
RotateBenchmark.testRotateRightL | 512 | 31 | 256 | 5937.491 | 15741.207 | 2.651154671
RotateBenchmark.testRotateRightL | 512 | 31 | 512 | 3014.699 | 7656.837 | 2.539834657
RotateBenchmark.testRotateRightS | 128 | 7 | 256 | 8172.896 | 8003.474 | 0.979270261
RotateBenchmark.testRotateRightS | 128 | 7 | 512 | 4111.074 | 4047.267 | 0.984479238
RotateBenchmark.testRotateRightS | 128 | 15 | 256 | 8225.79 | 8040.421 | 0.9774649
RotateBenchmark.testRotateRightS | 128 | 15 | 512 | 4129.801 | 4011.919 | 0.971455767
RotateBenchmark.testRotateRightS | 128 | 31 | 256 | 8176.102 | 8052.686 | 0.984905276
RotateBenchmark.testRotateRightS | 128 | 31 | 512 | 4117.735 | 4046.522 | 0.982705784
RotateBenchmark.testRotateRightS | 256 | 7 | 256 | 15213.617 | 15169.51 | 0.997100821
RotateBenchmark.testRotateRightS | 256 | 7 | 512 | 7530.289 | 7625.581 | 1.012654494
RotateBenchmark.testRotateRightS | 256 | 15 | 256 | 15238.384 | 15069.978 | 0.988948566
RotateBenchmark.testRotateRightS | 256 | 15 | 512 | 7275.098 | 7620.764 | 1.047513587
RotateBenchmark.testRotateRightS | 256 | 31 | 256 | 15299.821 | 15043.765 | 0.983264118
RotateBenchmark.testRotateRightS | 256 | 31 | 512 | 7273.028 | 7630.97 | 1.04921499
RotateBenchmark.testRotateRightS | 512 | 7 | 256 | 23998.152 | 23920.046 | 0.996745333
RotateBenchmark.testRotateRightS | 512 | 7 | 512 | 11582.679 | 11916.382 | 1.02881052
RotateBenchmark.testRotateRightS | 512 | 15 | 256 | 23982.797 | 23434.756 | 0.977148579
RotateBenchmark.testRotateRightS | 512 | 15 | 512 | 11629.806 | 11918.759 | 1.0248459
RotateBenchmark.testRotateRightS | 512 | 31 | 256 | 23988.549 | 23475.629 | 0.978618132
RotateBenchmark.testRotateRightS | 512 | 31 | 512 | 11650.146 | 11916.47 | 1.022860143
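The Gain column appears to be the ratio of the "With Opts" score to the baseline score, so values above 1.0 mean higher throughput with the patch. A small sketch to recompute it from two rows of the posted data follows; the class and method names are hypothetical.

```java
// Sketch only: recomputes Gain as (With Opts) / (Baseline). Names are hypothetical.
final class GainColumn {
    static double gain(double baselineOpsPerMs, double withOptsOpsPerMs) {
        return withOptsOpsPerMs / baselineOpsPerMs;
    }

    public static void main(String[] args) {
        // testRotateLeftB, bits=128, shift=7, size=256 (sub-word, no direct rotate used)
        System.out.println(gain(3939.136, 3836.133));
        // testRotateLeftI, bits=128, shift=7, size=256 (int lanes, direct rotate available)
        System.out.println(gain(4043.39, 13461.814));
    }
}
```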
Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 19 commits:
- 8266054: Re-designing benchmark to remove noise.
- Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8266054
- 8266054: Formal argument name change to be more appropriate.
- 8266054: Review comments resolution.
- 8266054: Incorporating styling changes based on reviews.
- Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8266054
- Merge http://github.com/openjdk/jdk into JDK-8266054
- Merge http://github.com/openjdk/jdk into JDK-8266054
- Merge http://github.com/openjdk/jdk into JDK-8266054
- Merge branch 'JDK-8266054' of http://github.com/jatin-bhateja/jdk into JDK-8266054
- ... and 9 more: https://git.openjdk.java.net/jdk/compare/a8f15427...b20404e2
Looks good to me.
-------------
Marked as reviewed by sviswanathan (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/3720
On Tue, 27 Jul 2021 18:31:20 GMT, Sandhya Viswanathan <sviswanathan@openjdk.org> wrote:
Looks good to me.
@sviswa7 and @jatin-bhateja The push caused https://bugs.openjdk.java.net/browse/JDK-8271366
I strongly suggest that in the future you ask an Oracle engineer to test Intel's changes before pushing.
-------------
PR: https://git.openjdk.java.net/jdk/pull/3720
On Wed, 28 Jul 2021 04:48:35 GMT, Vladimir Kozlov <kvn@openjdk.org> wrote:
@sviswa7 and @jatin-bhateja The push caused https://bugs.openjdk.java.net/browse/JDK-8271366
I strongly suggest that in the future you ask an Oracle engineer to test Intel's changes before pushing.
@vnkozlov @PaulSandoz Sorry for the inconvenience.
@jatin-bhateja Please don't be in a hurry to push, and do reach out to Oracle engineers for testing before pushing.
-------------
PR: https://git.openjdk.java.net/jdk/pull/3720
On Tue, 27 Jul 2021 18:31:20 GMT, Sandhya Viswanathan <sviswanathan@openjdk.org> wrote:
Looks good to me.
@sviswa7 and @jatin-bhateja The push caused https://bugs.openjdk.java.net/browse/JDK-8271366
I strongly suggest that in the future you ask an Oracle engineer to test Intel's changes before pushing.
Yes, as discussed before, please request that we perform internal tests before integrating, e.g. by CC'ing me. Unfortunately, the pre-commit PR tests don't cover all the test cases, and we don't yet have a way to expand that set.
-------------
PR: https://git.openjdk.java.net/jdk/pull/3720
participants (3)
- Paul Sandoz
- Sandhya Viswanathan
- Vladimir Kozlov