RFR: 8271366: [REDO] JDK-8266054 VectorAPI rotate operation optimization
Jatin Bhateja
jbhateja at openjdk.java.net
Wed Jul 28 17:01:57 UTC 2021
**AVX512 Vector Rotate optimization:**
Currently, vector rotates are inferred by C2 through the auto-vectorization flow.
It handles the following scenarios:
1. Rotates with a constant shift count, if the target ISA supports vector rotate instructions with a constant shift operand.
2. Rotates with a non-constant shift count. Here the shift value is first broadcast so that each lane of the shift vector holds the shift count for the corresponding lane of the vector being rotated. An appropriate conversion IR node (I2L) is generated before broadcasting the shift value for long vector rotate operations.
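To make the two scenarios concrete, here is a minimal sketch of scalar Java loops of each shape. The class name `SlpRotateShapes` is hypothetical, and whether C2 actually turns these loops into vector rotate instructions depends on the target ISA and JVM flags; the sketch only illustrates the source shapes, not C2's output.

```java
import java.util.Arrays;

// Illustrative scalar loops with the two shapes that C2's auto-vectorizer
// may turn into vector rotates. Whether a vector rotate is actually emitted
// depends on the target ISA and JVM flags; nothing here forces it.
final class SlpRotateShapes {

    // Scenario 1: constant shift count. A target with rotate-by-immediate
    // vector instructions can encode the 7 directly in the instruction.
    static void rotateLeftConst(int[] src, int[] dst) {
        for (int i = 0; i < src.length; i++) {
            dst[i] = Integer.rotateLeft(src[i], 7);
        }
    }

    // Scenario 2: loop-invariant, non-constant shift count. Conceptually the
    // scalar count is broadcast into every lane of a shift vector; for long
    // lanes the int count is first widened to long (the I2L conversion).
    static void rotateLeftVar(long[] src, long[] dst, int shift) {
        for (int i = 0; i < src.length; i++) {
            dst[i] = Long.rotateLeft(src[i], shift);
        }
    }

    public static void main(String[] args) {
        int[] ints = {1, 2, 0x80000000};
        int[] intOut = new int[ints.length];
        rotateLeftConst(ints, intOut);
        System.out.println(Arrays.toString(intOut)); // [128, 256, 64]

        long[] longs = {1L, Long.MIN_VALUE};
        long[] longOut = new long[longs.length];
        rotateLeftVar(longs, longOut, 1);
        System.out.println(Arrays.toString(longOut)); // [2, 1]
    }
}
```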
The existing Vector API Java-side implementation expresses vector rotate operations in terms of ShiftLeft, ShiftRight and logical OR operations. This patch moves this logic from the Java side into the C2 compiler, which already has the infrastructure to dismantle rotate operations for targets that do not support vector rotate instructions.
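The shift-and-OR decomposition can be sketched per lane as follows. This illustrates the arithmetic identity only, assuming rotate counts are taken modulo the lane width as `Integer.rotateLeft` does; it is not the actual Vector API or C2 code, and `RotateDecomposition` is a hypothetical name. Sub-word lanes need extra care because Java promotes `byte` and `short` to `int` before shifting.

```java
// Per-lane sketch of rotate = (shift left) OR (unsigned shift right).
final class RotateDecomposition {

    // int lane: Java's << and >>> mask the count to 5 bits, so a count of 0
    // yields (x << 0) | (x >>> 32) == x | x == x, and negative counts work too.
    static int rotateLeftInt(int x, int s) {
        return (x << s) | (x >>> (32 - s));
    }

    static int rotateRightInt(int x, int s) {
        return (x >>> s) | (x << (32 - s));
    }

    // byte lane: work on the zero-extended 8-bit value and mask the count to
    // 3 bits; otherwise the promotion to int would rotate across 32 bits.
    static byte rotateLeftByte(byte x, int s) {
        int v = x & 0xFF;
        s &= 7;
        return (byte) ((v << s) | (v >>> (8 - s)));
    }

    public static void main(String[] args) {
        System.out.println(Integer.toHexString(rotateLeftInt(0x80000001, 1))); // 3
        System.out.println(rotateLeftByte((byte) 0x81, 1));                    // 3
    }
}
```

For `int` lanes this matches `Integer.rotateLeft` and `Integer.rotateRight` for every count, including negative ones, because Java masks `int` shift counts to 5 bits.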
There are two flavors of lanewise APIs that can be used for rotation:
1. An API accepting a vector shift count argument. The dismantling logic has been extended to cover this case.
2. An API accepting a scalar shift count argument. The IR generated for this is similar to the IR generated by SLP, which allows the common dismantling infrastructure to be reused for the Vector API use case.
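In the incubating Vector API the two flavors correspond to overloads such as `IntVector.lanewise(VectorOperators.ROL, Vector<Integer>)` and `IntVector.lanewise(VectorOperators.ROL, int)`. The sketch below models their lane-wise semantics with plain arrays so it runs without the `jdk.incubator.vector` module; `RotateFlavors` and its methods are hypothetical names, and the broadcast step only mimics the IR shape described for the scalar flavor.

```java
import java.util.Arrays;

// Hypothetical array-based model of the two lanewise rotate flavors; plain
// int[] arrays stand in for vectors.
final class RotateFlavors {

    // Flavor 1: vector shift count. Each lane is rotated by its own count.
    static int[] rolLanes(int[] v, int[] counts) {
        if (v.length != counts.length) {
            throw new IllegalArgumentException("lane count mismatch");
        }
        int[] r = new int[v.length];
        for (int i = 0; i < v.length; i++) {
            r[i] = Integer.rotateLeft(v[i], counts[i]);
        }
        return r;
    }

    // Flavor 2: scalar shift count. Broadcasting the scalar into every lane
    // reduces this case to flavor 1, mirroring the SLP-style IR shape.
    static int[] rolScalar(int[] v, int count) {
        int[] counts = new int[v.length];
        Arrays.fill(counts, count);
        return rolLanes(v, counts);
    }

    public static void main(String[] args) {
        int[] v = {1, 2, 3, 4};
        System.out.println(Arrays.toString(rolLanes(v, new int[]{0, 1, 2, 3}))); // [1, 4, 12, 32]
        System.out.println(Arrays.toString(rolScalar(v, 4)));                   // [16, 32, 48, 64]
    }
}
```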
The following performance data was collected using the existing VectorAPI benchmarks.
Machine: Cascade Lake Server (Intel(R) Xeon(R) Platinum 8280 CPU @ 2.70GHz)
Benchmark | (bits) | (shift) | (size) | Baseline Score (ops/ms) | With Opts (ops/ms) | Gain (With Opts / Baseline)
-- | -- | -- | -- | -- | -- | --
RotateBenchmark.testRotateLeftB | 128 | 7 | 256 | 3939.136 | 3836.133 | 0.973851372
RotateBenchmark.testRotateLeftB | 128 | 7 | 512 | 1984.231 | 1918.27 | 0.966757399
RotateBenchmark.testRotateLeftB | 128 | 15 | 256 | 3925.165 | 4043.842 | 1.030234907
RotateBenchmark.testRotateLeftB | 128 | 15 | 512 | 1962.723 | 1936.551 | 0.986665464
RotateBenchmark.testRotateLeftB | 128 | 31 | 256 | 3945.6 | 3817.883 | 0.967630525
RotateBenchmark.testRotateLeftB | 128 | 31 | 512 | 1944.458 | 1914.229 | 0.984453766
RotateBenchmark.testRotateLeftB | 256 | 7 | 256 | 4612.149 | 4514.874 | 0.978908964
RotateBenchmark.testRotateLeftB | 256 | 7 | 512 | 2296.252 | 2270.237 | 0.988670669
RotateBenchmark.testRotateLeftB | 256 | 15 | 256 | 4576.628 | 4515.53 | 0.986649996
RotateBenchmark.testRotateLeftB | 256 | 15 | 512 | 2288.278 | 2270.923 | 0.992415694
RotateBenchmark.testRotateLeftB | 256 | 31 | 256 | 4624.243 | 4511.46 | 0.975610495
RotateBenchmark.testRotateLeftB | 256 | 31 | 512 | 2305.459 | 2273.788 | 0.986262605
RotateBenchmark.testRotateLeftB | 512 | 7 | 256 | 7748.283 | 7777.105 | 1.003719792
RotateBenchmark.testRotateLeftB | 512 | 7 | 512 | 3906.214 | 3912.647 | 1.001646863
RotateBenchmark.testRotateLeftB | 512 | 15 | 256 | 7764.653 | 7763.482 | 0.999849188
RotateBenchmark.testRotateLeftB | 512 | 15 | 512 | 3916.061 | 3919.363 | 1.000843194
RotateBenchmark.testRotateLeftB | 512 | 31 | 256 | 7779.754 | 7770.239 | 0.998776954
RotateBenchmark.testRotateLeftB | 512 | 31 | 512 | 3916.471 | 3912.718 | 0.999041739
RotateBenchmark.testRotateLeftI | 128 | 7 | 256 | 4043.39 | 13461.814 | 3.329338501
RotateBenchmark.testRotateLeftI | 128 | 7 | 512 | 1996.217 | 6455.425 | 3.233829288
RotateBenchmark.testRotateLeftI | 128 | 15 | 256 | 4028.614 | 13077.277 | 3.246098286
RotateBenchmark.testRotateLeftI | 128 | 15 | 512 | 1997.612 | 6452.918 | 3.230315997
RotateBenchmark.testRotateLeftI | 128 | 31 | 256 | 4123.357 | 13079.045 | 3.171940969
RotateBenchmark.testRotateLeftI | 128 | 31 | 512 | 2003.356 | 6452.716 | 3.22095324
RotateBenchmark.testRotateLeftI | 256 | 7 | 256 | 7666.949 | 25658.625 | 3.34665393
RotateBenchmark.testRotateLeftI | 256 | 7 | 512 | 3855.826 | 12278.106 | 3.18429981
RotateBenchmark.testRotateLeftI | 256 | 15 | 256 | 7670.901 | 24625.466 | 3.210244272
RotateBenchmark.testRotateLeftI | 256 | 15 | 512 | 3765.786 | 12272.771 | 3.259019764
RotateBenchmark.testRotateLeftI | 256 | 31 | 256 | 7660.599 | 25678.864 | 3.352069988
RotateBenchmark.testRotateLeftI | 256 | 31 | 512 | 3773.401 | 12006.469 | 3.181869353
RotateBenchmark.testRotateLeftI | 512 | 7 | 256 | 11900.948 | 31242.989 | 2.625252123
RotateBenchmark.testRotateLeftI | 512 | 7 | 512 | 5830.878 | 15727.149 | 2.697217983
RotateBenchmark.testRotateLeftI | 512 | 15 | 256 | 12171.847 | 33180.067 | 2.72596813
RotateBenchmark.testRotateLeftI | 512 | 15 | 512 | 5830.544 | 16740.182 | 2.871118372
RotateBenchmark.testRotateLeftI | 512 | 31 | 256 | 11909.553 | 31250.882 | 2.624018047
RotateBenchmark.testRotateLeftI | 512 | 31 | 512 | 5846.747 | 15738.831 | 2.691895339
RotateBenchmark.testRotateLeftL | 128 | 7 | 256 | 2047.243 | 6888.484 | 3.364761291
RotateBenchmark.testRotateLeftL | 128 | 7 | 512 | 1005.029 | 3245.931 | 3.229688895
RotateBenchmark.testRotateLeftL | 128 | 15 | 256 | 1996.921 | 6985.256 | 3.498013191
RotateBenchmark.testRotateLeftL | 128 | 15 | 512 | 986.906 | 3217.778 | 3.260470602
RotateBenchmark.testRotateLeftL | 128 | 31 | 256 | 1999.06 | 6977.672 | 3.490476524
RotateBenchmark.testRotateLeftL | 128 | 31 | 512 | 987.258 | 3236.63 | 3.278403416
RotateBenchmark.testRotateLeftL | 256 | 7 | 256 | 3752.412 | 12995.954 | 3.4633601
RotateBenchmark.testRotateLeftL | 256 | 7 | 512 | 1824.093 | 5809.576 | 3.184912173
RotateBenchmark.testRotateLeftL | 256 | 15 | 256 | 3759.99 | 13262.631 | 3.52730486
RotateBenchmark.testRotateLeftL | 256 | 15 | 512 | 1823.393 | 5803.872 | 3.183006626
RotateBenchmark.testRotateLeftL | 256 | 31 | 256 | 3757.134 | 13284.633 | 3.535842214
RotateBenchmark.testRotateLeftL | 256 | 31 | 512 | 1822.192 | 5824.178 | 3.196248255
RotateBenchmark.testRotateLeftL | 512 | 7 | 256 | 5794.005 | 15567.753 | 2.686872552
RotateBenchmark.testRotateLeftL | 512 | 7 | 512 | 2969.393 | 7694.79 | 2.591368
RotateBenchmark.testRotateLeftL | 512 | 15 | 256 | 5817.292 | 15726.597 | 2.703422314
RotateBenchmark.testRotateLeftL | 512 | 15 | 512 | 2944.655 | 7664.954 | 2.603005785
RotateBenchmark.testRotateLeftL | 512 | 31 | 256 | 5822.131 | 16718.64 | 2.871567129
RotateBenchmark.testRotateLeftL | 512 | 31 | 512 | 2944.763 | 7657.814 | 2.600485676
RotateBenchmark.testRotateLeftS | 128 | 7 | 256 | 8006.155 | 7976.701 | 0.99632108
RotateBenchmark.testRotateLeftS | 128 | 7 | 512 | 4031.753 | 4003.43 | 0.992975016
RotateBenchmark.testRotateLeftS | 128 | 15 | 256 | 8003.879 | 7952.752 | 0.993612222
RotateBenchmark.testRotateLeftS | 128 | 15 | 512 | 4026.359 | 4014.757 | 0.997118488
RotateBenchmark.testRotateLeftS | 128 | 31 | 256 | 8000.842 | 7995.733 | 0.999361442
RotateBenchmark.testRotateLeftS | 128 | 31 | 512 | 4044.421 | 4007.426 | 0.990852832
RotateBenchmark.testRotateLeftS | 256 | 7 | 256 | 15078.471 | 15034.395 | 0.997076892
RotateBenchmark.testRotateLeftS | 256 | 7 | 512 | 7236.509 | 7620.334 | 1.053040078
RotateBenchmark.testRotateLeftS | 256 | 15 | 256 | 15093.661 | 15024.17 | 0.995396014
RotateBenchmark.testRotateLeftS | 256 | 15 | 512 | 7308.568 | 7724.381 | 1.056893909
RotateBenchmark.testRotateLeftS | 256 | 31 | 256 | 15332.233 | 15432.113 | 1.006514381
RotateBenchmark.testRotateLeftS | 256 | 31 | 512 | 7317.18 | 7626.679 | 1.042297579
RotateBenchmark.testRotateLeftS | 512 | 7 | 256 | 24079.012 | 23939.263 | 0.994196232
RotateBenchmark.testRotateLeftS | 512 | 7 | 512 | 11441.41 | 11921.21 | 1.041935391
RotateBenchmark.testRotateLeftS | 512 | 15 | 256 | 23563.675 | 23590.959 | 1.001157884
RotateBenchmark.testRotateLeftS | 512 | 15 | 512 | 11418.634 | 11949.391 | 1.046481654
RotateBenchmark.testRotateLeftS | 512 | 31 | 256 | 24035.69 | 23595.385 | 0.9816812
RotateBenchmark.testRotateLeftS | 512 | 31 | 512 | 11668.091 | 11899.536 | 1.019835721
RotateBenchmark.testRotateRightB | 128 | 7 | 256 | 3852.421 | 3816.521 | 0.990681185
RotateBenchmark.testRotateRightB | 128 | 7 | 512 | 1956.766 | 1923.638 | 0.983070025
RotateBenchmark.testRotateRightB | 128 | 15 | 256 | 3899.136 | 4038.945 | 1.035856405
RotateBenchmark.testRotateRightB | 128 | 15 | 512 | 1957.733 | 2030.973 | 1.037410617
RotateBenchmark.testRotateRightB | 128 | 31 | 256 | 3902.5 | 4043.736 | 1.03619116
RotateBenchmark.testRotateRightB | 128 | 31 | 512 | 1957.728 | 1920.434 | 0.980950367
RotateBenchmark.testRotateRightB | 256 | 7 | 256 | 4565.887 | 4515.083 | 0.988873137
RotateBenchmark.testRotateRightB | 256 | 7 | 512 | 2300.057 | 2278.065 | 0.990438498
RotateBenchmark.testRotateRightB | 256 | 15 | 256 | 4570.754 | 4527.692 | 0.990578797
RotateBenchmark.testRotateRightB | 256 | 15 | 512 | 2300.524 | 2268.659 | 0.986148808
RotateBenchmark.testRotateRightB | 256 | 31 | 256 | 4577.569 | 4513.29 | 0.98595783
RotateBenchmark.testRotateRightB | 256 | 31 | 512 | 2304.335 | 2273.178 | 0.986478962
RotateBenchmark.testRotateRightB | 512 | 7 | 256 | 7772.483 | 7842.671 | 1.009030319
RotateBenchmark.testRotateRightB | 512 | 7 | 512 | 3907.265 | 3917.325 | 1.002574691
RotateBenchmark.testRotateRightB | 512 | 15 | 256 | 7855.653 | 7865.25 | 1.001221668
RotateBenchmark.testRotateRightB | 512 | 15 | 512 | 3909.845 | 3976.813 | 1.017128045
RotateBenchmark.testRotateRightB | 512 | 31 | 256 | 7746.765 | 7870.159 | 1.015928455
RotateBenchmark.testRotateRightB | 512 | 31 | 512 | 3919.596 | 3981.934 | 1.01590419
RotateBenchmark.testRotateRightI | 128 | 7 | 256 | 4125.151 | 13056.878 | 3.165187893
RotateBenchmark.testRotateRightI | 128 | 7 | 512 | 2045.201 | 6501.447 | 3.17887924
RotateBenchmark.testRotateRightI | 128 | 15 | 256 | 4111.736 | 13318.124 | 3.23905134
RotateBenchmark.testRotateRightI | 128 | 15 | 512 | 2055.355 | 6497.289 | 3.161151723
RotateBenchmark.testRotateRightI | 128 | 31 | 256 | 4109.353 | 13073.3 | 3.181352393
RotateBenchmark.testRotateRightI | 128 | 31 | 512 | 2055.431 | 6463.902 | 3.14479153
RotateBenchmark.testRotateRightI | 256 | 7 | 256 | 7804.976 | 24585.962 | 3.150036848
RotateBenchmark.testRotateRightI | 256 | 7 | 512 | 3815.818 | 11985.145 | 3.140911071
RotateBenchmark.testRotateRightI | 256 | 15 | 256 | 7644.977 | 25863.841 | 3.383115606
RotateBenchmark.testRotateRightI | 256 | 15 | 512 | 3822.508 | 12280.58 | 3.212702236
RotateBenchmark.testRotateRightI | 256 | 31 | 256 | 7709.635 | 25655.108 | 3.327668301
RotateBenchmark.testRotateRightI | 256 | 31 | 512 | 3801.5 | 12271.65 | 3.228107326
RotateBenchmark.testRotateRightI | 512 | 7 | 256 | 12223.711 | 31239.788 | 2.555671351
RotateBenchmark.testRotateRightI | 512 | 7 | 512 | 5973.571 | 16740.852 | 2.802486486
RotateBenchmark.testRotateRightI | 512 | 15 | 256 | 12205.47 | 31248.025 | 2.560165647
RotateBenchmark.testRotateRightI | 512 | 15 | 512 | 5966.513 | 15728.168 | 2.6360737
RotateBenchmark.testRotateRightI | 512 | 31 | 256 | 12209.405 | 33181.105 | 2.71766765
RotateBenchmark.testRotateRightI | 512 | 31 | 512 | 5981.527 | 15727.496 | 2.629344647
RotateBenchmark.testRotateRightL | 128 | 7 | 256 | 2054.509 | 6980.849 | 3.397818652
RotateBenchmark.testRotateRightL | 128 | 7 | 512 | 997.375 | 3242.374 | 3.250907633
RotateBenchmark.testRotateRightL | 128 | 15 | 256 | 2051.459 | 6892.389 | 3.359749817
RotateBenchmark.testRotateRightL | 128 | 15 | 512 | 1002.906 | 3223.342 | 3.21400211
RotateBenchmark.testRotateRightL | 128 | 31 | 256 | 2044.749 | 6984.157 | 3.415654929
RotateBenchmark.testRotateRightL | 128 | 31 | 512 | 1004.273 | 3237.496 | 3.22372104
RotateBenchmark.testRotateRightL | 256 | 7 | 256 | 3811.551 | 13347.75 | 3.501920872
RotateBenchmark.testRotateRightL | 256 | 7 | 512 | 1892.883 | 5840.85 | 3.085689924
RotateBenchmark.testRotateRightL | 256 | 15 | 256 | 3821.705 | 14034.823 | 3.672398314
RotateBenchmark.testRotateRightL | 256 | 15 | 512 | 1799.193 | 5817.533 | 3.233412424
RotateBenchmark.testRotateRightL | 256 | 31 | 256 | 3816.666 | 14022.31 | 3.673968327
RotateBenchmark.testRotateRightL | 256 | 31 | 512 | 1796.649 | 5824.13 | 3.241662673
RotateBenchmark.testRotateRightL | 512 | 7 | 256 | 5943.986 | 15586.254 | 2.622188881
RotateBenchmark.testRotateRightL | 512 | 7 | 512 | 3022.686 | 7662.241 | 2.534911334
RotateBenchmark.testRotateRightL | 512 | 15 | 256 | 5958.008 | 15726.859 | 2.639616966
RotateBenchmark.testRotateRightL | 512 | 15 | 512 | 2998.469 | 7654.703 | 2.552870482
RotateBenchmark.testRotateRightL | 512 | 31 | 256 | 5937.491 | 15741.207 | 2.651154671
RotateBenchmark.testRotateRightL | 512 | 31 | 512 | 3014.699 | 7656.837 | 2.539834657
RotateBenchmark.testRotateRightS | 128 | 7 | 256 | 8172.896 | 8003.474 | 0.979270261
RotateBenchmark.testRotateRightS | 128 | 7 | 512 | 4111.074 | 4047.267 | 0.984479238
RotateBenchmark.testRotateRightS | 128 | 15 | 256 | 8225.79 | 8040.421 | 0.9774649
RotateBenchmark.testRotateRightS | 128 | 15 | 512 | 4129.801 | 4011.919 | 0.971455767
RotateBenchmark.testRotateRightS | 128 | 31 | 256 | 8176.102 | 8052.686 | 0.984905276
RotateBenchmark.testRotateRightS | 128 | 31 | 512 | 4117.735 | 4046.522 | 0.982705784
RotateBenchmark.testRotateRightS | 256 | 7 | 256 | 15213.617 | 15169.51 | 0.997100821
RotateBenchmark.testRotateRightS | 256 | 7 | 512 | 7530.289 | 7625.581 | 1.012654494
RotateBenchmark.testRotateRightS | 256 | 15 | 256 | 15238.384 | 15069.978 | 0.988948566
RotateBenchmark.testRotateRightS | 256 | 15 | 512 | 7275.098 | 7620.764 | 1.047513587
RotateBenchmark.testRotateRightS | 256 | 31 | 256 | 15299.821 | 15043.765 | 0.983264118
RotateBenchmark.testRotateRightS | 256 | 31 | 512 | 7273.028 | 7630.97 | 1.04921499
RotateBenchmark.testRotateRightS | 512 | 7 | 256 | 23998.152 | 23920.046 | 0.996745333
RotateBenchmark.testRotateRightS | 512 | 7 | 512 | 11582.679 | 11916.382 | 1.02881052
RotateBenchmark.testRotateRightS | 512 | 15 | 256 | 23982.797 | 23434.756 | 0.977148579
RotateBenchmark.testRotateRightS | 512 | 15 | 512 | 11629.806 | 11918.759 | 1.0248459
RotateBenchmark.testRotateRightS | 512 | 31 | 256 | 23988.549 | 23475.629 | 0.978618132
RotateBenchmark.testRotateRightS | 512 | 31 | 512 | 11650.146 | 11916.47 | 1.022860143
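The Gain column is consistent with With Opts divided by Baseline (for example, 13461.814 / 4043.39 ≈ 3.329 for testRotateLeftI at 128 bits, shift 7, size 256). By that measure the int and long rows improve by roughly 2.5x to 3.7x, while the byte and short rows stay within about 0.97x to 1.06x, i.e. essentially unchanged. The small helper below, whose name `GainCheck` is hypothetical, recomputes the ratio for two rows.

```java
// Recomputes the "Gain" column: optimized throughput divided by baseline
// throughput (both in ops/ms), so values above 1.0 are speedups.
final class GainCheck {

    static double gain(double baselineOpsPerMs, double optimizedOpsPerMs) {
        return optimizedOpsPerMs / baselineOpsPerMs;
    }

    public static void main(String[] args) {
        // testRotateLeftI, bits=128, shift=7, size=256
        System.out.println(gain(4043.39, 13461.814));
        // testRotateLeftB, bits=128, shift=7, size=256
        System.out.println(gain(3939.136, 3836.133));
    }
}
```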
PS: This is a follow-up patch for JDK-8266054, which was backed out due to tier2 regressions.
-------------
Commit messages:
- 8271366: [REDO] JDK-8266054 VectorAPI rotate operation optimization
Changes: https://git.openjdk.java.net/jdk/pull/4924/files
Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=4924&range=00
Issue: https://bugs.openjdk.java.net/browse/JDK-8271366
Stats: 4439 lines in 57 files changed: 4221 ins; 58 del; 160 mod
Patch: https://git.openjdk.java.net/jdk/pull/4924.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/4924/head:pull/4924
PR: https://git.openjdk.java.net/jdk/pull/4924
More information about the hotspot-compiler-dev mailing list