RFR: 8271366: [REDO] JDK-8266054 VectorAPI rotate operation optimization
Jatin Bhateja
jbhateja at openjdk.java.net
Fri Jul 30 18:56:25 UTC 2021
On Wed, 28 Jul 2021 18:42:00 GMT, Vladimir Kozlov <kvn at openjdk.org> wrote:
>> **AVX512 Vector Rotate optimization:**
>>
>> Currently, vector rotates are inferred by C2 through the auto-vectorization (SLP) flow.
>> It handles the following scenarios:
>> 1. Rotates with a constant shift count, if the target ISA supports vector rotate instructions with a constant shift operand.
>> 2. Rotates with a non-constant shift count: the shift value is first broadcast so that each lane of the shift vector specifies the shift count for the corresponding lane of the vector being rotated. An appropriate conversion IR node (I2R) is generated before broadcasting the shift value for long vector rotations.
>>
>> The existing Java-side Vector API implementation handles vector rotate operations in terms of ShiftLeft, ShiftRight and logical Or operations. This patch moves that logic from the Java side into the C2 compiler, which already has the infrastructure to dismantle rotate operations on targets that do not support vector rotate instructions.
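For reference, the shift-and-OR form that the Java side used can be illustrated for a single lane. This is a minimal scalar sketch, not the Vector API source; the class name `RotateByShifts` is hypothetical. It relies on Java masking shift distances (low 5 bits for `int`, low 6 bits for `long`), so `-s` behaves like `width - s`, which is the same formula `Integer.rotateLeft` uses in OpenJDK.

```java
// Hypothetical sketch: a rotate expressed as shift-left, unsigned shift-right and OR.
public final class RotateByShifts {

    // Bits shifted out on the left re-enter on the right.
    // Java masks int shift distances to 5 bits, so -s acts as (32 - s) mod 32.
    static int rotateLeft(int x, int s) {
        return (x << s) | (x >>> -s);
    }

    static int rotateRight(int x, int s) {
        return (x >>> s) | (x << -s);
    }

    // Same idea for 64-bit lanes; long shift distances are masked to 6 bits.
    static long rotateLeft(long x, int s) {
        return (x << s) | (x >>> -s);
    }

    static long rotateRight(long x, int s) {
        return (x >>> s) | (x << -s);
    }

    public static void main(String[] args) {
        int[] shifts = {0, 1, 7, 15, 31, 32, 33, -3};
        int xi = 0x80000001;
        long xl = 0x8000000000000001L;
        // Cross-check the decomposition against the JDK's scalar rotates.
        for (int s : shifts) {
            if (rotateLeft(xi, s) != Integer.rotateLeft(xi, s)
                    || rotateRight(xi, s) != Integer.rotateRight(xi, s)
                    || rotateLeft(xl, s) != Long.rotateLeft(xl, s)
                    || rotateRight(xl, s) != Long.rotateRight(xl, s)) {
                throw new AssertionError("mismatch for shift " + s);
            }
        }
        System.out.println("ok");
    }
}
```

Three vector operations (two shifts and an OR) per rotate is what a target without native vector rotate instructions ends up executing; the patch keeps that expansion but performs it inside C2 instead of in Java.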
>>
>> There are two flavors of lanewise APIs that can be used for rotation:
>> 1. An API accepting a vector shift count argument. The dismantling logic has been extended to cover this case.
>> 2. An API accepting a scalar shift count argument. The IR generated for this is similar to the IR generated by SLP, which allows the common dismantling infrastructure to be reused for the Vector API use case.
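The two flavors differ only in where the per-lane shift counts come from. Below is a minimal sketch using plain `long[]` arrays as stand-ins for vector registers; the class and method names are hypothetical and are not C2 or Vector API code. The scalar flavor widens the `int` count to `long` (the role of the conversion node described above) and broadcasts it into every lane, while the vector flavor supplies one count per lane. Both then go through the same shift-left / unsigned-shift-right / OR dismantling.

```java
import java.util.Arrays;

// Hypothetical sketch: long[] arrays stand in for vector registers.
public final class LanewiseRotateSketch {

    // Scalar-count flavor: widen the int count to long once and copy it into
    // every lane, producing a shift vector.
    static long[] broadcastShift(int count, int lanes) {
        long[] shift = new long[lanes];
        Arrays.fill(shift, (long) count);
        return shift;
    }

    // Vector-count flavor: lane i of 'shift' is the rotate distance for lane i.
    // The rotate is dismantled into shift-left, unsigned shift-right and OR;
    // Java masks long shift distances to 6 bits, so -s behaves like 64 - s.
    static long[] rotateLeftLanes(long[] src, long[] shift) {
        long[] dst = new long[src.length];
        for (int i = 0; i < src.length; i++) {
            dst[i] = (src[i] << shift[i]) | (src[i] >>> -shift[i]);
        }
        return dst;
    }

    public static void main(String[] args) {
        long[] src = {1L, 0x8000000000000000L, -1L, 0x0123456789ABCDEFL};
        long[] perLane = {0, 1, 31, 63};

        long[] byScalar = rotateLeftLanes(src, broadcastShift(7, src.length));
        long[] byVector = rotateLeftLanes(src, perLane);

        // Cross-check both flavors against the JDK's scalar rotate.
        for (int i = 0; i < src.length; i++) {
            if (byScalar[i] != Long.rotateLeft(src[i], 7)
                    || byVector[i] != Long.rotateLeft(src[i], (int) perLane[i])) {
                throw new AssertionError("mismatch in lane " + i);
            }
        }
        System.out.println("ok");
    }
}
```

Once the scalar count has been broadcast, the two flavors produce the same shape of computation, which is why a single dismantling path can serve both.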
>>
>> The following performance data was collected using the existing Vector API benchmarks.
>> Machine: Cascade Lake Server (Intel(R) Xeon(R) Platinum 8280 CPU @ 2.70GHz)
>>
>> Benchmark | (bits) | (shift) | (size) | Baseline Score (ops/ms) | With Opts Score (ops/ms) | Gain (With Opts / Baseline)
>> -- | -- | -- | -- | -- | -- | --
>> RotateBenchmark.testRotateLeftB | 128 | 7 | 256 | 3939.136 | 3836.133 | 0.973851372
>> RotateBenchmark.testRotateLeftB | 128 | 7 | 512 | 1984.231 | 1918.27 | 0.966757399
>> RotateBenchmark.testRotateLeftB | 128 | 15 | 256 | 3925.165 | 4043.842 | 1.030234907
>> RotateBenchmark.testRotateLeftB | 128 | 15 | 512 | 1962.723 | 1936.551 | 0.986665464
>> RotateBenchmark.testRotateLeftB | 128 | 31 | 256 | 3945.6 | 3817.883 | 0.967630525
>> RotateBenchmark.testRotateLeftB | 128 | 31 | 512 | 1944.458 | 1914.229 | 0.984453766
>> RotateBenchmark.testRotateLeftB | 256 | 7 | 256 | 4612.149 | 4514.874 | 0.978908964
>> RotateBenchmark.testRotateLeftB | 256 | 7 | 512 | 2296.252 | 2270.237 | 0.988670669
>> RotateBenchmark.testRotateLeftB | 256 | 15 | 256 | 4576.628 | 4515.53 | 0.986649996
>> RotateBenchmark.testRotateLeftB | 256 | 15 | 512 | 2288.278 | 2270.923 | 0.992415694
>> RotateBenchmark.testRotateLeftB | 256 | 31 | 256 | 4624.243 | 4511.46 | 0.975610495
>> RotateBenchmark.testRotateLeftB | 256 | 31 | 512 | 2305.459 | 2273.788 | 0.986262605
>> RotateBenchmark.testRotateLeftB | 512 | 7 | 256 | 7748.283 | 7777.105 | 1.003719792
>> RotateBenchmark.testRotateLeftB | 512 | 7 | 512 | 3906.214 | 3912.647 | 1.001646863
>> RotateBenchmark.testRotateLeftB | 512 | 15 | 256 | 7764.653 | 7763.482 | 0.999849188
>> RotateBenchmark.testRotateLeftB | 512 | 15 | 512 | 3916.061 | 3919.363 | 1.000843194
>> RotateBenchmark.testRotateLeftB | 512 | 31 | 256 | 7779.754 | 7770.239 | 0.998776954
>> RotateBenchmark.testRotateLeftB | 512 | 31 | 512 | 3916.471 | 3912.718 | 0.999041739
>> RotateBenchmark.testRotateLeftI | 128 | 7 | 256 | 4043.39 | 13461.814 | 3.329338501
>> RotateBenchmark.testRotateLeftI | 128 | 7 | 512 | 1996.217 | 6455.425 | 3.233829288
>> RotateBenchmark.testRotateLeftI | 128 | 15 | 256 | 4028.614 | 13077.277 | 3.246098286
>> RotateBenchmark.testRotateLeftI | 128 | 15 | 512 | 1997.612 | 6452.918 | 3.230315997
>> RotateBenchmark.testRotateLeftI | 128 | 31 | 256 | 4123.357 | 13079.045 | 3.171940969
>> RotateBenchmark.testRotateLeftI | 128 | 31 | 512 | 2003.356 | 6452.716 | 3.22095324
>> RotateBenchmark.testRotateLeftI | 256 | 7 | 256 | 7666.949 | 25658.625 | 3.34665393
>> RotateBenchmark.testRotateLeftI | 256 | 7 | 512 | 3855.826 | 12278.106 | 3.18429981
>> RotateBenchmark.testRotateLeftI | 256 | 15 | 256 | 7670.901 | 24625.466 | 3.210244272
>> RotateBenchmark.testRotateLeftI | 256 | 15 | 512 | 3765.786 | 12272.771 | 3.259019764
>> RotateBenchmark.testRotateLeftI | 256 | 31 | 256 | 7660.599 | 25678.864 | 3.352069988
>> RotateBenchmark.testRotateLeftI | 256 | 31 | 512 | 3773.401 | 12006.469 | 3.181869353
>> RotateBenchmark.testRotateLeftI | 512 | 7 | 256 | 11900.948 | 31242.989 | 2.625252123
>> RotateBenchmark.testRotateLeftI | 512 | 7 | 512 | 5830.878 | 15727.149 | 2.697217983
>> RotateBenchmark.testRotateLeftI | 512 | 15 | 256 | 12171.847 | 33180.067 | 2.72596813
>> RotateBenchmark.testRotateLeftI | 512 | 15 | 512 | 5830.544 | 16740.182 | 2.871118372
>> RotateBenchmark.testRotateLeftI | 512 | 31 | 256 | 11909.553 | 31250.882 | 2.624018047
>> RotateBenchmark.testRotateLeftI | 512 | 31 | 512 | 5846.747 | 15738.831 | 2.691895339
>> RotateBenchmark.testRotateLeftL | 128 | 7 | 256 | 2047.243 | 6888.484 | 3.364761291
>> RotateBenchmark.testRotateLeftL | 128 | 7 | 512 | 1005.029 | 3245.931 | 3.229688895
>> RotateBenchmark.testRotateLeftL | 128 | 15 | 256 | 1996.921 | 6985.256 | 3.498013191
>> RotateBenchmark.testRotateLeftL | 128 | 15 | 512 | 986.906 | 3217.778 | 3.260470602
>> RotateBenchmark.testRotateLeftL | 128 | 31 | 256 | 1999.06 | 6977.672 | 3.490476524
>> RotateBenchmark.testRotateLeftL | 128 | 31 | 512 | 987.258 | 3236.63 | 3.278403416
>> RotateBenchmark.testRotateLeftL | 256 | 7 | 256 | 3752.412 | 12995.954 | 3.4633601
>> RotateBenchmark.testRotateLeftL | 256 | 7 | 512 | 1824.093 | 5809.576 | 3.184912173
>> RotateBenchmark.testRotateLeftL | 256 | 15 | 256 | 3759.99 | 13262.631 | 3.52730486
>> RotateBenchmark.testRotateLeftL | 256 | 15 | 512 | 1823.393 | 5803.872 | 3.183006626
>> RotateBenchmark.testRotateLeftL | 256 | 31 | 256 | 3757.134 | 13284.633 | 3.535842214
>> RotateBenchmark.testRotateLeftL | 256 | 31 | 512 | 1822.192 | 5824.178 | 3.196248255
>> RotateBenchmark.testRotateLeftL | 512 | 7 | 256 | 5794.005 | 15567.753 | 2.686872552
>> RotateBenchmark.testRotateLeftL | 512 | 7 | 512 | 2969.393 | 7694.79 | 2.591368
>> RotateBenchmark.testRotateLeftL | 512 | 15 | 256 | 5817.292 | 15726.597 | 2.703422314
>> RotateBenchmark.testRotateLeftL | 512 | 15 | 512 | 2944.655 | 7664.954 | 2.603005785
>> RotateBenchmark.testRotateLeftL | 512 | 31 | 256 | 5822.131 | 16718.64 | 2.871567129
>> RotateBenchmark.testRotateLeftL | 512 | 31 | 512 | 2944.763 | 7657.814 | 2.600485676
>> RotateBenchmark.testRotateLeftS | 128 | 7 | 256 | 8006.155 | 7976.701 | 0.99632108
>> RotateBenchmark.testRotateLeftS | 128 | 7 | 512 | 4031.753 | 4003.43 | 0.992975016
>> RotateBenchmark.testRotateLeftS | 128 | 15 | 256 | 8003.879 | 7952.752 | 0.993612222
>> RotateBenchmark.testRotateLeftS | 128 | 15 | 512 | 4026.359 | 4014.757 | 0.997118488
>> RotateBenchmark.testRotateLeftS | 128 | 31 | 256 | 8000.842 | 7995.733 | 0.999361442
>> RotateBenchmark.testRotateLeftS | 128 | 31 | 512 | 4044.421 | 4007.426 | 0.990852832
>> RotateBenchmark.testRotateLeftS | 256 | 7 | 256 | 15078.471 | 15034.395 | 0.997076892
>> RotateBenchmark.testRotateLeftS | 256 | 7 | 512 | 7236.509 | 7620.334 | 1.053040078
>> RotateBenchmark.testRotateLeftS | 256 | 15 | 256 | 15093.661 | 15024.17 | 0.995396014
>> RotateBenchmark.testRotateLeftS | 256 | 15 | 512 | 7308.568 | 7724.381 | 1.056893909
>> RotateBenchmark.testRotateLeftS | 256 | 31 | 256 | 15332.233 | 15432.113 | 1.006514381
>> RotateBenchmark.testRotateLeftS | 256 | 31 | 512 | 7317.18 | 7626.679 | 1.042297579
>> RotateBenchmark.testRotateLeftS | 512 | 7 | 256 | 24079.012 | 23939.263 | 0.994196232
>> RotateBenchmark.testRotateLeftS | 512 | 7 | 512 | 11441.41 | 11921.21 | 1.041935391
>> RotateBenchmark.testRotateLeftS | 512 | 15 | 256 | 23563.675 | 23590.959 | 1.001157884
>> RotateBenchmark.testRotateLeftS | 512 | 15 | 512 | 11418.634 | 11949.391 | 1.046481654
>> RotateBenchmark.testRotateLeftS | 512 | 31 | 256 | 24035.69 | 23595.385 | 0.9816812
>> RotateBenchmark.testRotateLeftS | 512 | 31 | 512 | 11668.091 | 11899.536 | 1.019835721
>> RotateBenchmark.testRotateRightB | 128 | 7 | 256 | 3852.421 | 3816.521 | 0.990681185
>> RotateBenchmark.testRotateRightB | 128 | 7 | 512 | 1956.766 | 1923.638 | 0.983070025
>> RotateBenchmark.testRotateRightB | 128 | 15 | 256 | 3899.136 | 4038.945 | 1.035856405
>> RotateBenchmark.testRotateRightB | 128 | 15 | 512 | 1957.733 | 2030.973 | 1.037410617
>> RotateBenchmark.testRotateRightB | 128 | 31 | 256 | 3902.5 | 4043.736 | 1.03619116
>> RotateBenchmark.testRotateRightB | 128 | 31 | 512 | 1957.728 | 1920.434 | 0.980950367
>> RotateBenchmark.testRotateRightB | 256 | 7 | 256 | 4565.887 | 4515.083 | 0.988873137
>> RotateBenchmark.testRotateRightB | 256 | 7 | 512 | 2300.057 | 2278.065 | 0.990438498
>> RotateBenchmark.testRotateRightB | 256 | 15 | 256 | 4570.754 | 4527.692 | 0.990578797
>> RotateBenchmark.testRotateRightB | 256 | 15 | 512 | 2300.524 | 2268.659 | 0.986148808
>> RotateBenchmark.testRotateRightB | 256 | 31 | 256 | 4577.569 | 4513.29 | 0.98595783
>> RotateBenchmark.testRotateRightB | 256 | 31 | 512 | 2304.335 | 2273.178 | 0.986478962
>> RotateBenchmark.testRotateRightB | 512 | 7 | 256 | 7772.483 | 7842.671 | 1.009030319
>> RotateBenchmark.testRotateRightB | 512 | 7 | 512 | 3907.265 | 3917.325 | 1.002574691
>> RotateBenchmark.testRotateRightB | 512 | 15 | 256 | 7855.653 | 7865.25 | 1.001221668
>> RotateBenchmark.testRotateRightB | 512 | 15 | 512 | 3909.845 | 3976.813 | 1.017128045
>> RotateBenchmark.testRotateRightB | 512 | 31 | 256 | 7746.765 | 7870.159 | 1.015928455
>> RotateBenchmark.testRotateRightB | 512 | 31 | 512 | 3919.596 | 3981.934 | 1.01590419
>> RotateBenchmark.testRotateRightI | 128 | 7 | 256 | 4125.151 | 13056.878 | 3.165187893
>> RotateBenchmark.testRotateRightI | 128 | 7 | 512 | 2045.201 | 6501.447 | 3.17887924
>> RotateBenchmark.testRotateRightI | 128 | 15 | 256 | 4111.736 | 13318.124 | 3.23905134
>> RotateBenchmark.testRotateRightI | 128 | 15 | 512 | 2055.355 | 6497.289 | 3.161151723
>> RotateBenchmark.testRotateRightI | 128 | 31 | 256 | 4109.353 | 13073.3 | 3.181352393
>> RotateBenchmark.testRotateRightI | 128 | 31 | 512 | 2055.431 | 6463.902 | 3.14479153
>> RotateBenchmark.testRotateRightI | 256 | 7 | 256 | 7804.976 | 24585.962 | 3.150036848
>> RotateBenchmark.testRotateRightI | 256 | 7 | 512 | 3815.818 | 11985.145 | 3.140911071
>> RotateBenchmark.testRotateRightI | 256 | 15 | 256 | 7644.977 | 25863.841 | 3.383115606
>> RotateBenchmark.testRotateRightI | 256 | 15 | 512 | 3822.508 | 12280.58 | 3.212702236
>> RotateBenchmark.testRotateRightI | 256 | 31 | 256 | 7709.635 | 25655.108 | 3.327668301
>> RotateBenchmark.testRotateRightI | 256 | 31 | 512 | 3801.5 | 12271.65 | 3.228107326
>> RotateBenchmark.testRotateRightI | 512 | 7 | 256 | 12223.711 | 31239.788 | 2.555671351
>> RotateBenchmark.testRotateRightI | 512 | 7 | 512 | 5973.571 | 16740.852 | 2.802486486
>> RotateBenchmark.testRotateRightI | 512 | 15 | 256 | 12205.47 | 31248.025 | 2.560165647
>> RotateBenchmark.testRotateRightI | 512 | 15 | 512 | 5966.513 | 15728.168 | 2.6360737
>> RotateBenchmark.testRotateRightI | 512 | 31 | 256 | 12209.405 | 33181.105 | 2.71766765
>> RotateBenchmark.testRotateRightI | 512 | 31 | 512 | 5981.527 | 15727.496 | 2.629344647
>> RotateBenchmark.testRotateRightL | 128 | 7 | 256 | 2054.509 | 6980.849 | 3.397818652
>> RotateBenchmark.testRotateRightL | 128 | 7 | 512 | 997.375 | 3242.374 | 3.250907633
>> RotateBenchmark.testRotateRightL | 128 | 15 | 256 | 2051.459 | 6892.389 | 3.359749817
>> RotateBenchmark.testRotateRightL | 128 | 15 | 512 | 1002.906 | 3223.342 | 3.21400211
>> RotateBenchmark.testRotateRightL | 128 | 31 | 256 | 2044.749 | 6984.157 | 3.415654929
>> RotateBenchmark.testRotateRightL | 128 | 31 | 512 | 1004.273 | 3237.496 | 3.22372104
>> RotateBenchmark.testRotateRightL | 256 | 7 | 256 | 3811.551 | 13347.75 | 3.501920872
>> RotateBenchmark.testRotateRightL | 256 | 7 | 512 | 1892.883 | 5840.85 | 3.085689924
>> RotateBenchmark.testRotateRightL | 256 | 15 | 256 | 3821.705 | 14034.823 | 3.672398314
>> RotateBenchmark.testRotateRightL | 256 | 15 | 512 | 1799.193 | 5817.533 | 3.233412424
>> RotateBenchmark.testRotateRightL | 256 | 31 | 256 | 3816.666 | 14022.31 | 3.673968327
>> RotateBenchmark.testRotateRightL | 256 | 31 | 512 | 1796.649 | 5824.13 | 3.241662673
>> RotateBenchmark.testRotateRightL | 512 | 7 | 256 | 5943.986 | 15586.254 | 2.622188881
>> RotateBenchmark.testRotateRightL | 512 | 7 | 512 | 3022.686 | 7662.241 | 2.534911334
>> RotateBenchmark.testRotateRightL | 512 | 15 | 256 | 5958.008 | 15726.859 | 2.639616966
>> RotateBenchmark.testRotateRightL | 512 | 15 | 512 | 2998.469 | 7654.703 | 2.552870482
>> RotateBenchmark.testRotateRightL | 512 | 31 | 256 | 5937.491 | 15741.207 | 2.651154671
>> RotateBenchmark.testRotateRightL | 512 | 31 | 512 | 3014.699 | 7656.837 | 2.539834657
>> RotateBenchmark.testRotateRightS | 128 | 7 | 256 | 8172.896 | 8003.474 | 0.979270261
>> RotateBenchmark.testRotateRightS | 128 | 7 | 512 | 4111.074 | 4047.267 | 0.984479238
>> RotateBenchmark.testRotateRightS | 128 | 15 | 256 | 8225.79 | 8040.421 | 0.9774649
>> RotateBenchmark.testRotateRightS | 128 | 15 | 512 | 4129.801 | 4011.919 | 0.971455767
>> RotateBenchmark.testRotateRightS | 128 | 31 | 256 | 8176.102 | 8052.686 | 0.984905276
>> RotateBenchmark.testRotateRightS | 128 | 31 | 512 | 4117.735 | 4046.522 | 0.982705784
>> RotateBenchmark.testRotateRightS | 256 | 7 | 256 | 15213.617 | 15169.51 | 0.997100821
>> RotateBenchmark.testRotateRightS | 256 | 7 | 512 | 7530.289 | 7625.581 | 1.012654494
>> RotateBenchmark.testRotateRightS | 256 | 15 | 256 | 15238.384 | 15069.978 | 0.988948566
>> RotateBenchmark.testRotateRightS | 256 | 15 | 512 | 7275.098 | 7620.764 | 1.047513587
>> RotateBenchmark.testRotateRightS | 256 | 31 | 256 | 15299.821 | 15043.765 | 0.983264118
>> RotateBenchmark.testRotateRightS | 256 | 31 | 512 | 7273.028 | 7630.97 | 1.04921499
>> RotateBenchmark.testRotateRightS | 512 | 7 | 256 | 23998.152 | 23920.046 | 0.996745333
>> RotateBenchmark.testRotateRightS | 512 | 7 | 512 | 11582.679 | 11916.382 | 1.02881052
>> RotateBenchmark.testRotateRightS | 512 | 15 | 256 | 23982.797 | 23434.756 | 0.977148579
>> RotateBenchmark.testRotateRightS | 512 | 15 | 512 | 11629.806 | 11918.759 | 1.0248459
>> RotateBenchmark.testRotateRightS | 512 | 31 | 256 | 23988.549 | 23475.629 | 0.978618132
>> RotateBenchmark.testRotateRightS | 512 | 31 | 512 | 11650.146 | 11916.47 | 1.022860143
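In the table above, Gain appears to be the With Opts score divided by the Baseline score. Both are throughput figures (ops/ms), so values above 1.0 are speedups and values below 1.0 are slowdowns. For example, 13461.814 / 4043.39 ≈ 3.33 for `testRotateLeftI` at 128 bits, shift 7, size 256. Here is a small hypothetical helper that recomputes the column for two of the rows:

```java
// Hypothetical helper: recompute the Gain column from the two score columns.
public final class GainCheck {

    // Both scores are throughput in ops/ms, so higher is better and gain > 1.0 is a speedup.
    static double gain(double baselineOpsPerMs, double optimizedOpsPerMs) {
        return optimizedOpsPerMs / baselineOpsPerMs;
    }

    public static void main(String[] args) {
        // testRotateLeftI, bits=128, shift=7, size=256
        double intRow = gain(4043.39, 13461.814);
        // testRotateLeftB, bits=128, shift=7, size=256
        double byteRow = gain(3939.136, 3836.133);
        System.out.printf("int: %.4f (%s)%n", intRow, intRow >= 1.0 ? "speedup" : "slowdown");
        System.out.printf("byte: %.4f (%s)%n", byteRow, byteRow >= 1.0 ? "speedup" : "slowdown");
    }
}
```

Read this way, the int and long rows show roughly 2.5x to 3.7x improvements, while the byte and short rows stay within a few percent of baseline.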
>>
>> PS: This is a follow-up patch for JDK-8266054, which was backed out due to tier2 regressions.
>
> `compiler/c2/cr6340864/TestLongVectRotate.java` still fails on AArch64 with the same assert.
Hi @vnkozlov, @PaulSandoz,
Could you please run this through your test infrastructure and share the results?
Also, the reported test case only covers scalar (constant / non-constant) rotate shift counts.
**test/hotspot/jtreg/compiler/c2/cr6340864/TestIntVectRotate.java**
The following case results in a crash on JDK-17:
for (int i = 0; i < LEN; i++) {
    a[i] = Integer.rotateLeft(b[i], c[i]);
}
where a, b and c are all int arrays.
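To contrast the scalar-count cases the existing test covers with the per-element case above, here is a minimal stand-alone harness with all three loop shapes. It is a sketch, not the jtreg test: the class name, `LEN`, the array contents and the iteration count are assumptions. Whether the per-element loop actually reaches the assert depends on the JDK build and on whether the target supports variable vector shifts.

```java
// Hypothetical harness; LEN, the array contents and the iteration count are assumptions.
public final class RotateLoopShapes {
    static final int LEN = 1024;

    // Constant rotate distance.
    static void constantCount(int[] a, int[] b) {
        for (int i = 0; i < LEN; i++) {
            a[i] = Integer.rotateLeft(b[i], 7);
        }
    }

    // Non-constant but loop-invariant distance: one scalar value for every element.
    static void invariantCount(int[] a, int[] b, int s) {
        for (int i = 0; i < LEN; i++) {
            a[i] = Integer.rotateLeft(b[i], s);
        }
    }

    // Distance loaded from an array: each element may rotate by a different amount.
    static void perElementCount(int[] a, int[] b, int[] c) {
        for (int i = 0; i < LEN; i++) {
            a[i] = Integer.rotateLeft(b[i], c[i]);
        }
    }

    public static void main(String[] args) {
        int[] a = new int[LEN];
        int[] b = new int[LEN];
        int[] c = new int[LEN];
        for (int i = 0; i < LEN; i++) {
            b[i] = i * 0x9E3779B9; // arbitrary bit patterns
            c[i] = i;              // distance varies per element
        }
        // Call repeatedly so the loops get hot enough for C2 to compile them.
        for (int iter = 0; iter < 10_000; iter++) {
            constantCount(a, b);
            invariantCount(a, b, iter & 31);
            perElementCount(a, b, c);
        }
        // a[] now holds the per-element result; check it against the scalar definition.
        for (int i = 0; i < LEN; i++) {
            if (a[i] != Integer.rotateLeft(b[i], c[i])) {
                throw new AssertionError("wrong result at index " + i);
            }
        }
        System.out.println("ok");
    }
}
```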
The current patch fixes this issue in its entirety.
I also have a restrictive fix for JDK-17 that prevents SLP from vectorizing rotate operations if the target platform does not support variable shifts. It avoids this bug with a minimal set of changes, which matters since JDK-17 is close to RC1:
[https://github.com/openjdk/jdk/pull/4924/files#diff-692826251cae892bc4737919579c6afbd317551cd507f99c7bd29d585c1282e2R312](https://github.com/openjdk/jdk/pull/4924/files#diff-692826251cae892bc4737919579c6afbd317551cd507f99c7bd29d585c1282e2R312)
Kindly let me know your thoughts on this.
Thanks
-------------
PR: https://git.openjdk.java.net/jdk/pull/4924