RFR: 8266054: VectorAPI rotate operation optimization [v13]

Sandhya Viswanathan sviswanathan at openjdk.java.net
Tue Jul 27 04:07:37 UTC 2021


On Tue, 20 Jul 2021 09:57:07 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:

>> The current Vector API Java-side implementation expresses the rotateLeft and rotateRight operations using the following operations:
>> 
>>     vec1 = lanewise(VectorOperators.LSHL, n)
>>     vec2 = lanewise(VectorOperators.LSHR, n)
>>     res = lanewise(VectorOperators.OR, vec1, vec2)
>> 
>> This patch moves the above handling from the Java side to the C2 compiler, which makes it possible to dismantle the rotate operation when the target ISA does not support a direct rotate instruction.
>> 
>> AVX512 added the vector rotate instructions vpro[rl][v][dq], which operate on long and integer type vectors. For other cases (i.e. sub-word type vectors, or targets that do not support direct rotate operations), an instruction sequence comprising vector SHIFT (LEFT/RIGHT) and vector OR is emitted.
>> 
>> Please find below the performance data for included JMH benchmark.
>> Machine:  Cascade Lake Server (Intel(R) Xeon(R) Platinum 8280 CPU @ 2.70GHz)
>> 
>> 
>> Benchmark | (bits) | (shift) | (size) | Baseline Score (ops/ms) | With Opts (ops/ms) | Gain (With Opts / Baseline)
>> -- | -- | -- | -- | -- | -- | --
>> RotateBenchmark.testRotateLeftB | 128 | 7 | 256 | 3939.136 | 3836.133 | 0.973851372
>> RotateBenchmark.testRotateLeftB | 128 | 7 | 512 | 1984.231 | 1918.27 | 0.966757399
>> RotateBenchmark.testRotateLeftB | 128 | 15 | 256 | 3925.165 | 4043.842 | 1.030234907
>> RotateBenchmark.testRotateLeftB | 128 | 15 | 512 | 1962.723 | 1936.551 | 0.986665464
>> RotateBenchmark.testRotateLeftB | 128 | 31 | 256 | 3945.6 | 3817.883 | 0.967630525
>> RotateBenchmark.testRotateLeftB | 128 | 31 | 512 | 1944.458 | 1914.229 | 0.984453766
>> RotateBenchmark.testRotateLeftB | 256 | 7 | 256 | 4612.149 | 4514.874 | 0.978908964
>> RotateBenchmark.testRotateLeftB | 256 | 7 | 512 | 2296.252 | 2270.237 | 0.988670669
>> RotateBenchmark.testRotateLeftB | 256 | 15 | 256 | 4576.628 | 4515.53 | 0.986649996
>> RotateBenchmark.testRotateLeftB | 256 | 15 | 512 | 2288.278 | 2270.923 | 0.992415694
>> RotateBenchmark.testRotateLeftB | 256 | 31 | 256 | 4624.243 | 4511.46 | 0.975610495
>> RotateBenchmark.testRotateLeftB | 256 | 31 | 512 | 2305.459 | 2273.788 | 0.986262605
>> RotateBenchmark.testRotateLeftB | 512 | 7 | 256 | 7748.283 | 7777.105 | 1.003719792
>> RotateBenchmark.testRotateLeftB | 512 | 7 | 512 | 3906.214 | 3912.647 | 1.001646863
>> RotateBenchmark.testRotateLeftB | 512 | 15 | 256 | 7764.653 | 7763.482 | 0.999849188
>> RotateBenchmark.testRotateLeftB | 512 | 15 | 512 | 3916.061 | 3919.363 | 1.000843194
>> RotateBenchmark.testRotateLeftB | 512 | 31 | 256 | 7779.754 | 7770.239 | 0.998776954
>> RotateBenchmark.testRotateLeftB | 512 | 31 | 512 | 3916.471 | 3912.718 | 0.999041739
>> RotateBenchmark.testRotateLeftI | 128 | 7 | 256 | 4043.39 | 13461.814 | 3.329338501
>> RotateBenchmark.testRotateLeftI | 128 | 7 | 512 | 1996.217 | 6455.425 | 3.233829288
>> RotateBenchmark.testRotateLeftI | 128 | 15 | 256 | 4028.614 | 13077.277 | 3.246098286
>> RotateBenchmark.testRotateLeftI | 128 | 15 | 512 | 1997.612 | 6452.918 | 3.230315997
>> RotateBenchmark.testRotateLeftI | 128 | 31 | 256 | 4123.357 | 13079.045 | 3.171940969
>> RotateBenchmark.testRotateLeftI | 128 | 31 | 512 | 2003.356 | 6452.716 | 3.22095324
>> RotateBenchmark.testRotateLeftI | 256 | 7 | 256 | 7666.949 | 25658.625 | 3.34665393
>> RotateBenchmark.testRotateLeftI | 256 | 7 | 512 | 3855.826 | 12278.106 | 3.18429981
>> RotateBenchmark.testRotateLeftI | 256 | 15 | 256 | 7670.901 | 24625.466 | 3.210244272
>> RotateBenchmark.testRotateLeftI | 256 | 15 | 512 | 3765.786 | 12272.771 | 3.259019764
>> RotateBenchmark.testRotateLeftI | 256 | 31 | 256 | 7660.599 | 25678.864 | 3.352069988
>> RotateBenchmark.testRotateLeftI | 256 | 31 | 512 | 3773.401 | 12006.469 | 3.181869353
>> RotateBenchmark.testRotateLeftI | 512 | 7 | 256 | 11900.948 | 31242.989 | 2.625252123
>> RotateBenchmark.testRotateLeftI | 512 | 7 | 512 | 5830.878 | 15727.149 | 2.697217983
>> RotateBenchmark.testRotateLeftI | 512 | 15 | 256 | 12171.847 | 33180.067 | 2.72596813
>> RotateBenchmark.testRotateLeftI | 512 | 15 | 512 | 5830.544 | 16740.182 | 2.871118372
>> RotateBenchmark.testRotateLeftI | 512 | 31 | 256 | 11909.553 | 31250.882 | 2.624018047
>> RotateBenchmark.testRotateLeftI | 512 | 31 | 512 | 5846.747 | 15738.831 | 2.691895339
>> RotateBenchmark.testRotateLeftL | 128 | 7 | 256 | 2047.243 | 6888.484 | 3.364761291
>> RotateBenchmark.testRotateLeftL | 128 | 7 | 512 | 1005.029 | 3245.931 | 3.229688895
>> RotateBenchmark.testRotateLeftL | 128 | 15 | 256 | 1996.921 | 6985.256 | 3.498013191
>> RotateBenchmark.testRotateLeftL | 128 | 15 | 512 | 986.906 | 3217.778 | 3.260470602
>> RotateBenchmark.testRotateLeftL | 128 | 31 | 256 | 1999.06 | 6977.672 | 3.490476524
>> RotateBenchmark.testRotateLeftL | 128 | 31 | 512 | 987.258 | 3236.63 | 3.278403416
>> RotateBenchmark.testRotateLeftL | 256 | 7 | 256 | 3752.412 | 12995.954 | 3.4633601
>> RotateBenchmark.testRotateLeftL | 256 | 7 | 512 | 1824.093 | 5809.576 | 3.184912173
>> RotateBenchmark.testRotateLeftL | 256 | 15 | 256 | 3759.99 | 13262.631 | 3.52730486
>> RotateBenchmark.testRotateLeftL | 256 | 15 | 512 | 1823.393 | 5803.872 | 3.183006626
>> RotateBenchmark.testRotateLeftL | 256 | 31 | 256 | 3757.134 | 13284.633 | 3.535842214
>> RotateBenchmark.testRotateLeftL | 256 | 31 | 512 | 1822.192 | 5824.178 | 3.196248255
>> RotateBenchmark.testRotateLeftL | 512 | 7 | 256 | 5794.005 | 15567.753 | 2.686872552
>> RotateBenchmark.testRotateLeftL | 512 | 7 | 512 | 2969.393 | 7694.79 | 2.591368
>> RotateBenchmark.testRotateLeftL | 512 | 15 | 256 | 5817.292 | 15726.597 | 2.703422314
>> RotateBenchmark.testRotateLeftL | 512 | 15 | 512 | 2944.655 | 7664.954 | 2.603005785
>> RotateBenchmark.testRotateLeftL | 512 | 31 | 256 | 5822.131 | 16718.64 | 2.871567129
>> RotateBenchmark.testRotateLeftL | 512 | 31 | 512 | 2944.763 | 7657.814 | 2.600485676
>> RotateBenchmark.testRotateLeftS | 128 | 7 | 256 | 8006.155 | 7976.701 | 0.99632108
>> RotateBenchmark.testRotateLeftS | 128 | 7 | 512 | 4031.753 | 4003.43 | 0.992975016
>> RotateBenchmark.testRotateLeftS | 128 | 15 | 256 | 8003.879 | 7952.752 | 0.993612222
>> RotateBenchmark.testRotateLeftS | 128 | 15 | 512 | 4026.359 | 4014.757 | 0.997118488
>> RotateBenchmark.testRotateLeftS | 128 | 31 | 256 | 8000.842 | 7995.733 | 0.999361442
>> RotateBenchmark.testRotateLeftS | 128 | 31 | 512 | 4044.421 | 4007.426 | 0.990852832
>> RotateBenchmark.testRotateLeftS | 256 | 7 | 256 | 15078.471 | 15034.395 | 0.997076892
>> RotateBenchmark.testRotateLeftS | 256 | 7 | 512 | 7236.509 | 7620.334 | 1.053040078
>> RotateBenchmark.testRotateLeftS | 256 | 15 | 256 | 15093.661 | 15024.17 | 0.995396014
>> RotateBenchmark.testRotateLeftS | 256 | 15 | 512 | 7308.568 | 7724.381 | 1.056893909
>> RotateBenchmark.testRotateLeftS | 256 | 31 | 256 | 15332.233 | 15432.113 | 1.006514381
>> RotateBenchmark.testRotateLeftS | 256 | 31 | 512 | 7317.18 | 7626.679 | 1.042297579
>> RotateBenchmark.testRotateLeftS | 512 | 7 | 256 | 24079.012 | 23939.263 | 0.994196232
>> RotateBenchmark.testRotateLeftS | 512 | 7 | 512 | 11441.41 | 11921.21 | 1.041935391
>> RotateBenchmark.testRotateLeftS | 512 | 15 | 256 | 23563.675 | 23590.959 | 1.001157884
>> RotateBenchmark.testRotateLeftS | 512 | 15 | 512 | 11418.634 | 11949.391 | 1.046481654
>> RotateBenchmark.testRotateLeftS | 512 | 31 | 256 | 24035.69 | 23595.385 | 0.9816812
>> RotateBenchmark.testRotateLeftS | 512 | 31 | 512 | 11668.091 | 11899.536 | 1.019835721
>> RotateBenchmark.testRotateRightB | 128 | 7 | 256 | 3852.421 | 3816.521 | 0.990681185
>> RotateBenchmark.testRotateRightB | 128 | 7 | 512 | 1956.766 | 1923.638 | 0.983070025
>> RotateBenchmark.testRotateRightB | 128 | 15 | 256 | 3899.136 | 4038.945 | 1.035856405
>> RotateBenchmark.testRotateRightB | 128 | 15 | 512 | 1957.733 | 2030.973 | 1.037410617
>> RotateBenchmark.testRotateRightB | 128 | 31 | 256 | 3902.5 | 4043.736 | 1.03619116
>> RotateBenchmark.testRotateRightB | 128 | 31 | 512 | 1957.728 | 1920.434 | 0.980950367
>> RotateBenchmark.testRotateRightB | 256 | 7 | 256 | 4565.887 | 4515.083 | 0.988873137
>> RotateBenchmark.testRotateRightB | 256 | 7 | 512 | 2300.057 | 2278.065 | 0.990438498
>> RotateBenchmark.testRotateRightB | 256 | 15 | 256 | 4570.754 | 4527.692 | 0.990578797
>> RotateBenchmark.testRotateRightB | 256 | 15 | 512 | 2300.524 | 2268.659 | 0.986148808
>> RotateBenchmark.testRotateRightB | 256 | 31 | 256 | 4577.569 | 4513.29 | 0.98595783
>> RotateBenchmark.testRotateRightB | 256 | 31 | 512 | 2304.335 | 2273.178 | 0.986478962
>> RotateBenchmark.testRotateRightB | 512 | 7 | 256 | 7772.483 | 7842.671 | 1.009030319
>> RotateBenchmark.testRotateRightB | 512 | 7 | 512 | 3907.265 | 3917.325 | 1.002574691
>> RotateBenchmark.testRotateRightB | 512 | 15 | 256 | 7855.653 | 7865.25 | 1.001221668
>> RotateBenchmark.testRotateRightB | 512 | 15 | 512 | 3909.845 | 3976.813 | 1.017128045
>> RotateBenchmark.testRotateRightB | 512 | 31 | 256 | 7746.765 | 7870.159 | 1.015928455
>> RotateBenchmark.testRotateRightB | 512 | 31 | 512 | 3919.596 | 3981.934 | 1.01590419
>> RotateBenchmark.testRotateRightI | 128 | 7 | 256 | 4125.151 | 13056.878 | 3.165187893
>> RotateBenchmark.testRotateRightI | 128 | 7 | 512 | 2045.201 | 6501.447 | 3.17887924
>> RotateBenchmark.testRotateRightI | 128 | 15 | 256 | 4111.736 | 13318.124 | 3.23905134
>> RotateBenchmark.testRotateRightI | 128 | 15 | 512 | 2055.355 | 6497.289 | 3.161151723
>> RotateBenchmark.testRotateRightI | 128 | 31 | 256 | 4109.353 | 13073.3 | 3.181352393
>> RotateBenchmark.testRotateRightI | 128 | 31 | 512 | 2055.431 | 6463.902 | 3.14479153
>> RotateBenchmark.testRotateRightI | 256 | 7 | 256 | 7804.976 | 24585.962 | 3.150036848
>> RotateBenchmark.testRotateRightI | 256 | 7 | 512 | 3815.818 | 11985.145 | 3.140911071
>> RotateBenchmark.testRotateRightI | 256 | 15 | 256 | 7644.977 | 25863.841 | 3.383115606
>> RotateBenchmark.testRotateRightI | 256 | 15 | 512 | 3822.508 | 12280.58 | 3.212702236
>> RotateBenchmark.testRotateRightI | 256 | 31 | 256 | 7709.635 | 25655.108 | 3.327668301
>> RotateBenchmark.testRotateRightI | 256 | 31 | 512 | 3801.5 | 12271.65 | 3.228107326
>> RotateBenchmark.testRotateRightI | 512 | 7 | 256 | 12223.711 | 31239.788 | 2.555671351
>> RotateBenchmark.testRotateRightI | 512 | 7 | 512 | 5973.571 | 16740.852 | 2.802486486
>> RotateBenchmark.testRotateRightI | 512 | 15 | 256 | 12205.47 | 31248.025 | 2.560165647
>> RotateBenchmark.testRotateRightI | 512 | 15 | 512 | 5966.513 | 15728.168 | 2.6360737
>> RotateBenchmark.testRotateRightI | 512 | 31 | 256 | 12209.405 | 33181.105 | 2.71766765
>> RotateBenchmark.testRotateRightI | 512 | 31 | 512 | 5981.527 | 15727.496 | 2.629344647
>> RotateBenchmark.testRotateRightL | 128 | 7 | 256 | 2054.509 | 6980.849 | 3.397818652
>> RotateBenchmark.testRotateRightL | 128 | 7 | 512 | 997.375 | 3242.374 | 3.250907633
>> RotateBenchmark.testRotateRightL | 128 | 15 | 256 | 2051.459 | 6892.389 | 3.359749817
>> RotateBenchmark.testRotateRightL | 128 | 15 | 512 | 1002.906 | 3223.342 | 3.21400211
>> RotateBenchmark.testRotateRightL | 128 | 31 | 256 | 2044.749 | 6984.157 | 3.415654929
>> RotateBenchmark.testRotateRightL | 128 | 31 | 512 | 1004.273 | 3237.496 | 3.22372104
>> RotateBenchmark.testRotateRightL | 256 | 7 | 256 | 3811.551 | 13347.75 | 3.501920872
>> RotateBenchmark.testRotateRightL | 256 | 7 | 512 | 1892.883 | 5840.85 | 3.085689924
>> RotateBenchmark.testRotateRightL | 256 | 15 | 256 | 3821.705 | 14034.823 | 3.672398314
>> RotateBenchmark.testRotateRightL | 256 | 15 | 512 | 1799.193 | 5817.533 | 3.233412424
>> RotateBenchmark.testRotateRightL | 256 | 31 | 256 | 3816.666 | 14022.31 | 3.673968327
>> RotateBenchmark.testRotateRightL | 256 | 31 | 512 | 1796.649 | 5824.13 | 3.241662673
>> RotateBenchmark.testRotateRightL | 512 | 7 | 256 | 5943.986 | 15586.254 | 2.622188881
>> RotateBenchmark.testRotateRightL | 512 | 7 | 512 | 3022.686 | 7662.241 | 2.534911334
>> RotateBenchmark.testRotateRightL | 512 | 15 | 256 | 5958.008 | 15726.859 | 2.639616966
>> RotateBenchmark.testRotateRightL | 512 | 15 | 512 | 2998.469 | 7654.703 | 2.552870482
>> RotateBenchmark.testRotateRightL | 512 | 31 | 256 | 5937.491 | 15741.207 | 2.651154671
>> RotateBenchmark.testRotateRightL | 512 | 31 | 512 | 3014.699 | 7656.837 | 2.539834657
>> RotateBenchmark.testRotateRightS | 128 | 7 | 256 | 8172.896 | 8003.474 | 0.979270261
>> RotateBenchmark.testRotateRightS | 128 | 7 | 512 | 4111.074 | 4047.267 | 0.984479238
>> RotateBenchmark.testRotateRightS | 128 | 15 | 256 | 8225.79 | 8040.421 | 0.9774649
>> RotateBenchmark.testRotateRightS | 128 | 15 | 512 | 4129.801 | 4011.919 | 0.971455767
>> RotateBenchmark.testRotateRightS | 128 | 31 | 256 | 8176.102 | 8052.686 | 0.984905276
>> RotateBenchmark.testRotateRightS | 128 | 31 | 512 | 4117.735 | 4046.522 | 0.982705784
>> RotateBenchmark.testRotateRightS | 256 | 7 | 256 | 15213.617 | 15169.51 | 0.997100821
>> RotateBenchmark.testRotateRightS | 256 | 7 | 512 | 7530.289 | 7625.581 | 1.012654494
>> RotateBenchmark.testRotateRightS | 256 | 15 | 256 | 15238.384 | 15069.978 | 0.988948566
>> RotateBenchmark.testRotateRightS | 256 | 15 | 512 | 7275.098 | 7620.764 | 1.047513587
>> RotateBenchmark.testRotateRightS | 256 | 31 | 256 | 15299.821 | 15043.765 | 0.983264118
>> RotateBenchmark.testRotateRightS | 256 | 31 | 512 | 7273.028 | 7630.97 | 1.04921499
>> RotateBenchmark.testRotateRightS | 512 | 7 | 256 | 23998.152 | 23920.046 | 0.996745333
>> RotateBenchmark.testRotateRightS | 512 | 7 | 512 | 11582.679 | 11916.382 | 1.02881052
>> RotateBenchmark.testRotateRightS | 512 | 15 | 256 | 23982.797 | 23434.756 | 0.977148579
>> RotateBenchmark.testRotateRightS | 512 | 15 | 512 | 11629.806 | 11918.759 | 1.0248459
>> RotateBenchmark.testRotateRightS | 512 | 31 | 256 | 23988.549 | 23475.629 | 0.978618132
>> RotateBenchmark.testRotateRightS | 512 | 31 | 512 | 11650.146 | 11916.47 | 1.022860143
>> 
>> 
>> 
>
> Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 19 commits:
> 
>  - 8266054: Re-designing benchmark to remove noise.
>  - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8266054
>  - 8266054: Formal argument name change to be more appropriate.
>  - 8266054: Review comments resolution.
>  - 8266054: Incorporating styling changes based on reviews.
>  - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8266054
>  - Merge http://github.com/openjdk/jdk into JDK-8266054
>  - Merge http://github.com/openjdk/jdk into JDK-8266054
>  - Merge http://github.com/openjdk/jdk into JDK-8266054
>  - Merge branch 'JDK-8266054' of http://github.com/jatin-bhateja/jdk into JDK-8266054
>  - ... and 9 more: https://git.openjdk.java.net/jdk/compare/a8f15427...b20404e2

src/hotspot/share/opto/vectorIntrinsics.cpp line 1598:

> 1596:       cnt = elem_bt == T_LONG ? gvn().transform(new ConvI2LNode(cnt)) : cnt;
> 1597:       opd2 = gvn().transform(VectorNode::scalar2vector(cnt, num_elem, type_bt));
> 1598:     } else {

Why is the conversion done only for T_LONG and not for T_BYTE and T_SHORT? Is there an assumption here that only T_INT and T_LONG elem_bt are supported?
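
To make the question concrete, here is a rough Java-level analogy (not the C2 code itself): broadcasting an int rotate count into lanes of a different width needs some form of conversion. The class and method names below are hypothetical and exist only for illustration; the long case loosely mirrors the ConvI2L in the quoted snippet, and the byte case shows the narrowing I am asking about.

```java
import java.util.Arrays;

public class RotateCountBroadcast {
    // Hypothetical stand-in for broadcasting a scalar count into long lanes.
    // The int count is widened first, loosely analogous to ConvI2L above.
    static long[] toLongLanes(int cnt, int numElem) {
        long[] lanes = new long[numElem];
        Arrays.fill(lanes, (long) cnt);
        return lanes;
    }

    // Hypothetical stand-in for byte lanes: the count is narrowed instead.
    // Whether C2 needs an explicit step for this is the open question.
    static byte[] toByteLanes(int cnt, int numElem) {
        byte[] lanes = new byte[numElem];
        Arrays.fill(lanes, (byte) cnt);
        return lanes;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(toLongLanes(7, 4)));
        System.out.println(Arrays.toString(toByteLanes(7, 4)));
    }
}
```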

src/hotspot/share/opto/vectornode.cpp line 1199:

> 1197:                                              (Node*)(phase->intcon(shift_mask + 1));
> 1198:     Node* vector_mask = phase->transform(VectorNode::scalar2vector(shift_mask_node,vlen, elem_ty));
> 1199:     int subVopc = VectorNode::opcode((bt == T_LONG) ? Op_SubL : Op_SubI, bt);

There seems to be an assumption here that the vector type is only INT or LONG and never a subword type. The Vector API can produce subword types as well.
Also, if this path is reached from the auto-vectorizer, don't we need masking here?
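
As a scalar illustration of the concern (a minimal sketch in plain Java, not the C2 lowering), a shift/OR rotate on subword lanes only gives the right answer when the count is masked to the lane width and the value is zero-extended before the right shift. The helper names are hypothetical. Note that the benchmark uses shift values of 15 and 31, which exceed the 8-bit lane width of the byte tests.

```java
public class SubwordRotate {
    // Rotate an 8-bit lane left by n using shift and OR.
    static byte rotateLeftByte(byte x, int n) {
        int s = n & 7;     // mask the count to the lane width (8 bits)
        int v = x & 0xFF;  // zero-extend so the right shift behaves as a logical shift
        return (byte) ((v << s) | (v >>> (8 - s)));
    }

    // Same idea for a 16-bit lane.
    static short rotateLeftShort(short x, int n) {
        int s = n & 15;
        int v = x & 0xFFFF;
        return (short) ((v << s) | (v >>> (16 - s)));
    }

    // For int lanes, Java's shift operators already mask the count with 31,
    // so no explicit masking is needed at the language level.
    static int rotateLeftInt(int x, int n) {
        return (x << n) | (x >>> -n);
    }

    public static void main(String[] args) {
        System.out.println(rotateLeftByte((byte) 0x81, 9));
        System.out.println(rotateLeftShort((short) 0x8001, 17));
        System.out.println(rotateLeftInt(0x80000001, 33) == Integer.rotateLeft(0x80000001, 33));
    }
}
```

Without the `n & 7` and `n & 15` steps, a count such as 15 on a byte lane would shift bits out of the lane instead of wrapping them around, which is why I would expect equivalent masking in the vector shift/OR sequence.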

-------------

PR: https://git.openjdk.java.net/jdk/pull/3720

