RFR: 8314124: RISC-V: implement Base64 intrinsic - decoding

Hamlin Li mli at openjdk.org
Tue Aug 20 18:31:12 UTC 2024


On Thu, 4 Jul 2024 10:09:41 GMT, Hamlin Li <mli at openjdk.org> wrote:

> ## Performance
> benchmarks run on CanMV-K230
> 
> data
> 
> Benchmark m2+m1+scalar | (addSpecial) | (errorIndex) | (lineSize) | (maxNumBytes) | Mode | Cnt | Score +intrinsic+rvv | Score -intrinsic | Error | Units | Improvement
> -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --
> Base64Decode.testBase64Decode | 0 | 144 | 4 | 1 | avgt | 10 | 97.771 | 98.506 | 0.713 | ns/op | 1.008
> Base64Decode.testBase64Decode | 0 | 144 | 4 | 3 | avgt | 10 | 117.715 | 118.422 | 0.428 | ns/op | 1.006
> Base64Decode.testBase64Decode | 0 | 144 | 4 | 7 | avgt | 10 | 174.625 | 172.767 | 7.671 | ns/op | 0.989
> Base64Decode.testBase64Decode | 0 | 144 | 4 | 32 | avgt | 10 | 286.391 | 317.175 | 11.443 | ns/op | 1.107
> Base64Decode.testBase64Decode | 0 | 144 | 4 | 64 | avgt | 10 | 336.932 | 503.257 | 15.738 | ns/op | 1.494
> Base64Decode.testBase64Decode | 0 | 144 | 4 | 80 | avgt | 10 | 418.894 | 625.485 | 7.21 | ns/op | 1.493
> Base64Decode.testBase64Decode | 0 | 144 | 4 | 96 | avgt | 10 | 353.813 | 698.67 | 15.485 | ns/op | 1.975
> Base64Decode.testBase64Decode | 0 | 144 | 4 | 112 | avgt | 10 | 499.243 | 866.909 | 4.427 | ns/op | 1.736
> Base64Decode.testBase64Decode | 0 | 144 | 4 | 512 | avgt | 10 | 1451.277 | 3530.048 | 3.685 | ns/op | 2.432
> Base64Decode.testBase64Decode | 0 | 144 | 4 | 1000 | avgt | 10 | 2258.785 | 5964.066 | 9.075 | ns/op | 2.64
> Base64Decode.testBase64Decode | 0 | 144 | 4 | 20000 | avgt | 10 | 39689.204 | 122334.929 | 255.195 | ns/op | 3.082
> Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 1 | avgt | 10 | 187.032 | 158.558 | 7.606 | ns/op | 0.848
> Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 3 | avgt | 10 | 209.558 | 200.774 | 7.648 | ns/op | 0.958
> Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 7 | avgt | 10 | 556.696 | 505.072 | 8.748 | ns/op | 0.907
> Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 32 | avgt | 10 | 2139.767 | 1876.825 | 13.787 | ns/op | 0.877
> Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 64 | avgt | 10 | 6142.353 | 3818.199 | 35.622 | ns/op | 0.622
> Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 80 | avgt | 10 | 8746.205 | 4787.155 | 109.819 | ns/op | 0.547
> Base64Decode.testBase64MIMEDecode | 0 | ...

Continuing the discussion from https://github.com/openjdk/jdk/pull/19973#issuecomment-2210907011.

The vrgroup implementation brings a regression compared with the current implementation in this PR for large data sizes. (vrgroup also regresses for small data sizes, but that regression can be ignored: the current implementation uses the scalar version when the data size is small, so it is expected.)
An implementation with vrgroup is at https://github.com/openjdk/jdk/compare/master...Hamlin-Li:jdk:baes64-decode-vrgroup?expand=1

Comparison between this implementation and the vrgroup implementation:
<google-sheets-html-origin>
Benchmark +/- vrgroup | (addSpecial) | (errorIndex) | (lineSize) | (maxNumBytes) | Mode | Cnt | Score +intrinsic+rvv+vrgroup | Score +intrinsic+rvv-vrgroup | Error | Units | Improvement of vrgroup
-- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --
Base64Decode.testBase64Decode | 0 | 144 | 4 | 1 | avgt | 10 | 101.993 | 99.2 | 0.781 | ns/op | 0.973
Base64Decode.testBase64Decode | 0 | 144 | 4 | 3 | avgt | 10 | 117.832 | 117.596 | 2.431 | ns/op | 0.998
Base64Decode.testBase64Decode | 0 | 144 | 4 | 7 | avgt | 10 | 429.577 | 174.873 | 4.125 | ns/op | 0.407
Base64Decode.testBase64Decode | 0 | 144 | 4 | 32 | avgt | 10 | 1760.438 | 286.046 | 3.946 | ns/op | 0.162
Base64Decode.testBase64Decode | 0 | 144 | 4 | 64 | avgt | 10 | 1060.156 | 339.35 | 1.789 | ns/op | 0.32
Base64Decode.testBase64Decode | 0 | 144 | 4 | 80 | avgt | 10 | 1929.515 | 422.906 | 48.816 | ns/op | 0.219
Base64Decode.testBase64Decode | 0 | 144 | 4 | 96 | avgt | 10 | 398.397 | 340.595 | 1.805 | ns/op | 0.855
Base64Decode.testBase64Decode | 0 | 144 | 4 | 112 | avgt | 10 | 1257.429 | 495.14 | 1.849 | ns/op | 0.394
Base64Decode.testBase64Decode | 0 | 144 | 4 | 512 | avgt | 10 | 3115.738 | 1451.795 | 17.349 | ns/op | 0.466
Base64Decode.testBase64Decode | 0 | 144 | 4 | 1000 | avgt | 10 | 4719.422 | 2321.598 | 582.276 | ns/op | 0.492
Base64Decode.testBase64Decode | 0 | 144 | 4 | 20000 | avgt | 10 | 48630.78 | 40487.502 | 370.749 | ns/op | 0.833
Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 1 | avgt | 10 | 252.071 | 187.793 | 12.937 | ns/op | 0.745
Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 3 | avgt | 10 | 316.001 | 209.721 | 18.705 | ns/op | 0.664
Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 7 | avgt | 10 | 1162.103 | 561.51 | 2.002 | ns/op | 0.483
Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 32 | avgt | 10 | 4870.108 | 2145.144 | 28.822 | ns/op | 0.44
Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 64 | avgt | 10 | 10383.563 | 6138.464 | 65.675 | ns/op | 0.591
Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 80 | avgt | 10 | 13784.27 | 8764.186 | 176.608 | ns/op | 0.636
Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 96 | avgt | 10 | 16233.788 | 11421.009 | 109.045 | ns/op | 0.704
Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 112 | avgt | 10 | 18013.584 | 14380.185 | 106.091 | ns/op | 0.798
Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 512 | avgt | 10 | 80484.884 | 82614.343 | 113.118 | ns/op | 1.026
Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 1000 | avgt | 10 | 157590.94 | 165972.524 | 877.18 | ns/op | 1.053
Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 20000 | avgt | 10 | 2927722.669 | 3177495.202 | 12088.306 | ns/op | 1.085
Base64Decode.testBase64URLDecode | 0 | 144 | 4 | 1 | avgt | 10 | 97.234 | 97.971 | 0.155 | ns/op | 1.008
Base64Decode.testBase64URLDecode | 0 | 144 | 4 | 3 | avgt | 10 | 116.975 | 116.443 | 0.92 | ns/op | 0.995
Base64Decode.testBase64URLDecode | 0 | 144 | 4 | 7 | avgt | 10 | 428.975 | 177.21 | 3.4 | ns/op | 0.413
Base64Decode.testBase64URLDecode | 0 | 144 | 4 | 32 | avgt | 10 | 1759.346 | 293.573 | 10.449 | ns/op | 0.167
Base64Decode.testBase64URLDecode | 0 | 144 | 4 | 64 | avgt | 10 | 3036.901 | 340.794 | 6.915 | ns/op | 0.112
Base64Decode.testBase64URLDecode | 0 | 144 | 4 | 80 | avgt | 10 | 3682.705 | 425.593 | 6.004 | ns/op | 0.116
Base64Decode.testBase64URLDecode | 0 | 144 | 4 | 96 | avgt | 10 | 3891.699 | 349.889 | 9.875 | ns/op | 0.09
Base64Decode.testBase64URLDecode | 0 | 144 | 4 | 112 | avgt | 10 | 5135.364 | 494.459 | 32.21 | ns/op | 0.096
Base64Decode.testBase64URLDecode | 0 | 144 | 4 | 512 | avgt | 10 | 21152.095 | 1465.85 | 123.962 | ns/op | 0.069
Base64Decode.testBase64URLDecode | 0 | 144 | 4 | 1000 | avgt | 10 | 40731.606 | 2258.455 | 253.43 | ns/op | 0.055
Base64Decode.testBase64URLDecode | 0 | 144 | 4 | 20000 | avgt | 10 | 800260.537 | 39655.109 | 3438.808 | ns/op | 0.05
Base64Decode.testBase64WithErrorInputsDecode | 0 | 144 | 4 | 1 | avgt | 10 | 22709.004 | 22146.988 | 449.864 | ns/op | 0.975
Base64Decode.testBase64WithErrorInputsDecode | 0 | 144 | 4 | 3 | avgt | 10 | 22852.835 | 23008.575 | 142.386 | ns/op | 1.007
Base64Decode.testBase64WithErrorInputsDecode | 0 | 144 | 4 | 7 | avgt | 10 | 22954.637 | 21762.891 | 29.84 | ns/op | 0.948
Base64Decode.testBase64WithErrorInputsDecode | 0 | 144 | 4 | 32 | avgt | 10 | 22279.986 | 21683.46 | 145.879 | ns/op | 0.973
Base64Decode.testBase64WithErrorInputsDecode | 0 | 144 | 4 | 64 | avgt | 10 | 22512.975 | 22018.745 | 131.94 | ns/op | 0.978
Base64Decode.testBase64WithErrorInputsDecode | 0 | 144 | 4 | 80 | avgt | 10 | 23507.467 | 22171.746 | 130.631 | ns/op | 0.943
Base64Decode.testBase64WithErrorInputsDecode | 0 | 144 | 4 | 96 | avgt | 10 | 22264.421 | 22109.353 | 32.412 | ns/op | 0.993
Base64Decode.testBase64WithErrorInputsDecode | 0 | 144 | 4 | 112 | avgt | 10 | 22295.383 | 21843.31 | 128.373 | ns/op | 0.98
Base64Decode.testBase64WithErrorInputsDecode | 0 | 144 | 4 | 512 | avgt | 10 | 23068.531 | 22249.809 | 53.561 | ns/op | 0.965
Base64Decode.testBase64WithErrorInputsDecode | 0 | 144 | 4 | 1000 | avgt | 10 | 22287.685 | 22598.346 | 59.69 | ns/op | 1.014
Base64Decode.testBase64WithErrorInputsDecode | 0 | 144 | 4 | 20000 | avgt | 10 | 23788.214 | 23140.676 | 523.307 | ns/op | 0.973

</google-sheets-html-origin>
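
For reading the table above: the "Improvement of vrgroup" column appears to be the score without vrgroup divided by the score with vrgroup. Scores are in ns/op, so a value below 1 means vrgroup is slower. The sketch below recomputes two rows under that assumption; the class and method names are illustrative only and are not part of the PR.

```java
public class ImprovementRatio {
    // Scores are ns/op (lower is better), so baseline / candidate > 1
    // means the candidate is faster than the baseline.
    static double improvement(double baselineNsPerOp, double candidateNsPerOp) {
        return baselineNsPerOp / candidateNsPerOp;
    }

    public static void main(String[] args) {
        // testBase64Decode, maxNumBytes = 1: -vrgroup 99.2, +vrgroup 101.993
        System.out.println(improvement(99.2, 101.993));
        // testBase64URLDecode, maxNumBytes = 20000: -vrgroup 39655.109, +vrgroup 800260.537
        System.out.println(improvement(39655.109, 800260.537));
    }
}
```

Rounded to three decimals, the two ratios agree with the 0.973 and 0.05 entries in the table.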

With the latest patch, the MIME case performance is as follows:
<google-sheets-html-origin>
Benchmark | (addSpecial) | (errorIndex) | (lineSize) | (maxNumBytes) | Mode | Cnt | Score +intrinsic | Score -intrinsic | Error | Units | Improvement
-- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --
Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 1 | avgt | 10 | 240.501 | 201.761 | 3.126 | ns/op | 0.839
Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 3 | avgt | 10 | 236.175 | 227.85 | 7.486 | ns/op | 0.965
Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 7 | avgt | 10 | 584.142 | 541.063 | 0.98 | ns/op | 0.926
Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 32 | avgt | 10 | 2030.001 | 1901.634 | 3.404 | ns/op | 0.937
Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 64 | avgt | 10 | 4300.895 | 3949.644 | 6.415 | ns/op | 0.918
Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 80 | avgt | 10 | 5377.374 | 5122.923 | 32.501 | ns/op | 0.953
Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 96 | avgt | 10 | 6086.2 | 5546.335 | 8.686 | ns/op | 0.911
Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 112 | avgt | 10 | 7506.78 | 6969.159 | 5.112 | ns/op | 0.928
Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 512 | avgt | 10 | 32669.495 | 31921.418 | 4.913 | ns/op | 0.977
Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 1000 | avgt | 10 | 62497.135 | 57552.972 | 40.188 | ns/op | 0.921
Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 20000 | avgt | 10 9 | 91544.935 | 914449.121 | 91.182 | ns/op | 9.989
Base64Decode.testBase64MIMEDecode | 0 | 144 | 4 | 50000 | avgt | 10 22 | 78953.76 | 206748.186 | 61.744 | ns/op | 2.619
Base64Decode.testBase64MIMEDecode | 0 | 144 | 32 | 1 | avgt | 10 | 154.333 | 161.999 | 7.97 | ns/op | 1.05
Base64Decode.testBase64MIMEDecode | 0 | 144 | 32 | 3 | avgt | 10 | 197.941 | 195.536 | 0.466 | ns/op | 0.988
Base64Decode.testBase64MIMEDecode | 0 | 144 | 32 | 7 | avgt | 10 | 301.185 | 308.205 | 1.772 | ns/op | 1.023
Base64Decode.testBase64MIMEDecode | 0 | 144 | 32 | 32 | avgt | 10 | 855.663 | 894.838 | 1.361 | ns/op | 1.046
Base64Decode.testBase64MIMEDecode | 0 | 144 | 32 | 64 | avgt | 10 | 1599.578 | 1702.096 | 2.229 | ns/op | 1.064
Base64Decode.testBase64MIMEDecode | 0 | 144 | 32 | 80 | avgt | 10 | 2161.773 | 2256.243 | 15.275 | ns/op | 1.044
Base64Decode.testBase64MIMEDecode | 0 | 144 | 32 | 96 | avgt | 10 | 2410.724 | 2580.8 | 1.4 | ns/op | 1.071
Base64Decode.testBase64MIMEDecode | 0 | 144 | 32 | 112 | avgt | 10 | 3025.063 | 3212.42 | 1.392 | ns/op | 1.062
Base64Decode.testBase64MIMEDecode | 0 | 144 | 32 | 512 | avgt | 10 | 12836.04 | 13714.194 | 4.74 | ns/op | 1.068
Base64Decode.testBase64MIMEDecode | 0 | 144 | 32 | 1000 | avgt | 10 | 23009.573 | 24648.995 | 2.358 | ns/op | 1.071
Base64Decode.testBase64MIMEDecode | 0 | 144 | 32 | 20000 | avgt | 10 2 | 87745.171 | 324781.646 | 96.118 | ns/op | 3.701
Base64Decode.testBase64MIMEDecode | 0 | 144 | 32 | 50000 | avgt | 10 6 | 88805.99 | 800777.988 | 17.202 | ns/op | 9.017
Base64Decode.testBase64MIMEDecode | 0 | 144 | 76 | 1 | avgt | 10 | 162.18 | 151.062 | 1.984 | ns/op | 0.931
Base64Decode.testBase64MIMEDecode | 0 | 144 | 76 | 3 | avgt | 10 | 197.894 | 195.335 | 1.261 | ns/op | 0.987
Base64Decode.testBase64MIMEDecode | 0 | 144 | 76 | 7 | avgt | 10 | 301.012 | 318.607 | 2.875 | ns/op | 1.058
Base64Decode.testBase64MIMEDecode | 0 | 144 | 76 | 32 | avgt | 10 | 743.716 | 770.01 | 1.095 | ns/op | 1.035
Base64Decode.testBase64MIMEDecode | 0 | 144 | 76 | 64 | avgt | 10 | 1443.015 | 1549.228 | 2.714 | ns/op | 1.074
Base64Decode.testBase64MIMEDecode | 0 | 144 | 76 | 80 | avgt | 10 | 1841.23 | 2008.152 | 2.681 | ns/op | 1.091
Base64Decode.testBase64MIMEDecode | 0 | 144 | 76 | 96 | avgt | 10 | 2085.889 | 2334.91 | 0.696 | ns/op | 1.119
Base64Decode.testBase64MIMEDecode | 0 | 144 | 76 | 112 | avgt | 10 | 2581.392 | 2825.756 | 2.019 | ns/op | 1.095
Base64Decode.testBase64MIMEDecode | 0 | 144 | 76 | 512 | avgt | 10 | 11093.438 | 12072.401 | 49.43 | ns/op | 1.088
Base64Decode.testBase64MIMEDecode | 0 | 144 | 76 | 1000 | avgt | 10 | 19899.375 | 21965.728 | 2.75 | ns/op | 1.104
Base64Decode.testBase64MIMEDecode | 0 | 144 | 76 | 20000 | avgt | 10 3 | 32801.005 | 353076.979 | 82.059 | ns/op | 10.764
Base64Decode.testBase64MIMEDecode | 0 | 144 | 76 | 50000 | avgt | 10 5 | 56850.177 | 664287.226 | 45.683 | ns/op | 11.685

</google-sheets-html-origin>
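
To illustrate what the (lineSize) parameter implies for MIME input, here is a minimal sketch using the public `java.util.Base64` API. It assumes that lineSize corresponds to the line length passed to `Base64.getMimeEncoder(int, byte[])` and that the separator is CRLF; the benchmark's actual setup may differ. The class and helper names are hypothetical.

```java
import java.util.Arrays;
import java.util.Base64;

public class MimeLineSizeDemo {
    static final byte[] CRLF = {'\r', '\n'};

    // Encode with a line separator inserted after every lineSize output characters.
    static byte[] encodeMime(byte[] src, int lineSize) {
        return Base64.getMimeEncoder(lineSize, CRLF).encode(src);
    }

    // Count the line separators the MIME decoder has to skip over.
    static int separatorCount(byte[] encoded) {
        int count = 0;
        for (byte b : encoded) {
            if (b == '\n') {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        byte[] src = new byte[48]; // 48 input bytes encode to 64 Base64 characters
        for (int i = 0; i < src.length; i++) {
            src[i] = (byte) i;
        }
        for (int lineSize : new int[] {4, 32, 76}) {
            byte[] encoded = encodeMime(src, lineSize);
            byte[] decoded = Base64.getMimeDecoder().decode(encoded);
            System.out.println("lineSize=" + lineSize
                    + " separators=" + separatorCount(encoded)
                    + " roundTrip=" + Arrays.equals(src, decoded));
        }
    }
}
```

With lineSize 4, the decoder meets a separator after every four encoded characters, which is one plausible reason the lineSize = 4 rows gain less from the intrinsic than the lineSize = 32 and 76 rows.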

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20026#issuecomment-2210912450
PR Comment: https://git.openjdk.org/jdk/pull/20026#issuecomment-2298764766

