RFR: 8350126: Regression ~3% on Crypto-ChaCha20Poly1305.encrypt for MacOSX aarch64
Jamil Nimeh
jnimeh at openjdk.org
Thu Apr 3 16:45:25 UTC 2025
On Thu, 3 Apr 2025 16:31:39 GMT, Jamil Nimeh <jnimeh at openjdk.org> wrote:
> This fix addresses a performance regression seen on some aarch64 processors, notably the Apple M1, after the move to a quarter-round-parallel implementation in JDK-8349106. After improving the ordering of the instructions in the 20-round loop, we found that going back to a block-parallel implementation was faster, but only with those ordering changes in place. More importantly, the block-parallel implementation with interleaving turns out to be faster even on the processors that had shown improvements from the move to the quarter-round-parallel implementation.
>
> There is a spreadsheet attached to the JBS bug that shows 3 different implementations relative to the current (QR-parallel with no interleaving) implementation on 3 different ARM64 processors. Comparative benchmarks can also be found below.
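For readers less familiar with the terminology, here is a minimal scalar Java sketch of the ChaCha20 quarter round and double round as defined in RFC 8439. It is only an illustration and not the AArch64 intrinsic code from the PR; the class and method names are hypothetical. Roughly speaking, and as a general description rather than a statement about the actual stub, a quarter-round-parallel layout puts the four words of one state row into the lanes of a vector so that the four column (or diagonal) quarter rounds of a single block run together, while a block-parallel layout puts the same state word from several different blocks into the lanes.

```java
// Hypothetical illustration class; not part of the JDK sources.
public final class ChaChaQuarterRoundSketch {

    // One ChaCha20 quarter round on state words s[a], s[b], s[c], s[d]
    // (RFC 8439, section 2.1). Java int arithmetic wraps mod 2^32.
    public static void quarterRound(int[] s, int a, int b, int c, int d) {
        s[a] += s[b]; s[d] = Integer.rotateLeft(s[d] ^ s[a], 16);
        s[c] += s[d]; s[b] = Integer.rotateLeft(s[b] ^ s[c], 12);
        s[a] += s[b]; s[d] = Integer.rotateLeft(s[d] ^ s[a], 8);
        s[c] += s[d]; s[b] = Integer.rotateLeft(s[b] ^ s[c], 7);
    }

    // One double round: four column quarter rounds followed by four
    // diagonal quarter rounds. ChaCha20 runs ten of these (20 rounds).
    public static void doubleRound(int[] s) {
        // Column rounds: independent of each other.
        quarterRound(s, 0, 4, 8, 12);
        quarterRound(s, 1, 5, 9, 13);
        quarterRound(s, 2, 6, 10, 14);
        quarterRound(s, 3, 7, 11, 15);
        // Diagonal rounds: also independent of each other.
        quarterRound(s, 0, 5, 10, 15);
        quarterRound(s, 1, 6, 11, 12);
        quarterRound(s, 2, 7, 8, 13);
        quarterRound(s, 3, 4, 9, 14);
    }
}
```

Within a single quarter round every step depends on the previous one, which is why the order in which independent quarter rounds (or independent blocks) are interleaved in the instruction stream can matter on a given core.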
Benchmarks for Apple M1:
macOS Sonoma 14.5, 8x Apple M1
Quarter Round Parallel, No Interleaving
---------------------------------------
Benchmark (dataSize) (keyLength) (mode) (padding) (permutation) (provider) Mode Cnt Score Error Units
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt 256 256 None NoPadding ChaCha20 thrpt 40 3837175.980 ± 14108.076 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt 1024 256 None NoPadding ChaCha20 thrpt 40 1150065.857 ± 2238.499 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt 4096 256 None NoPadding ChaCha20 thrpt 40 299444.203 ± 1914.377 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt 16384 256 None NoPadding ChaCha20 thrpt 40 76149.432 ± 81.343 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt 256 256 None NoPadding ChaCha20 thrpt 40 3457825.749 ± 95284.525 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt 1024 256 None NoPadding ChaCha20 thrpt 40 1100458.180 ± 9856.390 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt 4096 256 None NoPadding ChaCha20 thrpt 40 296393.225 ± 1176.583 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt 16384 256 None NoPadding ChaCha20 thrpt 40 75271.693 ± 848.788 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt 256 256 None NoPadding ChaCha20-Poly1305 thrpt 40 995936.643 ± 8252.270 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt 1024 256 None NoPadding ChaCha20-Poly1305 thrpt 40 518474.192 ± 2541.371 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt 4096 256 None NoPadding ChaCha20-Poly1305 thrpt 40 178582.085 ± 337.094 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt 16384 256 None NoPadding ChaCha20-Poly1305 thrpt 40 50037.769 ± 60.497 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt 256 256 None NoPadding ChaCha20-Poly1305 thrpt 40 1189366.955 ± 3437.169 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt 1024 256 None NoPadding ChaCha20-Poly1305 thrpt 40 568044.693 ± 6057.314 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt 4096 256 None NoPadding ChaCha20-Poly1305 thrpt 40 181517.405 ± 248.283 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt 16384 256 None NoPadding ChaCha20-Poly1305 thrpt 40 49339.073 ± 298.549 ops/s
o.o.b.j.c.small.CipherBench.ChaCha20Poly1305.decrypt 16384 256 None NoPadding ChaCha20-Poly1305 thrpt 40 50024.452 ± 53.838 ops/s
o.o.b.j.c.small.CipherBench.ChaCha20Poly1305.encrypt 16384 256 None NoPadding ChaCha20-Poly1305 thrpt 40 49459.758 ± 63.090 ops/s
Quarter Round Parallel, With Interleaving
-----------------------------------------
Benchmark (dataSize) (keyLength) (mode) (padding) (permutation) (provider) Mode Cnt Score Error Units
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt 256 256 None NoPadding ChaCha20 thrpt 40 3880433.294 ± 9904.562 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt 1024 256 None NoPadding ChaCha20 thrpt 40 1157285.625 ± 2415.082 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt 4096 256 None NoPadding ChaCha20 thrpt 40 301986.767 ± 339.147 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt 16384 256 None NoPadding ChaCha20 thrpt 40 75990.670 ± 194.671 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt 256 256 None NoPadding ChaCha20 thrpt 40 3486874.086 ± 93507.311 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt 1024 256 None NoPadding ChaCha20 thrpt 40 1111966.942 ± 9602.005 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt 4096 256 None NoPadding ChaCha20 thrpt 40 297633.816 ± 1455.184 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt 16384 256 None NoPadding ChaCha20 thrpt 40 74817.230 ± 1737.888 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt 256 256 None NoPadding ChaCha20-Poly1305 thrpt 40 998384.311 ± 7491.076 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt 1024 256 None NoPadding ChaCha20-Poly1305 thrpt 40 517031.021 ± 1756.181 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt 4096 256 None NoPadding ChaCha20-Poly1305 thrpt 40 179139.212 ± 401.008 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt 16384 256 None NoPadding ChaCha20-Poly1305 thrpt 40 49796.519 ± 609.335 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt 256 256 None NoPadding ChaCha20-Poly1305 thrpt 40 1207581.459 ± 13757.759 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt 1024 256 None NoPadding ChaCha20-Poly1305 thrpt 40 576596.806 ± 4205.682 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt 4096 256 None NoPadding ChaCha20-Poly1305 thrpt 40 184108.182 ± 229.014 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt 16384 256 None NoPadding ChaCha20-Poly1305 thrpt 40 50120.498 ± 300.391 ops/s
o.o.b.j.c.small.CipherBench.ChaCha20Poly1305.decrypt 16384 256 None NoPadding ChaCha20-Poly1305 thrpt 40 50053.528 ± 181.415 ops/s
o.o.b.j.c.small.CipherBench.ChaCha20Poly1305.encrypt 16384 256 None NoPadding ChaCha20-Poly1305 thrpt 40 50232.767 ± 62.234 ops/s
Block Parallel, No Interleaving
-------------------------------
Benchmark (dataSize) (keyLength) (mode) (padding) (permutation) (provider) Mode Cnt Score Error Units
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt 256 256 None NoPadding ChaCha20 thrpt 40 4107524.407 ± 9337.726 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt 1024 256 None NoPadding ChaCha20 thrpt 40 1210532.736 ± 1111.846 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt 4096 256 None NoPadding ChaCha20 thrpt 40 315178.899 ± 375.858 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt 16384 256 None NoPadding ChaCha20 thrpt 40 78782.555 ± 856.939 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt 256 256 None NoPadding ChaCha20 thrpt 40 3601509.841 ± 103375.315 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt 1024 256 None NoPadding ChaCha20 thrpt 40 1156918.875 ± 9666.447 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt 4096 256 None NoPadding ChaCha20 thrpt 40 312270.458 ± 1726.717 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt 16384 256 None NoPadding ChaCha20 thrpt 40 79394.369 ± 513.291 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt 256 256 None NoPadding ChaCha20-Poly1305 thrpt 40 1029546.842 ± 2317.072 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt 1024 256 None NoPadding ChaCha20-Poly1305 thrpt 40 532504.493 ± 2836.934 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt 4096 256 None NoPadding ChaCha20-Poly1305 thrpt 40 183874.028 ± 332.438 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt 16384 256 None NoPadding ChaCha20-Poly1305 thrpt 40 51739.678 ± 122.138 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt 256 256 None NoPadding ChaCha20-Poly1305 thrpt 40 1263370.572 ± 15424.473 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt 1024 256 None NoPadding ChaCha20-Poly1305 thrpt 40 588853.049 ± 3419.509 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt 4096 256 None NoPadding ChaCha20-Poly1305 thrpt 40 188899.111 ± 160.103 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt 16384 256 None NoPadding ChaCha20-Poly1305 thrpt 40 51516.978 ± 147.720 ops/s
o.o.b.j.c.small.CipherBench.ChaCha20Poly1305.decrypt 16384 256 None NoPadding ChaCha20-Poly1305 thrpt 40 51758.247 ± 39.852 ops/s
o.o.b.j.c.small.CipherBench.ChaCha20Poly1305.encrypt 16384 256 None NoPadding ChaCha20-Poly1305 thrpt 40 51441.519 ± 278.059 ops/s
Block Parallel, With Interleaving
---------------------------------
Benchmark (dataSize) (keyLength) (mode) (padding) (permutation) (provider) Mode Cnt Score Error Units
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt 256 256 None NoPadding ChaCha20 thrpt 40 4154482.236 ± 8208.082 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt 1024 256 None NoPadding ChaCha20 thrpt 40 1221710.558 ± 5967.515 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt 4096 256 None NoPadding ChaCha20 thrpt 40 319918.165 ± 327.235 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt 16384 256 None NoPadding ChaCha20 thrpt 40 80602.283 ± 193.687 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt 256 256 None NoPadding ChaCha20 thrpt 40 3710733.896 ± 88631.462 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt 1024 256 None NoPadding ChaCha20 thrpt 40 1168824.003 ± 10465.340 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt 4096 256 None NoPadding ChaCha20 thrpt 40 315040.718 ± 1389.500 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt 16384 256 None NoPadding ChaCha20 thrpt 40 80365.126 ± 586.286 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt 256 256 None NoPadding ChaCha20-Poly1305 thrpt 40 1007279.441 ± 8794.990 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt 1024 256 None NoPadding ChaCha20-Poly1305 thrpt 40 536758.995 ± 3346.320 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt 4096 256 None NoPadding ChaCha20-Poly1305 thrpt 40 184600.058 ± 362.456 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt 16384 256 None NoPadding ChaCha20-Poly1305 thrpt 40 52079.247 ± 38.558 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt 256 256 None NoPadding ChaCha20-Poly1305 thrpt 40 1233639.918 ± 7503.063 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt 1024 256 None NoPadding ChaCha20-Poly1305 thrpt 40 593298.939 ± 3886.323 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt 4096 256 None NoPadding ChaCha20-Poly1305 thrpt 40 190535.858 ± 215.443 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt 16384 256 None NoPadding ChaCha20-Poly1305 thrpt 40 51953.765 ± 226.078 ops/s
o.o.b.j.c.small.CipherBench.ChaCha20Poly1305.decrypt 16384 256 None NoPadding ChaCha20-Poly1305 thrpt 40 52073.085 ± 46.961 ops/s
o.o.b.j.c.small.CipherBench.ChaCha20Poly1305.encrypt 16384 256 None NoPadding ChaCha20-Poly1305 thrpt 40 51815.757 ± 331.563 ops/s
Benchmarks for Neoverse-N1:
System: 2x Neoverse-N1, 2 cores, 1 socket, 1 thread/core (var 0x3, part 0xD0C)
Quarter-Round Parallel Intrinsics Implementation
------------------------------------------------
Benchmark (dataSize) (keyLength) (mode) (padding) (permutation) (provider) Mode Cnt Score Error Units
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt 256 256 None NoPadding ChaCha20 thrpt 40 2219198.137 ± 13314.344 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt 1024 256 None NoPadding ChaCha20 thrpt 40 684200.661 ± 3601.031 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt 4096 256 None NoPadding ChaCha20 thrpt 40 181048.566 ± 942.201 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt 16384 256 None NoPadding ChaCha20 thrpt 40 46150.219 ± 118.031 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt 256 256 None NoPadding ChaCha20 thrpt 40 2049320.671 ± 9549.691 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt 1024 256 None NoPadding ChaCha20 thrpt 40 663456.090 ± 2722.964 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt 4096 256 None NoPadding ChaCha20 thrpt 40 179921.834 ± 573.613 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt 16384 256 None NoPadding ChaCha20 thrpt 40 45885.159 ± 102.974 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt 256 256 None NoPadding ChaCha20-Poly1305 thrpt 40 476694.433 ± 4118.055 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt 1024 256 None NoPadding ChaCha20-Poly1305 thrpt 40 251749.129 ± 1535.415 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt 4096 256 None NoPadding ChaCha20-Poly1305 thrpt 40 87052.901 ± 436.111 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt 16384 256 None NoPadding ChaCha20-Poly1305 thrpt 40 24099.749 ± 136.009 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt 256 256 None NoPadding ChaCha20-Poly1305 thrpt 40 601333.942 ± 5414.186 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt 1024 256 None NoPadding ChaCha20-Poly1305 thrpt 40 280884.583 ± 2332.119 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt 4096 256 None NoPadding ChaCha20-Poly1305 thrpt 40 90250.320 ± 604.948 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt 16384 256 None NoPadding ChaCha20-Poly1305 thrpt 40 24346.217 ± 101.557 ops/s
o.o.b.j.c.small.CipherBench.ChaCha20Poly1305.decrypt 16384 256 None NoPadding ChaCha20-Poly1305 thrpt 40 23950.145 ± 119.081 ops/s
o.o.b.j.c.small.CipherBench.ChaCha20Poly1305.encrypt 16384 256 None NoPadding ChaCha20-Poly1305 thrpt 40 24405.675 ± 93.554 ops/s
Quarter-Round Parallel Intrinsics with Interleaving Implementation
------------------------------------------------------------------
Benchmark (dataSize) (keyLength) (mode) (padding) (permutation) (provider) Mode Cnt Score Error Units
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt 256 256 None NoPadding ChaCha20 thrpt 40 2344673.121 ± 14885.986 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt 1024 256 None NoPadding ChaCha20 thrpt 40 724626.059 ± 3078.617 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt 4096 256 None NoPadding ChaCha20 thrpt 40 192723.841 ± 744.860 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt 16384 256 None NoPadding ChaCha20 thrpt 40 49050.992 ± 118.087 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt 256 256 None NoPadding ChaCha20 thrpt 40 2136919.832 ± 7229.740 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt 1024 256 None NoPadding ChaCha20 thrpt 40 703672.009 ± 2520.798 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt 4096 256 None NoPadding ChaCha20 thrpt 40 191748.973 ± 421.704 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt 16384 256 None NoPadding ChaCha20 thrpt 40 48939.791 ± 194.749 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt 256 256 None NoPadding ChaCha20-Poly1305 thrpt 40 497137.864 ± 2915.527 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt 1024 256 None NoPadding ChaCha20-Poly1305 thrpt 40 262127.552 ± 1302.946 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt 4096 256 None NoPadding ChaCha20-Poly1305 thrpt 40 90018.698 ± 425.574 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt 16384 256 None NoPadding ChaCha20-Poly1305 thrpt 40 24987.421 ± 119.936 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt 256 256 None NoPadding ChaCha20-Poly1305 thrpt 40 634980.497 ± 4191.567 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt 1024 256 None NoPadding ChaCha20-Poly1305 thrpt 40 293529.897 ± 1496.703 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt 4096 256 None NoPadding ChaCha20-Poly1305 thrpt 40 93230.690 ± 480.282 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt 16384 256 None NoPadding ChaCha20-Poly1305 thrpt 40 24936.479 ± 112.139 ops/s
o.o.b.j.c.small.CipherBench.ChaCha20Poly1305.decrypt 16384 256 None NoPadding ChaCha20-Poly1305 thrpt 40 24897.542 ± 76.891 ops/s
o.o.b.j.c.small.CipherBench.ChaCha20Poly1305.encrypt 16384 256 None NoPadding ChaCha20-Poly1305 thrpt 40 25128.075 ± 120.033 ops/s
Block-Parallel Intrinsics Implementation
----------------------------------------
Benchmark (dataSize) (keyLength) (mode) (padding) (permutation) (provider) Mode Cnt Score Error Units
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt 256 256 None NoPadding ChaCha20 thrpt 40 2164945.312 ± 8845.473 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt 1024 256 None NoPadding ChaCha20 thrpt 40 659831.098 ± 1968.217 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt 4096 256 None NoPadding ChaCha20 thrpt 40 175252.222 ± 512.910 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt 16384 256 None NoPadding ChaCha20 thrpt 40 44329.489 ± 126.564 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt 256 256 None NoPadding ChaCha20 thrpt 40 1975016.045 ± 11695.931 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt 1024 256 None NoPadding ChaCha20 thrpt 40 640856.881 ± 1830.533 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt 4096 256 None NoPadding ChaCha20 thrpt 40 173305.072 ± 366.240 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt 16384 256 None NoPadding ChaCha20 thrpt 40 44208.373 ± 107.018 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt 256 256 None NoPadding ChaCha20-Poly1305 thrpt 40 466351.469 ± 3278.807 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt 1024 256 None NoPadding ChaCha20-Poly1305 thrpt 40 247662.489 ± 1165.507 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt 4096 256 None NoPadding ChaCha20-Poly1305 thrpt 40 85367.721 ± 404.796 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt 16384 256 None NoPadding ChaCha20-Poly1305 thrpt 40 23492.360 ± 92.043 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt 256 256 None NoPadding ChaCha20-Poly1305 thrpt 40 589645.973 ± 4262.663 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt 1024 256 None NoPadding ChaCha20-Poly1305 thrpt 40 278130.465 ± 1394.179 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt 4096 256 None NoPadding ChaCha20-Poly1305 thrpt 40 88081.739 ± 443.476 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt 16384 256 None NoPadding ChaCha20-Poly1305 thrpt 40 23853.430 ± 104.346 ops/s
o.o.b.j.c.small.CipherBench.ChaCha20Poly1305.decrypt 16384 256 None NoPadding ChaCha20-Poly1305 thrpt 40 23620.475 ± 75.932 ops/s
o.o.b.j.c.small.CipherBench.ChaCha20Poly1305.encrypt 16384 256 None NoPadding ChaCha20-Poly1305 thrpt 40 23750.134 ± 118.572 ops/s
Block-Parallel with Interleaving Intrinsics Implementation
----------------------------------------------------------
Benchmark (dataSize) (keyLength) (mode) (padding) (permutation) (provider) Mode Cnt Score Error Units
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt 256 256 None NoPadding ChaCha20 thrpt 40 2358246.820 ± 14256.312 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt 1024 256 None NoPadding ChaCha20 thrpt 40 734318.183 ± 2447.434 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt 4096 256 None NoPadding ChaCha20 thrpt 40 196243.937 ± 517.431 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt 16384 256 None NoPadding ChaCha20 thrpt 40 50008.245 ± 85.350 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt 256 256 None NoPadding ChaCha20 thrpt 40 2156054.908 ± 5432.249 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt 1024 256 None NoPadding ChaCha20 thrpt 40 713847.200 ± 1962.784 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt 4096 256 None NoPadding ChaCha20 thrpt 40 194383.466 ± 464.389 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt 16384 256 None NoPadding ChaCha20 thrpt 40 49652.092 ± 166.716 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt 256 256 None NoPadding ChaCha20-Poly1305 thrpt 40 497410.798 ± 3632.927 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt 1024 256 None NoPadding ChaCha20-Poly1305 thrpt 40 261587.126 ± 1336.591 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt 4096 256 None NoPadding ChaCha20-Poly1305 thrpt 40 90453.673 ± 429.630 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt 16384 256 None NoPadding ChaCha20-Poly1305 thrpt 40 24963.118 ± 103.795 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt 256 256 None NoPadding ChaCha20-Poly1305 thrpt 40 623876.407 ± 4655.637 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt 1024 256 None NoPadding ChaCha20-Poly1305 thrpt 40 292279.929 ± 1345.033 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt 4096 256 None NoPadding ChaCha20-Poly1305 thrpt 40 93352.350 ± 429.286 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt 16384 256 None NoPadding ChaCha20-Poly1305 thrpt 40 25190.232 ± 121.961 ops/s
o.o.b.j.c.small.CipherBench.ChaCha20Poly1305.decrypt 16384 256 None NoPadding ChaCha20-Poly1305 thrpt 40 25128.018 ± 84.863 ops/s
o.o.b.j.c.small.CipherBench.ChaCha20Poly1305.encrypt 16384 256 None NoPadding ChaCha20-Poly1305 thrpt 40 25371.698 ± 129.837 ops/s
Benchmarks for Cortex-A72:
4 processor Cortex-A72, 1 cluster, 4 cores/cluster, 1 thread/core (var 0x0, part 0xD08)
Quarter Round Parallel Implementation, No Interleaving
------------------------------------------------------
Benchmark (dataSize) (keyLength) (mode) (padding) (permutation) (provider) Mode Cnt Score Error Units
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt 256 256 None NoPadding ChaCha20 thrpt 40 602983.483 ± 6556.879 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt 1024 256 None NoPadding ChaCha20 thrpt 40 186189.843 ± 628.835 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt 4096 256 None NoPadding ChaCha20 thrpt 40 49499.230 ± 139.811 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt 16384 256 None NoPadding ChaCha20 thrpt 40 12487.617 ± 69.484 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt 256 256 None NoPadding ChaCha20 thrpt 40 592209.356 ± 3927.984 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt 1024 256 None NoPadding ChaCha20 thrpt 40 185091.856 ± 366.779 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt 4096 256 None NoPadding ChaCha20 thrpt 40 49491.296 ± 117.179 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt 16384 256 None NoPadding ChaCha20 thrpt 40 12512.907 ± 71.587 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt 256 256 None NoPadding ChaCha20-Poly1305 thrpt 40 96212.313 ± 2482.928 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt 1024 256 None NoPadding ChaCha20-Poly1305 thrpt 40 65131.604 ± 1504.555 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt 4096 256 None NoPadding ChaCha20-Poly1305 thrpt 40 27746.783 ± 229.856 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt 16384 256 None NoPadding ChaCha20-Poly1305 thrpt 40 8381.946 ± 32.122 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt 256 256 None NoPadding ChaCha20-Poly1305 thrpt 40 129453.321 ± 3224.106 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt 1024 256 None NoPadding ChaCha20-Poly1305 thrpt 40 77091.625 ± 1470.684 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt 4096 256 None NoPadding ChaCha20-Poly1305 thrpt 40 29334.590 ± 303.107 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt 16384 256 None NoPadding ChaCha20-Poly1305 thrpt 40 8460.356 ± 8.524 ops/s
o.o.b.j.c.small.CipherBench.ChaCha20Poly1305.decrypt 16384 256 None NoPadding ChaCha20-Poly1305 thrpt 40 8386.624 ± 34.163 ops/s
o.o.b.j.c.small.CipherBench.ChaCha20Poly1305.encrypt 16384 256 None NoPadding ChaCha20-Poly1305 thrpt 40 8471.573 ± 8.635 ops/s
Quarter Round Parallel Implementation, With Interleaving
--------------------------------------------------------
Benchmark (dataSize) (keyLength) (mode) (padding) (permutation) (provider) Mode Cnt Score Error Units
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt 256 256 None NoPadding ChaCha20 thrpt 40 767143.826 ± 9195.715 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt 1024 256 None NoPadding ChaCha20 thrpt 40 254386.139 ± 1378.080 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt 4096 256 None NoPadding ChaCha20 thrpt 40 69152.606 ± 176.940 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt 16384 256 None NoPadding ChaCha20 thrpt 40 17609.457 ± 71.086 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt 256 256 None NoPadding ChaCha20 thrpt 40 746643.194 ± 9077.375 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt 1024 256 None NoPadding ChaCha20 thrpt 40 251953.223 ± 959.588 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt 4096 256 None NoPadding ChaCha20 thrpt 40 69064.757 ± 197.231 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt 16384 256 None NoPadding ChaCha20 thrpt 40 17563.052 ± 97.678 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt 256 256 None NoPadding ChaCha20-Poly1305 thrpt 40 105520.550 ± 2805.637 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt 1024 256 None NoPadding ChaCha20-Poly1305 thrpt 40 72902.046 ± 1738.503 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt 4096 256 None NoPadding ChaCha20-Poly1305 thrpt 40 33446.843 ± 377.742 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt 16384 256 None NoPadding ChaCha20-Poly1305 thrpt 40 10437.913 ± 31.702 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt 256 256 None NoPadding ChaCha20-Poly1305 thrpt 40 141153.205 ± 3693.280 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt 1024 256 None NoPadding ChaCha20-Poly1305 thrpt 40 89657.996 ± 1635.631 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt 4096 256 None NoPadding ChaCha20-Poly1305 thrpt 40 35926.981 ± 244.574 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt 16384 256 None NoPadding ChaCha20-Poly1305 thrpt 40 10555.879 ± 18.698 ops/s
o.o.b.j.c.small.CipherBench.ChaCha20Poly1305.decrypt 16384 256 None NoPadding ChaCha20-Poly1305 thrpt 40 10440.037 ± 33.023 ops/s
o.o.b.j.c.small.CipherBench.ChaCha20Poly1305.encrypt 16384 256 None NoPadding ChaCha20-Poly1305 thrpt 40 10542.745 ± 45.282 ops/s
Block Parallel Implementation, No Interleaving
----------------------------------------------
Benchmark (dataSize) (keyLength) (mode) (padding) (permutation) (provider) Mode Cnt Score Error Units
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt 256 256 None NoPadding ChaCha20 thrpt 40 587100.753 ± 5754.708 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt 1024 256 None NoPadding ChaCha20 thrpt 40 178737.840 ± 730.445 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt 4096 256 None NoPadding ChaCha20 thrpt 40 47340.182 ± 121.627 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt 16384 256 None NoPadding ChaCha20 thrpt 40 11947.269 ± 66.887 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt 256 256 None NoPadding ChaCha20 thrpt 40 574123.343 ± 3838.477 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt 1024 256 None NoPadding ChaCha20 thrpt 40 177870.311 ± 420.125 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt 4096 256 None NoPadding ChaCha20 thrpt 40 47409.796 ± 109.224 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt 16384 256 None NoPadding ChaCha20 thrpt 40 11967.672 ± 65.803 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt 256 256 None NoPadding ChaCha20-Poly1305 thrpt 40 95867.086 ± 2228.000 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt 1024 256 None NoPadding ChaCha20-Poly1305 thrpt 40 63376.433 ± 1301.826 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt 4096 256 None NoPadding ChaCha20-Poly1305 thrpt 40 26988.391 ± 231.289 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt 16384 256 None NoPadding ChaCha20-Poly1305 thrpt 40 8139.090 ± 20.871 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt 256 256 None NoPadding ChaCha20-Poly1305 thrpt 40 127770.261 ± 3262.540 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt 1024 256 None NoPadding ChaCha20-Poly1305 thrpt 40 76019.408 ± 1226.583 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt 4096 256 None NoPadding ChaCha20-Poly1305 thrpt 40 28652.283 ± 214.896 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt 16384 256 None NoPadding ChaCha20-Poly1305 thrpt 40 8208.186 ± 11.455 ops/s
o.o.b.j.c.small.CipherBench.ChaCha20Poly1305.decrypt 16384 256 None NoPadding ChaCha20-Poly1305 thrpt 40 8131.508 ± 27.548 ops/s
o.o.b.j.c.small.CipherBench.ChaCha20Poly1305.encrypt 16384 256 None NoPadding ChaCha20-Poly1305 thrpt 40 8207.550 ± 13.086 ops/s
Block Parallel Implementation, With Interleaving
------------------------------------------------
Benchmark (dataSize) (keyLength) (mode) (padding) (permutation) (provider) Mode Cnt Score Error Units
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt 256 256 None NoPadding ChaCha20 thrpt 40 826086.130 ± 9933.137 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt 1024 256 None NoPadding ChaCha20 thrpt 40 276583.128 ± 1434.611 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt 4096 256 None NoPadding ChaCha20 thrpt 40 75688.367 ± 228.277 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt 16384 256 None NoPadding ChaCha20 thrpt 40 19348.013 ± 77.810 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt 256 256 None NoPadding ChaCha20 thrpt 40 800978.386 ± 10445.822 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt 1024 256 None NoPadding ChaCha20 thrpt 40 274107.264 ± 1606.978 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt 4096 256 None NoPadding ChaCha20 thrpt 40 75446.852 ± 209.379 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt 16384 256 None NoPadding ChaCha20 thrpt 40 19270.292 ± 105.573 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt 256 256 None NoPadding ChaCha20-Poly1305 thrpt 40 105988.778 ± 3001.220 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt 1024 256 None NoPadding ChaCha20-Poly1305 thrpt 40 76162.169 ± 1692.042 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt 4096 256 None NoPadding ChaCha20-Poly1305 thrpt 40 34978.996 ± 468.786 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt 16384 256 None NoPadding ChaCha20-Poly1305 thrpt 40 11040.040 ± 31.844 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt 256 256 None NoPadding ChaCha20-Poly1305 thrpt 40 146046.188 ± 3471.952 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt 1024 256 None NoPadding ChaCha20-Poly1305 thrpt 40 94041.417 ± 1834.558 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt 4096 256 None NoPadding ChaCha20-Poly1305 thrpt 40 37770.658 ± 311.519 ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt 16384 256 None NoPadding ChaCha20-Poly1305 thrpt 40 11183.053 ± 11.204 ops/s
o.o.b.j.c.small.CipherBench.ChaCha20Poly1305.decrypt 16384 256 None NoPadding ChaCha20-Poly1305 thrpt 40 11037.956 ± 39.522 ops/s
o.o.b.j.c.small.CipherBench.ChaCha20Poly1305.encrypt 16384 256 None NoPadding ChaCha20-Poly1305 thrpt 40 11196.095 ± 33.796 ops/s
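To compare implementations from the tables above, the relative change in throughput can be computed directly from the Score column. The following is a small sketch; the class and method names are hypothetical, and the figures are copied from the o.o.b.j.c.full ChaCha20Poly1305.encrypt rows at dataSize 16384 for the quarter-round-parallel, no-interleaving baseline and the block-parallel, with-interleaving candidate. It ignores the Error column, so small differences should be read with the reported error margins in mind.

```java
// Hypothetical helper for reading the JMH tables; not part of the PR.
public final class RelativeThroughput {

    // Percent change of candidate over baseline; positive means the
    // candidate has higher throughput (ops/s).
    public static double percentChange(double baseline, double candidate) {
        return (candidate / baseline - 1.0) * 100.0;
    }

    public static void main(String[] args) {
        // Scores in ops/s: ChaCha20Poly1305.encrypt, dataSize 16384,
        // QR-parallel without interleaving vs. block-parallel with interleaving.
        String[] cpus = {"Apple M1", "Neoverse-N1", "Cortex-A72"};
        double[] baseline = {49339.073, 24346.217, 8460.356};
        double[] candidate = {51953.765, 25190.232, 11183.053};

        for (int i = 0; i < cpus.length; i++) {
            System.out.printf("%-12s %+.2f%%%n",
                    cpus[i], percentChange(baseline[i], candidate[i]));
        }
    }
}
```

The same helper can be applied to any pair of rows, as long as both scores come from the same benchmark, data size, and machine.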
-------------
PR Comment: https://git.openjdk.org/jdk/pull/24420#issuecomment-2776357177
PR Comment: https://git.openjdk.org/jdk/pull/24420#issuecomment-2776369079
PR Comment: https://git.openjdk.org/jdk/pull/24420#issuecomment-2776371619