RFR: 8350126: Regression ~3% on Crypto-ChaCha20Poly1305.encrypt for MacOSX aarch64

Jamil Nimeh jnimeh at openjdk.org
Thu Apr 3 16:45:25 UTC 2025


On Thu, 3 Apr 2025 16:31:39 GMT, Jamil Nimeh <jnimeh at openjdk.org> wrote:

> This fix addresses a performance regression on some aarch64 processors, notably the Apple M1, that was introduced when we moved to a quarter-round-parallel implementation in JDK-8349106.  After improving the ordering of the instructions in the 20-round loop, we found that going back to a block-parallel implementation was faster, but only with the reordering in place.  More importantly, the interleaved block-parallel implementation turns out to be faster even on the processors that had improved when we moved to the quarter-round-parallel implementation.
> 
> There is a spreadsheet attached to the JBS bug that shows 3 different implementations relative to the current (QR-parallel with no interleaving) implementation on 3 different ARM64 processors.  Comparative benchmarks can also be found below.
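
The following is a minimal Java sketch of the two data layouts discussed above. It models each 128-bit NEON register as an `int[4]` (four 32-bit lanes) and does not reproduce the actual aarch64 intrinsic. The class and method names (`ChaChaLayouts`, `blockParallel`, `quarterRoundParallel`) are hypothetical, and the layouts are an assumption about what the two terms mean here. In the block-parallel form, register `w` holds word `w` of four different blocks, so column and diagonal rounds simply select different registers. In the quarter-round-parallel form, register `k` holds row `k` of a single block. The four column quarter rounds therefore run at once, but each diagonal round needs lane rotations before and after it. Instruction interleaving itself is not modelled. The sketch does show that each block-parallel round consists of four independent vector quarter rounds, while the quarter-round-parallel form has one dependent chain per round plus the rotations. The block-parallel form therefore gives a scheduler more independent instructions to interleave.

```java
import java.util.Arrays;

// Illustrative model only (not the HotSpot intrinsic): each int[4] stands in
// for one 128-bit NEON register holding four 32-bit lanes.
final class ChaChaLayouts {

    // Scalar ChaCha quarter round (RFC 8439, section 2.1) on words a, b, c, d of s.
    static void quarterRound(int[] s, int a, int b, int c, int d) {
        s[a] += s[b]; s[d] ^= s[a]; s[d] = Integer.rotateLeft(s[d], 16);
        s[c] += s[d]; s[b] ^= s[c]; s[b] = Integer.rotateLeft(s[b], 12);
        s[a] += s[b]; s[d] ^= s[a]; s[d] = Integer.rotateLeft(s[d], 8);
        s[c] += s[d]; s[b] ^= s[c]; s[b] = Integer.rotateLeft(s[b], 7);
    }

    // Scalar reference: the 20-round loop (10 double rounds) on one 16-word block,
    // without the final addition of the input state.
    static int[] rounds(int[] in) {
        int[] s = in.clone();
        for (int i = 0; i < 10; i++) {
            quarterRound(s, 0, 4, 8, 12); quarterRound(s, 1, 5, 9, 13);   // column round
            quarterRound(s, 2, 6, 10, 14); quarterRound(s, 3, 7, 11, 15);
            quarterRound(s, 0, 5, 10, 15); quarterRound(s, 1, 6, 11, 12); // diagonal round
            quarterRound(s, 2, 7, 8, 13); quarterRound(s, 3, 4, 9, 14);
        }
        return s;
    }

    // Lane-wise "vector" operations on 4-lane registers.
    static void add(int[] x, int[] y) { for (int i = 0; i < 4; i++) x[i] += y[i]; }
    static void xor(int[] x, int[] y) { for (int i = 0; i < 4; i++) x[i] ^= y[i]; }
    static void rol(int[] x, int n) { for (int i = 0; i < 4; i++) x[i] = Integer.rotateLeft(x[i], n); }

    // One quarter round applied independently in every lane.
    static void vecQuarterRound(int[] a, int[] b, int[] c, int[] d) {
        add(a, b); xor(d, a); rol(d, 16);
        add(c, d); xor(b, c); rol(b, 12);
        add(a, b); xor(d, a); rol(d, 8);
        add(c, d); xor(b, c); rol(b, 7);
    }

    // Lane rotation (a role an AArch64 EXT can play): x[i] becomes old x[(i + n) % 4].
    static void rotLanes(int[] x, int n) {
        int[] t = x.clone();
        for (int i = 0; i < 4; i++) x[i] = t[(i + n) % 4];
    }

    // Block-parallel: register w holds word w of four blocks (lane = block index).
    // Each round is four independent vector quarter rounds, with no lane shuffles.
    static int[][] blockParallel(int[][] blocks) { // blocks[lane][word], 4 blocks
        int[][] v = new int[16][4];
        for (int w = 0; w < 16; w++)
            for (int lane = 0; lane < 4; lane++) v[w][lane] = blocks[lane][w];
        for (int i = 0; i < 10; i++) {
            vecQuarterRound(v[0], v[4], v[8], v[12]); vecQuarterRound(v[1], v[5], v[9], v[13]);
            vecQuarterRound(v[2], v[6], v[10], v[14]); vecQuarterRound(v[3], v[7], v[11], v[15]);
            vecQuarterRound(v[0], v[5], v[10], v[15]); vecQuarterRound(v[1], v[6], v[11], v[12]);
            vecQuarterRound(v[2], v[7], v[8], v[13]); vecQuarterRound(v[3], v[4], v[9], v[14]);
        }
        int[][] out = new int[4][16];
        for (int w = 0; w < 16; w++)
            for (int lane = 0; lane < 4; lane++) out[lane][w] = v[w][lane];
        return out;
    }

    // Quarter-round parallel: register k holds row k (words 4k..4k+3) of ONE block.
    // Lane i runs column quarter round i; diagonals need rotations before and after.
    static int[] quarterRoundParallel(int[] block) {
        int[][] r = new int[4][4];
        for (int w = 0; w < 16; w++) r[w / 4][w % 4] = block[w];
        for (int i = 0; i < 10; i++) {
            vecQuarterRound(r[0], r[1], r[2], r[3]);                 // column round
            rotLanes(r[1], 1); rotLanes(r[2], 2); rotLanes(r[3], 3);
            vecQuarterRound(r[0], r[1], r[2], r[3]);                 // diagonal round
            rotLanes(r[1], 3); rotLanes(r[2], 2); rotLanes(r[3], 1); // undo rotations
        }
        int[] out = new int[16];
        for (int w = 0; w < 16; w++) out[w] = r[w / 4][w % 4];
        return out;
    }

    public static void main(String[] args) {
        int[][] blocks = new int[4][16];
        for (int b = 0; b < 4; b++)
            for (int w = 0; w < 16; w++) blocks[b][w] = 0x9E3779B9 * (16 * b + w + 1); // arbitrary input
        int[][] bp = blockParallel(blocks);
        for (int b = 0; b < 4; b++) {
            int[] ref = rounds(blocks[b]);
            boolean same = Arrays.equals(ref, bp[b]) && Arrays.equals(ref, quarterRoundParallel(blocks[b]));
            System.out.println("block " + b + ": layouts match scalar reference = " + same);
        }
    }
}
```

Running `main` should report that both layouts match the scalar reference for all four blocks. The scalar `quarterRound` can also be checked against the quarter-round test vector in RFC 8439, section 2.1.1.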

Benchmarks for Apple M1:

macOS Sonoma 14.5, 8x Apple M1


Quarter Round Parallel, No Interleaving
---------------------------------------
Benchmark                                             (dataSize)  (keyLength)  (mode)  (padding)      (permutation)  (provider)   Mode  Cnt        Score       Error  Units
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                  256          256    None  NoPadding           ChaCha20              thrpt   40  3837175.980 ± 14108.076  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                 1024          256    None  NoPadding           ChaCha20              thrpt   40  1150065.857 ±  2238.499  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                 4096          256    None  NoPadding           ChaCha20              thrpt   40   299444.203 ±  1914.377  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                16384          256    None  NoPadding           ChaCha20              thrpt   40    76149.432 ±    81.343  ops/s

o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                  256          256    None  NoPadding           ChaCha20              thrpt   40  3457825.749 ± 95284.525  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                 1024          256    None  NoPadding           ChaCha20              thrpt   40  1100458.180 ±  9856.390  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                 4096          256    None  NoPadding           ChaCha20              thrpt   40   296393.225 ±  1176.583  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                16384          256    None  NoPadding           ChaCha20              thrpt   40    75271.693 ±   848.788  ops/s

o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt          256          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   995936.643 ±  8252.270  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt         1024          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   518474.192 ±  2541.371  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt         4096          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   178582.085 ±   337.094  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt        16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    50037.769 ±    60.497  ops/s

o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt          256          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40  1189366.955 ±  3437.169  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt         1024          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   568044.693 ±  6057.314  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt         4096          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   181517.405 ±   248.283  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt        16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    49339.073 ±   298.549  ops/s

o.o.b.j.c.small.CipherBench.ChaCha20Poly1305.decrypt       16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    50024.452 ±    53.838  ops/s
o.o.b.j.c.small.CipherBench.ChaCha20Poly1305.encrypt       16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    49459.758 ±    63.090  ops/s


Quarter Round Parallel, With Interleaving
-----------------------------------------
Benchmark                                             (dataSize)  (keyLength)  (mode)  (padding)      (permutation)  (provider)   Mode  Cnt        Score       Error  Units
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                  256          256    None  NoPadding           ChaCha20              thrpt   40  3880433.294 ±  9904.562  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                 1024          256    None  NoPadding           ChaCha20              thrpt   40  1157285.625 ±  2415.082  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                 4096          256    None  NoPadding           ChaCha20              thrpt   40   301986.767 ±   339.147  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                16384          256    None  NoPadding           ChaCha20              thrpt   40    75990.670 ±   194.671  ops/s

o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                  256          256    None  NoPadding           ChaCha20              thrpt   40  3486874.086 ± 93507.311  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                 1024          256    None  NoPadding           ChaCha20              thrpt   40  1111966.942 ±  9602.005  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                 4096          256    None  NoPadding           ChaCha20              thrpt   40   297633.816 ±  1455.184  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                16384          256    None  NoPadding           ChaCha20              thrpt   40    74817.230 ±  1737.888  ops/s

o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt          256          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   998384.311 ±  7491.076  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt         1024          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   517031.021 ±  1756.181  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt         4096          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   179139.212 ±   401.008  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt        16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    49796.519 ±   609.335  ops/s

o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt          256          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40  1207581.459 ± 13757.759  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt         1024          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   576596.806 ±  4205.682  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt         4096          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   184108.182 ±   229.014  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt        16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    50120.498 ±   300.391  ops/s

o.o.b.j.c.small.CipherBench.ChaCha20Poly1305.decrypt       16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    50053.528 ±   181.415  ops/s
o.o.b.j.c.small.CipherBench.ChaCha20Poly1305.encrypt       16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    50232.767 ±    62.234  ops/s


Block Parallel, No Interleaving
-------------------------------
Benchmark                                             (dataSize)  (keyLength)  (mode)  (padding)      (permutation)  (provider)   Mode  Cnt        Score        Error  Units
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                  256          256    None  NoPadding           ChaCha20              thrpt   40  4107524.407 ±   9337.726  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                 1024          256    None  NoPadding           ChaCha20              thrpt   40  1210532.736 ±   1111.846  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                 4096          256    None  NoPadding           ChaCha20              thrpt   40   315178.899 ±    375.858  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                16384          256    None  NoPadding           ChaCha20              thrpt   40    78782.555 ±    856.939  ops/s

o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                  256          256    None  NoPadding           ChaCha20              thrpt   40  3601509.841 ± 103375.315  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                 1024          256    None  NoPadding           ChaCha20              thrpt   40  1156918.875 ±   9666.447  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                 4096          256    None  NoPadding           ChaCha20              thrpt   40   312270.458 ±   1726.717  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                16384          256    None  NoPadding           ChaCha20              thrpt   40    79394.369 ±    513.291  ops/s

o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt          256          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40  1029546.842 ±   2317.072  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt         1024          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   532504.493 ±   2836.934  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt         4096          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   183874.028 ±    332.438  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt        16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    51739.678 ±    122.138  ops/s

o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt          256          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40  1263370.572 ±  15424.473  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt         1024          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   588853.049 ±   3419.509  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt         4096          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   188899.111 ±    160.103  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt        16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    51516.978 ±    147.720  ops/s

o.o.b.j.c.small.CipherBench.ChaCha20Poly1305.decrypt       16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    51758.247 ±     39.852  ops/s
o.o.b.j.c.small.CipherBench.ChaCha20Poly1305.encrypt       16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    51441.519 ±    278.059  ops/s


Block Parallel, With Interleaving
---------------------------------
Benchmark                                             (dataSize)  (keyLength)  (mode)  (padding)      (permutation)  (provider)   Mode  Cnt        Score       Error  Units
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                  256          256    None  NoPadding           ChaCha20              thrpt   40  4154482.236 ±  8208.082  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                 1024          256    None  NoPadding           ChaCha20              thrpt   40  1221710.558 ±  5967.515  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                 4096          256    None  NoPadding           ChaCha20              thrpt   40   319918.165 ±   327.235  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                16384          256    None  NoPadding           ChaCha20              thrpt   40    80602.283 ±   193.687  ops/s

o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                  256          256    None  NoPadding           ChaCha20              thrpt   40  3710733.896 ± 88631.462  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                 1024          256    None  NoPadding           ChaCha20              thrpt   40  1168824.003 ± 10465.340  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                 4096          256    None  NoPadding           ChaCha20              thrpt   40   315040.718 ±  1389.500  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                16384          256    None  NoPadding           ChaCha20              thrpt   40    80365.126 ±   586.286  ops/s

o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt          256          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40  1007279.441 ±  8794.990  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt         1024          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   536758.995 ±  3346.320  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt         4096          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   184600.058 ±   362.456  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt        16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    52079.247 ±    38.558  ops/s

o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt          256          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40  1233639.918 ±  7503.063  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt         1024          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   593298.939 ±  3886.323  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt         4096          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   190535.858 ±   215.443  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt        16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    51953.765 ±   226.078  ops/s

o.o.b.j.c.small.CipherBench.ChaCha20Poly1305.decrypt       16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    52073.085 ±    46.961  ops/s
o.o.b.j.c.small.CipherBench.ChaCha20Poly1305.encrypt       16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    51815.757 ±   331.563  ops/s
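
The spreadsheet mentioned in the description compares each variant against the current QR-parallel, no-interleaving implementation. A minimal sketch of that relative-change arithmetic follows, using the Apple M1 ChaCha20Poly1305.encrypt rows at dataSize 16384 from the tables above. The `RelativeChange` class and its methods are hypothetical and not part of the PR. The overlap test treats the Error column (which JMH reports by default as the half-width of a 99.9% confidence interval) as a simple range around the score. It is only a rough screen, not a significance test. A Java 16+ runtime is assumed for the record.

```java
// Hypothetical helper, not part of the PR: relative change of a JMH score against
// a baseline, plus a crude check of whether the "Score ± Error" ranges overlap.
final class RelativeChange {

    // One JMH row: mean throughput (ops/s) and the reported Error half-width.
    record Result(String label, double score, double error) {
        double lo() { return score - error; }
        double hi() { return score + error; }
    }

    // Percent change of x relative to baseline; positive means higher throughput.
    static double percentChange(Result baseline, Result x) {
        return (x.score() / baseline.score() - 1.0) * 100.0;
    }

    // True when the two score ranges overlap. This is a rough screen,
    // not a statistical significance test.
    static boolean overlaps(Result a, Result b) {
        return a.lo() <= b.hi() && b.lo() <= a.hi();
    }

    public static void main(String[] args) {
        // Apple M1, ChaCha20Poly1305.encrypt, dataSize = 16384, copied from the tables above.
        Result base = new Result("QR parallel, no interleaving", 49339.073, 298.549);
        Result[] variants = {
            new Result("QR parallel, interleaving", 50120.498, 300.391),
            new Result("Block parallel, no interleaving", 51516.978, 147.720),
            new Result("Block parallel, interleaving", 51953.765, 226.078),
        };
        for (Result r : variants) {
            System.out.printf("%-32s %+6.2f%%  overlaps baseline: %b%n",
                    r.label(), percentChange(base, r), overlaps(base, r));
        }
    }
}
```

For these rows the sketch reports roughly +1.6%, +4.4% and +5.3% for the three variants, and none of their ranges overlaps the baseline's.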

Benchmarks for Neoverse-N1:

System: 2x Neoverse-N1, 2 cores, 1 socket, 1 thread/core (var 0x3, part 0xD0C)


Quarter-Round Parallel Intrinsics Implementation
------------------------------------------------
Benchmark                                             (dataSize)  (keyLength)  (mode)  (padding)      (permutation)  (provider)   Mode  Cnt        Score       Error  Units
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                  256          256    None  NoPadding           ChaCha20              thrpt   40  2219198.137 ± 13314.344  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                 1024          256    None  NoPadding           ChaCha20              thrpt   40   684200.661 ±  3601.031  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                 4096          256    None  NoPadding           ChaCha20              thrpt   40   181048.566 ±   942.201  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                16384          256    None  NoPadding           ChaCha20              thrpt   40    46150.219 ±   118.031  ops/s

o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                  256          256    None  NoPadding           ChaCha20              thrpt   40  2049320.671 ±  9549.691  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                 1024          256    None  NoPadding           ChaCha20              thrpt   40   663456.090 ±  2722.964  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                 4096          256    None  NoPadding           ChaCha20              thrpt   40   179921.834 ±   573.613  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                16384          256    None  NoPadding           ChaCha20              thrpt   40    45885.159 ±   102.974  ops/s

o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt          256          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   476694.433 ±  4118.055  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt         1024          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   251749.129 ±  1535.415  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt         4096          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    87052.901 ±   436.111  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt        16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    24099.749 ±   136.009  ops/s

o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt          256          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   601333.942 ±  5414.186  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt         1024          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   280884.583 ±  2332.119  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt         4096          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    90250.320 ±   604.948  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt        16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    24346.217 ±   101.557  ops/s

o.o.b.j.c.small.CipherBench.ChaCha20Poly1305.decrypt       16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    23950.145 ±   119.081  ops/s
o.o.b.j.c.small.CipherBench.ChaCha20Poly1305.encrypt       16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    24405.675 ±    93.554  ops/s


Quarter-Round Parallel Intrinsics with Interleaving Implementation
------------------------------------------------------------------
Benchmark                                             (dataSize)  (keyLength)  (mode)  (padding)      (permutation)  (provider)   Mode  Cnt        Score       Error  Units
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                  256          256    None  NoPadding           ChaCha20              thrpt   40  2344673.121 ± 14885.986  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                 1024          256    None  NoPadding           ChaCha20              thrpt   40   724626.059 ±  3078.617  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                 4096          256    None  NoPadding           ChaCha20              thrpt   40   192723.841 ±   744.860  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                16384          256    None  NoPadding           ChaCha20              thrpt   40    49050.992 ±   118.087  ops/s

o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                  256          256    None  NoPadding           ChaCha20              thrpt   40  2136919.832 ±  7229.740  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                 1024          256    None  NoPadding           ChaCha20              thrpt   40   703672.009 ±  2520.798  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                 4096          256    None  NoPadding           ChaCha20              thrpt   40   191748.973 ±   421.704  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                16384          256    None  NoPadding           ChaCha20              thrpt   40    48939.791 ±   194.749  ops/s

o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt          256          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   497137.864 ±  2915.527  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt         1024          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   262127.552 ±  1302.946  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt         4096          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    90018.698 ±   425.574  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt        16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    24987.421 ±   119.936  ops/s

o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt          256          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   634980.497 ±  4191.567  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt         1024          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   293529.897 ±  1496.703  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt         4096          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    93230.690 ±   480.282  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt        16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    24936.479 ±   112.139  ops/s

o.o.b.j.c.small.CipherBench.ChaCha20Poly1305.decrypt       16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    24897.542 ±    76.891  ops/s
o.o.b.j.c.small.CipherBench.ChaCha20Poly1305.encrypt       16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    25128.075 ±   120.033  ops/s


Block-Parallel Intrinsics Implementation
----------------------------------------
Benchmark                                             (dataSize)  (keyLength)  (mode)  (padding)      (permutation)  (provider)   Mode  Cnt        Score       Error  Units
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                  256          256    None  NoPadding           ChaCha20              thrpt   40  2164945.312 ±  8845.473  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                 1024          256    None  NoPadding           ChaCha20              thrpt   40   659831.098 ±  1968.217  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                 4096          256    None  NoPadding           ChaCha20              thrpt   40   175252.222 ±   512.910  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                16384          256    None  NoPadding           ChaCha20              thrpt   40    44329.489 ±   126.564  ops/s

o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                  256          256    None  NoPadding           ChaCha20              thrpt   40  1975016.045 ± 11695.931  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                 1024          256    None  NoPadding           ChaCha20              thrpt   40   640856.881 ±  1830.533  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                 4096          256    None  NoPadding           ChaCha20              thrpt   40   173305.072 ±   366.240  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                16384          256    None  NoPadding           ChaCha20              thrpt   40    44208.373 ±   107.018  ops/s

o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt          256          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   466351.469 ±  3278.807  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt         1024          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   247662.489 ±  1165.507  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt         4096          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    85367.721 ±   404.796  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt        16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    23492.360 ±    92.043  ops/s

o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt          256          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   589645.973 ±  4262.663  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt         1024          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   278130.465 ±  1394.179  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt         4096          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    88081.739 ±   443.476  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt        16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    23853.430 ±   104.346  ops/s

o.o.b.j.c.small.CipherBench.ChaCha20Poly1305.decrypt       16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    23620.475 ±    75.932  ops/s
o.o.b.j.c.small.CipherBench.ChaCha20Poly1305.encrypt       16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    23750.134 ±   118.572  ops/s


Block-Parallel with Interleaving Intrinsics Implementation
----------------------------------------------------------
Benchmark                                             (dataSize)  (keyLength)  (mode)  (padding)      (permutation)  (provider)   Mode  Cnt        Score       Error  Units
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                  256          256    None  NoPadding           ChaCha20              thrpt   40  2358246.820 ± 14256.312  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                 1024          256    None  NoPadding           ChaCha20              thrpt   40   734318.183 ±  2447.434  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                 4096          256    None  NoPadding           ChaCha20              thrpt   40   196243.937 ±   517.431  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                16384          256    None  NoPadding           ChaCha20              thrpt   40    50008.245 ±    85.350  ops/s

o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                  256          256    None  NoPadding           ChaCha20              thrpt   40  2156054.908 ±  5432.249  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                 1024          256    None  NoPadding           ChaCha20              thrpt   40   713847.200 ±  1962.784  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                 4096          256    None  NoPadding           ChaCha20              thrpt   40   194383.466 ±   464.389  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                16384          256    None  NoPadding           ChaCha20              thrpt   40    49652.092 ±   166.716  ops/s

o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt          256          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   497410.798 ±  3632.927  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt         1024          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   261587.126 ±  1336.591  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt         4096          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    90453.673 ±   429.630  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt        16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    24963.118 ±   103.795  ops/s

o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt          256          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   623876.407 ±  4655.637  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt         1024          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   292279.929 ±  1345.033  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt         4096          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    93352.350 ±   429.286  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt        16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    25190.232 ±   121.961  ops/s

o.o.b.j.c.small.CipherBench.ChaCha20Poly1305.decrypt       16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    25128.018 ±    84.863  ops/s
o.o.b.j.c.small.CipherBench.ChaCha20Poly1305.encrypt       16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    25371.698 ±   129.837  ops/s

Benchmarks for Cortex-A72:

4 processor Cortex-A72, 1 cluster, 4 cores/cluster, 1 thread/core (var 0x0, part 0xD08)

Quarter Round Parallel Implementation, No Interleaving
------------------------------------------------------
Benchmark                                             (dataSize)  (keyLength)  (mode)  (padding)      (permutation)  (provider)   Mode  Cnt       Score      Error  Units
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                  256          256    None  NoPadding           ChaCha20              thrpt   40  602983.483 ± 6556.879  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                 1024          256    None  NoPadding           ChaCha20              thrpt   40  186189.843 ±  628.835  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                 4096          256    None  NoPadding           ChaCha20              thrpt   40   49499.230 ±  139.811  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                16384          256    None  NoPadding           ChaCha20              thrpt   40   12487.617 ±   69.484  ops/s

o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                  256          256    None  NoPadding           ChaCha20              thrpt   40  592209.356 ± 3927.984  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                 1024          256    None  NoPadding           ChaCha20              thrpt   40  185091.856 ±  366.779  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                 4096          256    None  NoPadding           ChaCha20              thrpt   40   49491.296 ±  117.179  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                16384          256    None  NoPadding           ChaCha20              thrpt   40   12512.907 ±   71.587  ops/s

o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt          256          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   96212.313 ± 2482.928  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt         1024          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   65131.604 ± 1504.555  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt         4096          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   27746.783 ±  229.856  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt        16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    8381.946 ±   32.122  ops/s

o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt          256          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40  129453.321 ± 3224.106  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt         1024          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   77091.625 ± 1470.684  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt         4096          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   29334.590 ±  303.107  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt        16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    8460.356 ±    8.524  ops/s

o.o.b.j.c.small.CipherBench.ChaCha20Poly1305.decrypt       16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    8386.624 ±   34.163  ops/s
o.o.b.j.c.small.CipherBench.ChaCha20Poly1305.encrypt       16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    8471.573 ±    8.635  ops/s


Quarter Round Parallel Implementation, With Interleaving
--------------------------------------------------------
Benchmark                                             (dataSize)  (keyLength)  (mode)  (padding)      (permutation)  (provider)   Mode  Cnt       Score      Error  Units
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                  256          256    None  NoPadding           ChaCha20              thrpt   40  767143.826 ± 9195.715  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                 1024          256    None  NoPadding           ChaCha20              thrpt   40  254386.139 ± 1378.080  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                 4096          256    None  NoPadding           ChaCha20              thrpt   40   69152.606 ±  176.940  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                16384          256    None  NoPadding           ChaCha20              thrpt   40   17609.457 ±   71.086  ops/s

o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                  256          256    None  NoPadding           ChaCha20              thrpt   40  746643.194 ± 9077.375  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                 1024          256    None  NoPadding           ChaCha20              thrpt   40  251953.223 ±  959.588  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                 4096          256    None  NoPadding           ChaCha20              thrpt   40   69064.757 ±  197.231  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                16384          256    None  NoPadding           ChaCha20              thrpt   40   17563.052 ±   97.678  ops/s

o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt          256          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40  105520.550 ± 2805.637  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt         1024          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   72902.046 ± 1738.503  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt         4096          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   33446.843 ±  377.742  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt        16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   10437.913 ±   31.702  ops/s

o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt          256          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40  141153.205 ± 3693.280  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt         1024          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   89657.996 ± 1635.631  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt         4096          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   35926.981 ±  244.574  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt        16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   10555.879 ±   18.698  ops/s

o.o.b.j.c.small.CipherBench.ChaCha20Poly1305.decrypt       16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   10440.037 ±   33.023  ops/s
o.o.b.j.c.small.CipherBench.ChaCha20Poly1305.encrypt       16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   10542.745 ±   45.282  ops/s
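This Java sketch turns the interleaving effect on the quarter-round parallel code into a percentage for `ChaCha20Poly1305.encrypt`. It compares two sets of rows. The first set is the rows just before this table; their table header is not part of this excerpt, so they are assumed here to be the quarter-round parallel, no-interleaving baseline for the same processor. The second set is the "With Interleaving" rows above. The scores are copied from the tables. The class and method names are hypothetical, and the error columns are ignored.

```java
// Hypothetical helper: compares ChaCha20Poly1305.encrypt JMH throughput (ops/s)
// copied from the tables above. Error bars are ignored, so treat results as rough.
final class QrInterleavingDelta {
    static final int[] DATA_SIZES = {256, 1024, 4096, 16384};

    // Rows preceding the "With Interleaving" table
    // (assumed: quarter-round parallel, no interleaving).
    static final double[] QR_NO_INTERLEAVE = {129453.321, 77091.625, 29334.590, 8460.356};

    // Rows from "Quarter Round Parallel Implementation, With Interleaving".
    static final double[] QR_INTERLEAVE = {141153.205, 89657.996, 35926.981, 10555.879};

    // Percentage change of candidate relative to baseline; positive means faster.
    static double percentChange(double candidate, double baseline) {
        return (candidate / baseline - 1.0) * 100.0;
    }

    public static void main(String[] args) {
        for (int i = 0; i < DATA_SIZES.length; i++) {
            System.out.printf("dataSize=%5d  %+.1f%%%n",
                    DATA_SIZES[i], percentChange(QR_INTERLEAVE[i], QR_NO_INTERLEAVE[i]));
        }
    }
}
```

Running it with the single-file launcher (`java QrInterleavingDelta.java`) should show the gain growing with data size. The gain is roughly 9% at 256 bytes and roughly 25% at 16384 bytes.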


Block Parallel Implementation, No Interleaving
----------------------------------------------
Benchmark                                             (dataSize)  (keyLength)  (mode)  (padding)      (permutation)  (provider)   Mode  Cnt       Score      Error  Units
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                  256          256    None  NoPadding           ChaCha20              thrpt   40  587100.753 ± 5754.708  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                 1024          256    None  NoPadding           ChaCha20              thrpt   40  178737.840 ±  730.445  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                 4096          256    None  NoPadding           ChaCha20              thrpt   40   47340.182 ±  121.627  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                16384          256    None  NoPadding           ChaCha20              thrpt   40   11947.269 ±   66.887  ops/s

o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                  256          256    None  NoPadding           ChaCha20              thrpt   40  574123.343 ± 3838.477  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                 1024          256    None  NoPadding           ChaCha20              thrpt   40  177870.311 ±  420.125  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                 4096          256    None  NoPadding           ChaCha20              thrpt   40   47409.796 ±  109.224  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                16384          256    None  NoPadding           ChaCha20              thrpt   40   11967.672 ±   65.803  ops/s

o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt          256          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   95867.086 ± 2228.000  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt         1024          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   63376.433 ± 1301.826  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt         4096          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   26988.391 ±  231.289  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt        16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    8139.090 ±   20.871  ops/s

o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt          256          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40  127770.261 ± 3262.540  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt         1024          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   76019.408 ± 1226.583  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt         4096          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   28652.283 ±  214.896  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt        16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    8208.186 ±   11.455  ops/s

o.o.b.j.c.small.CipherBench.ChaCha20Poly1305.decrypt       16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    8131.508 ±   27.548  ops/s
o.o.b.j.c.small.CipherBench.ChaCha20Poly1305.encrypt       16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40    8207.550 ±   13.086  ops/s
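This sketch shows how the block-parallel layout performs without the instruction reordering. It computes the percentage change of the `ChaCha20.encrypt` rows in the table above relative to the same rows in "Quarter Round Parallel Implementation, With Interleaving". The numbers are copied from those two tables. The class and method names are hypothetical, and error bars are not taken into account.

```java
// Hypothetical helper: compares ChaCha20.encrypt JMH throughput (ops/s) copied
// from the tables above. Error bars are ignored, so treat results as rough.
final class BlockNoInterleaveDelta {
    static final int[] DATA_SIZES = {256, 1024, 4096, 16384};

    // "Quarter Round Parallel Implementation, With Interleaving", ChaCha20.encrypt
    static final double[] QR_INTERLEAVE = {746643.194, 251953.223, 69064.757, 17563.052};

    // "Block Parallel Implementation, No Interleaving", ChaCha20.encrypt
    static final double[] BLOCK_NO_INTERLEAVE = {574123.343, 177870.311, 47409.796, 11967.672};

    // Percentage change of candidate relative to baseline; negative means slower.
    static double percentChange(double candidate, double baseline) {
        return (candidate / baseline - 1.0) * 100.0;
    }

    public static void main(String[] args) {
        for (int i = 0; i < DATA_SIZES.length; i++) {
            System.out.printf("dataSize=%5d  block(no interleave) vs QR(interleave): %+.1f%%%n",
                    DATA_SIZES[i], percentChange(BLOCK_NO_INTERLEAVE[i], QR_INTERLEAVE[i]));
        }
    }
}
```

On these numbers the block-parallel code without interleaving comes out roughly 23% to 32% slower. This fits the observation that the block-parallel approach only pays off together with the ordering changes.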


Block Parallel Implementation, With Interleaving
------------------------------------------------
Benchmark                                             (dataSize)  (keyLength)  (mode)  (padding)      (permutation)  (provider)   Mode  Cnt       Score       Error  Units
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                  256          256    None  NoPadding           ChaCha20              thrpt   40  826086.130 ±  9933.137  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                 1024          256    None  NoPadding           ChaCha20              thrpt   40  276583.128 ±  1434.611  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                 4096          256    None  NoPadding           ChaCha20              thrpt   40   75688.367 ±   228.277  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt                16384          256    None  NoPadding           ChaCha20              thrpt   40   19348.013 ±    77.810  ops/s

o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                  256          256    None  NoPadding           ChaCha20              thrpt   40  800978.386 ± 10445.822  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                 1024          256    None  NoPadding           ChaCha20              thrpt   40  274107.264 ±  1606.978  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                 4096          256    None  NoPadding           ChaCha20              thrpt   40   75446.852 ±   209.379  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt                16384          256    None  NoPadding           ChaCha20              thrpt   40   19270.292 ±   105.573  ops/s

o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt          256          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40  105988.778 ±  3001.220  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt         1024          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   76162.169 ±  1692.042  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt         4096          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   34978.996 ±   468.786  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt        16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   11040.040 ±    31.844  ops/s

o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt          256          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40  146046.188 ±  3471.952  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt         1024          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   94041.417 ±  1834.558  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt         4096          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   37770.658 ±   311.519  ops/s
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt        16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   11183.053 ±    11.204  ops/s

o.o.b.j.c.small.CipherBench.ChaCha20Poly1305.decrypt       16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   11037.956 ±    39.522  ops/s
o.o.b.j.c.small.CipherBench.ChaCha20Poly1305.encrypt       16384          256    None  NoPadding  ChaCha20-Poly1305              thrpt   40   11196.095 ±    33.796  ops/s
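This final sketch sets the `ChaCha20.encrypt` rows of the table above against two other tables. It compares them with the same rows in "Block Parallel Implementation, No Interleaving" to show the effect of interleaving alone. It also compares them with "Quarter Round Parallel Implementation, With Interleaving", which is block-parallel vs quarter-round-parallel with both interleaved. Values are copied from the tables. The class and method names are hypothetical, and error bars are ignored.

```java
// Hypothetical helper: compares ChaCha20.encrypt JMH throughput (ops/s) copied
// from the tables above. Error bars are ignored, so treat results as rough.
final class BlockInterleaveDelta {
    static final int[] DATA_SIZES = {256, 1024, 4096, 16384};

    // "Quarter Round Parallel Implementation, With Interleaving", ChaCha20.encrypt
    static final double[] QR_INTERLEAVE = {746643.194, 251953.223, 69064.757, 17563.052};

    // "Block Parallel Implementation, No Interleaving", ChaCha20.encrypt
    static final double[] BLOCK_NO_INTERLEAVE = {574123.343, 177870.311, 47409.796, 11967.672};

    // "Block Parallel Implementation, With Interleaving", ChaCha20.encrypt
    static final double[] BLOCK_INTERLEAVE = {800978.386, 274107.264, 75446.852, 19270.292};

    // Percentage change of candidate relative to baseline; positive means faster.
    static double percentChange(double candidate, double baseline) {
        return (candidate / baseline - 1.0) * 100.0;
    }

    public static void main(String[] args) {
        for (int i = 0; i < DATA_SIZES.length; i++) {
            System.out.printf("dataSize=%5d  vs block(no interleave): %+.1f%%  vs QR(interleave): %+.1f%%%n",
                    DATA_SIZES[i],
                    percentChange(BLOCK_INTERLEAVE[i], BLOCK_NO_INTERLEAVE[i]),
                    percentChange(BLOCK_INTERLEAVE[i], QR_INTERLEAVE[i]));
        }
    }
}
```

Interleaving lifts the block-parallel code by roughly 40% to 61% on these rows. The interleaved block-parallel code also ends up about 7% to 10% ahead of the interleaved quarter-round version.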

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24420#issuecomment-2776357177
PR Comment: https://git.openjdk.org/jdk/pull/24420#issuecomment-2776369079
PR Comment: https://git.openjdk.org/jdk/pull/24420#issuecomment-2776371619


More information about the hotspot-dev mailing list