RFR: 8344144: AES/CBC slow at big payloads [v2]
Volodymyr Paprotski
vpaprotski at openjdk.org
Fri Nov 15 18:29:47 UTC 2024
On Thu, 14 Nov 2024 00:44:35 GMT, Volodymyr Paprotski <vpaprotski at openjdk.org> wrote:
>> Measuring throughput with JMH parameters `-f 1 -i 2 -wi 3 -r 20 -w 30 -p algorithm=AES/CBC/NoPadding -p dataSize=30000000 -p provider=SunJCE -p keyLength=128 org.openjdk.bench.javax.crypto.full.AESBench`
>>
>> Before:
>>
>> Benchmark (algorithm) (dataSize) (keyLength) (provider) Mode Cnt Score Error Units
>> AESBench.decrypt AES/CBC/NoPadding 30000000 128 SunJCE thrpt 2 25.383 ops/s
>> AESBench.decrypt2 AES/CBC/NoPadding 30000000 128 SunJCE thrpt 2 32.230 ops/s
>> AESBench.encrypt AES/CBC/NoPadding 30000000 128 SunJCE thrpt 2 20.489 ops/s
>> AESBench.encrypt2 AES/CBC/NoPadding 30000000 128 SunJCE thrpt 2 21.383 ops/s
>>
>>
>> After:
>>
>> Benchmark (algorithm) (dataSize) (keyLength) (provider) Mode Cnt Score Error Units
>> AESBench.decrypt AES/CBC/NoPadding 30000000 128 SunJCE thrpt 2 215.144 ops/s
>> AESBench.decrypt2 AES/CBC/NoPadding 30000000 128 SunJCE thrpt 2 411.265 ops/s
>> AESBench.encrypt AES/CBC/NoPadding 30000000 128 SunJCE thrpt 2 64.341 ops/s
>> AESBench.encrypt2 AES/CBC/NoPadding 30000000 128 SunJCE thrpt 2 73.114 ops/s
>>
>>
>> I have not deterministically proven why chunking works: before the change, the CBC intrinsic is not being used; and after chunking, it is. There is quite a bit of GC activity in the default AESBench, so `encrypt2/decrypt2` versions isolate just crypto (see comment below).
>
> Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision:
>
> comments from Kevin
Thanks for the reviews!
Re @artur-oracle
> Please include the benchmarking tests in this PR.
I want to clarify why I have not included the test, since I have included the diff as the first comment. There are three changes in that diff:
- encrypt2/decrypt2: these are probably fine to be added to the benchmark permanently
- reduce the set size from 128 to 8: this is perhaps fine to include too, but the benchmark is regularly used for much smaller payloads (see the `Param` for payload in the test). I did not want to change existing results that I know people are tracking.
- increase the heap size to 20G: I do not know your infrastructure, but that seems like a dangerous thing to do without consulting the build team.
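To illustrate the first item: one plausible way for an `encrypt2`-style variant to isolate crypto from GC activity is to write into a preallocated output buffer instead of letting every call allocate a fresh array. The sketch below is an assumption about the idea, not the code from the diff; the class name `ReusedBufferSketch` and its methods are hypothetical.

```java
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.security.GeneralSecurityException;

// Hypothetical sketch: compares an allocating call with a buffer-reusing call.
final class ReusedBufferSketch {
    private final Cipher cipher;
    private final byte[] out; // allocated once, reused on every call

    ReusedBufferSketch(byte[] key, byte[] iv, int dataSize) {
        try {
            cipher = Cipher.getInstance("AES/CBC/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE,
                    new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
            out = new byte[cipher.getOutputSize(dataSize)];
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Allocates a new result array per call, so the garbage collector
    // has work to do on large payloads.
    byte[] encryptAllocating(byte[] data) {
        try {
            return cipher.doFinal(data);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Writes into the preallocated buffer; returns the number of bytes written.
    int encryptReusing(byte[] data) {
        try {
            return cipher.doFinal(data, 0, data.length, out);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    byte[] output() {
        return out;
    }
}
```

For AES/CBC, `doFinal` resets the cipher to its initialized state, so both methods produce the same ciphertext for the same input. Reusing one IV across calls is acceptable only in a benchmark-style sketch like this one.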
Re @mcpowers
> Any measurable change in existing AES/CBC benchmarks with smaller payloads?
Since you asked, here is a wall of text :)
Before:
Benchmark (algorithm) (dataSize) (keyLength) (provider) Mode Cnt Score Error Units
AESBench.decrypt AES/CBC/NoPadding 256 128 SunJCE thrpt 3 15466658.213 ± 894512.313 ops/s
AESBench.decrypt AES/CBC/NoPadding 2048 128 SunJCE thrpt 3 4311452.996 ± 18242.611 ops/s
AESBench.decrypt AES/CBC/NoPadding 16384 128 SunJCE thrpt 3 513129.485 ± 3396.273 ops/s
AESBench.decrypt AES/CBC/NoPadding 16384 128 SunJCE thrpt 3 510214.982 ± 4344.772 ops/s
AESBench.decrypt2 AES/CBC/NoPadding 256 128 SunJCE thrpt 3 19270331.648 ± 479823.535 ops/s
AESBench.decrypt2 AES/CBC/NoPadding 2048 128 SunJCE thrpt 3 12881063.065 ± 7450.889 ops/s
AESBench.decrypt2 AES/CBC/NoPadding 16384 128 SunJCE thrpt 3 1854688.581 ± 3139.717 ops/s
AESBench.decrypt2 AES/CBC/NoPadding 16384 128 SunJCE thrpt 3 1853681.576 ± 2428.282 ops/s
AESBench.encrypt AES/CBC/NoPadding 256 128 SunJCE thrpt 3 7172724.563 ± 61647.697 ops/s
AESBench.encrypt AES/CBC/NoPadding 2048 128 SunJCE thrpt 3 918720.063 ± 877.626 ops/s
AESBench.encrypt AES/CBC/NoPadding 16384 128 SunJCE thrpt 3 112995.798 ± 57.118 ops/s
AESBench.encrypt AES/CBC/NoPadding 16384 128 SunJCE thrpt 3 113001.811 ± 254.675 ops/s
AESBench.encrypt2 AES/CBC/NoPadding 256 128 SunJCE thrpt 3 8249489.798 ± 9262.345 ops/s
AESBench.encrypt2 AES/CBC/NoPadding 2048 128 SunJCE thrpt 3 1070631.891 ± 71.539 ops/s
AESBench.encrypt2 AES/CBC/NoPadding 16384 128 SunJCE thrpt 3 134439.301 ± 69.769 ops/s
AESBench.encrypt2 AES/CBC/NoPadding 16384 128 SunJCE thrpt 3 134441.136 ± 6.637 ops/s
After:
Benchmark (algorithm) (dataSize) (keyLength) (provider) Mode Cnt Score Error Units
AESBench.decrypt AES/CBC/NoPadding 256 128 SunJCE thrpt 3 15565078.411 ± 1036985.429 ops/s
AESBench.decrypt AES/CBC/NoPadding 2048 128 SunJCE thrpt 3 4320714.508 ± 81474.132 ops/s
AESBench.decrypt AES/CBC/NoPadding 16384 128 SunJCE thrpt 3 511840.239 ± 1967.440 ops/s
AESBench.decrypt AES/CBC/NoPadding 16384 128 SunJCE thrpt 3 511477.714 ± 1697.375 ops/s
AESBench.decrypt2 AES/CBC/NoPadding 256 128 SunJCE thrpt 3 21913765.368 ± 106973.286 ops/s
AESBench.decrypt2 AES/CBC/NoPadding 2048 128 SunJCE thrpt 3 12918625.945 ± 142872.155 ops/s
AESBench.decrypt2 AES/CBC/NoPadding 16384 128 SunJCE thrpt 3 1855977.481 ± 3097.924 ops/s
AESBench.decrypt2 AES/CBC/NoPadding 16384 128 SunJCE thrpt 3 1855141.602 ± 2071.454 ops/s
AESBench.encrypt AES/CBC/NoPadding 256 128 SunJCE thrpt 3 7148105.241 ± 1121822.184 ops/s
AESBench.encrypt AES/CBC/NoPadding 2048 128 SunJCE thrpt 3 914586.023 ± 13531.625 ops/s
AESBench.encrypt AES/CBC/NoPadding 16384 128 SunJCE thrpt 3 112926.287 ± 232.451 ops/s
AESBench.encrypt AES/CBC/NoPadding 16384 128 SunJCE thrpt 3 113047.201 ± 84.197 ops/s
AESBench.encrypt2 AES/CBC/NoPadding 256 128 SunJCE thrpt 3 8249585.271 ± 7846.941 ops/s
AESBench.encrypt2 AES/CBC/NoPadding 2048 128 SunJCE thrpt 3 1070618.927 ± 3435.745 ops/s
AESBench.encrypt2 AES/CBC/NoPadding 16384 128 SunJCE thrpt 3 134456.819 ± 87.493 ops/s
AESBench.encrypt2 AES/CBC/NoPadding 16384 128 SunJCE thrpt 3 134455.033 ± 14.600 ops/s
(All within 1% except decrypt2 with dataSize=256; that might be noise, but it looks like it got better too.)
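The relative change behind that summary can be checked with a few lines of arithmetic. This is a small sketch using three score pairs copied from the tables above; the class name `ThroughputDelta` is hypothetical.

```java
// Hypothetical helper: percent change between a "before" and an "after" score.
final class ThroughputDelta {
    static double percentChange(double before, double after) {
        return (after - before) / before * 100.0;
    }

    public static void main(String[] args) {
        // Rows: {before, after} in ops/s, copied from the tables above.
        String[] names = {"decrypt 256", "decrypt2 256", "encrypt 2048"};
        double[][] scores = {
            {15466658.213, 15565078.411},
            {19270331.648, 21913765.368},
            {918720.063, 914586.023},
        };
        for (int i = 0; i < names.length; i++) {
            System.out.printf("%-14s %+.2f%%%n",
                    names[i], percentChange(scores[i][0], scores[i][1]));
        }
    }
}
```

For decrypt2 at dataSize=256 this works out to roughly +13.7%, while the other two pairs stay under 1% in magnitude.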
Re @ferakocz
> I think this is exactly the reason for the speedup. It takes quite a few calls before hotspot switches to the intrinsic.
My confusion was that, despite giving it a LOT more warmup, it still would not switch to the intrinsic! I spent some weeks digging. (Though to be fair, I am new to a lot of the codebase, so 'weeks' also includes learning.)
> Was this a problem in a real-world application, or just in the benchmark?
It was reported to me via a benchmark, but I am not sure if that was the 'cleaned up' report.
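For readers unfamiliar with the chunking mentioned in the PR description, here is a user-level illustration of why it is safe for CBC: feeding block-aligned chunks through `Cipher.update` yields the same ciphertext as a single `doFinal` over the whole payload. This is only a sketch of the idea, not the change made inside the provider; the class name `ChunkedCbcSketch` and the chunk sizes are hypothetical.

```java
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.security.GeneralSecurityException;

// Hypothetical sketch: one-shot versus chunked AES/CBC encryption.
final class ChunkedCbcSketch {
    private static Cipher newCipher(byte[] key, byte[] iv)
            throws GeneralSecurityException {
        Cipher c = Cipher.getInstance("AES/CBC/NoPadding");
        c.init(Cipher.ENCRYPT_MODE,
                new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
        return c;
    }

    // Encrypts the whole payload in a single call.
    static byte[] oneShot(byte[] key, byte[] iv, byte[] data) {
        try {
            return newCipher(key, iv).doFinal(data);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Encrypts the payload in fixed-size pieces. Both data.length and chunk
    // must be multiples of the 16-byte AES block size in this sketch.
    static byte[] chunked(byte[] key, byte[] iv, byte[] data, int chunk) {
        if (data.length % 16 != 0 || chunk <= 0 || chunk % 16 != 0) {
            throw new IllegalArgumentException("sizes must be multiples of 16");
        }
        try {
            Cipher c = newCipher(key, iv);
            byte[] out = new byte[data.length];
            int outPos = 0;
            for (int off = 0; off < data.length; off += chunk) {
                int len = Math.min(chunk, data.length - off);
                // CBC chaining state carries over between update calls.
                outPos += c.update(data, off, len, out, outPos);
            }
            c.doFinal(); // nothing is buffered for block-aligned input
            return out;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Comparing `oneShot(key, iv, data)` with `chunked(key, iv, data, 4096)` using `java.util.Arrays.equals` is a quick way to confirm the two paths agree.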
-------------
PR Comment: https://git.openjdk.org/jdk/pull/22086#issuecomment-2479676549