RFR: 8344144: AES/CBC slow at big payloads [v2]

Volodymyr Paprotski vpaprotski at openjdk.org
Fri Nov 15 18:29:47 UTC 2024


On Thu, 14 Nov 2024 00:44:35 GMT, Volodymyr Paprotski <vpaprotski at openjdk.org> wrote:

>> Measuring throughput with JMH parameters `-f 1 -i 2 -wi 3 -r 20 -w 30  -p algorithm=AES/CBC/NoPadding -p dataSize=30000000 -p provider=SunJCE -p keyLength=128 org.openjdk.bench.javax.crypto.full.AESBench`
>> 
>> Before:
>> 
>> Benchmark                (algorithm)  (dataSize)  (keyLength)  (provider)   Mode  Cnt   Score   Error  Units
>> AESBench.decrypt   AES/CBC/NoPadding    30000000          128      SunJCE  thrpt    2  25.383          ops/s
>> AESBench.decrypt2  AES/CBC/NoPadding    30000000          128      SunJCE  thrpt    2  32.230          ops/s
>> AESBench.encrypt   AES/CBC/NoPadding    30000000          128      SunJCE  thrpt    2  20.489          ops/s
>> AESBench.encrypt2  AES/CBC/NoPadding    30000000          128      SunJCE  thrpt    2  21.383          ops/s
>> 
>> 
>> After:
>> 
>> Benchmark                (algorithm)  (dataSize)  (keyLength)  (provider)   Mode  Cnt    Score   Error  Units
>> AESBench.decrypt   AES/CBC/NoPadding    30000000          128      SunJCE  thrpt    2  215.144          ops/s
>> AESBench.decrypt2  AES/CBC/NoPadding    30000000          128      SunJCE  thrpt    2  411.265          ops/s
>> AESBench.encrypt   AES/CBC/NoPadding    30000000          128      SunJCE  thrpt    2   64.341          ops/s
>> AESBench.encrypt2  AES/CBC/NoPadding    30000000          128      SunJCE  thrpt    2   73.114          ops/s
>> 
>> 
>> I have not deterministically proven why chunking works: before the change, the CBC intrinsic is not being used; and after chunking, it is. There is quite a bit of GC activity in the default AESBench, so `encrypt2/decrypt2` versions isolate just crypto (see comment below).
>
> Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision:
> 
>   comments from Kevin

Thanks for the reviews!

Re @artur-oracle 
> Please include the benchmarking tests in this PR.

I want to clarify why I have not included the test, since I have included the diff as the first comment. There are three changes in that diff:
- encrypt2/decrypt2: these are probably fine to be added to the benchmark permanently
- reduce the set size from 128 to 8: this is perhaps fine to include too, but the benchmark is regularly used for much smaller payloads (see the `Param` for payload in the test). I did not want to change existing results that I know people are tracking.
- increase the heap size to 20G: I do not know your infrastructure, but that seems like a dangerous thing to do without consulting the build team.
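
For readers without the diff at hand, here is a rough illustration of one way a benchmark variant can keep allocation (and therefore GC) out of the measured path. It is not the `encrypt2/decrypt2` code from the diff; the class and method names are hypothetical, and it assumes that the GC activity comes from allocating a fresh output array on every call.

```java
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.security.GeneralSecurityException;

// Hypothetical sketch, not the benchmark diff from the PR.
public class ReuseOutputSketch {

    static Cipher newEncryptCipher(byte[] key, byte[] iv) {
        try {
            Cipher c = Cipher.getInstance("AES/CBC/NoPadding");
            c.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
            return c;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Allocating variant: every call returns a newly allocated output array.
    static byte[] encryptAllocating(Cipher c, byte[] data) {
        try {
            return c.doFinal(data);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Reusing variant: the caller supplies the output array once, so the
    // measured call itself does not need a new output array per invocation.
    static int encryptInto(Cipher c, byte[] data, byte[] out) {
        try {
            return c.doFinal(data, 0, data.length, out, 0);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With `NoPadding`, `data.length` must be a multiple of the 16-byte AES block size, and `out` must be at least as long as `data`.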

Re @mcpowers
> Any measurable change in existing AES/CBC benchmarks with smaller payloads?

Since you asked, here is a wall of text :) 
Before:

Benchmark                (algorithm)  (dataSize)  (keyLength)  (provider)   Mode  Cnt         Score        Error  Units
AESBench.decrypt   AES/CBC/NoPadding         256          128      SunJCE  thrpt    3  15466658.213 ± 894512.313  ops/s
AESBench.decrypt   AES/CBC/NoPadding        2048          128      SunJCE  thrpt    3   4311452.996 ±  18242.611  ops/s
AESBench.decrypt   AES/CBC/NoPadding       16384          128      SunJCE  thrpt    3    513129.485 ±   3396.273  ops/s
AESBench.decrypt   AES/CBC/NoPadding       16384          128      SunJCE  thrpt    3    510214.982 ±   4344.772  ops/s
AESBench.decrypt2  AES/CBC/NoPadding         256          128      SunJCE  thrpt    3  19270331.648 ± 479823.535  ops/s
AESBench.decrypt2  AES/CBC/NoPadding        2048          128      SunJCE  thrpt    3  12881063.065 ±   7450.889  ops/s
AESBench.decrypt2  AES/CBC/NoPadding       16384          128      SunJCE  thrpt    3   1854688.581 ±   3139.717  ops/s
AESBench.decrypt2  AES/CBC/NoPadding       16384          128      SunJCE  thrpt    3   1853681.576 ±   2428.282  ops/s
AESBench.encrypt   AES/CBC/NoPadding         256          128      SunJCE  thrpt    3   7172724.563 ±  61647.697  ops/s
AESBench.encrypt   AES/CBC/NoPadding        2048          128      SunJCE  thrpt    3    918720.063 ±    877.626  ops/s
AESBench.encrypt   AES/CBC/NoPadding       16384          128      SunJCE  thrpt    3    112995.798 ±     57.118  ops/s
AESBench.encrypt   AES/CBC/NoPadding       16384          128      SunJCE  thrpt    3    113001.811 ±    254.675  ops/s
AESBench.encrypt2  AES/CBC/NoPadding         256          128      SunJCE  thrpt    3   8249489.798 ±   9262.345  ops/s
AESBench.encrypt2  AES/CBC/NoPadding        2048          128      SunJCE  thrpt    3   1070631.891 ±     71.539  ops/s
AESBench.encrypt2  AES/CBC/NoPadding       16384          128      SunJCE  thrpt    3    134439.301 ±     69.769  ops/s
AESBench.encrypt2  AES/CBC/NoPadding       16384          128      SunJCE  thrpt    3    134441.136 ±      6.637  ops/s

After:

Benchmark                (algorithm)  (dataSize)  (keyLength)  (provider)   Mode  Cnt         Score         Error  Units
AESBench.decrypt   AES/CBC/NoPadding         256          128      SunJCE  thrpt    3  15565078.411 ± 1036985.429  ops/s
AESBench.decrypt   AES/CBC/NoPadding        2048          128      SunJCE  thrpt    3   4320714.508 ±   81474.132  ops/s
AESBench.decrypt   AES/CBC/NoPadding       16384          128      SunJCE  thrpt    3    511840.239 ±    1967.440  ops/s
AESBench.decrypt   AES/CBC/NoPadding       16384          128      SunJCE  thrpt    3    511477.714 ±    1697.375  ops/s
AESBench.decrypt2  AES/CBC/NoPadding         256          128      SunJCE  thrpt    3  21913765.368 ±  106973.286  ops/s
AESBench.decrypt2  AES/CBC/NoPadding        2048          128      SunJCE  thrpt    3  12918625.945 ±  142872.155  ops/s
AESBench.decrypt2  AES/CBC/NoPadding       16384          128      SunJCE  thrpt    3   1855977.481 ±    3097.924  ops/s
AESBench.decrypt2  AES/CBC/NoPadding       16384          128      SunJCE  thrpt    3   1855141.602 ±    2071.454  ops/s
AESBench.encrypt   AES/CBC/NoPadding         256          128      SunJCE  thrpt    3   7148105.241 ± 1121822.184  ops/s
AESBench.encrypt   AES/CBC/NoPadding        2048          128      SunJCE  thrpt    3    914586.023 ±   13531.625  ops/s
AESBench.encrypt   AES/CBC/NoPadding       16384          128      SunJCE  thrpt    3    112926.287 ±     232.451  ops/s
AESBench.encrypt   AES/CBC/NoPadding       16384          128      SunJCE  thrpt    3    113047.201 ±      84.197  ops/s
AESBench.encrypt2  AES/CBC/NoPadding         256          128      SunJCE  thrpt    3   8249585.271 ±    7846.941  ops/s
AESBench.encrypt2  AES/CBC/NoPadding        2048          128      SunJCE  thrpt    3   1070618.927 ±    3435.745  ops/s
AESBench.encrypt2  AES/CBC/NoPadding       16384          128      SunJCE  thrpt    3    134456.819 ±      87.493  ops/s
AESBench.encrypt2  AES/CBC/NoPadding       16384          128      SunJCE  thrpt    3    134455.033 ±      14.600  ops/s

(All results are within 1% of each other, except decrypt2 with dataSize=256, which is about 14% faster after the change. That might be noise, but it looks like it got better too.)

Re @ferakocz
> I think this is exactly the reason for the speedup. It takes quite a few calls before hotspot switches to the intrinsic.

What confused me was that, despite giving it a LOT more warmup, it still would not switch to the intrinsic! I spent some weeks digging. (Though to be fair, I am new to a lot of the codebase, so 'weeks' also includes learning.)
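
To make the idea of chunking concrete, here is a minimal caller-side sketch. It is not the patch in this PR (the actual change is inside the JDK provider code); the class name, method names, and `chunkSize` parameter are hypothetical. It only shows that feeding a large CBC payload through `Cipher.update` in fixed-size pieces yields the same ciphertext as a single call, because the cipher carries the chaining state between calls.

```java
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.security.GeneralSecurityException;
import java.util.Arrays;

// Hypothetical illustration of chunked CBC processing; not the JDK change itself.
public class ChunkedCbcSketch {

    static Cipher newEncryptCipher(byte[] key, byte[] iv) throws GeneralSecurityException {
        Cipher c = Cipher.getInstance("AES/CBC/NoPadding");
        c.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
        return c;
    }

    // Reference: the whole payload in one call.
    static byte[] encryptOneShot(byte[] key, byte[] iv, byte[] data) {
        try {
            return newEncryptCipher(key, iv).doFinal(data);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Chunked: many smaller update() calls over the same payload.
    // chunkSize is assumed to be a positive multiple of the 16-byte AES block.
    static byte[] encryptChunked(byte[] key, byte[] iv, byte[] data, int chunkSize) {
        try {
            Cipher c = newEncryptCipher(key, iv);
            byte[] out = new byte[data.length];
            int outOff = 0;
            for (int off = 0; off < data.length; off += chunkSize) {
                int len = Math.min(chunkSize, data.length - off);
                // update() returns the number of bytes written to out.
                outOff += c.update(data, off, len, out, outOff);
            }
            // Flush whatever the cipher still holds (expected to be nothing here).
            byte[] tail = c.doFinal();
            System.arraycopy(tail, 0, out, outOff, tail.length);
            return out;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] key = new byte[16];
        byte[] iv = new byte[16];
        byte[] data = new byte[5 * 4096 + 32]; // multiple of 16, as NoPadding requires
        Arrays.fill(data, (byte) 0x5a);
        byte[] a = encryptOneShot(key, iv, data);
        byte[] b = encryptChunked(key, iv, data, 4096);
        System.out.println("same ciphertext: " + Arrays.equals(a, b));
    }
}
```

The sketch says nothing about when HotSpot picks the intrinsic; it only demonstrates that splitting the input does not change the result.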

> Was this a problem in a real-world application, or just in the benchmark? 

It was reported to me via a benchmark, but I am not sure if that was the 'cleaned up' report.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/22086#issuecomment-2479676549

