RFR: 8344144: AES/CBC slow at big payloads [v2]

Ferenc Rakoczi duke at openjdk.org
Thu Nov 14 10:12:45 UTC 2024


On Thu, 14 Nov 2024 00:44:35 GMT, Volodymyr Paprotski <vpaprotski at openjdk.org> wrote:

>> Measuring throughput with JMH parameters `-f 1 -i 2 -wi 3 -r 20 -w 30  -p algorithm=AES/CBC/NoPadding -p dataSize=30000000 -p provider=SunJCE -p keyLength=128 org.openjdk.bench.javax.crypto.full.AESBench`
>> 
>> Before:
>> 
>> Benchmark                (algorithm)  (dataSize)  (keyLength)  (provider)   Mode  Cnt   Score   Error  Units
>> AESBench.decrypt   AES/CBC/NoPadding    30000000          128      SunJCE  thrpt    2  25.383          ops/s
>> AESBench.decrypt2  AES/CBC/NoPadding    30000000          128      SunJCE  thrpt    2  32.230          ops/s
>> AESBench.encrypt   AES/CBC/NoPadding    30000000          128      SunJCE  thrpt    2  20.489          ops/s
>> AESBench.encrypt2  AES/CBC/NoPadding    30000000          128      SunJCE  thrpt    2  21.383          ops/s
>> 
>> 
>> After:
>> 
>> Benchmark                (algorithm)  (dataSize)  (keyLength)  (provider)   Mode  Cnt    Score   Error  Units
>> AESBench.decrypt   AES/CBC/NoPadding    30000000          128      SunJCE  thrpt    2  215.144          ops/s
>> AESBench.decrypt2  AES/CBC/NoPadding    30000000          128      SunJCE  thrpt    2  411.265          ops/s
>> AESBench.encrypt   AES/CBC/NoPadding    30000000          128      SunJCE  thrpt    2   64.341          ops/s
>> AESBench.encrypt2  AES/CBC/NoPadding    30000000          128      SunJCE  thrpt    2   73.114          ops/s
>> 
>> 
>> I have not deterministically proven why chunking works: before the change, the CBC intrinsic is not being used; and after chunking, it is. There is quite a bit of GC activity in the default AESBench, so `encrypt2/decrypt2` versions isolate just crypto (see comment below).
>
> Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision:
> 
>   comments from Kevin

> I have not deterministically proven why chunking works: before the change, the CBC intrinsic is not being used; and after chunking, it is. 

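I think this is exactly the reason for the speedup. HotSpot switches to the intrinsic only after the enclosing method has been called quite a few times. With a single call on a 30 MB buffer that count is presumably reached very late, while chunking turns the same work into many calls.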
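
To make the idea concrete, here is a minimal caller-level sketch of chunking, not the code from the PR. The class name, the all-zero key and IV, and the 1 MB `CHUNK` size are hypothetical choices for illustration only. It checks that feeding a large buffer to `Cipher.update` in block-aligned pieces gives the same ciphertext as one big `doFinal` call; it does not measure throughput or show when the intrinsic kicks in.

```java
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

public class ChunkedCbcSketch {
    // Hypothetical chunk size; must be a multiple of the 16-byte AES block.
    static final int CHUNK = 1 << 20;

    // Fixed all-zero key and IV: for illustration only, never for real data.
    static Cipher newCipher() throws Exception {
        Cipher c = Cipher.getInstance("AES/CBC/NoPadding");
        c.init(Cipher.ENCRYPT_MODE,
               new SecretKeySpec(new byte[16], "AES"),
               new IvParameterSpec(new byte[16]));
        return c;
    }

    // One call covering the whole payload.
    static byte[] encryptOneShot(byte[] data) throws Exception {
        return newCipher().doFinal(data);
    }

    // Same payload, processed as many smaller update() calls.
    // Assumes data.length is a multiple of 16 (NoPadding).
    static byte[] encryptChunked(byte[] data) throws Exception {
        Cipher c = newCipher();
        byte[] out = new byte[data.length];
        int outOff = 0;
        for (int off = 0; off < data.length; off += CHUNK) {
            int len = Math.min(CHUNK, data.length - off);
            // CBC state (the previous ciphertext block) carries over between calls.
            outOff += c.update(data, off, len, out, outOff);
        }
        c.doFinal(); // nothing is buffered with block-aligned chunks; resets the cipher
        return out;
    }

    public static void main(String[] args) throws Exception {
        byte[] data = new byte[4 * CHUNK + 32];
        Arrays.fill(data, (byte) 0x5a);
        boolean same = Arrays.equals(encryptOneShot(data), encryptChunked(data));
        System.out.println("chunked == one-shot: " + same);
    }
}
```

The output is identical either way because CBC only needs the previous ciphertext block to continue, so the only difference is how many times the underlying encrypt method gets invoked.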

Was this a problem in a real-world application, or just in the benchmark?

The change looks good to me.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/22086#issuecomment-2475926066

