RFR: 8376164: Optimize AES/ECB/PKCS5Padding implementation using full-message intrinsic stub and parallel RoundKey addition [v5]
Marc Chevalier
mchevalier at openjdk.org
Wed Feb 25 11:07:48 UTC 2026
On Tue, 24 Feb 2026 04:00:49 GMT, xinyangwu <duke at openjdk.org> wrote:
>> ### Summary
>> This PR introduces a parallel intrinsic for AES/ECB operations to replace the current per-block processing approach, reducing native call overhead and improving throughput for multi-block operations.
>> ### Problem
>> Except on platforms that support AVX512, the existing AES/ECB/PKCS5Padding implementation suffers from three major performance issues:
>> 1. Excessive stub call overhead: Each 16-byte block requires a separate intrinsic call, resulting in high invocation frequency
>>
>> 2. Inefficient instruction-level parallelism: The serialized block processing fails to fully utilize instruction-level parallelism
>>
>> 3. Redundant setup/teardown: Repeated initialization of encryption state for each block
>> ### Changes
>> Added parallel AES intrinsic implementation
>> ### Testing
>> JMH benchmarks
>>
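>> The optimized implementation is about **37.43%** faster in throughput terms (11518.846 / 8381.499 ≈ 1.3743); equivalently, the average time per operation drops by about 27.2%.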
>>
>> On an Intel(R) Core(TM) i9-14900HX CPU machine with the original implementation:
>>
>>
>> Benchmark     Mode  Cnt      Score    Error  Units
>> AesTest.test  avgt    5  11518.846 ± 68.621  ns/op
>>
>>
>> On the same machine with the optimized implementation:
>>
>>
>> Benchmark     Mode  Cnt     Score    Error  Units
>> AesTest.test  avgt    5  8381.499 ± 57.751  ns/op
>>
>>
>> All Tier-1 tests pass on linux-x64. This modification does not involve changing the encryption or decryption logic.
>
> xinyangwu has updated the pull request incrementally with one additional commit since the last revision:
>
> 8376164: Optimize AES/ECB/PKCS5Padding implementation using full-message intrinsic stub and parallel RoundKey addition
Not a review! I've seen the `hotspot-compiler` label and I've just run some testing. I've got a failure on the test `compiler/codegen/aes/TestAESMain.java` using flags `-XX:UseAVX=3 -XX:+UnlockDiagnosticVMOptions -XX:+UseKNLSetting` on a machine with an Intel Xeon Platinum 8358 Processor.
Here is the relevant part of the output:
100000 iterations
For random generator using seed: 6209567500795428124
To re-run test with same seed value please add "-Djdk.test.lib.random.seed=6209567500795428124" to command line.
algorithm=AES, mode=ECB, paddingStr=PKCS5Padding, msgSize=646, keySize=128, noReinit=false, checkOutput=true, encInputOffset=0, encOutputOffset=0, decOutputOffset=0, lastChunkSize=32
Algorithm: AES(128bit)
Encryption cipher provider: SunJCE version 27
Encryption cipher algorithm: AES/ECB/PKCS5Padding
key: [16]: f8 f9 fa fb fc fd fe ff 00 01 02 03 04 05 06 07
input: [646]: 00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f 10 11 12 13 14 15 16 17 18 19 1a 1b 1c 1d 1e 1f
encode: [656]: 4c d8 7d 61 57 30 6a f1 14 c3 d4 f9 85 2e 29 a1 48 af f7 ec cc a7 47 38 7a bf 33 ee 41 3c 07 fd
decode: [656]: 00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f 10 11 12 13 14 15 16 17 18 19 1a 1b 1c 1d 1e 1f
Starting encryption warm-up
output error at index 0: got 00, expected 4c
test: [656]: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
exp : [656]: 4c d8 7d 61 57 30 6a f1 14 c3 d4 f9 85 2e 29 a1 48 af f7 ec cc a7 47 38 7a bf 33 ee 41 3c 07 fd
`test` being just full of 0s seems fishy (and wrong, according to the test). I've never seen this test being flaky.
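For anyone who wants to poke at this outside of jtreg, here is a minimal sketch of the same kind of check using only the standard `javax.crypto` API. It is not the logic of `TestAESMain.java` and is not guaranteed to reproduce the failure. The key bytes are copied from the log above; the class name `AesEcbSanityCheck`, the helper names, and the assumption that the input continues the `00 01 02 ...` pattern beyond the 32 bytes shown are all hypothetical. It checks that a 646-byte message pads to 656 bytes (41 blocks of 16), that the ciphertext is not all zeros, and that decryption round-trips.

```java
import java.security.GeneralSecurityException;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;

public class AesEcbSanityCheck {
    // 128-bit key copied from the "key:" line of the log above.
    static final byte[] KEY = {
        (byte) 0xf8, (byte) 0xf9, (byte) 0xfa, (byte) 0xfb,
        (byte) 0xfc, (byte) 0xfd, (byte) 0xfe, (byte) 0xff,
        0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07
    };

    // Assumption: the log shows only the first 32 input bytes; we simply
    // continue the counting pattern (wrapping at 256) for the rest.
    static byte[] makeInput(int size) {
        byte[] in = new byte[size];
        for (int i = 0; i < size; i++) {
            in[i] = (byte) i;
        }
        return in;
    }

    // Runs one AES/ECB/PKCS5Padding operation in the given Cipher mode.
    static byte[] crypt(int mode, byte[] data) {
        try {
            Cipher c = Cipher.getInstance("AES/ECB/PKCS5Padding");
            c.init(mode, new SecretKeySpec(KEY, "AES"));
            return c.doFinal(data);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    static boolean allZero(byte[] b) {
        for (byte x : b) {
            if (x != 0) return false;
        }
        return true;
    }

    // Returns the first iteration whose output looks wrong, or -1 if none.
    // Repeating the call gives the JIT a chance to compile the cipher code,
    // which is when an intrinsic (if any) would be used.
    static int firstBadIteration(int iterations, int msgSize) {
        byte[] input = makeInput(msgSize);
        int paddedLen = (msgSize / 16 + 1) * 16; // PKCS5 always adds 1..16 bytes
        for (int i = 0; i < iterations; i++) {
            byte[] enc = crypt(Cipher.ENCRYPT_MODE, input);
            byte[] dec = crypt(Cipher.DECRYPT_MODE, enc);
            if (enc.length != paddedLen || allZero(enc) || !Arrays.equals(dec, input)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int bad = firstBadIteration(100_000, 646);
        System.out.println(bad < 0 ? "no mismatch seen" : "mismatch at iteration " + bad);
    }
}
```

Running it with the same VM flags as above would be the natural experiment, but whether the intrinsic path is actually taken depends on how the JIT compiles this particular loop.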
For a real review, I'll leave that to people with more cryptography expertise than me.
-------------
PR Comment: https://git.openjdk.org/jdk/pull/29385#issuecomment-3957839406