RFR: 8376164: Optimize AES/ECB/PKCS5Padding implementation using full-message intrinsic stub and parallel RoundKey addition [v5]

xinyangwu duke at openjdk.org
Thu Feb 26 12:47:49 UTC 2026


On Tue, 24 Feb 2026 04:00:49 GMT, xinyangwu <duke at openjdk.org> wrote:

>> ### Summary
>> This PR introduces a parallel intrinsic for AES/ECB operations to replace the current per-block processing approach, reducing native call overhead and improving throughput for multi-block operations.
>> ### Problem
>> Except on CPUs that support AVX-512, the existing AES/ECB/PKCS5Padding implementation suffers from three major performance issues:
>> 1. Excessive stub call overhead: Each 16-byte block requires a separate intrinsic call, resulting in high invocation frequency
>> 
>> 2. Poor instruction-level parallelism: Blocks are processed serially, so independent blocks are never interleaved
>> 
>> 3. Redundant setup/teardown: Repeated initialization of encryption state for each block
>> ### Changes
>> Added parallel AES intrinsic implementation
>> ### Testing
>> JMH benchmarks
>> 
>> The optimized implementation is about **37.43%** faster in this benchmark (average time per operation drops from 11518.846 ns to 8381.499 ns).
>> 
>> On an Intel(R) Core(TM) i9-14900HX machine with the original implementation:
>> 
>> 
>> Benchmark     Mode  Cnt      Score    Error  Units
>> AesTest.test  avgt    5  11518.846 ± 68.621  ns/op
>> 
>> 
>> On the same machine with the optimized implementation:
>> 
>> 
>> Benchmark     Mode  Cnt     Score    Error  Units
>> AesTest.test  avgt    5  8381.499 ± 57.751  ns/op
>> 
>> 
>> All Tier-1 tests pass on linux-x64. This modification does not involve changing the encryption or decryption logic.
>
> xinyangwu has updated the pull request incrementally with one additional commit since the last revision:
> 
>   8376164: Optimize AES/ECB/PKCS5Padding implementation using full-message intrinsic stub and parallel RoundKey addition

I managed to find two Intel machines with AVX-512 support:
**Intel(R) Xeon(R) 6982P-C** and **Intel(R) Xeon(R) Platinum 8480**.
On both systems, I confirmed that `UseAVX=3` and `UseKNLSetting=true` were enabled, and that execution indeed reached my newly added `generate_electronicCodeBook_[en/de]cryptAESCrypt_Parallel` path.

However, I still was not able to reproduce the failure you observed. I ran the test with the following command:


jtreg -vmoptions:"-XX:UseAVX=3 -XX:+UnlockDiagnosticVMOptions -XX:+UseKNLSetting" \
      -va -jdk build/debug-aes/images/jdk \
      test/hotspot/jtreg/compiler/codegen/aes/TestAESMain.java
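
For a quick check outside jtreg, a minimal standalone round-trip along the following lines might also help narrow things down. This is only a sketch: the class name `EcbRoundTrip`, the AES-128 key size, the message lengths, and the iteration count are illustrative assumptions, and it is not part of the PR or of `TestAESMain.java`. It also only checks that encryption and decryption agree with each other, not that the ciphertext matches a reference implementation.

```java
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;

public class EcbRoundTrip {
    // Encrypts and then decrypts `len` random bytes with AES/ECB/PKCS5Padding
    // and reports whether the plaintext survives the round trip.
    public static boolean roundTrip(int len) throws Exception {
        SecureRandom rnd = new SecureRandom();
        byte[] keyBytes = new byte[16];   // AES-128 key, chosen for illustration
        byte[] plain = new byte[len];
        rnd.nextBytes(keyBytes);
        rnd.nextBytes(plain);
        SecretKeySpec key = new SecretKeySpec(keyBytes, "AES");

        Cipher enc = Cipher.getInstance("AES/ECB/PKCS5Padding");
        enc.init(Cipher.ENCRYPT_MODE, key);
        byte[] ct = enc.doFinal(plain);

        // PKCS5 padding always adds 1..16 bytes, so the ciphertext length is
        // the next multiple of 16 strictly greater than len.
        if (ct.length != (len / 16 + 1) * 16) {
            return false;
        }

        Cipher dec = Cipher.getInstance("AES/ECB/PKCS5Padding");
        dec.init(Cipher.DECRYPT_MODE, key);
        return Arrays.equals(plain, dec.doFinal(ct));
    }

    public static void main(String[] args) throws Exception {
        // Repeat so that the JIT is likely to compile the hot path;
        // the count is a guess, not a guarantee that any stub is reached.
        for (int iter = 0; iter < 10_000; iter++) {
            for (int len : new int[] {0, 1, 15, 16, 17, 255, 1024, 4099}) {
                if (!roundTrip(len)) {
                    throw new AssertionError("mismatch at len=" + len);
                }
            }
        }
        System.out.println("ok");
    }
}
```

It could be run with the same VM flags as above, e.g. `java -XX:UseAVX=3 -XX:+UnlockDiagnosticVMOptions -XX:+UseKNLSetting EcbRoundTrip.java`, using the `java` launcher from the debug build.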


@marc-chevalier Could you please let me know:

- Were there any additional JVM flags or test parameters used in your run?
- Did you run only `TestAESMain.java`, or was this part of a larger test suite execution?

Any extra details would be very helpful for me to reproduce and investigate this issue.
Thanks again for reporting this and for your help in tracking it down!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/29385#issuecomment-3966401774


