RFR: 8337632: AES-GCM Algorithm optimization for x86_64
Jatin Bhateja
jbhateja at openjdk.org
Fri Aug 16 08:43:52 UTC 2024
On Mon, 22 Jan 2024 09:38:25 GMT, Smita Kamath <svkamath at openjdk.org> wrote:
> Hi,
> I want to submit an AES-GCM algorithm optimization. This implementation uses AVX512/VAES instructions. Additionally, it reduces PARALLEL_LEN from 7680 to 512 bytes. The performance numbers are below. Kindly review the code. Thank you.
>
> Benchmark | Datasize (bytes) | BaseJDK (ops/s) | Patch (ops/s) | %Gain
> -- | -- | -- | -- | --
> full.AESGCMBench.decrypt | 512 | 2928259.197 | 3269964.387 | 11.67
> full.AESGCMBench.decrypt | 1024 | 2494254.611 | 3010987.731 | 20.72
> full.AESGCMBench.decrypt | 1500 | 1883453.546 | 1934915.846 | 2.73
> full.AESGCMBench.decrypt | 2048 | 1825780.711 | 2452861.368 | 34.34
> full.AESGCMBench.decrypt | 4096 | 1275108.345 | 1806329.066 | 41.66
> full.AESGCMBench.decrypt | 8192 | 1033936.634 | 1196836.052 | 15.75
> full.AESGCMBench.decrypt | 16384 | 681494.768 | 711630.498 | 4.42
> full.AESGCMBench.decrypt | 32768 | 385026.017 | 395043.193 | 2.6
> full.AESGCMBench.decrypt | 65536 | 207373.924 | 214723.588 | 3.54
> | | | |
> full.AESGCMBench.encrypt | 512 | 2658008.476 | 2882496.94 | 8.45
> full.AESGCMBench.encrypt | 1024 | 2283709.63 | 2589534.403 | 13.39
> full.AESGCMBench.encrypt | 1500 | 1794993.519 | 1817669.531 | 1.26
> full.AESGCMBench.encrypt | 2048 | 1745532.435 | 2191097.29 | 25.52
> full.AESGCMBench.encrypt | 4096 | 1203301.174 | 1649593.953 | 37.08
> full.AESGCMBench.encrypt | 8192 | 985174.988 | 1132407.54 | 14.94
> full.AESGCMBench.encrypt | 16384 | 658980.441 | 684765.771 | 3.91
> full.AESGCMBench.encrypt | 32768 | 373543.798 | 391518.837 | 4.81
> full.AESGCMBench.encrypt | 65536 | 202532.315 | 205084.833 | 1.26
src/hotspot/cpu/x86/assembler_x86.cpp line 8977:
> 8975:
> 8976: void Assembler::vinserti64x2(XMMRegister dst, XMMRegister nds, XMMRegister src, uint8_t imm8) {
> 8977: assert(VM_Version::supports_avx512dq(), "");
You may also add an assertion for the VL feature; some VM instances may run with custom feature sets.
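
A minimal sketch of the kind of combined check meant here, assuming that EVEX encodings narrower than 512 bits need AVX512VL in addition to AVX512DQ. The `CpuFeatures` struct, the `VectorLen` enum and `can_emit_dq_insn` are hypothetical stand-ins for illustration, not HotSpot code; in the assembler this would be an extra assert next to the existing one.

```cpp
#include <cassert>

// Hypothetical stand-in for the feature flags a VM instance reports.
struct CpuFeatures {
  bool avx512dq;
  bool avx512vl;
};

// Illustrative vector-length encoding (names chosen for this sketch only).
enum VectorLen { AVX_128bit = 0, AVX_256bit = 1, AVX_512bit = 2 };

// Returns true when an AVX512DQ instruction may be emitted at the given length.
// Assumption: 128-bit and 256-bit EVEX forms additionally require AVX512VL.
inline bool can_emit_dq_insn(const CpuFeatures& f, VectorLen len) {
  if (!f.avx512dq) {
    return false;
  }
  return len == AVX_512bit || f.avx512vl;
}

// Emission guard in the style of an assembler routine: fail fast on a bad combination.
inline void check_dq_insn(const CpuFeatures& f, VectorLen len) {
  assert(can_emit_dq_insn(f, len) && "requires AVX512DQ (and AVX512VL below 512 bits)");
}
```

With a custom feature set that enables DQ but disables VL, the 512-bit form passes the check while the 256-bit form is rejected.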
src/hotspot/cpu/x86/assembler_x86.cpp line 11049:
> 11047:
> 11048: void Assembler::evbroadcastf64x2(XMMRegister dst, Address src, int vector_len) {
> 11049: assert(VM_Version::supports_avx512dq(), "");
Same as above.
src/hotspot/cpu/x86/stubGenerator_x86_64_aes.cpp line 181:
> 179: 0x0000000000000000UL, 0x0400000000000000UL,
> 180: 0x0000000000000000UL, 0x0400000000000000UL,
> 181: };
Even though this file will be compiled only on 64-bit targets (LP64), where unsigned long is 8 bytes, it is still good to follow the convention of defining word-size-agnostic long long constants. Please update the suffix to ULL.
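
For illustration, a small sketch of the suggested convention. The table name and accessor are hypothetical; only the constant values are taken from the quoted lines. The point is that `ULL` always means unsigned long long, which is at least 64 bits wide, whereas the type of a `UL` literal depends on the data model (unsigned long is 32 bits under LLP64, for example).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// A ULL literal is unsigned long long on every conforming implementation,
// so its width does not depend on how wide unsigned long happens to be.
static_assert(sizeof(0x0400000000000000ULL) >= 8, "ULL literal is at least 64 bits");

// Hypothetical table mirroring the layout of the quoted constants.
alignas(64) static const uint64_t COUNTER_MASK_EXAMPLE[] = {
  0x0000000000000000ULL, 0x0400000000000000ULL,
  0x0000000000000000ULL, 0x0400000000000000ULL,
};

// Number of 64-bit entries in the table.
static const size_t COUNTER_MASK_EXAMPLE_LEN =
    sizeof(COUNTER_MASK_EXAMPLE) / sizeof(COUNTER_MASK_EXAMPLE[0]);

// Bounds-checked accessor, used here only to exercise the table.
inline uint64_t counter_mask_at(size_t i) {
  assert(i < COUNTER_MASK_EXAMPLE_LEN);
  return COUNTER_MASK_EXAMPLE[i];
}
```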
src/hotspot/cpu/x86/stubGenerator_x86_64_aes.cpp line 191:
> 189: 0x0000000000000000UL, 0x0200000000000000UL,
> 190: 0x0000000000000000UL, 0x0300000000000000UL,
> 191: 0x0000000000000000UL, 0x0400000000000000UL,
Same as above.
src/hotspot/cpu/x86/stubGenerator_x86_64_aes.cpp line 202:
> 200: 0x0000000000000002UL, 0x0000000000000000UL,
> 201: 0x0000000000000003UL, 0x0000000000000000UL,
> 202: 0x0000000000000004UL, 0x0000000000000000UL,
Same as above.
src/hotspot/cpu/x86/stubGenerator_x86_64_aes.cpp line 2827:
> 2825: //compute HashKey ^ (8 + n), HashKey ^ (7 + n), ... HashKey ^ (5 + n)
> 2826: gfmul_avx512(ZT7, ZT5);
> 2827: __ evmovdquq(Address(avx512_htbl, 16 * 16), ZT7, Assembler::AVX_512bit);
You can define a macro for this repetitive sequence and pass the varying arguments to it for brevity.
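
A rough sketch of what such a macro could look like, using stand-ins so it compiles outside HotSpot. Everything here is hypothetical: `gfmul_stub` replaces the real GF(2^128) multiply with plain integer multiplication, a `std::vector` replaces the hash-key table in memory, and the names and slot order are chosen for this example only.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Stand-in for the multiply step: acc = acc * key (plain integers, not GF(2^128)).
static inline void gfmul_stub(uint64_t& acc, uint64_t key) { acc *= key; }

// One invocation performs one "multiply, then store at slot" step,
// so each repetition differs only in the arguments passed.
#define MUL_AND_STORE(table, slot, acc, key)  \
  do {                                        \
    assert((slot) < (table).size());          \
    gfmul_stub((acc), (key));                 \
    (table)[(slot)] = (acc);                  \
  } while (0)

// Fills a four-entry table with key^1 .. key^4, highest power in slot 0.
inline std::vector<uint64_t> build_powers(uint64_t key) {
  std::vector<uint64_t> table(4, 0);
  uint64_t acc = key;
  table[3] = acc;                     // key^1
  MUL_AND_STORE(table, 2u, acc, key); // key^2
  MUL_AND_STORE(table, 1u, acc, key); // key^3
  MUL_AND_STORE(table, 0u, acc, key); // key^4
  return table;
}
```

In the stub generator the macro body would wrap the `gfmul_avx512` call and the following `evmovdquq` store, with the table offset as the varying argument.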
src/hotspot/cpu/x86/stubGenerator_x86_64_ghash.cpp line 63:
> 61: 0x0000000000000001UL, 0xC200000000000000UL,
> 62: 0x0000000000000001UL, 0xC200000000000000UL,
> 63: 0x0000000000000001UL, 0xC200000000000000UL
Please suffix these with ULL; our intent is to declare word-size-agnostic long constants.
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/17515#discussion_r1719466371
PR Review Comment: https://git.openjdk.org/jdk/pull/17515#discussion_r1719466856
PR Review Comment: https://git.openjdk.org/jdk/pull/17515#discussion_r1719486366
PR Review Comment: https://git.openjdk.org/jdk/pull/17515#discussion_r1719498315
PR Review Comment: https://git.openjdk.org/jdk/pull/17515#discussion_r1719498445
PR Review Comment: https://git.openjdk.org/jdk/pull/17515#discussion_r1719529833
PR Review Comment: https://git.openjdk.org/jdk/pull/17515#discussion_r1719458801