RFR: 8337632: AES-GCM Algorithm optimization for x86_64

Sandhya Viswanathan sviswanathan at openjdk.org
Thu Aug 15 00:43:54 UTC 2024


On Mon, 22 Jan 2024 09:38:25 GMT, Smita Kamath <svkamath at openjdk.org> wrote:

> Hi, 
> I would like to submit an AES-GCM algorithm optimization. The implementation uses AVX512/VAES instructions. It also reduces PARALLEL_LEN from 7680 to 512 bytes. The performance numbers are below. Kindly review the code. Thank you.
> 
> Benchmark | Datasize | BaseJDK (ops/s) | Patch (ops/s) | % Gain
> -- | -- | -- | -- | --
> full.AESGCMBench.decrypt | 512 | 2928259.197 | 3269964.387 | 11.67
> full.AESGCMBench.decrypt | 1024 | 2494254.611 | 3010987.731 | 20.72
> full.AESGCMBench.decrypt | 1500 | 1883453.546 | 1934915.846 | 2.73
> full.AESGCMBench.decrypt | 2048 | 1825780.711 | 2452861.368 | 34.34
> full.AESGCMBench.decrypt | 4096 | 1275108.345 | 1806329.066 | 41.66
> full.AESGCMBench.decrypt | 8192 | 1033936.634 | 1196836.052 | 15.75
> full.AESGCMBench.decrypt | 16384 | 681494.768 | 711630.498 | 4.42
> full.AESGCMBench.decrypt | 32768 | 385026.017 | 395043.193 | 2.6
> full.AESGCMBench.decrypt | 65536 | 207373.924 | 214723.588 | 3.54
>   |   |   |   |  
> full.AESGCMBench.encrypt | 512 | 2658008.476 | 2882496.94 | 8.45
> full.AESGCMBench.encrypt | 1024 | 2283709.63 | 2589534.403 | 13.39
> full.AESGCMBench.encrypt | 1500 | 1794993.519 | 1817669.531 | 1.26
> full.AESGCMBench.encrypt | 2048 | 1745532.435 | 2191097.29 | 25.52
> full.AESGCMBench.encrypt | 4096 | 1203301.174 | 1649593.953 | 37.08
> full.AESGCMBench.encrypt | 8192 | 985174.988 | 1132407.54 | 14.94
> full.AESGCMBench.encrypt | 16384 | 658980.441 | 684765.771 | 3.91
> full.AESGCMBench.encrypt | 32768 | 373543.798 | 391518.837 | 4.81
> full.AESGCMBench.encrypt | 65536 | 202532.315 | 205084.833 | 1.26
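
For reference, the gain column in the table above is consistent with the relative throughput improvement, (Patch - Base) / Base * 100. A minimal sketch, assuming that formula; the helper name `percent_gain` is hypothetical and is not part of the patch or the benchmark:

```cpp
#include <cassert>

// Hypothetical helper: relative throughput gain in percent.
// Assumes gain = (patch - base) / base * 100, which matches the table rows.
double percent_gain(double base_ops_per_s, double patch_ops_per_s) {
    assert(base_ops_per_s > 0.0);  // a zero baseline has no defined gain
    return (patch_ops_per_s - base_ops_per_s) / base_ops_per_s * 100.0;
}
```

For the 512-byte decrypt row, `percent_gain(2928259.197, 3269964.387)` comes out at roughly 11.67.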

src/hotspot/cpu/x86/stubGenerator_x86_64_aes.cpp line 3371:

> 3369:   //B00_03, B04_07, B08_11, B12_15 overwritten with shuffled cipher text
> 3370:   __ bind(cont);
> 3371:   if (no_ghash) {

initial_blocks_16_avx512 is always called with no_ghash set to false, so we could remove the no_ghash parameter and the code that handles no_ghash being true. The GHASH parameter would then no longer be required either.
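
The shape of that cleanup can be sketched as follows. This is only an illustration under stated assumptions: `Emitter`, `initial_blocks_before`, `initial_blocks_after`, and the step strings are hypothetical placeholders, not the real stub generator code or its actual branches.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a stub generator that records emitted steps.
// None of these names or steps are taken from the actual patch.
struct Emitter {
    std::vector<std::string> steps;
    void emit(const std::string& s) { steps.push_back(s); }
};

// Before: a flag that every caller passes as false.
void initial_blocks_before(Emitter& e, bool no_ghash) {
    e.emit("encrypt 16 blocks");
    if (no_ghash) {
        e.emit("skip hash");        // never taken when all callers pass false
    } else {
        e.emit("ghash 16 blocks");
    }
}

// After: the flag and its dead branch are removed.
void initial_blocks_after(Emitter& e) {
    e.emit("encrypt 16 blocks");
    e.emit("ghash 16 blocks");
}
```

With every caller passing false, both versions emit the same sequence, so the simpler signature loses nothing.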

src/hotspot/cpu/x86/stubGenerator_x86_64_aes.cpp line 3454:

> 3452:   __ evmovdquq(ADD_1234, ExternalAddress(counter_mask_add_1234_addr()), Assembler::AVX_512bit, rbx /*rscratch*/);
> 3453: 
> 3454:   //Shuffle counter, Broadcast counter value to 512 bit register and subtract 1 from the pre-incremented counter value

Comment should be:
// Shuffle counter, subtract 1 from the pre-incremented counter value, and broadcast counter value to 512 bit register
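
To make the order of operations concrete, here is a scalar sketch of the sequence the comment describes. It models only the 32-bit counter word (GCM keeps a 32-bit big-endian counter in the last four bytes of the counter block) and four lanes for the four 128-bit blocks of a 512-bit register. The function names are hypothetical, and the final add of {1, 2, 3, 4} is an assumption based on the ADD_1234 mask loaded on line 3452, not something confirmed from the surrounding code.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Byte-swap a 32-bit value (stands in for the byte shuffle of the counter).
uint32_t bswap32(uint32_t v) {
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

// Scalar model of four counter lanes. The input is the counter word as
// loaded from a big-endian counter block on a little-endian machine.
std::array<uint32_t, 4> first_four_counters(uint32_t pre_incremented_be) {
    uint32_t ctr = bswap32(pre_incremented_be);  // 1. shuffle counter
    ctr -= 1;                                    // 2. undo the pre-increment
    std::array<uint32_t, 4> lanes;
    lanes.fill(ctr);                             // 3. broadcast to all lanes
    const std::array<uint32_t, 4> add_1234 = {1, 2, 3, 4};
    for (std::size_t i = 0; i < lanes.size(); ++i) {
        lanes[i] += add_1234[i];                 // assumed follow-up step
    }
    return lanes;
}
```

For a pre-incremented counter of 2 (loaded as 0x02000000), the four lanes end up holding 2, 3, 4, and 5.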

src/hotspot/cpu/x86/stubGenerator_x86_64_aes.cpp line 3480:

> 3478:   __ cmpl(len, 2 * 32 * 16);
> 3479:   __ jcc(Assembler::below, ENCRYPT_BIG_NBLKS);
> 3480:   ghash16_encrypt_parallel16_avx512(in, out, ct, pos, avx512_subkeyHtbl, CTR_CHECK, rounds, key, true, true, false, false, false, ghashin_offset, aesout_offset, HashKey_32);

ghash16_encrypt_parallel16_avx512 needs to be passed CTL_BLOCKx, AAD_HASHx, ADDBE_4x4, ADDBE_1234, and SHUF_MASK, just like we do for initial_blocks_16_avx512. GL and GH also need to be passed in.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/17515#discussion_r1717646768
PR Review Comment: https://git.openjdk.org/jdk/pull/17515#discussion_r1717518011
PR Review Comment: https://git.openjdk.org/jdk/pull/17515#discussion_r1717678309

