RFR: 8341527: AVX-512 intrinsic for SHA3 [v6]

Tue Oct 22 00:14:22 UTC 2024

On Mon, 21 Oct 2024 19:46:41 GMT, Ferenc Rakoczi <duke at openjdk.org> wrote:

>> There is already an intrinsic for SHA-3 on aarch64, which gives a significant speed improvement on that architecture, so this pull request brings a similar improvement to the x64 family of systems that have the AVX-512 extension. Rudimentary measurements show that a 30-40% speed improvement can be achieved.
>
> Ferenc Rakoczi has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains seven commits:
> 
>  - fix mismerge
>  - Merge master
>  - accepting review suggestions from Volodymyr and Vladimir
>  - Merge branch 'master' into sha3-avx512-intrinsic
>  - fix windows build
>  - fix debug build
>  - 8341527: AVX-512 intrinsic for SHA3

src/hotspot/cpu/x86/stubGenerator_x86_64_sha3.cpp line 138:

> 136: 
> 137:   // set up the masks
> 138:   __ mov64(rax, 0x1F);

This could just be a movl or movd.
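A rough sketch of why a 32-bit move is enough here. It compares the raw x86-64 encodings for loading 0x1F into rax, assuming `mov64` emits the full 10-byte `movabs` form (REX.W, B8+r, 8-byte immediate). Whether HotSpot's MacroAssembler already shortens small immediates is not visible in this excerpt. The constant and function names are illustrative only.

```cpp
#include <array>
#include <cstdint>

// movabs rax, 0x1F: REX.W (0x48), opcode 0xB8+rd (rax = 0), 8-byte little-endian immediate.
constexpr std::array<uint8_t, 10> kMovabsRax1F = {0x48, 0xB8, 0x1F, 0, 0, 0, 0, 0, 0, 0};

// mov eax, 0x1F: opcode 0xB8+rd with a 4-byte immediate, no REX prefix needed.
constexpr std::array<uint8_t, 5> kMovEax1F = {0xB8, 0x1F, 0, 0, 0};

// Architectural rule on x86-64: writing a 32-bit register zero-extends into
// bits 63..32, so after `mov eax, imm32` rax == imm32 whatever it held before.
constexpr uint64_t after_mov_eax(uint64_t /*old_rax*/, uint32_t imm) { return imm; }

static_assert(kMovEax1F.size() == 5 && kMovabsRax1F.size() == 10, "5 vs 10 bytes");
static_assert(kMovEax1F[1] == kMovabsRax1F[2], "same low immediate byte");
static_assert(after_mov_eax(~0ULL, 0x1F) == 0x1F, "upper half is cleared");
```

In HotSpot terms that would be `__ movl(rax, 0x1F);`, which yields the same 64-bit value in rax as the `mov64` form.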

src/hotspot/cpu/x86/stubGenerator_x86_64_sha3.cpp line 147:

> 145:   __ kmovwl(k2, rax);
> 146:   __ shrl(rax, 1);
> 147:   __ kmovwl(k1, rax);

The same could be achieved by:
__ kshiftrwl(k4, k5, 1);
__ kshiftrwl(k3, k5, 2);
__ kshiftrwl(k2, k5, 3);
__ kshiftrwl(k1, k5, 4);
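A small constexpr model of the two mask setups. Lines 139-144 are not quoted, so this assumes (from the quoted head and tail) that the original loads 0x1F into k5 and then alternates `shrl(rax, 1)` with `kmovwl` into k4, k3, k2 and k1. The function names are hypothetical; the static_asserts check that both sequences give k5..k1 = 0x1F, 0x0F, 0x07, 0x03, 0x01.

```cpp
#include <cstdint>

// Mask register contents; index 0 holds k1, index 4 holds k5.
struct Masks { uint16_t k[5]; };

// Original sequence: rax = 0x1F, then kmovwl kN, rax / shrl rax, 1 for k5 down to k1.
constexpr Masks masks_via_shrl() {
  Masks m{};
  uint32_t rax = 0x1F;
  for (int reg = 4; reg >= 0; --reg) {      // k5 first, then k4 .. k1
    m.k[reg] = static_cast<uint16_t>(rax);  // kmovwl kN, rax
    rax >>= 1;                              // shrl rax, 1
  }
  return m;
}

// Suggested sequence: k5 = 0x1F once, then kN = k5 >> (5 - N) via kshiftrwl.
constexpr Masks masks_via_kshiftrw() {
  Masks m{};
  m.k[4] = 0x1F;                            // kmovwl k5, rax
  for (int reg = 3; reg >= 0; --reg)        // k4 .. k1
    m.k[reg] = static_cast<uint16_t>(m.k[4] >> (4 - reg));
  return m;
}

constexpr bool same(const Masks& a, const Masks& b) {
  for (int i = 0; i < 5; ++i)
    if (a.k[i] != b.k[i]) return false;
  return true;
}

static_assert(same(masks_via_shrl(), masks_via_kshiftrw()), "identical masks");
static_assert(masks_via_kshiftrw().k[3] == 0x0F, "k4 == k5 >> 1");
static_assert(masks_via_kshiftrw().k[0] == 0x01, "k1 == k5 >> 4");
```

The kshiftrwl version touches rax only for the initial kmovwl into k5; the remaining masks are derived directly from k5 inside the mask register file.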

src/hotspot/cpu/x86/stubGenerator_x86_64_sha3.cpp line 288:

> 286:     __ movq(rax, ofs); // return ofs
> 287:   } else {
> 288:     __ mov64(rax, 0);

This could be xorq(rax, rax).
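For comparison, a sketch of the encodings for clearing rax, again assuming `mov64(rax, 0)` emits the full movabs form. XOR of a register with itself is the standard zeroing idiom; unlike mov it writes the flags, which should not matter at this return point. The 32-bit `xor eax, eax` line only illustrates that a 32-bit write also clears the upper half; the names are illustrative.

```cpp
#include <array>
#include <cstdint>

// movabs rax, 0: REX.W (0x48), opcode 0xB8 (+rax), 8-byte zero immediate.
constexpr std::array<uint8_t, 10> kMovabsRaxZero = {0x48, 0xB8, 0, 0, 0, 0, 0, 0, 0, 0};
// xor rax, rax: REX.W (0x48), opcode 0x31 (XOR r/m64, r64), ModRM 0xC0 (rax, rax).
constexpr std::array<uint8_t, 3> kXorRaxRax = {0x48, 0x31, 0xC0};
// xor eax, eax: same opcode and ModRM without REX.W.
constexpr std::array<uint8_t, 2> kXorEaxEax = {0x31, 0xC0};

// Value model: x ^ x == 0, and the 32-bit result is zero-extended into rax.
constexpr uint64_t xor_rax_rax(uint64_t rax) { return rax ^ rax; }
constexpr uint64_t xor_eax_eax(uint64_t rax) {
  return static_cast<uint32_t>(rax) ^ static_cast<uint32_t>(rax);
}

static_assert(kXorRaxRax.size() < kMovabsRaxZero.size(), "xorq is shorter");
static_assert(xor_rax_rax(0xDEADBEEFCAFEF00DULL) == 0, "full register cleared");
static_assert(xor_eax_eax(0xDEADBEEFCAFEF00DULL) == 0, "upper half cleared too");
```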

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21352#discussion_r1809659854
PR Review Comment: https://git.openjdk.org/jdk/pull/21352#discussion_r1809661605
PR Review Comment: https://git.openjdk.org/jdk/pull/21352#discussion_r1809672304