RFR: 8341527: AVX-512 intrinsic for SHA3 [v6]
Sandhya Viswanathan
sviswanathan at openjdk.org
Tue Oct 22 00:14:22 UTC 2024
On Mon, 21 Oct 2024 19:46:41 GMT, Ferenc Rakoczi <duke at openjdk.org> wrote:
>> There is already an intrinsic for SHA-3 for aarch64, which gives a significant speed improvement on that architecture, so this pull request brings a similar improvement to the x64 family of systems that have the AVX-512 extension. Rudimentary measurements show that a 30-40% speed improvement can be achieved.
>
> Ferenc Rakoczi has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains seven commits:
>
> - fix mismerge
> - Merge master
> - accepting review suggestions from Volodymyr and Vladimir
> - Merge branch 'master' into sha3-avx512-intrinsic
> - fix windows build
> - fix debug build
> - 8341527: AVX-512 intrinsic for SHA3
src/hotspot/cpu/x86/stubGenerator_x86_64_sha3.cpp line 138:
> 136:
> 137: // set up the masks
> 138: __ mov64(rax, 0x1F);
This could just be a movl or movd.
src/hotspot/cpu/x86/stubGenerator_x86_64_sha3.cpp line 147:
> 145: __ kmovwl(k2, rax);
> 146: __ shrl(rax, 1);
> 147: __ kmovwl(k1, rax);
The same could be achieved by:
__ kshiftrwl(k4, k5, 1);
__ kshiftrwl(k3, k5, 2);
__ kshiftrwl(k2, k5, 3);
__ kshiftrwl(k1, k5, 4);
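
To illustrate why the two sequences should be interchangeable, here is a small standalone C++ model (not HotSpot code). It assumes that k5 holds 0x1F and that the existing code derives k4..k1 by repeatedly shifting rax right by one bit and copying the low 16 bits into the next mask register; only the tail of that sequence is visible in the quoted lines, so treat that as an assumption. The names kshiftrw, masks_via_gpr and masks_via_kshift are hypothetical helpers for this sketch.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Models the value of a 16-bit opmask register (k1..k5 in the review).
using Mask16 = std::uint16_t;

// Models KSHIFTRW: logical right shift of a 16-bit mask, zero-filled.
// Shift counts above 15 produce zero.
inline Mask16 kshiftrw(Mask16 src, unsigned count) {
    return count > 15 ? Mask16{0} : static_cast<Mask16>(src >> count);
}

// Assumed shape of the existing code: shift the general-purpose register
// right by one (shrl) and copy its low 16 bits into the next mask (kmovwl).
// Result order: k4, k3, k2, k1.
inline std::array<Mask16, 4> masks_via_gpr(std::uint32_t rax) {
    std::array<Mask16, 4> out{};
    for (std::size_t i = 0; i < out.size(); ++i) {
        rax >>= 1;                            // shrl(rax, 1)
        out[i] = static_cast<Mask16>(rax);    // kmovwl(kN, rax)
    }
    return out;
}

// Suggested shape: derive every mask directly from k5 with a mask shift.
// Result order: k4, k3, k2, k1.
inline std::array<Mask16, 4> masks_via_kshift(Mask16 k5) {
    return {kshiftrw(k5, 1), kshiftrw(k5, 2), kshiftrw(k5, 3), kshiftrw(k5, 4)};
}
```

Under those assumptions both helpers produce 0x0F, 0x07, 0x03, 0x01 for k4, k3, k2, k1 when started from 0x1F. The mask-shift form also removes the dependency chain through rax, since each mask depends only on k5.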
src/hotspot/cpu/x86/stubGenerator_x86_64_sha3.cpp line 288:
> 286: __ movq(rax, ofs); // return ofs
> 287: } else {
> 288: __ mov64(rax, 0);
This could be xorq(rax, rax).
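
For context on the size difference, the sketch below hand-assembles the two x86-64 encodings in standalone C++ (not HotSpot code). It assumes mov64 emits the full 64-bit immediate form of MOV; the helper names encode_mov_rax_imm64 and encode_xor_rax_rax are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// mov rax, imm64: REX.W prefix (0x48), opcode B8+rd with rd = 0 for rax,
// followed by the 8-byte immediate in little-endian order.
inline std::vector<std::uint8_t> encode_mov_rax_imm64(std::uint64_t imm) {
    std::vector<std::uint8_t> bytes{0x48, 0xB8};
    for (int i = 0; i < 8; ++i) {
        bytes.push_back(static_cast<std::uint8_t>(imm >> (8 * i)));
    }
    return bytes;
}

// xor rax, rax: REX.W prefix (0x48), opcode 0x31 (XOR r/m64, r64),
// ModRM 0xC0 (register-direct, reg = rax, rm = rax).
inline std::vector<std::uint8_t> encode_xor_rax_rax() {
    return {0x48, 0x31, 0xC0};
}
```

With these encodings the immediate move takes 10 bytes and the xor takes 3. Note that xor writes the flags while mov does not, so the replacement is only safe where the flags are not live at that point.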
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/21352#discussion_r1809659854
PR Review Comment: https://git.openjdk.org/jdk/pull/21352#discussion_r1809661605
PR Review Comment: https://git.openjdk.org/jdk/pull/21352#discussion_r1809672304