RFR: 8341527: AVX-512 intrinsic for SHA3 [v4]
Volodymyr Paprotski
duke at openjdk.org
Thu Oct 10 17:00:24 UTC 2024
On Tue, 8 Oct 2024 23:57:15 GMT, Volodymyr Paprotski <duke at openjdk.org> wrote:
>> Ferenc Rakoczi has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision:
>>
>> - Merge branch 'master' into sha3-avx512-intrinsic
>> - fix windows build
>> - fix debug build
>> - 8341527: AVX-512 intrinsic for SHA3
>
> src/hotspot/cpu/x86/stubGenerator_x86_64_sha3.cpp line 148:
>
>> 146: __ addl(rax, 8);
>> 147: __ kmovbl(k4, rax);
>> 148: __ addl(rax, 16);
>
> Since you need k5 soonest, you could save a few cycles by removing the dependency chain through rax and loading the immediate directly.
>
> (If you really want to get clever:
>
> ```
> KRegister masks[] = {k1, k2, k3, k4, k5};
> for (int i = 0; i < 5; i++) {
>   __ mov64(rax, (1L << (i + 1)) - 1);  // 0x01, 0x03, 0x07, 0x0f, 0x1f
>   __ kmovbl(masks[i], rax);            // index 0..4, so masks[] stays in bounds
> }
> ```
> Highly debatable whether it's actually any more readable, so up to you.)
Another alternative that is closer to the structure of your code (and uses smaller instructions):
- Start from the end: load the constant `0x1f` into `k5`.
- Shift the constant right by one bit and load it into the next KRegister (`k4`, then `k3`, and so on).
- This could still be done with a loop, but you decide which you find more readable.
This way `k5` is available immediately for the `evmovdquq`.
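
To make the resulting mask values concrete, here is a minimal sketch in plain C++. It only models the values that would end up in `k1` through `k5`; it is not HotSpot assembler code, and the helper name `shifted_masks` is made up for illustration.

```cpp
#include <array>
#include <cstdint>

// Hypothetical helper: models the mask values only, not the stub generator.
// Index 0 stands for k1, index 4 for k5.
std::array<uint8_t, 5> shifted_masks() {
    std::array<uint8_t, 5> k{};
    uint8_t value = 0x1f;            // constant loaded first, destined for k5
    for (int i = 4; i >= 0; --i) {   // walk from k5 down to k1
        k[i] = value;                // stands in for a kmovbl into that mask
        value >>= 1;                 // shift the constant down by one bit
    }
    return k;
}
```

Under these assumptions the first value produced is the one for `k5`, and each later mask needs only a one-bit shift of the previous constant.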
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/21352#discussion_r1795735893