RFR: 8341527: AVX-512 intrinsic for SHA3 [v4]

Volodymyr Paprotski duke at openjdk.org
Thu Oct 10 17:00:24 UTC 2024


On Tue, 8 Oct 2024 23:57:15 GMT, Volodymyr Paprotski <duke at openjdk.org> wrote:

>> Ferenc Rakoczi has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision:
>> 
>>  - Merge branch 'master' into sha3-avx512-intrinsic
>>  - fix windows build
>>  - fix debug build
>>  - 8341527: AVX-512 intrinsic for SHA3
>
> src/hotspot/cpu/x86/stubGenerator_x86_64_sha3.cpp line 148:
> 
>> 146:   __ addl(rax, 8);
>> 147:   __ kmovbl(k4, rax);
>> 148:   __ addl(rax, 16);
> 
> Since you need k5 soonest, you could save a few cycles by removing the propagation dependency on rax and loading the immediate directly.
> 
> (If you really want to get clever, 
> 
>   ```
>   KRegister masks[] = {k1, k2, k3, k4, k5};
>   for (int i = 0; i < 5; i++) {
>     // Mask values are 0x01, 0x03, 0x07, 0x0f, 0x1f for k1..k5.
>     __ mov64(rax, (2L << i) - 1);
>     __ kmovbl(masks[i], rax);
>   }
>   ```
>   Highly debatable whether it's actually any more readable, so up to you.)

Another alternative that is closer to the structure of your code (and uses smaller instructions):

- Start from the end: load the constant `0x1f` and move it into `k5`
- Shift the constant right by one bit and load it into the next KRegister down (`k4`, then `k3`, and so on)
- (This could still be done with a loop, but you decide what you find more readable.)

This way k5 is available immediately for the `evmovdquq`.
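
To make the ordering concrete, here is a rough standalone C++ model of the mask values, not stub-generator code. `MaskFile`, `shift_down_masks` and `add_up_masks` are hypothetical names used only for illustration, and the ascending add sequence is inferred from the three quoted lines rather than taken from the full stub. The instruction names in the comments are what I would expect the stub to use, which is an assumption.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the opmask registers k1..k5:
// index 0 models k1, index 4 models k5.
using MaskFile = std::array<std::uint8_t, 5>;

// Models the "start from k5 and shift down" ordering.
// Assumed instruction shape: load 0x1f into rax, kmovbl into k5,
// then alternate a one-bit right shift of rax with kmovbl for k4..k1.
MaskFile shift_down_masks() {
  MaskFile k{};
  std::uint32_t rax = 0x1f;  // five low bits set; this is the value k5 needs
  for (int reg = 4; reg >= 0; --reg) {
    k[reg] = static_cast<std::uint8_t>(rax);  // models kmovbl(k<reg+1>, rax)
    rax >>= 1;                                // models a right shift by one
  }
  assert(rax == 0);  // all five bits have been shifted out
  return k;
}

// Models the ascending ordering inferred from the quoted lines:
// rax accumulates 1, 2, 4, 8, 16, so k5 is written last.
MaskFile add_up_masks() {
  MaskFile k{};
  std::uint32_t rax = 0;
  for (int reg = 0; reg < 5; ++reg) {
    rax += 1u << reg;                         // models addl(rax, 1 << reg)
    k[reg] = static_cast<std::uint8_t>(rax);  // models kmovbl(k<reg+1>, rax)
  }
  return k;
}
```

Both orderings leave the same values in k1..k5; the difference is only that the shift-down version writes k5 first instead of at the end of a five-step dependency chain on rax.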

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21352#discussion_r1795735893

