RFR: 8316592: RISC-V: implement poly1305 intrinsic [v2]
Vladimir Kempik
vkempik at openjdk.org
Mon Oct 30 16:31:33 UTC 2023
On Mon, 30 Oct 2023 16:07:41 GMT, null <duke at openjdk.org> wrote:
>> Hi everyone, please review this port of the [AArch64](https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp#L7124) `_poly1305_processBlocks` intrinsic to the RISC-V platform.
>>
>> ### Correctness checks
>>
>> Tier 1 tests pass. I also explicitly ran the `test/jdk/com/sun/crypto/provider/Cipher/ChaCha20/unittest/Poly1305UnitTestDriver.java` test multiple times, and it passes every time.
>>
>> ### Performance results on T-Head board
>>
>> #### Results for enabled intrinsic:
>>
>> Benchmark | (dataSize) | Mode | Cnt | Score | Error | Units
>> -- | -- | -- | -- | -- | -- | --
>> Poly1305DigestBench.digestBuffer | 64 | thrpt | 3 | 247207.525 | 2853.920 | ops/s
>> Poly1305DigestBench.digestBuffer | 256 | thrpt | 3 | 221994.065 | 6891.601 | ops/s
>> Poly1305DigestBench.digestBuffer | 1024 | thrpt | 3 | 164485.375 | 4979.286 | ops/s
>> Poly1305DigestBench.digestBuffer | 16384 | thrpt | 3 | 27261.181 | 448.178 | ops/s
>> Poly1305DigestBench.digestBuffer | 1048576 | thrpt | 3 | 270.784 | 3445.077 | ops/s
>> Poly1305DigestBench.digestBytes | 64 | thrpt | 3 | 266049.018 | 9909.155 | ops/s
>> Poly1305DigestBench.digestBytes | 256 | thrpt | 3 | 231891.890 | 715.000 | ops/s
>> Poly1305DigestBench.digestBytes | 1024 | thrpt | 3 | 172746.932 | 1202.374 | ops/s
>> Poly1305DigestBench.digestBytes | 16384 | thrpt | 3 | 27626.478 | 341.915 | ops/s
>> Poly1305DigestBench.digestBytes | 1048576 | thrpt | 3 | 265.235 | 3522.458 | ops/s
>> Poly1305DigestBench.updateBytes | 64 | thrpt | 3 | 3394516.156 | 14656.687 | ops/s
>> Poly1305DigestBench.updateBytes | 256 | thrpt | 3 | 1463745.045 | 19608.937 | ops/s
>> Poly1305DigestBench.updateBytes | 1024 | thrpt | 3 | 459312.198 | 1720.655 | ops/s
>> Poly1305DigestBench.updateBytes | 16384 | thrpt | 3 | 30969.117 | 813.712 | ops/s
>> Poly1305DigestBench.updateBytes | 1048576 | thrpt | 3 | 300.773 | 3345.716 | ops/s
>>
>> #### Results for disabled intrinsic:
>>
>> Benchmark | (dataSize) | Mode | Cnt | Score | Error | Units
>> -- | -- | -- | -- | -- | -- | --
>> Poly1305DigestBench.digestBuffer | 64 | thrpt | 3 | 225424.813 | 1083.844 | ops/s
>> Poly1305DigestBench.digestBuffer | 256 | thrpt | 3 | 167848.372 | 3488.837 | ops/s
>> Poly1305DigestBench.digestBuffer | 1024 | thrpt | 3 | 81802.600 | 1839.218 | ops/s
>> Poly1305DigestBench.digestBuffer | 16384 | thrpt | 3 | 7781.049 | 1101.150 | ops/s
>> Poly1305DigestBench.digestBuffer | 1048576 | thrpt | 3 | 118.778 | 74.388 | ops/s
>> Poly1305DigestBench.digestBytes | 64 | thrpt | 3 | 23510...
>
> null has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains one commit:
>
> 8316592: RISC-V: implement poly1305 intrinsic
src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 4568:
> 4566: __ andi(U_2, U_2, bits2); // Clear U_2 except for the first two bits
> 4567: __ slli(tmp2, tmp1, 2);
> 4568: __ add(tmp1, tmp1, tmp2); // Impossible to overflow since two leftmost bits are zero'ed in 'srli(tmp1, U_2, 2)'
Can we replace lines 4567 and 4568 with `shadd(tmp1, tmp1, tmp1, tmp2, 2);`?
src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 4582:
> 4580: __ srli(tmp1, U_2, 2);
> 4581: __ slli(tmp2, tmp1, 2);
> 4582: __ add(tmp1, tmp1, tmp2); // tmp1 = U_2 * 5
Can we replace lines 4581 and 4582 with `shadd(tmp1, tmp1, tmp1, tmp2, 2);`?
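For reference, the arithmetic behind both suggestions can be checked outside HotSpot with a small model. This is only a sketch: `shadd_model` is a hypothetical stand-in, and it assumes that `shadd(Rd, Rs1, Rs2, tmp, shamt)` computes `Rd = (Rs1 << shamt) + Rs2`, with `tmp` used only as scratch when a single shift-and-add instruction is not available. Under that assumption, the `slli`/`add` pair and the single `shadd` call both yield `tmp1 * 5` modulo 2^64.

```cpp
#include <cassert>
#include <cstdint>

// Models the existing two-instruction sequence:
//   slli(tmp2, tmp1, 2);  add(tmp1, tmp1, tmp2);
constexpr uint64_t times5_slli_add(uint64_t tmp1) {
    uint64_t tmp2 = tmp1 << 2;  // tmp2 = tmp1 * 4 (mod 2^64)
    return tmp1 + tmp2;         // tmp1 = tmp1 + tmp1 * 4
}

// Hypothetical model of shadd(Rd, Rs1, Rs2, tmp, shamt).
// ASSUMPTION: Rd = (Rs1 << shamt) + Rs2; the scratch register is not modelled.
constexpr uint64_t shadd_model(uint64_t rs1, uint64_t rs2, int shamt) {
    return (rs1 << shamt) + rs2;
}

// Models the proposed replacement: shadd(tmp1, tmp1, tmp1, tmp2, 2).
constexpr uint64_t times5_shadd(uint64_t tmp1) {
    return shadd_model(tmp1, tmp1, 2);
}

// Returns true when both sequences agree for the given input.
constexpr bool sequences_agree(uint64_t x) {
    return times5_slli_add(x) == times5_shadd(x);
}

// Spot-checks a few inputs, including values that wrap around 64 bits.
inline void check_equivalence() {
    const uint64_t samples[] = {0, 1, 3, 0x3FFFFFFFFFFFFFFFull, UINT64_MAX};
    for (uint64_t x : samples) {
        assert(sequences_agree(x));
        assert(times5_shadd(x) == x * 5);  // unsigned arithmetic wraps mod 2^64
    }
}
```

Calling `check_equivalence()` exercises both models. It says nothing about register allocation or instruction selection in the actual stub, only that the two forms compute the same value.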
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/16417#discussion_r1376503576
PR Review Comment: https://git.openjdk.org/jdk/pull/16417#discussion_r1376504290