RFR: 8316592: RISC-V: implement poly1305 intrinsic [v12]

ArsenyBochkarev duke at openjdk.org
Mon Nov 20 15:26:49 UTC 2023


On Thu, 16 Nov 2023 17:09:59 GMT, ArsenyBochkarev <duke at openjdk.org> wrote:

>> Hi everyone, please review this port of the [AArch64](https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp#L7124) `_poly1305_processBlocks` intrinsic to the RISC-V platform.
>> 
>> ### Correctness checks
>> 
>> Tier 1 tests pass. I also ran the `test/jdk/com/sun/crypto/provider/Cipher/ChaCha20/unittest/Poly1305UnitTestDriver.java` test explicitly multiple times, and it passes.
>> 
>> ### Performance results on T-Head board
>> 
>> #### Results for enabled intrinsic:
>> 
>> Benchmark | (dataSize) | Mode | Cnt | Score | Error | Units
>> -- | -- | -- | -- | -- | -- | --
>> Poly1305DigestBench.digestBuffer | 64 | thrpt | 3 | 247207.525 | 2853.920 | ops/s
>> Poly1305DigestBench.digestBuffer | 256 | thrpt | 3 | 221994.065 | 6891.601 | ops/s
>> Poly1305DigestBench.digestBuffer | 1024 | thrpt | 3 | 164485.375 | 4979.286 | ops/s
>> Poly1305DigestBench.digestBuffer | 16384 | thrpt | 3 | 27261.181 | 448.178 | ops/s
>> Poly1305DigestBench.digestBuffer | 1048576 | thrpt | 3 | 270.784 | 3445.077 | ops/s
>> Poly1305DigestBench.digestBytes | 64 | thrpt | 3 | 266049.018 | 9909.155 | ops/s
>> Poly1305DigestBench.digestBytes | 256 | thrpt | 3 | 231891.890 | 715.000 | ops/s
>> Poly1305DigestBench.digestBytes | 1024 | thrpt | 3 | 172746.932 | 1202.374 | ops/s
>> Poly1305DigestBench.digestBytes | 16384 | thrpt | 3 | 27626.478 | 341.915 | ops/s
>> Poly1305DigestBench.digestBytes | 1048576 | thrpt | 3 | 265.235 | 3522.458 | ops/s
>> Poly1305DigestBench.updateBytes | 64 | thrpt | 3 | 3394516.156 | 14656.687 | ops/s
>> Poly1305DigestBench.updateBytes | 256 | thrpt | 3 | 1463745.045 | 19608.937 | ops/s
>> Poly1305DigestBench.updateBytes | 1024 | thrpt | 3 | 459312.198 | 1720.655 | ops/s
>> Poly1305DigestBench.updateBytes | 16384 | thrpt | 3 | 30969.117 | 813.712 | ops/s
>> Poly1305DigestBench.updateBytes | 1048576 | thrpt | 3 | 300.773 | 3345.716 | ops/s
>> 
>> #### Results for disabled intrinsic:
>> 
>> Benchmark | (dataSize) | Mode | Cnt | Score | Error | Units
>> -- | -- | -- | -- | -- | -- | -- 
>> Poly1305DigestBench.digestBuffer | 64 | thrpt | 3 | 225424.813 | 1083.844 | ops/s
>> Poly1305DigestBench.digestBuffer | 256 | thrpt | 3 | 167848.372 | 3488.837 | ops/s
>> Poly1305DigestBench.digestBuffer | 1024 | thrpt | 3 | 81802.600 | 1839.218 | ops/s
>> Poly1305DigestBench.digestBuffer | 16384 | thrpt | 3 | 7781.049 | 1101.150 | ops/s
>> Poly1305DigestBench.digestBuffer | 1048576 | thrpt | 3 | 118.778 | 74.388 | ops/s
>> Poly1305DigestBench.digestBytes | 64 | thrpt | 3 | 23510...
>
> ArsenyBochkarev has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Fix comment: first -> lowest

Updated JMH benchmark results on the T-Head board:

| Benchmark | (dataSize) | (provider) | Mode | Cnt | Score | Error | Units |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Poly1305DigestBench.digestBuffer | 64 | | thrpt | 5 | 250053.046 | 587.637 | ops/s |
| Poly1305DigestBench.digestBuffer | 256 | | thrpt | 5 | 220166.575 | 1187.463 | ops/s |
| Poly1305DigestBench.digestBuffer | 1024 | | thrpt | 5 | 151100.309 | 16185.043 | ops/s |
| Poly1305DigestBench.digestBuffer | 16384 | | thrpt | 5 | 25023.730 | 1041.332 | ops/s |
| Poly1305DigestBench.digestBuffer | 1048576 | | thrpt | 5 | 439.427 | 5.617 | ops/s |
| Poly1305DigestBench.digestBytes | 64 | | thrpt | 5 | 263639.669 | 1507.805 | ops/s |
| Poly1305DigestBench.digestBytes | 256 | | thrpt | 5 | 232732.517 | 602.418 | ops/s |
| Poly1305DigestBench.digestBytes | 1024 | | thrpt | 5 | 164114.192 | 10465.749 | ops/s |
| Poly1305DigestBench.digestBytes | 16384 | | thrpt | 5 | 25377.886 | 99.310 | ops/s |
| Poly1305DigestBench.digestBytes | 1048576 | | thrpt | 5 | 437.981 | 16.345 | ops/s |
| Poly1305DigestBench.updateBytes | 64 | | thrpt | 5 | 3382150.723 | 46293.745 | ops/s |
| Poly1305DigestBench.updateBytes | 256 | | thrpt | 5 | 1402297.248 | 660.704 | ops/s |
| Poly1305DigestBench.updateBytes | 1024 | | thrpt | 5 | 423519.238 | 2663.734 | ops/s |
| Poly1305DigestBench.updateBytes | 16384 | | thrpt | 5 | 28370.601 | 47.013 | ops/s |
| Poly1305DigestBench.updateBytes | 1048576 | | thrpt | 5 | 445.557 | 1.063 | ops/s |
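
Since the scores are in ops/s, rows with different data sizes are hard to compare directly. A rough way to read them is to convert each score into data throughput. The sketch below does that for the `updateBytes` rows, assuming that one benchmark operation processes exactly `dataSize` bytes; that is an assumption about the benchmark, not something measured here, and the helper name `mib_per_second` is only illustrative.

```python
# Convert JMH throughput (ops/s) into approximate data throughput (MiB/s).
# Assumption: one benchmark operation processes exactly dataSize bytes.

MIB = 1024 * 1024


def mib_per_second(ops_per_second: float, data_size_bytes: int) -> float:
    """Bytes processed per second, expressed in MiB/s."""
    return ops_per_second * data_size_bytes / MIB


# (dataSize, score in ops/s) copied from the updateBytes rows above.
update_bytes = [
    (64, 3382150.723),
    (256, 1402297.248),
    (1024, 423519.238),
    (16384, 28370.601),
    (1048576, 445.557),
]

for size, score in update_bytes:
    print(f"{size:>8} B: {mib_per_second(score, size):8.1f} MiB/s")
```

For the 1048576-byte row the conversion factor is 1, so the estimate is simply the score itself, about 445.6 MiB/s under the assumption above.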

-------------

PR Comment: https://git.openjdk.org/jdk/pull/16417#issuecomment-1819273603


More information about the hotspot-compiler-dev mailing list