RFR: 8296411: AArch64: Accelerated Poly1305 intrinsics

Andrew Haley aph at openjdk.org
Mon May 22 16:20:01 UTC 2023


This intrinsic provides a speedup of about 3-4x over the Java implementation of Poly1305 in the benchmarks below.

I have a vectorized version of this which uses a bunch of tricks to speed it up, but it's complex and can still be improved. We're getting close to rampdown, so I'm submitting this simple intrinsic now to get it reviewed in time.

Benchmarks:


ThunderX (2, I think):

Benchmark                        (dataSize)  (provider)   Mode  Cnt         Score         Error  Units
Poly1305DigestBench.updateBytes          64              thrpt    3  14078352.014 ± 4201407.966  ops/s
Poly1305DigestBench.updateBytes         256              thrpt    3   5154958.794 ± 1717146.980  ops/s
Poly1305DigestBench.updateBytes        1024              thrpt    3   1416563.273 ± 1311809.454  ops/s
Poly1305DigestBench.updateBytes       16384              thrpt    3     94059.570 ±    2913.021  ops/s
Poly1305DigestBench.updateBytes     1048576              thrpt    3      1441.024 ±     164.443  ops/s

Benchmark                        (dataSize)  (provider)   Mode  Cnt        Score        Error  Units
Poly1305DigestBench.updateBytes          64              thrpt    3  4516486.795 ± 419624.224  ops/s
Poly1305DigestBench.updateBytes         256              thrpt    3  1228542.774 ± 202815.694  ops/s
Poly1305DigestBench.updateBytes        1024              thrpt    3   316051.912 ±  23066.449  ops/s
Poly1305DigestBench.updateBytes       16384              thrpt    3    20649.561 ±   1094.687  ops/s
Poly1305DigestBench.updateBytes     1048576              thrpt    3      310.564 ±     31.053  ops/s

Apple M1:

Benchmark                        (dataSize)  (provider)   Mode  Cnt         Score        Error  Units
Poly1305DigestBench.updateBytes          64              thrpt    3  33551968.946 ± 849843.905  ops/s
Poly1305DigestBench.updateBytes         256              thrpt    3   9911637.214 ±  63417.224  ops/s
Poly1305DigestBench.updateBytes        1024              thrpt    3   2604370.740 ±  29208.265  ops/s
Poly1305DigestBench.updateBytes       16384              thrpt    3    165183.633 ±   1975.998  ops/s
Poly1305DigestBench.updateBytes     1048576              thrpt    3      2587.132 ±     40.240  ops/s

Benchmark                        (dataSize)  (provider)   Mode  Cnt         Score        Error  Units
Poly1305DigestBench.updateBytes          64              thrpt    3  12373649.589 ± 184757.721  ops/s
Poly1305DigestBench.updateBytes         256              thrpt    3   3112536.605 ±  14436.410  ops/s
Poly1305DigestBench.updateBytes        1024              thrpt    3    777184.018 ±   8774.478  ops/s
Poly1305DigestBench.updateBytes       16384              thrpt    3     50224.072 ±     29.004  ops/s
Poly1305DigestBench.updateBytes     1048576              thrpt    3       776.229 ±      8.086  ops/s
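
As a rough cross-check of the 3-4x figure, the per-size ratios can be computed directly from the scores above. This is only a sketch: it assumes that the higher-throughput table for each machine is the run with the intrinsic and the other is the Java implementation, and the class and method names (Poly1305Speedup, speedup) are made up for illustration. Error bars are ignored.

```java
// Hypothetical helper, not part of the patch: computes speedup ratios
// from the JMH throughput scores quoted in this mail.
final class Poly1305Speedup {

    // Speedup = ops/s with the intrinsic divided by ops/s without it.
    static double speedup(double intrinsicOps, double baselineOps) {
        return intrinsicOps / baselineOps;
    }

    public static void main(String[] args) {
        int[] dataSize = {64, 256, 1024, 16384, 1048576};

        // ThunderX scores (assumed: first table = intrinsic, second = Java).
        double[] thunderIntrinsic = {14078352.014, 5154958.794, 1416563.273, 94059.570, 1441.024};
        double[] thunderJava = {4516486.795, 1228542.774, 316051.912, 20649.561, 310.564};

        // Apple M1 scores (same assumption about table order).
        double[] m1Intrinsic = {33551968.946, 9911637.214, 2604370.740, 165183.633, 2587.132};
        double[] m1Java = {12373649.589, 3112536.605, 777184.018, 50224.072, 776.229};

        System.out.printf("%10s %10s %10s%n", "dataSize", "ThunderX", "M1");
        for (int i = 0; i < dataSize.length; i++) {
            System.out.printf("%10d %9.2fx %9.2fx%n",
                    dataSize[i],
                    speedup(thunderIntrinsic[i], thunderJava[i]),
                    speedup(m1Intrinsic[i], m1Java[i]));
        }
    }
}
```

Under that assumption the ratios fall roughly between 2.7x and 4.6x, with the smallest gain at 64 bytes on both machines.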

-------------

Commit messages:
 - Test
 - Cleanup
 - Initial commit

Changes: https://git.openjdk.org/jdk/pull/14085/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14085&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8296411
  Stats: 171 lines in 4 files changed: 170 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/14085.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/14085/head:pull/14085

PR: https://git.openjdk.org/jdk/pull/14085

