RFR: 8296411: AArch64: Accelerated Poly1305 intrinsics
Andrew Haley
aph at openjdk.org
Mon May 22 16:20:01 UTC 2023
This provides a solid speedup of about 3-4x over the Java implementation.
I have a vectorized version of this which uses a bunch of tricks to speed it up, but it's complex and can still be improved. We're getting close to ramp down, so I'm submitting this simple intrinsic so that we can get it reviewed in time.
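For readers who want to see what is being accelerated, here is a minimal reference sketch of the Poly1305 per-block update as defined in RFC 8439: each 16-byte block is added to an accumulator, which is then multiplied by the clamped key half r modulo 2^130 - 5. It uses BigInteger for clarity and is not the code in this PR or the JDK's Java implementation. The class name Poly1305Sketch and its helpers are hypothetical.

```java
import java.math.BigInteger;

// Illustrative only: a BigInteger reference version of Poly1305 (RFC 8439).
// Not constant-time and not the JDK's implementation.
final class Poly1305Sketch {
    // Prime modulus 2^130 - 5.
    static final BigInteger P =
        BigInteger.ONE.shiftLeft(130).subtract(BigInteger.valueOf(5));
    // Clamp mask applied to r, from RFC 8439.
    static final BigInteger CLAMP =
        new BigInteger("0ffffffc0ffffffc0ffffffc0fffffff", 16);

    // Interpret len bytes starting at off as an unsigned little-endian integer.
    static BigInteger leToInt(byte[] b, int off, int len) {
        byte[] be = new byte[len];
        for (int i = 0; i < len; i++) {
            be[len - 1 - i] = b[off + i];
        }
        return new BigInteger(1, be);
    }

    // Compute the 16-byte tag for msg under a 32-byte one-time key (r || s).
    static byte[] mac(byte[] key, byte[] msg) {
        BigInteger r = leToInt(key, 0, 16).and(CLAMP);
        BigInteger s = leToInt(key, 16, 16);
        BigInteger acc = BigInteger.ZERO;
        for (int off = 0; off < msg.length; off += 16) {
            int len = Math.min(16, msg.length - off);
            // Set one extra bit just above the block's top byte.
            BigInteger n = leToInt(msg, off, len).setBit(8 * len);
            // The per-block update: acc = (acc + n) * r mod P.
            acc = acc.add(n).multiply(r).mod(P);
        }
        // Tag is the low 128 bits of acc + s, little-endian.
        byte[] be = acc.add(s).toByteArray();
        byte[] tag = new byte[16];
        for (int i = 0; i < 16 && i < be.length; i++) {
            tag[i] = be[be.length - 1 - i];
        }
        return tag;
    }

    // Hypothetical helper: parse an even-length hex string into bytes.
    static byte[] fromHex(String s) {
        byte[] out = new byte[s.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(s.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    // Hypothetical helper: format bytes as lowercase hex.
    static String toHex(byte[] b) {
        StringBuilder sb = new StringBuilder();
        for (byte x : b) {
            sb.append(String.format("%02x", x & 0xff));
        }
        return sb.toString();
    }
}
```

The multiply and modular reduction inside the loop dominate the cost, which is why that step is the natural target for hand-written 64-bit multiply instructions.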
Benchmarks:
ThunderX (2, I think):
Benchmark                        (dataSize)  (provider)   Mode  Cnt         Score         Error  Units
Poly1305DigestBench.updateBytes          64              thrpt    3  14078352.014 ± 4201407.966  ops/s
Poly1305DigestBench.updateBytes         256              thrpt    3   5154958.794 ± 1717146.980  ops/s
Poly1305DigestBench.updateBytes        1024              thrpt    3   1416563.273 ± 1311809.454  ops/s
Poly1305DigestBench.updateBytes       16384              thrpt    3     94059.570 ±    2913.021  ops/s
Poly1305DigestBench.updateBytes     1048576              thrpt    3      1441.024 ±     164.443  ops/s
Benchmark                        (dataSize)  (provider)   Mode  Cnt         Score         Error  Units
Poly1305DigestBench.updateBytes          64              thrpt    3   4516486.795 ±  419624.224  ops/s
Poly1305DigestBench.updateBytes         256              thrpt    3   1228542.774 ±  202815.694  ops/s
Poly1305DigestBench.updateBytes        1024              thrpt    3    316051.912 ±   23066.449  ops/s
Poly1305DigestBench.updateBytes       16384              thrpt    3     20649.561 ±    1094.687  ops/s
Poly1305DigestBench.updateBytes     1048576              thrpt    3       310.564 ±      31.053  ops/s
Apple M1:
Benchmark                        (dataSize)  (provider)   Mode  Cnt         Score         Error  Units
Poly1305DigestBench.updateBytes          64              thrpt    3  33551968.946 ±  849843.905  ops/s
Poly1305DigestBench.updateBytes         256              thrpt    3   9911637.214 ±   63417.224  ops/s
Poly1305DigestBench.updateBytes        1024              thrpt    3   2604370.740 ±   29208.265  ops/s
Poly1305DigestBench.updateBytes       16384              thrpt    3    165183.633 ±    1975.998  ops/s
Poly1305DigestBench.updateBytes     1048576              thrpt    3      2587.132 ±      40.240  ops/s
Benchmark                        (dataSize)  (provider)   Mode  Cnt         Score         Error  Units
Poly1305DigestBench.updateBytes          64              thrpt    3  12373649.589 ±  184757.721  ops/s
Poly1305DigestBench.updateBytes         256              thrpt    3   3112536.605 ±   14436.410  ops/s
Poly1305DigestBench.updateBytes        1024              thrpt    3    777184.018 ±    8774.478  ops/s
Poly1305DigestBench.updateBytes       16384              thrpt    3     50224.072 ±      29.004  ops/s
Poly1305DigestBench.updateBytes     1048576              thrpt    3       776.229 ±       8.086  ops/s
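The tables above are not labelled. Assuming the first table in each pair was measured with the intrinsic enabled and the second with the plain Java implementation (an inference from the stated speedup, not something the post says), the per-size ratios can be computed with a small sketch like the following. The class name SpeedupSketch and its fields are hypothetical; the scores are copied from the tables.

```java
// Illustrative only: divides the first table's scores by the second's.
// Which table is "with intrinsic" is an assumption, not stated in the post.
final class SpeedupSketch {
    static final int[] DATA_SIZES = {64, 256, 1024, 16384, 1048576};

    // Scores in ops/s, copied from the benchmark tables.
    static final double[] THUNDERX_FIRST =
        {14078352.014, 5154958.794, 1416563.273, 94059.570, 1441.024};
    static final double[] THUNDERX_SECOND =
        {4516486.795, 1228542.774, 316051.912, 20649.561, 310.564};
    static final double[] M1_FIRST =
        {33551968.946, 9911637.214, 2604370.740, 165183.633, 2587.132};
    static final double[] M1_SECOND =
        {12373649.589, 3112536.605, 777184.018, 50224.072, 776.229};

    // Element-wise ratio first[i] / second[i].
    static double[] speedups(double[] first, double[] second) {
        double[] out = new double[first.length];
        for (int i = 0; i < first.length; i++) {
            out[i] = first[i] / second[i];
        }
        return out;
    }

    public static void main(String[] args) {
        double[] tx = speedups(THUNDERX_FIRST, THUNDERX_SECOND);
        double[] m1 = speedups(M1_FIRST, M1_SECOND);
        for (int i = 0; i < DATA_SIZES.length; i++) {
            System.out.printf("%8d bytes: ThunderX %.2fx, M1 %.2fx%n",
                              DATA_SIZES[i], tx[i], m1[i]);
        }
    }
}
```

Under that assumption the ratios fall between roughly 2.7x and 4.6x, with the smallest gain at 64 bytes on both machines. Note that the ThunderX error bars are wide (Cnt is 3), so those ratios are approximate.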
-------------
Commit messages:
- Test
- Cleanup
- Initial commit
Changes: https://git.openjdk.org/jdk/pull/14085/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14085&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8296411
Stats: 171 lines in 4 files changed: 170 ins; 0 del; 1 mod
Patch: https://git.openjdk.org/jdk/pull/14085.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/14085/head:pull/14085
PR: https://git.openjdk.org/jdk/pull/14085