RFR: 8349106: Change ChaCha20 intrinsic to use quarter-round parallel implementation on aarch64 [v2]
Jamil Nimeh
jnimeh at openjdk.org
Mon Feb 3 23:56:18 UTC 2025
> This enhancement changes the ChaCha20 block function intrinsic on aarch64, moving from the block-parallel implementation to the quarter-round-parallel implementation already used on x86_64. Assembly language profiling yielded an 11% improvement in throughput. When put together as an intrinsic and hooked into the JCE ChaCha20 cipher, the gains are more modest, somewhere in the 2-4% range depending on job size, but still an improvement.
Jamil Nimeh has updated the pull request incrementally with one additional commit since the last revision:
Add explanatory comment and reference for quarter round intrinsic
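
For readers unfamiliar with the quarter round referred to above, here is a minimal scalar sketch in Java, following the definition in RFC 8439, section 2.1. It is not the aarch64 intrinsic from this PR; the class and method names are hypothetical and chosen only for illustration. The point it tries to show is that the four quarter rounds in a column round (and likewise in a diagonal round) touch disjoint state words, which is what makes it possible to run them side by side in vector lanes.

```java
// Illustrative sketch only: scalar ChaCha20 quarter round per RFC 8439, section 2.1.
// Class and method names are hypothetical and do not come from the JDK sources.
final class QuarterRoundSketch {

    // Applies one quarter round in place to the state words at indices a, b, c, d.
    static void quarterRound(int[] s, int a, int b, int c, int d) {
        s[a] += s[b]; s[d] = Integer.rotateLeft(s[d] ^ s[a], 16);
        s[c] += s[d]; s[b] = Integer.rotateLeft(s[b] ^ s[c], 12);
        s[a] += s[b]; s[d] = Integer.rotateLeft(s[d] ^ s[a], 8);
        s[c] += s[d]; s[b] = Integer.rotateLeft(s[b] ^ s[c], 7);
    }

    // One double round over the 16-word state: a column round, then a diagonal round.
    // Within each group of four calls, no state word is shared between calls,
    // so the four quarter rounds are independent of one another.
    static void doubleRound(int[] s) {
        // Column round
        quarterRound(s, 0, 4, 8, 12);
        quarterRound(s, 1, 5, 9, 13);
        quarterRound(s, 2, 6, 10, 14);
        quarterRound(s, 3, 7, 11, 15);
        // Diagonal round
        quarterRound(s, 0, 5, 10, 15);
        quarterRound(s, 1, 6, 11, 12);
        quarterRound(s, 2, 7, 8, 13);
        quarterRound(s, 3, 4, 9, 14);
    }
}
```

The full block function in RFC 8439 runs ten such double rounds and then adds the original input state to the result. How the intrinsic maps these steps onto aarch64 vector registers is described in the PR itself, not here.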
-------------
Changes:
- all: https://git.openjdk.org/jdk/pull/23397/files
- new: https://git.openjdk.org/jdk/pull/23397/files/41817c77..6ba0770b
Webrevs:
- full: https://webrevs.openjdk.org/?repo=jdk&pr=23397&range=01
- incr: https://webrevs.openjdk.org/?repo=jdk&pr=23397&range=00-01
Stats: 25 lines in 1 file changed: 25 ins; 0 del; 0 mod
Patch: https://git.openjdk.org/jdk/pull/23397.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/23397/head:pull/23397
PR: https://git.openjdk.org/jdk/pull/23397