Integrated: 8349106: Change ChaCha20 intrinsic to use quarter-round parallel implementation on aarch64
Jamil Nimeh
jnimeh at openjdk.org
Tue Feb 4 16:31:17 UTC 2025
On Fri, 31 Jan 2025 16:48:09 GMT, Jamil Nimeh <jnimeh at openjdk.org> wrote:
> This enhancement changes the ChaCha20 block function intrinsic on aarch64, moving from the block-parallel implementation to the quarter-round-parallel implementation already used on x86_64. Assembly-level profiling showed an 11% improvement in throughput. When integrated as an intrinsic and hooked into the JCE ChaCha20 cipher, the gains are more modest, roughly 2-4% depending on job size, but still an improvement.
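
For readers unfamiliar with the term, the following is a rough plain-Java sketch of what "quarter-round parallel" can mean. It is not the aarch64 intrinsic or any code from the changeset. The class and method names are hypothetical, and the row-per-vector layout with lane rotation is an assumption about the general technique: the four rows of the 4x4 ChaCha20 state are treated as four 4-lane vectors, so one pass of lane-wise add/xor/rotate steps performs four quarter rounds at once, and rotating the lanes of rows b, c and d lines up the diagonals. The scalar version follows the quarter round and double round as defined in RFC 8439.

```java
import java.util.Arrays;

// Illustrative sketch only; names are hypothetical, not from the JDK sources.
final class ChaChaQuarterRoundSketch {

    // Scalar quarter round on four words of the 16-word state (RFC 8439, section 2.1).
    static void quarterRound(int[] s, int a, int b, int c, int d) {
        s[a] += s[b]; s[d] = Integer.rotateLeft(s[d] ^ s[a], 16);
        s[c] += s[d]; s[b] = Integer.rotateLeft(s[b] ^ s[c], 12);
        s[a] += s[b]; s[d] = Integer.rotateLeft(s[d] ^ s[a], 8);
        s[c] += s[d]; s[b] = Integer.rotateLeft(s[b] ^ s[c], 7);
    }

    // One double round, scalar: four column rounds, then four diagonal rounds.
    static void doubleRoundScalar(int[] s) {
        quarterRound(s, 0, 4, 8, 12);
        quarterRound(s, 1, 5, 9, 13);
        quarterRound(s, 2, 6, 10, 14);
        quarterRound(s, 3, 7, 11, 15);
        quarterRound(s, 0, 5, 10, 15);
        quarterRound(s, 1, 6, 11, 12);
        quarterRound(s, 2, 7, 8, 13);
        quarterRound(s, 3, 4, 9, 14);
    }

    // Lane-wise step over 4 lanes: x += y; z = rotl(z ^ x, r).
    // Stands in for a pair of SIMD instructions plus a vector rotate.
    static void addXorRotate(int[] x, int[] y, int[] z, int r) {
        for (int i = 0; i < 4; i++) {
            x[i] += y[i];
            z[i] = Integer.rotateLeft(z[i] ^ x[i], r);
        }
    }

    // Four quarter rounds at once, one per lane.
    static void quarterRoundLanes(int[] a, int[] b, int[] c, int[] d) {
        addXorRotate(a, b, d, 16);
        addXorRotate(c, d, b, 12);
        addXorRotate(a, b, d, 8);
        addXorRotate(c, d, b, 7);
    }

    // Rotate lanes left by n positions: out[i] = v[(i + n) mod 4].
    static int[] rotateLanes(int[] v, int n) {
        int[] out = new int[4];
        for (int i = 0; i < 4; i++) {
            out[i] = v[(i + n) & 3];
        }
        return out;
    }

    // One double round with rows held as 4-lane "vectors" (assumed layout).
    static void doubleRoundLanes(int[] s) {
        int[] a = Arrays.copyOfRange(s, 0, 4);
        int[] b = Arrays.copyOfRange(s, 4, 8);
        int[] c = Arrays.copyOfRange(s, 8, 12);
        int[] d = Arrays.copyOfRange(s, 12, 16);

        quarterRoundLanes(a, b, c, d);      // all four columns together

        b = rotateLanes(b, 1);              // line up the diagonals
        c = rotateLanes(c, 2);
        d = rotateLanes(d, 3);
        quarterRoundLanes(a, b, c, d);      // all four diagonals together

        b = rotateLanes(b, 3);              // restore the original lane order
        c = rotateLanes(c, 2);
        d = rotateLanes(d, 1);

        System.arraycopy(a, 0, s, 0, 4);
        System.arraycopy(b, 0, s, 4, 4);
        System.arraycopy(c, 0, s, 8, 4);
        System.arraycopy(d, 0, s, 12, 4);
    }

    public static void main(String[] args) {
        int[] x = new int[16];
        for (int i = 0; i < 16; i++) {
            x[i] = 0x01234567 * (i + 1);    // arbitrary test pattern
        }
        int[] y = x.clone();
        doubleRoundScalar(x);
        doubleRoundLanes(y);
        System.out.println("scalar and lane-wise results match: " + Arrays.equals(x, y));
    }
}
```

In the block-parallel style, by contrast, each vector lane would hold the same state word from a different block, so several blocks advance in lockstep. The sketch above keeps a single block's rows in the lanes instead, which is the distinction the description draws.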
This pull request has now been integrated.
Changeset: ee4caa41
Author: Jamil Nimeh <jnimeh at openjdk.org>
URL: https://git.openjdk.org/jdk/commit/ee4caa4180e76911ee75148583c2923f847f8605
Stats: 166 lines in 1 file changed: 71 ins; 1 del; 94 mod
8349106: Change ChaCha20 intrinsic to use quarter-round parallel implementation on aarch64
Reviewed-by: aph
-------------
PR: https://git.openjdk.org/jdk/pull/23397