RFR: 8349106: Change ChaCha20 intrinsic to use quarter-round parallel implementation on aarch64 [v2]

Andrew Haley aph at openjdk.org
Tue Feb 4 09:23:12 UTC 2025


On Mon, 3 Feb 2025 23:56:18 GMT, Jamil Nimeh <jnimeh at openjdk.org> wrote:

>> This enhancement changes the ChaCha20 block function intrinsic on aarch64, moving from the block-parallel implementation to the quarter-round-parallel implementation already used on x86_64. Assembly language profiling yielded an 11% improvement in throughput. When put together as an intrinsic and hooked into the JCE ChaCha20 cipher, the gains are more modest, somewhere in the 2-4% range depending on job size, but still an improvement.
>
> Jamil Nimeh has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Add explanatory comment and reference for quarter round intrinsic

Marked as reviewed by aph (Reviewer).

-------------

PR Review: https://git.openjdk.org/jdk/pull/23397#pullrequestreview-2592206831

