RFR: 8247645: ChaCha20 intrinsics

Jamil Nimeh jnimeh at openjdk.org
Mon Nov 7 07:34:17 UTC 2022


On Fri, 16 Sep 2022 09:27:39 GMT, Andrew Haley <aph at openjdk.org> wrote:

>> Interesting, I had not considered that.  Thanks for pointing that out.  I'm honestly not sure how to evaluate the impact of the generated code on the icache.  I'll look at the logic surrounding the ghash processBlocks(_wide) code to see how that decision is made.  I don't have an aversion to going back to an assembly-based loop using the suggestions that @dchuyko made and maybe that's the right choice if it means more compact code.
>
> It's not so complicated. If you can make the code smaller with negligible impact on throughput, do so. If not, don't.

I really didn't see a noticeable impact on performance with the loop unrolled, so I'm going with the SUB/CBNZ approach.  It seems to do the best job of keeping the generated stub smaller while still being a tiny bit more efficient than what I started with.  As always, I appreciate the suggestions.
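
For anyone following along, here is a rough Java sketch of the trade-off, not the actual AArch64 stub code.  It assumes the standard ChaCha20 structure of 10 double rounds from RFC 8439; the class and method names (ChaChaLoopSketch, roundsRolled, roundsUnrolled) are made up for illustration.  The rolled version keeps one copy of the round body and counts down to zero, which is the shape a SUB followed by CBNZ gives you; the unrolled version repeats the body and so grows the generated code.

```java
import java.util.Arrays;

// Illustrative only: hypothetical names, not HotSpot or JDK code.
public class ChaChaLoopSketch {

    // One ChaCha20 quarter round on four words of the 16-word state.
    static void quarterRound(int[] s, int a, int b, int c, int d) {
        s[a] += s[b]; s[d] = Integer.rotateLeft(s[d] ^ s[a], 16);
        s[c] += s[d]; s[b] = Integer.rotateLeft(s[b] ^ s[c], 12);
        s[a] += s[b]; s[d] = Integer.rotateLeft(s[d] ^ s[a], 8);
        s[c] += s[d]; s[b] = Integer.rotateLeft(s[b] ^ s[c], 7);
    }

    // One double round: four column rounds, then four diagonal rounds.
    static void doubleRound(int[] s) {
        quarterRound(s, 0, 4, 8, 12);
        quarterRound(s, 1, 5, 9, 13);
        quarterRound(s, 2, 6, 10, 14);
        quarterRound(s, 3, 7, 11, 15);
        quarterRound(s, 0, 5, 10, 15);
        quarterRound(s, 1, 6, 11, 12);
        quarterRound(s, 2, 7, 8, 13);
        quarterRound(s, 3, 4, 9, 14);
    }

    // Rolled form: a single copy of the body plus a down-counter.
    static void roundsRolled(int[] s) {
        int remaining = 10;
        do {
            doubleRound(s);
            remaining -= 1;           // corresponds to SUB
        } while (remaining != 0);     // corresponds to CBNZ
    }

    // Unrolled form: same result, ten copies of the body.
    static void roundsUnrolled(int[] s) {
        doubleRound(s); doubleRound(s); doubleRound(s); doubleRound(s);
        doubleRound(s); doubleRound(s); doubleRound(s); doubleRound(s);
        doubleRound(s); doubleRound(s);
    }

    public static void main(String[] args) {
        int[] rolled = new int[16];
        int[] unrolled = new int[16];
        for (int i = 0; i < 16; i++) {
            rolled[i] = unrolled[i] = 0x01020304 * (i + 1); // arbitrary test state
        }
        roundsRolled(rolled);
        roundsUnrolled(unrolled);
        System.out.println(Arrays.equals(rolled, unrolled));
    }
}
```

Both forms compute the same state; the only difference is how much code gets emitted, which is the part that matters for the icache question above.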

-------------

PR: https://git.openjdk.org/jdk/pull/7702


More information about the security-dev mailing list