RFR: 8247645: ChaCha20 intrinsics

Andrew Haley aph at openjdk.org
Mon Nov 7 07:34:16 UTC 2022


On Thu, 18 Aug 2022 14:43:51 GMT, Jamil Nimeh <jnimeh at openjdk.org> wrote:

>> src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 4306:
>> 
>>> 4304:     __ subs(loopCtr, loopCtr, 1);
>>> 4305:     __ cmp(loopCtr, (u1)0);
>>> 4306:     __ br(Assembler::NE, L_twoRounds);
>> 
>> Same thing about subs-cmp0-bne.
>
> Thanks for the suggestion.  I actually have a version of the _blockpar cc20 block function intrinsic that uses a C++ for-loop around the cc20_quarter_round macro calls to generate that portion of the stub.  I believe that effectively unrolls the loop in the resulting stub and removes the need for the subs, cmp and br for all 10 iterations.  Right now the aarch64 has two versions of the same block function as I was play testing both.  I will probably end up removing the _qr (quarter-round parallel) version and favor the _blockpar (block-parallel) version as they both are pretty comparable in terms of speed, but the block parallel version seems to be a little better.
> 
> I'm always open to these other ways of handling the loop control as assembly is not my strong suit so I appreciate the suggestion!

Be careful about the code expansion. If you're not careful you'll blow away much of the icache for little benefit. For AES/GCM on AArch64 we have `generate_ghash_processBlocks()` and `generate_ghash_processBlocks_wide()`. We don't call the big one unless it's worth it. It all depends on how big the code turns out to be.

> Interesting, I had not considered that.  Thanks for pointing that out.  I'm honestly not sure how to evaluate the impact of the generated code on the icache.  I'll look at the logic surrounding the ghash processBlocks(_wide) code to see how that decision is made.  I don't have an aversion to going back to an assembly-based loop using the suggestions that @dchuyko made and maybe that's the right choice if it means more compact code.

It's not so complicated. If you can make the code smaller with negligible impact on throughput, do so. If not, don't.
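
To put rough numbers on that trade-off, here is a minimal back-of-the-envelope sketch in standalone C++, not HotSpot code. `StubShape`, `rolledBytes` and `unrolledBytes` are hypothetical names invented for illustration, and the body size you feed in is whatever you count from your own stub. The only fixed fact it relies on is that AArch64 instructions are 4 bytes each.

```cpp
#include <cassert>
#include <cstddef>

// AArch64 uses a fixed 4-byte instruction encoding.
constexpr std::size_t kInsnBytes = 4;

// Hypothetical description of a generated stub: how many instructions one
// loop body takes (for ChaCha20, one double round) and how many times the
// body runs (10 double rounds for the 20-round cipher).
struct StubShape {
    std::size_t bodyInsns;
    std::size_t iterations;
};

// Rolled loop: the body is emitted once, plus the loop-control instructions.
// controlInsns = 3 models subs + cmp + b.ne as in the quoted code;
// controlInsns = 2 models subs + b.ne, since subs already sets the flags.
constexpr std::size_t rolledBytes(StubShape s, std::size_t controlInsns) {
    return (s.bodyInsns + controlInsns) * kInsnBytes;
}

// Fully unrolled: the body is emitted once per iteration, no loop control.
constexpr std::size_t unrolledBytes(StubShape s) {
    return s.bodyInsns * s.iterations * kInsnBytes;
}

// Extra icache footprint paid for removing the loop control.
constexpr std::size_t unrollCostBytes(StubShape s, std::size_t controlInsns) {
    assert(s.iterations >= 1);
    return unrolledBytes(s) - rolledBytes(s, controlInsns);
}
```

With a placeholder body of 100 instructions (an assumption, not a measurement of the actual stub) and 10 iterations, the rolled loop with `subs` + `b.ne` comes to 408 bytes and the unrolled version to 4000 bytes. Whether the extra bytes buy any measurable throughput is the question to answer with a benchmark.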

-------------

PR: https://git.openjdk.org/jdk/pull/7702


