RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v16]
Volodymyr Paprotski
duke at openjdk.org
Wed Nov 16 21:34:22 UTC 2022
On Tue, 15 Nov 2022 19:38:56 GMT, Volodymyr Paprotski <duke at openjdk.org> wrote:
>>> On other hand, there are functions like poly1305_multiply8_avx512 and poly1305_process_blocks_avx512 that use a lot of temp registers. I think it makes sense to keep those as 'function-header declarations'.
>>
>> I agree with you on `poly1305_process_blocks_avx512`, but `poly1305_multiply8_avx512` already takes 8 arguments. Putting 8 more arguments for temps doesn't look prohibitive.
>>
>>> I think it makes sense to keep those as 'function-header declarations'.
>>
>> IMO it's not enough. Ideally, if there are any implicit usages, those should be clearly spelled out at every call site.
>
> Changed just the three `*limbs*` functions.
Lifted pretty much everything up to just `poly1305_process_blocks_avx512` and `generate_poly1305_processBlocks` (i.e. two register maps).
Took some time to make it 'reasonable' again, but I think it makes sense. (But then, the true test would be me looking at it a month later, or whether it makes sense to others.)
Had to clean up the names; the 'local' names could all be a play on `tmp`, but the register reuse is much clearer from the 'global' names.
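To illustrate the shape of the change (not the actual patch), here is a minimal standalone sketch of the "register map lives in the caller, temps are passed explicitly" pattern. Everything in it is hypothetical: `Reg`, `Emitter`, `multiply_limbs`, `process_blocks` and the register assignments are stand-ins made up for the example, not HotSpot's `MacroAssembler` API or the real stub code.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for an assembler register; not the HotSpot type.
struct Reg { std::string name; };

// Records emitted "instructions" as text so register usage is visible.
struct Emitter {
    std::vector<std::string> code;
    void emit(const std::string& op, const Reg& dst, const Reg& src) {
        code.push_back(op + " " + dst.name + ", " + src.name);
    }
};

// The helper receives its temporaries as explicit arguments instead of
// picking them internally, so each call site shows what gets clobbered.
void multiply_limbs(Emitter& e, const Reg& acc, const Reg& key,
                    const Reg& tmp1, const Reg& tmp2) {
    e.emit("mov", tmp1, acc);
    e.emit("mul", tmp1, key);
    e.emit("mov", tmp2, tmp1);
    e.emit("add", acc, tmp2);
}

// The top-level routine owns the single register map ('global' names).
void process_blocks(Emitter& e) {
    const Reg A0{"zmm0"}, R0{"zmm1"}, T0{"zmm2"}, T1{"zmm3"};
    multiply_limbs(e, A0, R0, T0, T1);
    multiply_limbs(e, A0, R0, T0, T1);  // reuse of T0/T1 is explicit here
}
```

Calling `process_blocks` on a fresh `Emitter` records eight pseudo-instructions, and the only place a register name is chosen is the map at the top of `process_blocks`.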
-------------
PR: https://git.openjdk.org/jdk/pull/10582