RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v16]

Volodymyr Paprotski duke at openjdk.org
Tue Nov 15 19:43:18 UTC 2022


On Tue, 15 Nov 2022 17:42:08 GMT, Volodymyr Paprotski <duke at openjdk.org> wrote:

>> src/hotspot/cpu/x86/stubGenerator_x86_64_poly.cpp line 384:
>> 
>>> 382: void StubGenerator::poly1305_limbs(const Register limbs, const Register a0, const Register a1, const Register a2, bool only128)
>>> 383: {
>>> 384:   const Register t1 = r13;
>> 
>> Please, make the temps explicit and lift them into arguments. Otherwise, it's hard to see what registers are clobbered when helper methods are called.
>
> Thanks for pointing this out.. I spent quite a bit of time and went back and forth on 'register allocation'... it does make sense to pass all the temps needed, when the number of temps is small. This is the case for the three `*_limbs_*` functions. Maybe I should indeed do that...
> 
> On other hand, there are functions like `poly1305_multiply8_avx512` and `poly1305_process_blocks_avx512` that use a _lot_ of temp registers. I think it makes sense to keep those as 'function-header declarations'. 
> 
> Then there are functions like `poly1305_multiply_scalar` that could go either way, has some temps and 'implicitly clobbered' registers, but probably should stay 'as is'..
> 
> I ended up being 'pedantic' and making _all_ temps into 'header variables'. I also tried to comment, but those probably mean more to me then anyone else in hindsight?
> 
> 
> // Register Map:
> // GPRs:
> //   input        = rdi
> //   length       = rbx
> //   accumulator  = rcx
> //   R   = r8
> //   a0  = rsi
> //   a1  = r9
> //   a2  = r10
> //   r0  = r11
> //   r1  = r12
> //   c1  = r8;
> //   t1  = r13
> //   t2  = r14
> //   t3  = r15
> //   t0  = r14
> //   rscratch = r13
> //   stack(rsp, rbp)
> //   imul(rax, rdx)
> // ZMMs:
> //   T: xmm0-6
> //   C: xmm7-9
> //   A: xmm13-18
> //   B: xmm19-24
> //   R: xmm25-29
> ...
>   // Register Map:
>   // reserved: rsp, rbp, rcx
>   // PARAMs: rdi, rbx, rsi, r8-r12
>   // poly1305_multiply_scalar clobbers: r13-r15, rax, rdx
>   const Register t0 = r14;
>   const Register t1 = r13;
>   const Register rscratch = r13;
> 
>   // poly1305_limbs_avx512 clobbers: xmm0, xmm1
>   // poly1305_multiply8_avx512 clobbers: xmm0-xmm6
>   const XMMRegister T0 = xmm2;
> ...
> 
> 
> I think I am ok changing the `*limbs*` functions (even started, before I remembered my train of thought from months back..) but let me know if you agree with the rest of the reasoning?

Changed just the three `*limbs*` functions.

-------------

PR: https://git.openjdk.org/jdk/pull/10582


More information about the security-dev mailing list