RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v16]
Vladimir Ivanov
vlivanov at openjdk.org
Tue Nov 15 23:56:59 UTC 2022
On Tue, 15 Nov 2022 17:42:08 GMT, Volodymyr Paprotski <duke at openjdk.org> wrote:
>> src/hotspot/cpu/x86/stubGenerator_x86_64_poly.cpp line 384:
>>
>>> 382: void StubGenerator::poly1305_limbs(const Register limbs, const Register a0, const Register a1, const Register a2, bool only128)
>>> 383: {
>>> 384: const Register t1 = r13;
>>
>> Please, make the temps explicit and lift them into arguments. Otherwise, it's hard to see what registers are clobbered when helper methods are called.
>
> Thanks for pointing this out.. I spent quite a bit of time and went back and forth on 'register allocation'... it does make sense to pass all the temps needed, when the number of temps is small. This is the case for the three `*_limbs_*` functions. Maybe I should indeed do that...
>
> On other hand, there are functions like `poly1305_multiply8_avx512` and `poly1305_process_blocks_avx512` that use a _lot_ of temp registers. I think it makes sense to keep those as 'function-header declarations'.
>
> Then there are functions like `poly1305_multiply_scalar` that could go either way, has some temps and 'implicitly clobbered' registers, but probably should stay 'as is'..
>
> I ended up being 'pedantic' and making _all_ temps into 'header variables'. I also tried to comment, but those probably mean more to me then anyone else in hindsight?
>
>
> // Register Map:
> // GPRs:
> // input = rdi
> // length = rbx
> // accumulator = rcx
> // R = r8
> // a0 = rsi
> // a1 = r9
> // a2 = r10
> // r0 = r11
> // r1 = r12
> // c1 = r8;
> // t1 = r13
> // t2 = r14
> // t3 = r15
> // t0 = r14
> // rscratch = r13
> // stack(rsp, rbp)
> // imul(rax, rdx)
> // ZMMs:
> // T: xmm0-6
> // C: xmm7-9
> // A: xmm13-18
> // B: xmm19-24
> // R: xmm25-29
> ...
> // Register Map:
> // reserved: rsp, rbp, rcx
> // PARAMs: rdi, rbx, rsi, r8-r12
> // poly1305_multiply_scalar clobbers: r13-r15, rax, rdx
> const Register t0 = r14;
> const Register t1 = r13;
> const Register rscratch = r13;
>
> // poly1305_limbs_avx512 clobbers: xmm0, xmm1
> // poly1305_multiply8_avx512 clobbers: xmm0-xmm6
> const XMMRegister T0 = xmm2;
> ...
>
>
> I think I am ok changing the `*limbs*` functions (even started, before I remembered my train of thought from months back..) but let me know if you agree with the rest of the reasoning?
> On other hand, there are functions like poly1305_multiply8_avx512 and poly1305_process_blocks_avx512 that use a lot of temp registers. I think it makes sense to keep those as 'function-header declarations'.
I agree with you on `poly1305_process_blocks_avx512`, but `poly1305_multiply8_avx512` already takes 8 arguments. Putting 8 more arguments for temps doesn't look prohibitive.
> I think it makes sense to keep those as 'function-header declarations'.
IMO it's not enough. Ideally, if there are any implicit usages, those should be clearly spelled out at every call site.
-------------
PR: https://git.openjdk.org/jdk/pull/10582
More information about the hotspot-dev
mailing list