RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v9]

Volodymyr Paprotski duke at openjdk.org
Wed Nov 9 21:57:46 UTC 2022


On Tue, 8 Nov 2022 23:59:42 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote:

>> Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   fix 32-bit build
>
> src/java.base/share/classes/com/sun/crypto/provider/Poly1305.java line 175:
> 
>> 173: 
>> 174:         int blockMultipleLength = len & (~(BLOCK_LENGTH-1));
>> 175:         Objects.checkFromIndexSize(offset, blockMultipleLength, input.length);
> 
> I suggest to move the checks into `processMultipleBlocks`, introduce new static helper method specifically for the intrinsic part, and lift more logic (e.g., field loads) from the intrinsic into Java code.
> 
> As an additional step, you can switch to double-register addressing mode (base + offset) for input data (`input`, `alimbs`, `rlimbs`) and simplify the intrinsic part even more (will involve a switch from `array_element_address` to `make_unsafe_address`).

`array_element_address` vs `make_unsafe_address`: I'm not sure I fully understood the suggestion, but here is my guess :)
"It might be cleaner to encode base+offset directly in the instruction's addressing mode and save some `lea`s."

I think that ship has already sailed, though:
- `input`: I removed `offset` from the intrinsic stub's parameter list and pass it to `array_element_address` instead. However, because I was really running out of GPRs, I had to do a `lea` at the function entry, so I can't afford to keep the offset register free for the address encoding.
- `alimbs`: the offset is already 0. Also, I mostly keep the actual value `a2:a1:a0` around; I only need the address to write the result back out.
- `rlimbs`: the offset is already 0, and the address itself is discarded right after loading the R value into 2 GPRs.
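
For reference, the first part of the quoted suggestion (do the checks in Java, then hand off to a separate static helper for the intrinsic part) could look roughly like the sketch below. It is an illustration under assumptions, not the code in the PR: the class name `Poly1305Sketch`, the helper `processBlocksUnchecked`, and the placeholder body that only counts blocks are all hypothetical. Only the mask expression and the `Objects.checkFromIndexSize` call come from the quoted lines.

```java
import java.util.Objects;

// Hypothetical sketch; names other than BLOCK_LENGTH are illustrative only.
final class Poly1305Sketch {
    static final int BLOCK_LENGTH = 16; // Poly1305 block size in bytes

    // Round len down to a multiple of BLOCK_LENGTH (works because it is a power of two).
    static int blockMultipleLength(int len) {
        return len & ~(BLOCK_LENGTH - 1);
    }

    // Checked entry point: all validation happens in Java before the helper runs.
    static int processMultipleBlocks(byte[] input, int offset, int len,
                                     long[] aLimbs, long[] rLimbs) {
        int blockLen = blockMultipleLength(len);
        Objects.checkFromIndexSize(offset, blockLen, input.length);
        Objects.requireNonNull(aLimbs);
        Objects.requireNonNull(rLimbs);
        processBlocksUnchecked(input, offset, blockLen, aLimbs, rLimbs);
        return blockLen;
    }

    // Stand-in for the method an intrinsic would replace. It assumes its
    // arguments were already validated by the caller.
    private static void processBlocksUnchecked(byte[] input, int offset, int blockLen,
                                               long[] aLimbs, long[] rLimbs) {
        // Placeholder only: records the block count, NOT real Poly1305 arithmetic.
        aLimbs[0] += blockLen / BLOCK_LENGTH;
    }
}
```

With this split, the helper never sees an out-of-range `offset` or a partial trailing block, so an intrinsic replacing it would not need to repeat those checks.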

-------------

PR: https://git.openjdk.org/jdk/pull/10582
