RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v6]
Volodymyr Paprotski
duke at openjdk.org
Wed Nov 9 02:22:01 UTC 2022
On Wed, 9 Nov 2022 00:38:45 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote:
>> @iwanowww moved to StubGenerator as suggested. Moving functions to the `stubGenerator_x86_64.hpp` header doesn't seem 'clean', but I think that's the pattern.
>>
>> The constant pool: I stared at it for a while and ended up keeping it mostly intact (it's now a static function, not a member function; the header is a bit cleaner; it follows the AES pattern).
>>
>> Did not split it up into individual constants. The main 'problem' is that `Address` and `ExternalAddress` are not compatible. Most instructions do not take `AddressLiteral`, so can't use `ExternalAddress` to refer to those constants. (If I did get the instructions I use to take `AddressLiteral`, I think we would end up with more `lea(rscratch)`s generated; but that's more of a silver-lining)
>>
>> I also thought of loading constants at run time (load and replicate for vector, as I mentioned in my comment above), but that seems needlessly complicated in hindsight.
>
>> Did not split it up into individual constants. The main 'problem' is that Address and ExternalAddress are not compatible.
>
> There's a reason for that: RIP-relative addressing doesn't always work, so an additional register may be needed.
>
>> Most instructions do not take AddressLiteral, so can't use ExternalAddress to refer to those constants.
>
> I counted 4 instructions accessing the constants (`evpandq`, `andq`, `evporq`, and `vpternlogq`) in your patch.
>
> `macroAssembler_x86.hpp` is the place for `AddressLiteral`-related overloads (there are already numerous cases present) and it's trivial to add new ones.
>
>> (If I did get the instructions I use to take AddressLiteral, I think we would end up with more lea(rscratch)s generated; but that's more of a silver-lining)
>
> It depends on memory layout. If constants end up placed close enough in the address space, there'll be no additional instructions generated.
>
> Anyway, it doesn't look like something important from a throughput perspective. Overall, I find it clearer when the code refers to individual constants through `AddressLiteral`s, but I'm also fine with it as it is now.
Makes sense to me; that would indeed be cleaner. I will add a couple more overloads. (Still getting used to what is 'clean' in this code base.)
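
For illustration only, the reachability point above ("it depends on memory layout") can be sketched as a small standalone check. This is not HotSpot code: `fits_rip_relative`, `Form`, and `choose_form` are hypothetical names. The only fact assumed is that x86_64 RIP-relative addressing encodes a signed 32-bit displacement from the address of the next instruction, so a constant further away needs its address materialized in a scratch register first.

```cpp
#include <cstdint>

// Hypothetical helper (not part of HotSpot): true when `target` can be reached
// from `next_insn` (the address following the instruction) with a signed
// 32-bit displacement, which is what RIP-relative addressing encodes.
bool fits_rip_relative(std::uint64_t next_insn, std::uint64_t target) {
    constexpr std::uint64_t kMaxForward  = 0x7fffffffULL;  // INT32_MAX
    constexpr std::uint64_t kMaxBackward = 0x80000000ULL;  // magnitude of INT32_MIN
    return target >= next_insn ? (target - next_insn) <= kMaxForward
                               : (next_insn - target) <= kMaxBackward;
}

// Hypothetical model of the decision an AddressLiteral-taking overload makes:
// use the constant directly, or load its address into a scratch register first.
enum class Form { RipRelative, LeaThenIndirect };

Form choose_form(std::uint64_t next_insn, std::uint64_t target) {
    return fits_rip_relative(next_insn, target) ? Form::RipRelative
                                                : Form::LeaThenIndirect;
}
```

Under this model, constants placed within roughly 2 GiB of the generated stub take the `RipRelative` form and cost no extra instruction; only far-away constants take the `LeaThenIndirect` form.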
-------------
PR: https://git.openjdk.org/jdk/pull/10582