RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v3]

Mon Oct 31 13:22:32 UTC 2022

On Mon, 31 Oct 2022 02:35:18 GMT, Quan Anh Mai <qamai at openjdk.org> wrote:

>> Claes Redestad has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Require UseSSE >= 3 due to transitive use of sse3 instructions from ReduceI
>
> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 3493:
> 
>> 3491:   // vnext = IntVector.broadcast(I256, power_of_31_backwards[0]);
>> 3492:   movdl(vnext, InternalAddress(power_of_31_backwards + (0 * sizeof(jint))));
>> 3493:   vpbroadcastd(vnext, vnext, Assembler::AVX_256bit);
> 
> `vpbroadcastd` can take an `Address` argument instead.

An `InternalAddress` isn't an `Address` but an `AddressLiteral`. You can, however, do `as_Address(InternalAddress(power_of_31_backwards + (0 * sizeof(jint))))`.

> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 3528:
> 
>> 3526:     vpmulld(vcoef[idx], vcoef[idx], vnext, Assembler::AVX_256bit);
>> 3527:   }
>> 3528:   jmp(LONG_VECTOR_LOOP_BEGIN);
> 
> Calculating backward forces you to recalculate the coefficients on each iteration. I think computing it in the normal direction would be better.

But doing it forward requires a `reduceLane` on each iteration. It's faster to do it backward.
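
To make the two orderings concrete, here is a scalar Java sketch that emulates 8 int lanes with plain arrays. It is only an illustration of the trade-off discussed here, not the intrinsic itself: the class name `PolyHashSketch`, the method names, and the lane count are assumptions made for the example, and the forward variant shown is specifically the one that reduces the lanes on every iteration.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class; not part of the JDK or of this PR.
class PolyHashSketch {
    static final int LANES = 8; // assumed: eight 32-bit lanes, as in a 256-bit vector

    // Powers of 31 in descending order: p[j] = 31^(LANES-1-j), wrapping in int.
    static int[] powers() {
        int[] p = new int[LANES];
        p[LANES - 1] = 1;
        for (int j = LANES - 2; j >= 0; j--) p[j] = p[j + 1] * 31;
        return p;
    }

    // Backward: walk blocks from the end of the array. Each lane keeps its own
    // accumulator; the coefficients are multiplied by 31^LANES per iteration,
    // and the lanes are summed only once, after the loop.
    static int backward(byte[] a) {
        int[] coef = powers();
        int[] acc = new int[LANES];
        int step = coef[0] * 31;                      // 31^LANES
        int i = a.length;
        while (i >= LANES) {
            i -= LANES;
            for (int j = 0; j < LANES; j++) {
                acc[j] += coef[j] * (a[i + j] & 0xff);
                coef[j] *= step;                      // per-iteration coefficient update
            }
        }
        int h = 0;
        for (int j = 0; j < LANES; j++) h += acc[j];  // single lane reduction
        int c = coef[LANES - 1];                      // 31^(number of elements consumed)
        for (int k = i - 1; k >= 0; k--) {            // leftover head elements, scalar
            h += c * (a[k] & 0xff);
            c *= 31;
        }
        return h;
    }

    // Forward, reducing on every iteration: h = h * 31^LANES + sum_j p[j] * a[i+j].
    static int forwardReducing(byte[] a) {
        int[] p = powers();
        int step = p[0] * 31;                         // 31^LANES
        int h = 0;
        int i = 0;
        for (; i + LANES <= a.length; i += LANES) {
            int sum = 0;
            for (int j = 0; j < LANES; j++) sum += p[j] * (a[i + j] & 0xff); // lane reduction
            h = h * step + sum;
        }
        for (; i < a.length; i++) h = 31 * h + (a[i] & 0xff); // leftover tail, scalar
        return h;
    }

    public static void main(String[] args) {
        String s = "The quick brown fox jumps over the lazy dog";
        byte[] b = s.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(backward(b) == s.hashCode());
        System.out.println(forwardReducing(b) == s.hashCode());
    }
}
```

Both methods compute the same polynomial as `String.hashCode()` for Latin1 input. The difference is where the extra work sits: the backward loop pays one more multiply per lane to update the coefficients, while this forward loop pays a horizontal sum across the lanes on every block.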

-------------

PR: https://git.openjdk.org/jdk/pull/10847