RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v3]
Ludovic Henry
luhenry at openjdk.org
Mon Oct 31 13:22:32 UTC 2022
On Mon, 31 Oct 2022 02:35:18 GMT, Quan Anh Mai <qamai at openjdk.org> wrote:
>> Claes Redestad has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Require UseSSE >= 3 due transitive use of sse3 instructions from ReduceI
>
> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 3493:
>
>> 3491: // vnext = IntVector.broadcast(I256, power_of_31_backwards[0]);
>> 3492: movdl(vnext, InternalAddress(power_of_31_backwards + (0 * sizeof(jint))));
>> 3493: vpbroadcastd(vnext, vnext, Assembler::AVX_256bit);
>
> `vpbroadcastd` can take an `Address` argument instead.

An `InternalAddress` isn't an `Address` but an `AddressLiteral`. You can, however, do `as_Address(InternalAddress(power_of_31_backwards + (0 * sizeof(jint))))`.

> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 3528:
>
>> 3526: vpmulld(vcoef[idx], vcoef[idx], vnext, Assembler::AVX_256bit);
>> 3527: }
>> 3528: jmp(LONG_VECTOR_LOOP_BEGIN);
>
> Calculating backward forces you to recalculate the coefficients on each iteration; I think doing this normally would be better.

But doing it forward requires a `reduceLanes` (a horizontal reduction across the vector) on each iteration, whereas going backward only needs one after the loop. It's faster to do it backward.
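
To make the trade-off concrete, here is a minimal scalar sketch in Java of the two formulations. It is only a model of the idea, not the intrinsic in `c2_MacroAssembler_x86.cpp`: the class name `PolyHashSketch`, the helper names, and the choice of `LANES = 8` (standing in for one 256-bit vector of ints) are all assumptions made for illustration.

```java
class PolyHashSketch {
    // Hypothetical lane count: models one 256-bit vector holding 8 ints.
    static final int LANES = 8;

    // Reference: the usual Horner loop, h = 31 * h + a[i].
    static int reference(int[] a) {
        int h = 0;
        for (int v : a) h = 31 * h + v;
        return h;
    }

    // 31^e with int wraparound, matching the arithmetic of the hash itself.
    static int pow31(int e) {
        int p = 1;
        for (int i = 0; i < e; i++) p *= 31;
        return p;
    }

    // Forward: coefficients stay fixed, but every chunk needs a
    // horizontal sum over the lanes before it can be folded into h.
    static int forward(int[] a) {
        int[] coef = new int[LANES];
        for (int l = 0; l < LANES; l++) coef[l] = pow31(LANES - 1 - l);
        int step = pow31(LANES);
        int h = 0, i = 0;
        for (; i + LANES <= a.length; i += LANES) {
            int sum = 0; // stands in for a per-iteration lane reduction
            for (int l = 0; l < LANES; l++) sum += a[i + l] * coef[l];
            h = h * step + sum;
        }
        for (; i < a.length; i++) h = 31 * h + a[i]; // scalar tail
        return h;
    }

    // Backward: walk chunks from the end of the array, keep one
    // accumulator per lane, scale the coefficients by 31^LANES after
    // each chunk, and reduce across lanes only once at the end.
    static int backward(int[] a) {
        int[] coef = new int[LANES];
        for (int l = 0; l < LANES; l++) coef[l] = pow31(LANES - 1 - l);
        int step = pow31(LANES);
        int[] acc = new int[LANES];
        int rem = a.length % LANES; // leading elements not covered by a chunk
        for (int i = a.length - LANES; i >= rem; i -= LANES) {
            for (int l = 0; l < LANES; l++) {
                acc[l] += a[i + l] * coef[l]; // lane-wise multiply-add
                coef[l] *= step;              // per-iteration coefficient update
            }
        }
        // The leading elements are hashed with Horner and scaled by
        // 31^(number of chunked elements), which is what coef[LANES - 1] holds.
        int head = 0;
        for (int i = 0; i < rem; i++) head = 31 * head + a[i];
        int h = head * coef[LANES - 1];
        for (int l = 0; l < LANES; l++) h += acc[l]; // single final reduction
        return h;
    }
}
```

In this model the forward variant pays for a lane reduction per chunk, while the backward variant pays for one extra lane-wise multiply per chunk (the coefficient update) and a single reduction at the end. Which of the two is cheaper on real hardware depends on the relative cost of those operations.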
-------------
PR: https://git.openjdk.org/jdk/pull/10847