RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v2]
Claes Redestad
redestad at openjdk.org
Mon Oct 31 12:32:34 UTC 2022
On Mon, 31 Oct 2022 02:21:44 GMT, Quan Anh Mai <qamai at openjdk.org> wrote:
>> Claes Redestad has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Reorder loops and some other suggestions from @merykitty
>
> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 3358:
>
>> 3356: movl(result, is_string_hashcode ? 0 : 1);
>> 3357:
>> 3358: // if (cnt1 == 0) {
>
> You may want to reorder the loops: a short array suffers more from per-call overhead than a long one, so short inputs should take as few extra hops as possible. For example, I think this could be:
>
>     if (cnt1 >= 4) {
>         if (cnt1 >= 16) {
>             UNROLLED VECTOR LOOP
>             SINGLE VECTOR LOOP
>         }
>         UNROLLED SCALAR LOOP
>     }
>     SINGLE SCALAR LOOP
>
> The thresholds are arbitrary and need to be measured carefully.

Fixed.

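
For readers following along, here is a rough Java-level sketch of the loop ordering suggested above. It is not the intrinsic itself: the class and method names are made up for illustration, the vector path is only marked by a comment, and the threshold of 4 is simply the one from the suggestion, not a measured value.

```java
final class UnrolledHashSketch {
    // Reference: the plain polynomial hash, h = 31*h + c for each element.
    public static int simpleHash(byte[] a) {
        int h = 0;
        for (byte b : a) {
            h = 31 * h + (b & 0xff);
        }
        return h;
    }

    // Same result, ordered so that short inputs fall straight through
    // to the single scalar loop with only one extra branch.
    public static int unrolledHash(byte[] a) {
        int h = 0;
        int i = 0;
        int cnt = a.length;
        if (cnt >= 4) {
            // A vectorized path for cnt >= 16 would sit here in the intrinsic.
            // Unrolled scalar loop: four elements per iteration.
            // 31^4 = 923521, 31^3 = 29791, 31^2 = 961.
            for (; i + 4 <= cnt; i += 4) {
                h = 923521 * h
                  + 29791 * (a[i] & 0xff)
                  + 961 * (a[i + 1] & 0xff)
                  + 31 * (a[i + 2] & 0xff)
                  + (a[i + 3] & 0xff);
            }
        }
        // Single scalar loop: handles short inputs and the tail.
        for (; i < cnt; i++) {
            h = 31 * h + (a[i] & 0xff);
        }
        return h;
    }
}
```

Both methods rely on int wraparound, so they should agree with each other (and with String.hashCode for Latin1 content) for any length.
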
> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 3374:
>
>> 3372:
>> 3373: // int i = 0;
>> 3374: movl(index, 0);
>
> `xorl(index, index)`

Fixed.

> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 3418:
>
>> 3416: // } else { // cnt1 >= 32
>> 3417: address power_of_31_backwards = pc();
>> 3418: emit_int32( 2111290369);
>
> Can this giant table be shared among compilations instead?

Probably, though I'm not entirely sure how. Maybe the "long" cases should be factored out into a set of stub routines so that they are not inlined in numerous places anyway.
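
As a Java-level analogy only (this says nothing about how HotSpot would actually share the constants between compilations), the idea of one table of backwards powers of 31 that every caller reuses could be sketched like this. The class name, the field name and the block size of 8 are all hypothetical; the table in the patch is larger.

```java
final class SharedPowerTableSketch {
    static final int BLOCK = 8;

    // Built once and shared by all callers.
    // POWERS_BACKWARDS[j] = 31^(BLOCK - j) with int wraparound,
    // i.e. 31^8, 31^7, ..., 31^1, 31^0.
    static final int[] POWERS_BACKWARDS = new int[BLOCK + 1];
    static {
        int p = 1;
        for (int j = BLOCK; j >= 0; j--) {
            POWERS_BACKWARDS[j] = p;
            p *= 31;
        }
    }

    public static int hash(char[] a) {
        int h = 0;
        int i = 0;
        // Block loop: h = h * 31^BLOCK + sum(a[i + j] * 31^(BLOCK - 1 - j)).
        for (; i + BLOCK <= a.length; i += BLOCK) {
            int sum = 0;
            for (int j = 0; j < BLOCK; j++) {
                sum += a[i + j] * POWERS_BACKWARDS[j + 1];
            }
            h = h * POWERS_BACKWARDS[0] + sum;
        }
        // Scalar tail for the remaining elements.
        for (; i < a.length; i++) {
            h = 31 * h + a[i];
        }
        return h;
    }
}
```

The inner sum is the part a vector multiply-add would cover; keeping the multipliers in a single shared table is what avoids emitting them at every inlined site.
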
-------------
PR: https://git.openjdk.org/jdk/pull/10847