RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v2]

Claes Redestad redestad at openjdk.org
Mon Oct 31 12:32:34 UTC 2022


On Mon, 31 Oct 2022 02:21:44 GMT, Quan Anh Mai <qamai at openjdk.org> wrote:

>> Claes Redestad has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Reorder loops and some other suggestions from @merykitty
>
> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 3358:
> 
>> 3356:   movl(result, is_string_hashcode ? 0 : 1);
>> 3357: 
>> 3358:   // if (cnt1 == 0) {
> 
> You may want to reorder the execution of the loops: a short array suffers more from fixed per-call overhead than a big array, so short inputs should take the fewest extra hops. For example, I think this could be:
> 
>     if (cnt1 >= 4) {
>         if (cnt1 >= 16) {
>             UNROLLED VECTOR LOOP
>             SINGLE VECTOR LOOP
>         }
>         UNROLLED SCALAR LOOP
>     }
>     SINGLE SCALAR LOOP
> 
> The thresholds are arbitrary and need to be measured carefully.

Fixed
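
For readers following along, the suggested ordering can be sketched in plain Java. This is only an illustration of the control flow, not the intrinsic itself: the class name `HashLoopOrder`, the threshold of 4 and the omission of the vector levels are assumptions made for the example.

```java
// Illustrative sketch only: scalar part of the suggested loop ordering.
// The vector levels (cnt1 >= 16 in the suggestion) are left out here.
final class HashLoopOrder {
    // Polynomial hash over Latin1 bytes, equivalent to String.hashCode().
    static int hash(byte[] a) {
        int h = 0;
        int i = 0;
        int n = a.length;
        if (n >= 4) {
            // UNROLLED SCALAR LOOP: consume four elements per iteration.
            // 923521 = 31^4, 29791 = 31^3, 961 = 31^2.
            for (; i + 3 < n; i += 4) {
                h = 923521 * h
                  + 29791 * (a[i] & 0xff)
                  + 961 * (a[i + 1] & 0xff)
                  + 31 * (a[i + 2] & 0xff)
                  + (a[i + 3] & 0xff);
            }
        }
        // SINGLE SCALAR LOOP: short inputs reach this after one comparison.
        for (; i < n; i++) {
            h = 31 * h + (a[i] & 0xff);
        }
        return h;
    }
}
```

With this shape an input shorter than four elements pays for a single branch before reaching the simple loop, which is the point of the reordering.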

> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 3374:
> 
>> 3372: 
>> 3373:   // int i = 0;
>> 3374:   movl(index, 0);
> 
> `xorl(index, index)`

Fixed

> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 3418:
> 
>> 3416:   // } else { // cnt1 >= 32
>> 3417:   address power_of_31_backwards = pc();
>> 3418:   emit_int32( 2111290369);
> 
> Can this giant table be shared among compilations instead?

Probably, though I'm not entirely sure how. Maybe the "long" cases should be factored out into a set of stub routines so that they're not inlined in numerous places anyway.
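
As a rough Java analogy of what sharing the table would mean (not how it would be done in the macro assembler), the powers of 31 can be computed once and reused by every caller. The class name `SharedPowerTable`, the block width of 8 and the table layout are assumptions for this sketch; the actual table in the patch may be laid out differently.

```java
// Illustrative sketch only: one shared table of powers of 31, stored
// from the highest power down to 31^0, used to hash in fixed-size blocks.
final class SharedPowerTable {
    static final int WIDTH = 8; // hypothetical block width

    // POWERS_BACKWARDS[0] = 31^WIDTH, ..., POWERS_BACKWARDS[WIDTH] = 31^0.
    static final int[] POWERS_BACKWARDS = new int[WIDTH + 1];
    static {
        int p = 1;
        for (int k = WIDTH; k >= 0; k--) {
            POWERS_BACKWARDS[k] = p;
            p *= 31; // wraps modulo 2^32, as in String.hashCode()
        }
    }

    // Polynomial hash over UTF-16 chars, equivalent to String.hashCode().
    static int hash(char[] a) {
        int h = 0;
        int i = 0;
        for (; i + WIDTH <= a.length; i += WIDTH) {
            // Scale the running hash by 31^WIDTH, then add the block terms.
            int acc = h * POWERS_BACKWARDS[0];
            for (int k = 0; k < WIDTH; k++) {
                acc += a[i + k] * POWERS_BACKWARDS[k + 1];
            }
            h = acc;
        }
        // Scalar tail for the remaining elements.
        for (; i < a.length; i++) {
            h = 31 * h + a[i];
        }
        return h;
    }
}
```

The table is initialized once per class rather than once per call site, which is roughly what a shared constant or a stub routine would achieve for the generated code.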

-------------

PR: https://git.openjdk.org/jdk/pull/10847
