RFR: 8320448: Accelerate IndexOf using AVX2 [v43]

Scott Gibbons sgibbons at openjdk.org
Tue May 28 18:30:31 UTC 2024


On Tue, 28 May 2024 12:48:19 GMT, Sandhya Viswanathan <sviswanathan at openjdk.org> wrote:

>> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Fix tests
>
> src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 239:
> 
>> 237:   //  the needle size is less than 32 bytes, we default to a
>> 238:   //  byte-by-byte comparison (this will be rare).
>> 239:   //
> 
> Is this still true?

Yes.  For UL, the code within `L_compareFull` effectively does byte-by-byte.

> src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 278:
> 
>> 276:   __ bind(L_nextCheck);
>> 277:   __ testq(haystack_len_p, haystack_len_p);
>> 278:   __ je(L_zeroCheckFailed);
> 
> This check could be removed as the next check covers this one.

No.  This is checking for a zero length haystack.  The following compare checks for needle length longer than haystack, regardless of the value in each.  The comparison is signed, so a haystack length of 0 with a needle length of -1 will pass the following test and assume validity.

> src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 360:
> 
>> 358:   __ push(rcx);
>> 359:   __ push(r8);
>> 360:   __ push(r9);
> 
> No need to save/restore rcx/r8/r9 on windows platform as well.

OK.

> src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 379:
> 
>> 377: 
>> 378:   // Assume failure
>> 379:   __ movq(rbp, -1);
> 
> We are no more using rbp at return point so this is not needed now?

Removed.

> src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 488:
> 
>> 486:   __ cmpq(r11, nMinusK);
>> 487:   __ ja_b(L_return);
>> 488:   __ movq(rax, r11);
> 
> At places where we know that return value in r11 is correct, we dont need to checkRange so this could have its own label.

I don't want to change this because its reason for existence is to ensure we don't return a value that's beyond the end of the haystack.  We don't yet have a good enough test to validate whether we're reading past the end of the haystack, so I like this as insurance.

> src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 566:
> 
>> 564:     //  rbp: -1
>> 565:     //  XMM_BYTE_0 - first element of needle broadcast
>> 566:     //  XMM_BYTE_K - last element of needle broadcast
> 
> The only registers that are used as input in the switch case are:
> r14 = needle
> rbx = haystack
> rsi = haystack length (n)
> r12 = needle length (k)
> r10 = n - k (where k is needle length)
> XMM_BYTE_0 = first element of needle, broadcast
> XMM_BYTE_K = last element of needle, broadcast
> So we could only list these, making it easier to comprehend.

I listed these registers to make it clear which registers had no expected value and could be used for temps, etc.

> src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 578:
> 
>> 576:     // helper jumps to L_checkRangeAndReturn with a (-1) return value.
>> 577:     big_case_loop_helper(false, 0, L_checkRangeAndReturn, L_loopTop, mask, hsPtrRet, needleLen,
>> 578:                          needle, haystack, hsLength, tmp1, tmp2, tmp3, rScratch, ae, _masm);
> 
> If we run out of haystack instead of jumping to L_checkRangeAndReturn, we could directly jump to  L_retrunError.

Again, I think we ought to leave this in.  Although it executes ~3 instructions that may not be necessary in some cases I think it's best to perform the check.  Once we have a good enough test to check reading past the end of the haystack we can change it.

> src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 597:
> 
>> 595: 
>> 596:     // Need a lot of registers here to preserve state across arrays_equals call
>> 597: 
> 
> This comment is no longer valid, could be removed.

OK

> src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 621:
> 
>> 619:     __ addq(hsPtrRet, index);
>> 620:     __ movq(r11, hsPtrRet);
>> 621:     __ jmp(L_checkRangeAndReturn);
> 
> Why do we have to checkRange here, would it not be always correct? It so we could return r11 directly (by moving into rax).

There are cases where r11 could have a value that, when added to (k - 1) would go past the end of the haystack.  I did all in my power to ensure that it doesn't but there's no test I know of to ensure that condition.  I would recommend leaving this in for now.

> src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 660:
> 
>> 658:   //  Haystack always copied to stack, so 32-byte reads OK
>> 659:   //  Haystack length < 32
>> 660:   //  10 < needle length < 32
> 
> Haystack length <= 32
> 10 < needle length <= 32

Changed.

> src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 721:
> 
>> 719:                        false /* char */, knoreg);
>> 720:     __ testl(rTmp3, rTmp3);
>> 721:     __ jne(L_checkRangeAndReturn);
> 
> Why do we have to checkRange here, would it not be always correct? It so we could return r11 directly (by moving into rax).

OK

> src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 1333:
> 
>> 1331: 
>> 1332:   __ cmpq(nMinusK, 32);
>> 1333:   __ jae_b(L_greaterThan32);
> 
> Should this check be (n-k+1) >= 32? And so accordingly (n-k) >= 31
>  __ cmpq(nMinusK, 31);
>  __ jae_b(L_greaterThan32);

No. For (n-k)==32 we can do full reads.  I'll clarify by changing the label name.

> src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 1382:
> 
>> 1380: 
>> 1381:   __ testl(eq_mask, eq_mask);
>> 1382:   __ je(noMatch);
> 
> We are mixing operation width l and q here.

Fixed.

> src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 1750:
> 
>> 1748:     //  r15 = unused
>> 1749:     //  XMM_BYTE_0 - first element of needle, broadcast
>> 1750:     //  XMM_BYTE_K - last element of needle, broadcast
> 
> This comment is duplicated for both small haystack case and big haystack case, could be made a common comment. 
> Also the only registers that are used as input in the switch case are:
> r14 = needle
> rbx = haystack
> rsi = haystack length (n)
> r12 = needle length (k)
> r10 = n - k (where k is needle length)
> XMM_BYTE_0 = first element of needle, broadcast
> XMM_BYTE_K = last element of needle, broadcast
> So we could only list these, making it easier to comprehend.

I listed all registers for clarity.  This ensures that we know what can be used as values or as scratch registers with no ambiguity.

> src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 1758:
> 
>> 1756:     //
>> 1757:     // If a match is found, jump to L_checkRange, which ensures the
>> 1758:     // matched needle is not past the end of the haystack.
> 
> Another comment here would be useful:
> // The found index is returned in set_bit (r11).

Added.

> src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 1810:
> 
>> 1808:     //  XMM_BYTE_K - last element of needle, broadcast
>> 1809:     //
>> 1810:     //  The haystack is > 32 bytes
> 
> Good to mention some info about the return found index value in comment about how it is a combination of set_bit (r8), hs_ptr, and haystack.

Done.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1617663227
PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1617667775
PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1617669103
PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1617671612
PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1617673870
PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1617680570
PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1617699879
PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1617700813
PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1617704836
PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1617705505
PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1617711973
PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1617713299
PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1617714825
PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1617716598
PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1617717873
PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1617726261


More information about the hotspot-compiler-dev mailing list