RFR: 8320448: Accelerate IndexOf using AVX2 [v43]
Sandhya Viswanathan
sviswanathan at openjdk.org
Tue May 28 20:40:17 UTC 2024
On Tue, 28 May 2024 18:11:13 GMT, Scott Gibbons <sgibbons at openjdk.org> wrote:
>> src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 1333:
>>
>>> 1331:
>>> 1332: __ cmpq(nMinusK, 32);
>>> 1333: __ jae_b(L_greaterThan32);
>>
>> Should this check be (n-k+1) >= 32? And so accordingly (n-k) >= 31
>> __ cmpq(nMinusK, 31);
>> __ jae_b(L_greaterThan32);
>
> No. For (n-k)==32 we can do full reads. I'll clarify by changing the label name.
We can also do full reads for (n-k) == 31, as we also compare the kth byte.
>> src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 1750:
>>
>>> 1748: // r15 = unused
>>> 1749: // XMM_BYTE_0 - first element of needle, broadcast
>>> 1750: // XMM_BYTE_K - last element of needle, broadcast
>>
>> This comment is duplicated for both small haystack case and big haystack case, could be made a common comment.
>> Also the only registers that are used as input in the switch case are:
>> r14 = needle
>> rbx = haystack
>> rsi = haystack length (n)
>> r12 = needle length (k)
>> r10 = n - k (where k is needle length)
>> XMM_BYTE_0 = first element of needle, broadcast
>> XMM_BYTE_K = last element of needle, broadcast
>> So we could only list these, making it easier to comprehend.
>
> I listed all registers for clarity. This ensures that we know what can be used as values or as scratch registers with no ambiguity.
Sounds good. We could keep only comment out of the two as it is the same for both small haystack and big haystack.
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1617862799
PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1617865049
More information about the hotspot-compiler-dev
mailing list