RFR: 8320448: Accelerate IndexOf using AVX2 [v19]

Sandhya Viswanathan sviswanathan at openjdk.org
Mon May 6 22:43:57 UTC 2024


On Sat, 4 May 2024 19:35:21 GMT, Scott Gibbons <sgibbons at openjdk.org> wrote:

>> Re-write the IndexOf code without the use of the pcmpestri instruction, using only AVX2 instructions.  This change accelerates String.indexOf by 1.3x on average for AVX2.  The benchmark numbers:
>> 
>> 
>> Benchmark                                   Score      Latest  Ratio (Latest/Score)
>> StringIndexOf.advancedWithMediumSub       343.573     317.934  0.925375393x
>> StringIndexOf.advancedWithShortSub1      1039.081     1053.96  1.014319384x
>> StringIndexOf.advancedWithShortSub2        55.828     110.541  1.980027943x
>> StringIndexOf.constantPattern               9.361      11.906  1.271872663x
>> StringIndexOf.searchCharLongSuccess         4.216       4.218  1.000474383x
>> StringIndexOf.searchCharMediumSuccess       3.133       3.216  1.02649218x
>> StringIndexOf.searchCharShortSuccess         3.76       3.761  1.000265957x
>> StringIndexOf.success                       9.186       9.713  1.057369911x
>> StringIndexOf.successBig                   14.341      46.343  3.231504079x
>> StringIndexOfChar.latin1_AVX2_String     6220.918    12154.52  1.953814533x
>> StringIndexOfChar.latin1_AVX2_char       5503.556    5540.044  1.006629895x
>> StringIndexOfChar.latin1_SSE4_String     6978.854    6818.689  0.977049957x
>> StringIndexOfChar.latin1_SSE4_char       5657.499    5474.624  0.967675646x
>> StringIndexOfChar.latin1_Short_String    7132.541    6863.359  0.962260014x
>> StringIndexOfChar.latin1_Short_char     16013.389   16162.437  1.009307711x
>> StringIndexOfChar.latin1_mixed_String    7386.123   14771.622  1.999915517x
>> StringIndexOfChar.latin1_mixed_char      9901.671    9782.245  0.987938803x
>
> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Rearrange; add lambdas for clarity

src/hotspot/cpu/x86/macroAssembler_x86.cpp line 1174:

> 1172: // Alignment specifying the maximum number of allowed bytes to pad.
> 1173: // If padding > max, no padding is inserted.
> 1174: void MacroAssembler::p2align(int modulus, int maxbytes) {

We could pass offset() as an argument to p2align, i.e. give it three arguments: p2align(modulus, target, maxbytes). Also, maybe rename p2align to align then?
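
For illustration, a minimal standalone sketch of how a three-argument form could compute the padding. The helper name padding_for is hypothetical and this is not the MacroAssembler code; it only follows the rule in the quoted comment that no padding is inserted when more than maxbytes would be needed, and it assumes modulus is a power of two.

```cpp
#include <cassert>

// Hypothetical helper, not HotSpot code: returns how many padding bytes
// would bring `target` up to the next multiple of `modulus`, or 0 when
// that would take more than `maxbytes` bytes.
int padding_for(int modulus, int target, int maxbytes) {
  assert(modulus > 0 && (modulus & (modulus - 1)) == 0 && "modulus must be a power of two");
  assert(target >= 0 && maxbytes >= 0);
  int remainder = target & (modulus - 1);  // bytes past the last boundary
  int padding = (remainder == 0) ? 0 : modulus - remainder;
  return (padding > maxbytes) ? 0 : padding;  // too much padding: insert none
}
```

With the target passed in explicitly, the caller would supply offset() for the current code position, and the same routine could serve any other target value.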

src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 208:

> 206:   ////////////////////////////////////////////////////////////////////////////////////////
> 207:   ////////////////////////////////////////////////////////////////////////////////////////
> 208:   if (VM_Version::supports_avx2()) {  // AVX2 version

Instead of the if check, it would be better to assert here:
assert(VM_Version::supports_avx2(), "Needs AVX2 support");

src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 233:

> 231:     ////////////////////////////////////////////////////////////////////////////////////////
> 232:     ////////////////////////////////////////////////////////////////////////////////////////
> 233: 

This comment can go right before the start of the method. It would also be good to list the native function parameters in the comment.

src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 238:

> 236:     const Register needle       = rdx;
> 237:     const Register needle_len   = rcx;
> 238: 

This is the calling convention on Linux. How is the Windows platform handled?
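
To make the concern concrete: rdx and rcx carry the third and fourth integer arguments under the System V AMD64 ABI used on Linux, whereas the Windows x64 convention passes those two arguments in r8 and r9. A small standalone sketch of the two mappings follows; the Abi enum and int_arg_register are hypothetical names for illustration, not HotSpot code.

```cpp
#include <cassert>
#include <string>

enum class Abi { SystemV, Windows };

// Register that carries the index-th (0-based) integer or pointer argument.
// Returns "stack" when the argument is not passed in a register.
std::string int_arg_register(Abi abi, int index) {
  static const char* const sysv[]  = {"rdi", "rsi", "rdx", "rcx", "r8", "r9"};
  static const char* const win64[] = {"rcx", "rdx", "r8", "r9"};
  assert(index >= 0);
  if (abi == Abi::SystemV) {
    return index < 6 ? sysv[index] : "stack";
  }
  return index < 4 ? win64[index] : "stack";
}
```

For example, int_arg_register(Abi::Windows, 2) yields "r8", so a stub that reads the needle from rdx would pick up the second argument on Windows unless the registers are remapped.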

src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 260:

> 258:     // const XMMRegister save_rcx  = xmm11;
> 259:     // const XMMRegister save_r8   = xmm12;
> 260: 

Could this commented-out code be removed?

src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 279:

> 277:     fnptrs[isLL   ? StrIntrinsicNode::LL
> 278:            : isUU ? StrIntrinsicNode::UU
> 279:                   : StrIntrinsicNode::UL] = __ pc();

Could this not be simplified as:
 fnptrs[ae] = __ pc();
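
A standalone sketch of why the two forms should pick the same slot, under the assumption that isLL and isUU are derived from ae and that ae is only ever LL, UU or UL in this stub. The enum below is a hypothetical stand-in; the real StrIntrinsicNode enumerators are not shown in this thread.

```cpp
#include <cassert>

// Hypothetical stand-in for the encoding enum used to index fnptrs.
enum ArgEncoding { LL, LU, UL, UU, NUM_ENCODINGS };

// Index chosen by the nested ternary in the patch, assuming isLL and isUU
// are computed from ae as shown here.
int index_via_flags(ArgEncoding ae) {
  assert(ae == LL || ae == UU || ae == UL);  // the only cases the ternary distinguishes
  bool isLL = (ae == LL);
  bool isUU = (ae == UU);
  return isLL ? LL : isUU ? UU : UL;
}

// Index chosen by the suggested simplification.
int index_direct(ArgEncoding ae) { return ae; }
```

The two functions agree for LL, UU and UL. They would differ for LU, which the ternary maps to the UL slot, so the simplification is only safe if LU never reaches this code.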

src/hotspot/share/opto/library_call.cpp line 1263:

> 1261:   if (result != nullptr) {
> 1262:     // The result is index relative to from_index if substring was found, -1 otherwise.
> 1263:     // Generate code which will fold into cmove.

Any reason to remove this comment?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1591547667
PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1591612417
PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1591613215
PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1591617528
PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1591607921
PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1591618222
PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1591554296

