RFR: 8320448: Accelerate IndexOf using AVX2 [v19]

Sandhya Viswanathan sviswanathan at openjdk.org
Mon May 6 23:21:57 UTC 2024


On Sat, 4 May 2024 19:35:21 GMT, Scott Gibbons <sgibbons at openjdk.org> wrote:

>> Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions.  This change accelerates String.IndexOf on average 1.3x for AVX2.  The benchmark numbers:
>> 
>> 
>> Benchmark                                  Score      Latest  Latest/Score
>> StringIndexOf.advancedWithMediumSub      343.573     317.934  0.925375393x
>> StringIndexOf.advancedWithShortSub1     1039.081     1053.96  1.014319384x
>> StringIndexOf.advancedWithShortSub2       55.828     110.541  1.980027943x
>> StringIndexOf.constantPattern              9.361      11.906  1.271872663x
>> StringIndexOf.searchCharLongSuccess        4.216       4.218  1.000474383x
>> StringIndexOf.searchCharMediumSuccess      3.133       3.216  1.02649218x
>> StringIndexOf.searchCharShortSuccess       3.76        3.761  1.000265957x
>> StringIndexOf.success                      9.186       9.713  1.057369911x
>> StringIndexOf.successBig                  14.341      46.343  3.231504079x
>> StringIndexOfChar.latin1_AVX2_String    6220.918    12154.52  1.953814533x
>> StringIndexOfChar.latin1_AVX2_char      5503.556    5540.044  1.006629895x
>> StringIndexOfChar.latin1_SSE4_String    6978.854    6818.689  0.977049957x
>> StringIndexOfChar.latin1_SSE4_char      5657.499    5474.624  0.967675646x
>> StringIndexOfChar.latin1_Short_String   7132.541    6863.359  0.962260014x
>> StringIndexOfChar.latin1_Short_char    16013.389   16162.437  1.009307711x
>> StringIndexOfChar.latin1_mixed_String   7386.123   14771.622  1.999915517x
>> StringIndexOfChar.latin1_mixed_char     9901.671    9782.245  0.987938803x
>
> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Rearrange; add lambdas for clarity

src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 314:

> 312: 
> 313:     // needle_len is in elements, not bytes, for UTF-16
> 314:     __ cmpq(needle_len, isUU ? OPT_NEEDLE_SIZE_MAX / 2 : OPT_NEEDLE_SIZE_MAX);

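OPT_NEEDLE_SIZE_MAX is an odd number (set to 5). Should it have been an even number? With integer division, the UU limit OPT_NEEDLE_SIZE_MAX / 2 evaluates to 2 elements, i.e. 4 bytes, so the UU and non-UU cases are not compared against the same byte length.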
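
To illustrate the arithmetic behind this question, here is a minimal standalone sketch. It assumes OPT_NEEDLE_SIZE_MAX is a plain integer constant equal to 5 and that a UTF-16 element is 2 bytes; the helper names are hypothetical and are not part of the patch.

```cpp
// Assumption: integer constant with the value mentioned in the review comment.
constexpr int OPT_NEEDLE_SIZE_MAX = 5;

// Hypothetical helper: mirrors the immediate operand of the cmpq quoted above.
// The result is in elements (chars for UU, bytes otherwise).
constexpr int needle_limit_in_elements(bool isUU) {
  return isUU ? OPT_NEEDLE_SIZE_MAX / 2 : OPT_NEEDLE_SIZE_MAX;
}

// Hypothetical helper: converts the element limit back to bytes,
// assuming 2 bytes per UTF-16 element and 1 byte per Latin-1 element.
constexpr int needle_limit_in_bytes(bool isUU) {
  return needle_limit_in_elements(isUU) * (isUU ? 2 : 1);
}

// Integer division truncates: 5 / 2 == 2, so the UU limit covers 4 bytes, not 5.
static_assert(needle_limit_in_elements(true) == 2, "UU limit is 2 elements");
static_assert(needle_limit_in_bytes(true) == 4, "UU limit is 4 bytes");
static_assert(needle_limit_in_bytes(false) == 5, "Latin-1 limit is 5 bytes");
```

With an even constant the two byte limits would coincide; with 5 they differ by one byte, which may or may not be intended.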

src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 329:

> 327:     ////////////////////////////////////////////////////////////////////////////////////////
> 328: 
> 329:     __ bind(L_begin);

So far we have handled haystack <= 32 and needle_size <= 5 (?) in bytes. A high-level description of the algorithm is needed in comments here so that the code below can be followed. It should describe the various paths in terms of haystack and needle sizes, explain how to reason about the assembly code below, and make clear that all the paths are taken care of. Also, the abstraction level changes abruptly here: the code below is detailed inline code rather than methods for the various paths.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1591640551
PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1591646095


More information about the core-libs-dev mailing list