RFR: JDK-8320448 Accelerate IndexOf using AVX2 [v6]

Emanuel Peter epeter at openjdk.org
Tue Jan 9 15:20:33 UTC 2024


On Mon, 8 Jan 2024 20:48:39 GMT, Scott Gibbons <sgibbons at openjdk.org> wrote:

>> Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions.  This change accelerates String.IndexOf on average 1.3x for AVX2.  The benchmark numbers:
>> 
>> 
>> Benchmark	                                               Score		Latest		
>> StringIndexOf.advancedWithMediumSub   343.573		317.934		0.925375393x
>> StringIndexOf.advancedWithShortSub1	  1039.081		1053.96		1.014319384x
>> StringIndexOf.advancedWithShortSub2	      55.828		110.541		1.980027943x
>> StringIndexOf.constantPattern	                9.361		11.906		1.271872663x
>> StringIndexOf.searchCharLongSuccess	        4.216		4.218		1.000474383x
>> StringIndexOf.searchCharMediumSuccess	3.133		3.216		1.02649218x
>> StringIndexOf.searchCharShortSuccess	3.76		        3.761		1.000265957x
>> StringIndexOf.success	                                9.186		9.713		1.057369911x
>> StringIndexOf.successBig	                      14.341		46.343		3.231504079x
>> StringIndexOfChar.latin1_AVX2_String	  6220.918		12154.52		1.953814533x
>> StringIndexOfChar.latin1_AVX2_char	  5503.556		5540.044		1.006629895x
>> StringIndexOfChar.latin1_SSE4_String	  6978.854		6818.689		0.977049957x
>> StringIndexOfChar.latin1_SSE4_char	  5657.499		5474.624		0.967675646x
>> StringIndexOfChar.latin1_Short_String	  7132.541		6863.359		0.962260014x
>> StringIndexOfChar.latin1_Short_char	16013.389	      16162.437		1.009307711x
>> StringIndexOfChar.latin1_mixed_String	  7386.123	      14771.622		1.999915517x
>> StringIndexOfChar.latin1_mixed_char	  9901.671		9782.245		0.987938803
>
> Scott Gibbons has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 21 commits:
> 
>  - Merge branch 'openjdk:master' into indexof
>  - Addressing review comments.
>  - Fix for JDK-8321599
>  - Support UU IndexOf
>  - Only use optimization when EnableX86ECoreOpts is true
>  - Fix whitespace
>  - Merge branch 'openjdk:master' into indexof
>  - Comments; added exhaustive-ish test
>  - Subtracting 0x10 twice.
>  - Stomped on r13 in switch branch calculation
>  - ... and 11 more: https://git.openjdk.org/jdk/compare/8a4dc79e...600377b0

@asgibbons I cannot yet promise to review this, I just left a few comments after scrolling through this change. I'm especially scared of reviewing `src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp`.

I launched testing for Commit 21 / v05.

Maybe an ignorant question: How would avx512 be affected?

src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 83:

> 81: 
> 82: //   const __m256i first = _mm256_set1_epi8(needle[0]);
> 83: //   const __m256i last = _mm256_set1_epi8(needle[k - 1]);

I think it would be nicer if you had comment `//` on every line, and no gaps.

src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1608:

> 1606:     //   vector compares when size is 2 * VEC_SIZE or less. 38    8. Use 4
> 1607:     //   vector compares when size is 4 * VEC_SIZE or less. 39    9. Use 8
> 1608:     //   vector compares when size is 8 * VEC_SIZE or less.  */

Is this formatting intended?

src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1672:

> 1670: 
> 1671:     // 98              VPCMPEQ VEC_SIZE(%rdi), %ymm2, %ymm2
> 1672:     // 99              vpmovmskb %ymm2, %eax

It seems that here the comments and code is strangely interleaved / shifted. What is this all for?

src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 2301:

> 2299:     // 388             setg    %dl
> 2300:     // 389             leal    -1(%rdx, %rdx), %eax
> 2301:     __ movzbl(rcx, Address(rsi, rax, Address::times_1, -0x20));

Down here it is even worse

test/jdk/java/lang/StringBuffer/IndexOf.java line 34:

> 32: public class IndexOf {
> 33: 
> 34:   static Random generator = new Random(1999);

Would it be an alternative to use the this:

import jdk.test.lib.Utils;
...
Random random = Utils.getRandomInstance();

This has a random seed, but it is always printed in the output and can be set via a test-flag.

test/jdk/java/lang/StringBuffer/IndexOf.java line 44:

> 42:     }
> 43:     System.out.println("");
> 44:     generator.setSeed(1999);

Is there a good reason for a fixed seed?

-------------

Changes requested by epeter (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/16753#pullrequestreview-1811353722
PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1446211178
PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1446210544
PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1446216019
PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1446217305
PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1446221928
PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1446223038


More information about the core-libs-dev mailing list