RFR: 8320448: Accelerate IndexOf using AVX2 [v19]
Volodymyr Paprotski
duke at openjdk.org
Wed May 15 20:26:17 UTC 2024
On Sat, 4 May 2024 19:35:21 GMT, Scott Gibbons <sgibbons at openjdk.org> wrote:
>> Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2. The benchmark numbers:
>>
>>
>> Benchmark Score Latest
>> StringIndexOf.advancedWithMediumSub 343.573 317.934 0.925375393x
>> StringIndexOf.advancedWithShortSub1 1039.081 1053.96 1.014319384x
>> StringIndexOf.advancedWithShortSub2 55.828 110.541 1.980027943x
>> StringIndexOf.constantPattern 9.361 11.906 1.271872663x
>> StringIndexOf.searchCharLongSuccess 4.216 4.218 1.000474383x
>> StringIndexOf.searchCharMediumSuccess 3.133 3.216 1.02649218x
>> StringIndexOf.searchCharShortSuccess 3.76 3.761 1.000265957x
>> StringIndexOf.success 9.186 9.713 1.057369911x
>> StringIndexOf.successBig 14.341 46.343 3.231504079x
>> StringIndexOfChar.latin1_AVX2_String 6220.918 12154.52 1.953814533x
>> StringIndexOfChar.latin1_AVX2_char 5503.556 5540.044 1.006629895x
>> StringIndexOfChar.latin1_SSE4_String 6978.854 6818.689 0.977049957x
>> StringIndexOfChar.latin1_SSE4_char 5657.499 5474.624 0.967675646x
>> StringIndexOfChar.latin1_Short_String 7132.541 6863.359 0.962260014x
>> StringIndexOfChar.latin1_Short_char 16013.389 16162.437 1.009307711x
>> StringIndexOfChar.latin1_mixed_String 7386.123 14771.622 1.999915517x
>> StringIndexOfChar.latin1_mixed_char 9901.671 9782.245 0.987938803
>
> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision:
>
> Rearrange; add lambdas for clarity
First pass at StringIndexOfHuge.java and IndexOf.java
test/jdk/java/lang/StringBuffer/IndexOf.java line 40:
> 38: private static boolean failure = false;
> 39: public static void main(String[] args) throws Exception {
> 40: String testName = "IndexOf";
intentation
test/jdk/java/lang/StringBuffer/IndexOf.java line 47:
> 45: char[] haystack_16 = new char[128];
> 46:
> 47: for (int i = 0; i < 128; i++) {
you can use `char` instead of `int` as iterator
test/jdk/java/lang/StringBuffer/IndexOf.java line 54:
> 52: // for (int i = 1; i < 128; i++) {
> 53: // haystack_16[i] = (char) (i);
> 54: // }
dead code
test/jdk/java/lang/StringBuffer/IndexOf.java line 64:
> 62: Charset hs_charset = StandardCharsets.UTF_16;
> 63: Charset needleCharset = StandardCharsets.ISO_8859_1;
> 64: // Charset needleCharset = StandardCharsets.UTF_16;
Move from main() into a function that takes `needleCharset` as a parameter, then call that function twice.
test/jdk/java/lang/StringBuffer/IndexOf.java line 81:
> 79: sourceBuffer = new StringBuffer(sourceString);
> 80: targetString = generateTestString(10, 11);
> 81: } while (sourceString.indexOf(targetString) != -1);
Should really keep the original test unmodified and add new tests as needed
test/jdk/java/lang/StringBuffer/IndexOf.java line 83:
> 81: shs = "$&),,18+-!'8)+";
> 82: endNeedle = "8)-";
> 83: l_offset = 9;
dead code
test/jdk/java/lang/StringBuffer/IndexOf.java line 89:
> 87: StringBuffer bshs = new StringBuffer(shs);
> 88:
> 89: // printStringBytes(shs.getBytes(hs_charset));
dead code (and next two comments)
test/jdk/java/lang/StringBuffer/IndexOf.java line 90:
> 88:
> 89: // printStringBytes(shs.getBytes(hs_charset));
> 90: for (int i = 0; i < 200000; i++) {
This wont be a deterministic way to reach the intrinsic. I would suggest copying the idea from test/jdk/com/sun/crypto/provider/Cipher/ChaCha20/unittest/Poly1305UnitTestDriver.java
i.e. Have two `@run main` invocations at the top of this file, one with default parameters, one with `-Xcomp -XX:-TieredCompilation`. You dont need a 'driver' program, that was to handle something else.
/*
* @test
* @modules java.base/com.sun.crypto.provider
* @run main java.base/com.sun.crypto.provider.Poly1305KAT
* @summary Unit test for com.sun.crypto.provider.Poly1305.
*/
/*
* @test
* @modules java.base/com.sun.crypto.provider
* @summary Unit test for IntrinsicCandidate in com.sun.crypto.provider.Poly1305.
* @run main/othervm -Xcomp -XX:-TieredCompilation -XX:+UnlockDiagnosticVMOptions -XX:+ForceUnreachable java.base/com.sun.crypto.provider.Poly1305KAT
*/
test/jdk/java/lang/StringBuffer/IndexOf.java line 126:
> 124: int aNewLength = getRandomIndex(min, max);
> 125: for (int y = 0; y < aNewLength; y++) {
> 126: int achar = generator.nextInt(30) + 30;
This will only ever generate LL cases, i.e. chars from [30,60]. Could be parametrized to also produce utf16 if instead of 30, offset was in the unicode range
test/jdk/java/lang/StringBuffer/IndexOf.java line 199:
> 197: System.out.println("Source="+sourceString.substring(hsBegin, hsBegin + haystackLen));
> 198: System.out.println("Target="+targetString.substring(nBegin, nBegin + needleLen));
> 199: System.out.println("haystackLen="+haystackLen+" neeldeLen="+needleLen+" hsBegin="+hsBegin+" nBegin="+nBegin+
This looks like 'development scaffolding' (i.e. printf debugging) that was meant to be removed
test/jdk/java/lang/StringBuffer/IndexOf.java line 237:
> 235: + sourceBuffer.toString() + " len Buffer = " + sourceBuffer.toString().length());
> 236: System.err.println(" naive = " + naiveFind(sourceBuffer.toString(), targetString, 0) + ", IndexOf = "
> 237: + sourceBuffer.indexOf(targetString));
More tracing left behind here and rest of this function (original just recorded failure and moved along)
test/jdk/java/lang/StringBuffer/IndexOf.java line 284:
> 282:
> 283: // Note: it is possible although highly improbable that failCount will
> 284: // be > 0 even if everthing is working ok
This sounds like either a bug or a testcase bug? Same as line 301, `extremely remote possibility of > 1 match`?
test/jdk/java/lang/StringBuffer/IndexOf.java line 295:
> 293: sourceString = generateTestString(99, 100);
> 294: sourceBuffer = new StringBuffer(sourceString);
> 295: targetString = generateTestString(10, 11);
Generate a random int [0,1,2] for LL, UU, UL, pass that as parameter to generateTestString() to test the other paths. Same for other tests in this file using this pattern.
This test is specific to haystacklen=100, needlelen=10.. what about other haystack/needle sizes to exercise all the paths in the intrinsic assembler (i.e. haystack >=, <=32, needlelen ={1,2,3,4,5..32..}). Elsewhere already?
test/jdk/java/lang/StringBuffer/IndexOf.java line 360:
> 358: System.err.println(" sAnswer = " + sAnswer + ", sbAnswer = " + sbAnswer);
> 359: System.err.println(" testString = '" + testString + "'");
> 360: System.err.println(" testBuffer = '" + testBuffer + "'");
tracing left here and further down
test/micro/org/openjdk/bench/java/lang/StringIndexOfHuge.java line 2:
> 1: /*
> 2: * Copyright (c) 2014, 2024, Oracle and/or its affiliates. All rights reserved.
New file, just 2024
test/micro/org/openjdk/bench/java/lang/StringIndexOfHuge.java line 81:
> 79: lateMatchString16 = dataStringHuge16.substring(dataStringHuge16.length() - 31);
> 80:
> 81: searchString = "oscar";
Would had liked to see a few more small needles (i.e. to test/verify individual switch statement cases)
test/micro/org/openjdk/bench/java/lang/StringIndexOfHuge.java line 94:
> 92:
> 93:
> 94: /** IndexOf Micros */
Would really had preferred @Param{"LL", "UU", "UL"}; would be easier to spot if there are any copy/paste errors..
test/micro/org/openjdk/bench/java/lang/StringIndexOfHuge.java line 132:
> 130: @Benchmark
> 131: public int searchHugeLargeSubstring() {
> 132: return dataStringHuge.indexOf("B".repeat(30) + "X" + "A".repeat(30), 74);
.repeat() call and string concatenation shouldn't be part of the benchmark (here and several other @Benchmark functions in this file) since it will detract from the measurement.
(String concatenation gets converted (by javac) into StringBuilder().append().append()....append().toString())
test/micro/org/openjdk/bench/java/lang/StringIndexOfHuge.java line 242:
> 240: @Benchmark
> 241: public int search16HugeLargeSubstring16() {
> 242: return dataStringHuge16.indexOf("B".repeat(30) + "X" + "A".repeat(30), 74);
`search16HugeLargeSubstring16` implies UU, but with `"B".repeat(30) + "X" + "A".repeat(30)` is UL
-------------
PR Review: https://git.openjdk.org/jdk/pull/16753#pullrequestreview-2058681000
PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1602136400
PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1602140456
PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1602137044
PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1602158011
PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1602160330
PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1602144091
PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1602147967
PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1602153043
PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1602181943
PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1602162587
PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1602167728
PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1602184697
PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1602198158
PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1602171418
PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1602200123
PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1602133525
PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1602130679
PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1602047091
PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1602115797
More information about the hotspot-compiler-dev
mailing list