JDK-8173585: Intrinsify StringLatin1.indexOf(char)
Tatton, Jason
jptatton at amazon.com
Thu Sep 3 21:28:56 UTC 2020
Hi All,
Please review the following patch:
https://bugs.openjdk.java.net/browse/JDK-8173585
http://cr.openjdk.java.net/~phh/8173585/webrev.00/
This is an implementation of the indexOf(char) intrinsic for StringLatin1 (1 byte encoded Strings). It is provided for x86 and ARM64. The implementation is greatly inspired by the indexOf(char) intrinsic for StringUTF16. To incorporate it I had to make a small change to StringLatin1.java (refactor of functionality to intrisified private method) as well as code for C2.
Submitted to: hotspot-compiler-dev and core-libs-dev as this patch contains a change to hotspot and java/lang/StringLatin1.java
Details of testing:
==============
I have created a jtreg test “compiler/intrinsics/string/TestStringLatin1IndexOfChar” to cover this new intrinsic. Note that, particularly for the x86 implementation of the intrinsic, the code path taken is dependent upon the length of the input String. Hence the test has been designed to cover all these cases. In summary they are:
• A “short” string of < 16 characters.
• A SIMD String of 16 – 31 characters.
• A AVX2 SIMD String of 32 characters+.
Hardware used for testing:
------------------------------------
• Intel Xeon CPU E5-2680 (JVM did not recognize this as having AVX2 support)
• Intel i7 processor (with AVX2 support).
• AWS Graviton 2 (ARM 64 processor).
I also ran; ‘run-test-tier1’ and ‘run-test-tier2’ for: x86_64 and aarch64.
Possible future enhancements:
=========================
For the x86 implementation there may be two further improvements we can make in order to improve performance of both the StringUTF16 and StringLatin1 indexOf(char) intrinsics:
1. Make use of AVX-512 instructions.
2. For “short” Strings (see below), I think it may be possible to modify the existing algorithm to still use SSE SIMD instructions instead of a loop.
JMH Benchmark results:
====================
The benchmarks examine the 3 codepaths for StringLatin1 and StringUTF16. Here are the results for Intel x86 (ARM is similar):
FYI, String lengths in characters (1byte for Latin1, 2bytes for UTF16):
Latin1 UTF16
Short: 15 7
SSE4: 16 8
AVX2: 32 16
Without StringLatin1 indexofchar intrinsic:
Benchmark Mode Cnt Score Error Units
IndexOfBenchmark.latin1_AVX2_String thrpt 5 121781.424 ± 355.085 ops/s
IndexOfBenchmark.latin1_AVX2_char thrpt 5 46060.612 ± 151.274 ops/s
IndexOfBenchmark.latin1_SSE4_String thrpt 5 197339.146 ± 90.333 ops/s
IndexOfBenchmark.latin1_SSE4_char thrpt 5 61401.204 ± 426.761 ops/s
IndexOfBenchmark.latin1_Short_String thrpt 5 175389.355 ± 294.976 ops/s
IndexOfBenchmark.latin1_Short_char thrpt 5 60759.868 ± 124.349 ops/s
IndexOfBenchmark.utf16_AVX2_String thrpt 5 123601.020 ± 111.981 ops/s
IndexOfBenchmark.utf16_AVX2_char thrpt 5 141116.832 ± 380.489 ops/s
IndexOfBenchmark.utf16_SSE4_String thrpt 5 178136.762 ± 143.227 ops/s
IndexOfBenchmark.utf16_SSE4_char thrpt 5 181430.649 ± 120.097 ops/s
IndexOfBenchmark.utf16_Short_String thrpt 5 158301.361 ± 182.738 ops/s
IndexOfBenchmark.utf16_Short_char thrpt 5 84876.919 ± 247.769 ops/s
With StringLatin1 indexofchar intrinsic:
Benchmark Mode Cnt Score Error Units
IndexOfBenchmark.latin1_AVX2_String thrpt 5 113621.676 ± 68.235 ops/s
IndexOfBenchmark.latin1_AVX2_char thrpt 5 177757.909 ± 727.308 ops/s
IndexOfBenchmark.latin1_SSE4_String thrpt 5 180529.049 ± 57.356 ops/s
IndexOfBenchmark.latin1_SSE4_char thrpt 5 235087.776 ± 457.024 ops/s
IndexOfBenchmark.latin1_Short_String thrpt 5 165914.990 ± 329.024 ops/s
IndexOfBenchmark.latin1_Short_char thrpt 5 53989.544 ± 65.393 ops/s
IndexOfBenchmark.utf16_AVX2_String thrpt 5 107632.783 ± 446.272 ops/s
IndexOfBenchmark.utf16_AVX2_char thrpt 5 143131.734 ± 159.944 ops/s
IndexOfBenchmark.utf16_SSE4_String thrpt 5 169882.703 ± 1024.367 ops/s
IndexOfBenchmark.utf16_SSE4_char thrpt 5 175693.972 ± 775.423 ops/s
IndexOfBenchmark.utf16_Short_String thrpt 5 163595.993 ± 225.089 ops/s
IndexOfBenchmark.utf16_Short_char thrpt 5 90126.154 ± 365.642 ops/s
We can see above that indexOf(char) now behaves similarly between StringUTF16 and StringLatin1.
JMH benchmark code:
------------------------------
package org.sample;
import java.util.Random;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;
@State(Scope.Thread)
public class IndexOfBenchmark {
private static final int loops = 100000;
private static final Random rng = new Random(1999);
private static final int pathCnt = 1000;
private static final String [] latn1_short = new String[pathCnt];
private static final String [] latn1_sse4 = new String[pathCnt];
private static final String [] latn1_avx2 = new String[pathCnt];
private static final String [] utf16_short = new String[pathCnt];
private static final String [] utf16_sse4 = new String[pathCnt];
private static final String [] utf16_avx2 = new String[pathCnt];
static {
for (int i = 0; i < pathCnt; i++) {
latn1_short[i] = randomName(false, 15);
latn1_sse4[i] = randomName(false, 16);
latn1_avx2[i] = randomName(false, 32);
utf16_short[i] = randomName(true, 7);
utf16_sse4[i] = randomName(true, 8);
utf16_avx2[i] = randomName(true, 16);
}
}
private static String randomName(boolean isUtf16, int length) {
StringBuilder sb = new StringBuilder(length);
sb.append(isUtf16?'☺':'b');
for (int i = 1; i < length; i++) {
//'a is never present in rnd string
sb.append((char) ('b' + rng.nextInt(25)));
}
return sb.toString();
}
@Benchmark
public static void latin1_Short_char() {
int ret = 0;
for (String what : latn1_short) {
ret += what.indexOf('a');
}
}
@Benchmark
public static void latin1_SSE4_char() {
int ret = 0;
for (String what : latn1_sse4) {
ret += what.indexOf('a');
}
}
@Benchmark
public static void latin1_AVX2_char() {
int ret = 0;
for (String what : latn1_avx2) {
ret += what.indexOf('a');
}
}
@Benchmark
public static int utf16_Short_char() {
int ret = 0;
for (String what : utf16_short) {
ret += what.indexOf('a');
}
return ret;
}
@Benchmark
public static int utf16_SSE4_char() {
int ret = 0;
for (String what : utf16_sse4) {
ret += what.indexOf('a');
}
return ret;
}
@Benchmark
public static int utf16_AVX2_char() {
int ret = 0;
for (String what : utf16_avx2) {
ret += what.indexOf('a');
}
return ret;
}
@Benchmark
public static int latin1_Short_String() {
int ret = 0;
for (String what : latn1_short) {
ret += what.indexOf("a");
}
return ret;
}
@Benchmark
public static int latin1_SSE4_String() {
int ret = 0;
for (String what : latn1_sse4) {
ret += what.indexOf("a");
}
return ret;
}
@Benchmark
public static int latin1_AVX2_String() {
int ret = 0;
for (String what : latn1_avx2) {
ret += what.indexOf("a");
}
return ret;
}
@Benchmark
public static int utf16_Short_String() {
int ret = 0;
for (String what : utf16_short) {
ret += what.indexOf("a");
}
return ret;
}
@Benchmark
public static int utf16_SSE4_String() {
int ret = 0;
for (String what : utf16_sse4) {
ret += what.indexOf("a");
}
return ret;
}
@Benchmark
public static int utf16_AVX2_String() {
int ret = 0;
for (String what : utf16_avx2) {
ret += what.indexOf("a");
}
return ret;
}
}
--
Jason
More information about the core-libs-dev
mailing list