JDK-8173585: Intrinsify StringLatin1.indexOf(char)

Tatton, Jason jptatton at amazon.com
Thu Sep 3 21:28:56 UTC 2020


Hi All,

Please review the following patch:

https://bugs.openjdk.java.net/browse/JDK-8173585
http://cr.openjdk.java.net/~phh/8173585/webrev.00/

This is an implementation of the indexOf(char) intrinsic for StringLatin1 (1 byte encoded Strings). It is provided for x86 and ARM64. The implementation is greatly inspired by the indexOf(char) intrinsic for StringUTF16. To incorporate it I had to make a small change to StringLatin1.java (refactor of functionality to intrisified private method) as well as code for C2.

Submitted to: hotspot-compiler-dev and core-libs-dev as this patch contains a change to hotspot and java/lang/StringLatin1.java

Details of testing: 
==============
I have created a jtreg test “compiler/intrinsics/string/TestStringLatin1IndexOfChar” to cover this new intrinsic. Note that, particularly for the x86 implementation of the intrinsic, the code path taken is dependent upon the length of the input String. Hence the test has been designed to cover all these cases. In summary they are:
• A “short” string of < 16 characters.
• A SIMD String of 16 – 31 characters.
• A AVX2 SIMD String of 32 characters+.

Hardware used for testing: 
------------------------------------
• Intel Xeon CPU E5-2680 (JVM did not recognize this as having AVX2 support)
• Intel i7 processor (with AVX2 support).
• AWS Graviton 2 (ARM 64 processor).

I also ran; ‘run-test-tier1’ and ‘run-test-tier2’ for: x86_64 and aarch64.

Possible future enhancements: 
=========================
For the x86 implementation there may be two further improvements we can make in order to improve performance of both the StringUTF16 and StringLatin1 indexOf(char) intrinsics:
1. Make use of AVX-512 instructions.
2. For “short” Strings (see below), I think it may be possible to modify the existing algorithm to still use SSE SIMD instructions instead of a loop.


JMH Benchmark results:
====================
The benchmarks examine the 3 codepaths for StringLatin1 and StringUTF16. Here are the results for Intel x86 (ARM is similar):

FYI, String lengths in characters (1byte for Latin1, 2bytes for UTF16):
       Latin1  UTF16
Short: 15       7
SSE4:  16       8 
AVX2:  32       16

Without StringLatin1 indexofchar intrinsic:
Benchmark                              Mode  Cnt       Score      Error  Units
IndexOfBenchmark.latin1_AVX2_String   thrpt    5  121781.424 ± 355.085  ops/s
IndexOfBenchmark.latin1_AVX2_char     thrpt    5   46060.612 ± 151.274  ops/s
IndexOfBenchmark.latin1_SSE4_String   thrpt    5  197339.146 ±  90.333  ops/s
IndexOfBenchmark.latin1_SSE4_char     thrpt    5   61401.204 ± 426.761  ops/s
IndexOfBenchmark.latin1_Short_String  thrpt    5  175389.355 ± 294.976  ops/s
IndexOfBenchmark.latin1_Short_char    thrpt    5   60759.868 ± 124.349  ops/s
IndexOfBenchmark.utf16_AVX2_String    thrpt    5  123601.020 ± 111.981  ops/s
IndexOfBenchmark.utf16_AVX2_char      thrpt    5  141116.832 ± 380.489  ops/s
IndexOfBenchmark.utf16_SSE4_String    thrpt    5  178136.762 ± 143.227  ops/s
IndexOfBenchmark.utf16_SSE4_char      thrpt    5  181430.649 ± 120.097  ops/s
IndexOfBenchmark.utf16_Short_String   thrpt    5  158301.361 ± 182.738  ops/s
IndexOfBenchmark.utf16_Short_char     thrpt    5   84876.919 ± 247.769  ops/s

With StringLatin1 indexofchar intrinsic:
Benchmark                              Mode  Cnt       Score      Error  Units
IndexOfBenchmark.latin1_AVX2_String   thrpt    5  113621.676 ±   68.235  ops/s
IndexOfBenchmark.latin1_AVX2_char     thrpt    5  177757.909 ±  727.308  ops/s
IndexOfBenchmark.latin1_SSE4_String   thrpt    5  180529.049 ±   57.356  ops/s
IndexOfBenchmark.latin1_SSE4_char     thrpt    5  235087.776 ±  457.024  ops/s
IndexOfBenchmark.latin1_Short_String  thrpt    5  165914.990 ±  329.024  ops/s
IndexOfBenchmark.latin1_Short_char    thrpt    5   53989.544 ±   65.393  ops/s
IndexOfBenchmark.utf16_AVX2_String    thrpt    5  107632.783 ±  446.272  ops/s
IndexOfBenchmark.utf16_AVX2_char      thrpt    5  143131.734 ±  159.944  ops/s
IndexOfBenchmark.utf16_SSE4_String    thrpt    5  169882.703 ± 1024.367  ops/s
IndexOfBenchmark.utf16_SSE4_char      thrpt    5  175693.972 ±  775.423  ops/s
IndexOfBenchmark.utf16_Short_String   thrpt    5  163595.993 ±  225.089  ops/s
IndexOfBenchmark.utf16_Short_char     thrpt    5   90126.154 ±  365.642  ops/s

We can see above that indexOf(char) now behaves similarly between StringUTF16 and StringLatin1. 

JMH benchmark code:
------------------------------

package org.sample;

import java.util.Random;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;

@State(Scope.Thread)
public class IndexOfBenchmark {
    private static final int loops = 100000;
    private static final Random rng = new Random(1999);
    private static final int pathCnt = 1000;
    private static final String [] latn1_short = new String[pathCnt];
    private static final String [] latn1_sse4  = new String[pathCnt];
    private static final String [] latn1_avx2  = new String[pathCnt];
    private static final String [] utf16_short = new String[pathCnt];
    private static final String [] utf16_sse4  = new String[pathCnt];
    private static final String [] utf16_avx2  = new String[pathCnt];
    static {
        for (int i = 0; i < pathCnt; i++) {
            latn1_short[i] = randomName(false, 15);
            latn1_sse4[i]  = randomName(false, 16);
            latn1_avx2[i]  = randomName(false, 32);
            utf16_short[i] = randomName(true, 7);
            utf16_sse4[i]  = randomName(true, 8);
            utf16_avx2[i]  = randomName(true, 16);
        }
    }

    private static String randomName(boolean isUtf16, int length) {
        StringBuilder sb = new StringBuilder(length);
        sb.append(isUtf16?'☺':'b');
        
        for (int i = 1; i < length; i++) {
            //'a is never present in rnd string
            sb.append((char) ('b' + rng.nextInt(25)));
        }
        return sb.toString();
    }

    @Benchmark
    public static void latin1_Short_char() {
        int ret = 0;
        for (String what : latn1_short) {
            ret += what.indexOf('a');
        }
    }

    @Benchmark
    public static void latin1_SSE4_char() {
        int ret = 0;
        for (String what : latn1_sse4) {
            ret += what.indexOf('a');
        }
    }

    @Benchmark
    public static void latin1_AVX2_char() {
        int ret = 0;
        for (String what : latn1_avx2) {
            ret += what.indexOf('a');
        }
    }

    @Benchmark
    public static int utf16_Short_char() {
        int ret = 0;
        for (String what : utf16_short) {
            ret += what.indexOf('a');
        }
        return ret;
    }

    @Benchmark
    public static int utf16_SSE4_char() {
        int ret = 0;
        for (String what : utf16_sse4) {
            ret += what.indexOf('a');
        }
        return ret;
    }
    
    @Benchmark
    public static int utf16_AVX2_char() {
        int ret = 0;
        for (String what : utf16_avx2) {
            ret += what.indexOf('a');
        }
        return ret;
    }

    @Benchmark
    public static int latin1_Short_String() {
        int ret = 0;
        for (String what : latn1_short) {
            ret += what.indexOf("a");
        }
        return ret;
    }

    @Benchmark
    public static int latin1_SSE4_String() {
        int ret = 0;
        for (String what : latn1_sse4) {
            ret += what.indexOf("a");
        }
        return ret;
    }

    @Benchmark
    public static int latin1_AVX2_String() {
        int ret = 0;
        for (String what : latn1_avx2) {
            ret += what.indexOf("a");
        }
        return ret;
    }

    @Benchmark
    public static int utf16_Short_String() {
        int ret = 0;
        for (String what : utf16_short) {
            ret += what.indexOf("a");
        }
        return ret;
    }

    @Benchmark
    public static int utf16_SSE4_String() {
        int ret = 0;
        for (String what : utf16_sse4) {
            ret += what.indexOf("a");
        }
        return ret;
    }
    
    @Benchmark
    public static int utf16_AVX2_String() {
        int ret = 0;
        for (String what : utf16_avx2) {
            ret += what.indexOf("a");
        }
        return ret;
    }   
}




--
Jason



More information about the core-libs-dev mailing list