String.indexOf(single-char-String)

Sat Nov 27 17:53:57 UTC 2021

StringIndexOfChar showed that the assumption that the char variant 
should be at least as fast as the String variant doesn't always apply. 
The implementation for indexOf(char) seems to have a different code 
emission heuristics than the String variant.
https://github.com/openjdk/jdk/pull/6509#issuecomment-980681069

the following benchmark reproduces the anomaly:

package dev.mbien.jmh;

import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Param;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Warmup;
import org.openjdk.jmh.infra.Blackhole;
import org.openjdk.jmh.results.format.ResultFormatType;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Warmup(iterations = 5, time = 5, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 5, time = 5, timeUnit = TimeUnit.SECONDS)
@Fork(1)
@State(Scope.Benchmark)
public class SSECharAnomalyJMH {

//Benchmark                      (length)  Mode  Cnt   Score Error  Units
//CharVsStringJMH.indexOfChar           0  avgt    5   4.895 ± 0.063  ns/op
//CharVsStringJMH.indexOfString         0  avgt    5   6.131 ± 0.115  ns/op

//CharVsStringJMH.indexOfChar           1  avgt    5  10.071 ± 0.121  
ns/op //xxx
//CharVsStringJMH.indexOfString         1  avgt    5   6.322 ± 0.099  
ns/op //xxx

//CharVsStringJMH.indexOfChar           2  avgt    5   4.577 ± 0.009  ns/op
//CharVsStringJMH.indexOfString         2  avgt    5   5.917 ± 0.019  ns/op

//CharVsStringJMH.indexOfChar           3  avgt    5   7.093 ± 0.040  ns/op
//CharVsStringJMH.indexOfString         3  avgt    5  11.533 ± 0.100  ns/op

// -XX:-UseSSE42Intrinsics
//Benchmark                      (length)  Mode  Cnt   Score Error  Units
//CharVsStringJMH.indexOfChar           0  avgt    5   5.063 ± 0.023  ns/op
//CharVsStringJMH.indexOfString         0  avgt    5   6.605 ± 0.288  ns/op

//CharVsStringJMH.indexOfChar           1  avgt    5   8.768 ± 0.082  
ns/op //compare with SSE version
//CharVsStringJMH.indexOfString         1  avgt    5  10.640 ± 0.133  ns/op

//CharVsStringJMH.indexOfChar           2  avgt    5   9.371 ± 0.021  ns/op
//CharVsStringJMH.indexOfString         2  avgt    5  12.427 ± 0.087  ns/op

//CharVsStringJMH.indexOfChar           3  avgt    5  16.435 ± 0.235  ns/op
//CharVsStringJMH.indexOfString         3  avgt    5  17.566 ± 0.143  ns/op

     @Param({"0", "1", "2", "3"})
     private int length = 0;

     private final String[] str = {
         "abcd",
         "abBaegyswratgd",                                       // 
length < 16
         "abBaegyswratgfed",                                     // 
length = 16
         "ABCDEFGHIJKLMNOPQRSTUVWXYZabcefghijklmnopqrstuvwxyzd", // ~40
     };

     @Benchmark
     public void indexOfChar(Blackhole bh) throws InterruptedException {
         bh.consume(str[length].indexOf('d') != -1);
     }

     @Benchmark
     public void indexOfString(Blackhole bh) throws InterruptedException {
         bh.consume(str[length].indexOf("d") != -1);
     }

     public static void main(String[] args) throws RunnerException {

         Options opt = new OptionsBuilder()
                 .include(SSECharAnomalyJMH.class.getSimpleName())
                 .resultFormat(ResultFormatType.JSON)
.result("results/"+SSECharAnomalyJMH.class.getSimpleName()+".json")
                 .build();

         new Runner(opt).run();

         opt = new OptionsBuilder()
                 .include(SSECharAnomalyJMH.class.getSimpleName())
                 .resultFormat(ResultFormatType.JSON)
                 .jvmArgsAppend("-XX:-UseSSE42Intrinsics")
//                .jvmArgsAppend("-XX:UseAVX=0")
.result("results/"+SSECharAnomalyJMH.class.getSimpleName()+"2.json")
                 .build();

         new Runner(opt).run();
     }

}

On 26.11.21 14:42, Michael Bien wrote:
> added benchmark results for OpenJDK's StringIndexOf benchmark:
> https://github.com/openjdk/jdk/pull/6509#issuecomment-979985594
>
> -michael
>
>
> On 25.11.21 15:05, Michael Bien wrote:
>> Hello again,
>>
>> I was trying to run JDK's benchmarks over night (second attempt 
>> actually) but had some difficulties to get stable results.
>>
>> This makes it difficult to compare the modified version with a 
>> reference. I am not sure what the cause is, I have heard some intel 
>> CPUs can't run avx instructions for a long time without changing 
>> clock - maybe i am hitting this issue?
>> Its not the temperature and i turned boost and HT off + it runs in 
>> headless mode. One run already takes almost 5h and I have to run it 
>> twice - so i can't increase the iterations even more.
>>
>>
>> for example:
>>
>> # Benchmark: 
>> org.openjdk.bench.java.lang.StringIndexOfChar.utf16_mixed_String
>> # Parameters: (loops = 100000, pathCnt = 1000, rngSeed = 1999)
>>
>> # Run progress: 95.35% complete, ETA 00:13:20
>> # Fork: 1 of 1
>> # Warmup Iteration   1: 18592.094 ns/op    <- second fastest run?
>> # Warmup Iteration   2: 20519.413 ns/op
>> # Warmup Iteration   3: 19768.099 ns/op
>> # Warmup Iteration   4: 23093.410 ns/op
>> # Warmup Iteration   5: 29112.909 ns/op
>> # Warmup Iteration   6: 18962.671 ns/op
>> # Warmup Iteration   7: 16721.933 ns/op    <- fastest run?
>> # Warmup Iteration   8: 20267.809 ns/op
>> # Warmup Iteration   9: 23934.031 ns/op
>> # Warmup Iteration  10: 22474.836 ns/op
>> # Warmup Iteration  11: 19583.471 ns/op
>> # Warmup Iteration  12: 19595.319 ns/op
>> # Warmup Iteration  13: 24865.299 ns/op
>> # Warmup Iteration  14: 19581.014 ns/op
>> # Warmup Iteration  15: 19566.849 ns/op
>> # Warmup Iteration  16: 19576.219 ns/op
>> # Warmup Iteration  17: 19574.475 ns/op
>> # Warmup Iteration  18: 19565.854 ns/op
>> # Warmup Iteration  19: 26594.867 ns/op
>> # Warmup Iteration  20: 26532.977 ns/op
>> Iteration   1: 25484.070 ns/op
>> Iteration   2: 19594.206 ns/op
>> Iteration   3: 30327.037 ns/op
>> Iteration   4: 31029.242 ns/op  <- xxx
>> Iteration   5: 19560.472 ns/op
>> Iteration   6: 19611.728 ns/op
>> Iteration   7: 23214.511 ns/op
>> Iteration   8: 28455.757 ns/op
>> Iteration   9: 19787.638 ns/op
>> Iteration  10: 23737.501 ns/op
>> Iteration  11: 25947.249 ns/op
>> Iteration  12: 19768.214 ns/op
>> Iteration  13: 25789.970 ns/op
>> Iteration  14: 20558.622 ns/op
>> Iteration  15: 19611.317 ns/op
>> Iteration  16: 27761.431 ns/op
>> Iteration  17: 19749.799 ns/op
>> Iteration  18: 20862.478 ns/op
>> Iteration  19: 19581.498 ns/op
>> Iteration  20: 28094.839 ns/op
>>
>>
>> latin1_Short_String, latin1_Short_char, latin1_mixed_String, 
>> latin1_mixed_char, utf16_mixed_String and utf16_mixed_char have all 
>> large error bars (all in StringIndexOfChar).
>>
>>
>> best regards,
>>
>> michael
>>
>>
>>
>> On 23.11.21 17:06, Michael Bien wrote:
>>> On 23.11.21 15:57, Roger Riggs wrote:
>>>> Hi Michael,
>>>>
>>>> As you might expect performance of strings is very sensitive and 
>>>> has been tuned extensively over the years many times.
>>>>
>>>> Though this improves the performance for 1 character strings. It 
>>>> will have an impact on *every other* length of string.
>>>> You'll need to show that it does not impact performance of longer 
>>>> strings.
>>>
>>> yes of course. The if (str.length == 1) branch should be dead code 
>>> and eliminated by the JVM for all String constants with non-one 
>>> lengths.
>>>
>>> Looking through the benchmarks in micro/*/java/lang/String*, all 
>>> seem to be using constants as parameter for indexOf(). To try to 
>>> measure the impact of the if branch i would have to write a 
>>> benchmark with a parameter which changes every iteration, right? 
>>> Otherwise the branch will be optimized away by the JIT.
>>>
>>>>
>>>> It may be worth looking further at other ways to achieve the result.
>>>
>>> agreed, I tried the most obvious approach first, but there is a 
>>> chance that the fast path can be put into the intrinsified 
>>> StringLatin1/StringUTF16 code instead.
>>>
>>> -michael
>>>
>>>
>>>>
>>>> Regards, Roger
>>>>
>>>>
>>>> On 11/22/21 3:52 PM, Michael Bien wrote:
>>>>> Hello,
>>>>>
>>>>> I kept forgetting which variants of the String methods perform 
>>>>> better with single-char-Strings and which with char (IDEs had the 
>>>>> tendency to suggest the wrong variant since it changed between JDK 
>>>>> releases). So i wrote JMH benchmarks and noticed that the last 
>>>>> method with a performance difference seems to be String.indexOf() 
>>>>> - all other variants performed equally (unless I overlooked some).
>>>>>
>>>>> this might be fairly easy to fix:
>>>>>
>>>>> https://github.com/openjdk/jdk/pull/6509
>>>>>
>>>>> (side effect: contains("c") is also faster)
>>>>>
>>>>> I haven't looked into the intrinsified code of StringLatin1 and 
>>>>> StringUTF16 to check if it could be fixed there (mostly because i 
>>>>> actually don't know how the JVM assembles those intrinsics). It 
>>>>> might be possible to improve this for short Strings in general, 
>>>>> not just for chars, dependent on why the intrinsified version is 
>>>>> actually slower for single-char-Strings. I opted for the trivial 
>>>>> fix in java code.
>>>>>
>>>>> best regards,
>>>>>
>>>>> michael
>>>>>
>>>>
>>>
>>
>