RFR: 8268229: Aarch64: Use Neon in intrinsics for String.equals [v3]

Andrew Haley aph at openjdk.java.net
Mon Jul 5 16:50:50 UTC 2021


On Fri, 2 Jul 2021 09:54:32 GMT, Wang Huang <whuang at openjdk.org> wrote:

>> Dear all, 
>>      Could you do me a favor and review this patch? It improves the performance of the `String.equals` intrinsic on the Neon backend of AArch64.
>>      We measured the performance with this JMH benchmark:
>>  
>> 
>>    ```java
>>     package com.huawei.string;
>>     import java.util.*;
>>     import java.util.concurrent.TimeUnit;
>>     
>>     import org.openjdk.jmh.annotations.CompilerControl;
>>     import org.openjdk.jmh.annotations.Benchmark;
>>     import org.openjdk.jmh.annotations.Level;
>>     import org.openjdk.jmh.annotations.OutputTimeUnit;
>>     import org.openjdk.jmh.annotations.Param;
>>     import org.openjdk.jmh.annotations.Scope;
>>     import org.openjdk.jmh.annotations.Setup;
>>     import org.openjdk.jmh.annotations.State;
>>     import org.openjdk.jmh.annotations.Fork;
>>     import org.openjdk.jmh.infra.Blackhole;
>>     
>>     @State(Scope.Thread)
>>     @OutputTimeUnit(TimeUnit.MILLISECONDS)
>>     public class StringEqual {
>>         @Param({"8", "64", "4096"})
>>         int size;
>>     
>>         String str1;
>>         String str2;
>>     
>>         @Setup(Level.Trial)
>>         public void init() {
>>             str1 = newString(size, 'c', '1');
>>             str2 = newString(size, 'c', '2');
>>         }
>>     
>>         public String newString(int length, char charToFill, char lastChar) {
>>             if (length > 0) {
>>                 char[] array = new char[length];
>>                 Arrays.fill(array, charToFill);
>>                 array[length - 1] = lastChar;
>>                 return new String(array);
>>             }
>>             return "";
>>         }
>>     
>>         @Benchmark
>>         @CompilerControl(CompilerControl.Mode.DONT_INLINE)
>>         public boolean EqualString() {
>>             return str1.equals(str2);
>>         }
>>     }
>> 
>>    ```
>> The results are as follows (Linux aarch64 with 128 cores):
>> 
>> Benchmark                       | (size) | Mode  | Cnt |      Score |      Error | Units
>> --------------------------------|--------|-------|-----|------------|------------|-------
>> StringEqual.EqualString         |      8 | thrpt |  10 | 123971.994 | ± 1462.131 | ops/ms
>> StringEqual.EqualString         |     64 | thrpt |  10 |  56009.960 | ±  999.734 | ops/ms
>> StringEqual.EqualString         |   4096 | thrpt |  10 |   1943.852 | ±    8.159 | ops/ms
>> StringEqual.EqualStringWithNEON |      8 | thrpt |  10 | 120319.271 | ± 1392.185 | ops/ms
>> StringEqual.EqualStringWithNEON |     64 | thrpt |  10 |  72914.767 | ± 1814.173 | ops/ms
>> StringEqual.EqualStringWithNEON |   4096 | thrpt |  10 |   2579.155 | ±   15.589 | ops/ms
>> 
>> Yours, 
>> WANG Huang
>
> Wang Huang has updated the pull request incrementally with one additional commit since the last revision:
> 
>   unroll when small string sizes

I'm still seeing a slight advantage for `ldp` on Graviton 2:


Benchmark           (size)  Mode  Cnt   Score   Error  Units
StringEquals.equal     256  avgt    5  15.592 ± 0.080  us/op
StringEquals.equal     512  avgt    5  28.467 ± 0.245  us/op
StringEquals.equal    1024  avgt    5  53.883 ± 0.272  us/op


Versus the latest Neon version:

Benchmark           (size)  Mode  Cnt   Score   Error  Units
StringEquals.equal     256  avgt    5  16.848 ± 0.158  us/op
StringEquals.equal     512  avgt    5  29.640 ± 0.024  us/op
StringEquals.equal    1024  avgt    5  55.257 ± 0.050  us/op
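
To put a number on "slight", the relative gap can be computed from the two tables above. The following is only a sketch: the class and method names are hypothetical and not part of the patch or the benchmark, and the scores are copied from the tables (avgt mode, so lower is better).

```java
import java.util.Locale;

public class LdpVsNeon {
    // Percentage by which the Neon time exceeds the ldp time.
    // Both inputs are average times in us/op, so a positive result
    // means the Neon version is slower.
    static double slowdownPercent(double ldpMicros, double neonMicros) {
        return (neonMicros / ldpMicros - 1.0) * 100.0;
    }

    public static void main(String[] args) {
        int[] sizes = {256, 512, 1024};
        // Scores copied from the two result tables above.
        double[] ldp = {15.592, 28.467, 53.883};
        double[] neon = {16.848, 29.640, 55.257};
        for (int i = 0; i < sizes.length; i++) {
            System.out.printf(Locale.ROOT,
                    "size %4d: Neon is %.1f%% slower than ldp%n",
                    sizes[i], slowdownPercent(ldp[i], neon[i]));
        }
    }
}
```

On these figures the gap is roughly 8% at size 256, 4% at 512 and 2.5% at 1024, so it narrows as the strings get longer.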

-------------

PR: https://git.openjdk.java.net/jdk/pull/4423

