RFR: 8268229: Aarch64: Use Neon in intrinsics for String.equals [v3]

Andrew Haley aph at openjdk.java.net
Mon Jul 5 16:34:54 UTC 2021


On Fri, 2 Jul 2021 09:54:32 GMT, Wang Huang <whuang at openjdk.org> wrote:

>> Dear all, 
>>      Could you do me a favor and review this patch? It improves the performance of the `String.equals` intrinsic on the AArch64 Neon backend.
>>      We measured the performance with this JMH benchmark:
>>  
>> 
>>    ```java
>>     package com.huawei.string;
>>     import java.util.*;
>>     import java.util.concurrent.TimeUnit;
>>     
>>     import org.openjdk.jmh.annotations.CompilerControl;
>>     import org.openjdk.jmh.annotations.Benchmark;
>>     import org.openjdk.jmh.annotations.Level;
>>     import org.openjdk.jmh.annotations.OutputTimeUnit;
>>     import org.openjdk.jmh.annotations.Param;
>>     import org.openjdk.jmh.annotations.Scope;
>>     import org.openjdk.jmh.annotations.Setup;
>>     import org.openjdk.jmh.annotations.State;
>>     import org.openjdk.jmh.annotations.Fork;
>>     import org.openjdk.jmh.infra.Blackhole;
>>     
>>     @State(Scope.Thread)
>>     @OutputTimeUnit(TimeUnit.MILLISECONDS)
>>     public class StringEqual {
>>         @Param({"8", "64", "4096"})
>>         int size;
>>     
>>         String str1;
>>         String str2;
>>     
>>         @Setup(Level.Trial)
>>         public void init() {
>>             str1 = newString(size, 'c', '1');
>>             str2 = newString(size, 'c', '2');
>>         }
>>     
>>         public String newString(int length, char charToFill, char lastChar) {
>>             if (length > 0) {
>>                 char[] array = new char[length];
>>                 Arrays.fill(array, charToFill);
>>                 array[length - 1] = lastChar;
>>                 return new String(array);
>>             }
>>             return "";
>>         }
>>     
>>         @Benchmark
>>         @CompilerControl(CompilerControl.Mode.DONT_INLINE)
>>         public boolean EqualString() {
>>             return str1.equals(str2);
>>         }
>>     }
>> 
>>    ```
>> The results are as follows (Linux AArch64, 128 cores):
>> 
>> Benchmark                       | (size) | Mode  | Cnt | Score      | Error      | Units
>> --------------------------------|--------|-------|-----|------------|------------|-------
>> StringEqual.EqualString         |      8 | thrpt |  10 | 123971.994 | ± 1462.131 | ops/ms
>> StringEqual.EqualString         |     64 | thrpt |  10 |  56009.960 |  ± 999.734 | ops/ms
>> StringEqual.EqualString         |   4096 | thrpt |  10 |   1943.852 |    ± 8.159 | ops/ms
>> StringEqual.EqualStringWithNEON |      8 | thrpt |  10 | 120319.271 | ± 1392.185 | ops/ms
>> StringEqual.EqualStringWithNEON |     64 | thrpt |  10 |  72914.767 | ± 1814.173 | ops/ms
>> StringEqual.EqualStringWithNEON |   4096 | thrpt |  10 |   2579.155 |   ± 15.589 | ops/ms
>> 
>> Yours, 
>> WANG Huang
>
> Wang Huang has updated the pull request incrementally with one additional commit since the last revision:
> 
>   unroll when small string sizes
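
For reference, the relative throughput of the Neon version in the quoted table can be worked out with a few lines of Java. This is only a sketch: `NeonSpeedup` and `ratio` are hypothetical names, and the scores are copied from the table above (`thrpt` mode, ops/ms, so higher is better).

```java
// Hypothetical helper: relative throughput of the Neon build vs. the baseline,
// using the thrpt scores (ops/ms, higher is better) from the quoted JMH table.
public class NeonSpeedup {
    static double ratio(double neonScore, double baselineScore) {
        return neonScore / baselineScore;
    }

    public static void main(String[] args) {
        int[] sizes = {8, 64, 4096};
        double[] baseline = {123971.994, 56009.960, 1943.852};
        double[] neon     = {120319.271, 72914.767, 2579.155};
        for (int i = 0; i < sizes.length; i++) {
            double r = ratio(neon[i], baseline[i]);
            // Print the ratio and the equivalent percentage change.
            System.out.printf("size %4d: %.2fx (%+.1f%%)%n", sizes[i], r, (r - 1) * 100);
        }
    }
}
```

By this arithmetic the Neon version gains roughly 30% at size 64 and 33% at size 4096, and loses about 3% at size 8.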

Here are some Graviton2 timings, four versions:


+UseSimpleStringEquals:
Benchmark           (size)  Mode  Cnt   Score    Error  Units
StringEquals.equal       1  avgt    5   2.813 ±  0.001  us/op
StringEquals.equal       3  avgt    5   2.821 ±  0.001  us/op
StringEquals.equal       4  avgt    5   2.812 ±  0.001  us/op
StringEquals.equal       6  avgt    5   2.821 ±  0.002  us/op
StringEquals.equal       8  avgt    5   2.420 ±  0.001  us/op
StringEquals.equal      11  avgt    5   2.420 ±  0.001  us/op
StringEquals.equal      16  avgt    5   2.421 ±  0.002  us/op
StringEquals.equal      21  avgt    5   3.291 ±  0.003  us/op
StringEquals.equal      32  avgt    5   4.412 ±  0.001  us/op
StringEquals.equal      45  avgt    5   5.623 ±  0.001  us/op
StringEquals.equal      64  avgt    5   7.225 ±  0.010  us/op
StringEquals.equal      91  avgt    5  10.426 ±  0.002  us/op
StringEquals.equal     128  avgt    5  13.628 ±  0.001  us/op
StringEquals.equal     181  avgt    5  19.231 ±  0.002  us/op
StringEquals.equal     256  avgt    5  26.436 ±  0.009  us/op


Your commit 4f02c00f55f1dc37762a04b7e30534ee27a7f20a:

Benchmark           (size)  Mode  Cnt   Score    Error  Units
StringEquals.equal       1  avgt    5   2.812 ±  0.001  us/op
StringEquals.equal       3  avgt    5   3.212 ±  0.001  us/op
StringEquals.equal       4  avgt    5   2.812 ±  0.001  us/op
StringEquals.equal       6  avgt    5   3.212 ±  0.001  us/op
StringEquals.equal       8  avgt    5   3.612 ±  0.001  us/op
StringEquals.equal      11  avgt    5   4.413 ±  0.001  us/op
StringEquals.equal      16  avgt    5   4.813 ±  0.001  us/op
StringEquals.equal      21  avgt    5   5.613 ±  0.001  us/op
StringEquals.equal      32  avgt    5   6.418 ±  0.001  us/op
StringEquals.equal      45  avgt    5   7.614 ±  0.001  us/op
StringEquals.equal      64  avgt    5   6.929 ±  0.081  us/op
StringEquals.equal      91  avgt    5   9.617 ±  0.001  us/op
StringEquals.equal     128  avgt    5  11.880 ±  0.152  us/op
StringEquals.equal     181  avgt    5  16.576 ±  0.002  us/op
StringEquals.equal     256  avgt    5  21.869 ±  0.108  us/op


My hack using ldp:

Benchmark           (size)  Mode  Cnt   Score    Error  Units
StringEquals.equal       1  avgt    5   2.414 ±  0.001  us/op
StringEquals.equal       3  avgt    5   2.814 ±  0.001  us/op
StringEquals.equal       4  avgt    5   2.414 ±  0.001  us/op
StringEquals.equal       6  avgt    5   2.814 ±  0.001  us/op
StringEquals.equal       8  avgt    5   3.214 ±  0.001  us/op
StringEquals.equal      11  avgt    5   4.015 ±  0.001  us/op
StringEquals.equal      16  avgt    5   4.419 ±  0.001  us/op
StringEquals.equal      21  avgt    5   5.216 ±  0.001  us/op
StringEquals.equal      32  avgt    5   6.017 ±  0.001  us/op
StringEquals.equal      45  avgt    5   7.218 ±  0.001  us/op
StringEquals.equal      64  avgt    5   6.015 ±  0.001  us/op
StringEquals.equal      91  avgt    5   8.967 ±  0.015  us/op
StringEquals.equal     128  avgt    5   9.217 ±  0.001  us/op
StringEquals.equal     181  avgt    5  14.096 ±  0.011  us/op
StringEquals.equal     256  avgt    5  15.462 ±  0.259  us/op
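
The idea behind an ldp-style loop, which loads two 64-bit words from each array per iteration and tests them together, can be sketched at the Java level. This is an analogue under stated assumptions, not the HotSpot intrinsic or the hack measured above: `PairwiseEquals` and `equalBytes` are hypothetical names, the input is a `byte[]` like a compact String's internal value array, and a `VarHandle` byte-array view stands in for hand-written AArch64 loads.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

// Hypothetical Java analogue of a wide-load equality loop; not HotSpot code.
public final class PairwiseEquals {
    // View a byte[] as 64-bit words at arbitrary (possibly misaligned) byte offsets.
    private static final VarHandle LONGS =
            MethodHandles.byteArrayViewVarHandle(long[].class, ByteOrder.LITTLE_ENDIAN);

    public static boolean equalBytes(byte[] a, byte[] b) {
        if (a.length != b.length) return false;
        int len = a.length;
        int i = 0;
        // Main loop: two 8-byte loads per array per iteration, loosely mirroring an ldp pair.
        for (; i + 16 <= len; i += 16) {
            long x = (long) LONGS.get(a, i) ^ (long) LONGS.get(b, i);
            long y = (long) LONGS.get(a, i + 8) ^ (long) LONGS.get(b, i + 8);
            if ((x | y) != 0) return false; // any differing bit in either word
        }
        // At most one remaining 8-byte word.
        if (i + 8 <= len) {
            if ((long) LONGS.get(a, i) != (long) LONGS.get(b, i)) return false;
            i += 8;
        }
        // Byte-wise tail of fewer than 8 bytes.
        for (; i < len; i++) {
            if (a[i] != b[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] s1 = "cccccccccccccccccccc1".getBytes(java.nio.charset.StandardCharsets.ISO_8859_1);
        byte[] s2 = "cccccccccccccccccccc2".getBytes(java.nio.charset.StandardCharsets.ISO_8859_1);
        System.out.println(equalBytes(s1, s1.clone()) + " " + equalBytes(s1, s2));
    }
}
```

Like the JMH cases, strings that differ only in their last character force a scan of the whole array, so the cost is dominated by how many bytes each iteration compares.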


Today's -UseSimpleStringEquals:

Benchmark           (size)  Mode  Cnt   Score    Error  Units
StringEquals.equal       1  avgt    5   2.812 ±  0.001  us/op
StringEquals.equal       3  avgt    5   3.212 ±  0.001  us/op
StringEquals.equal       4  avgt    5   2.812 ±  0.001  us/op
StringEquals.equal       6  avgt    5   3.212 ±  0.001  us/op
StringEquals.equal       8  avgt    5   2.813 ±  0.002  us/op
StringEquals.equal      11  avgt    5   2.813 ±  0.001  us/op
StringEquals.equal      16  avgt    5   2.813 ±  0.001  us/op
StringEquals.equal      21  avgt    5   3.615 ±  0.001  us/op
StringEquals.equal      32  avgt    5   4.414 ±  0.001  us/op
StringEquals.equal      45  avgt    5   7.080 ±  0.027  us/op
StringEquals.equal      64  avgt    5   7.613 ±  0.001  us/op
StringEquals.equal      91  avgt    5  10.037 ±  0.005  us/op
StringEquals.equal     128  avgt    5  10.419 ±  0.001  us/op
StringEquals.equal     181  avgt    5  14.896 ±  0.004  us/op
StringEquals.equal     256  avgt    5  16.823 ±  0.001  us/op
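
To compare the four versions at a glance, the times can be normalised to the +UseSimpleStringEquals run at each size. This is a sketch: `GravitonCompare` and `relative` are hypothetical names, only four sizes are included, and the numbers are copied from the tables above (`avgt` mode, so lower is better).

```java
// Hypothetical helper: Graviton2 avgt times (lower is better) copied from the
// tables above, normalised to the +UseSimpleStringEquals run at each size.
public class GravitonCompare {
    static double relative(double time, double baseline) {
        return time / baseline;
    }

    public static void main(String[] args) {
        int[] sizes = {8, 64, 128, 256};
        String[] names = {"commit 4f02c00", "ldp hack", "today -UseSimpleStringEquals"};
        double[] simple = {2.420, 7.225, 13.628, 26.436};
        double[][] times = {
            {3.612, 6.929, 11.880, 21.869},
            {3.214, 6.015, 9.217, 15.462},
            {2.813, 7.613, 10.419, 16.823},
        };
        for (int v = 0; v < names.length; v++) {
            StringBuilder line = new StringBuilder(String.format("%-30s", names[v]));
            for (int s = 0; s < sizes.length; s++) {
                // Values below 1.00 mean faster than +UseSimpleStringEquals.
                line.append(String.format("  %d: %.2f", sizes[s], relative(times[v][s], simple[s])));
            }
            System.out.println(line);
        }
    }
}
```

At size 256 the ldp hack takes about 58% of the +UseSimpleStringEquals time, while at size 8 all three alternatives are slower than the simple version.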

-------------

PR: https://git.openjdk.java.net/jdk/pull/4423

