RFR: 8268229: Aarch64: Use Neon in intrinsics for String.equals [v3]

Wang Huang whuang at openjdk.java.net
Fri Jul 2 09:59:01 UTC 2021


On Fri, 2 Jul 2021 09:54:32 GMT, Wang Huang <whuang at openjdk.org> wrote:

>> Dear all, 
>>      Could you do me a favor and review this patch? It improves the performance of the `String.equals` intrinsic on the Neon backend of AArch64.
>>      We measured the performance with this JMH benchmark:
>>  
>> 
>>    ```java
>>     package com.huawei.string;
>>     import java.util.*;
>>     import java.util.concurrent.TimeUnit;
>>     
>>     import org.openjdk.jmh.annotations.CompilerControl;
>>     import org.openjdk.jmh.annotations.Benchmark;
>>     import org.openjdk.jmh.annotations.Level;
>>     import org.openjdk.jmh.annotations.OutputTimeUnit;
>>     import org.openjdk.jmh.annotations.Param;
>>     import org.openjdk.jmh.annotations.Scope;
>>     import org.openjdk.jmh.annotations.Setup;
>>     import org.openjdk.jmh.annotations.State;
>>     import org.openjdk.jmh.annotations.Fork;
>>     import org.openjdk.jmh.infra.Blackhole;
>>     
>>     @State(Scope.Thread)
>>     @OutputTimeUnit(TimeUnit.MILLISECONDS)
>>     public class StringEqual {
>>         @Param({"8", "64", "4096"})
>>         int size;
>>     
>>         String str1;
>>         String str2;
>>     
>>         @Setup(Level.Trial)
>>         public void init() {
>>             str1 = newString(size, 'c', '1');
>>             str2 = newString(size, 'c', '2');
>>         }
>>     
>>         public String newString(int length, char charToFill, char lastChar) {
>>             if (length > 0) {
>>                 char[] array = new char[length];
>>                 Arrays.fill(array, charToFill);
>>                 array[length - 1] = lastChar;
>>                 return new String(array);
>>             }
>>             return "";
>>         }
>>     
>>         @Benchmark
>>         @CompilerControl(CompilerControl.Mode.DONT_INLINE)
>>         public boolean EqualString() {
>>             return str1.equals(str2);
>>         }
>>     }
>> 
>>    ```
>> The results are as follows (Linux aarch64 with 128 cores):
>> 
>> Benchmark | (size) | Mode | Cnt | Score | Error | Units
>> ----------|--------|------|-----|-------|-------|------
>> StringEqual.EqualString | 8 | thrpt | 10 | 123971.994 | ± 1462.131 | ops/ms
>> StringEqual.EqualString | 64 | thrpt | 10 | 56009.960 | ± 999.734 | ops/ms
>> StringEqual.EqualString | 4096 | thrpt | 10 | 1943.852 | ± 8.159 | ops/ms
>> StringEqual.EqualStringWithNEON | 8 | thrpt | 10 | 120319.271 | ± 1392.185 | ops/ms
>> StringEqual.EqualStringWithNEON | 64 | thrpt | 10 | 72914.767 | ± 1814.173 | ops/ms
>> StringEqual.EqualStringWithNEON | 4096 | thrpt | 10 | 2579.155 | ± 15.589 | ops/ms
>> 
>> Yours, 
>> WANG Huang
>
> Wang Huang has updated the pull request incrementally with one additional commit since the last revision:
> 
>   unroll when small string sizes

> _Mailing list message from [Andrew Haley](mailto:aph at redhat.com) on [hotspot-dev](mailto:hotspot-dev at mail.openjdk.java.net):_
> 
> I had to make some changes to the benchmark to get accurate timing, because
> it is swamped by JMH overhead for very small strings.
> 
> It should be clear from my patch what I did. The most important part is
> to run the test code in a loop, or you won't see small effects. We're
> trying to measure something that only takes a few nanoseconds.
> 
> This is what I see, Apple M1, two equal strings:
> 
> Old:
> 
> StringEquals.equal 8 avgt 5 0.948 ± 0.001 us/op
> StringEquals.equal 11 avgt 5 0.948 ± 0.004 us/op
> StringEquals.equal 16 avgt 5 0.948 ± 0.001 us/op
> StringEquals.equal 22 avgt 5 1.260 ± 0.002 us/op
> StringEquals.equal 32 avgt 5 1.886 ± 0.001 us/op
> StringEquals.equal 45 avgt 5 2.514 ± 0.001 us/op
> StringEquals.equal 64 avgt 5 3.141 ± 0.003 us/op
> StringEquals.equal 91 avgt 5 4.395 ± 0.002 us/op
> StringEquals.equal 121 avgt 5 5.653 ± 0.014 us/op
> StringEquals.equal 181 avgt 5 8.011 ± 0.010 us/op
> StringEquals.equal 256 avgt 5 11.433 ± 0.014 us/op
> StringEquals.equal 512 avgt 5 23.005 ± 0.124 us/op
> StringEquals.equal 1024 avgt 5 49.185 ± 0.032 us/op
> 
> Your patch:
> 
> Benchmark (size) Mode Cnt Score Error Units
> StringEquals.equal 8 avgt 5 1.574 ± 0.001 us/op
> StringEquals.equal 11 avgt 5 1.734 ± 0.004 us/op
> StringEquals.equal 16 avgt 5 1.888 ± 0.002 us/op
> StringEquals.equal 22 avgt 5 1.892 ± 0.003 us/op
> StringEquals.equal 32 avgt 5 2.517 ± 0.003 us/op
> StringEquals.equal 45 avgt 5 2.988 ± 0.002 us/op
> StringEquals.equal 64 avgt 5 2.517 ± 0.003 us/op
> StringEquals.equal 91 avgt 5 8.659 ± 0.007 us/op
> StringEquals.equal 121 avgt 5 5.649 ± 0.007 us/op
> StringEquals.equal 181 avgt 5 6.050 ± 0.009 us/op
> StringEquals.equal 256 avgt 5 7.088 ± 0.016 us/op
> StringEquals.equal 512 avgt 5 14.163 ± 0.018 us/op
> StringEquals.equal 1024 avgt 5 29.998 ± 0.052 us/op
> 
> As you can see, we're looking at regressions all the way up to size=45,
> with something very odd happening at size=91. Finally the vectorized
> code starts to pull ahead at size=181.
> 
> A few things:
> 
> You should never be executing the TAIL unless the string is really
> short. Just do one pair of unaligned loads at the end to finish.
> 
> Please don't use aliases for rscratch1 and rscratch2. Calling them tmp1
> and tmp2 doesn't help the reader.
> 
> So: please make sure the smaller strings are at least as good as
> they are now. Remember strings are usually short, so we can tolerate
> no regressions with the smaller sizes.
> 
> I don't think that Neon does any good here. This is what I get by rewriting
> (just) the stub with scalar registers, in the attached patch:
> 
> Benchmark (size) Mode Cnt Score Error Units
> StringEquals.equal 8 avgt 5 1.574 ± 0.004 us/op
> StringEquals.equal 11 avgt 5 1.734 ± 0.003 us/op
> StringEquals.equal 16 avgt 5 1.888 ± 0.002 us/op
> StringEquals.equal 22 avgt 5 1.891 ± 0.003 us/op
> StringEquals.equal 32 avgt 5 2.517 ± 0.001 us/op
> StringEquals.equal 45 avgt 5 2.988 ± 0.002 us/op
> StringEquals.equal 64 avgt 5 2.595 ± 0.004 us/op
> StringEquals.equal 91 avgt 5 4.083 ± 0.006 us/op
> StringEquals.equal 121 avgt 5 5.432 ± 0.006 us/op
> StringEquals.equal 181 avgt 5 6.292 ± 0.009 us/op
> StringEquals.equal 256 avgt 5 7.232 ± 0.008 us/op
> StringEquals.equal 512 avgt 5 13.304 ± 0.012 us/op
> StringEquals.equal 1024 avgt 5 25.537 ± 0.012 us/op
> 
> I use an editor with automatic indentation, as do many people, so
> I inserted brackets in the right places in the assembly code.
> 
> --
> Andrew Haley (he/him)
> Java Platform Lead Engineer
> Red Hat UK Ltd. <https://www.redhat.com>
> https://keybase.io/andrewhaley
> EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671
> -------------- next part --------------
> A non-text attachment was scrubbed...
> Name: 8268229.patch
> Type: text/x-patch
> Size: 12464 bytes
> Desc: not available
> URL: <https://mail.openjdk.java.net/pipermail/hotspot-dev/attachments/20210629/61ccd20c/8268229-0001.patch>
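
The advice above about finishing with "one pair of unaligned loads" instead of a byte-by-byte tail can be illustrated in plain Java. This is only a sketch of the idea, not the AArch64 stub from the patch: the class `TailCompare` and the method `equalsWords` are hypothetical names, and the 8-byte word size is an assumption chosen for illustration.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

public class TailCompare {
    // Views a byte[] as 8-byte words; plain get() allows unaligned indices.
    private static final VarHandle LONG =
        MethodHandles.byteArrayViewVarHandle(long[].class, ByteOrder.nativeOrder());

    // Hypothetical helper: compares two byte arrays one 8-byte word at a time.
    static boolean equalsWords(byte[] a, byte[] b) {
        if (a.length != b.length) {
            return false;
        }
        int n = a.length;
        if (n < 8) {
            // Really short input: the only case that takes the byte-wise tail.
            for (int i = 0; i < n; i++) {
                if (a[i] != b[i]) {
                    return false;
                }
            }
            return true;
        }
        // Main loop over whole 8-byte words.
        for (int i = 0; i + 8 <= n; i += 8) {
            if ((long) LONG.get(a, i) != (long) LONG.get(b, i)) {
                return false;
            }
        }
        // Finish with one pair of loads covering the last 8 bytes. They may
        // overlap bytes that were already compared, which is harmless.
        return (long) LONG.get(a, n - 8) == (long) LONG.get(b, n - 8);
    }

    public static void main(String[] args) {
        byte[] x = new byte[13];
        byte[] y = new byte[13];
        y[12] = 1; // differ only in the last byte, inside the overlapping load
        System.out.println(equalsWords(x, x.clone()));
        System.out.println(equalsWords(x, y));
    }
}
```

The overlapping final load replaces up to seven single-byte comparisons with one word comparison, so only inputs shorter than one word ever reach the byte loop.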

@theRealAph Thank you for your suggestion. The JMH benchmark I used was not accurate; that was my mistake. I changed my code and re-tested with your benchmark:

Before opt:
Benchmark | (size) | Mode | Cnt | Score | Error | Units
-------------------|------|-----|-----|-------|---------|-----
StringEquals.equal | 8 | avgt | 5 | 2.334 | ± 0.012 | us/op
StringEquals.equal | 11 | avgt | 5 | 2.335 | ± 0.012 | us/op
StringEquals.equal | 16 | avgt | 5 | 2.334 | ± 0.011 | us/op
StringEquals.equal | 22 | avgt | 5 | 3.414 | ± 0.422 | us/op
StringEquals.equal | 32 | avgt | 5 | 3.890 | ± 0.004 | us/op
StringEquals.equal | 45 | avgt | 5 | 5.610 | ± 0.023 | us/op
StringEquals.equal | 64 | avgt | 5 | 7.215 | ± 0.009 | us/op
StringEquals.equal | 91 | avgt | 5 | 12.305 | ± 1.716 | us/op
StringEquals.equal | 121 | avgt | 5 | 14.891 | ± 0.085 | us/op
StringEquals.equal | 181 | avgt | 5 | 21.502 | ± 0.050 | us/op
StringEquals.equal | 256 | avgt | 5 | 29.968 | ± 0.155 | us/op
StringEquals.equal | 512 | avgt | 5 | 59.414 | ± 2.341 | us/op
StringEquals.equal | 1024 | avgt | 5 | 118.365 | ± 20.794 | us/op

After opt:
Benchmark | (size) | Mode | Cnt | Score | Error | Units
-------------------|------|-----|-----|------|-------|------
StringEquals.equal | 8 | avgt | 5 | 2.333 | ± 0.003 | us/op
StringEquals.equal | 11 | avgt | 5 | 2.333 | ± 0.001 | us/op
StringEquals.equal | 16 | avgt | 5 | 2.332 | ± 0.002 | us/op
StringEquals.equal | 22 | avgt | 5 | 3.265 | ± 0.404 | us/op
StringEquals.equal | 32 | avgt | 5 | 3.875 | ± 0.002 | us/op
StringEquals.equal | 45 | avgt | 5 | 5.793 | ± 0.331 | us/op
StringEquals.equal | 64 | avgt | 5 | 6.730 | ± 0.054 | us/op
StringEquals.equal | 91 | avgt | 5 | 8.611 | ± 0.075 | us/op
StringEquals.equal | 121 | avgt | 5 | 10.041 | ± 0.042 | us/op
StringEquals.equal | 181 | avgt | 5 | 13.968 | ± 0.653 | us/op
StringEquals.equal | 256 | avgt | 5 | 19.199 | ± 1.227 | us/op
StringEquals.equal | 512 | avgt | 5 | 39.508 | ± 1.784 | us/op
StringEquals.equal | 1024 | avgt | 5 | 77.883 | ± 1.290 | us/op
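
To read the two tables side by side, the speedup for a given size is the "before" time divided by the "after" time. The following is a small sketch of that arithmetic for a few sizes taken from the tables above; the class name `Speedup` and its method are hypothetical, and the ratios ignore the reported error bars (which are large for some rows, for example ± 20.794 at size 1024 before the change).

```java
public class Speedup {
    // Ratio of average time before the change to average time after it.
    // Values above 1.0 mean the patched code is faster.
    static double speedup(double beforeUsPerOp, double afterUsPerOp) {
        return beforeUsPerOp / afterUsPerOp;
    }

    public static void main(String[] args) {
        // Scores (us/op) copied from the "Before opt" and "After opt" tables.
        int[] sizes = {8, 64, 256, 1024};
        double[] before = {2.334, 7.215, 29.968, 118.365};
        double[] after = {2.333, 6.730, 19.199, 77.883};

        for (int i = 0; i < sizes.length; i++) {
            System.out.printf("size=%d speedup=%.2fx%n",
                sizes[i], speedup(before[i], after[i]));
        }
    }
}
```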

-------------

PR: https://git.openjdk.java.net/jdk/pull/4423

