RFR: 8268229: Aarch64: Use Neon in intrinsics for String.equals [v3]
Wang Huang
whuang at openjdk.java.net
Fri Jul 2 09:59:01 UTC 2021
On Fri, 2 Jul 2021 09:54:32 GMT, Wang Huang <whuang at openjdk.org> wrote:
>> Dear all,
>> Could you please review this patch? It improves the performance of the `String.equals` intrinsic on AArch64 by using Neon.
>> We measured the performance with this JMH benchmark:
>>
>>
>> ```java
>> package com.huawei.string;
>> import java.util.*;
>> import java.util.concurrent.TimeUnit;
>>
>> import org.openjdk.jmh.annotations.CompilerControl;
>> import org.openjdk.jmh.annotations.Benchmark;
>> import org.openjdk.jmh.annotations.Level;
>> import org.openjdk.jmh.annotations.OutputTimeUnit;
>> import org.openjdk.jmh.annotations.Param;
>> import org.openjdk.jmh.annotations.Scope;
>> import org.openjdk.jmh.annotations.Setup;
>> import org.openjdk.jmh.annotations.State;
>> import org.openjdk.jmh.annotations.Fork;
>> import org.openjdk.jmh.infra.Blackhole;
>>
>> @State(Scope.Thread)
>> @OutputTimeUnit(TimeUnit.MILLISECONDS)
>> public class StringEqual {
>>     @Param({"8", "64", "4096"})
>>     int size;
>>
>>     String str1;
>>     String str2;
>>
>>     @Setup(Level.Trial)
>>     public void init() {
>>         str1 = newString(size, 'c', '1');
>>         str2 = newString(size, 'c', '2');
>>     }
>>
>>     public String newString(int length, char charToFill, char lastChar) {
>>         if (length > 0) {
>>             char[] array = new char[length];
>>             Arrays.fill(array, charToFill);
>>             array[length - 1] = lastChar;
>>             return new String(array);
>>         }
>>         return "";
>>     }
>>
>>     @Benchmark
>>     @CompilerControl(CompilerControl.Mode.DONT_INLINE)
>>     public boolean EqualString() {
>>         return str1.equals(str2);
>>     }
>> }
>>
>> ```
>> The results are listed below (Linux aarch64, 128 cores):
>>
>> Benchmark | (size) | Mode | Cnt | Score | Error | Units
>> ----------------------------------|-------|---------|-------|------------|------------|----------
>> StringEqual.EqualString | 8 | thrpt | 10 | 123971.994 | ± 1462.131 | ops/ms
>> StringEqual.EqualString | 64 | thrpt | 10 | 56009.960 | ± 999.734 | ops/ms
>> StringEqual.EqualString | 4096 | thrpt | 10 | 1943.852 | ± 8.159 | ops/ms
>> StringEqual.EqualStringWithNEON | 8 | thrpt | 10 | 120319.271 | ± 1392.185 | ops/ms
>> StringEqual.EqualStringWithNEON | 64 | thrpt | 10 | 72914.767 | ± 1814.173 | ops/ms
>> StringEqual.EqualStringWithNEON | 4096 | thrpt | 10 | 2579.155 | ± 15.589 | ops/ms
>>
>> Yours,
>> WANG Huang
>
> Wang Huang has updated the pull request incrementally with one additional commit since the last revision:
>
> unroll when small string sizes
> _Mailing list message from [Andrew Haley](mailto:aph at redhat.com) on [hotspot-dev](mailto:hotspot-dev at mail.openjdk.java.net):_
>
> I had to make some changes to the benchmark to get accurate timing, because
> it is swamped by JMH overhead for very small strings.
>
> It should be clear from my patch what I did. The most important part is
> to run the test code in a loop, or you won't see small effects. We're
> trying to measure something that only takes a few nanoseconds.
>
> This is what I see, Apple M1, two equal strings:
>
> Old:
>
> StringEquals.equal 8 avgt 5 0.948 ± 0.001 us/op
> StringEquals.equal 11 avgt 5 0.948 ± 0.004 us/op
> StringEquals.equal 16 avgt 5 0.948 ± 0.001 us/op
> StringEquals.equal 22 avgt 5 1.260 ± 0.002 us/op
> StringEquals.equal 32 avgt 5 1.886 ± 0.001 us/op
> StringEquals.equal 45 avgt 5 2.514 ± 0.001 us/op
> StringEquals.equal 64 avgt 5 3.141 ± 0.003 us/op
> StringEquals.equal 91 avgt 5 4.395 ± 0.002 us/op
> StringEquals.equal 121 avgt 5 5.653 ± 0.014 us/op
> StringEquals.equal 181 avgt 5 8.011 ± 0.010 us/op
> StringEquals.equal 256 avgt 5 11.433 ± 0.014 us/op
> StringEquals.equal 512 avgt 5 23.005 ± 0.124 us/op
> StringEquals.equal 1024 avgt 5 49.185 ± 0.032 us/op
>
> Your patch:
>
> Benchmark (size) Mode Cnt Score Error Units
> StringEquals.equal 8 avgt 5 1.574 ± 0.001 us/op
> StringEquals.equal 11 avgt 5 1.734 ± 0.004 us/op
> StringEquals.equal 16 avgt 5 1.888 ± 0.002 us/op
> StringEquals.equal 22 avgt 5 1.892 ± 0.003 us/op
> StringEquals.equal 32 avgt 5 2.517 ± 0.003 us/op
> StringEquals.equal 45 avgt 5 2.988 ± 0.002 us/op
> StringEquals.equal 64 avgt 5 2.517 ± 0.003 us/op
> StringEquals.equal 91 avgt 5 8.659 ± 0.007 us/op
> StringEquals.equal 121 avgt 5 5.649 ± 0.007 us/op
> StringEquals.equal 181 avgt 5 6.050 ± 0.009 us/op
> StringEquals.equal 256 avgt 5 7.088 ± 0.016 us/op
> StringEquals.equal 512 avgt 5 14.163 ± 0.018 us/op
> StringEquals.equal 1024 avgt 5 29.998 ± 0.052 us/op
>
> As you can see, we're looking at regressions all the way up to size=45,
> with something very odd happening at size=91. Finally the vectorized
> code starts to pull ahead at size=181.
>
> A few things:
>
> You should never be executing the TAIL unless the string is really
> short. Just do one pair of unaligned loads at the end to finish.
>
> Please don't use aliases for rscratch1 and rscratch2. Calling them tmp1
> and tmp2 doesn't help the reader.
>
> So: please make sure the smaller strings are at least as good as
> they are now. Remember strings are usually short, so we can tolerate
> no regressions with the smaller sizes.
>
> I don't think that Neon does any good here. This is what I get by rewriting
> (just) the stub with scalar registers, in the attached patch:
>
> Benchmark (size) Mode Cnt Score Error Units
> StringEquals.equal 8 avgt 5 1.574 ± 0.004 us/op
> StringEquals.equal 11 avgt 5 1.734 ± 0.003 us/op
> StringEquals.equal 16 avgt 5 1.888 ± 0.002 us/op
> StringEquals.equal 22 avgt 5 1.891 ± 0.003 us/op
> StringEquals.equal 32 avgt 5 2.517 ± 0.001 us/op
> StringEquals.equal 45 avgt 5 2.988 ± 0.002 us/op
> StringEquals.equal 64 avgt 5 2.595 ± 0.004 us/op
> StringEquals.equal 91 avgt 5 4.083 ± 0.006 us/op
> StringEquals.equal 121 avgt 5 5.432 ± 0.006 us/op
> StringEquals.equal 181 avgt 5 6.292 ± 0.009 us/op
> StringEquals.equal 256 avgt 5 7.232 ± 0.008 us/op
> StringEquals.equal 512 avgt 5 13.304 ± 0.012 us/op
> StringEquals.equal 1024 avgt 5 25.537 ± 0.012 us/op
>
> I use an editor with automatic indentation, as do many people, so
> I inserted brackets in the right places in the assembly code.
>
> --
> Andrew Haley (he/him)
> Java Platform Lead Engineer
> Red Hat UK Ltd. <https://www.redhat.com>
> https://keybase.io/andrewhaley
> EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671
> -------------- next part --------------
> A non-text attachment was scrubbed...
> Name: 8268229.patch
> Type: text/x-patch
> Size: 12464 bytes
> Desc: not available
> URL: <https://mail.openjdk.java.net/pipermail/hotspot-dev/attachments/20210629/61ccd20c/8268229-0001.patch>
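
The remark above about finishing with one pair of unaligned loads, rather than falling into a byte-by-byte tail, can be illustrated in plain Java. This is only a sketch of the idea, not the stub from either patch: the class and method names are hypothetical, it works on `byte[]` instead of the intrinsic's registers, and it uses 8-byte loads through a `VarHandle` where the assembly would use wider ones.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

// Hypothetical illustration class; not part of the JDK or of the patch.
final class OverlappingTailEquals {
    // Views a byte[] as longs; plain get() is allowed at unaligned offsets.
    private static final VarHandle LONG_VIEW =
            MethodHandles.byteArrayViewVarHandle(long[].class, ByteOrder.nativeOrder());

    static boolean equalsBytes(byte[] a, byte[] b) {
        if (a.length != b.length) {
            return false;
        }
        int n = a.length;
        if (n < 8) {
            // Really short input: the only case that takes the byte-wise tail.
            for (int i = 0; i < n; i++) {
                if (a[i] != b[i]) {
                    return false;
                }
            }
            return true;
        }
        // Main loop: one 8-byte load from each array per step.
        for (int i = 0; i + 8 <= n; i += 8) {
            if ((long) LONG_VIEW.get(a, i) != (long) LONG_VIEW.get(b, i)) {
                return false;
            }
        }
        // Finish with one pair of loads covering the last 8 bytes. They may
        // overlap bytes already compared, which is harmless for equality.
        return (long) LONG_VIEW.get(a, n - 8) == (long) LONG_VIEW.get(b, n - 8);
    }
}
```

For an 11-byte input, for example, the loop compares bytes 0 to 7 and the final pair of loads compares bytes 3 to 10, so no per-byte loop runs. When the length is a multiple of 8 the last word is compared twice; a real stub would likely skip that.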
@theRealAph Thank you for your suggestion. It was my mistake: the JMH benchmark I used was not accurate. I changed my code and re-tested with your JMH benchmark:
Before opt:
Benchmark |(size)| Mode| Cnt | Score| Error |Units
-------------------|------|-----|-----|-------|---------|-----
StringEquals.equal | 8| avgt| 5 | 2.334|± 0.012 |us/op
StringEquals.equal | 11| avgt| 5 | 2.335|± 0.012 |us/op
StringEquals.equal | 16| avgt| 5 | 2.334|± 0.011 |us/op
StringEquals.equal | 22| avgt| 5 | 3.414|± 0.422 |us/op
StringEquals.equal | 32| avgt| 5 | 3.890|± 0.004 |us/op
StringEquals.equal | 45| avgt| 5 | 5.610|± 0.023 |us/op
StringEquals.equal | 64| avgt| 5 | 7.215|± 0.009 |us/op
StringEquals.equal | 91| avgt| 5 | 12.305|± 1.716 |us/op
StringEquals.equal | 121| avgt| 5 | 14.891|± 0.085 |us/op
StringEquals.equal | 181| avgt| 5 | 21.502|± 0.050 |us/op
StringEquals.equal | 256| avgt| 5 | 29.968|± 0.155 |us/op
StringEquals.equal | 512| avgt| 5 | 59.414|± 2.341 |us/op
StringEquals.equal | 1024| avgt| 5 |118.365|± 20.794 |us/op
After opt:
Benchmark |(size)| Mode| Cnt | Score| Error| Units
-------------------|------|-----|-----|------|-------|------
StringEquals.equal | 8| avgt| 5 | 2.333|± 0.003| us/op
StringEquals.equal | 11| avgt| 5 | 2.333|± 0.001| us/op
StringEquals.equal | 16| avgt| 5 | 2.332|± 0.002| us/op
StringEquals.equal | 22| avgt| 5 | 3.265|± 0.404| us/op
StringEquals.equal | 32| avgt| 5 | 3.875|± 0.002| us/op
StringEquals.equal | 45| avgt| 5 | 5.793|± 0.331| us/op
StringEquals.equal | 64| avgt| 5 | 6.730|± 0.054| us/op
StringEquals.equal | 91| avgt| 5 | 8.611|± 0.075| us/op
StringEquals.equal | 121| avgt| 5 |10.041|± 0.042| us/op
StringEquals.equal | 181| avgt| 5 |13.968|± 0.653| us/op
StringEquals.equal | 256| avgt| 5 |19.199|± 1.227| us/op
StringEquals.equal | 512| avgt| 5 |39.508|± 1.784| us/op
StringEquals.equal | 1024| avgt| 5 |77.883|± 1.290| us/op
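
For readers without the attached patch, the advice to run the tested call in a loop can be sketched roughly as follows. This is an assumption about the general shape, not a copy of the benchmark used for the numbers above: the class name, the helper names, and the loop count of 1,000 are all hypothetical. Each pair uses distinct `String` instances so the comparison cannot be hoisted out of the loop, and the results are folded into a return value so they cannot be discarded.

```java
import java.util.Arrays;

// Hypothetical helper; a JMH @Benchmark method could call countEqual().
final class LoopedEqualsSketch {
    // Assumed loop count, chosen only for illustration.
    static final int ITERATIONS = 1_000;

    // Same shape as newString in the original benchmark.
    static String newString(int length, char fill, char last) {
        if (length <= 0) {
            return "";
        }
        char[] chars = new char[length];
        Arrays.fill(chars, fill);
        chars[length - 1] = last;
        return new String(chars);
    }

    // Builds ITERATIONS distinct String instances with identical contents.
    static String[] copies(int length, char last) {
        String[] out = new String[ITERATIONS];
        for (int i = 0; i < out.length; i++) {
            out[i] = newString(length, 'c', last);
        }
        return out;
    }

    // Performs left.length comparisons per call and returns how many matched,
    // so the work done by equals() feeds into an observable result.
    static int countEqual(String[] left, String[] right) {
        int equal = 0;
        for (int i = 0; i < left.length; i++) {
            if (left[i].equals(right[i])) {
                equal++;
            }
        }
        return equal;
    }
}
```

In a JMH benchmark the arrays would be built in a `@Setup` method, and the `@OperationsPerInvocation` annotation can be used if a per-comparison score is wanted in place of a per-loop score.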
-------------
PR: https://git.openjdk.java.net/jdk/pull/4423