RFR: 8268229: Aarch64: Use Neon in intrinsics for String.equals [v3]
Andrew Haley
aph at openjdk.java.net
Mon Jul 5 16:34:54 UTC 2021
On Fri, 2 Jul 2021 09:54:32 GMT, Wang Huang <whuang at openjdk.org> wrote:
>> Dear all,
>> Could you please review this patch? It improves the performance of the `String.equals` intrinsic on the AArch64 NEON backend.
>> We measured the performance with this JMH benchmark:
>>
>>
>> ```java
>> package com.huawei.string;
>>
>> import java.util.*;
>> import java.util.concurrent.TimeUnit;
>>
>> import org.openjdk.jmh.annotations.CompilerControl;
>> import org.openjdk.jmh.annotations.Benchmark;
>> import org.openjdk.jmh.annotations.Level;
>> import org.openjdk.jmh.annotations.OutputTimeUnit;
>> import org.openjdk.jmh.annotations.Param;
>> import org.openjdk.jmh.annotations.Scope;
>> import org.openjdk.jmh.annotations.Setup;
>> import org.openjdk.jmh.annotations.State;
>> import org.openjdk.jmh.annotations.Fork;
>> import org.openjdk.jmh.infra.Blackhole;
>>
>> @State(Scope.Thread)
>> @OutputTimeUnit(TimeUnit.MILLISECONDS)
>> public class StringEqual {
>>     // String length in chars.
>>     @Param({"8", "64", "4096"})
>>     int size;
>>
>>     String str1;
>>     String str2;
>>
>>     // The two strings have the same length and differ only in the last
>>     // character, so equals() has to scan the whole string.
>>     @Setup(Level.Trial)
>>     public void init() {
>>         str1 = newString(size, 'c', '1');
>>         str2 = newString(size, 'c', '2');
>>     }
>>
>>     // Builds a string of `length` copies of charToFill, ending in lastChar.
>>     public String newString(int length, char charToFill, char lastChar) {
>>         if (length > 0) {
>>             char[] array = new char[length];
>>             Arrays.fill(array, charToFill);
>>             array[length - 1] = lastChar;
>>             return new String(array);
>>         }
>>         return "";
>>     }
>>
>>     @Benchmark
>>     @CompilerControl(CompilerControl.Mode.DONT_INLINE)
>>     public boolean EqualString() {
>>         return str1.equals(str2);
>>     }
>> }
>> ```
>> The results are as follows (Linux aarch64 with 128 cores):
>>
>> Benchmark | (size) | Mode | Cnt | Score | Error | Units
>> ----------------------------------|-------|---------|-------|------------|------------|----------
>> StringEqual.EqualString | 8 | thrpt | 10 | 123971.994 | ± 1462.131 | ops/ms
>> StringEqual.EqualString | 64 | thrpt | 10 | 56009.960 | ± 999.734 | ops/ms
>> StringEqual.EqualString | 4096 | thrpt | 10 | 1943.852 | ± 8.159 | ops/ms
>> StringEqual.EqualStringWithNEON | 8 | thrpt | 10 | 120319.271 | ± 1392.185 | ops/ms
>> StringEqual.EqualStringWithNEON | 64 | thrpt | 10 | 72914.767 | ± 1814.173 | ops/ms
>> StringEqual.EqualStringWithNEON | 4096 | thrpt | 10 | 2579.155 | ± 15.589 | ops/ms
>>
>> Yours,
>> WANG Huang
>
> Wang Huang has updated the pull request incrementally with one additional commit since the last revision:
>
> unroll when small string sizes
Here are some Graviton 2 timings, four versions:
+UseSimpleStringEquals:
Benchmark (size) Mode Cnt Score Error Units
StringEquals.equal 1 avgt 5 2.813 ± 0.001 us/op
StringEquals.equal 3 avgt 5 2.821 ± 0.001 us/op
StringEquals.equal 4 avgt 5 2.812 ± 0.001 us/op
StringEquals.equal 6 avgt 5 2.821 ± 0.002 us/op
StringEquals.equal 8 avgt 5 2.420 ± 0.001 us/op
StringEquals.equal 11 avgt 5 2.420 ± 0.001 us/op
StringEquals.equal 16 avgt 5 2.421 ± 0.002 us/op
StringEquals.equal 21 avgt 5 3.291 ± 0.003 us/op
StringEquals.equal 32 avgt 5 4.412 ± 0.001 us/op
StringEquals.equal 45 avgt 5 5.623 ± 0.001 us/op
StringEquals.equal 64 avgt 5 7.225 ± 0.010 us/op
StringEquals.equal 91 avgt 5 10.426 ± 0.002 us/op
StringEquals.equal 128 avgt 5 13.628 ± 0.001 us/op
StringEquals.equal 181 avgt 5 19.231 ± 0.002 us/op
StringEquals.equal 256 avgt 5 26.436 ± 0.009 us/op
Your commit 4f02c00f55f1dc37762a04b7e30534ee27a7f20a:
Benchmark (size) Mode Cnt Score Error Units
StringEquals.equal 1 avgt 5 2.812 ± 0.001 us/op
StringEquals.equal 3 avgt 5 3.212 ± 0.001 us/op
StringEquals.equal 4 avgt 5 2.812 ± 0.001 us/op
StringEquals.equal 6 avgt 5 3.212 ± 0.001 us/op
StringEquals.equal 8 avgt 5 3.612 ± 0.001 us/op
StringEquals.equal 11 avgt 5 4.413 ± 0.001 us/op
StringEquals.equal 16 avgt 5 4.813 ± 0.001 us/op
StringEquals.equal 21 avgt 5 5.613 ± 0.001 us/op
StringEquals.equal 32 avgt 5 6.418 ± 0.001 us/op
StringEquals.equal 45 avgt 5 7.614 ± 0.001 us/op
StringEquals.equal 64 avgt 5 6.929 ± 0.081 us/op
StringEquals.equal 91 avgt 5 9.617 ± 0.001 us/op
StringEquals.equal 128 avgt 5 11.880 ± 0.152 us/op
StringEquals.equal 181 avgt 5 16.576 ± 0.002 us/op
StringEquals.equal 256 avgt 5 21.869 ± 0.108 us/op
My hack using ldp:
Benchmark (size) Mode Cnt Score Error Units
StringEquals.equal 1 avgt 5 2.414 ± 0.001 us/op
StringEquals.equal 3 avgt 5 2.814 ± 0.001 us/op
StringEquals.equal 4 avgt 5 2.414 ± 0.001 us/op
StringEquals.equal 6 avgt 5 2.814 ± 0.001 us/op
StringEquals.equal 8 avgt 5 3.214 ± 0.001 us/op
StringEquals.equal 11 avgt 5 4.015 ± 0.001 us/op
StringEquals.equal 16 avgt 5 4.419 ± 0.001 us/op
StringEquals.equal 21 avgt 5 5.216 ± 0.001 us/op
StringEquals.equal 32 avgt 5 6.017 ± 0.001 us/op
StringEquals.equal 45 avgt 5 7.218 ± 0.001 us/op
StringEquals.equal 64 avgt 5 6.015 ± 0.001 us/op
StringEquals.equal 91 avgt 5 8.967 ± 0.015 us/op
StringEquals.equal 128 avgt 5 9.217 ± 0.001 us/op
StringEquals.equal 181 avgt 5 14.096 ± 0.011 us/op
StringEquals.equal 256 avgt 5 15.462 ± 0.259 us/op
Today's -UseSimpleStringEquals:
Benchmark (size) Mode Cnt Score Error Units
StringEquals.equal 1 avgt 5 2.812 ± 0.001 us/op
StringEquals.equal 3 avgt 5 3.212 ± 0.001 us/op
StringEquals.equal 4 avgt 5 2.812 ± 0.001 us/op
StringEquals.equal 6 avgt 5 3.212 ± 0.001 us/op
StringEquals.equal 8 avgt 5 2.813 ± 0.002 us/op
StringEquals.equal 11 avgt 5 2.813 ± 0.001 us/op
StringEquals.equal 16 avgt 5 2.813 ± 0.001 us/op
StringEquals.equal 21 avgt 5 3.615 ± 0.001 us/op
StringEquals.equal 32 avgt 5 4.414 ± 0.001 us/op
StringEquals.equal 45 avgt 5 7.080 ± 0.027 us/op
StringEquals.equal 64 avgt 5 7.613 ± 0.001 us/op
StringEquals.equal 91 avgt 5 10.037 ± 0.005 us/op
StringEquals.equal 128 avgt 5 10.419 ± 0.001 us/op
StringEquals.equal 181 avgt 5 14.896 ± 0.004 us/op
StringEquals.equal 256 avgt 5 16.823 ± 0.001 us/op
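
To make the four tables easier to compare, here is a small sketch that turns a few of the scores into ratios. The numbers are copied from the tables above for sizes 8 and 256 only. Using today's -UseSimpleStringEquals as the baseline is just an assumption made for illustration, and the class and method names are made up for this example. Since the mode is avgt, a ratio below 1.0 means less time per op than the baseline.

```java
import java.util.Locale;

final class EqualsTimingRatios {
    // Variant labels, in the order the tables appear in this message.
    static final String[] VARIANTS = {
        "+UseSimpleStringEquals",
        "commit 4f02c00f",
        "ldp hack",
        "-UseSimpleStringEquals (today)"
    };

    // Average times in us/op, copied from the tables for sizes 8 and 256.
    static final double[] SIZE_8 = {2.420, 3.612, 3.214, 2.813};
    static final double[] SIZE_256 = {26.436, 21.869, 15.462, 16.823};

    // Time of a candidate relative to a baseline; below 1.0 means faster.
    static double relativeTime(double candidate, double baseline) {
        return candidate / baseline;
    }

    // Prints each variant's time and its ratio to the last entry,
    // which is the baseline chosen for this illustration.
    static void report(int size, double[] times) {
        double baseline = times[times.length - 1];
        System.out.printf(Locale.ROOT, "size %d:%n", size);
        for (int i = 0; i < times.length; i++) {
            System.out.printf(Locale.ROOT, "  %-32s %7.3f us/op  ratio %.2f%n",
                    VARIANTS[i], times[i], relativeTime(times[i], baseline));
        }
    }

    public static void main(String[] args) {
        report(8, SIZE_8);
        report(256, SIZE_256);
    }
}
```

This only restates the measurements. It says nothing about sizes other than the two listed, and a different choice of baseline would give different ratios.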
-------------
PR: https://git.openjdk.java.net/jdk/pull/4423