AARCH64 optimization: using TBZ instruction for bit check
Boris Ulasevich
boris.ulasevich at bell-sw.com
Sat Jun 13 18:24:50 UTC 2020
Hi Eric,
Ok. Here is the webrev with the JMH benchmark added:
http://cr.openjdk.java.net/~bulasevich/8247408/webrev.01
Thank you,
Boris
On 12.06.2020 21:24, eric.caspole at oracle.com wrote:
> Hi Boris,
> Could you add the JMH to your webrev under
> test/micro/org/openjdk/bench/?
> Thanks,
> Eric
>
>
> On 6/12/20 2:10 PM, Boris Ulasevich wrote:
>> Hi all,
>>
>> Please review the new AARCH64 instruction selection rules.
>> The change uses the TBZ instruction for single-bit checks such as
>> "if ((var & 16) == 16)".
>> This gives a 17% performance improvement on the benchmark and 5% on a
>> real application.
>>
>> http://bugs.openjdk.java.net/browse/JDK-8247408
>> http://cr.openjdk.java.net/~bulasevich/8247408/webrev.00
>>
>> - I excluded the far branch test from the full change because it takes
>> a long time to run, and I'm not sure C2 will keep its current behaviour:
>> http://cr.openjdk.java.net/~bulasevich/8247408/webrev.00.plus
>>
>> The change was tested on jtreg in fastdebug mode: no regressions.
>>
>> thanks,
>> Boris
>>
>> ========================================================================================
>>
>> Benchmark                                              Mode   Cnt        Score     Error  Units        Score     Error
>> TBZBenchmark.cmpAndBranch2Tbz                          thrpt   25  1329060.879 ±  42.780  ops/s  1504990.708 ± 158.096
>> TBZBenchmark.cmpAndBranch2Tbz:CPI                      thrpt    5        0.325 ±   0.001  #/op         0.410 ±   0.001
>> TBZBenchmark.cmpAndBranch2Tbz:L1-dcache-load-misses    thrpt    5        0.019 ±   0.031  #/op         0.018 ±   0.025
>> TBZBenchmark.cmpAndBranch2Tbz:L1-dcache-loads          thrpt    5       16.811 ±   0.791  #/op        16.809 ±   0.914
>> TBZBenchmark.cmpAndBranch2Tbz:L1-dcache-store-misses   thrpt    5        0.016 ±   0.017  #/op         0.014 ±   0.022
>> TBZBenchmark.cmpAndBranch2Tbz:L1-dcache-stores         thrpt    5       16.704 ±   0.634  #/op        16.771 ±   0.539
>> TBZBenchmark.cmpAndBranch2Tbz:L1-icache-load-misses    thrpt    5        0.017 ±   0.027  #/op         0.016 ±   0.023
>> TBZBenchmark.cmpAndBranch2Tbz:L1-icache-loads          thrpt    5     1811.848 ±   3.552  #/op      1148.737 ±   2.993
>> TBZBenchmark.cmpAndBranch2Tbz:branch-misses            thrpt    5        1.013 ±   0.009  #/op         1.011 ±   0.018
>> TBZBenchmark.cmpAndBranch2Tbz:cycles                   thrpt    5     1882.193 ±   3.799  #/op      1662.994 ±   5.935
>> TBZBenchmark.cmpAndBranch2Tbz:dTLB-load-misses         thrpt    5        0.004 ±   0.008  #/op         0.005 ±   0.016
>> TBZBenchmark.cmpAndBranch2Tbz:dTLB-loads               thrpt    5       16.687 ±   0.732  #/op        16.669 ±   0.958
>> TBZBenchmark.cmpAndBranch2Tbz:iTLB-load-misses         thrpt    5        0.003 ±   0.009  #/op         0.003 ±   0.008
>> TBZBenchmark.cmpAndBranch2Tbz:iTLB-loads               thrpt    5     1586.390 ±   2.612  #/op      1353.981 ±   3.469
>> TBZBenchmark.cmpAndBranch2Tbz:instructions             thrpt    5     5791.824 ±  15.362  #/op      4055.443 ±  17.785
>> TBZBenchmark.cmpAndBranch2Tbz:stalled-cycles-backend   thrpt    5        5.279 ±   1.968  #/op        20.459 ±   5.258
>> TBZBenchmark.cmpAndBranch2Tbz:stalled-cycles-frontend  thrpt    5       66.808 ±   0.700  #/op        12.738 ±   1.040
>>
>> public class TBZBenchmark {
>>     @Benchmark
>>     public int cmpAndBranch2Tbz() {
>>         int count = 0;
>>         for (int value = 0; value < 1000; value++) {
>>             if ((value & 32) == 32) {
>>                 count--;
>>             } else {
>>                 count++;
>>             }
>>         }
>>         return count;
>>     }
>> }
>>
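For readers following the thread, here is a minimal sketch of why a check like "(value & 32) == 32" can be treated as a single-bit test, which is the kind of test TBZ/TBNZ perform before branching. It only demonstrates the equivalence at the Java level and is not the C2 matching rule from the webrev. The class and method names below are hypothetical, chosen for illustration, and JMH is left out so the snippet runs on its own.

```java
// Illustrative sketch only: BitTestSketch and its methods are hypothetical
// names, not part of the webrev or of the JMH benchmark.
final class BitTestSketch {

    // Mask form, as written in the benchmark: (value & mask) == mask.
    static boolean maskTest(int value, int mask) {
        return (value & mask) == mask;
    }

    // Bit-index form: looks at exactly one bit, selected by position.
    static boolean bitTest(int value, int bit) {
        return ((value >>> bit) & 1) != 0;
    }

    // The benchmark loop, using the mask form.
    static int loopWithMaskTest() {
        int count = 0;
        for (int value = 0; value < 1000; value++) {
            if (maskTest(value, 32)) { count--; } else { count++; }
        }
        return count;
    }

    // The same loop, using the bit-index form (32 == 1 << 5).
    static int loopWithBitTest() {
        int count = 0;
        for (int value = 0; value < 1000; value++) {
            if (bitTest(value, 5)) { count--; } else { count++; }
        }
        return count;
    }

    public static void main(String[] args) {
        int mask = 32;
        // For a power-of-two mask, the bit index is the number of trailing zeros.
        int bit = Integer.numberOfTrailingZeros(mask);
        for (int v = -1000; v <= 1000; v++) {
            if (maskTest(v, mask) != bitTest(v, bit)) {
                throw new AssertionError("forms disagree for " + v);
            }
        }
        // Prints "24 24": 512 values have bit 5 clear, 488 have it set.
        System.out.println(loopWithMaskTest() + " " + loopWithBitTest());
    }
}
```

The equivalence holds only when the mask has exactly one bit set; a multi-bit mask such as 48 would need both bits tested and does not reduce to a single bit test.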