AARCH64 optimization: using TBZ instruction for bit check
eric.caspole at oracle.com
Fri Jun 12 18:24:57 UTC 2020
Hi Boris,
Could you add the JMH benchmark to your webrev under
test/micro/org/openjdk/bench/?
Thanks,
Eric
On 6/12/20 2:10 PM, Boris Ulasevich wrote:
> Hi all,
>
> Please review the new AARCH64 instruction selection rules.
> The change uses the TBZ instruction for single-bit checks such as
> "if ((var&16) == 16)". This gives a 17% performance improvement on the
> benchmark and 5% on a real application.
>
> http://bugs.openjdk.java.net/browse/JDK-8247408
> http://cr.openjdk.java.net/~bulasevich/8247408/webrev.00
>
> - I excluded the far branch test from the full change because it takes a
> long time to run, and I'm not sure C2 will keep its current behaviour:
> http://cr.openjdk.java.net/~bulasevich/8247408/webrev.00.plus
>
> The change was tested on jtreg in fastdebug mode: no regressions.
>
> thanks,
> Boris
>
> ========================================================================================
>
> Benchmark                                              Mode   Cnt        Score     Error  Units        Score     Error
> TBZBenchmark.cmpAndBranch2Tbz                          thrpt   25  1329060.879 ±  42.780  ops/s  1504990.708 ± 158.096
> TBZBenchmark.cmpAndBranch2Tbz:CPI                      thrpt    5        0.325 ±   0.001  #/op         0.410 ±   0.001
> TBZBenchmark.cmpAndBranch2Tbz:L1-dcache-load-misses    thrpt    5        0.019 ±   0.031  #/op         0.018 ±   0.025
> TBZBenchmark.cmpAndBranch2Tbz:L1-dcache-loads          thrpt    5       16.811 ±   0.791  #/op        16.809 ±   0.914
> TBZBenchmark.cmpAndBranch2Tbz:L1-dcache-store-misses   thrpt    5        0.016 ±   0.017  #/op         0.014 ±   0.022
> TBZBenchmark.cmpAndBranch2Tbz:L1-dcache-stores         thrpt    5       16.704 ±   0.634  #/op        16.771 ±   0.539
> TBZBenchmark.cmpAndBranch2Tbz:L1-icache-load-misses    thrpt    5        0.017 ±   0.027  #/op         0.016 ±   0.023
> TBZBenchmark.cmpAndBranch2Tbz:L1-icache-loads          thrpt    5     1811.848 ±   3.552  #/op      1148.737 ±   2.993
> TBZBenchmark.cmpAndBranch2Tbz:branch-misses            thrpt    5        1.013 ±   0.009  #/op         1.011 ±   0.018
> TBZBenchmark.cmpAndBranch2Tbz:cycles                   thrpt    5     1882.193 ±   3.799  #/op      1662.994 ±   5.935
> TBZBenchmark.cmpAndBranch2Tbz:dTLB-load-misses         thrpt    5        0.004 ±   0.008  #/op         0.005 ±   0.016
> TBZBenchmark.cmpAndBranch2Tbz:dTLB-loads               thrpt    5       16.687 ±   0.732  #/op        16.669 ±   0.958
> TBZBenchmark.cmpAndBranch2Tbz:iTLB-load-misses         thrpt    5        0.003 ±   0.009  #/op         0.003 ±   0.008
> TBZBenchmark.cmpAndBranch2Tbz:iTLB-loads               thrpt    5     1586.390 ±   2.612  #/op      1353.981 ±   3.469
> TBZBenchmark.cmpAndBranch2Tbz:instructions             thrpt    5     5791.824 ±  15.362  #/op      4055.443 ±  17.785
> TBZBenchmark.cmpAndBranch2Tbz:stalled-cycles-backend   thrpt    5        5.279 ±   1.968  #/op        20.459 ±   5.258
> TBZBenchmark.cmpAndBranch2Tbz:stalled-cycles-frontend  thrpt    5       66.808 ±   0.700  #/op        12.738 ±   1.040
>
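> import org.openjdk.jmh.annotations.Benchmark;
>
> public class TBZBenchmark {
>     @Benchmark
>     public int cmpAndBranch2Tbz() {
>         int count = 0;
>         for (int value = 0; value < 1000; value++) {
>             // Single-bit check (bit 5): the pattern the new rules match.
>             if ((value & 32) == 32) {
>                 count--;
>             } else {
>                 count++;
>             }
>         }
>         // Returning the result keeps the loop from being eliminated.
>         return count;
>     }
> }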
>
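
For readers following the thread: the pattern "(var & mask) == mask" with a power-of-two mask asks whether one particular bit is set, and that is the condition TBZ/TBNZ branch on directly. The sketch below is not taken from the webrev. The class and method names are hypothetical, and it only checks in plain Java that the two forms agree and that the benchmark loop gives the same result either way. It says nothing about what C2 actually emits.

```java
// Illustrative sketch only; BitCheckSketch and its methods are hypothetical
// names, not code from the webrev under review.
final class BitCheckSketch {
    // The form quoted in the post: mask-and-compare against a power of two.
    static boolean viaMask(int value, int bit) {
        int mask = 1 << bit;
        return (value & mask) == mask;
    }

    // The same predicate written as a test of one bit, which is the
    // condition a TBZ/TBNZ instruction branches on.
    static boolean viaBitTest(int value, int bit) {
        return ((value >>> bit) & 1) != 0;
    }

    // Same loop shape as cmpAndBranch2Tbz, with a selectable predicate.
    static int countLoop(boolean useBitTest) {
        int count = 0;
        for (int value = 0; value < 1000; value++) {
            boolean set = useBitTest ? viaBitTest(value, 5) : viaMask(value, 5);
            if (set) {
                count--;
            } else {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Compare both forms over every bit position and a range of values,
        // including negative ones.
        for (int bit = 0; bit < 32; bit++) {
            for (int value = -70000; value <= 70000; value++) {
                if (viaMask(value, bit) != viaBitTest(value, bit)) {
                    throw new AssertionError("mismatch at value=" + value + ", bit=" + bit);
                }
            }
        }
        System.out.println(countLoop(false) + " " + countLoop(true)); // prints "24 24"
    }
}
```

Of the 1000 loop values, 488 have bit 5 set and 512 do not, so both variants return 24.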
More information about the hotspot-compiler-dev
mailing list