[aarch64-port-dev ] AARCH64 optimization: using TBZ instruction for bit check
Boris Ulasevich
boris.ulasevich at bell-sw.com
Fri Jun 12 18:10:13 UTC 2020
Hi all,
Please review the new AARCH64 instruction selection rules.
The change uses the TBZ instruction for single-bit checks such as
"if ((var&16) == 16)". This gives a 17% performance improvement on the
benchmark and 5% on a real application.
http://bugs.openjdk.java.net/browse/JDK-8247408
http://cr.openjdk.java.net/~bulasevich/8247408/webrev.00
- I excluded the far branch test from the full change because it takes a
long time to run, and I'm not sure C2 will not change its behaviour:
http://cr.openjdk.java.net/~bulasevich/8247408/webrev.00.plus
The change was tested with jtreg in fastdebug mode: no regressions.
thanks,
Boris
========================================================================================
Benchmark                                              Mode   Cnt        Score     Error  Units        Score     Error
TBZBenchmark.cmpAndBranch2Tbz                          thrpt   25  1329060.879 ±  42.780  ops/s  1504990.708 ± 158.096
TBZBenchmark.cmpAndBranch2Tbz:CPI                      thrpt    5        0.325 ±   0.001   #/op        0.410 ±   0.001
TBZBenchmark.cmpAndBranch2Tbz:L1-dcache-load-misses    thrpt    5        0.019 ±   0.031   #/op        0.018 ±   0.025
TBZBenchmark.cmpAndBranch2Tbz:L1-dcache-loads          thrpt    5       16.811 ±   0.791   #/op       16.809 ±   0.914
TBZBenchmark.cmpAndBranch2Tbz:L1-dcache-store-misses   thrpt    5        0.016 ±   0.017   #/op        0.014 ±   0.022
TBZBenchmark.cmpAndBranch2Tbz:L1-dcache-stores         thrpt    5       16.704 ±   0.634   #/op       16.771 ±   0.539
TBZBenchmark.cmpAndBranch2Tbz:L1-icache-load-misses    thrpt    5        0.017 ±   0.027   #/op        0.016 ±   0.023
TBZBenchmark.cmpAndBranch2Tbz:L1-icache-loads          thrpt    5     1811.848 ±   3.552   #/op     1148.737 ±   2.993
TBZBenchmark.cmpAndBranch2Tbz:branch-misses            thrpt    5        1.013 ±   0.009   #/op        1.011 ±   0.018
TBZBenchmark.cmpAndBranch2Tbz:cycles                   thrpt    5     1882.193 ±   3.799   #/op     1662.994 ±   5.935
TBZBenchmark.cmpAndBranch2Tbz:dTLB-load-misses         thrpt    5        0.004 ±   0.008   #/op        0.005 ±   0.016
TBZBenchmark.cmpAndBranch2Tbz:dTLB-loads               thrpt    5       16.687 ±   0.732   #/op       16.669 ±   0.958
TBZBenchmark.cmpAndBranch2Tbz:iTLB-load-misses         thrpt    5        0.003 ±   0.009   #/op        0.003 ±   0.008
TBZBenchmark.cmpAndBranch2Tbz:iTLB-loads               thrpt    5     1586.390 ±   2.612   #/op     1353.981 ±   3.469
TBZBenchmark.cmpAndBranch2Tbz:instructions             thrpt    5     5791.824 ±  15.362   #/op     4055.443 ±  17.785
TBZBenchmark.cmpAndBranch2Tbz:stalled-cycles-backend   thrpt    5        5.279 ±   1.968   #/op       20.459 ±   5.258
TBZBenchmark.cmpAndBranch2Tbz:stalled-cycles-frontend  thrpt    5       66.808 ±   0.700   #/op       12.738 ±   1.040
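
import org.openjdk.jmh.annotations.Benchmark;

public class TBZBenchmark {
    @Benchmark
    public int cmpAndBranch2Tbz() {
        int count = 0;
        for (int value = 0; value < 1000; value++) {
            // Single-bit check: true exactly when bit 5 of value is set.
            if ((value & 32) == 32) {
                count--;
            } else {
                count++;
            }
        }
        return count;
    }
}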
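
For readers less familiar with the pattern: the benchmark's condition
"(value & 32) == 32" uses a mask with exactly one bit set, so it is the same
as asking whether bit 5 of value is non-zero, which is the kind of test a
test-bit-and-branch instruction such as TBZ/TBNZ performs. The sketch below
checks that equivalence at the Java level only. It is an illustration, not
the C2 matcher rule from the webrev, and the class and method names
(SingleBitCheck, isSingleBit, bitIndex, maskedCompare, bitTest) are
hypothetical.

```java
public class SingleBitCheck {
    // True when mask has exactly one bit set (i.e. it is a power of two).
    public static boolean isSingleBit(int mask) {
        return mask != 0 && (mask & (mask - 1)) == 0;
    }

    // Index of the single set bit, e.g. 32 -> 5.
    public static int bitIndex(int mask) {
        return Integer.numberOfTrailingZeros(mask);
    }

    // Source-level form of the check, as written in the benchmark.
    public static boolean maskedCompare(int value, int mask) {
        return (value & mask) == mask;
    }

    // Equivalent single-bit test: is bit number `bit` of value non-zero?
    public static boolean bitTest(int value, int bit) {
        return ((value >>> bit) & 1) != 0;
    }

    public static void main(String[] args) {
        int mask = 32;
        int bit = bitIndex(mask);
        // Compare both forms over the same range the benchmark iterates.
        for (int value = 0; value < 1000; value++) {
            if (maskedCompare(value, mask) != bitTest(value, bit)) {
                throw new AssertionError("mismatch at " + value);
            }
        }
        System.out.println("mask " + mask + " tests bit " + bit);
    }
}
```

The equivalence holds only for single-bit masks: with a mask such as 48 the
masked compare requires two bits to be set at once, so it cannot be reduced
to one bit test.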