AARCH64 optimization: using TBZ instruction for bit check

eric.caspole at oracle.com
Fri Jun 12 18:24:57 UTC 2020


Hi Boris,
Could you add the JMH benchmark to your webrev under
test/micro/org/openjdk/bench/?
Thanks,
Eric


On 6/12/20 2:10 PM, Boris Ulasevich wrote:
> Hi all,
> 
> Please review the new AARCH64 instruction selection rules.
> The change uses the TBZ instruction for single-bit checks such as
> "if ((var&16) == 16)". This gives a 17% performance improvement on the
> benchmark and 5% on a real application.
> 
> http://bugs.openjdk.java.net/browse/JDK-8247408
> http://cr.openjdk.java.net/~bulasevich/8247408/webrev.00
> 
> - I excluded the far branch test from the full change because it takes a
> long time to run, and I'm not sure C2 will keep its current behaviour:
> http://cr.openjdk.java.net/~bulasevich/8247408/webrev.00.plus
> 
> The change was tested with jtreg in fastdebug mode: no regressions.
> 
> thanks,
> Boris
> 
> ======================================================================================== 
> 
> Benchmark                                              Mode   Cnt        Score     Error  Units        Score     Error
> TBZBenchmark.cmpAndBranch2Tbz                          thrpt   25  1329060.879 ±  42.780  ops/s  1504990.708 ± 158.096
> TBZBenchmark.cmpAndBranch2Tbz:CPI                      thrpt    5        0.325 ±   0.001  #/op         0.410 ±   0.001
> TBZBenchmark.cmpAndBranch2Tbz:L1-dcache-load-misses    thrpt    5        0.019 ±   0.031  #/op         0.018 ±   0.025
> TBZBenchmark.cmpAndBranch2Tbz:L1-dcache-loads          thrpt    5       16.811 ±   0.791  #/op        16.809 ±   0.914
> TBZBenchmark.cmpAndBranch2Tbz:L1-dcache-store-misses   thrpt    5        0.016 ±   0.017  #/op         0.014 ±   0.022
> TBZBenchmark.cmpAndBranch2Tbz:L1-dcache-stores         thrpt    5       16.704 ±   0.634  #/op        16.771 ±   0.539
> TBZBenchmark.cmpAndBranch2Tbz:L1-icache-load-misses    thrpt    5        0.017 ±   0.027  #/op         0.016 ±   0.023
> TBZBenchmark.cmpAndBranch2Tbz:L1-icache-loads          thrpt    5     1811.848 ±   3.552  #/op      1148.737 ±   2.993
> TBZBenchmark.cmpAndBranch2Tbz:branch-misses            thrpt    5        1.013 ±   0.009  #/op         1.011 ±   0.018
> TBZBenchmark.cmpAndBranch2Tbz:cycles                   thrpt    5     1882.193 ±   3.799  #/op      1662.994 ±   5.935
> TBZBenchmark.cmpAndBranch2Tbz:dTLB-load-misses         thrpt    5        0.004 ±   0.008  #/op         0.005 ±   0.016
> TBZBenchmark.cmpAndBranch2Tbz:dTLB-loads               thrpt    5       16.687 ±   0.732  #/op        16.669 ±   0.958
> TBZBenchmark.cmpAndBranch2Tbz:iTLB-load-misses         thrpt    5        0.003 ±   0.009  #/op         0.003 ±   0.008
> TBZBenchmark.cmpAndBranch2Tbz:iTLB-loads               thrpt    5     1586.390 ±   2.612  #/op      1353.981 ±   3.469
> TBZBenchmark.cmpAndBranch2Tbz:instructions             thrpt    5     5791.824 ±  15.362  #/op      4055.443 ±  17.785
> TBZBenchmark.cmpAndBranch2Tbz:stalled-cycles-backend   thrpt    5        5.279 ±   1.968  #/op        20.459 ±   5.258
> TBZBenchmark.cmpAndBranch2Tbz:stalled-cycles-frontend  thrpt    5       66.808 ±   0.700  #/op        12.738 ±   1.040
> 
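> import org.openjdk.jmh.annotations.Benchmark;
> 
> public class TBZBenchmark {
>      // Each iteration branches on a single bit of the loop counter
>      // (mask 32, i.e. bit 5), the pattern the new rules target.
>      @Benchmark
>      public int cmpAndBranch2Tbz() {
>          int count = 0;
>          for (int value = 0; value < 1000; value++) {
>              if ((value & 32) == 32) {
>                  count--;
>              } else {
>                  count++;
>              }
>          }
>          // Return the result so the loop is not eliminated as dead code.
>          return count;
>      }
> }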
> 
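
For readers following the thread: the check in the quoted benchmark can be restated as a test of one bit by index, which is the shape a test-bit-and-branch instruction such as TBZ or TBNZ takes (a register, a bit number, and a branch target). The sketch below is plain Java, not the patch itself. The class name SingleBitCheck and its helper methods are made up for illustration, and the code only shows that the two forms agree when the mask has exactly one bit set. Whether C2 actually emits TBZ for a given loop depends on the rules in the webrev.

```java
// Illustrative sketch only; names here are hypothetical and not part of the webrev.
class SingleBitCheck {

    // The form written in the benchmark: mask, then compare against the mask.
    static boolean maskCompare(int value, int mask) {
        return (value & mask) == mask;
    }

    // The equivalent single-bit form: test bit number bitIndex of value.
    static boolean bitTest(int value, int bitIndex) {
        return ((value >>> bitIndex) & 1) != 0;
    }

    // The two forms are interchangeable only when the mask has exactly one bit set.
    static boolean isSingleBit(int mask) {
        return Integer.bitCount(mask) == 1;
    }

    // Same loop as the quoted benchmark, written with the bit-index form.
    static int countLoop(int mask) {
        if (!isSingleBit(mask)) {
            throw new IllegalArgumentException("mask must have exactly one bit set");
        }
        int bitIndex = Integer.numberOfTrailingZeros(mask);
        int count = 0;
        for (int value = 0; value < 1000; value++) {
            if (bitTest(value, bitIndex)) {
                count--;
            } else {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        int mask = 32; // mask used in the quoted benchmark
        int bitIndex = Integer.numberOfTrailingZeros(mask);
        for (int value = 0; value < 1000; value++) {
            if (maskCompare(value, mask) != bitTest(value, bitIndex)) {
                throw new AssertionError("forms disagree at value " + value);
            }
        }
        System.out.println("mask " + mask + " tests bit " + bitIndex
                + ", count = " + countLoop(mask));
    }
}
```

Running main with the benchmark's mask of 32 checks every value in [0, 1000). For a mask with more than one bit set, such as 48, the mask-and-compare form is no longer a single-bit test, so this restatement does not apply as-is.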

