RFC: AArch64: Set Segmented CodeCache default size to 127M
Astigeevich, Evgeny
eastig at amazon.co.uk
Thu Feb 10 23:02:08 UTC 2022
Hello,
We’d like to discuss a proposal to set the default size of the segmented CodeCache used with TieredCompilation to 127M on AArch64 (https://bugs.openjdk.java.net/browse/JDK-8280150).
The current default size of the TieredCompilation CodeCache is 240M: a 116M "non-profiled" segment + a 116M "profiled" segment + an 8M "non-nmethods" segment. The AArch64 ISA limits the range of direct calls and jumps to 128M. To overcome this limitation, the C1/C2 compilers generate far jumps, far calls and trampolines. They decide whether these are needed with MacroAssembler::far_branches, which compares ReservedCodeCacheSize with the range of direct jumps/calls. With a 240M CodeCache, the JIT has to use far jumps and trampolines, which add overhead in both performance and code size.
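The check described above can be modelled in a few lines. This is a simplified sketch, not HotSpot source: in HotSpot, MacroAssembler::far_branches takes no argument and reads the ReservedCodeCacheSize flag, whereas here the size is passed in explicitly. BRANCH_RANGE is derived from the 26-bit signed word offset encoded by the AArch64 B/BL instructions.

```python
# Simplified model (an assumption, not HotSpot source) of the decision made by
# MacroAssembler::far_branches on AArch64.
M = 1024 * 1024

# B/BL encode a signed 26-bit offset counted in 4-byte instructions,
# so they reach 2**25 * 4 bytes = 128M in either direction.
BRANCH_RANGE = 2**25 * 4


def far_branches(reserved_code_cache_size: int) -> bool:
    """True if the JIT must emit far jumps/calls and trampolines."""
    return reserved_code_cache_size > BRANCH_RANGE


for size_m in (240, 127, 104):
    print(f"{size_m}M CodeCache: far_branches = {far_branches(size_m * M)}")
```

With the 240M default the check is true; for the 127M and 104M configurations it is false.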
Our observations [1] suggest that most applications running on AArch64 platforms have hot code that does not exceed 128M.
On AArch64 the default ReservedCodeCacheSize is 48M. For tiered compilation this value is multiplied by 5, giving 240M. We experimented with a 104M CodeCache configuration: a 48M "non-profiled" segment + a 48M "profiled" segment + an 8M "non-nmethods" segment. We ran SpecJbb2015, DaCapo at commit f480064 (https://github.com/dacapobench/dacapobench/tree/dev-chopin), Renaissance 0.14, and internal services.
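The arithmetic behind these sizes can be checked with a short sketch. It assumes that, when no segment sizes are given, the space left after the "non-nmethods" segment is split evenly between the "non-profiled" and "profiled" segments, which matches the 116M + 116M + 8M figures above. split_segments and the constant names are hypothetical.

```python
# Sizes in megabytes. Constant names are hypothetical; the values come from the text.
DEFAULT_RESERVED = 48  # AArch64 default ReservedCodeCacheSize
TIERED_SCALE = 5       # factor applied for tiered compilation
NON_NMETHOD = 8        # "non-nmethods" segment


def split_segments(reserved: int, non_nmethod: int) -> tuple[int, int, int]:
    """Assumed default policy: split what remains after the non-nmethods
    segment evenly between the non-profiled and profiled segments."""
    half = (reserved - non_nmethod) // 2
    return half, half, non_nmethod


tiered_reserved = DEFAULT_RESERVED * TIERED_SCALE
print(tiered_reserved, split_segments(tiered_reserved, NON_NMETHOD))  # 240 (116, 116, 8)

experimental = (48, 48, 8)  # non-profiled, profiled, non-nmethods
print(sum(experimental))  # 104
```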
We did not see any statistically significant regressions. On SpecJbb, max-jOPS improved by 1.68% and critical-jOPS by 1.34%. On DaCapo, eclipse improved by 3.57%, tomcat by 1.45% and tradesoap by 3.03%. Only two Renaissance benchmarks had statistically significant results: dotty (+9.0%) and finagle-http (+3.9%); the changes in the others were comparable to their coefficient of variation. All benchmarks showed significant decreases in the maximum usage of the non-profiled and profiled segments (see data below).
To mitigate the risk that 104M is not enough, we’d like to set the default size of the TieredCompilation CodeCache to 127M, which is just below the size at which the JIT would start generating far jumps and trampolines: a 60M "non-profiled" segment + a 60M "profiled" segment + a 7M "non-nmethods" segment. We also did partial runs with a 127M CodeCache; their results were similar to those of the 104M configuration.
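One way to try the proposed layout on an existing JDK is to pass the segment sizes explicitly. The helper below is hypothetical; it only renders the 60M/60M/7M configuration as the standard HotSpot flags SegmentedCodeCache, ReservedCodeCacheSize, NonProfiledCodeHeapSize, ProfiledCodeHeapSize and NonNMethodCodeHeapSize, with ReservedCodeCacheSize set to the sum of the segments. SegmentedCodeCache is enabled explicitly so that the layout does not depend on ergonomics.

```python
def codecache_flags(non_profiled: int, profiled: int, non_nmethod: int) -> list[str]:
    """Hypothetical helper: render a segmented CodeCache layout (sizes in M)
    as HotSpot flags, reserving exactly the sum of the three segments."""
    reserved = non_profiled + profiled + non_nmethod
    return [
        "-XX:+SegmentedCodeCache",
        f"-XX:ReservedCodeCacheSize={reserved}m",
        f"-XX:NonProfiledCodeHeapSize={non_profiled}m",
        f"-XX:ProfiledCodeHeapSize={profiled}m",
        f"-XX:NonNMethodCodeHeapSize={non_nmethod}m",
    ]


# The proposed 127M default: 60M + 60M + 7M.
print(" ".join(codecache_flags(60, 60, 7)))
```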
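Average maximum memory used in each segment, in KB. The 116M/116M/8M columns are the default 240M configuration and the 48M/48M/8M columns are the 104M configuration; we checked that the numbers of compiled methods were similar in both cases: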
NPS=non-profiled segment
PS=profiled segment
NNS=non-nmethods segment
SpecJbb
+----------+---------+--------+---------+--------+--------+----------+---------+----------+
| 116M NPS | 116M PS | 8M NNS | 48M NPS | 48M PS | 8M NNS | diff NPS | diff PS | diff NNS |
+----------+---------+--------+---------+--------+--------+----------+---------+----------+
|    12491 |   13968 |   4274 |   10649 |  12276 |   4234 |   -14.7% |  -12.1% |    -0.9% |
+----------+---------+--------+---------+--------+--------+----------+---------+----------+
DaCapo
+------------+----------+---------+--------+---------+--------+--------+----------+---------+----------+
| benchmark  | 116M NPS | 116M PS | 8M NNS | 48M NPS | 48M PS | 8M NNS | diff NPS | diff PS | diff NNS |
+------------+----------+---------+--------+---------+--------+--------+----------+---------+----------+
| avrora     |     2301 |    6324 |   4167 |    1887 |   5049 |   4080 |  -18.00% | -20.20% |   -2.10% |
| batik      |     6108 |    5301 |   4128 |    4686 |   4289 |   4114 |  -23.30% | -19.10% |   -0.30% |
| biojava    |     2018 |    5907 |   4047 |    1703 |   5364 |   4026 |  -15.60% |  -9.20% |   -0.50% |
| eclipse    |    30862 |   26824 |   4275 |   27314 |  24330 |   4180 |  -11.50% |  -9.30% |   -2.20% |
| jme        |     1567 |    5987 |   3502 |    1315 |   5205 |   3491 |  -16.10% | -13.10% |   -0.30% |
| lusearch   |     5424 |    9145 |   4201 |    4699 |   7147 |   4100 |  -13.40% | -21.90% |   -2.40% |
| pmd        |    12011 |   14438 |   4232 |   10701 |  12456 |   4140 |  -10.90% | -13.70% |   -2.20% |
| sunflow    |     1707 |    4341 |   4082 |    1220 |   3174 |   4040 |  -28.60% | -26.90% |   -1.00% |
| tomcat     |    15228 |   23595 |   4292 |   13519 |  20686 |   4187 |  -11.20% | -12.30% |   -2.50% |
| graphchi   |     1243 |    5238 |   4009 |    1063 |   4375 |   3998 |  -14.50% | -16.50% |   -0.30% |
| xalan      |     5270 |    8363 |   4191 |    4784 |   6643 |   4100 |   -9.20% | -20.60% |   -2.20% |
| fop        |    11597 |   20814 |   4336 |   10361 |  18485 |   4256 |  -10.70% | -11.20% |   -1.80% |
| luindex    |     4013 |    5531 |   3697 |    3083 |   4384 |   3507 |  -23.20% | -20.70% |   -5.20% |
| zxing      |     4577 |    7267 |   4255 |    4044 |   5820 |   4164 |  -11.60% | -19.90% |   -2.10% |
| tradebeans |    10313 |   26983 |   4603 |    9210 |  24954 |   4522 |  -10.70% |  -7.50% |   -1.80% |
| tradesoap  |    16939 |   35276 |   4649 |   15245 |  30888 |   4549 |  -10.00% | -12.40% |   -2.10% |
+------------+----------+---------+--------+---------+--------+--------+----------+---------+----------+
Renaissance
+------------------+----------+---------+--------+---------+--------+--------+----------+---------+----------+
| benchmark        | 116M NPS | 116M PS | 8M NNS | 48M NPS | 48M PS | 8M NNS | diff NPS | diff PS | diff NNS |
+------------------+----------+---------+--------+---------+--------+--------+----------+---------+----------+
| akka-uct         |     4053 |    9615 |   3661 |    3001 |   8381 |   3559 |  -26.00% | -12.80% |   -2.80% |
| als              |    20732 |   39367 |   4554 |   18914 |  32400 |   4464 |   -8.80% | -17.70% |   -2.00% |
| chi-square       |     7922 |   23568 |   3828 |    7160 |  20603 |   3759 |   -9.60% | -12.60% |   -1.80% |
| dec-tree         |    23938 |   55512 |   4026 |   21857 |  36866 |   3946 |   -8.70% | -33.60% |   -2.00% |
| dotty            |    42405 |   40963 |   3712 |   37997 |  32770 |   3621 |  -10.40% | -20.00% |   -2.50% |
| finagle-chirper  |    21150 |   19833 |   3795 |   18652 |  17479 |   3693 |  -11.80% | -11.90% |   -2.70% |
| finagle-http     |    11950 |   19553 |   3778 |   10675 |  17234 |   3709 |  -10.70% | -11.90% |   -1.80% |
| fj-kmeans        |      960 |    4756 |   3504 |     882 |   4437 |   3484 |   -8.10% |  -6.70% |   -0.60% |
| future-genetic   |     1760 |    5470 |   3526 |    1466 |   4449 |   3497 |  -16.70% | -18.70% |   -0.80% |
| gauss-mix        |    11910 |   21406 |   4459 |   10675 |  18741 |   4382 |  -10.40% | -12.40% |   -1.70% |
| log-regression   |    25230 |   42802 |   4108 |   22791 |  34542 |   3989 |   -9.70% | -19.30% |   -2.90% |
| mnemonics        |     1094 |    3914 |   3501 |    1010 |   3669 |   3480 |   -7.70% |  -6.30% |   -0.60% |
| movie-lens       |    20571 |   23472 |   4495 |   18500 |  20728 |   4424 |  -10.10% | -11.70% |   -1.60% |
| naive-bayes      |    24305 |   45967 |   4030 |   22124 |  35135 |   3929 |   -9.00% | -23.60% |   -2.50% |
| page-rank        |     9386 |   24226 |   3817 |    8554 |  22081 |   3769 |   -8.90% |  -8.90% |   -1.30% |
| par-mnemonics    |     1217 |    4318 |   3501 |    1128 |   4098 |   3477 |   -7.30% |  -5.10% |   -0.70% |
| philosophers     |     2647 |    5765 |   3571 |    2146 |   4293 |   3506 |  -18.90% | -25.50% |   -1.80% |
| reactors         |     2663 |    5266 |   3632 |    2278 |   4321 |   3513 |  -14.50% | -17.90% |   -3.30% |
| rx-scrabble      |     2511 |    6721 |   3535 |    2131 |   5037 |   3506 |  -15.10% | -25.10% |   -0.80% |
| scala-doku       |     2106 |    6408 |   3522 |    1775 |   4744 |   3500 |  -15.70% | -26.00% |   -0.60% |
| scala-kmeans     |     1104 |    4634 |   3497 |    1002 |   4345 |   3481 |   -9.20% |  -6.20% |   -0.50% |
| scala-stm-bench7 |     3492 |    6611 |   3601 |    3158 |   5302 |   3509 |   -9.60% | -19.80% |   -2.60% |
| scrabble         |     1816 |    6046 |   3546 |    1460 |   4902 |   3496 |  -19.60% | -18.90% |   -1.40% |
+------------------+----------+---------+--------+---------+--------+--------+----------+---------+----------+
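The diff columns are relative changes of the 48M configuration against the 116M one. A minimal sketch, assuming diff = (new - old) / old, reproduces the SpecJbb row from its averaged values; diff_percent is a hypothetical helper name.

```python
def diff_percent(baseline_kb: float, experiment_kb: float) -> float:
    """Relative change of the 48M configuration versus the 116M one, in percent."""
    return (experiment_kb - baseline_kb) / baseline_kb * 100.0


# SpecJbb row: (116M value, 48M value) in KB for each segment.
specjbb = {"NPS": (12491, 10649), "PS": (13968, 12276), "NNS": (4274, 4234)}

for segment, (baseline, experiment) in specjbb.items():
    print(f"{segment}: {diff_percent(baseline, experiment):.1f}%")
```

This prints -14.7%, -12.1% and -0.9%, matching the SpecJbb table.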
[1] CodeCache usage data from:
- The latest versions of the SpecJbb, DaCapo and Renaissance benchmarks.
- An internal service with 15000+ compiled Java methods that runs without compilation issues both with a 64M CodeCache (TieredCompilation off) and with a 127M segmented CodeCache.
- A recommendation to use 64M CodeCache (TieredCompilation off) to improve performance (https://github.com/aws/aws-graviton-getting-started/blob/main/java.md).
- IDEs such as IntelliJ and CLion can use more than 130M, but they don't rely on the default values.
More information about the hotspot-compiler-dev mailing list