RFR: 8350852: Implement JMH benchmark for sparse CodeCache

Tue Mar 4 17:37:40 UTC 2025

This benchmark measures the performance impact of a sparse code cache.

We use the C2 compiler to compile the same Java method multiple times to produce as much code as needed. The Java method is not trivial: it adds two 40-digit positive integers. These compiled methods represent the active methods in the code cache. We split the active methods into groups and put each group into a fixed-size code region, which is aligned to its size. The code cache becomes sparse when code regions are not fully filled. We measure the time taken to call all active methods.
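
The layout described above can be sketched as follows. This is only an illustration, not the benchmark's code: the class and method names are hypothetical, the code cache base address is made up, and the sketch assumes that methods are spread as evenly as possible over the groups, which the description does not state.

```java
// Hypothetical sketch of the group/region layout; not taken from the benchmark source.
final class SparseLayoutSketch {
    // Code region size used for the results in this mail: 2M (2097152 bytes).
    static final long REGION_SIZE = 2L * 1024 * 1024;

    // Round an address up to a multiple of a power-of-two alignment.
    static long alignUp(long address, long alignment) {
        return (address + alignment - 1) & -alignment;
    }

    // Start address of the region holding the given group. Regions are
    // contiguous and each one is aligned to its own size.
    static long regionStart(long codeCacheBase, int groupIndex) {
        return alignUp(codeCacheBase, REGION_SIZE) + groupIndex * REGION_SIZE;
    }

    // Assumption: methods are distributed as evenly as possible, so the
    // first (activeMethodCount % groupCount) groups get one extra method.
    static int methodsInGroup(int activeMethodCount, int groupCount, int groupIndex) {
        int base = activeMethodCount / groupCount;
        int remainder = activeMethodCount % groupCount;
        return base + (groupIndex < remainder ? 1 : 0);
    }

    public static void main(String[] args) {
        long codeCacheBase = 0x7f00_0012_3000L; // made-up address for illustration
        int activeMethodCount = 128;
        int groupCount = 48;
        for (int g = 0; g < groupCount; g++) {
            System.out.printf("group %d: region at 0x%x, %d methods%n",
                    g, regionStart(codeCacheBase, g),
                    methodsInGroup(activeMethodCount, groupCount, g));
        }
    }
}
```

With few methods per group, most of each 2M region stays empty, which is the sparsity the benchmark is meant to exercise.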

Results for a code region size of 2M (2097152 bytes):
- Intel Xeon Platinum 8259CL

|activeMethodCount	|groupCount	|Methods/Group	|Score	|Error	|Units	|Diff	|
|---	|---	|---	|---	|---	|---	|---	|
|128	|1	|128	|19.577	|0.619	|us/op	|	|
|128	|32	|4	|22.968	|0.314	|us/op	|17.30%	|
|128	|48	|3	|22.245	|0.388	|us/op	|13.60%	|
|128	|64	|2	|23.874	|0.84	|us/op	|21.90%	|
|128	|80	|2	|23.786	|0.231	|us/op	|21.50%	|
|128	|96	|1	|26.224	|1.16	|us/op	|34%	|
|128	|112	|1	|27.028	|0.461	|us/op	|38.10%	|
|256	|1	|256	|47.43	|1.146	|us/op	|	|
|256	|32	|8	|63.962	|1.671	|us/op	|34.90%	|
|256	|48	|5	|63.396	|0.247	|us/op	|33.70%	|
|256	|64	|4	|66.604	|2.286	|us/op	|40.40%	|
|256	|80	|3	|59.746	|1.273	|us/op	|26%	|
|256	|96	|3	|63.836	|1.034	|us/op	|34.60%	|
|256	|112	|2	|63.538	|1.814	|us/op	|34%	|
|512	|1	|512	|172.731	|4.409	|us/op	|	|
|512	|32	|16	|206.772	|6.229	|us/op	|19.70%	|
|512	|48	|11	|215.275	|2.228	|us/op	|24.60%	|
|512	|64	|8	|212.962	|2.028	|us/op	|23.30%	|
|512	|80	|6	|201.335	|12.519	|us/op	|16.60%	|
|512	|96	|5	|198.133	|6.502	|us/op	|14.70%	|
|512	|112	|5	|193.739	|3.812	|us/op	|12.20%	|
|768	|1	|768	|325.154	|5.048	|us/op	|	|
|768	|32	|24	|346.298	|20.196	|us/op	|6.50%	|
|768	|48	|16	|350.746	|2.931	|us/op	|7.90%	|
|768	|64	|12	|339.445	|7.927	|us/op	|4.40%	|
|768	|80	|10	|347.408	|7.355	|us/op	|6.80%	|
|768	|96	|8	|340.983	|3.578	|us/op	|4.90%	|
|768	|112	|7	|353.949	|2.98	|us/op	|8.90%	|
|1024	|1	|1024	|368.352	|5.961	|us/op	|	|
|1024	|32	|32	|463.822	|6.274	|us/op	|25.90%	|
|1024	|48	|21	|457.674	|15.144	|us/op	|24.20%	|
|1024	|64	|16	|477.694	|0.986	|us/op	|29.70%	|
|1024	|80	|13	|484.901	|32.601	|us/op	|31.60%	|
|1024	|96	|11	|480.8	|27.088	|us/op	|30.50%	|
|1024	|112	|9	|474.416	|10.053	|us/op	|28.80%	|

- AArch64 Neoverse N1

|activeMethodCount	|groupCount	|Methods/Group	|Score	|Error	|Units	|Diff	|
|---	|---	|---	|---	|---	|---	|---	|
|128	|1	|128	|25.297	|0.792	|us/op	|	|
|128	|32	|4	|31.451	|0.455	|us/op	|24.30%	|
|128	|48	|3	|30.641	|0.663	|us/op	|21.10%	|
|128	|64	|2	|31.742	|0.433	|us/op	|25.50%	|
|128	|80	|2	|31.867	|0.719	|us/op	|26%	|
|128	|96	|1	|32.741	|0.358	|us/op	|29.40%	|
|128	|112	|1	|35.679	|0.638	|us/op	|41%	|
|256	|1	|256	|54.577	|1.478	|us/op	|	|
|256	|32	|8	|69.756	|1.771	|us/op	|27.80%	|
|256	|48	|5	|69.276	|0.317	|us/op	|26.90%	|
|256	|64	|4	|71.583	|2.446	|us/op	|31.20%	|
|256	|80	|3	|74.121	|2.521	|us/op	|35.80%	|
|256	|96	|3	|74.21	|0.632	|us/op	|36%	|
|256	|112	|2	|76.15	|2.681	|us/op	|39.50%	|
|512	|1	|512	|206.98	|1.35	|us/op	|	|
|512	|32	|16	|204.413	|4.111	|us/op	|-1.20%	|
|512	|48	|11	|211.315	|5.066	|us/op	|2.10%	|
|512	|64	|8	|224.012	|2.78	|us/op	|8.20%	|
|512	|80	|6	|209.903	|3.291	|us/op	|1.40%	|
|512	|96	|5	|213.318	|5.401	|us/op	|3.10%	|
|512	|112	|5	|210.134	|6.171	|us/op	|1.50%	|
|768	|1	|768	|354.851	|6.912	|us/op	|	|
|768	|32	|24	|364.047	|12.096	|us/op	|2.60%	|
|768	|48	|16	|381.982	|9.478	|us/op	|7.60%	|
|768	|64	|12	|389.904	|20.204	|us/op	|9.90%	|
|768	|80	|10	|385.125	|11.627	|us/op	|8.50%	|
|768	|96	|8	|377.831	|8.263	|us/op	|6.50%	|
|768	|112	|7	|388.252	|3.558	|us/op	|9.40%	|
|1024	|1	|1024	|430.501	|5.762	|us/op	|	|
|1024	|32	|32	|498.758	|16.065	|us/op	|15.90%	|
|1024	|48	|21	|507.239	|4.676	|us/op	|17.80%	|
|1024	|64	|16	|529.827	|25.531	|us/op	|23.10%	|
|1024	|80	|13	|537.753	|18.643	|us/op	|24.90%	|
|1024	|96	|11	|557.753	|7.804	|us/op	|29.60%	|
|1024	|112	|9	|544.645	|20.507	|us/op	|26.50%	|

With 128 active methods and 112 groups, the code sparsity value (r11c per 1000 instructions) was 1.09. With 128 active methods and 1 group, it was 0.0001. According to https://github.com/aws/aws-graviton-getting-started/blob/main/perfrunbook/debug_hw_perf.md, a value >0.5 indicates that the code being executed by the CPU is very sparse.
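
As a rough illustration of how such a per-1000-instructions value is derived from raw counters, here is a minimal sketch. The class and method names are hypothetical, and the counter values in `main` are invented so that the result matches the 1.09 reported above; they are not measurements.

```java
// Hypothetical sketch: normalizing an event count per 1000 instructions.
final class CodeSparsitySketch {
    // Threshold quoted from the Graviton performance runbook linked above.
    static final double VERY_SPARSE_THRESHOLD = 0.5;

    // Events per 1000 retired instructions.
    static double perKiloInstructions(long eventCount, long instructionCount) {
        return eventCount * 1000.0 / instructionCount;
    }

    static boolean isVerySparse(double eventsPerKiloInstructions) {
        return eventsPerKiloInstructions > VERY_SPARSE_THRESHOLD;
    }

    public static void main(String[] args) {
        long r11cCount = 1_090_000L;          // invented counter value
        long instructions = 1_000_000_000L;   // invented counter value
        double sparsity = perKiloInstructions(r11cCount, instructions);
        System.out.printf("code sparsity = %.2f, very sparse: %b%n",
                sparsity, isVerySparse(sparsity));
    }
}
```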

- AArch64 Neoverse V1

|activeMethodCount	|groupCount	|Methods/Group	|Score	|Error	|Units	|Diff	|
|---	|---	|---	|---	|---	|---	|---	|
|128	|1	|128	|16.356	|0.239	|us/op	|	|
|128	|32	|4	|26.53	|0.71	|us/op	|62.20%	|
|128	|48	|3	|26.501	|1.792	|us/op	|62%	|
|128	|64	|2	|27.727	|1.128	|us/op	|69.50%	|
|128	|80	|2	|27.872	|1.346	|us/op	|70.40%	|
|128	|96	|1	|27.795	|0.958	|us/op	|69.90%	|
|128	|112	|1	|28.315	|0.695	|us/op	|73.10%	|
|256	|1	|256	|55.325	|1.74	|us/op	|	|
|256	|32	|8	|88.366	|2.968	|us/op	|59.70%	|
|256	|48	|5	|93.082	|0.539	|us/op	|68.20%	|
|256	|64	|4	|97.154	|2.865	|us/op	|75.60%	|
|256	|80	|3	|102.005	|5.147	|us/op	|84.40%	|
|256	|96	|3	|99.049	|4.068	|us/op	|79%	|
|256	|112	|2	|101.099	|1.467	|us/op	|82.70%	|
|512	|1	|512	|149.965	|3.813	|us/op	|	|
|512	|32	|16	|191.49	|4.07	|us/op	|27.70%	|
|512	|48	|11	|201.375	|3.384	|us/op	|34.30%	|
|512	|64	|8	|204.789	|3.964	|us/op	|36.60%	|
|512	|80	|6	|203.223	|3.236	|us/op	|35.50%	|
|512	|96	|5	|223.094	|3.022	|us/op	|48.80%	|
|512	|112	|5	|220.352	|3.431	|us/op	|46.90%	|
|768	|1	|768	|266.406	|5.179	|us/op	|	|
|768	|32	|24	|290.236	|10.351	|us/op	|8.90%	|
|768	|48	|16	|293.058	|8.69	|us/op	|10%	|
|768	|64	|12	|297.037	|6.729	|us/op	|11.50%	|
|768	|80	|10	|311.171	|2.136	|us/op	|16.80%	|
|768	|96	|8	|313.311	|5.015	|us/op	|17.60%	|
|768	|112	|7	|316.534	|8.885	|us/op	|18.80%	|
|1024	|1	|1024	|383.712	|3.717	|us/op	|	|
|1024	|32	|32	|379.525	|8.701	|us/op	|-1.10%	|
|1024	|48	|21	|388.86	|12.566	|us/op	|1.30%	|
|1024	|64	|16	|398.676	|13.699	|us/op	|3.90%	|
|1024	|80	|13	|410.646	|1.688	|us/op	|7%	|
|1024	|96	|11	|407.945	|10.952	|us/op	|6.30%	|
|1024	|112	|9	|408.161	|17.233	|us/op	|6.40%	|

On Graviton 3, the worst case (256 active methods and 112 groups, a ~83% regression) had a code sparsity value of 0.6, versus 0.00002 when all 256 methods were in one group.
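
For readers who want to recompute the Diff column in the tables above, here is a small sketch. It assumes that Diff is the relative change of Score against the groupCount=1 row with the same activeMethodCount, which is consistent with the numbers shown. The class and method names are hypothetical. The overlap check treats Score ± Error as an interval; it is only a quick way to spot rows where the difference may be within noise, not a statistical test.

```java
// Hypothetical sketch: recomputing the Diff column from Score values.
final class DiffColumnSketch {
    // Relative change of score against the single-group baseline, in percent.
    static double diffPercent(double baselineScore, double score) {
        return (score - baselineScore) / baselineScore * 100.0;
    }

    // True if [a - aErr, a + aErr] and [b - bErr, b + bErr] overlap.
    static boolean intervalsOverlap(double a, double aErr, double b, double bErr) {
        return Math.abs(a - b) <= aErr + bErr;
    }

    public static void main(String[] args) {
        // Intel Xeon Platinum 8259CL, 128 active methods: 1 group vs 32 groups.
        System.out.printf("Diff = %.1f%%%n", diffPercent(19.577, 22.968));

        // Neoverse V1, 1024 active methods: 1 group vs 32 groups.
        System.out.printf("Diff = %.1f%%, overlap: %b%n",
                diffPercent(383.712, 379.525),
                intervalsOverlap(383.712, 3.717, 379.525, 8.701));
    }
}
```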

-------------

Commit messages:
 - Simplify benchmark code
 - 8350852: Implement JMH benchmark for sparse CodeCache

Changes: https://git.openjdk.org/jdk/pull/23831/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23831&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8350852
  Stats: 311 lines in 1 file changed: 311 ins; 0 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/23831.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/23831/head:pull/23831

PR: https://git.openjdk.org/jdk/pull/23831