RFR: 8350852: Implement JMH benchmark for sparse CodeCache

Boris Ulasevich bulasevich at openjdk.org
Wed Mar 19 16:58:08 UTC 2025


On Thu, 27 Feb 2025 22:23:23 GMT, Evgeny Astigeevich <eastigeevich at openjdk.org> wrote:

> This benchmark is used to check the performance impact of the code cache being sparse.
> 
> We use the C2 compiler to compile the same Java method multiple times to produce as much code as needed. The Java method is not trivial: it adds two 40-digit positive integers. These compiled methods represent the active methods in the code cache. We split the active methods into groups and put each group into a fixed-size code region, aligned by its size. The code cache becomes sparse when the code regions are not fully filled. We measure the time taken to call all active methods.
> 
> Results: code region size 2M (2097152) bytes
> - Intel Xeon Platinum 8259CL
> 
> |activeMethodCount	|groupCount	|Methods/Group	|Score	|Error	|Units	|Diff	|
> |---	|---	|---	|---	|---	|---	|---	|
> |128	|1	|128	|19.577	|0.619	|us/op	|	|
> |128	|32	|4	|22.968	|0.314	|us/op	|17.30%	|
> |128	|48	|3	|22.245	|0.388	|us/op	|13.60%	|
> |128	|64	|2	|23.874	|0.84	|us/op	|21.90%	|
> |128	|80	|2	|23.786	|0.231	|us/op	|21.50%	|
> |128	|96	|1	|26.224	|1.16	|us/op	|34%	|
> |128	|112	|1	|27.028	|0.461	|us/op	|38.10%	|
> |256	|1	|256	|47.43	|1.146	|us/op	|	|
> |256	|32	|8	|63.962	|1.671	|us/op	|34.90%	|
> |256	|48	|5	|63.396	|0.247	|us/op	|33.70%	|
> |256	|64	|4	|66.604	|2.286	|us/op	|40.40%	|
> |256	|80	|3	|59.746	|1.273	|us/op	|26%	|
> |256	|96	|3	|63.836	|1.034	|us/op	|34.60%	|
> |256	|112	|2	|63.538	|1.814	|us/op	|34%	|
> |512	|1	|512	|172.731	|4.409	|us/op	|	|
> |512	|32	|16	|206.772	|6.229	|us/op	|19.70%	|
> |512	|48	|11	|215.275	|2.228	|us/op	|24.60%	|
> |512	|64	|8	|212.962	|2.028	|us/op	|23.30%	|
> |512	|80	|6	|201.335	|12.519	|us/op	|16.60%	|
> |512	|96	|5	|198.133	|6.502	|us/op	|14.70%	|
> |512	|112	|5	|193.739	|3.812	|us/op	|12.20%	|
> |768	|1	|768	|325.154	|5.048	|us/op	|	|
> |768	|32	|24	|346.298	|20.196	|us/op	|6.50%	|
> |768	|48	|16	|350.746	|2.931	|us/op	|7.90%	|
> |768	|64	|12	|339.445	|7.927	|us/op	|4.40%	|
> |768	|80	|10	|347.408	|7.355	|us/op	|6.80%	|
> |768	|96	|8	|340.983	|3.578	|us/op	|4.90%	|
> |768	|112	|7	|353.949	|2.98	|us/op	|8.90%	|
> |1024	|1	|1024	|368.352	|5.961	|us/op	|	|
> |1024	|32	|32	|463.822	|6.274	|us/op	|25.90%	|
> |1024	|48	|21	|457.674	|15.144	|us/op	|24.20%	|
> |1024	|64	|16	|477.694	|0.986	|us/op	|29.70%	|
> |1024	|80	|13	|484.901	|32.601	|us/op	|31.60%	|
> |1024	|96	|11	|480.8	|27.088	|us/op	|30.50%	|
> |1024	|112	|9	|474.416	|10.053	|us/op	|28.80%	|
> 
> - AArch64 Neoverse N1
> 
> |activeMethodCount	|groupCount	|Methods/Group	|Score	|Error	|Units	|Diff	|
> |---	|---	|---	|---	|---	|---	|---	|
> |128	|1	|128	|25.297	|0.792	|us/op	|	|
> |128	|32	|4	|31.451...

Hi Evgeny,

I ran the benchmark on my machines (s390 and riscv64 are virtual machines, so I do not trust them much). The result is that code sparsity affects performance on Graviton4, s390 and POWER9. On my RISC-V setup sparsity has no visible effect, but performance degrades dramatically as the amount of code increases.
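For readers who have not opened the PR, here is a rough sketch of the general shape of such a JMH benchmark. This is not the code from the PR: the class and method names below are made up, and the machinery that forces C2 compilation of many distinct copies of the method and places them into size-aligned code regions is omitted, since it relies on test infrastructure not shown here. Only the kernel follows the description above: adding two 40-digit positive integers, called once per active method.

```java
// Sketch only; SparseCallSketch and digitAdd are illustrative names,
// not the classes from the PR.
package org.sample;

import java.util.Random;
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.infra.Blackhole;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
@State(Scope.Thread)
public class SparseCallSketch {

    @Param({"128", "256", "512", "768", "1024"})
    int activeMethodCount;

    // The "non-trivial" work: add two 40-digit positive integers held as
    // decimal digit arrays, least significant digit first.
    static int[] digitAdd(int[] a, int[] b) {
        int[] sum = new int[a.length + 1];
        int carry = 0;
        for (int i = 0; i < a.length; i++) {
            int d = a[i] + b[i] + carry;
            sum[i] = d % 10;
            carry = d / 10;
        }
        sum[a.length] = carry;
        return sum;
    }

    int[] x = new int[40];
    int[] y = new int[40];

    @Setup
    public void setup() {
        Random r = new Random(42);
        for (int i = 0; i < 40; i++) {
            x[i] = r.nextInt(10);
            y[i] = r.nextInt(10);
        }
    }

    @Benchmark
    public void callAllActiveMethods(Blackhole bh) {
        // In the real benchmark each iteration dispatches to a distinct
        // compiled copy of the add method, grouped into aligned code
        // regions; here a single copy stands in for all of them.
        for (int i = 0; i < activeMethodCount; i++) {
            bh.consume(digitAdd(x, y));
        }
    }
}
```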

Here is the raw data:

activeMethodCount | groupCount | G4 us/op | G4 ± | s390 us/op | s390 ± | POWER9 us/op | POWER9 ± | riscv64 us/op | riscv64 ±
-- | -- | -- | -- | -- | -- | -- | -- | -- | --
128 | 1 | 11.972 | 0.004 | 21.577 | 0.042 | 27.585 | 0.749 | 108.109 | 0.669
128 | 32 | 13.622 | 0.092 | 24.762 | 0.149 | 34.682 | 2.468 | 107.09 | 0.507
128 | 48 | 13.217 | 0.072 | 25.094 | 0.014 | 35.657 | 0.913 | 108.862 | 0.43
128 | 64 | 13.668 | 0.04 | 25.581 | 0.056 | 34.857 | 0.841 | 109.416 | 0.258
128 | 80 | 13.986 | 0.127 | 25.74 | 0.071 | 36.264 | 0.873 | 110.196 | 0.29
128 | 96 | 14.594 | 0.055 | 26.033 | 0.058 | 36.734 | 0.672 | 111.411 | 0.602
128 | 112 | 14.77 | 0.078 | 27.594 | 0.033 | 36.482 | 1.513 | 112.238 | 0.6
256 | 1 | 23.998 | 0.019 | 45.146 | 0.131 | 68.831 | 1.058 | 224.967 | 0.392
256 | 32 | 26.273 | 0.036 | 52.402 | 0.038 | 71.686 | 4.776 | 217.511 | 1.667
256 | 48 | 26.61 | 0.063 | 52.949 | 0.317 | 70.867 | 2.41 | 220.549 | 0.41
256 | 64 | 26.959 | 0.085 | 53.824 | 0.367 | 72.771 | 1.423 | 220.805 | 0.952
256 | 80 | 27.646 | 0.089 | 53.927 | 1.035 | 73.949 | 2.102 | 220.814 | 0.498
256 | 96 | 27.829 | 0.128 | 54.665 | 0.029 | 75.791 | 3.527 | 222.571 | 0.875
256 | 112 | 28.298 | 0.064 | 53.902 | 0.237 | 75.996 | 3.266 | 224.626 | 1.752
512 | 1 | 48.181 | 0.032 | 88.372 | 0.299 | 147.922 | 7.862 | 487.557 | 1.454
512 | 32 | 53.157 | 0.044 | 108.089 | 0.124 | 151.998 | 3.999 | 462.369 | 0.917
512 | 48 | 55.13 | 0.052 | 109.149 | 0.77 | 160.646 | 28.419 | 456.265 | 1.198
512 | 64 | 56.609 | 0.123 | 110.346 | 0.729 | 158.9 | 16.885 | 464.728 | 3.811
512 | 80 | 57.146 | 0.091 | 110.808 | 0.295 | 157.446 | 11.494 | 454.655 | 4.941
512 | 96 | 59.038 | 0.092 | 111.117 | 0.101 | 154.412 | 5.113 | 465.095 | 1.281
512 | 112 | 60.647 | 0.331 | 110.216 | 0.153 | 155.93 | 9.2 | 489.859 | 0.988
768 | 1 | 77.086 | 0.402 | 139.595 | 0.839 | 191.497 | 6.112 | 1998.335 | 5729.012
768 | 32 | 89.599 | 0.14 | 159.535 | 0.816 | 230.192 | 2.105 | 1663.619 | 5404.593
768 | 48 | 94.312 | 0.33 | 164.865 | 0.493 | 234.917 | 12.344 | 1737.604 | 5687.615
768 | 64 | 94.243 | 0.218 | 166.708 | 0.498 | 234.764 | 10.555 | 1717.1 | 5527.53
768 | 80 | 95.566 | 0.068 | 167.759 | 0.067 | 235.179 | 9.158 | 1732.491 | 5585.148
768 | 96 | 99.435 | 0.323 | 168.27 | 0.201 | 232.356 | 5.571 | 1926.957 | 6162.978
768 | 112 | 105.814 | 0.366 | 167.955 | 0.188 | 234.879 | 4.964 | 1876.117 | 6096.535
1024 | 1 | 110.407 | 1.27 | 198.679 | 1.541 | 251.436 | 14.05 | 6632.683 | 4073.64
1024 | 32 | 137.626 | 1.62 | 215.316 | 0.422 | 290.579 | 8.847 | 6546.788 | 3998.059
1024 | 48 | 141.191 | 0.372 | 216.638 | 1.415 | 295.236 | 21.935 | 6523.087 | 4009.421
1024 | 64 | 141.227 | 0.238 | 218.441 | 1.636 | 299.975 | 5.916 | 6356.841 | 4165.066
1024 | 80 | 148.555 | 0.157 | 220.563 | 0.21 | 298.32 | 11.28 | 6321.32 | 4617.812
1024 | 96 | 155.47 | 0.321 | 218.799 | 0.431 | 298.863 | 18.88 | 6431.995 | 4325.676
1024 | 112 | 158.288 | 0.568 | 219.812 | 0.955 | 290.01 | 8.248 | 6262.742 | 4558.472

And let me post some pictures.

Here is the simplest and most evident one. It says: sparsity matters. We observe approximately a 20% performance degradation on AArch64, s390, and POWER9 when we split the 128 active methods into 32 distant groups.

![image](https://github.com/user-attachments/assets/1c46b5da-6c88-4f57-8955-c82658bb512b)

Here is a broader picture. I normalized the data to align the different platforms and compare the time per single method call. We see that sparsity matters, and that the total amount of code is also important. I do not include my RISC-V results here, as their performance behaves erratically as the amount of code increases; I believe this is not a real effect but a peculiarity of the virtual machine.

![image](https://github.com/user-attachments/assets/f1943df2-0156-4186-b4b0-f0e98e816aba)
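For clarity, the normalization I mean is simply the per-op score divided by the number of active methods, under the assumption that one benchmark op calls every active method exactly once (as the benchmark description says). A trivial sketch with one value from the G4 table above:

```java
// Sketch of the assumed normalization: per-call time = JMH score (us/op)
// divided by activeMethodCount, since one op calls every active method once.
public class NormalizeSketch {
    public static void main(String[] args) {
        double scoreUsPerOp = 11.972;   // G4, activeMethodCount = 128, groupCount = 1
        int activeMethodCount = 128;
        double perCallUs = scoreUsPerOp / activeMethodCount;
        System.out.printf("time per method call: %.4f us%n", perCallUs); // ~0.0935 us
    }
}
```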

-------------

PR Comment: https://git.openjdk.org/jdk/pull/23831#issuecomment-2737391773

