RFR: 8350852: Implement JMH benchmark for sparse CodeCache

Wed Mar 19 18:15:08 UTC 2025

On Wed, 19 Mar 2025 16:55:56 GMT, Boris Ulasevich <bulasevich at openjdk.org> wrote:

>> This benchmark checks the performance impact of a sparse code cache.
>> 
>> We use the C2 compiler to compile the same Java method multiple times to produce as much code as needed. The Java method is not trivial: it adds two 40-digit positive integers. These compiled methods represent the active methods in the code cache. We split the active methods into groups and put each group into a fixed-size code region, which is aligned to its size. The CodeCache becomes sparse when code regions are not fully filled. We measure the time taken to call all active methods.
>> 
>> Results: code region size 2M (2097152) bytes
>> - Intel Xeon Platinum 8259CL
>> 
>> |activeMethodCount	|groupCount	|Methods/Group	|Score	|Error	|Units	|Diff	|
>> |---	|---	|---	|---	|---	|---	|---	|
>> |128	|1	|128	|19.577	|0.619	|us/op	|	|
>> |128	|32	|4	|22.968	|0.314	|us/op	|17.30%	|
>> |128	|48	|3	|22.245	|0.388	|us/op	|13.60%	|
>> |128	|64	|2	|23.874	|0.84	|us/op	|21.90%	|
>> |128	|80	|2	|23.786	|0.231	|us/op	|21.50%	|
>> |128	|96	|1	|26.224	|1.16	|us/op	|34%	|
>> |128	|112	|1	|27.028	|0.461	|us/op	|38.10%	|
>> |256	|1	|256	|47.43	|1.146	|us/op	|	|
>> |256	|32	|8	|63.962	|1.671	|us/op	|34.90%	|
>> |256	|48	|5	|63.396	|0.247	|us/op	|33.70%	|
>> |256	|64	|4	|66.604	|2.286	|us/op	|40.40%	|
>> |256	|80	|3	|59.746	|1.273	|us/op	|26%	|
>> |256	|96	|3	|63.836	|1.034	|us/op	|34.60%	|
>> |256	|112	|2	|63.538	|1.814	|us/op	|34%	|
>> |512	|1	|512	|172.731	|4.409	|us/op	|	|
>> |512	|32	|16	|206.772	|6.229	|us/op	|19.70%	|
>> |512	|48	|11	|215.275	|2.228	|us/op	|24.60%	|
>> |512	|64	|8	|212.962	|2.028	|us/op	|23.30%	|
>> |512	|80	|6	|201.335	|12.519	|us/op	|16.60%	|
>> |512	|96	|5	|198.133	|6.502	|us/op	|14.70%	|
>> |512	|112	|5	|193.739	|3.812	|us/op	|12.20%	|
>> |768	|1	|768	|325.154	|5.048	|us/op	|	|
>> |768	|32	|24	|346.298	|20.196	|us/op	|6.50%	|
>> |768	|48	|16	|350.746	|2.931	|us/op	|7.90%	|
>> |768	|64	|12	|339.445	|7.927	|us/op	|4.40%	|
>> |768	|80	|10	|347.408	|7.355	|us/op	|6.80%	|
>> |768	|96	|8	|340.983	|3.578	|us/op	|4.90%	|
>> |768	|112	|7	|353.949	|2.98	|us/op	|8.90%	|
>> |1024	|1	|1024	|368.352	|5.961	|us/op	|	|
>> |1024	|32	|32	|463.822	|6.274	|us/op	|25.90%	|
>> |1024	|48	|21	|457.674	|15.144	|us/op	|24.20%	|
>> |1024	|64	|16	|477.694	|0.986	|us/op	|29.70%	|
>> |1024	|80	|13	|484.901	|32.601	|us/op	|31.60%	|
>> |1024	|96	|11	|480.8	|27.088	|us/op	|30.50%	|
>> |1024	|112	|9	|474.416	|10.053	|us/op	|28.80%	|
>> 
>> - AArch64 Neoverse N1
>> 
>> |activeMethodCount	|groupCount	|Methods/Group	|Score	|Error	|Units	|Diff	|...
>
> Hi Evgeny,
> 
> I ran the benchmark on my machines (s390 and riscv64 are virtual machines, so I do not trust them much). The result is that code sparsity affects performance on Graviton4, s390 and POWER9. RISC-V in my hands does not care about code sparsity, but shows dramatic degradation as the amount of code increases.
> 
> Here is the raw data:
> 
> ..  |   | G4 |   | S390 |   | POWER9 |   | riscv64 |  
> -- | -- | -- | -- | -- | -- | -- | -- | -- | --
> activeMethodCount | groupCount | us/op | ± | us/op | ± | us/op | ± | us/op | ±
> 128 | 1 | 11.972 | 0.004 | 21.577 | 0.042 | 27.585 | 0.749 | 108.109 | 0.669
> 128 | 32 | 13.622 | 0.092 | 24.762 | 0.149 | 34.682 | 2.468 | 107.09 | 0.507
> 128 | 48 | 13.217 | 0.072 | 25.094 | 0.014 | 35.657 | 0.913 | 108.862 | 0.43
> 128 | 64 | 13.668 | 0.04 | 25.581 | 0.056 | 34.857 | 0.841 | 109.416 | 0.258
> 128 | 80 | 13.986 | 0.127 | 25.74 | 0.071 | 36.264 | 0.873 | 110.196 | 0.29
> 128 | 96 | 14.594 | 0.055 | 26.033 | 0.058 | 36.734 | 0.672 | 111.411 | 0.602
> 128 | 112 | 14.77 | 0.078 | 27.594 | 0.033 | 36.482 | 1.513 | 112.238 | 0.6
> 256 | 1 | 23.998 | 0.019 | 45.146 | 0.131 | 68.831 | 1.058 | 224.967 | 0.392
> 256 | 32 | 26.273 | 0.036 | 52.402 | 0.038 | 71.686 | 4.776 | 217.511 | 1.667
> 256 | 48 | 26.61 | 0.063 | 52.949 | 0.317 | 70.867 | 2.41 | 220.549 | 0.41
> 256 | 64 | 26.959 | 0.085 | 53.824 | 0.367 | 72.771 | 1.423 | 220.805 | 0.952
> 256 | 80 | 27.646 | 0.089 | 53.927 | 1.035 | 73.949 | 2.102 | 220.814 | 0.498
> 256 | 96 | 27.829 | 0.128 | 54.665 | 0.029 | 75.791 | 3.527 | 222.571 | 0.875
> 256 | 112 | 28.298 | 0.064 | 53.902 | 0.237 | 75.996 | 3.266 | 224.626 | 1.752
> 512 | 1 | 48.181 | 0.032 | 88.372 | 0.299 | 147.922 | 7.862 | 487.557 | 1.454
> 512 | 32 | 53.157 | 0.044 | 108.089 | 0.124 | 151.998 | 3.999 | 462.369 | 0.917
> 512 | 48 | 55.13 | 0.052 | 109.149 | 0.77 | 160.646 | 28.419 | 456.265 | 1.198
> 512 | 64 | 56.609 | 0.123 | 110.346 | 0.729 | 158.9 | 16.885 | 464.728 | 3.811
> 512 | 80 | 57.146 | 0.091 | 110.808 | 0.295 | 157.446 | 11.494 | 454.655 | 4.941
> 512 | 96 | 59.038 | 0.092 | 111.117 | 0.101 | 154.412 | 5.113 | 465.095 | 1.281
> 512 | 112 | 60.647 | 0.331 | 110.216 | 0.153 | 155.93 | 9.2 | 489.859 | 0.988
> 768 | 1 | 77.086 | 0.402 | 139.595 | 0.839 | 191.497 | 6.112 | 1998.335 | 5729.012
> 768 | 32 | 89.599 | 0.14 | 159.535 | 0.816 | 230.192 | 2.105 | 1663.619 | 5404.593
> 768 | 48 | 94.312 | 0.33 | 164.865 | 0.493 | 234.917 | 12.344 | 1737.604 | 5687.615
> 768 | 64 | 94.243 | 0.218 | 166.708 | 0.498 | 234.764 | 10.555 | 1717.1 | 5527.53
> 768 | 8...

Hi @bulasevich,
Thank you very much for the data. It is very useful.
I am planning to make some changes to the benchmark:
1. Address Vladimir's comments about having nmethods of different sizes.
2. Add calling methods without reflection: static calls, vtable calls, and itable calls.
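
For illustration, the call kinds in item 2 could look roughly like this at the Java level. This is only a sketch: `CallKinds`, `Adder`, `Base`, `Derived`, and the trivial `add` payload are hypothetical names chosen for the example and are not code from the PR. Whether a call site really goes through a vtable or itable stub at run time depends on the JIT, which may devirtualize or inline calls with a single observed receiver type.

```java
import java.lang.reflect.Method;

public class CallKinds {
    // Hypothetical payload; the benchmark's real method adds two 40-digit integers.
    interface Adder {
        int add(int a, int b);
    }

    static class Base {
        int add(int a, int b) { return a + b; }
    }

    static class Derived extends Base implements Adder {
        @Override
        public int add(int a, int b) { return a + b; }
    }

    static int staticAdd(int a, int b) { return a + b; }

    // Static call: compiled to the invokestatic bytecode.
    static int viaStatic(int a, int b) {
        return staticAdd(a, b);
    }

    // Virtual call on a class type: invokevirtual (vtable dispatch unless devirtualized).
    static int viaVirtual(Base receiver, int a, int b) {
        return receiver.add(a, b);
    }

    // Call on an interface type: invokeinterface (itable dispatch unless devirtualized).
    static int viaInterface(Adder receiver, int a, int b) {
        return receiver.add(a, b);
    }

    // Reflective call, shown only for contrast with the direct calls above.
    static int viaReflection(int a, int b) {
        try {
            Method m = CallKinds.class.getDeclaredMethod("staticAdd", int.class, int.class);
            return (Integer) m.invoke(null, a, b);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Derived receiver = new Derived();
        System.out.println(viaStatic(1, 2));
        System.out.println(viaVirtual(receiver, 1, 2));
        System.out.println(viaInterface(receiver, 1, 2));
        System.out.println(viaReflection(1, 2));
    }
}
```

In a benchmark, each of the four wrappers would presumably be measured separately so that the dispatch cost is not mixed with the reflection overhead.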

-------------

PR Comment: https://git.openjdk.org/jdk/pull/23831#issuecomment-2737596648