RFR: 8350852: Implement JMH benchmark for sparse CodeCache [v2]

Tue Mar 25 02:33:10 UTC 2025

On Thu, 20 Mar 2025 13:46:06 GMT, Evgeny Astigeevich <eastigeevich at openjdk.org> wrote:

>> This benchmark is used to check the performance impact of the code cache being sparse.
>> 
>> We use the C2 compiler to compile the same Java method multiple times to produce as much code as needed. The Java method is not trivial: it adds two 40-digit positive integers. These compiled methods represent the active methods in the code cache. We split the active methods into groups and put each group into a fixed-size code region that is aligned to its size. The CodeCache becomes sparse when code regions are not fully filled. We measure the time taken to call all active methods.
>> 
>> Results: code region size 2M (2097152) bytes
>> - Intel Xeon Platinum 8259CL
>> 
>> |activeMethodCount	|groupCount	|Methods/Group	|Score	|Error	|Units	|Diff	|
>> |---	|---	|---	|---	|---	|---	|---	|
>> |128	|1	|128	|19.577	|0.619	|us/op	|	|
>> |128	|32	|4	|22.968	|0.314	|us/op	|17.30%	|
>> |128	|48	|3	|22.245	|0.388	|us/op	|13.60%	|
>> |128	|64	|2	|23.874	|0.84	|us/op	|21.90%	|
>> |128	|80	|2	|23.786	|0.231	|us/op	|21.50%	|
>> |128	|96	|1	|26.224	|1.16	|us/op	|34%	|
>> |128	|112	|1	|27.028	|0.461	|us/op	|38.10%	|
>> |256	|1	|256	|47.43	|1.146	|us/op	|	|
>> |256	|32	|8	|63.962	|1.671	|us/op	|34.90%	|
>> |256	|48	|5	|63.396	|0.247	|us/op	|33.70%	|
>> |256	|64	|4	|66.604	|2.286	|us/op	|40.40%	|
>> |256	|80	|3	|59.746	|1.273	|us/op	|26%	|
>> |256	|96	|3	|63.836	|1.034	|us/op	|34.60%	|
>> |256	|112	|2	|63.538	|1.814	|us/op	|34%	|
>> |512	|1	|512	|172.731	|4.409	|us/op	|	|
>> |512	|32	|16	|206.772	|6.229	|us/op	|19.70%	|
>> |512	|48	|11	|215.275	|2.228	|us/op	|24.60%	|
>> |512	|64	|8	|212.962	|2.028	|us/op	|23.30%	|
>> |512	|80	|6	|201.335	|12.519	|us/op	|16.60%	|
>> |512	|96	|5	|198.133	|6.502	|us/op	|14.70%	|
>> |512	|112	|5	|193.739	|3.812	|us/op	|12.20%	|
>> |768	|1	|768	|325.154	|5.048	|us/op	|	|
>> |768	|32	|24	|346.298	|20.196	|us/op	|6.50%	|
>> |768	|48	|16	|350.746	|2.931	|us/op	|7.90%	|
>> |768	|64	|12	|339.445	|7.927	|us/op	|4.40%	|
>> |768	|80	|10	|347.408	|7.355	|us/op	|6.80%	|
>> |768	|96	|8	|340.983	|3.578	|us/op	|4.90%	|
>> |768	|112	|7	|353.949	|2.98	|us/op	|8.90%	|
>> |1024	|1	|1024	|368.352	|5.961	|us/op	|	|
>> |1024	|32	|32	|463.822	|6.274	|us/op	|25.90%	|
>> |1024	|48	|21	|457.674	|15.144	|us/op	|24.20%	|
>> |1024	|64	|16	|477.694	|0.986	|us/op	|29.70%	|
>> |1024	|80	|13	|484.901	|32.601	|us/op	|31.60%	|
>> |1024	|96	|11	|480.8	|27.088	|us/op	|30.50%	|
>> |1024	|112	|9	|474.416	|10.053	|us/op	|28.80%	|
>> 
>> - AArch64 Neoverse N1
>> 
>> |activeMethodCount	|groupCount	|Methods/Group	|Score	|Error	|Units	|Diff	|...
>
> Evgeny Astigeevich has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Separate active methods and method calling them with 128Mb dummy space

Here are results for three different implementations of Neoverse V2. All three CPUs show similar performance degradation as sparsity increases (i.e., as groupCount grows), which seems to be a common feature of the Neoverse V2 architecture. In addition, Azure Cobalt degrades more sharply than the other two as the number of active methods increases.

SparseCodeCache results (score in us/op, ± is the reported error):

activeMethodCount | groupCount | G4 us/op | G4 ± | Azure Cobalt us/op | Azure Cobalt ± | Google Axion us/op | Google Axion ±
-- | -- | -- | -- | -- | -- | -- | --
128 | 1 | 11.972 | 0.004 | 11.092 | 0.007 | 11.201 | 0.059
128 | 32 | 13.622 | 0.092 | 15.808 | 0.779 | 11.928 | 0.013
128 | 48 | 13.217 | 0.072 | 15.937 | 0.498 | 12.126 | 0.009
128 | 64 | 13.668 | 0.04 | 16.137 | 0.517 | 12.171 | 0.139
128 | 80 | 13.986 | 0.127 | 17.681 | 0.262 | 12.525 | 0.033
128 | 96 | 14.594 | 0.055 | 18.25 | 0.795 | 12.979 | 0.051
128 | 112 | 14.77 | 0.078 | 18.529 | 1.004 | 13.129 | 0.049
256 | 1 | 23.998 | 0.019 | 22.417 | 0.006 | 22.409 | 0.003
256 | 32 | 26.273 | 0.036 | 33.329 | 0.949 | 25.097 | 0.043
256 | 48 | 26.61 | 0.063 | 34.566 | 0.343 | 24.771 | 0.118
256 | 64 | 26.959 | 0.085 | 35.953 | 0.456 | 24.443 | 0.028
256 | 80 | 27.646 | 0.089 | 38.569 | 4.495 | 25.245 | 0.027
256 | 96 | 27.829 | 0.128 | 37.749 | 0.991 | 25.536 | 0.031
256 | 112 | 28.298 | 0.064 | 40.261 | 0.155 | 25.787 | 0.016
512 | 1 | 48.181 | 0.032 | 68.768 | 0.537 | 44.863 | 0.004
512 | 32 | 53.157 | 0.044 | 94.262 | 2.801 | 50.037 | 0.038
512 | 48 | 55.13 | 0.052 | 106.928 | 3.513 | 54.611 | 0.044
512 | 64 | 56.609 | 0.123 | 103.403 | 0.708 | 53.906 | 0.039
512 | 80 | 57.146 | 0.091 | 112.929 | 2.522 | 52.923 | 0.081
512 | 96 | 59.038 | 0.092 | 141.291 | 2.346 | 56.018 | 0.054
512 | 112 | 60.647 | 0.331 | 137.491 | 11.441 | 56.705 | 0.117
768 | 1 | 77.086 | 0.402 | 138.572 | 2.444 | 68.464 | 0.056
768 | 32 | 89.599 | 0.14 | 159.353 | 4.639 | 94.478 | 1.129
768 | 48 | 94.312 | 0.33 | 177.518 | 1.728 | 99.704 | 0.131
768 | 64 | 94.243 | 0.218 | 182.263 | 2.634 | 90.027 | 0.19
768 | 80 | 95.566 | 0.068 | 185.748 | 32.128 | 96.61 | 0.157
768 | 96 | 99.435 | 0.323 | 195.603 | 13.653 | 102.222 | 0.027
768 | 112 | 105.814 | 0.366 | 216.653 | 1.694 | 103.918 | 0.497
1024 | 1 | 110.407 | 1.27 | 203.428 | 2.049 | 97.032 | 0.739
1024 | 32 | 137.626 | 1.62 | 221.029 | 22.25 | 141.785 | 1.301
1024 | 48 | 141.191 | 0.372 | 233.768 | 5.211 | 146.639 | 2.779
1024 | 64 | 141.227 | 0.238 | 255.31 | 35.069 | 139.287 | 0.376
1024 | 80 | 148.555 | 0.157 | 252.645 | 24.165 | 155.4 | 0.301
1024 | 96 | 155.47 | 0.321 | 272.952 | 3.799 | 162.416 | 3.969
1024 | 112 | 158.288 | 0.568 | 247.452 | 9.267 | 151.082 | 0.204
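
To compare these numbers with the Diff column in the quoted PR description, the slowdown can be expressed relative to the dense case (groupCount = 1) for the same activeMethodCount. The following is a minimal sketch, assuming that Diff is computed as (score - baseline) / baseline; that assumption is consistent with the quoted Xeon row (19.577 vs 22.968 us/op, listed as 17.30%). The class name `SparsityDiff` and the helper `percentDiff` are hypothetical and are not part of the PR; the scores are copied from the G4 column for activeMethodCount = 128.

```java
import java.util.Locale;

// Hypothetical helper, not part of the PR: computes slowdown relative to groupCount = 1.
final class SparsityDiff {

    /** Percentage slowdown of {@code score} relative to {@code baseline} (both in us/op). */
    static double percentDiff(double baseline, double score) {
        return (score - baseline) / baseline * 100.0;
    }

    public static void main(String[] args) {
        // {groupCount, us/op} for G4 with activeMethodCount = 128, copied from the table above.
        double[][] g4 = {
            {1, 11.972}, {32, 13.622}, {48, 13.217}, {64, 13.668},
            {80, 13.986}, {96, 14.594}, {112, 14.770},
        };
        double baseline = g4[0][1]; // dense case: all methods in one group

        for (double[] row : g4) {
            System.out.printf(Locale.ROOT, "groupCount=%3d  %7.3f us/op  %+6.2f%%%n",
                    (int) row[0], row[1], percentDiff(baseline, row[1]));
        }
    }
}
```

The same calculation can be repeated for the Azure Cobalt and Google Axion columns by swapping in their scores; the error columns are ignored here, so small differences between rows should be read with the reported ± values in mind.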

-------------

PR Comment: https://git.openjdk.org/jdk/pull/23831#issuecomment-2749890493