RFC: improving NMethod code locality in CodeCache

Boris Ulasevich boris.ulasevich at bell-sw.com
Thu Dec 23 15:58:25 UTC 2021


Hi Evgeny,

Thank you for sharing the data. It is very detailed and well structured. 
It is indeed interesting that the code itself takes only about half of the 
volume, and sometimes even less. Judging from the numbers, we could 
(theoretically) double the code density. I agree that it is worth doing.

You say [1] that branch prediction hardware can become overloaded when 
there are 15K compiled methods. In your numbers, I see the maximum is 
about 7K C2 methods taking ~50MB (Renaissance benchmark). This is quite a 
load, yes. Also, the aws-graviton-getting-started page [2] recommends a 
CodeCacheSize value of 64M: more than that has a performance impact. 
These cases may also differ in the contents of the code cache: I guess 
the benchmarks use tiered compilation, while [2] uses non-tiered C2.

My questions are:
- What is the typical CodeCache size for real-world applications? Is it 
common for the CodeCache to grow to hundreds of megabytes? Can this be 
simulated with benchmarks?
- I am not sure that branch predictors are often limited to a certain 
amount of memory that is much smaller than the possible size of the code. 
There are now 3 generations of AWS Graviton HW. Do you observe the same 
branch prediction and code cache size effects on all three?
- What does the maximum CodeCache limit mean: is it the distance from the 
first method to the last? Would it help if C2 put the metadata and 
related data on the page right after the instructions page? I mean it is 
worth keeping them not too far from each other.

Besides the code density issue in the case of a limited CodeCache size 
(either a small amount of memory or a limitation of the branch 
predictor), I believe it makes sense to work on the Sweeper so that it 
actively removes cold methods from the CodeCache (see the Hotness Code 
picture on page 65 of [3]). After the virtual machine warms up, the 
compiler threads are idle anyway. In general, a GC-like approach can be 
applied to the CodeCache to keep it clean, small and hot.
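To make the GC-like idea a bit more concrete, here is a minimal sketch of 
a hotness-decay sweep. It assumes a hypothetical per-method counter that 
execution bumps and every sweep halves; methods whose counter falls below 
a threshold are treated as cold and evicted. CompiledMethod, sweep, decay 
and threshold are illustrative names, not the HotSpot NMethodSweeper API.

```python
from dataclasses import dataclass


@dataclass
class CompiledMethod:
    """Hypothetical stand-in for an nmethod: only what the policy needs."""
    name: str
    size: int         # bytes occupied in the code cache
    hotness: int = 0  # hypothetical counter, assumed to be bumped on execution


def sweep(methods, decay=2, threshold=1):
    """One pass of a hypothetical GC-like sweeper.

    Each method's hotness is divided by `decay`; methods that fall below
    `threshold` are considered cold and evicted.
    Returns (methods that stay in the cache, bytes freed).
    """
    kept, freed = [], 0
    for m in methods:
        m.hotness //= decay
        if m.hotness < threshold:
            freed += m.size  # cold: evict and reclaim its space
        else:
            kept.append(m)
    return kept, freed


# Usage: without new activity, "warm" survives one sweep but not two.
cache = [CompiledMethod("hot", 4096, hotness=100),
         CompiledMethod("warm", 2048, hotness=3),
         CompiledMethod("cold", 8192, hotness=1)]
cache, freed = sweep(cache)   # evicts "cold"
cache, freed = sweep(cache)   # evicts "warm"
```

A real policy would also have to keep methods that still have active 
frames on some thread stack, which this sketch ignores.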

thanks,
Boris

[1] https://mail.openjdk.java.net/pipermail/hotspot-dev/2021-November/056198.html
[2] https://github.com/aws/aws-graviton-getting-started/blob/main/java.md
[3] http://cr.openjdk.java.net/~thartmann/papers/2014-Code_Cache_Optimizations-thesis.pdf


On 11/23/2021 8:34 PM, Astigeevich, Evgeny wrote:
> Hello,
>   
> We’d like to discuss a proposal for improving NMethod code locality in CodeCache.
>
> We have cases where the CodeCache contains more than 15,000 compiled methods. In these cases, we saw a negative performance effect. The hot executable code is not contiguous, so branch prediction hardware can become overloaded.
>
> The current NMethod layout is contiguous and consists of the following sections:
> * Header: the C++ part of NMethod (class members and other C++ data). Its size is ‘sizeof(NMethod)’: 344 bytes on JDK 17 arm64 and 352 bytes on x86_64.
> * Relocation
> * Constant pool
> * Instructions (main code)
> * Stub code
> * Oops
> * Metadata: Class related metadata
> * Scopes data: Debugging information
> * Scopes pcs: Debugging information
> * Dependencies
> * Handler table: Exception handler table
> * Nul chk table: Implicit Null Pointer exception table
> * Speculations
> * JVMCI data
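The effect of this contiguous layout on code locality can be sketched by 
computing each section's offset from header_begin as a running sum of the 
section sizes. The sizes below are made up for illustration, and real 
nmethods also align sections, which this sketch ignores. The hypothetical 
helper code_gap counts the non-executable bytes between the end of one 
nmethod's stub code and the start of the next nmethod's instructions when 
nmethods of the same shape sit back to back.

```python
from itertools import accumulate

# Section order as listed above.
SECTIONS = ["header", "relocation", "consts", "insts", "stubs", "oops",
            "metadata", "scopes_data", "scopes_pcs", "dependencies",
            "handler_table", "nul_chk_table", "speculations", "jvmci_data"]


def section_offsets(sizes):
    """Offset of each section from header_begin, plus the total size."""
    ordered = [sizes.get(name, 0) for name in SECTIONS]
    ends = list(accumulate(ordered))
    starts = [0] + ends[:-1]
    return dict(zip(SECTIONS, starts)), ends[-1]


def code_gap(sizes):
    """Non-executable bytes between the stub end of one nmethod and the
    insts start of the next, for identical nmethods laid out back to back."""
    offsets, total = section_offsets(sizes)
    stubs_end = offsets["stubs"] + sizes.get("stubs", 0)
    return (total - stubs_end) + offsets["insts"]


# Hypothetical sizes in bytes (header uses the x86_64 sizeof quoted above).
example = {"header": 352, "relocation": 64, "insts": 1024, "stubs": 32,
           "oops": 16, "metadata": 48, "scopes_data": 384, "scopes_pcs": 192,
           "dependencies": 16, "handler_table": 32, "nul_chk_table": 24}
offsets, total = section_offsets(example)
```

With these made-up numbers the instructions start at offset 416, the 
nmethod is 2184 bytes, and 1128 bytes of data separate the code of one 
nmethod from the code of the next.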
>
> We collected the section sizes of C2 nmethods in the DaCapo and Renaissance benchmarks on x86_64 and arm64. The data were obtained with ‘-XX:+LogCompilation’.
> Summary of results for JDK 17 with tiered compilation:
> * DaCapo:
>      * arm64 (full data: https://github.com/eastig/codecache/blob/master/jdk17/dacapo_c2_sizes_arm64.csv):
> +---------------------+---------+------------+-----------+
> |                     |   min   |   max      |   median  |
> +---------------------+---------+------------+-----------+
> | C2 nmethods         | 152     | 5215       | 916       |
> | Total size (bytes)  | 271,576 | 38,367,872 | 4,072,616 |
> +---------------------+---------+------------+-----------+
>
> Share of each section in the total size of C2 nmethods:
>
> +---------------+-------+-------+--------+
> |    Section    |  min  |  max  | median |
> +---------------+-------+-------+--------+
> | header        | 4.7%  | 19.3% | 8.0%   |
> | consts        | 0.0%  | 0.1%  | 0.0%   |
> | instrs        | 39.7% | 49.7% | 44.5%  |
> | stub code     | 8.9%  | 11.3% | 10.1%  |
> | oops          | 0.2%  | 0.4%  | 0.3%   |
> | metadata      | 2.0%  | 3.0%  | 2.3%   |
> | scopes data   | 12.2% | 18.6% | 15.9%  |
> | scopes pcs    | 7.8%  | 9.0%  | 8.4%   |
> | deps          | 0.3%  | 0.8%  | 0.5%   |
> | handler table | 1.3%  | 3.3%  | 2.1%   |
> | nul_chk table | 1.0%  | 1.6%  | 1.6%   |
> +---------------+-------+-------+--------+
>
>      * x86_64 (full data: https://github.com/eastig/codecache/blob/master/jdk17/dacapo_c2_sizes_x86_64.csv):
> +---------------------+---------+------------+-----------+
> |                     |   min   |   max      |   median  |
> +---------------------+---------+------------+-----------+
> | C2 nmethods         | 155     | 5135       | 889       |
> | Total size (bytes)  | 264,800 | 35,026,312 | 3,985,744 |
> +---------------------+---------+------------+-----------+
>
> Share of each section in the total size of C2 nmethods:
>
> +---------------+-------+-------+--------+
> |    Section    |  min  |  max  | median |
> +---------------+-------+-------+--------+
> | header        | 5.2%  | 20.6% | 8.3%   |
> | consts        | 0.0%  | 0.6%  | 0.1%   |
> | instrs        | 49.2% | 60.7% | 55.3%  |
> | stub code     | 1.1%  | 1.9%  | 1.4%   |
> | oops          | 0.1%  | 0.3%  | 0.2%   |
> | metadata      | 1.6%  | 2.9%  | 2.0%   |
> | scopes data   | 12.2% | 19.6% | 16.8%  |
> | scopes pcs    | 7.8%  | 9.2%  | 8.5%   |
> | deps          | 0.3%  | 0.8%  | 0.5%   |
> | handler table | 1.5%  | 3.5%  | 2.0%   |
> | nul_chk table | 0.9%  | 1.6%  | 1.1%   |
> +---------------+-------+-------+--------+
>
> * Renaissance:
>      * arm64 (full data: https://github.com/eastig/codecache/blob/master/jdk17/renaissance_c2_sizes_arm64.csv):
> +---------------------+---------+------------+-----------+
> |                     |   min   |   max      |   median  |
> +---------------------+---------+------------+-----------+
> | C2 nmethods         | 155     | 7447       | 1198      |
> | Total size (bytes)  | 366,248 | 52,840,528 | 4,989,392 |
> +---------------------+---------+------------+-----------+
>
> Share of each section in the total size of C2 nmethods:
>
> +---------------+-------+-------+--------+
> |    Section    |  min  |  max  | median |
> +---------------+-------+-------+--------+
> | header        | 4.8%  | 14.6% | 8.5%   |
> | consts        | 0.0%  | 0.1%  | 0.0%   |
> | instrs        | 35.7% | 45.6% | 42.8%  |
> | stub code     | 8.3%  | 12.0% | 10.1%  |
> | oops          | 0.2%  | 0.6%  | 0.4%   |
> | metadata      | 2.0%  | 4.1%  | 3.0%   |
> | scopes data   | 12.4% | 20.8% | 16.1%  |
> | scopes pcs    | 7.8%  | 8.9%  | 8.4%   |
> | deps          | 0.4%  | 1.0%  | 0.5%   |
> | handler table | 1.2%  | 3.9%  | 2.4%   |
> | nul_chk table | 0.9%  | 1.3%  | 1.1%   |
> +---------------+-------+-------+--------+
>
>      * x86_64 (full data: https://github.com/eastig/codecache/blob/master/jdk17/renaissance_c2_sizes_x86_64.csv):
> +---------------------+---------+------------+-----------+
> |                     |   min   |   max      |   median  |
> +---------------------+---------+------------+-----------+
> | C2 nmethods         | 158     | 7242       | 938       |
> | Total size (bytes)  | 354,952 | 47,019,560 | 3,791,764 |
> +---------------------+---------+------------+-----------+
>
> Share of each section in the total size of C2 nmethods:
>
> +---------------+-------+-------+--------+
> |    Section    |  min  |  max  | median |
> +---------------+-------+-------+--------+
> | header        | 5.4%  | 15.7% | 9.7%   |
> | consts        | 0.0%  | 0.1%  | 0.0%   |
> | instrs        | 46.1% | 54.4% | 52.7%  |
> | stub code     | 1.3%  | 1.9%  | 1.4%   |
> | oops          | 0.2%  | 0.5%  | 0.3%   |
> | metadata      | 1.9%  | 3.4%  | 2.6%   |
> | scopes data   | 12.7% | 23.6% | 17.4%  |
> | scopes pcs    | 8.0%  | 9.4%  | 8.6%   |
> | deps          | 0.4%  | 1.0%  | 0.5%   |
> | handler table | 1.3%  | 4.0%  | 2.5%   |
> | nul_chk table | 1.0%  | 1.4%  | 1.2%   |
> +---------------+-------+-------+--------+
>
> The data show that, because non-executable data is interleaved with code in nmethods, executable code is sparse in the CodeCache. The data also show that the largest contributors of non-executable data are the header and scopes sections. Arm64 and x86_64 look consistent except for the stub code: on arm64 the stub code is 4-5 times bigger.
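As a rough cross-check of the potential gain, the sketch below takes the 
median shares of instrs and stub code from the tables above and computes 
how much denser the code region would become if only those sections 
stayed in it. This is an approximation: the median of each section's 
share is taken independently, so the medians do not add up to exactly 
100%.

```python
# Median section shares (%) copied from the tables above.
MEDIAN_SHARES = {
    ("DaCapo", "arm64"):       {"instrs": 44.5, "stub": 10.1},
    ("DaCapo", "x86_64"):      {"instrs": 55.3, "stub": 1.4},
    ("Renaissance", "arm64"):  {"instrs": 42.8, "stub": 10.1},
    ("Renaissance", "x86_64"): {"instrs": 52.7, "stub": 1.4},
}


def density_gain(kept_share_percent):
    """Density factor if only `kept_share_percent` of the bytes remain in
    the code region and everything else is moved out."""
    return 100.0 / kept_share_percent


for (bench, arch), share in MEDIAN_SHARES.items():
    main_and_stub = share["instrs"] + share["stub"]  # keep main + stub code
    main_only = share["instrs"]                       # keep main code only
    print(f"{bench:<12}{arch:<8}"
          f"main+stub x{density_gain(main_and_stub):.2f}  "
          f"main only x{density_gain(main_only):.2f}")
```

For DaCapo on arm64 this gives about 1.83x when main plus stub code stay 
together and about 2.25x when only main code stays.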
>
> We’d like to have an option to configure the CodeCache to support C2 nmethods with separated executable code and non-executable data. According to JDK-8152664 “Support non-continuous CodeBlobs in HotSpot” (https://bugs.openjdk.java.net/browse/JDK-8152664), which is fixed, NMethod sections can be located in different places in memory. The discussion of it: https://mail.openjdk.java.net/pipermail/hotspot-dev/2016-April/022500.html. Separating code will complicate maintenance of the CodeCache: different parts of memory for an nmethod need to be allocated and released.
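The extra bookkeeping can be illustrated with a minimal sketch in which an 
nmethod's code and data come from two separate pools. Pool, SplitNMethod, 
allocate_split and release_split are hypothetical names, and the pools 
only count used bytes. The point is that allocation must roll back the 
first part if the second one fails, and release must free both parts.

```python
from dataclasses import dataclass


class Pool:
    """Hypothetical fixed-capacity area (e.g. a code segment or a data area).
    Only the number of used bytes is tracked, not addresses."""

    def __init__(self, name, capacity):
        self.name, self.capacity, self.used = name, capacity, 0

    def allocate(self, size):
        if self.used + size > self.capacity:
            return False
        self.used += size
        return True

    def free(self, size):
        self.used -= size


@dataclass
class SplitNMethod:
    """Hypothetical nmethod whose code and data live in different pools."""
    code_size: int
    data_size: int


def allocate_split(code_pool, data_pool, code_size, data_size):
    """Allocate both parts, or neither."""
    if not code_pool.allocate(code_size):
        return None
    if not data_pool.allocate(data_size):
        code_pool.free(code_size)  # roll back: no half-allocated nmethod
        return None
    return SplitNMethod(code_size, data_size)


def release_split(code_pool, data_pool, nm):
    """Releasing the nmethod has to free both parts."""
    code_pool.free(nm.code_size)
    data_pool.free(nm.data_size)
```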
>
> The implementation work can be done under JDK-7072317 “move metadata from CodeCache” (https://bugs.openjdk.java.net/browse/JDK-7072317).
>
> There are several possible approaches to the implementation:
>
> 1. What to separate:
>      a. All code (main plus stub) from other sections.
>      b. Or only main code, because this is where an application should spend most of its time.
>      c. Or the header and scope sections.
> 2. Where to put:
>      a. Different segments for code and nmethod data. This will require updating NMethod because it locates code and stubs via code_offset and stub_offset relative to header_begin.
>      b. The same segment but in a different part (e.g., code grows from lower addresses upwards and metadata from higher addresses downwards). This might allow NMethod to keep using code_offset and stub_offset.
>      c. Or in a completely different place (C-heap, Metaspace, ...).
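Approach 2b can be sketched as a two-ended bump allocator over one 
segment: code is placed upwards from the low end and data downwards from 
the high end, and allocation fails when the two meet. TwoEndedSegment and 
its methods are hypothetical; addresses are plain integers, and freeing 
and fragmentation are ignored.

```python
class TwoEndedSegment:
    """Code grows up from `base`; data grows down from `base + size`."""

    def __init__(self, base, size):
        self.base = base
        self.code_top = base            # next free code address
        self.data_bottom = base + size  # data occupies [data_bottom, base + size)

    def free_bytes(self):
        return self.data_bottom - self.code_top

    def allocate(self, code_size, data_size):
        """Return (code_addr, data_addr), or None if the two ends would collide."""
        if code_size + data_size > self.free_bytes():
            return None
        code_addr = self.code_top
        self.code_top += code_size
        self.data_bottom -= data_size
        return code_addr, self.data_bottom


# Usage: two nmethods whose code stays contiguous at the low end.
segment = TwoEndedSegment(base=0x10000, size=4096)
first = segment.allocate(code_size=1024, data_size=512)
second = segment.allocate(code_size=1024, data_size=512)
```

Because both parts of every nmethod stay inside one segment, the distance 
between them is bounded by the segment size, which is presumably what 
would let NMethod keep offset-based addressing. Note that if the header 
lives in the data part, the code sits below header_begin, so code_offset 
would become negative.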
>
> We need to investigate whether separating sections that are frequently accessed during normal execution of the code (e.g., the oops section) affects performance negatively. We might need to change NMethodSweeper to preserve the code locality property.
>
> We would like to get feedback on the above approaches (or something different) before implementing JDK-7072317.
>   
> Comments welcome!
>   
> Thanks,
> Evgeny Astigeevich, AWS Corretto Team


More information about the hotspot-dev mailing list