RFC: improving NMethod code locality in CodeCache

Fri Dec 3 18:44:31 UTC 2021

Regionalized code cache. Excellent idea.

Fwiw, I implemented one as part of a dynamic binary translator (x86 -> sparc) at Sun, see https://old.hotchips.org/wp-content/uploads/hc_archives/hc08/2_Mon/HC8.S2/HC8.2.1.pdf, slide 29. See also my comment on https://bugs.openjdk.java.net/browse/JDK-8015774.

Worked well in the binary translator context. It threw out the oldest code when full, even if the old code was hot, under the assumption that it would be quickly recompiled. We probably don't want to do that though.

Thanks,
Paul

-----Original Message-----
From: hotspot-dev <hotspot-dev-retn at openjdk.java.net> on behalf of Nils Eliasson <nils.eliasson at oracle.com>
Date: Friday, December 3, 2021 at 2:39 AM
To: "hotspot-dev at openjdk.java.net" <hotspot-dev at openjdk.java.net>
Subject: Re: RFC: improving NMethod code locality in CodeCache

Hi Evgeny,

In the context of this it might be a good time to also reconsider the
code cache structure and evaluate other designs that might be a better
fit with separate metadata.

The current code cache has three separate heaps of fixed size for three
different kinds of code blobs - profiled, nonprofiled and adapters.
There is some flexibility that allows the heaps to overflow into the
other heaps in low memory situations.

What is desirable for the future is to keep the separation of different
kinds of data, but have more granularity, and perhaps different
granularity on different platforms. There must be flexibility in the
separation so that different parts can grow on demand. It would also be
nice to be able to support uncommit so that the code cache can shrink.

One such design could be to have a code cache that consists of blocks.
One block is a contiguous part of memory that holds a specific type of
contents, like c2 nmethods. Blocks can be allocated on demand, and
blocks can be adapted to fit nicely with TLB page sizes. An empty block
could be deallocated and uncommitted if desired. On x86, code and stubs
could be kept together, and metadata kept in a separate block. On
AArch64, stubs might have their own block type.

Blocks could also be of different sizes. Adapter blocks might be of a
small size so that less memory is wasted, while profiled code initially
gets a big block.

This scheme would also hopefully avoid some of the sizing problems that
the current code cache has. We wouldn't need to guess the need of a
specific type - it would be allocated on demand. It would be easy to
experiment with different kinds of divisions of data. Just add a new
block type.

This solution would also facilitate more granular locking of the code
cache where allocation or traversal of different blocks can be done
independently.
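
To make the block idea concrete, here is a toy model in Python (not
HotSpot code). The class names, block types and sizes are all
hypothetical, and removing a block from a list stands in for
deallocating and uncommitting it. It only sketches on-demand allocation
per block type under those assumptions.

```python
from dataclasses import dataclass
from enum import Enum, auto


class BlockType(Enum):
    """Hypothetical block kinds; the names are illustrative only."""
    PROFILED = auto()
    C2_NMETHOD = auto()
    ADAPTER = auto()
    METADATA = auto()


@dataclass(eq=False)  # compare blocks by identity, not by field values
class Block:
    """One contiguous region that holds a single kind of content."""
    kind: BlockType
    capacity: int
    top: int = 0    # bump pointer: next free offset in the block
    live: int = 0   # number of allocations not yet freed

    def try_allocate(self, size):
        """Return the offset of the new allocation, or None if full."""
        if size > self.capacity - self.top:
            return None
        offset = self.top
        self.top += size
        self.live += 1
        return offset


class BlockCodeCache:
    """Toy model: blocks are created on demand and dropped when empty."""

    def __init__(self, block_sizes, default_size=4096):
        self.block_sizes = block_sizes    # assumed per-type block size
        self.default_size = default_size
        self.blocks = {kind: [] for kind in BlockType}

    def allocate(self, kind, size):
        """Return (block, offset) for an allocation of the given kind."""
        for block in self.blocks[kind]:
            offset = block.try_allocate(size)
            if offset is not None:
                return block, offset
        # No block of this kind has room: create one on demand.
        wanted = self.block_sizes.get(kind, self.default_size)
        block = Block(kind, max(wanted, size))
        self.blocks[kind].append(block)
        return block, block.try_allocate(size)

    def free(self, block):
        """Release one allocation; an empty block is dropped."""
        block.live -= 1
        if block.live == 0:
            # Stand-in for deallocating and uncommitting the block.
            self.blocks[block.kind].remove(block)


# Small adapter blocks, larger blocks for C2 nmethods (sizes made up).
demo = BlockCodeCache({BlockType.ADAPTER: 64, BlockType.C2_NMETHOD: 1024})
first, _ = demo.allocate(BlockType.ADAPTER, 48)
second, _ = demo.allocate(BlockType.ADAPTER, 48)  # needs a second block
demo.free(first)                                  # first block goes away
```

Because each block type has its own list, a lock per type (or per
block) would be a natural next step, but the sketch does not model
locking.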

What do you think?

Best regards,

Nils


On 2021-11-23 18:34, Astigeevich, Evgeny wrote:
> Hello,
>
> We’d like to discuss a proposal for improving NMethod code locality in CodeCache.
>
> We have cases where the CodeCache contains more than 15,000 compiled methods. In these cases, we saw a negative performance effect. The hot executable code is not contiguous, so branch prediction hardware can become overloaded.
>
> The current NMethod layout is contiguous and consists of the following sections:
> * Header: This is the C++ part of NMethod: class members and other C++ stuff. Its size is ‘sizeof(NMethod)’: 344 bytes on arm64 and 352 bytes on x86_64 in JDK 17.
> * Relocation
> * Constant pool
> * Instructions (main code)
> * Stub code
> * Oops
> * Metadata: Class related metadata
> * Scopes data: Debugging information
> * Scopes pcs: Debugging information
> * Dependencies
> * Handler table: Exception handler table
> * Nul chk table: Implicit Null Pointer exception table
> * Speculations
> * JVMCI data
>
> We collected the section sizes of C2 nmethods in the DaCapo and Renaissance benchmarks on x86_64 and arm64. The C2 nmethods were obtained with ‘-XX:+LogCompilation’.
> Summary of results for jdk17 with tiered compilation:
> * DaCapo:
>      * arm64 (full data: https://github.com/eastig/codecache/blob/master/jdk17/dacapo_c2_sizes_arm64.csv):
> +---------------------+---------+------------+-----------+
> |                     |   min   |   max      |   median  |
> +---------------------+---------+------------+-----------+
> | C2 nmethods         | 152     | 5215       | 916       |
> | Total size - bytes  | 271,576 | 38,367,872 | 4,072,616 |
> +---------------------+---------+------------+-----------+
>
> Proportion of the total size of a section vs C2 nmethods total size
>
> +---------------+-------+-------+--------+
> |    Section    |  min  |  max  | median |
> +---------------+-------+-------+--------+
> | header        | 4.7%  | 19.3% | 8.0%   |
> | consts        | 0.0%  | 0.1%  | 0.0%   |
> | instrs        | 39.7% | 49.7% | 44.5%  |
> | stub code     | 8.9%  | 11.3% | 10.1%  |
> | oops          | 0.2%  | 0.4%  | 0.3%   |
> | metadata      | 2.0%  | 3.0%  | 2.3%   |
> | scopes data   | 12.2% | 18.6% | 15.9%  |
> | scopes pcs    | 7.8%  | 9.0%  | 8.4%   |
> | deps          | 0.3%  | 0.8%  | 0.5%   |
> | handler table | 1.3%  | 3.3%  | 2.1%   |
> | nul_chk table | 1.0%  | 1.6%  | 1.6%   |
> +---------------+-------+-------+--------+
>
>      * x86_64 (full data: https://github.com/eastig/codecache/blob/master/jdk17/dacapo_c2_sizes_x86_64.csv):
> +---------------------+---------+------------+-----------+
> |                     |   min   |   max      |   median  |
> +---------------------+---------+------------+-----------+
> | C2 nmethods         | 155     | 5135       | 889       |
> | Total size - bytes  | 264,800 | 35,026,312 | 3,985,744 |
> +---------------------+---------+------------+-----------+
>
> Proportion of the total size of a section vs C2 nmethods total size
>
> +---------------+-------+-------+--------+
> |    Section    |  min  |  max  | median |
> +---------------+-------+-------+--------+
> | header        | 5.2%  | 20.6% | 8.3%   |
> | consts        | 0.0%  | 0.6%  | 0.1%   |
> | instrs        | 49.2% | 60.7% | 55.3%  |
> | stub code     | 1.1%  | 1.9%  | 1.4%   |
> | oops          | 0.1%  | 0.3%  | 0.2%   |
> | metadata      | 1.6%  | 2.9%  | 2.0%   |
> | scopes data   | 12.2% | 19.6% | 16.8%  |
> | scopes pcs    | 7.8%  | 9.2%  | 8.5%   |
> | deps          | 0.3%  | 0.8%  | 0.5%   |
> | handler table | 1.5%  | 3.5%  | 2.0%   |
> | nul_chk table | 0.9%  | 1.6%  | 1.1%   |
> +---------------+-------+-------+--------+
>
> * Renaissance
>      * arm64 (full data: https://github.com/eastig/codecache/blob/master/jdk17/renaissance_c2_sizes_arm64.csv):
> +---------------------+---------+------------+-----------+
> |                     |   min   |   max      |   median  |
> +---------------------+---------+------------+-----------+
> | C2 nmethods         | 155     | 7447       | 1198      |
> | Total size - bytes  | 366,248 | 52,840,528 | 4,989,392 |
> +---------------------+---------+------------+-----------+
>
> Proportion of the total size of a section vs C2 nmethods total size
>
> +---------------+-------+-------+--------+
> |    Section    |  min  |  max  | median |
> +---------------+-------+-------+--------+
> | header        | 4.8%  | 14.6% | 8.5%   |
> | consts        | 0.0%  | 0.1%  | 0.0%   |
> | instrs        | 35.7% | 45.6% | 42.8%  |
> | stub code     | 8.3%  | 12.0% | 10.1%  |
> | oops          | 0.2%  | 0.6%  | 0.4%   |
> | metadata      | 2.0%  | 4.1%  | 3.0%   |
> | scopes data   | 12.4% | 20.8% | 16.1%  |
> | scopes pcs    | 7.8%  | 8.9%  | 8.4%   |
> | deps          | 0.4%  | 1.0%  | 0.5%   |
> | handler table | 1.2%  | 3.9%  | 2.4%   |
> | nul_chk table | 0.9%  | 1.3%  | 1.1%   |
> +---------------+-------+-------+--------+
>
>      * x86_64 (full data: https://github.com/eastig/codecache/blob/master/jdk17/renaissance_c2_sizes_x86_64.csv):
>
> +---------------------+---------+------------+-----------+
> |                     |   min   |   max      |   median  |
> +---------------------+---------+------------+-----------+
> | C2 nmethods         | 158     | 7242       | 938       |
> | Total size - bytes  | 354,952 | 47,019,560 | 3,791,764 |
> +---------------------+---------+------------+-----------+
>
> Proportion of the total size of a section vs C2 nmethods total size
>
> +---------------+-------+-------+--------+
> |    Section    |  min  |  max  | median |
> +---------------+-------+-------+--------+
> | header        | 5.4%  | 15.7% | 9.7%   |
> | consts        | 0.0%  | 0.1%  | 0.0%   |
> | instrs        | 46.1% | 54.4% | 52.7%  |
> | stub code     | 1.3%  | 1.9%  | 1.4%   |
> | oops          | 0.2%  | 0.5%  | 0.3%   |
> | metadata      | 1.9%  | 3.4%  | 2.6%   |
> | scopes data   | 12.7% | 23.6% | 17.4%  |
> | scopes pcs    | 8.0%  | 9.4%  | 8.6%   |
> | deps          | 0.4%  | 1.0%  | 0.5%   |
> | handler table | 1.3%  | 4.0%  | 2.5%   |
> | nul_chk table | 1.0%  | 1.4%  | 1.2%   |
> +---------------+-------+-------+--------+
>
> The data show that, due to intervening non-executable data in NMethods, executable code is sparse in the CodeCache. The data also show that the largest contributors of non-executable data are the header and scopes sections. Arm64 and x86_64 look consistent except for the stub code: on arm64 the stub code is 4-5 times bigger.
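
As a rough cross-check of the tables above, the median shares can be added up per benchmark suite and platform. This is only an estimate: each median is taken per section, so the sums need not match any single benchmark. The dictionary layout and function names are made up for the illustration; the numbers are copied from the median columns.

```python
# Median section shares (percent of total C2 nmethod size), copied from
# the median columns of the tables in this message.
MEDIAN_SHARES = {
    ("DaCapo", "arm64"): {
        "header": 8.0, "instrs": 44.5, "stub code": 10.1,
        "scopes data": 15.9, "scopes pcs": 8.4,
    },
    ("DaCapo", "x86_64"): {
        "header": 8.3, "instrs": 55.3, "stub code": 1.4,
        "scopes data": 16.8, "scopes pcs": 8.5,
    },
    ("Renaissance", "arm64"): {
        "header": 8.5, "instrs": 42.8, "stub code": 10.1,
        "scopes data": 16.1, "scopes pcs": 8.4,
    },
    ("Renaissance", "x86_64"): {
        "header": 9.7, "instrs": 52.7, "stub code": 1.4,
        "scopes data": 17.4, "scopes pcs": 8.6,
    },
}


def executable_share(shares):
    """Approximate share of nmethod bytes that is executable code."""
    return shares["instrs"] + shares["stub code"]


def header_and_scopes_share(shares):
    """Approximate share taken by the header and both scopes sections."""
    return shares["header"] + shares["scopes data"] + shares["scopes pcs"]


for (suite, arch), shares in MEDIAN_SHARES.items():
    exe = executable_share(shares)
    hdr = header_and_scopes_share(shares)
    print(f"{suite:12} {arch:7} executable ~{exe:.1f}%  header+scopes ~{hdr:.1f}%")
```

By this estimate a bit more than half of the C2 nmethod bytes are executable, and the header plus the two scopes sections account for about a third.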
>
> We’d like to have an option to configure the CodeCache to support C2 nmethods with separated executable code and non-executable data. Since JDK-8152664 (https://bugs.openjdk.java.net/browse/JDK-8152664) “Support non-continuous CodeBlobs in HotSpot” was fixed, NMethod sections can be located in different places of memory. The discussion of it: https://mail.openjdk.java.net/pipermail/hotspot-dev/2016-April/022500.html. Separating code will complicate maintenance of the CodeCache: different parts of memory for an nmethod need to be allocated and released.
>
> The implementation work can be done under JDK-7072317 “move metadata from CodeCache” (https://bugs.openjdk.java.net/browse/JDK-7072317).
>
> There can be different approaches for the implementation:
>
> 1. What to separate:
>      a. All code (main plus stub) from other sections.
>      b. Or only main code because this is the code where an application should spend most of the time.
>      c. Or the header and scope sections.
> 2. Where to put:
>      a. Different segments for code and nmethod data. This will require updating NMethod because it uses code_offset, stub_offset from header_begin.
>      b. The same segment but in a different part (e.g., code grows from lower addresses upwards and metadata from high addresses downwards). This might allow keeping NMethod using code_offset, stub_offset.
>      c. Or in a completely different place (C-heap, Metaspace, ...)
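
Option 2b can be pictured with a toy model in Python (not HotSpot code). The class and field names are hypothetical, offsets are relative to the start of the segment, and freeing is not modelled. The sketch only shows that code stays contiguous at the low end while nmethod data accumulates at the high end.

```python
class TwoEndedSegment:
    """Toy model of option 2b: code grows up, nmethod data grows down."""

    def __init__(self, size):
        self.size = size
        self.code_top = 0         # first free byte above the code area
        self.data_bottom = size   # lowest byte used by the data area

    def free_bytes(self):
        """Bytes left between the code area and the data area."""
        return self.data_bottom - self.code_top

    def allocate(self, code_size, data_size):
        """Return (code_offset, data_offset), or None if the areas would meet."""
        if code_size + data_size > self.free_bytes():
            return None
        code_offset = self.code_top
        self.code_top += code_size      # code is packed from the low end
        self.data_bottom -= data_size   # data is packed from the high end
        return code_offset, self.data_bottom


segment = TwoEndedSegment(1000)
first = segment.allocate(code_size=100, data_size=50)
second = segment.allocate(code_size=200, data_size=100)
# The code of the second nmethod starts right where the first one ends.
print(first, second, segment.free_bytes())
```

Because both parts live in one segment, the distance from an nmethod's data to its code is still a plain offset, which is what might allow code_offset and stub_offset to be kept.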
>
> We need to investigate whether separating sections that are frequently accessed during normal execution of the code (e.g., the oops section) affects performance negatively. We might need to change NMethodSweeper to preserve the code locality property.
>
> We would like to get feedback on the above approaches (or something different) before implementing JDK-7072317.
>
> Comments welcome!
>
> Thanks,
> Evgeny Astigeevich, AWS Corretto Team