RFC: improving NMethod code locality in CodeCache

Mon Nov 29 09:01:05 UTC 2021

Hi Evgeny,

Thanks for sharing these results and starting the discussion.

Some comments below.

On 23.11.21 18:34, Astigeevich, Evgeny wrote:
> We have cases where the CodeCache contains more than 15,000 compiled methods. In these cases, we saw a negative performance effect. The hot executable code is not contiguous, so branch prediction hardware can become overloaded.

Is it really a problem with branch prediction or more with instruction caching? With the current
implementation, the hot instructions of a single nmethod are already contiguous but different
nmethods might be located far away (and there's lots of metadata in-between). (Re-)moving the
metadata will improve locality but does that really have an effect on branch prediction?

Did you gather some numbers via hardware performance counters (iCache, ITLB, branch prediction misses)?
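
To frame the iCache vs. ITLB question, here is a rough back-of-the-envelope sketch. It is a toy model, not measured data: the page size, cache-line size, nmethod count, and section sizes are all assumed, and the helper names (units_touched, layout) are hypothetical. It counts how many cache lines and pages the main code of the hot nmethods covers when metadata sits between code sections and when the code is packed back to back.

```python
PAGE = 4096   # bytes, assumed page size
LINE = 64     # bytes, assumed cache-line size


def units_touched(ranges, unit):
    """Count distinct unit-sized blocks covered by half-open [start, end) byte ranges."""
    touched = set()
    for start, end in ranges:
        touched.update(range(start // unit, (end - 1) // unit + 1))
    return len(touched)


def layout(code_sizes, meta_sizes, interleaved):
    """Return the [start, end) address range of each nmethod's main code.

    interleaved=True : each code section is preceded by its metadata
                       (a simplification of the current layout).
    interleaved=False: code sections are packed back to back and the
                       metadata is assumed to live somewhere else.
    """
    ranges, addr = [], 0
    for code, meta in zip(code_sizes, meta_sizes):
        if interleaved:
            addr += meta          # skip over the non-executable data
        ranges.append((addr, addr + code))
        addr += code
    return ranges


# Hypothetical workload: 1,000 hot nmethods, each with 512 bytes of
# main code and 1,536 bytes of header/scopes/other metadata.
code = [512] * 1000
meta = [1536] * 1000

mixed = layout(code, meta, interleaved=True)
packed = layout(code, meta, interleaved=False)

lines_mixed = units_touched(mixed, LINE)
lines_packed = units_touched(packed, LINE)
pages_mixed = units_touched(mixed, PAGE)
pages_packed = units_touched(packed, PAGE)

print("cache lines:", lines_mixed, "vs", lines_packed)
print("pages:      ", pages_mixed, "vs", pages_packed)
```

With these made-up sizes the number of cache lines holding hot code is the same in both layouts (8,000), while the number of pages drops from 500 to 125. In this model the benefit of separating metadata shows up in ITLB reach rather than in iCache footprint, which is why the hardware counters would be good to have.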

> The data show that due to intervening non-executable data in NMethods, executable code is sparse in the CodeCache. The data also show that the largest contributors of non-executable data are the header and scopes sections. Arm64 and x86_64 look consistent except for the stub code: on arm64 the stub code is 4-5 times bigger.
> 
> We’d like to have an option to configure the CodeCache to support C2 nmethods with separated executable code and non-executable data.

It would definitely be nice to have this as an option (rather than replacing the current
implementation) but I wonder how feasible it is. There is lots of code that depends on the current
layout and we would need to make all of that dependent on a flag.

> According to the fixed JDK-8152664 (https://bugs.openjdk.java.net/browse/JDK-8152664) “Support non-continuous CodeBlobs in HotSpot”, NMethod sections can be located in different places of memory. The discussion of it: https://mail.openjdk.java.net/pipermail/hotspot-dev/2016-April/022500.html. Separating code will complicate maintenance of the CodeCache. Different parts of memory for a nmethod need to be allocated/released.

Ever since I finished the implementation of the Segmented Code Cache
(https://openjdk.java.net/jeps/197), I wanted to work on this but never got to it. I think that the
additional complexity in the code cache is worth it but of course that has to be proven by a
performance evaluation.

For reference, here's my old thesis and the paper we published back then:
http://cr.openjdk.java.net/~thartmann/papers/2014-Code_Cache_Optimizations-thesis.pdf
http://cr.openjdk.java.net/~thartmann/papers/2014-PPPJ-Efficient_Code_Cache_Management.pdf

> There is JDK-7072317 “move metadata from CodeCache” (https://bugs.openjdk.java.net/browse/JDK-7072317), under which the implementation work can be done.

Yes, that makes sense.

> There can be different approaches for the implementation:
> 
> 1. What to separate:
>     a. All code (main plus stub) from other sections.
>     b. Or only main code because this is the code where an application should spend most of the time.
>     c. Or the header and scope sections.

I would say that from a performance perspective, only the main code matters because the stubs are
used for slow paths. If it simplifies prototyping, I would go with b) first.

> 2. Where to put:
>     a. Different segments for code and nmethod data. This will require updating NMethod because it uses code_offset, stub_offset from header_begin.
>     b. The same segment but in a different part (e.g., code grows from lower addresses upwards and metadata from high addresses downwards). This might allow keeping NMethod using code_offset, stub_offset.
>     c. Or in a completely different place (C-heap, Metaspace,...)

It depends on what we want to improve: (i) code locality within a single nmethod or (ii) code
locality across different nmethods.

Solution b) would only improve code locality within a single nmethod but the overall layout of
executable code in the code cache would still be sparse.

I think c) would be the ideal solution: The code cache would only contain executable code and all
the metadata would be somewhere else. But solution a) would lead to the same layout and might be
easier to implement.
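
To illustrate what the offset-based addressing mentioned in 2a/2b implies, here is a minimal sketch. ToyNMethod is a hypothetical toy class, not HotSpot's nmethod; only the names header_begin, code_offset and stub_offset come from the mail above, and all addresses as well as the 32-bit offset width are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ToyNMethod:
    """Toy model: section addresses are derived from header_begin plus an offset."""
    header_begin: int  # address of the header
    code_offset: int   # signed distance from header_begin to the main code
    stub_offset: int   # signed distance from header_begin to the stub code

    def code_begin(self) -> int:
        return self.header_begin + self.code_offset

    def stub_begin(self) -> int:
        return self.header_begin + self.stub_offset


def fits_in_int32(offset: int) -> bool:
    """True if the offset is representable as a signed 32-bit value (assumed field width)."""
    return -2**31 <= offset < 2**31


# Code directly follows the header (simplified version of the current layout).
today = ToyNMethod(header_begin=0x1000, code_offset=0x100, stub_offset=0x300)

# Code at low addresses, header at a higher address of the same region:
# the accessors still work, but the offsets become negative.
split = ToyNMethod(
    header_begin=0xF000,
    code_offset=0x1000 - 0xF000,
    stub_offset=0x1200 - 0xF000,
)

# Header in some unrelated memory area (made-up address far from the code).
far_offset = 0x1000 - 0x7F00_0000_0000

print(hex(today.code_begin()), hex(split.code_begin()), hex(split.stub_begin()))
print(fits_in_int32(split.code_offset), fits_in_int32(far_offset))
```

In this model the offset scheme survives as long as code and metadata stay close enough for the offset field to represent the distance. Once the metadata moves to an arbitrary location, the offsets would have to be widened or replaced by pointers, which is part of the extra work behind a) and c).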

> It needs to be investigated if the separation of sections which are frequently accessed during the normal execution of the code (e.g., oop section) affects the performance negatively. We might need to change NMethodSweeper to preserve the code locality property.

Yes, that is a concern. A thorough performance evaluation is required.

> We would like to get feedback on the above approaches (or something different) before implementing JDK-7072317.

Hope that helps. I'm curious what others think.

Best regards,
Tobias