[External] : RFC: improving NMethod code locality in CodeCache
Schmidt, Lutz
lutz.schmidt at sap.com
Mon Nov 29 12:19:09 UTC 2021
Hi,
A few thoughts immediately came to mind when reading Evgeny's RFC and Tobias' comments. If my comments seem influenced by s390x, that may well be: it's the architecture I know best.
- The biggest concern I have relates to pc-relative addressing.
o nmethod constants are currently located next to the instruction section.
Putting them into a separately allocated area may move them out of pc-relative addressing range.
s390x limit: +/- 4GB, no fallback implemented.
o relative branches are either
+ short distance, mostly intra-nmethod
+ long distance, mostly inter-nmethod
+ not possible in general, e.g., runtime calls
The branch shortening optimization (in shorten_branches) might be applicable less often.
One example would be if stub code is moved to a separately allocated area.
- When considering performance, it is beneficial to keep (frequently) patched
data separate from the instruction stream.
s390x: never modify data in a cache line from which instructions are fetched.
That will kill your performance big time.
- I'm not a branch prediction expert. Instruction stream compactness may have an
influence if the prediction engine remembers not only the branch direction but
also the (limited-length) branch distance.
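To make the first two points concrete, here is a minimal sketch of two placement checks. The helper names are hypothetical, not HotSpot functions. It assumes that s390x relative-long instructions (e.g. LARL, BRASL) encode a signed 32-bit offset counted in halfwords, which gives the +/- 4GB reach mentioned above, and it assumes 256-byte cache lines.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical placement checks, for illustration only (not HotSpot code).
// Assumptions: relative-long instructions encode a signed 32-bit halfword
// offset, and cache lines are 256 bytes.

// Is `target` reachable from `pc` with a relative-long instruction?
// Reachable byte range: [-2^32, 2^32 - 2], halfword aligned.
constexpr bool fits_pc_relative_long(std::uint64_t pc, std::uint64_t target) {
  const std::int64_t disp = static_cast<std::int64_t>(target - pc);
  if (disp % 2 != 0) return false;  // offsets are counted in halfwords
  const std::int64_t halfwords = disp / 2;
  return halfwords >= std::numeric_limits<std::int32_t>::min() &&
         halfwords <= std::numeric_limits<std::int32_t>::max();
}

constexpr std::uint64_t kCacheLine = 256;  // assumed line size

// Do [code_begin, code_end) and [data_begin, data_begin + size) touch a
// common cache line? If so, patching the data modifies a line that
// instructions are fetched from.
constexpr bool shares_cache_line(std::uint64_t code_begin, std::uint64_t code_end,
                                 std::uint64_t data_begin, std::uint64_t size) {
  if (size == 0 || code_end <= code_begin) return false;
  const std::uint64_t first_code = code_begin / kCacheLine;
  const std::uint64_t last_code = (code_end - 1) / kCacheLine;
  const std::uint64_t first_data = data_begin / kCacheLine;
  const std::uint64_t last_data = (data_begin + size - 1) / kCacheLine;
  return first_data <= last_code && first_code <= last_data;
}

// Round up to the next cache-line boundary: patchable data placed here never
// shares a line with instructions that end at or before `addr`.
constexpr std::uint64_t align_to_cache_line(std::uint64_t addr) {
  return (addr + kCacheLine - 1) & ~(kCacheLine - 1);
}
```

A constants area allocated more than 4GB away from the code would fail the first check, and aligning patchable data to a line boundary is one simple way to pass the second.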
Thanks,
Lutz
On 29.11.21, 10:03, "hotspot-dev on behalf of Tobias Hartmann" <hotspot-dev-retn at openjdk.java.net on behalf of tobias.hartmann at oracle.com> wrote:
Hi Evgeny,
Thanks for sharing these results and starting the discussion.
Some comments below.
On 23.11.21 18:34, Astigeevich, Evgeny wrote:
> We have cases where the CodeCache contains more than 15,000 compiled methods. In these cases, we saw a negative performance effect. The hot executable code is not contiguous, so branch prediction hardware can become overloaded.
Is it really a problem with branch prediction or more with instruction caching? With the current
implementation, the hot instructions of a single nmethod are already contiguous but different
nmethods might be located far away (and there's lots of metadata in-between). (Re-)moving the
metadata will improve locality but does that really have an effect on branch prediction?
Did you gather some numbers via hardware performance counters (iCache, ITLB, branch prediction misses)?
> The data show that, due to intervening non-executable data in NMethods, executable code is sparse in the CodeCache. The data also show that the largest contributors of non-executable data are the header and scopes sections. The arm64 and x86_64 results look consistent except for the stub code: on arm64, the stub code is 4-5 times bigger.
>
> We’d like to have an option to configure the CodeCache to support C2 nmethods with separated executable code and non-executable data.
It would definitely be nice to have this as an option (rather than replacing the current
implementation) but I wonder how feasible it is. There is lots of code that depends on the current
layout and we would need to make all of that dependent on a flag.
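The sparseness Evgeny describes can be expressed as a per-nmethod density: the share of an nmethod's footprint that is executable. The sketch below is a hypothetical model; the struct and its fields are illustrative groupings of the sections named in the RFC, not HotSpot's nmethod layout.

```cpp
#include <cstddef>

// Hypothetical per-nmethod section sizes in bytes (illustrative grouping,
// not HotSpot's nmethod fields).
struct NMethodSizes {
  std::size_t header;
  std::size_t main_code;
  std::size_t stub_code;
  std::size_t scopes;      // scopes data and scopes pcs
  std::size_t other_data;  // constants, relocations, oops, metadata, ...
};

inline std::size_t total_size(const NMethodSizes& s) {
  return s.header + s.main_code + s.stub_code + s.scopes + s.other_data;
}

// Share of the footprint that is executable (main plus stub code).
inline double executable_density(const NMethodSizes& s) {
  const std::size_t total = total_size(s);
  if (total == 0) return 0.0;
  return static_cast<double>(s.main_code + s.stub_code) / static_cast<double>(total);
}

// Share of the footprint that is main code only (the hot path).
inline double main_code_density(const NMethodSizes& s) {
  const std::size_t total = total_size(s);
  if (total == 0) return 0.0;
  return static_cast<double>(s.main_code) / static_cast<double>(total);
}
```

Averaged over all nmethods in the CodeCache, a low main-code density means the hot instructions are spread across many more cache lines and pages than their size alone would require.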
> According to the fixed JDK-8152664 (https://bugs.openjdk.java.net/browse/JDK-8152664) “Support non-continuous CodeBlobs in HotSpot”, NMethod sections can be located in different places of memory. The corresponding discussion: https://mail.openjdk.java.net/pipermail/hotspot-dev/2016-April/022500.html. Separating code will complicate maintenance of the CodeCache: several memory areas per nmethod need to be allocated and released.
Ever since I finished the implementation of the Segmented Code Cache
(https://openjdk.java.net/jeps/197), I wanted to work on this but never got to it. I think that the
additional complexity in the code cache is worth it but of course that has to be proven by a
performance evaluation.
For reference, here's my old thesis and the paper we published back then:
http://cr.openjdk.java.net/~thartmann/papers/2014-Code_Cache_Optimizations-thesis.pdf
http://cr.openjdk.java.net/~thartmann/papers/2014-PPPJ-Efficient_Code_Cache_Management.pdf
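Part of the added complexity of split nmethods is lifetime management: the code part and the data part have to be allocated together and released together. A minimal sketch, assuming two independent pools; Pool, SplitNMethodMemory, allocate_split and release_split are hypothetical names standing in for a code heap and a data area, and malloc/free are used only to keep the example self-contained.

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical fixed-capacity pool (e.g. a code heap or a data area).
// malloc/free keep the sketch runnable; this is not how CodeCache memory
// is managed.
struct Pool {
  std::size_t capacity;
  std::size_t used = 0;

  explicit Pool(std::size_t cap) : capacity(cap) {}

  void* allocate(std::size_t size) {
    if (size > capacity - used) return nullptr;  // pool exhausted
    void* p = std::malloc(size == 0 ? 1 : size);
    if (p != nullptr) used += size;
    return p;
  }

  void release(void* p, std::size_t size) {
    if (p == nullptr) return;
    std::free(p);
    used -= size;
  }
};

// The two pieces of a split nmethod: executable code and everything else.
struct SplitNMethodMemory {
  void* code = nullptr;
  void* data = nullptr;
  std::size_t code_size = 0;
  std::size_t data_size = 0;
};

// All-or-nothing allocation: if the data part cannot be allocated, the code
// part is rolled back so no half-allocated nmethod is left behind.
inline bool allocate_split(Pool& code_pool, Pool& data_pool, std::size_t code_size,
                           std::size_t data_size, SplitNMethodMemory& out) {
  void* code = code_pool.allocate(code_size);
  if (code == nullptr) return false;
  void* data = data_pool.allocate(data_size);
  if (data == nullptr) {
    code_pool.release(code, code_size);
    return false;
  }
  out = SplitNMethodMemory{code, data, code_size, data_size};
  return true;
}

// Both parts are released together when the nmethod is freed.
inline void release_split(Pool& code_pool, Pool& data_pool, SplitNMethodMemory& m) {
  code_pool.release(m.code, m.code_size);
  data_pool.release(m.data, m.data_size);
  m = SplitNMethodMemory{};
}
```

The rollback path is the part a contiguous layout never needs: with one blob per nmethod, a single allocation either succeeds or fails.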
> There is JDK-7072317 “move metadata from CodeCache” (https://bugs.openjdk.java.net/browse/JDK-7072317) under which the implementation work can be done.
Yes, that makes sense.
> There can be different approaches for the implementation:
>
> 1. What to separate:
> a. All code (main plus stub) from other sections.
> b. Or only the main code, because this is where an application should spend most of its time.
> c. Or the header and scope sections.
I would say that from a performance perspective, only the main code matters because the stubs are
used for slow paths. If it simplifies prototyping, I would go with b) first.
> 2. Where to put:
> a. Different segments for code and nmethod data. This will require updating NMethod because it locates the code and stub sections via code_offset and stub_offset relative to header_begin.
> b. The same segment but in a different part (e.g., code grows from lower addresses upwards and metadata from high addresses downwards). This might allow NMethod to keep using code_offset and stub_offset.
> c. Or in a completely different place (C-heap, Metaspace,...)
It depends on what we want to improve: (i) Code locality in the same nmethod or (ii) code locality
between different nmethods.
Solution b) would only improve code locality in the same nmethod but the overall layout of
executable code in the code cache would still be sparse.
I think c) would be the ideal solution: The code cache would only contain executable code and all
the metadata would be somewhere else. But solution a) would lead to the same layout and might be
easier to implement.
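Why option a) requires touching NMethod can be seen with a simplified model: code addresses are currently derived from header_begin, so once code and data live in different segments, the code-side sections need a base of their own. The structs below are hypothetical; only header_begin, code_offset and stub_offset echo names from the RFC, while scopes_offset and code_base are illustrative.

```cpp
#include <cstdint>

// Simplified models, not HotSpot's nmethod.

// Contiguous layout: one blob, every section is header_begin + offset.
struct ContiguousLayout {
  std::uintptr_t header_begin;
  std::uint32_t code_offset;
  std::uint32_t stub_offset;
  std::uint32_t scopes_offset;

  std::uintptr_t code_begin() const { return header_begin + code_offset; }
  std::uintptr_t stub_begin() const { return header_begin + stub_offset; }
  std::uintptr_t scopes_begin() const { return header_begin + scopes_offset; }
};

// Split layout (option a): header and data stay in a data segment while code
// moves to a code segment, so code-side sections need their own base.
struct SplitLayout {
  std::uintptr_t header_begin;  // in the data segment
  std::uintptr_t code_base;     // in the code segment
  std::uint32_t stub_offset;    // now relative to code_base
  std::uint32_t scopes_offset;  // still relative to header_begin

  std::uintptr_t code_begin() const { return code_base; }
  std::uintptr_t stub_begin() const { return code_base + stub_offset; }
  std::uintptr_t scopes_begin() const { return header_begin + scopes_offset; }
};
```

Every accessor that today derives a code address from header_begin would need the same treatment, which is the NMethod update the RFC anticipates for option a).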
> It needs to be investigated if the separation of sections which are frequently accessed during the normal execution of the code (e.g., oop section) affects the performance negatively. We might need to change NMethodSweeper to preserve the code locality property.
Yes, that is a concern. A thorough performance evaluation is required.
> We would like to get feedback on the above approaches (or something different) before implementing JDK-7072317.
Hope that helps. I'm curious what others think.
Best regards,
Tobias