RFD: Grouping hot code in CodeCache

Tue Mar 11 19:23:14 UTC 2025

On 3/10/25 3:55 PM, Astigeevich, Evgeny wrote:
> Hi Vladimir,
> 
>> I don't like manual part of this - providing list of hot methods which
>> should be collocated.
> 
> It looks like I was not clear in my first email and miscommunication happened.
> I am sorry. I provided it to share what we tried and what lessons we learned, especially how complicated it is.
> We have no intent to upstream list-based solutions.

Okay, NP.

But do you still want to continue working on the following RFE and its linked sub-RFEs?

https://bugs.openjdk.org/browse/JDK-8326205

Please clarify which ones you want to upstream.

> 
>> Sometime ago we had concept of Code
> 
> Thanks for sharing. If I remember correctly, it uses deoptimization to remove aging code, which means recompilation.
> 
> BTW, I found https://openjdk.org/jeps/8350338 "Cooperative JFR Sampling".
> I see it has things we want in our implementation.

Yes, it started moving.

Thanks,
Vladimir K

> 
> Thanks,
> Evgeny
> 
> On 08/03/2025, 02:03, "Vladimir Kozlov" <vladimir.kozlov at oracle.com> wrote:
> 
> On 3/7/25 3:47 PM, Astigeevich, Evgeny wrote:
>> Hi Vladimir,
>>
>> Thank you for the feedback.
>>
>>> My concern is that it will complicate VM existing code for not
>>> significant benefits in real production environment.
> 
> 
> To clarify.
> 
> 
> I don't like manual part of this - providing list of hot methods which
> should be collocated.
> 
> 
> I am fine with having a special segment for special C2-compiled code. We
> will have one for some AOT code in Leyden.
> 
> 
> Moving code in CodeCache to make it denser is also fine.
> 
> 
>>
>> I think it won't complicate the existing code:
>> - Adding a code heap is ~50 lines of code, mostly in CodeCache::initialize_heaps.
>> - Relocating nmethods, according to PR[1], is ~300 lines of code.
>> - A grouping thread is simple and isolated. It will go through Java threads checking their last frame(s) and recording seen nmethods. It should have less code than the sweeper which was ~700 lines.
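A minimal Java-level sketch of the sampling idea behind such a grouping thread, not the proposed HotSpot implementation: it looks at the top frame of each thread and counts how often each method is seen. The class name HotFrameSampler and the minHits threshold are hypothetical; a real VM thread would presumably use handshakes and record nmethods rather than method names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: counts methods seen on top of thread stacks.
class HotFrameSampler {
    // Sample counts per method, keyed by "Class.method".
    private final Map<String, Integer> hits = new HashMap<>();

    // Record the top frame of every live thread that currently has a Java frame.
    void sampleOnce() {
        for (StackTraceElement[] stack : Thread.getAllStackTraces().values()) {
            if (stack.length > 0) {
                record(stack[0].getClassName() + "." + stack[0].getMethodName());
            }
        }
    }

    // Count one observation of the given method.
    void record(String method) {
        hits.merge(method, 1, Integer::sum);
    }

    // Methods seen in at least minHits samples, hottest first (ties by name).
    List<String> hotMethods(int minHits) {
        List<String> hot = new ArrayList<>();
        for (Map.Entry<String, Integer> e : hits.entrySet()) {
            if (e.getValue() >= minHits) {
                hot.add(e.getKey());
            }
        }
        hot.sort((a, b) -> {
            int byCount = Integer.compare(hits.get(b), hits.get(a));
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        return hot;
    }

    public static void main(String[] args) throws InterruptedException {
        HotFrameSampler sampler = new HotFrameSampler();
        for (int i = 0; i < 10; i++) {
            sampler.sampleOnce();
            Thread.sleep(10); // sampling period chosen arbitrarily for the demo
        }
        System.out.println(sampler.hotMethods(5));
    }
}
```

The threshold stands in for whatever hotness criterion the grouping thread would apply; candidates above it are the ones that would be relocated to the hot code heap.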
>>
>> I think it's better to wait for PoC to see the complexity.
>>
>>> What improvements your experiments in real production runs shows? And
>>> which JDK version you used for that?
>>
>> In production we internally use JDK 17 (static lists of methods) and JDK 21 (dynamic lists of methods).
>> Improvements are in the range of 5% - 15%. They depend on the CPU load: the higher the CPU load, the bigger the improvement.
> 
> 
> Good.
> 
> 
>>
>>> As you know most of nmethod's metadata is moved from CodeCache.
>>> ...
>>> After that the code will be a lot more compact in CodeCache. Code sparsity
>>> should be less issue then.
>>
>> Yes, removing non-code from nmethod will improve code density. This means in a code region we will have more code vs non-code.
>> CPU instruction caches will like this.
>>
>> As I wrote in a comment on the benchmark PR [2], Neoverse operates in code regions. For Neoverse it's more important to have as few code regions with active nmethods as possible.
>>
>> We are aware of cases when CodeCache usage is between 512M - 1G. The mentioned changes won't help in those cases.
>> If I remember correctly, no public benchmarks have demonstrated improvements from moving non-code away from nmethods.
>>
>> Since the removal of the Sweeper, GC is in charge of cleaning CodeCache. We've seen cases when GC was triggered often because of allocation pressure on CodeCache.
>> For such cases, a recommended workaround is to increase the size of CodeCache from default 240M up to 512M. In such cases actively used nmethods will more likely be sparse.
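The workaround above corresponds to starting the JVM with -XX:ReservedCodeCacheSize=512m. As a rough way to see how full the code heaps are at run time, here is a small sketch using the standard java.lang.management API. It assumes HotSpot's pool naming, where code pools have names starting with "Code" (for example "CodeCache" or "CodeHeap '...'"); other JVMs may report nothing. The class name CodeHeapUsage is hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: reports used bytes for code-related memory pools.
class CodeHeapUsage {
    static Map<String, Long> usedBytes() {
        Map<String, Long> result = new LinkedHashMap<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            String name = pool.getName();
            // Assumption: HotSpot names its code pools "CodeCache" or "CodeHeap '...'".
            if (pool.getType() == MemoryType.NON_HEAP && name.startsWith("Code")) {
                MemoryUsage usage = pool.getUsage();
                if (usage != null) { // null if the pool is no longer valid
                    result.put(name, usage.getUsed());
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        usedBytes().forEach((name, used) ->
                System.out.println(name + ": " + (used / 1024) + " KiB used"));
    }
}
```

This only shows occupancy per code heap. It says nothing about how sparse the actively used nmethods are inside a heap, which is the problem discussed here.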
> 
> 
> Hmm, maybe we should restore counter decay for this case to prevent
> warm methods from being compiled and polluting CodeCache, and so keep it small.
> 
> 
>>
>>> It would be nice if you redo your production experiments after that.
>>
>> Due to the complexity of the customer's application we cannot run it on OpenJDK tip. It has thousands of dependencies, which we would need to move to OpenJDK tip.
>> I think it would be difficult to backport the mentioned changes to 21.
> 
> 
> Understood.
> 
> 
>>
>>> I understand that we can still have sparsity due to "warm" nmethods and
>>> C1 compiled code mixed with "hot" C2 nmethods.
>>
>> Customers having issues with big CodeCache on Graviton usually turn off tiered compilation to reduce far jumps/calls. BTW, this is another argument for identifying active nmethods and grouping them together: it should reduce/eliminate far jumps/calls.
>> With a small CodeCache, a mix of C1 and C2 nmethods is not an issue.
>>
>>> Can we simply use a separate CodeCache's segment for all
>>> C2 "hot" (we can specify frequency flag to determine what "hot" means)
>>> methods regardless when they are compiled.
>>
>> I did not get the idea. We already have the non-profiled segment where C2 code is put. Do you mean that at compilation time some methods are put in the regular non-profiled segment and some in a specific non-profiled segment?
> 
> 
> Yes, I meant separate segments for hot and warm methods, both are c2
> compiled code.
> 
> 
> It would still mix all 3 cases you pointed out, because compilation policy
> is based mostly on what happened during startup. So it may not be a good idea.
> 
> 
>> What we've seen is that method profiles keep changing.
>> There are the following cases:
>> 1. C2 methods used most of the time: their profile can stay the same or can get hotter.
>> 2. C2 methods used periodically: actively used, not used, actively used and so on
>> 3. C2 methods used actively during some time and never used after
>>
>> Currently GC identifies cases #3 and some cases #2, aka cold code. The percentage of methods case #1 is ~10% - 20%.
>> If we have 100M of C2 code, only 10M - 20M will be actively used. If we get unlucky, those 10M-20M could be spread across CodeCache and cause CPU stalls.
>> How can we identify those 10%-20% of methods at compilation time?
> 
> 
> I agree that it will be hard to determine that during compilation.
> We need some statistics after compilation to find such methods.
> 
> 
> Some time ago we had the concept of Code Aging (removed after the Sweeper
> was removed):
> https://github.com/vnkozlov/jdk17u-dev/commit/54db2c2d612c573f91f69b7b387b43a8e1c9d563
> 
> 
> It added a counter on nmethod entry to keep track of whether it is alive.
> We can use something similar to track how frequently an nmethod is used.
> 
> 
> Erik Osterlund also had a prototype in Leyden for call stack profiling by
> the VM itself to find the most used hot methods during a training run.
> 
> 
> Thanks,
> Vladimir.
> 
> 
>>
>> BTW, I think the separate hot code heap might simplify flushing cold code. Everything not in the hot code heap can automatically be assumed cold.
>>
>> Thanks,
>> Evgeny
>>
>> [1]: https://github.com/openjdk/jdk/pull/23573
>> [2]: https://github.com/openjdk/jdk/pull/23831#issuecomment-2705085399
>>
>> On 06/03/2025, 22:41, "Vladimir Kozlov" <vladimir.kozlov at oracle.com> wrote:
>>
>> Hi Evgeny,
>>
>>
>> My concern is that it will complicate VM existing code for not
>> significant benefits in real production environment.
>>
>>
>> What improvements your experiments in real production runs shows? And
>> which JDK version you used for that?
>>
>>
>> As you know most of nmethod's metadata is moved from CodeCache. And
>> Boris Ulasevich will move the final part (relocation info) soon. After
>> that the code will be a lot more compact in CodeCache. Code sparsity
>> should be less issue then.
>>
>>
>> It would be nice if you redo your production experiments after that.
>>
>>
>> I understand that we can still have sparsity due to "warm" nmethods and
>> C1 compiled code mixed with "hot" C2 nmethods. I think compilation
>> policy has heuristic to detect "warm" method (time intervals between
>> invocations). Can we simply use a separate CodeCache's segment for all
>> C2 "hot" (we can specify frequency flag to determine what "hot" means)
>> methods regardless when they are compiled. Then you don't need to create
>> list or do anything special for them. Most likely we will waste more
>> space in CodeCache but it could be conditional under flag which you
>> already proposed in separate segment RFE.
>>
>>
>> Thanks,
>> Vladimir K
>>
>>
>> On 3/5/25 10:41 AM, Astigeevich, Evgeny wrote:
>>> Hi Vladimir,
>>>
>>> This is JDK-8326205: Implement grouping hot nmethods in CodeCache.
>>> As I managed to synthesize a benchmark
>>> (https://github.com/openjdk/jdk/pull/23831) to demonstrate performance
>>> impact of sparse code, I’d like to discuss a possible solution of the
>>> sparse code.
>>>
>>> High level, a solution is:
>>>
>>> * Detect hot code.
>>> * Group hot code.
>>> * Maintain grouped code.
>>>
>>> Downstream we tried two approaches:
>>>
>>> * *Static lists of methods (compile command):* Identify frequently
>>> used (hot) methods using test runs and provide static method lists
>>> to JVM in production. When JVM compiles a Java method and the method
>>> is on the list, JVM puts the code into a designated code heap
>>> (HotCodeHeap).
>>> * *Dynamic lists of methods (compiler directives):* Profile an
>>> application in production and dynamically relocate identified hot
>>> methods to HotCodeHeap. Relocation was implemented with recompilation.
>>>
>>> The main advantage of static lists is zero profiling overhead in
>>> production. We do all profiling and analysis in test runs. Its problems are:
>>>
>>> * *Training Run Accuracy*: We need training runs to have execution
>>> paths closely mimicking production environments. Otherwise we put
>>> wrong methods into HotCodeHeap.
>>> * *Method List Maintenance:* We need to rerun training to regenerate
>>> lists when application code changes. Training runs are expensive and
>>> time-consuming. They require long runs to guarantee we see all major
>>> execution paths. Updating lists in production can be as complex as
>>> application deployment
>>> * *Method Placement Limitations:* Methods marked for HotCodeHeap are
>>> permanently placed into HotCodeHeap. No mechanism to remove methods
>>> that become less frequently used.
>>>
>>> We addressed these problems with dynamic lists of methods. We
>>> implemented a Java agent that runs within the same JVM to dynamically
>>> detect and manage hot Java methods without prior method identification.
>>> The agent detects hot methods using JFR. The agent manages hot Java
>>> methods in HotCodeHeap with compiler directives. A new compiler
>>> directive marks methods with dynamic states ("hot" or "cold"). Methods
>>> marked by the “hot” state are recompiled and placed in HotCodeHeap.
>>> Methods marked by the “cold” state are eventually removed from HotCodeHeap.
>>>
>>> Problems of this approach are:
>>>
>>> * It requires specific, complex modifications to compiler directive
>>> support: recompilation of Java methods affected by compiler
>>> directives changes. This functionality is unique to Java agent
>>> implementation and has limited potential for broader use.
>>> * The agent cannot guarantee Java methods are moved to/removed from
>>> the HotCodeHeap because updates of compiler directives can fail.
>>> * The agent knows nothing about compiled code, e.g. whether it’s C1 or
>>> C2 compiled, code size, profile. This data can be useful for deciding
>>> whether or not to move code to HotCodeHeap.
>>> * Recompilations, especially C2, are expensive. Having many of them
>>> can cause performance issues. Also recompiled code might differ from
>>> the code we have detected as “hot”.
>>>
>>> Running these two approaches in production we learned:
>>>
>>> * We detect 95% of actively used methods within the first 30 minutes
>>> of an application run. This is with JFR profiling configured as: 90
>>> seconds session duration, sampling every 11 ms, 8 minutes between
>>> profiling sessions. We can find actively used methods faster if we
>>> reduce the pause between profiling sessions and the sampling period.
>>> However, that will increase the profiling overhead and affect
>>> application performance. With the current configuration, the
>>> profiling overhead is between 1% and 2%.
>>> * A set of actively used methods gets into the steady state (no new
>>> methods added to, no methods removed from) within the first 60 minutes.
>>> * Static lists, when created from runs close to production, have 80% -
>>> 90% methods always in use. This does not change over time.
>>> * Predicting the size of HotCodeHeap is difficult, especially with
>>> dynamic lists.
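To make the profiling configuration above concrete, the following sketch works out the arithmetic for a 90-second session, an 11 ms sampling period and an 8-minute pause. It assumes the first session starts at time zero and that sessions and pauses strictly alternate, which the original message does not state; the class and method names are hypothetical.

```java
// Hypothetical helper for reasoning about a session/pause profiling schedule.
class JfrSchedule {
    // Upper bound on sampling ticks that fit into one session.
    static long ticksPerSession(long sessionMs, long periodMs) {
        return sessionMs / periodMs;
    }

    // Sessions fully completed within the window, assuming the first
    // session starts at time zero and each one is followed by a pause.
    static long completedSessions(long windowMs, long sessionMs, long pauseMs) {
        return (windowMs + pauseMs) / (sessionMs + pauseMs);
    }

    // Fraction of wall-clock time during which profiling is active.
    static double dutyCycle(long sessionMs, long pauseMs) {
        return (double) sessionMs / (sessionMs + pauseMs);
    }

    public static void main(String[] args) {
        long sessionMs = 90_000;      // 90 seconds session duration
        long periodMs = 11;           // sampling every 11 ms
        long pauseMs = 8 * 60_000;    // 8 minutes between sessions
        long windowMs = 30 * 60_000;  // first 30 minutes of the run

        System.out.println("ticks per session:  " + ticksPerSession(sessionMs, periodMs));
        System.out.println("sessions in window: " + completedSessions(windowMs, sessionMs, pauseMs));
        System.out.printf("duty cycle:         %.1f%%%n", 100 * dutyCycle(sessionMs, pauseMs));
    }
}
```

Under these assumptions a session has room for at most 8181 sampling ticks, four sessions complete within the first 30 minutes, and profiling is active for roughly 15.8% of wall-clock time. Shortening the pause or the sampling period raises these numbers along with the overhead.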
>>>
>>> We want to have grouping of hot methods as part of the HotSpot
>>> JVM. We will group only C2 compiled methods. We can group JVMCI compiled
>>> methods, e.g. Graal, if needed. We need profiling precise enough to
>>> detect major Java methods. Low overhead is more important than precision.
>>>
>>> We think we can have a solution which does not require a lot of code:
>>>
>>> * Detect hot code: we can have an implementation based on the Sweeper
>>> (https://github.com/openjdk/jdk17u/blob/master/src/hotspot/share/runtime/sweeper.hpp).
>>> We will use the handshake mechanism, which the Sweeper used, to
>>> detect nmethods on the top of thread stacks.
>>> * Group hot code: we have a draft PR
>>> (https://github.com/openjdk/jdk/pull/23573). It implements relocation
>>> of nmethods within CodeCache.
>>> * Maintain grouped code: we will add an additional code heap where hot
>>> nmethods will be relocated to.
>>>
>>> What do you think about this approach? Are there other possible solutions?
>>>
>>> Thanks,
>>>
>>> Evgeny A.
>>>
>>>
>>>
>>>
>>> Amazon Development Centre (London) Ltd. Registered in England and Wales
>>> with registration number 04543232 with its registered office at 1
>>> Principal Place, Worship Street, London EC2A 2FA, United Kingdom.