<!DOCTYPE html><html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
<p>Just to clarify, if grouping helps, does that mean the
performance impact of sparse code is mainly due to far calls
versus near calls?</p>
<p>dl<br>
</p>
<div class="moz-cite-prefix">On 3/5/25 10:41 AM, Astigeevich, Evgeny
wrote:<br>
</div>
<blockquote type="cite" cite="mid:1B0C3138-761B-4DB0-8A98-977C6FC40178@amazon.co.uk">
<meta name="Generator" content="Microsoft Word 15 (filtered medium)">
<style>@font-face
{font-family:Wingdings;
panose-1:5 0 0 0 0 0 0 0 0 0;}@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0cm;
font-size:11.0pt;
font-family:"Calibri",sans-serif;
mso-ligatures:standardcontextual;
mso-fareast-language:EN-US;}a:link, span.MsoHyperlink
{mso-style-priority:99;
color:#0563C1;
text-decoration:underline;}p.MsoListParagraph, li.MsoListParagraph, div.MsoListParagraph
{mso-style-priority:34;
margin-top:0cm;
margin-right:0cm;
margin-bottom:0cm;
margin-left:36.0pt;
font-size:11.0pt;
font-family:"Calibri",sans-serif;
mso-ligatures:standardcontextual;
mso-fareast-language:EN-US;}span.EmailStyle17
{mso-style-type:personal-compose;
font-family:"Calibri",sans-serif;
color:windowtext;}.MsoChpDefault
{mso-style-type:export-only;
font-size:11.0pt;
mso-fareast-language:EN-US;}div.WordSection1
{page:WordSection1;}ol
{margin-bottom:0cm;}ul
{margin-bottom:0cm;}</style>
<div class="WordSection1">
<p class="MsoNormal"><span lang="EN-US">Hi Vladimir,<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">This is JDK-8326205:
Implement grouping hot nmethods in CodeCache.<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">Now that I have managed to
synthesize a benchmark (<a href="https://github.com/openjdk/jdk/pull/23831" moz-do-not-send="true" class="moz-txt-link-freetext">https://github.com/openjdk/jdk/pull/23831</a>)
that demonstrates the performance impact of sparse code, I’d like
to discuss a possible solution to the sparse code problem.<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">At a high level, a solution
is:<o:p></o:p></span></p>
<ul style="margin-top:0cm" type="disc">
<li class="MsoListParagraph" style="margin-left:0cm;mso-list:l6 level1 lfo4"><span lang="EN-US">Detect hot code.<o:p></o:p></span></li>
<li class="MsoListParagraph" style="margin-left:0cm;mso-list:l6 level1 lfo4"><span lang="EN-US">Group hot code.<o:p></o:p></span></li>
<li class="MsoListParagraph" style="margin-left:0cm;mso-list:l6 level1 lfo4"><span lang="EN-US">Maintain grouped code.<o:p></o:p></span></li>
</ul>
<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">Downstream, </span>we
tried two approaches:<o:p></o:p></p>
<ul style="margin-top:0cm" type="disc">
<li class="MsoNormal" style="mso-list:l7 level1 lfo2"><b>Static
lists of methods (compile command):</b> Identify
frequently used (hot) methods using test runs and provide
static method lists to the JVM in production. When the JVM compiles
a Java method and the method is on the list, the JVM puts the
code into a designated code heap (HotCodeHeap).<o:p></o:p></li>
<li class="MsoNormal" style="mso-list:l7 level1 lfo2"><b>Dynamic
lists of methods (compiler directives):</b> Profile an
application in production and dynamically relocate
identified hot methods to HotCodeHeap. Relocation was
implemented with recompilation.<o:p></o:p></li>
</ul>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">The main advantage of static lists is zero
profiling overhead in production. We do all profiling and
analysis in test runs. The problems with static lists are:<o:p></o:p></p>
<ul style="margin-top:0cm" type="disc">
<li class="MsoListParagraph" style="margin-left:0cm;mso-list:l1 level1 lfo5"><b>Training
Run Accuracy:</b> We need training runs to have execution
paths closely mimicking production environments. Otherwise
we put the wrong methods into HotCodeHeap.<o:p></o:p></li>
<li class="MsoListParagraph" style="margin-left:0cm;mso-list:l1 level1 lfo5"><b>Method
List Maintenance:</b> We need to rerun training to
regenerate lists when application code changes. Training
runs are expensive and time-consuming. They require long
runs to guarantee we see all major execution paths. Updating
lists in production can be as complex as application
deployment.<o:p></o:p></li>
<li class="MsoListParagraph" style="margin-left:0cm;mso-list:l1 level1 lfo5"><b>Method
Placement Limitations:</b> Methods marked for HotCodeHeap
are permanently placed into HotCodeHeap. There is no mechanism to
remove methods that become less frequently used.<o:p></o:p></li>
</ul>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">We addressed these problems with dynamic
lists of methods. We implemented a Java agent that runs within
the same JVM to dynamically detect and manage hot Java methods
without prior method identification. The agent detects hot
methods using JFR. The agent manages hot Java methods in
HotCodeHeap with compiler directives. A new compiler directive
marks methods with dynamic states (“hot” or “cold”). Methods
marked as “hot” are recompiled and placed in
HotCodeHeap. Methods marked as “cold” are eventually
removed from HotCodeHeap.<o:p></o:p></p>
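<p class="MsoNormal">To make the detection step concrete, here is a minimal sketch of how hot methods might be counted from JFR execution samples. It is not the downstream agent: the class <code>HotMethodSampler</code> and its methods are hypothetical names, and it only counts the top frame of each <code>jdk.ExecutionSample</code> event using the public <code>jdk.jfr.consumer.RecordingStream</code> API (JDK 14 or later). Marking methods as “hot” or “cold” through compiler directives is not shown.</p>

```java
import java.time.Duration;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;
import jdk.jfr.consumer.RecordedFrame;
import jdk.jfr.consumer.RecordedMethod;
import jdk.jfr.consumer.RecordedStackTrace;
import jdk.jfr.consumer.RecordingStream;

/** Hypothetical helper: counts how often each method is the top frame of a sample. */
final class HotMethodSampler {
    private final Map<String, Long> topFrameHits = new ConcurrentHashMap<>();

    /** Records one sample whose top frame is the given method. */
    void record(String method) {
        topFrameHits.merge(method, 1L, Long::sum);
    }

    /** Returns up to n methods with the most hits, hottest first (ties broken by name). */
    List<String> hottest(int n) {
        return topFrameHits.entrySet().stream()
                .sorted((a, b) -> {
                    int byCount = Long.compare(b.getValue(), a.getValue());
                    return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
                })
                .limit(n)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    /** Runs one profiling session and feeds the top Java frame of each sample into record(). */
    void profile(Duration session, Duration samplingPeriod) throws InterruptedException {
        try (RecordingStream rs = new RecordingStream()) {
            rs.enable("jdk.ExecutionSample").withPeriod(samplingPeriod);
            rs.onEvent("jdk.ExecutionSample", event -> {
                RecordedStackTrace trace = event.getStackTrace();
                if (trace == null || trace.getFrames().isEmpty()) {
                    return; // sample without a usable stack trace
                }
                RecordedFrame top = trace.getFrames().get(0);
                if (top.isJavaFrame()) {
                    RecordedMethod m = top.getMethod();
                    record(m.getType().getName() + "." + m.getName() + m.getDescriptor());
                }
            });
            rs.startAsync();
            rs.awaitTermination(session); // returns when the session time has elapsed
        }
    }
}
```

<p class="MsoNormal">A session with the settings mentioned later in this mail would be started with <code>profile(Duration.ofSeconds(90), Duration.ofMillis(11))</code>, after which <code>hottest(n)</code> gives the candidates for HotCodeHeap. Counting only top frames is a simplification chosen for the sketch.</p>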
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">The problems with this approach are:<o:p></o:p></p>
<ul style="margin-top:0cm" type="disc">
<li class="MsoListParagraph" style="margin-left:0cm;mso-list:l3 level1 lfo6">It requires
specific, complex modifications to compiler directive
support: recompilation of Java methods affected by compiler
directive changes. This functionality is unique to the Java
agent implementation and has limited potential for broader
use.<o:p></o:p></li>
<li class="MsoListParagraph" style="margin-left:0cm;mso-list:l3 level1 lfo6">The agent
cannot guarantee Java methods are moved to/removed from the
HotCodeHeap because updates of compiler directives can fail.<o:p></o:p></li>
<li class="MsoListParagraph" style="margin-left:0cm;mso-list:l3 level1 lfo6">The agent
knows nothing about compiled code, e.g. whether it’s C1 or
C2 compiled, its code size, or its profile. This data can be useful for
deciding whether to move a method to HotCodeHeap.<o:p></o:p></li>
<li class="MsoListParagraph" style="margin-left:0cm;mso-list:l3 level1 lfo6">Recompilations,
especially C2, are expensive. Having many of them can cause
performance issues. Also, recompiled code might differ from
the code we have detected as “hot”.<o:p></o:p></li>
</ul>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Running these two approaches in production
we learned:<o:p></o:p></p>
<ul style="margin-top:0cm" type="disc">
<li class="MsoListParagraph" style="margin-left:0cm;mso-list:l0 level1 lfo8">We detect
95% of actively used methods within the first 30 minutes of
an application run. This is with JFR profiling configured as follows:
90-second session duration, sampling every 11 ms, 8 minutes
between profiling sessions. We can find actively used
methods faster if we reduce the pause between profiling
sessions and the sampling period. However, this will increase the
profiling overhead and affect application performance. With
the current configuration, the profiling overhead is between
1% and 2%.<o:p></o:p></li>
<li class="MsoListParagraph" style="margin-left:0cm;mso-list:l0 level1 lfo8">The set of
actively used methods reaches a steady state (no methods
added or removed) within the first
60 minutes.<o:p></o:p></li>
<li class="MsoListParagraph" style="margin-left:0cm;mso-list:l0 level1 lfo8">Static
lists, when created from runs close to production, have 80%
to 90% of their methods always in use. This does not change over time.<o:p></o:p></li>
<li class="MsoListParagraph" style="margin-left:0cm;mso-list:l0 level1 lfo8">Predicting
the size of HotCodeHeap is difficult, especially with
dynamic lists.<o:p></o:p></li>
</ul>
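<p class="MsoNormal">As a back-of-the-envelope check of the JFR configuration quoted above (90-second sessions, 11 ms sampling, 8 minutes between sessions), the following sketch computes an upper bound on samples per thread per session and the fraction of wall-clock time spent profiling. The class <code>ProfilingBudget</code> is a hypothetical name, and the numbers assume that sessions and pauses alternate back to back and that every sampling tick yields a sample for the thread in question.</p>

```java
import java.time.Duration;
import java.util.Locale;

/** Hypothetical helper for rough arithmetic on the profiling configuration. */
final class ProfilingBudget {
    /** Upper bound on samples taken for one thread during one session. */
    static long samplesPerSession(Duration session, Duration period) {
        return session.toMillis() / period.toMillis();
    }

    /** Fraction of wall-clock time during which a profiling session is active. */
    static double dutyCycle(Duration session, Duration pause) {
        return (double) session.toMillis() / (session.toMillis() + pause.toMillis());
    }

    public static void main(String[] args) {
        Duration session = Duration.ofSeconds(90);
        Duration period = Duration.ofMillis(11);
        Duration pause = Duration.ofMinutes(8);
        // 90000 ms / 11 ms = 8181 samples at most
        System.out.println(samplesPerSession(session, period));
        // 90 s / (90 s + 480 s) is roughly 0.158
        System.out.printf(Locale.ROOT, "%.3f%n", dutyCycle(session, pause));
    }
}
```

<p class="MsoNormal">Under these assumptions, profiling is active for roughly 16% of the run, which illustrates the trade-off described above: shortening the pause or the sampling period finds hot methods sooner but raises the overhead.</p>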
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">We want the grouping of hot methods
to be part of the HotSpot JVM. We will group only C2
compiled methods. We can group JVMCI compiled methods, e.g.
Graal, if needed. We need profiling precise enough to detect
major Java methods. Low overhead is more important than
precision.<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">We think we can have a solution which does
not require a lot of code:<o:p></o:p></p>
<ul style="margin-top:0cm" type="disc">
<li class="MsoListParagraph" style="margin-left:0cm;mso-list:l0 level1 lfo8"><span lang="EN-US">Detect hot code: we can have an implementation
based on the Sweeper:
<a href="https://github.com/openjdk/jdk17u/blob/master/src/hotspot/share/runtime/sweeper.hpp" moz-do-not-send="true" class="moz-txt-link-freetext">
https://github.com/openjdk/jdk17u/blob/master/src/hotspot/share/runtime/sweeper.hpp</a>.
We will use the handshake mechanism, which the Sweeper
used, to detect nmethods on the top of thread stacks.</span><o:p></o:p></li>
<li class="MsoListParagraph" style="margin-left:0cm;mso-list:l0 level1 lfo8"><span lang="EN-US">Group hot code: we have a draft PR
<a href="https://github.com/openjdk/jdk/pull/23573" moz-do-not-send="true" class="moz-txt-link-freetext">https://github.com/openjdk/jdk/pull/23573</a>.
It implements relocation of nmethods within CodeCache.</span><o:p></o:p></li>
<li class="MsoListParagraph" style="margin-left:0cm;mso-list:l0 level1 lfo8"><span lang="EN-US">Maintain grouped code: we will add an
additional code heap to which hot nmethods will be
relocated.</span><o:p></o:p></li>
</ul>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">What do you think about this approach? Are
there other possible solutions?<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Thanks,<o:p></o:p></p>
<p class="MsoNormal">Evgeny A.<o:p></o:p></p>
</div>
<br>
<br>
<br>
Amazon Development Centre (London) Ltd. Registered in England and
Wales with registration number 04543232 with its registered office
at 1 Principal Place, Worship Street, London EC2A 2FA, United
Kingdom.<br>
<br>
<br>
</blockquote>
</body>
</html>