<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<style type="text/css">
.insert {
background-color: #AFA
}
.delete {
background-color: #F88;
text-decoration: line-through;
}
.tagInsert {
background-color: #070;
color: #FFF
}
.tagDelete {
background-color: #700;
color: #FFF
}
</style>
<meta name="generator" content="HTML Tidy for HTML5 for Linux version 5.6.0" />
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>JEP TBD: Asynchronous Stack Trace VM API</title>
<style type="text/css" xml:space="preserve">
/*<![CDATA[*/
A { text-decoration: none; }
A:link { color: #437291; }
A:visited { color: #666666; }
A.anchor:link, A.anchor:visited { color: black; }
A[href]:hover { color: #e76f00; }
A IMG { border-width: 0px; }
BODY {
background: white;
margin: 2em;
margin-bottom: 150%;
font-family: DejaVu Sans, Bitstream Vera Sans, Helvetica, Verdana, sans;
font-size: 10pt; line-height: 1.4;
width: 42em;
}
TT, CODE, PRE { font-family: DejaVu Sans Mono, Bitstream Vera Sans Mono,
monospace; }
P { padding: 0pt; margin: 1ex 0em; }
P:first-child, PRE:first-child { margin-top: 0pt; }
H1 { font-size: 13pt; font-weight: bold;
padding: 0pt; margin: 2ex .5ex 1ex 0pt; }
H1:first-child, H2:first-child { margin-top: 0ex; }
H2 { font-size: 11pt; font-weight: bold;
padding: 0pt; margin: 2ex 0pt 1ex 0pt; }
H3 { font-size: 10pt; font-weight: bold; font-style: italic;
padding: 0pt; margin: 1ex 0pt 1ex 0pt; }
H4 { font-size: 9pt; font-weight: bold;
padding: 0pt; margin: 1ex 0pt 1ex 0pt; }
P.subhead { font-size: smaller; margin-top: -1ex; }
P.foot { font-size: x-small; margin-top: 6ex; }
UL, OL { margin-top: 1ex; margin-bottom: 1ex; margin-right: 2em; }
UL { padding-left: 2em; list-style-type: square; }
UL UL { margin-top: 0ex; margin-bottom: 0ex; }
LI { margin-top: 0pt; margin-bottom: 0pt; }
LI > P:first-child { margin-top: 1ex; }
BLOCKQUOTE { margin: 1.5ex 0ex; margin-left: 2em; }
/*
LI BLOCKQUOTE { margin-left: 0em; }
LI { margin: .5ex 0em; }
UL LI { list-style-type: square; }
*/
/*]]>*/
</style>
<style type="text/css" xml:space="preserve">
/*<![CDATA[*/
TABLE { border-collapse: collapse; padding: 0px; margin: 1em 0; }
TR:first-child TH, TR:first-child TD { padding-top: 0; }
TH, TD { padding: 0px; padding-top: .5ex; vertical-align: baseline; text-align: left; }
TD + TD, TH + TH { padding-left: 1em; }
TD:first-child, TH:first-child, TD.jep { text-align: right; }
TABLE.head TD:first-child { font-style: italic; padding-left: 2em; }
PRE { padding-left: 2em; margin: 1ex 0; font-size: inherit; }
TABLE PRE { padding-left: 0; margin: 0; }
TABLE.jeps TD:first-child + TD,
TABLE.jeps TD:first-child + TD + TD { padding-left: .5em; }
TABLE.jeps TD:first-child,
TABLE.jeps TD:first-child + TD,
TABLE.jeps TD:first-child + TD + TD { font-size: smaller; }
TABLE.jeps TD.cl { font-size: smaller; padding-right: 0; text-align: right; }
TABLE.jeps TD.cm { font-size: smaller; padding-left: .1em; padding-right: .1em; }
TABLE.jeps TD.cr { font-size: smaller; padding-left: 0; }
TABLE.jeps TD.z { padding-left: 0; padding-right: 0; }
TABLE.head TD { padding-top: 0; }
.withdrawn { text-decoration: line-through; }
/*]]>*/
</style>
</head>
<body>
<h1>JEP TBD: Asynchronous Stack Trace VM API</h1>
<div class="markdown">
<h2 id="Summary">Summary</h2>
<p>Define an efficient and reliable API <span class="delete">for</span><span class="insert">to</span> <span class="delete">asynchronous</span><span class="insert">collect</span> stack traces <span class="delete">with</span><span class="insert">asynchronously and include</span> information on<span class="insert"> both</span> Java and native<span class="insert"> stack</span> frames.</p>
<h2 id="Goals">Goals</h2>
<ul>
<li>
<p>Provide <span class="delete">an official and</span><span class="insert">a</span> well-tested API for<span class="delete"> external</span> profilers to obtain information on Java and native frames.</p>
</li>
<li>
<p>Support asynchronous usage, e.g.<span class="insert">,</span> calling<span class="delete"> the API</span> from signal handlers.</p>
</li>
<li><span class="delete">The implementation does</span>
<p><span class="insert">Do</span> not affect <span class="insert">performance when </span>the <span class="delete">performance of an JVM which</span><span class="insert">API</span> is not <span class="delete">profiled</span><span class="insert">in use</span>.</p>
</li>
<li><span class="delete">Memory</span>
<p><span class="insert">Do</span> <span class="delete">requirements for the collected data don't</span><span class="insert">not</span> significantly increase<span class="insert"> memory requirements</span> compared to the existing <code>AsyncGetCallTrace</code> <span class="delete">routine</span><span class="insert">API</span>.</p>
</li>
</ul>
<h2 id="Non-Goals">Non-Goals</h2>
<ul>
<li><span class="delete">The</span><span class="insert">It is not a goal to recommend the</span> new API <span class="delete">shall not be recommended </span>for production <span class="delete">usage</span><span class="insert">use</span>, <span class="delete">as</span><span class="insert">since</span> <span class="delete">there</span><span class="insert">it</span> <span class="delete">is</span><span class="insert">can</span> <span class="delete">a</span><span class="insert">crash</span> <span class="delete">minimal</span><span class="insert">the</span> <span class="delete">chance</span><span class="insert">VM. We will minimize the chances</span> of <span class="delete">crashing</span><span class="insert">that</span> <span class="delete">the JVM, but we minimize by addressing all issues found during</span><span class="insert">via</span> extensive testing and fuzzing.</li>
</ul>
<h2 id="Motivation">Motivation</h2>
<p>The <code>AsyncGetCallTrace</code> <span class="delete">routine</span><span class="insert">API</span> <span class="delete">has</span><span class="insert">is</span> <span class="delete">seen</span><span class="insert">used</span> <span class="delete">increasing use in recent years in profilers like </span><span class="delete">async-profiler</span><span class="delete"> with</span><span class="insert">by</span> almost all available profilers, <span class="insert">both </span>open-source and commercial, <span class="delete">using</span><span class="insert">including,</span> <span class="delete">it</span><span class="insert">e.g., </span><a href="https://github.com/jvm-profiling-tools/async-profiler"><span class="insert">async-profiler</span></a>. <span class="delete">But</span><span class="insert">Yet</span> it <span class="insert">has two major disadvantages:</span></p>
<ul>
<li><span class="insert">It </span>is<span class="delete"> only</span> an internal API,<span class="delete"> as it is</span> not exported in any header, and</li>
<li><span class="insert">It</span> <span class="delete">the</span><span class="insert">only returns</span> information <span class="delete">on</span><span class="insert">about Java</span> frames<span class="insert">,</span> <span class="delete">it</span><span class="insert">namely</span> <span class="delete">returns is pretty limited: Only the</span><span class="insert">their</span> method and <span class="delete">byte</span><span class="insert">bytecode</span> <span class="delete">code</span><span class="insert">indices.</span></li>
</ul>
<p><span class="insert">These</span> <span class="delete">index for Java frames is captured. Both</span><span class="insert">issues</span> make implementing profilers and related tooling <span class="delete">harder</span><span class="insert">more difficult</span>. <span class="delete">Tools</span><span class="insert">Some</span> <span class="delete">like async-profiler have to resort to complicated code to at least partially obtain</span><span class="insert">additional</span> information <span class="delete">that</span><span class="insert">can be extracted from</span> the <span class="delete">JVM</span><span class="insert">HotSpot</span> <span class="delete">already</span><span class="insert">VM</span> <span class="delete">has.</span><span class="insert">via</span> <span class="delete">Information</span><span class="insert">complex</span> <span class="delete">that</span><span class="insert">code, but other useful information</span> is<span class="delete"> currently</span> hidden and impossible to <span class="delete">get is</span><span class="insert">obtain:</span></p>
<ul>
<li><span class="delete">whether</span><span class="insert">Whether</span> a compiled Java frame is inlined <span class="delete">which is </span><span class="insert">(</span>currently only obtainable for the topmost compiled frames<span class="insert">),</span></li>
<li><span class="delete">the</span><span class="insert">The</span> compilation level of a Java frame (<span class="insert">i.</span>e.<span class="delete">g.</span><span class="insert">, compiled by</span> C1 or C2<span class="insert">),</span> <span class="delete">compiled)</span><span class="insert">and</span></li>
<li><span class="insert">Information on </span>C/C++ frames that are not at the top of the stack<span class="insert">.</span></li>
</ul>
<p>Such data can be helpful when profiling and tuning a VM for a given application<span class="insert">,</span> and<span class="delete"> also</span> for profiling code that uses JNI heavily.</p>
<h2 id="Description">Description</h2>
<p><span class="delete">This</span><span class="insert">We</span> <span class="delete">JEP</span><span class="insert">propose</span> <span class="delete">proposes</span><span class="insert">a</span> <span class="delete">an</span><span class="insert">new</span> <code>AsyncGetStackTrace</code> API<span class="delete"> which is</span><span class="insert">,</span> modeled <span class="delete">after</span><span class="insert">on</span> <span class="delete">AsyncGetCallTrace:</span><span class="insert">the </span><code><span class="insert">AsyncGetCallTrace</span></code><span class="insert"> API:</span></p>
<pre><code>void AsyncGetStackTrace(CallTrace *trace, jint depth, void* ucontext,
                        uint32_t options);
</code></pre>
<p>This API can be called by profilers to obtain the <span class="delete">call</span><span class="insert">stack</span> trace for the current thread. Calling this API from a signal<span class="delete">-</span> handler is safe<span class="insert">,</span> and the new implementation will be at least as stable as <code>AsyncGetCallTrace</code> or the JFR stack walking code. The VM fills in information about the frames and the number of frames. The caller of the API should allocate the <code>CallTrace</code> <span class="delete">structure</span><span class="insert">array</span> with <span class="delete">enough</span><span class="insert">sufficient</span> memory for the requested stack depth.</p>
<p><span class="delete">Arguments:</span><span class="insert">Parameters:</span></p>
<ul>
<li><code>trace</code><span class="delete">:</span><span class="insert"> —</span> buffer for structured data to be filled in by the <span class="delete">JVM</span><span class="insert">VM</span></li>
<li><code>depth</code><span class="delete">:</span><span class="insert"> —</span> maximum depth of the call stack trace</li>
<li><code>ucontext</code><span class="delete">:</span><span class="insert"> —</span> optional <code>ucontext_t</code> of the current thread when it was interrupted</li>
<li><code>options</code><span class="delete">:</span><span class="insert"> —</span> bit set for options<span class="delete">, currently</span></li>
</ul>
<p><span class="insert">Currently</span> only the lowest bit <span class="insert">of the </span><code><span class="insert">options</span></code> is <span class="delete">considered,</span><span class="insert">considered:</span> <span class="delete">it</span><span class="insert">It</span> enables (<code>1</code>) <span class="delete">and</span><span class="insert">or</span> disables (<code>0</code>) the inclusion of C/C++ frames<span class="delete">,</span><span class="insert">.</span> <span class="delete">all</span><span class="insert">All</span> other bits are considered to be <code>0</code>
<span class="insert">.</span></p>
<p>The <code>trace</code> struct</p>
<pre><code>typedef struct {
  jint num_frames;    // number of frames in this trace
  CallFrame *frames;  // frames
  void* frame_info;   // more information on frames
} CallTrace;
</code></pre>
<p>is filled <span class="insert">in </span>by the VM. Its <code>num_frames</code> field contains the actual number of frames in the <code>frames</code> array or an error code. The <code>frame_info</code> field in that structure can later be used to store more information<span class="insert">,</span> but is currently <span class="delete">supposed to be </span><code>NULL</code>.</p>
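<p>As a minimal sketch of the caller's side, assuming the declarations above: the helper below allocates the <code>frames</code> buffer for a requested depth and builds the <code>options</code> argument. The names <code>INCLUDE_C_FRAMES</code>, <code>TraceBuffer</code> and <code>make_options</code> are hypothetical and not part of the proposal, and <code>CallFrame</code> is replaced by an opaque placeholder so that the sketch compiles on its own.</p>
<pre><code>#include &lt;cstddef&gt;
#include &lt;cstdint&gt;
#include &lt;vector&gt;

// Stand-ins so the sketch compiles on its own; the real declarations
// would come from the proposed profile.h and from jni.h.
typedef int32_t jint;
union CallFrame { unsigned char placeholder[16]; };  // opaque here
typedef struct {
  jint num_frames;    // number of frames in this trace
  CallFrame *frames;  // frames
  void *frame_info;   // more information on frames
} CallTrace;

// Hypothetical name for the only option bit that is currently defined.
constexpr uint32_t INCLUDE_C_FRAMES = 1u &lt;&lt; 0;

// Hypothetical helper: owns storage for max_depth frames and a CallTrace
// that points at it. Construct it outside the signal handler, because
// allocating memory is not async-signal-safe.
struct TraceBuffer {
  std::vector&lt;CallFrame&gt; storage;
  CallTrace trace;

  explicit TraceBuffer(jint max_depth)
      : storage(static_cast&lt;std::size_t&gt;(max_depth)),
        trace{0, storage.data(), nullptr} {}

  // Copying would leave trace.frames pointing at the other buffer.
  TraceBuffer(const TraceBuffer &amp;) = delete;
  TraceBuffer &amp;operator=(const TraceBuffer &amp;) = delete;
};

// Builds the options argument; every bit except the lowest stays 0.
constexpr uint32_t make_options(bool include_c_frames) {
  return include_c_frames ? INCLUDE_C_FRAMES : 0u;
}
</code></pre>
<p>With such a buffer prepared in advance, a signal handler would presumably call <code>AsyncGetStackTrace(&amp;buffer.trace, depth, ucontext, make_options(true))</code> and then inspect <code>buffer.trace.num_frames</code>.</p>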
<p>The error codes are a subset of the error codes for <code>AsyncGetCallTrace</code>, with the addition of <code>THREAD_NOT_JAVA</code> related to calling this procedure for non-Java threads:</p>
<pre><code>enum Error {
  NO_JAVA_FRAME         =   0,
  NO_CLASS_LOAD         =  -1,
  GC_ACTIVE             =  -2,
  UNKNOWN_NOT_JAVA      =  -3,
  NOT_WALKABLE_NOT_JAVA =  -4,
  UNKNOWN_JAVA          =  -5,
  UNKNOWN_STATE         =  -7,
  THREAD_EXIT           =  -8,
  DEOPT                 =  -9,
  THREAD_NOT_JAVA       = -10
};
</code></pre>
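<p>Since <code>num_frames</code> carries either a frame count or one of these codes, a profiler has to tell the two apart before reading the <code>frames</code> array. The following sketch shows one way to do that; <code>error_name</code> and <code>describe_result</code> are hypothetical helpers, and treating every non-positive value as "no frames collected" is an assumption of this example.</p>
<pre><code>#include &lt;cstdint&gt;
#include &lt;string&gt;

typedef int32_t jint;  // stand-in for the jni.h typedef

enum Error {
  NO_JAVA_FRAME         =   0,
  NO_CLASS_LOAD         =  -1,
  GC_ACTIVE             =  -2,
  UNKNOWN_NOT_JAVA      =  -3,
  NOT_WALKABLE_NOT_JAVA =  -4,
  UNKNOWN_JAVA          =  -5,
  UNKNOWN_STATE         =  -7,
  THREAD_EXIT           =  -8,
  DEOPT                 =  -9,
  THREAD_NOT_JAVA       = -10
};

// Maps an error code to its enumerator name. Values that the enum
// does not list (for example -6) are reported as UNRECOGNIZED.
const char *error_name(jint code) {
  switch (code) {
    case NO_JAVA_FRAME:         return "NO_JAVA_FRAME";
    case NO_CLASS_LOAD:         return "NO_CLASS_LOAD";
    case GC_ACTIVE:             return "GC_ACTIVE";
    case UNKNOWN_NOT_JAVA:      return "UNKNOWN_NOT_JAVA";
    case NOT_WALKABLE_NOT_JAVA: return "NOT_WALKABLE_NOT_JAVA";
    case UNKNOWN_JAVA:          return "UNKNOWN_JAVA";
    case UNKNOWN_STATE:         return "UNKNOWN_STATE";
    case THREAD_EXIT:           return "THREAD_EXIT";
    case DEOPT:                 return "DEOPT";
    case THREAD_NOT_JAVA:       return "THREAD_NOT_JAVA";
    default:                    return "UNRECOGNIZED";
  }
}

// Formats num_frames for logging. This allocates, so it is meant to
// run after the signal handler has returned, not inside it.
std::string describe_result(jint num_frames) {
  if (num_frames &gt; 0) {
    return std::to_string(num_frames) + " frames";
  }
  return std::string(error_name(num_frames)) + " (" +
         std::to_string(num_frames) + ")";
}
</code></pre>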
<p>Every <code>CallFrame</code> is a union, <span class="delete">as</span><span class="insert">since</span> the information stored for Java and non-Java frames differs:</p>
<pre><code>typedef union {
  FrameTypeId type;            // to distinguish between JavaFrame and NonJavaFrame
  JavaFrame java_frame;
  NonJavaFrame non_java_frame;
} CallFrame;
</code></pre>
<p>There are several distinguishable frame types:</p>
<pre><code>enum FrameTypeId : uint8_t {
  FRAME_JAVA         = 1, // JIT compiled and interpreted
  FRAME_JAVA_INLINED = 2, // inlined JIT compiled
  FRAME_NATIVE       = 3, // native wrapper to call C methods from Java
  FRAME_STUB         = 4, // VM generated stubs
  FRAME_CPP          = 5  // C/C++/... frames
};
</code></pre>
<p>The first two types are for Java frames<span class="insert">,</span> for which we store the following information in a struct of type <code>JavaFrame</code>; native wrapper frames (<code>FRAME_NATIVE</code>) use the same struct:</p>
<pre><code>typedef struct {
  FrameTypeId type;     // frame type
  int8_t comp_level;    // compilation level, 0 is interpreted
  uint16_t bci;         // bytecode index, 0 to 65535
  jmethodID method_id;
} JavaFrame; // used for FRAME_JAVA, FRAME_JAVA_INLINED and FRAME_NATIVE
</code></pre>
<p>The <code>comp_level</code> <span class="delete">states</span><span class="insert">indicates</span> the compilation level of the method related to the frame<span class="insert">,</span> with higher numbers representing <span class="delete">"more"</span><span class="insert">higher levels of</span> compilation<span class="delete">. 0 is defined as interpreted</span>. It is modeled after the <a href="https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/compilerDefinitions.hpp#L54"><code>CompLevel</code> enum</a> in <span class="delete">compiler/compilerDefinitions</span><span class="insert">HotSpot</span> but is dependent on the <span class="delete">used </span>compiler infrastructure<span class="insert"> used. A value of zero indicates no compilation, i.e., bytecode interpretation</span>.</p>
<p>Information on all other frames is stored in <span class="delete">the </span><code>NonJavaFrame</code> <span class="delete">struct:</span><span class="insert">structs:</span></p>
<pre><code>typedef struct {
  FrameTypeId type; // frame type
  void *pc;         // current program counter inside this frame
} NonJavaFrame;
</code></pre>
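<p>To illustrate how a consumer might read these frames, the sketch below checks the <code>type</code> tag first and only then reads the matching union member. The helpers <code>uses_java_frame</code> and <code>describe_frame</code> and the text they produce are hypothetical, and the <code>jmethodID</code> declaration is a stand-in for the one in <code>jni.h</code>.</p>
<pre><code>#include &lt;cstdint&gt;
#include &lt;string&gt;

// Stand-in for the jni.h declaration of jmethodID.
struct _jmethodID;
typedef struct _jmethodID *jmethodID;

enum FrameTypeId : uint8_t {
  FRAME_JAVA = 1, FRAME_JAVA_INLINED = 2, FRAME_NATIVE = 3,
  FRAME_STUB = 4, FRAME_CPP = 5
};
typedef struct {
  FrameTypeId type;
  int8_t comp_level;
  uint16_t bci;
  jmethodID method_id;
} JavaFrame;
typedef struct {
  FrameTypeId type;
  void *pc;
} NonJavaFrame;
typedef union {
  FrameTypeId type;
  JavaFrame java_frame;
  NonJavaFrame non_java_frame;
} CallFrame;

// True if the JavaFrame member is the one that was filled in.
bool uses_java_frame(const CallFrame &amp;frame) {
  switch (frame.type) {
    case FRAME_JAVA:
    case FRAME_JAVA_INLINED:
    case FRAME_NATIVE:
      return true;
    default:
      return false;
  }
}

// Produces a short description; intended for use outside signal handlers.
std::string describe_frame(const CallFrame &amp;frame) {
  if (!uses_java_frame(frame)) {
    if (frame.type == FRAME_STUB) return "stub";
    if (frame.type == FRAME_CPP) return "C/C++";
    return "unknown";
  }
  const JavaFrame &amp;jf = frame.java_frame;
  if (jf.type == FRAME_NATIVE) return "native wrapper";
  // comp_level 0 means interpreted, higher values mean compiled code.
  std::string text = jf.comp_level == 0
      ? "interpreted"
      : "compiled level=" + std::to_string(jf.comp_level);
  if (jf.type == FRAME_JAVA_INLINED) text += " inlined";
  return text + " bci=" + std::to_string(jf.bci);
}
</code></pre>
<p>Every struct begins with the same <code>type</code> field, which is what makes the tag readable regardless of which member the VM wrote.</p>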
<p>Although the API provides more information<span class="delete"> on the frames</span>, the amount of space required per frame (e.g.<span class="insert">,</span> 16 bytes on x86) is the same as for the <span class="delete">original</span><span class="insert">existing</span> <code>AsyncGetCallTrace</code> API.</p>
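<p>The size claim can be checked at compile time. The sketch below compares the proposed frame types against <code>ASGCT_CallFrame</code>, whose layout (a <code>jint</code> followed by a <code>jmethodID</code>) is assumed here to match the one used by <code>AsyncGetCallTrace</code> in HotSpot; the <code>jint</code> and <code>jmethodID</code> declarations are stand-ins for <code>jni.h</code>.</p>
<pre><code>#include &lt;cstdint&gt;

// Stand-ins for the jni.h declarations.
typedef int32_t jint;
struct _jmethodID;
typedef struct _jmethodID *jmethodID;

// Assumed layout of the frame type filled in by AsyncGetCallTrace.
typedef struct {
  jint lineno;
  jmethodID method_id;
} ASGCT_CallFrame;

enum FrameTypeId : uint8_t {
  FRAME_JAVA = 1, FRAME_JAVA_INLINED = 2, FRAME_NATIVE = 3,
  FRAME_STUB = 4, FRAME_CPP = 5
};
typedef struct {
  FrameTypeId type;     // 1 byte
  int8_t comp_level;    // 1 byte
  uint16_t bci;         // 2 bytes, then padding up to pointer alignment
  jmethodID method_id;  // one pointer
} JavaFrame;
typedef struct {
  FrameTypeId type;     // 1 byte, then padding up to pointer alignment
  void *pc;             // one pointer
} NonJavaFrame;
typedef union {
  FrameTypeId type;
  JavaFrame java_frame;
  NonJavaFrame non_java_frame;
} CallFrame;

// On common ABIs each frame type occupies two pointer-sized words,
// for example 16 bytes on x86-64.
static_assert(sizeof(JavaFrame) == sizeof(ASGCT_CallFrame),
              "JavaFrame is no larger than an AsyncGetCallTrace frame");
static_assert(sizeof(NonJavaFrame) == sizeof(ASGCT_CallFrame),
              "NonJavaFrame is no larger than an AsyncGetCallTrace frame");
static_assert(sizeof(CallFrame) == sizeof(ASGCT_CallFrame),
              "CallFrame is no larger than an AsyncGetCallTrace frame");
</code></pre>
<p>The extra fields fit because <code>type</code>, <code>comp_level</code> and <code>bci</code> together take the four bytes that <code>lineno</code> occupies in the older layout.</p>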
<p>We propose to <span class="delete">create</span><span class="insert">place the above declarations in</span> a<span class="insert"> new</span> static header file<span class="insert">,</span> <code>profile.h</code>. In the source tree it could be located in <code>src/java.base/share/native/include<span class="delete"> and</span></code><span class="insert">;</span> in a <span class="delete">delivered </span>JDK <span class="delete">bundle</span><span class="insert">image</span> it should be <span class="delete">contained</span><span class="insert">copied</span> <span class="delete">in</span><span class="insert">into</span> the <code>include</code> <span class="delete">folder</span><span class="insert">directory</span>. The <span class="delete">header</span><span class="insert">header’s</span> <span class="delete">needs</span><span class="insert">license</span> <span class="delete">to</span><span class="insert">should</span> <span class="delete">be provided under</span><span class="insert">include</span> the <span class="delete">"Classpath"</span><span class="insert">Classpath</span> <span class="delete">exception</span><span class="insert">Exception</span> <span class="delete">to</span><span class="insert">so</span> <span class="delete">make</span><span class="insert">that</span> it <span class="insert">is </span>consumable <span class="delete">for</span><span class="insert">by</span> <span class="delete">3rd </span><span class="insert">third-</span>party profiling tools.</p>
<p>A prototype implementation can be found <span class="delete">https://github.com/parttimenerd/jdk/tree/parttimenerd_asgct2</span><a href="https://github.com/openjdk/jdk-sandbox/tree/asgct2"><span class="insert">here</span></a><span class="insert">, and a demo combining it</span> with a <span class="delete">demo</span><span class="insert">modified</span> <span class="delete">at</span><span class="insert">async-profiler</span> <span class="delete">https://github.com/parttimenerd/asgct2-demo/.</span>
<span class="delete">Alternatives</span>
<span class="delete">Keep</span><span class="insert">can</span> <span class="delete">AsyncGetCallTrace</span><span class="insert">be</span> <span class="delete">as</span><span class="insert">found</span> <span class="delete">is, meaning a lack of maintenance and stability for a widely used de-facto API</span><a href="https://github.com/parttimenerd/asgct2-demo"><span class="insert">here</span></a>.</p>
<h2 id="Risks-and-Assumptions">Risks and Assumptions</h2>
<p>Returning information on C/C++ frames leaks implementation details, but this is also true for the Java frames of <code>AsyncGetCallTrace</code> <span class="delete">as</span><span class="insert">since</span> they leak details of the implementation of standard library files and include native wrapper frames.</p>
<h2 id="Testing">Testing</h2>
<p><span class="delete">The implementation of this JEP</span><span class="insert">We</span> will add new stress tests to <span class="delete">find rare</span><span class="insert">identify</span> stability problems on all supported platforms. <span class="delete">The</span><span class="insert">We</span> <span class="delete">idea is</span><span class="insert">plan</span> to <span class="delete">run the profiling on</span><span class="insert">profile</span> a set of example programs (<span class="delete">for example</span><span class="insert">e.g.,</span> the <span class="delete">dacapo</span><span class="insert">DaCapo</span> and <span class="delete">renaissance</span><span class="insert">Renaissance</span> benchmark suites) repeatedly with small profiling intervals (at most 0.1 ms). <span class="delete">A prototype implementation can be found at https://github.com/parttimenerd/jdk-profiling-tester. The implementation</span><span class="insert">We</span> will also add substantial <span class="delete">JTREG</span><span class="insert">unit</span> tests which should cover all options and test the basic usage of the API.</p>
<span class="delete">A prototypical implementation can be found at https://github.com/openjdk/jdk-sandbox/tree/asgct2 and a demo combining it with a modified async-profiler can be found at https://github.com/parttimenerd/asgct2-demo</span>
</div>
</body>
</html>