Proposal of a new version of AsyncGetCallTrace
David Holmes
david.holmes at oracle.com
Sun Mar 20 22:41:17 UTC 2022
Hi Johannes,
On 18/03/2022 7:43 pm, Bechberger, Johannes wrote:
> Hi,
>
> I would like propose to
>
> 1. Replace duplicated stack walking code with unified API
> 2. Create a new version of AsyncGetCallTrace, tentatively called "AsyncGetCallTrace2", with more information on more frames using the unified API
>
> A demo (as well as this text) is available at https://github.com/parttimenerd/asgct2-demo
> if you want to see a prototype of this proposal in action.
>
> Unify Stack Walking
> ================
>
> There are currently multiple implementations of stack walking in JFR and for AsyncGetCallTrace.
> They each implement their own extension of vframeStream but with comparable features
> and check for problematic frames.
>
> My proposal is, therefore, to replace the stack walking code with a unified API that
> includes all error checking and vframeStream extensions in a single place.
> The prosposed new class is called StackWalker and could be part of
> `jfr/recorder/stacktrace` [1].
So we already have the StackWalker API provided at the Java level and
with the implementation in the VM in
src/hotspot/share/prims/stackwalk.cpp. How does that fit in with what
you propose?
Cheers,
David
-----
> This class also supports getting information on C frames so it can be potentially
> used for walking stacks in VMError (used to create hs_err files), further
> reducing the amount of different stack walking code.
>
> AsyncGetCallTrace2
> ================
>
> The AsyncGetCallTrace call has seen increasing use in recent years
> in profilers like async-profiler.
> But it is not really an API (not exported in any header) and
> the information on frames it returns is pretty limited
> (only the method and bci for Java frames) which makes implementing
> profilers and other tooling harder. Tools like async-profiler
> have to resort to complicated code to partially obtain the information
> that the JVM already has.
> Information that is currently hidden and impossible to obtain is
>
> - whether a compiled frame is inlined (currently only obtainable for the topmost compiled frames)
> - although this can be obtained using JFR
> - C frames that are not at the top of the stack
> - compilation level (C1 or C2 compiled)
>
> This information is helpful when profiling and tuning the VM for
> a given application and also for profiling code that uses
> JNI heavily.
>
> Using the proposed StackWalker class, implementing a new API
> that returns more information on frames is possible
> as a thin wrapper over the StackWalker API [2].
> This also improves the maintainability as the code used
> in this API is used in multiple places and is therefore
> also better tested than the previous implementation, see
> [1] for the implementation.
>
> The following describes the proposed API:
>
> ```cpp
> void AsyncGetCallTrace2(asgct2::CallTrace *trace, jint depth, void* ucontext);
> ```
>
> The structure of `CallTrace` is the same as the original
> `ASGCT_CallTrace` with the same error codes encoded in <= 0
> values of `num_frames`.
>
> ```cpp
> typedef struct {
> JNIEnv *env_id; // Env where trace was recorded
> jint num_frames; // number of frames in this trace
> CallFrame *frames; // frames
> void* frame_info; // more information on frames
> } CallTrace;
> ```
>
> The only difference is that the `frames` array also contains
> information on C frames and the field `frame_info`.
> The `frame_info` is currently null and can later be used
> for extended information on each frame, being an array with
> an element for each frame. But the type of the
> elements in this array is implementation specific.
> This akin to `compile_info` field in JVMTI's CompiledMethodLoad
> [3] and used for extending the information returned by the
> API later.
>
> Protoype
> ------------
>
> Currently `CallFrame` is implemented in the prototype [4] as
>
> ```cpp
> typedef struct {
> void *machine_pc; // program counter, for C and native frames (frames of native methods)
> uint8_t type; // frame type (single byte)
> uint8_t comp_level; // highest compilation level of a method related to a Java frame
> // information from original CallFrame
> jint bci; // bci for Java frames
> jmethodID method_id; // method ID for Java frames
> } CallFrame;
> ```
>
> The `FrameTypeId` is based on the frame type in JFRStackFrame:
>
> ```cpp
> enum FrameTypeId {
> FRAME_INTERPRETED = 0,
> FRAME_JIT = 1, // JIT compiled
> FRAME_INLINE = 2, // inlined JITed methods
> FRAME_NATIVE = 3, // native wrapper to call C methods from Java
> FRAME_CPP = 4 // c/c++/... frames, stub frames have CompLevel_all
> };
> ```
>
> The `comp_level` states the compilation level of the method related to the frame
> with higher numbers representing "more" compilation. `0` is defined as
> interpreted. It is modeled after the `CompLevel` enum in `compiler/compilerDefinitions`:
>
> ```cpp
> // Enumeration to distinguish tiers of compilation
> enum CompLevel {
> // ...
> CompLevel_none = 0, // Interpreter
> CompLevel_simple = 1, // C1
> CompLevel_limited_profile = 2, // C1, invocation & backedge counters
> CompLevel_full_profile = 3, // C1, invocation & backedge counters + mdo
> CompLevel_full_optimization = 4 // C2 or JVMCI
> };
> ```
>
> The traces produced by this prototype are fairly large
> (each frame requires 24 is instead of 16 bytes on 64 bit systems) and some data is
> duplicated.
> The reason for this is that it simplified the extension of async-profiler
> for the prototype, as it only extends the data structures of
> the original AsyncGetCallTrace API without changing the original fields.
>
> Proposal
> ------------
>
> But packing the information and reducing duplication is of course possible
> if we step away from the former constraint:
>
> ```cpp
> enum FrameTypeId {
> FRAME_JAVA = 1, // JIT compiled and interpreted
> FRAME_JAVA_INLINED = 2, // inlined JIT compiled
> FRAME_NATIVE = 3, // native wrapper to call C methods from Java
> FRAME_STUB = 4, // VM generated stubs
> FRAME_CPP = 5 // C/C++/... frames
> };
>
> typedef struct {
> uint8_t type; // frame type
> uint8_t comp_level;
> uint16_t bci; // 0 < bci < 65536
> jmethodID method_id;
> } JavaFrame; // used for FRAME_JAVA and FRAME_JAVA_INLINED
>
> typedef struct {
> FrameTypeId type; // single byte type
> void *machine_pc;
> } NonJavaFrame; // used for FRAME_NATIVE, FRAME_STUB and FRAME_CPP
>
> typedef union {
> FrameTypeId type; // to distinguish between JavaFrame and NonJavaFrame
> JavaFrame java_frame;
> NonJavaFrame non_java_frame;
> } CallFrame;
> ```
>
> This uses the same amount of space per frame (16 bytes) as the original but encodes far more information.
>
> Best regards
> Johannes
>
> [1] https://github.com/parttimenerd/jdk/blob/parttimenerd_asgct2/src/hotspot/share/jfr/recorder/stacktrace/stackWalker.hpp
>
> [2] https://github.com/parttimenerd/jdk/blob/parttimenerd_asgct2/src/hotspot/share/prims/asgct2.cpp****
>
> [3] https://docs.oracle.com/javase/8/docs/platform/jvmti/jvmti.html#CompiledMethodLoad
>
> [4] https://github.com/parttimenerd/jdk/blob/parttimenerd_asgct2/src/hotspot/share/prims/asgct2.hpp
>
More information about the hotspot-jfr-dev
mailing list