Proposal of a new version of AsyncGetCallTrace

David Holmes david.holmes at oracle.com
Sun Mar 20 22:41:17 UTC 2022


Hi Johannes,

On 18/03/2022 7:43 pm, Bechberger, Johannes wrote:
> Hi,
> 
> I would like to propose to
> 
> 1. Replace the duplicated stack walking code with a unified API
> 2. Create a new version of AsyncGetCallTrace, tentatively called "AsyncGetCallTrace2", with more information on more frames using the unified API
> 
> A demo (as well as this text) is available at https://github.com/parttimenerd/asgct2-demo
> if you want to see a prototype of this proposal in action.
> 
> Unify Stack Walking
> ================
> 
> There are currently multiple implementations of stack walking, in JFR and for AsyncGetCallTrace.
> Each implements its own extension of vframeStream, with comparable features
> and comparable checks for problematic frames.
> 
> My proposal is, therefore, to replace the stack walking code with a unified API that
> includes all error checking and vframeStream extensions in a single place.
> The proposed new class is called StackWalker and could be part of
> `jfr/recorder/stacktrace` [1].

So we already have the StackWalker API provided at the Java level, with
its implementation in the VM in
src/hotspot/share/prims/stackwalk.cpp. How does that fit in with what
you propose?

Cheers,
David
-----

> This class also supports getting information on C frames, so it could potentially
> be used for walking stacks in VMError (used to create hs_err files), further
> reducing the number of different stack walking implementations.
> 
> AsyncGetCallTrace2
> ================
> 
> The AsyncGetCallTrace call has seen increasing use in recent years
> in profilers like async-profiler.
> But it is not really an API (not exported in any header) and
> the information on frames it returns is pretty limited
> (only the method and bci for Java frames) which makes implementing
> profilers and other tooling harder. Tools like async-profiler
> have to resort to complicated code to partially obtain the information
> that the JVM already has.
> Information that is currently hidden and impossible to obtain includes:
> 
> - whether a compiled frame is inlined (currently only obtainable for the topmost compiled frames)
>    -  although this can be obtained using JFR
> - C frames that are not at the top of the stack
> - compilation level (C1 or C2 compiled)
> 
> This information is helpful when profiling and tuning the VM for
> a given application and also for profiling code that uses
> JNI heavily.
> 
> Using the proposed StackWalker class, implementing a new API
> that returns more information on frames is possible
> as a thin wrapper over the StackWalker API [2].
> This also improves the maintainability as the code used
> in this API is used in multiple places and is therefore
> also better tested than the previous implementation, see
> [1] for the implementation.
> 
> The following describes the proposed API:
> 
> ```cpp
> void AsyncGetCallTrace2(asgct2::CallTrace *trace, jint depth, void* ucontext);
> ```
> 
> The structure of `CallTrace` is the same as the original
> `ASGCT_CallTrace` with the same error codes encoded in <= 0
> values of `num_frames`.
> 
> ```cpp
> typedef struct {
>    JNIEnv *env_id;                   // Env where trace was recorded
>    jint num_frames;                  // number of frames in this trace
>    CallFrame *frames;                // frames
>    void* frame_info;                 // more information on frames
> } CallTrace;
> ```
> 
> The only differences are that the `frames` array also contains
> information on C frames and that there is a new field `frame_info`.
> The `frame_info` is currently null and can later be used
> for extended information on each frame, being an array with
> an element for each frame. But the type of the
> elements in this array is implementation specific.
> This is akin to the `compile_info` field in JVMTI's CompiledMethodLoad
> [3] and is intended for extending the information returned by the
> API later.
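
As a rough illustration of the `num_frames` convention described above, a consumer might check the field before touching `frames`. This is only a sketch: the types are local stand-ins for the JNI and asgct2 definitions so that it compiles on its own, and `trace_is_valid` and `frame_count` are hypothetical helper names, not part of the proposal.

```cpp
#include <cstdint>

// Local stand-ins so the sketch compiles without jni.h (assumptions).
using jint = std::int32_t;
struct JNIEnv;     // opaque in this sketch
struct CallFrame;  // layout does not matter here

struct CallTrace {
  JNIEnv *env_id;     // Env where trace was recorded
  jint num_frames;    // > 0: number of frames, <= 0: error code
  CallFrame *frames;  // frames
  void *frame_info;   // currently null, reserved for extensions
};

// Hypothetical helper: true if the trace holds at least one frame.
inline bool trace_is_valid(const CallTrace &trace) {
  return trace.num_frames > 0 && trace.frames != nullptr;
}

// Hypothetical helper: number of usable frames, 0 for error traces.
inline jint frame_count(const CallTrace &trace) {
  return trace_is_valid(trace) ? trace.num_frames : 0;
}
```

Since `frame_info` is currently null, such a consumer would ignore it until its element type is defined.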
> 
> Prototype
> ------------
> 
> Currently `CallFrame` is implemented in the prototype [4] as
> 
> ```cpp
> typedef struct {
>    void *machine_pc;           // program counter, for C and native frames (frames of native methods)
>    uint8_t type;               // frame type (single byte)
>    uint8_t comp_level;         // highest compilation level of a method related to a Java frame
>    // information from original CallFrame
>    jint bci;                   // bci for Java frames
>    jmethodID method_id;        // method ID for Java frames
> } CallFrame;
> ```
> 
> The `FrameTypeId` is based on the frame type in JFRStackFrame:
> 
> ```cpp
> enum FrameTypeId {
>    FRAME_INTERPRETED = 0,
>    FRAME_JIT         = 1, // JIT compiled
>    FRAME_INLINE      = 2, // inlined JITed methods
>    FRAME_NATIVE      = 3, // native wrapper to call C methods from Java
>    FRAME_CPP         = 4  // c/c++/... frames, stub frames have CompLevel_all
> };
> ```
> 
> The `comp_level` states the compilation level of the method related to the frame
> with higher numbers representing "more" compilation. `0` is defined as
> interpreted. It is modeled after the `CompLevel` enum in `compiler/compilerDefinitions`:
> 
> ```cpp
> // Enumeration to distinguish tiers of compilation
> enum CompLevel {
>    // ...
>    CompLevel_none              = 0,         // Interpreter
>    CompLevel_simple            = 1,         // C1
>    CompLevel_limited_profile   = 2,         // C1, invocation & backedge counters
>    CompLevel_full_profile      = 3,         // C1, invocation & backedge counters + mdo
>    CompLevel_full_optimization = 4          // C2 or JVMCI
> };
> ```
> 
> The traces produced by this prototype are fairly large
> (each frame requires 24 instead of 16 bytes on 64-bit systems) and some data is
> duplicated.
> The reason for this is that it simplified the extension of async-profiler
> for the prototype, as it only extends the data structures of
> the original AsyncGetCallTrace API without changing the original fields.
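
The size difference can be checked with a small sketch. It assumes a typical 64-bit ABI with 8-byte pointers and natural alignment; `jint` and `jmethodID` are local stand-ins for the JNI types, and the struct names `OriginalCallFrame` and `PrototypeCallFrame` are made up for this illustration.

```cpp
#include <cstdint>

// Local stand-ins for the JNI types (assumptions).
using jint = std::int32_t;
using jmethodID = void *;  // pointer-sized, like the real jmethodID

// Layout of the frame used by the original AsyncGetCallTrace.
struct OriginalCallFrame {
  jint lineno;
  jmethodID method_id;
};

// Layout of the prototype frame quoted above.
struct PrototypeCallFrame {
  void *machine_pc;
  std::uint8_t type;
  std::uint8_t comp_level;
  jint bci;
  jmethodID method_id;
};

// Only meaningful where pointers are 8 bytes wide.
static_assert(sizeof(void *) != 8 || sizeof(OriginalCallFrame) == 16,
              "original frame: 4 + 4 (padding) + 8 bytes");
static_assert(sizeof(void *) != 8 || sizeof(PrototypeCallFrame) == 24,
              "prototype frame: 8 + 1 + 1 + 2 (padding) + 4 + 8 bytes");
```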
> 
> Proposal
> ------------
> 
> But packing the information and reducing duplication is of course possible
> if we step away from the former constraint:
> 
> ```cpp
> enum FrameTypeId {
>    FRAME_JAVA         = 1, // JIT compiled and interpreted
>    FRAME_JAVA_INLINED = 2, // inlined JIT compiled
>    FRAME_NATIVE       = 3, // native wrapper to call C methods from Java
>    FRAME_STUB         = 4, // VM generated stubs
>    FRAME_CPP          = 5  // C/C++/... frames
> };
> 
> typedef struct {
>    uint8_t type;            // frame type
>    uint8_t comp_level;
>    uint16_t bci;            // 0 <= bci < 65536
>    jmethodID method_id;
> } JavaFrame;               // used for FRAME_JAVA and FRAME_JAVA_INLINED
> 
> typedef struct {
>    uint8_t type;         // frame type (a FrameTypeId stored in a single byte)
>    void *machine_pc;
> } NonJavaFrame;         // used for FRAME_NATIVE, FRAME_STUB and FRAME_CPP
> 
> typedef union {
>    uint8_t type;         // to distinguish between JavaFrame and NonJavaFrame
>    JavaFrame java_frame;
>    NonJavaFrame non_java_frame;
> } CallFrame;
> ```
> 
> This uses the same amount of space per frame (16 bytes) as the original but encodes far more information.
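
A consumer of the packed layout would dispatch on the type tag before reading the rest of a frame. The following is a minimal sketch under a few assumptions: the tag is stored in a single byte in every variant, `jmethodID` is replaced by a pointer-sized stand-in, and the helpers `is_java_frame` and `machine_pc_or_null` are hypothetical names.

```cpp
#include <cstdint>

using jmethodID = void *;  // stand-in for the JNI type (assumption)

enum FrameTypeId : std::uint8_t {
  FRAME_JAVA         = 1,
  FRAME_JAVA_INLINED = 2,
  FRAME_NATIVE       = 3,
  FRAME_STUB         = 4,
  FRAME_CPP          = 5
};

struct JavaFrame {
  std::uint8_t type;
  std::uint8_t comp_level;
  std::uint16_t bci;
  jmethodID method_id;
};

struct NonJavaFrame {
  std::uint8_t type;
  void *machine_pc;
};

union CallFrame {
  std::uint8_t type;
  JavaFrame java_frame;
  NonJavaFrame non_java_frame;
};

// Holds on ABIs with 8-byte pointers and natural alignment.
static_assert(sizeof(void *) != 8 || sizeof(CallFrame) == 16,
              "packed frame should match the original 16 bytes");

// Both structs start with the one-byte tag, so it can be read through
// java_frame no matter which of the two structs was written.
inline bool is_java_frame(const CallFrame &frame) {
  const std::uint8_t type = frame.java_frame.type;
  return type == FRAME_JAVA || type == FRAME_JAVA_INLINED;
}

// Returns the program counter for non-Java frames, nullptr otherwise.
inline const void *machine_pc_or_null(const CallFrame &frame) {
  return is_java_frame(frame) ? nullptr : frame.non_java_frame.machine_pc;
}
```

Reading the tag through one of the structs rather than the bare `type` member keeps the access within C++'s common-initial-sequence rule for unions.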
> 
> Best regards
> Johannes
> 
> [1] https://github.com/parttimenerd/jdk/blob/parttimenerd_asgct2/src/hotspot/share/jfr/recorder/stacktrace/stackWalker.hpp
> 
> [2] https://github.com/parttimenerd/jdk/blob/parttimenerd_asgct2/src/hotspot/share/prims/asgct2.cpp
> 
> [3] https://docs.oracle.com/javase/8/docs/platform/jvmti/jvmti.html#CompiledMethodLoad
> 
> [4] https://github.com/parttimenerd/jdk/blob/parttimenerd_asgct2/src/hotspot/share/prims/asgct2.hpp
> 


More information about the serviceability-dev mailing list