RFR: JDK-8253001: [JVMCI] Add API for getting stacktraces independently of current thread

Erik Österlund eosterlund at openjdk.java.net
Wed Sep 23 18:57:02 UTC 2020


On Wed, 23 Sep 2020 18:16:11 GMT, Christian Humer <github.com+4346215+chumer at openjdk.org> wrote:

>>> > > > I would like to hear an answer to @dholmes-ora's question in JBS:
>>> > > > "Do we really need yet another stack dumping interface in the VM? Why isn't a debugger using JVM TI?"
>>> > > 
>>> > > 
>>> > > One reason for having both the new getStackFrames API (set of threads) and the existing iterateFrames (current
>>> > > thread only) API in JVMCI is that Truffle wants a deopt-free, read-only view of the values in a frame, which to the
>>> > > best of our knowledge is not possible through JVMTI. Materialization of frames is required only in rare cases, so it
>>> > > boils down to the performance hit caused by deopting frames, which is even more of a concern for a set of threads than
>>> > > for the single current-thread case. Another potential issue with a JVMTI-based approach is that there might be other
>>> > > drawbacks to having an always-on (or even late-attached) JVMTI agent in a GraalVM.
>>> > 
>>> > 
>>> > 
>>> > 1. You are describing that the main reason is performance. But you also say this is to be used by a debugger? So I'm
>>> >    not sure performance as a primary motive really makes sense then. I'm not sure why debugging Truffle must be so
>>> >    much faster than debugging Java code (which I have not heard anyone complain about). And if this really were an
>>> >    actual performance problem, it seems we would want a generic fix, not a special Truffle stack walker for debugging
>>> >    Truffle code alone, to be maintained separately.
>>> > 2. We are talking about JVMTI, not JVMCI. iterateFrames is defined in JVMCI, and that is something completely
>>> >    different, which I don't think any of us had in mind. It does indeed seem to be limited to the current frame. I'm
>>> >    talking about e.g. JVMTI GetStackTrace and the JVMTI GetLocal* functions. GetStackTrace gives you a stack trace
>>> >    for any thread (not just the current one), and the GetLocal* functions let you retrieve locals.
>>> > 3. When you just read locals (as you describe is your use case), there is no need to deoptimize anything. So yeah,
>>> >    that's just not something we do, unless you change the locals, which you said you are not.
>>> > 
>>> > Please let me know if there is anything I missed. But so far it seems to me that the mentioned JVMTI functionality is
>>> > all you really need for a debugger. What did I miss? I would like to better understand the problem domain before
>>> > taking this further.
>>> 
>>> Thanks for your quick follow-up.
>>> 
>>>     1. It's not entirely made for the debugger use-case. For example in some guest languages we need this for implementing
>>>     Thread#getStackTrace or similar. In Espresso (Java as a Truffle guest language) we would need this also for
>>>     implementing part of the management API.
>>> 
>>>     2. I know that you suggested JVMTI and not JVMCI. Since I wasn't around when the decision to implement and include
>>>     iterateFrames in JVMCI was made, I'm unaware of the exact reasoning behind that decision. I was assuming that
>>>     whatever reason there was not to go with JVMTI back then would still hold true today. So say we wanted to adopt the
>>>     JVMTI approach now. Would the design be an always-on, in-process JVMTI agent? Would there be security implications
>>>     from such an approach, leaving any VM that runs anything Truffle-based more vulnerable?
>>> 
>>>     3. Not needing to deoptimize anything when reading locals through JVMTI is good. Thanks for clarifying that.
>> 
>> So, as I understand it, you really have 2 cases:
>> 1) Using the debugger
>> 2) To support other APIs that need a stack trace
>> 
>> So if you use JVMTI for debugging (like everybody else), that seems to be a solved problem.
>> As for the second use case, I hope you can use java.lang.StackWalker? It should give you all the info you could dream
>> of. If you can't use all classes in java.lang.* then I fear that you are in a lot of trouble using HotSpot in general.
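>>
>> For illustration, a minimal sketch of such a walk with the public StackWalker API (JDK 9+; it reports class, method
>> and bytecode index for every frame of the current thread):
>>
>>     import java.lang.StackWalker.Option;
>>     import java.util.List;
>>     import java.util.stream.Collectors;
>>
>>     public class StackWalkerExample {
>>         public static void main(String[] args) {
>>             StackWalker walker = StackWalker.getInstance(Option.RETAIN_CLASS_REFERENCE);
>>             // Collect class, method and BCI for every frame of the current thread.
>>             List<String> frames = walker.walk(s ->
>>                     s.map(f -> f.getDeclaringClass().getName() + "." + f.getMethodName()
>>                             + " @ bci " + f.getByteCodeIndex())
>>                      .collect(Collectors.toList()));
>>             frames.forEach(System.out::println);
>>         }
>>     }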
>
> Tuning in to provide some background on why Truffle needs this and why we spent a lot of time stabilizing this PR. If
> we could have gone a different route, we would have.
>
> Truffle introduces a separation between guest and host language. By the host we mean the Java host VM: either HotSpot
> (relevant for this PR) or SubstrateVM (Native Image). Guest languages are interpreters implemented on top of Truffle,
> like JavaScript, Ruby, or Python, but also Espresso, our Java implementation based on Truffle. Truffle uses Graal and
> JVMCI to compile these guest languages to optimized machine code, using a technique called the first Futamura
> projection. This Graal compilation is limited to JDKs that provide JVMCI APIs.
>
> In Truffle we use the notion of guest and host stack frames. Guest stack frames represent a method activation in the
> guest language, and host stack frames represent host Java method activations. A guest stack frame entry consists of
> the call location (Node) and the guest frame (VirtualFrame) that contains the guest local variables. Truffle languages
> need to access guest frames of the current thread to construct a stack trace or to lazily access variables in a
> parent guest frame. There are two techniques to do this:
> 1. Keep a separate stack data structure on the heap that tracks the guest frames for each thread.
> 2. Walk the live local variables in Java host frames to access the guest frame and node call location.
>
> We use technique (1) to implement Truffle guest stack traces on a JVM without JVMCI support. This is pretty simple
> and allows us to walk the guest stack for any thread we need. However, there are downsides:
> * For each method invocation we have additional overhead for writing the external data structure.
> * The frame always escapes the current compilation scope and can therefore not be escape-analyzed by Graal.
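>
> As a rough sketch (purely illustrative types and names, not the actual Truffle data structures), technique (1)
> amounts to maintaining something like the following shadow stack on the heap; both downsides listed above fall
> directly out of this shape:
>
>     import java.util.ArrayDeque;
>     import java.util.Deque;
>     import java.util.Map;
>     import java.util.concurrent.ConcurrentHashMap;
>
>     // Illustrative only: an explicit, heap-allocated shadow stack of guest frames, kept per thread.
>     // Every guest call has to push/pop an entry, and the entry escapes the compilation unit,
>     // so Graal cannot escape-analyze the guest frame away. Synchronization of cross-thread reads is elided.
>     final class GuestShadowStack {
>         static final class Entry {
>             final Object callLocation; // would be the Truffle call Node
>             final Object guestFrame;   // would be a materialized guest frame
>             Entry(Object callLocation, Object guestFrame) {
>                 this.callLocation = callLocation;
>                 this.guestFrame = guestFrame;
>             }
>         }
>
>         private static final Map<Thread, Deque<Entry>> STACKS = new ConcurrentHashMap<>();
>
>         static void push(Object callLocation, Object guestFrame) {
>             STACKS.computeIfAbsent(Thread.currentThread(), t -> new ArrayDeque<>())
>                   .push(new Entry(callLocation, guestFrame));
>         }
>
>         static void pop() {
>             STACKS.get(Thread.currentThread()).pop();
>         }
>
>         // Readable for any thread, which is why this works without JVMCI support.
>         static Iterable<Entry> guestStackOf(Thread thread) {
>             return STACKS.getOrDefault(thread, new ArrayDeque<>());
>         }
>     }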
> 
> Both of these issues are deal-breakers, performance-wise. With Truffle we want to be competitive with other specialized
> VMs, so technique (1) is not good enough. JVMCI exposes stack walking APIs for Truffle that allow us to access the
> host frame local variables of the current thread. This allows us to lazily reconstruct the guest frames from certain
> known, live local variables in the host frames. We also have special logic to reconstruct read-only guest frames from
> optimized Truffle+Graal compiled methods without the need to invalidate the optimized code. We have been using
> technique (2) successfully for many years, but now, with the growing maturity of Truffle, we have new requirements:
> 1. We need to be able to walk all the root pointers of a guest language. This includes all active guest frames. This
>    is needed to allow languages to walk all live objects (e.g. Ruby needs that) and to compute the retained size of a
>    Truffle guest language context.
> 2. We need to be able to read locals from other threads to produce the guest stack trace of other threads in the
>    Truffle debugger. This was not a big issue before, because we were mostly dealing with single-threaded languages
>    (JavaScript).
>
> The Truffle debugger should not be confused with the Java host debugger. The Truffle debugger works on top of the
> Truffle instrumentation framework and cannot debug Java host code. It only shows guest stack frames and statements,
> and it is entirely agnostic to which Java methods were used to implement it and to which Java VM it runs on. It is
> built entirely in Java, without the use of JVMTI; this allows us to debug guest code without having the Java debugger
> attached. It also allows us to enable debugging on demand in a production scenario, only for the guest context that
> needs it, without slowing down others (e.g. in an app server). Truffle debugging works on SubstrateVM (native-image),
> which currently has no support for JVMTI. Enabling but not using the debugger also comes without any peak performance
> overhead (only some memory and warmup overhead).
>
> To summarize:
> 1. We cannot use the StackWalker API, as it does not allow us to access the local variables we need.
> 2. We cannot manually push/pop guest language frames, as the performance cost is too high.
> 3. We cannot use JVMTI because:
>    3a. We need it to implement language features, not just debugger features.
>    3b. There is no way to enable it on demand for an individual guest application (we run multiple guest applications
>        per host VM).
>    3c. Using JVMTI would slow down the host VM.
>
> Therefore our best idea was to introduce this new JVMCI API. We are of course open to other suggestions, if they
> solve our problem. This is also not an entirely new feature; this PR is an extension of the existing JVMCI
> functionality to walk the stack frames with local variable access. I hope these clarifications were helpful.
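>
> To make that last point concrete: the current-thread walk that technique (2) builds on looks roughly like the sketch
> below, here expressed at the Truffle API level (naming and output format are purely illustrative). The request in
> this PR is to extend this kind of frame access beyond the current thread.
>
>     import com.oracle.truffle.api.Truffle;
>     import com.oracle.truffle.api.frame.Frame;
>     import com.oracle.truffle.api.frame.FrameInstance;
>     import com.oracle.truffle.api.nodes.Node;
>
>     import java.util.ArrayList;
>     import java.util.List;
>
>     final class GuestStackTrace {
>         // Walks the host frames of the *current* thread and collects the guest view:
>         // the call location (Node) plus a read-only guest frame, without deoptimizing.
>         static List<String> capture() {
>             List<String> entries = new ArrayList<>();
>             Truffle.getRuntime().iterateFrames(frameInstance -> {
>                 Node callNode = frameInstance.getCallNode();
>                 Frame guestFrame = frameInstance.getFrame(FrameInstance.FrameAccess.READ_ONLY);
>                 entries.add((callNode == null ? "<root>" : callNode.getSourceSection())
>                         + " with " + guestFrame.getFrameDescriptor());
>                 return null; // keep walking; a non-null result stops the walk
>             });
>             return entries;
>         }
>     }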

java.lang.StackWalker does expose locals as well. What am I missing?

-------------

PR: https://git.openjdk.java.net/jdk/pull/110

