RFR: JDK-8253001: [JVMCI] Add API for getting stacktraces independently of current thread

Wed Sep 23 18:22:25 UTC 2020

On Wed, 23 Sep 2020 12:19:52 GMT, Erik Österlund <eosterlund at openjdk.org> wrote:

>> > > > I would like to hear answer to @dholmes-ora question in JBS:
>> > > > "Do we really need yet another stack dumping interface in the VM? Why isn't a debugger using JVM TI?"
>> > > 
>> > > 
>> > > One reason for having both the new getStackFrames API (set of threads) as well as the existing iterateFrames (current
>> > > thread only) API in JVMCI is that Truffle would want a deopt-free read-only view of the values in a frame, which to the
>> > > best of our knowledge is not possible through JVMTI. Only in rare cases, materialization of frames is required, so it
>> > > boils down to the performance hit caused by deopting frames, which is even more of a concern with a set of threads than
>> > > for the single current thread case. Another potential issue with a JVMTI-based approach is that there might be other
>> > > drawbacks to having an always-on (or even late attached) JVMTI agent in a GraalVM?
>> > 
>> > 
>> > 
>> > 1. You are describing that the main reason is performance. But you also say this is to be used by a debugger? So, not
>> > sure performance as a primary motive really makes sense then. Not sure why performance of debugging Truffle must be so
>> > much faster than debugging Java code (which I have not heard anyone complain about). And if this really was an actual
>> > performance problem, it seems like we would want a generic fix then, not a special Truffle stack walker for debugging
>> > Truffle code alone, to be maintained separately. 2. We are talking about JVMTI, not JVMCI. iterateFrames is defined in
>> > JVMCI, and that is something completely different, which I don't think any of us had in mind. It seems indeed to be
>> > limited to the current frame. I'm talking about e.g. JVMTI GetStackTrace and the JVMTI GetLocal* functions. It gives
>> > you a stack trace for any thread (not just the current one), and allows you to retrieve locals. 3. When you just read
>> > locals, (as you describe is your use case), there is no need to deoptimize anything. So yeah, that's just not something
>> > we do, unless you change the locals, which you said you are not.  Please let me know if there is anything I missed. But
>> > so far it seems to me that the mentioned JVMTI functionality is all you really need for a debugger. What did I miss? I
>> > would like to better understand the problem domain before taking this further.
>> 
>> Thanks for your quick follow-up.
>> 
>>     1. It's not entirely made for the debugger use-case. For example in some guest languages we need this for implementing
>>     Thread#getStackTrace or similar. In Espresso (Java as a Truffle guest language) we would need this also for
>>     implementing part of the management API.
>> 
>>     2. I know that you suggested JVMTI and no JVMCI. Since I wasn't around when the decision to implement and include
>>     iterateFrames into JVMCI was made, I'm unaware of the exact reasoning behind that decision. I was assuming that
>>     whatever reason not to go with JVMTI back then would still hold true today. So say we wanted to adopt the JVMTI
>>     approach now. Would the design be an in-process always on and in-process JVMTI agent? Would there be security
>>     implications from such an approach leaving any VM running anything Truffle more vulnerable?
>> 
>>     3. No need for deoptimize anything when reading locals through JVMTI is good. Thanks for clarifying that.
> 
> So I understand it, you really have 2 cases:
> 1) Using the debugger
> 2) To support other APIs that need a stack trace
> 
> So if you use JVMTI for the debugging (like everybody else), that seems to be a solved problem.
> As for the second use case, I hope you can use java.lang.StackWalker? It should give you all the info you could dream
> of. If you can't use all classes in java.lang.* then I fear that you are in a lot of trouble using HotSpot in general.

Tuning in to provide some background on why Truffle needs this and why we spent a lot of time stabilizing this PR. If
we could have gone a different route, we would have.

Truffle introduces the separation of guest and host language. By host, we mean the Java host VM, which is either
HotSpot (relevant for this PR) or SubstrateVM (Native Image). Guest languages are interpreters implemented on top of
Truffle, like JavaScript, Ruby, or Python, but also Espresso, our Java implementation based on Truffle. Truffle uses
Graal and JVMCI to compile these guest languages to optimized machine code using a technique called the first
Futamura projection. This Graal compilation is limited to JDKs that provide JVMCI APIs.

In Truffle we use the notion of guest and host stack frames. Guest stack frames represent a method activation in the
guest language and host stack frames represent host Java method activations. A guest stack frame entry consists of the
call location (Node) and guest frame (VirtualFrame) that contains the guest local variables.

Truffle languages need to access guest frames of the current thread to construct a stack trace or to lazily access
variables in a parent guest frame. There are two techniques to do this:

1. Have a separate stack data structure on the heap that keeps track of the guest frames for each thread.
2. Walk the live local variables in Java host frames to access the guest frame and node call location.

We use technique (1) to implement Truffle guest stack traces on a JVM without JVMCI support. This is pretty simple
and allows us to walk the guest stack for any thread we need. However, it has downsides:
* For each method invocation we have additional overhead for writing the external data structure.
* The frame always escapes the current compilation scope and can therefore not be escape analyzed by Graal.

Both of these issues are deal-breakers, performance-wise. With Truffle we want to be competitive with other specialized
VMs, so technique (1) is not good enough. JVMCI exposes stack walking APIs for Truffle that allow us to access the
host frame local variables of the current thread. This lets us lazily reconstruct the guest frames from certain known,
live local variables in the host frames. We also have special logic to reconstruct read-only guest frames from
optimized Truffle+Graal compiled methods without the need to invalidate the optimized code.
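
For illustration, the bookkeeping that technique (1) requires can be sketched roughly as below. This is not Truffle's
actual implementation: ShadowStackSketch, GuestFrame, call and snapshot are hypothetical names, and a String stands in
for the call location (Node). The point is only that every guest call pays for a push and a pop, and that the frame
object is published to the heap, which is why it cannot be escape analyzed.

```java
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedDeque;
import java.util.function.Supplier;

// Hypothetical sketch of technique (1): an explicit per-thread guest stack on the heap.
final class ShadowStackSketch {

    // One guest stack entry: a stand-in for the call location plus the guest locals.
    static final class GuestFrame {
        final String callLocation;
        final Object[] locals;

        GuestFrame(String callLocation, Object[] locals) {
            this.callLocation = callLocation;
            this.locals = locals;
        }
    }

    // One stack per thread, reachable from any other thread.
    private static final Map<Thread, Deque<GuestFrame>> STACKS = new ConcurrentHashMap<>();

    // Every guest call pays for this push/pop pair, and the frame escapes to the heap.
    static Object call(String callLocation, Object[] locals, Supplier<Object> body) {
        Deque<GuestFrame> stack =
                STACKS.computeIfAbsent(Thread.currentThread(), t -> new ConcurrentLinkedDeque<>());
        stack.push(new GuestFrame(callLocation, locals));
        try {
            return body.get();
        } finally {
            stack.pop();
        }
    }

    // Walking the guest stack of any thread is trivial: copy its deque, innermost frame first.
    static List<GuestFrame> snapshot(Thread thread) {
        Deque<GuestFrame> stack = STACKS.get(thread);
        return stack == null ? new ArrayList<>() : new ArrayList<>(stack);
    }
}
```

Calling ShadowStackSketch.snapshot(someThread) from another thread returns that thread's guest frames at that moment,
which is the convenience technique (1) buys at the cost described above.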

We have been using technique (2) successfully for many years, but now with the growing maturity of Truffle we have new
requirements:

1. We need to be able to walk all the root pointers of a guest language. This includes all active guest frames. This is
   needed to allow languages to walk all live objects (e.g. Ruby needs that) and to compute the retained size of a
   Truffle guest language context.
2. We need to be able to read locals from other threads to produce the guest stack trace of other threads in the
   Truffle debugger.

This was not a big issue before, because we were mostly dealing with single-threaded languages (JavaScript).

The Truffle debugger should not be confused with the Java host debugger. The Truffle debugger is based on the Truffle
instrumentation framework and cannot debug Java host code. It only shows guest stack frames and statements and is
entirely agnostic to which Java methods were used to implement the guest language and which Java VM it runs on. It is
built entirely in Java, without the use of JVMTI. This allows us to debug guest code without having the Java debugger
attached. It also allows us to enable debugging on demand in a production scenario, and only for the guest context
that needs it, without slowing down others (e.g. in an app server). Truffle debugging works on SubstrateVM
(native-image), which currently has no support for JVMTI. Enabling the debugger without using it also comes without
any peak performance overhead (though with some memory and warmup overhead).

To summarize:
1. We cannot use the StackWalker API as it does not allow us to access the local variables we need.
2. We cannot manually push/pop guest language frames, as this would cost too much performance.
3. We cannot use JVMTI because:
3a. We need it to implement language features, not just debugger features.
3b. There is no way to enable it on demand for an individual guest application (we run multiple guest applications per
host VM).
3c. Using JVMTI would slow down the host VM.
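
To make point 1 concrete, here is a minimal sketch of what the public java.lang.StackWalker API provides. The class
StackWalkerSketch and its methods are hypothetical names used only for this example. A StackWalker.StackFrame exposes
the class name, method name, line number and bytecode index, but no accessor for local variables, and the walk always
covers the calling thread only.

```java
import java.lang.StackWalker.StackFrame;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch: what a host-side StackWalker can and cannot see.
final class StackWalkerSketch {

    // Collects the method names on the current thread's stack, innermost first.
    static List<String> currentMethodNames() {
        return StackWalker.getInstance().walk(frames ->
                frames.map(StackFrame::getMethodName).collect(Collectors.toList()));
    }

    // Stand-in for a host method that holds guest state in a local variable.
    static List<String> guestCall() {
        Object guestFrame = new Object[] {"guest local"};
        // The walk reports that guestCall is on the stack, but StackFrame has no
        // public accessor that would hand back the guestFrame local.
        return currentMethodNames();
    }
}
```

StackWalkerSketch.guestCall() returns a list of method names that includes guestCall itself, while the value held in
guestFrame stays out of reach.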

Therefore our best idea was to introduce this new JVMCI API. We are of course open to other suggestions, if they solve
our problem. This is also not an entirely new feature: this PR extends the existing JVMCI functionality for walking
stack frames with local variable access.

I hope these clarifications were helpful.

-------------

PR: https://git.openjdk.java.net/jdk/pull/110