RFR: JDK-8253001: [JVMCI] Add API for getting stacktraces independently of current thread

Thu Sep 24 08:04:52 UTC 2020

On Thu, 24 Sep 2020 06:38:10 GMT, Erik Österlund <eosterlund at openjdk.org> wrote:

>> The special case we support in the `StackWalker` API is intentionally limited, because a thread examining its own stack
>> is the least risky and most performant scenario.  The `StackWalker::walk` API point, in particular, is carefully
>> designed so that its internal implementation can internally use unsafe "dangling" pointers from the thread into its own
>> stack.  This reduces copying and buffering, which is obviously the least expensive way to "take a quick peek" at what's
>> on the stack.  It is reasonable to ask to extend such functionality to a second, uncooperative thread, but this brings
>> in lots of extra baggage:
>> - How does the requesting thread get permission to look inside the target thread?  (New security analysis.)
>> - At what point does the target thread get its state taken as a snapshot?  Any random moment?
>> - How is the target thread "held still" while it is being sampled?  (And then, "Where is this term 'safepoint' defined in
>>   the JVM specifications?")
>> - Can a target thread refuse or defer the request, to defend some particular encapsulation?
>> - How is that state stored, and what are the time and space costs for such storage?
>> - What happens if the requesting thread just wants to look at a few bits?  Do we still buffer up a whole backtrace?
>> - Or, is the target thread required to execute callbacks provided by the requesting thread, with a temporary view, and if
>>   so, what limits are there on such callbacks?
>> - Can the observation process ever cause the target thread to fail, or will any and all failures (OOME, SOE, etc.) be
>>   attributed to the requesting thread?
>> - What happens if the requesting thread makes two requests in a row:  Are there any guarantees about relations between
>>   the two sets of results?  (To be fair, this is also an issue with the self-walking case.)
>> - What happens if the requesting thread asks to change a value in a frame or pop or re-invoke or replace a frame?  (Not
>>   allowed in the self-walking case either, but a plausible extension.)
>> 
>> If only "just adding a thread parameter" were a straightforward extension…  Instead, we have serious user model issues
>> (see above), and serious implementation issues (see the PR).
>> I think we could perhaps add cross-thread access to the current `StackWalker` API, if we came up with answers to the
>> above.  I think, in order to engineer it correctly, we would want to factor it as the composition of a self-walking
>> request, *plus* a cross-call mechanism which would allow one thread to ask another thread to run a function.  Jumbling
>> these complex operations together into a big pile of new code would be the wrong way to do it.  The self-walking API is
>> pretty well understood, and there is a good literature on cross-call mechanisms too.  Let's break the problem up.
>>
>> BTW, the current `StackWalker` API could certainly accept minor extensions to inspect locals, and/or to perform frame
>> replacement, as hinted above.  The JVM currently benefits from performing on-stack replacement when it can tell that a
>> slow loop is worth (re-)optimizing as a fast loop.  There's no reason the JDK libraries (say, the streams runtime, in
>> particular) shouldn't have a shot at doing something similar.  That would require internal JDK hooks to self-inspect and
>> replace loops with improved "customizations", on the fly.
>> 
>> All of the above comments apply only to what might be called the self-inspecting, self-reflective, or "introspective"
>> modes of stack walking.  Debuggers usually don't do this (except in one-world environments like Lisp and Smalltalk),
>> but rather operate from the side, through a privileged channel "under the virtual metal" like JVMTI.  I suppose for
>> those use cases, JVMTI is plenty good.  If there is some trick for self-attachment (either direct or through a
>> conspirator process), then some introspection is also possible, via JVMTI.  For best performance, a more "one world"
>> implementation is desirable, but this implies that we create a whole category of "debugging/monitoring code".  Such
>> debugging/monitoring code would (like today's runtime internals like those that use `Unsafe`) have privileges beyond
>> regular application code.  It might also have eBPF-like limitations on resource usage, so that its executions could be
>> hidden "under the metal" of regular executions.  IMO these are promising ideas.  They might help us define better,
>> more cooperative debugging/monitoring primitives.  I raise the ideas here because I think there may be a root issue
>> here:  How can we use the JDK's on-line introspection APIs for more purposes?  How can we inject privileged monitoring
>> code into Java executions?  Adding yet another stack walking mechanism to the JVM seems to me like an inefficient way
>> to move, a little bit, in the direction of cooperative debugging/monitoring facilities in the JDK.  Conversely, if we
>> can create a way to do (privileged) cross-calls, then we won't need yet another stack walking mechanism.  I guess this
>> is where I end up:  Please consider refactoring this into an extension (if any is needed) to the self-inspection API
>> (`StackWalker`) and some cross-call API.  Then we should consider hooking it up to JVMCI.
>
> John, there is a lot to be said here in the solution domain. But before we get there, I want to get answers about the
> problem domain, so I know if we are solving a real or imaginary problem. The crucial question it boils down to is: "is
> remote thread stack sampling with locals needed in the non-debugger case"? If so, we can start discussing the solution
> domain of that. But I suspect we already have all the APIs in place that are needed.
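
John's suggested factoring (a self-walking request composed with a cross-call) can be sketched in plain Java. In this minimal sketch every class and method name is hypothetical, not an existing JDK or JVMCI API: a requesting thread enqueues a function, and the target thread runs it on its own stack at a point of its own choosing (`poll()`), so the ordinary `StackWalker::walk` can be reused unchanged. It only models a cooperative target; it does not suspend an uncooperative thread, and `poll()` merely stands in for wherever a real mechanism would run such requests.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Supplier;
import java.util.stream.Collectors;

// Hypothetical sketch of "self-walk + cooperative cross-call"; not a JDK or JVMCI API.
final class CrossCallSketch {

    // Holds the functions that other threads have asked the target thread to run.
    static final class CrossCallTarget {
        private final BlockingQueue<Runnable> requests = new LinkedBlockingQueue<>();

        // Called by the requesting thread; the future completes once the target has run fn.
        <T> CompletableFuture<T> request(Supplier<T> fn) {
            CompletableFuture<T> result = new CompletableFuture<>();
            requests.add(() -> {
                try {
                    result.complete(fn.get());
                } catch (Throwable t) {
                    // Failures are reported to the requester instead of escaping in the target.
                    result.completeExceptionally(t);
                }
            });
            return result;
        }

        // Called by the target thread itself, at a point where it is willing to be inspected.
        void poll() {
            Runnable r;
            while ((r = requests.poll()) != null) {
                r.run();
            }
        }
    }

    // Self-walk: peek at the top n frames of the *current* thread without materializing a full backtrace.
    static List<String> topFrames(int n) {
        return StackWalker.getInstance().walk(frames -> frames
                .limit(n)
                .map(f -> f.getClassName() + "." + f.getMethodName())
                .collect(Collectors.toList()));
    }

    public static void main(String[] args) throws Exception {
        CrossCallTarget target = new CrossCallTarget();
        // The main thread asks the worker to walk its own stack ...
        CompletableFuture<List<String>> frames = target.request(() -> topFrames(4));
        // ... and the worker services the request when it reaches poll().
        Thread worker = new Thread(target::poll, "worker");
        worker.start();
        worker.join();
        System.out.println("worker frames: " + frames.get());
    }
}
```

One design choice worth noting: failures thrown by the requested function are delivered through the requester's `CompletableFuture` rather than propagating in the target thread, which is one possible answer to the question of where such failures should be attributed.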

A lot of good comments here. Thanks!

I agree that we should look at the problem domain first. Hence, let's look once more at the use cases that were brought
up.

1. The Truffle debugger (not to be confused with a Java host debugger) generally needs access to all live stack traces
of all guest-language threads, as well as read/write access to their local variables.

2. Some Truffle guest languages (e.g. Ruby) need access to all live objects. Espresso also needs this to implement,
e.g., getReferringObjects (albeit through the debugger).
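
For comparison with use case 1, the existing `Thread.getAllStackTraces()` already returns the (host-level) stack of every live thread, but each `StackTraceElement` carries only class, method, file and line information, with no access to locals. Its Javadoc also notes that the threads may keep executing during the call and that each thread's trace may be captured at a different time. A minimal sketch (the class name is illustrative):

```java
import java.util.Map;

// Illustrative only: what the standard API exposes for all live threads.
final class AllStacksDemo {

    // Frames for one thread, taken from a single Thread.getAllStackTraces() call.
    static StackTraceElement[] framesOf(Thread t) {
        Map<Thread, StackTraceElement[]> all = Thread.getAllStackTraces();
        StackTraceElement[] frames = all.get(t);
        // A thread that has already terminated is absent from the map.
        return frames != null ? frames : new StackTraceElement[0];
    }

    public static void main(String[] args) {
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            System.out.println(e.getKey().getName() + ":");
            for (StackTraceElement el : e.getValue()) {
                // Class, method and line are available; there is no accessor for locals.
                System.out.println("    at " + el.getClassName() + "." + el.getMethodName()
                        + " (line " + el.getLineNumber() + ")");
            }
        }
    }
}
```

These are Java host frames; mapping them to guest-language frames, and reading or writing guest locals, are not covered by this API.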

Onto discussing potential solutions:

After discussing the capabilities of JVMTI internally, it seems that its current implementation for reading locals might
not be able to return anything for escape-analyzed objects. This is a serious limitation for Truffle, given that escape
analysis is a cornerstone of GraalVM. I take it this played a big role in introducing iterateFrames (current thread
only) in JVMCI a while back.
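
For readers less familiar with the escape-analysis issue, a minimal illustration (class and method names are hypothetical): `p` below never escapes `sumOfCoordinates`, so an optimizing JIT such as C2 or Graal may scalar-replace it, keeping only the two `int` fields in registers or stack slots. Whether that actually happens depends on the compiler and its flags. A tool that stops in such a compiled frame and asks for the local `p` has no heap object to read unless the VM rematerializes one from the frame's debug info.

```java
// Illustration only: an allocation that does not escape its method.
final class EscapeAnalysisDemo {

    static final class Point {
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    static int sumOfCoordinates(int a, int b) {
        Point p = new Point(a, b); // never escapes: a candidate for scalar replacement
        return p.x + p.y;
    }

    public static void main(String[] args) {
        long total = 0;
        // Call it often so the method is likely (not guaranteed) to be JIT-compiled.
        for (int i = 0; i < 1_000_000; i++) {
            total += sumOfCoordinates(i, 1);
        }
        System.out.println(total);
    }
}
```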

Even for the debugger case, we would need JVMTI to guarantee the following:
1. A safe suspension mechanism for all target threads (what if two debuggers are connected at the same time?) that
spans the entirety of getAllStackTraces + fetching all locals for all frames. I don't see how something like
suspendThreadList would provide the safeguards that we would need here.
2. A bulk getAllLocals that retrieves all locals for all frames in one operation, instead of fetching all stack traces
and then retrieving the locals separately.

Given that Truffle/GraalVM will continue to support Java 8 for quite some time, this new API should probably be exposed
through JVMCI and backported, as was done for iterateFrames, regardless of the underlying implementation.

Note: In the current PR, we need to refactor materializeVirtualObjects to use VM operations, to guarantee that the stack
walk and the sanity checks that locate the frame in question run at a safepoint. I'll hold my horses a bit on that
until there is consensus on where this is going.

-------------

PR: https://git.openjdk.java.net/jdk/pull/110