RFR: 8259383: AsyncGetCallTrace() crashes sporadically

Andrei Pangin github.com+1749416+apangin at openjdk.java.net
Wed Jan 20 03:44:55 UTC 2021


On Tue, 19 Jan 2021 22:23:38 GMT, Chris Plummer <cjplummer at openjdk.org> wrote:

>> Hi,
>> may I please ask the community to review this small fix? It closes another hole in AsyncGetCallTrace(). 
>> Thanks a lot!
>> Lutz
>
> The changes look ok to me. I think it would be good to get someone from the compiler team to verify your reasoning here.

Hi Lutz,

Thank you for working on this issue.
I'm not a formal OpenJDK Reviewer, but as an author of async-profiler project (which heavily relies on AsyncGetCallTrace), I'm more or less familiar with this code. Let me leave a few comments.

1. The statement `if (candidate.pc() != NULL) return false;` looks a bit odd to me. When a profiler calls AsyncGetCallTrace, pc of the initial frame is obtained from a signal context, and it is never NULL. Returning false when pc != NULL basically invalidates the entire `if (fr->cb() == NULL)` branch. If this was intended, the branch could be simplified.
2. At the same time, returning false whenever cb == NULL, discards a fair amount of valid stack traces, e.g. when a runtime function is called from Java without switching to in_vm state. I guess, the original intention of that loop was to handle such cases, but, as you noticed, it does not currently work well because of the assertions. I'd rather prefer revising the logic of that loop then instead of dropping all those valid samples.
3. What I don't really understand from the bug report is why `if (fr->cb() == NULL)` branch is executed at all in this particular example. According to the stack trace from the crash dump, the top frame (before a signal handler) is `ThreadCritical::~ThreadCritical()`. This means, a thread is in_vm state, and its last_Java_frame should have been set. In this case AsyncGetCallTrace prefers to take last_Java_frame, and thus cb should be non-NULL. Another suspicious thing is that a frame below `ThreadCritical::~ThreadCritical()` is not a C frame. This cannot normally happen (the execution is in the middle of `ThreadCritical::~ThreadCritical()`, where the stack frame is consistent). These facts make me think something bad has already happened with the stack earlier, and the failed guarantee is probably just one manifestation of a bigger problem.

-------------

PR: https://git.openjdk.java.net/jdk/pull/2032


More information about the hotspot-compiler-dev mailing list