RFR: 8350111: [PPC] AsyncGetCallTrace crashes when called while handling SIGTRAP

Richard Reingruber rrich at openjdk.org
Wed Feb 26 08:04:53 UTC 2025


On Tue, 25 Feb 2025 15:20:50 GMT, Martin Doerr <mdoerr at openjdk.org> wrote:

> > Can this also happen on other platforms when in signal handling (e.g. segfault based nullchecks?)
> 
> I guess such problems can happen on all platforms which use some kind of link register (aarch64, s390, ?).

The actual issue here is that an attempt to walk native stack frames fails and we don't recognize that the stack is not walkable for our stackwalking code. The concrete problem is (likely) that the caller pc had not yet been stored to the stack. This specific problem cannot occur on x86 (the caller pc is passed on the stack), but pushing a new frame isn't atomic there either, and I'm sure there are states where our stackwalking code can crash.
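
To illustrate the failure mode, here is a minimal toy model in C++. It is not HotSpot code: `ToyFrame`, `is_plausible_pc`, `walk`, and the code range bounds are all hypothetical names and values made up for this sketch. It only shows the general idea of a walker that validates each saved pc and reports "not walkable" instead of following a slot the prologue has not filled in yet.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model only; every name and constant here is hypothetical.
struct ToyFrame {
    int caller;               // index of the caller's frame, -1 for the outermost frame
    std::uintptr_t saved_pc;  // slot where the prologue stores the return address
};

// Assumed bounds of "known code". A real walker would consult something
// like a code cache or the loaded libraries instead.
constexpr std::uintptr_t kCodeBegin = 0x1000;
constexpr std::uintptr_t kCodeEnd   = 0x2000;

bool is_plausible_pc(std::uintptr_t pc) {
    return pc >= kCodeBegin && pc < kCodeEnd;
}

// Walks from frame `top` towards the outermost frame and collects saved pcs.
// Returns false and discards the sample as soon as anything looks invalid,
// rather than continuing with a garbage value.
bool walk(const std::vector<ToyFrame>& stack, int top,
          std::vector<std::uintptr_t>& pcs) {
    int i = top;
    std::size_t steps = 0;
    while (i != -1) {
        bool in_range = i >= 0 && static_cast<std::size_t>(i) < stack.size();
        // The step limit guards against cycles in a corrupted caller chain.
        if (!in_range || steps++ >= stack.size()) {
            pcs.clear();
            return false;
        }
        const ToyFrame& f = stack[i];
        if (!is_plausible_pc(f.saved_pc)) {
            // E.g. the return address is still only in the link register.
            pcs.clear();
            return false;
        }
        pcs.push_back(f.saved_pc);
        i = f.caller;
    }
    return true;
}
```

In this model, a stack whose newest frame still holds a stale value in its `saved_pc` slot is rejected as not walkable, so the sample is dropped instead of crashing the walker. The hard part in practice is that a stale slot can also hold a value that looks plausible, which such a check cannot detect.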

> I also don't like that we lose so many samples with this current solution. I only approved it because I think it is better than crashing.
> Recognizing that a signal handler is on stack may be a better solution.

This would avoid this specific type of crash.
Attempts to walk native frames until the top Java frame is found can fail in similar ways, though.
That's what I meant when I referred to FFI calls in the PR description.

> Do we already have functionality for that? There are efforts to read the stack at a safepoint. @parttimenerd: Would it make sense to wait for that?

With that enhancement we would capture the top Java frame (sp, pc) in the signal handler too and then do the stack walk at the safepoint. Finding the top Java frame is the purpose of [find_initial_Java_frame](https://github.com/openjdk/jdk/blob/037e47112bdf2fa2324f7c58198f6d433f17d9fd/src/hotspot/share/prims/forte.cpp#L271), but it crashes, and it would still crash if the walk of the Java frames were delayed to the next safepoint.
It would only help if we used the Java frame (sp, pc) that we find on top at the safepoint, but doing so loses precision: for example, if the thread was in a critical FFI call when it was interrupted, that information would be lost.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/23641#issuecomment-2684215884


More information about the hotspot-runtime-dev mailing list