RFR: 8350111: [PPC] AsyncGetCallTrace crashes when called while handling SIGTRAP [v3]

Martin Doerr mdoerr at openjdk.org
Thu Feb 27 10:38:58 UTC 2025


On Thu, 27 Feb 2025 01:51:58 GMT, Andrei Pangin <apangin at openjdk.org> wrote:

>> Richard Reingruber has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Improve whitespace
>
> A couple of comments for the record.
> 
> Detecting another signal handler on the stack or blocking SIGPROF inside a handler is not a solution: a signal number that profiler uses is configurable; there may be multiple profilers working at the same time or one profiler working in dual mode (cpu + wall clock).
> 
> In any case, the problem is not specific to signal handlers: it may happen with any frame that does not store frame pointer at a known location. A typical example is `clock_gettime` function called from `System.currentTimeMillis` and `System.nanoTime`. If libc is compiled without frame pointers, JVM fails to unwind `clock_gettime`. Note that `currentTimeMillis` and `nanoTime` are JVM intrinsics: they do not do regular state transition; a thread remains `in_Java` while executing `clock_gettime`. A signal trampoline is just another example of code with uncommon frame layout (not only on PPC).
> 
> I'm OK with the proposed fix as long as it reduces possibility of crashes, but it's likely not a bullet-proof solution. Any native frame that does not belong to `libjvm.so` is potentially dangerous to walk.

@apangin: Thanks for looking at this PR!

> In any case, the problem is not specific to signal handlers: it may happen with any frame that does not store frame pointer at a known location. A typical example is clock_gettime function called from System.currentTimeMillis and System.nanoTime. If libc is compiled without frame pointers, JVM fails to unwind clock_gettime. Note that currentTimeMillis and nanoTime are JVM intrinsics: they do not do regular state transition; a thread remains in_Java while executing clock_gettime. A signal trampoline is just another example of code with uncommon frame layout (not only on PPC).

I think frame pointers are problematic on some platforms, but not on PPC64. The PPC64 ABI requires a valid back chain at all times: *SP always points to the previous frame, and frames are pushed atomically.
On PPC64, retrieving the return PC of the top function (the one whose PC comes from the ucontext) is unreliable because we don't know whether it still lives in the LR register or has already been written to the stack. x86 doesn't have this problem because the call instruction writes the return PC to the stack. I guess most other platforms have the same problem as PPC64: it's hard to figure out whether we are before or after the frame complete offset.
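
To illustrate the two points above, here is a minimal sketch in plain C. It is not HotSpot code: `frame_t`, `walk_back_chain` and `top_return_pc_candidates` are hypothetical names, and the struct is a simplified model of the PPC64 frame header (back chain at offset 0, LR save doubleword at offset 16, written by the callee into its caller's frame). It assumes the back chain ends in a NULL pointer.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified, hypothetical model of a PPC64 frame header. */
typedef struct frame {
    struct frame *back_chain; /* *SP: the previous (caller's) frame, NULL at the stack bottom */
    uint64_t      cr_save;    /* CR save doubleword, unused in this sketch */
    uint64_t      lr_save;    /* return PC stored here by the callee's prologue */
} frame_t;

/* The two places the return PC of the top function may currently live. */
typedef struct top_pc_candidates {
    uint64_t from_lr;    /* valid if the prologue has not saved LR yet */
    uint64_t from_stack; /* valid if the prologue has already saved LR */
} top_pc_candidates_t;

/*
 * Follow the back chain starting at sp. For each visited frame, record the
 * return PC found in the LR save slot of its caller's frame (0 if there is
 * no caller). Returns the number of frames visited, at most max_frames.
 * Note: pcs[0] belongs to the top frame and is only meaningful if that
 * function has already stored LR.
 */
size_t walk_back_chain(const frame_t *sp, uint64_t *pcs, size_t max_frames)
{
    size_t n = 0;
    while (sp != NULL && n < max_frames) {
        const frame_t *caller = sp->back_chain;
        pcs[n++] = (caller != NULL) ? caller->lr_save : 0;
        sp = caller;
    }
    return n;
}

/*
 * For the top frame we cannot tell, from SP and LR alone, which candidate is
 * the real return PC; that depends on whether the interrupted PC is before
 * or after the point where the frame is complete.
 */
top_pc_candidates_t top_return_pc_candidates(const frame_t *sp, uint64_t lr)
{
    top_pc_candidates_t c;
    c.from_lr = lr;
    c.from_stack = (sp != NULL && sp->back_chain != NULL)
                       ? sp->back_chain->lr_save
                       : 0;
    return c;
}
```

Walking the chain itself is the easy part, since every `back_chain` link is valid by ABI. The open question is only which of the two candidates to trust for the top frame, and this sketch deliberately returns both rather than pretending to decide.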

-------------

PR Comment: https://git.openjdk.org/jdk/pull/23641#issuecomment-2687549517


More information about the hotspot-runtime-dev mailing list