RFR: 8284828: Use `os::ThreadCrashProtection` to protect AsyncGetCallTrace from crashing [v4]

Thu Apr 14 11:33:15 UTC 2022

On Thu, 14 Apr 2022 11:06:50 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

> We have only one other subsystem that does this kind of wholesale catching of error signals and then unwinding the stack (JFR) and presumably they fine-combed their coding and are sure exactly what they do under the sigjmp guard. Are we also certain?

They share almost all of their code. Allocating on the heap is not safe in a signal handler and should be considered an error. My goal is to remove all code that (potentially) modifies the heap or the state of the VM from AsyncGetCallTrace, focusing on the methods that are called from AsyncGetCallTrace but not from JFR. The two are so similar that this is practical.

Regarding the `NoHandleMark`: I must have missed this.

> I think the safer approach, albeit much more work intensive, would be to make sure we do not crash. Starting with removing RA allocation. I tried to make RA signal safe with https://bugs.openjdk.java.net/browse/JDK-8282405, but that got totally stuck, so better just remove it altogether from AGCT. Over using SafeFetch or plain defensive coding to avoid crashing.

They are complementary goals. My goal is to improve ASGCT and its stability, but this PR should help prevent rare segmentation faults from appearing in production.
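
For readers less familiar with the "sigjmp guard" mentioned above, here is a minimal sketch in C of the general technique: install a handler for the error signals, record a jump target with `sigsetjmp`, and `siglongjmp` back to it if the protected call faults. This is not the HotSpot code. The names `call_protected`, `crash_handler`, `well_behaved` and `faulty` are hypothetical, and the sketch assumes a single-threaded POSIX program. It also shows why the concern raised above is valid: the jump skips any cleanup the interrupted code would have done, so locks or half-written state are left as they were at the moment of the fault.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>

/* Jump target and flag shared with the signal handler (single-threaded sketch). */
static sigjmp_buf crash_jmp;
static volatile sig_atomic_t protection_active = 0;

static void crash_handler(int sig) {
    if (protection_active) {
        /* Unwind to the sigsetjmp point; nothing on the skipped frames is cleaned up. */
        siglongjmp(crash_jmp, 1);
    }
    /* Not inside a protected call: fall back to the default action. */
    signal(sig, SIG_DFL);
    raise(sig);
}

/* Runs fn(arg) under the guard. Returns 1 if fn returned normally, 0 if it faulted. */
static int call_protected(void (*fn)(void *), void *arg) {
    struct sigaction sa, old_segv, old_bus;
    int ok;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = crash_handler;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, &old_segv);
    sigaction(SIGBUS, &sa, &old_bus);

    /* savemask = 1 so the signal mask is restored when we jump out of the handler. */
    if (sigsetjmp(crash_jmp, 1) == 0) {
        protection_active = 1;
        fn(arg);
        ok = 1;
    } else {
        ok = 0; /* reached via siglongjmp from crash_handler */
    }

    protection_active = 0;
    sigaction(SIGSEGV, &old_segv, NULL);
    sigaction(SIGBUS, &old_bus, NULL);
    return ok;
}

/* Example callee that completes normally. */
static void well_behaved(void *arg) {
    *(int *)arg = 42;
}

/* Example callee that simulates a crash deterministically. */
static void faulty(void *arg) {
    (void)arg;
    raise(SIGSEGV);
}
```

Calling `call_protected(faulty, NULL)` returns 0 instead of terminating the process, while `call_protected(well_behaved, &value)` returns 1 after the callee has run to completion. Anything the faulting callee had acquired or partially modified stays that way, which is why the code under the guard has to avoid heap allocation and VM state changes.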

-------------

PR: https://git.openjdk.java.net/jdk/pull/8225