RFR: 8282475: SafeFetch should not rely on existence of Thread::current [v6]

Thu Mar 10 18:07:40 UTC 2022

On Thu, 10 Mar 2022 12:31:06 GMT, Anton Kozlov <akozlov at openjdk.org> wrote:

> > Signal handlers are per-process not per-thread, so a thread need not be attached to the VM for our signal handlers to get involved - that is why they call Thread::current_or_null_safe().
> 
> Oh, right, thanks. I was too concentrated on thinking another platforms like windows, that missed the example will work for *nix.
> 
> A general question to the bug. The signal mask is per-thread, and a native thread may block the JVM signal. I think safefetch will fail, if somehow we manage to call it on this thread (without jsig). So I'm not sure the safefetch is really safe on all platforms and in all contexts, that is, it always recovers from the read failure if called on a random thread.

To expand on @dholmes-ora answer: blocking SIGSEGV and SIGBUS - or other synchronous error signals like SIGFPE - and then triggering said signal is UB. What happens is OS-dependent. I saw processes vanishing, or hang, or core. It makes sense, since what is the kernel supposed to do. It cannot deliver the signal, and deferring it would require returning to the faulting instruction, that would just re-fault. 

For some more details see e.g. https://bugs.openjdk.java.net/browse/JDK-8252533

> 
> Is there a crash that is fixed by the change? I just spotted it is an enhancement, not a bug. Just trying to understand the problem.

Yes, this issue is a breakout from https://bugs.openjdk.java.net/browse/JDK-8282306, where we'd like to use SafeFetch to make stack walking in AsyncGetCallTrace more robust. AGCT is called from the signal handler, and it may run in any number of situations (e.g. in foreign threads, or threads which are in the process of getting dismantled, etc).

Another situation is error handling itself. When writing an hs-err file, we use SafeFetch to do carefully tiptoe around the possibly corrupt VM state. If the original crash happened in a foreign thread, we still want some of these reports to work (e.g. dumping register content or printing stacks). So SafeFetch should be as robust as possible.

> I'm not opposing the refactoring, it has some advantages, but I'd want to separate functional change from the refactoring. This would aid backporting, at least.

I agree that a minimal patch would be good. I feel partly guilty for this expanding discussion. I'm fine with a minimal change, without any refactoring, in whatever form @parttimenerd choses - be it OS thread local or the approach proposed by you, @AntonKozlov.

Cheers Thomas

-------------

PR: https://git.openjdk.java.net/jdk/pull/7727