RFR: 8359820: Improve handshake/safepoint timeout diagnostic messages [v4]

Fri Jul 18 10:01:58 UTC 2025

On Fri, 18 Jul 2025 09:14:39 GMT, Anton Artemov <duke at openjdk.org> wrote:

>> Hi, please consider the following changes:
>> 
>> The problem in the issue description is not a problem by itself, the behavior is not unexpected, but it is somewhat difficult to find out what caused SIGILL to be fired.
>> 
>> We propagate this information from `handshake::handle_timeout()` to `VMError::report()` with a help of a global variable.  The same mechanism is used to address a similar issue in the safepoint timeout handler. 
>> 
>> Tested in tiers 1-3.
>
> Anton Artemov has updated the pull request incrementally with one additional commit since the last revision:
> 
>   8359820: Addressed reviewer's comments

To clarify, I am concerned about misleading printouts. Signal issues are difficult to analyze. Signals can interleave, overlap, get lost, or be blocked. I think it's important to be precise.

For example, saving gobal information A, sending signal, and in the handler printing "signal X relates to global information A" can be wrong. The signal can have some other cause. 

It would be good to save both the sender and recipent's thread id in the global information before sending the signal, the in the signal handler to correlate this with what sigaction_t says who send the signal, and with the receiving thread's thread id. 

That way we can be sure that, yes, X send a signal A to Y, I am Y, I got signal A from X, so this is safe information I could now print. It would make investigating issues like the one motivating this change a lot easier, especially when multiple signals are involved.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26309#issuecomment-3088872911