RFR: 8283849: AsyncGetCallTrace may crash JVM on guarantee [v3]
Thomas Stuefe
stuefe at openjdk.java.net
Wed May 4 10:56:31 UTC 2022
On Tue, 3 May 2022 12:42:19 GMT, Jaroslav Bachorik <jbachorik at openjdk.org> wrote:
>> A gist of the fix is to allow relaxed instantiation of a frame when done from a signal handler - eg. for profiling purposes.
>>
>> Currently, a frame instantiation will fail on guarantee when we happen to hit a zombie method which is still on stack. While this would indicate a serious error for the normal execution flow, in case of profiling where the executing thread can be expected at any possible method this is something which may happen and we really should not take the profiled JVM down due to it.
>>
>> The behaviour defaults to checking the code blob status in the guarantee so nothing will change for the rest of the callers - just ASGCT will be affected.
>>
>> <hr>
>> Unfortunately, I am not able to create a simple reproducer for the crash other that testing in our production where the crash is happening sporadically.
>> However, thanks to @parttimenerd and his [ASGCT stress test](https://github.com/parttimenerd/asgct2-tester.git) the problem can be reproduced quite reliably.
>
> Jaroslav Bachorik has updated the pull request incrementally with one additional commit since the last revision:
>
> Extract common method for finding code blob in a signal handler
Hi,
sorry for the late chiming in.
Could we find a better way to do this that to funnel "is_signal_handler" thru every layer? How about a thread-local state instead? E.g. "bool Thread::_unsafe_code_cache_lookup"? Or, for that matter, "bool Thread::in_agct"? Could be set and restored at the beginning resp. end of AGCT.
I dislike the name "in_signal_handler" since the fact that we are inside signal handling is incidental. IIUC this is probably much more often a thread-safety issue than a reentrance issue, right? You would have the same problem if you analyzed the code cache without getting the code cache lock, while it is concurrently modified?
So if you want to keep the argument, how about renaming it to "unsafe", or "unsafe_access", or "unsafe_codecache_lookup"? We use "unsafe" in other places as well.
Cheers, Thomas
-------------
PR: https://git.openjdk.java.net/jdk/pull/8061
More information about the hotspot-dev
mailing list