RFR: 8283849: AsyncGetCallTrace may crash JVM on guarantee [v2]
David Holmes
dholmes at openjdk.java.net
Thu May 5 12:05:20 UTC 2022
On Thu, 5 May 2022 11:51:47 GMT, Jaroslav Bachorik <jbachorik at openjdk.org> wrote:
>> The gist of the fix is to allow relaxed handling of code blob lookup when it is done for ASGCT.
>>
>> Currently, a guarantee fails when we happen to hit a zombie method that is still on the stack. In the normal execution flow this would indicate a serious error. While ASGCT is in progress, however, the executing thread may be stopped in any method, so this can legitimately happen, and we really should not take the profiled JVM down because of it.
>>
>> Unfortunately, I am not able to create a simple reproducer for the crash other than testing in our production environment, where the crash happens sporadically.
>> However, thanks to @parttimenerd and his [ASGCT stress test](https://github.com/parttimenerd/asgct2-tester.git) the problem can be reproduced quite reliably.
>>
>> _Note: This is a followup PR for #8061_
>
> Jaroslav Bachorik has updated the pull request incrementally with one additional commit since the last revision:
>
> Make sure the code blob result check is correct
Changes requested by dholmes (Reviewer).
src/hotspot/share/code/codeCache.cpp line 657:
> 655: if (current_thread != NULL && current_thread->in_asgct()) {
> 656: // when in ASGCT handler things might get rough and not all guarantees are held
> 657: // if the resolved blob is already a zombie return NULL instead of crashing on guarantee
Suggestion:
// If called from AGCT the usual invariants may not apply, so if we find
// a zombie method just return NULL.
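For readers following along, here is a minimal sketch of the behaviour that comment describes. It is not the HotSpot code: `FakeBlob`, `FakeThread`, and `find_blob_checked` are hypothetical stand-ins, and a plain `assert` stands in for the `guarantee`.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for illustration only; these are NOT the HotSpot types.
enum class BlobState { alive, zombie };

struct FakeBlob {
  BlobState state;
  bool is_zombie() const { return state == BlobState::zombie; }
};

struct FakeThread {
  bool in_agct;  // true while the thread runs inside the AsyncGetCallTrace handler
};

// Returns the blob that was found, or nullptr when a zombie is seen while
// profiling. Outside the profiler a zombie still trips the assertion, which
// stands in for the guarantee in the real lookup.
inline FakeBlob* find_blob_checked(FakeBlob* found, const FakeThread* current) {
  if (found == nullptr) {
    return nullptr;
  }
  if (found->is_zombie()) {
    if (current != nullptr && current->in_agct) {
      // The usual invariants may not apply here, so report "nothing found".
      return nullptr;
    }
    assert(false && "unexpected zombie blob outside of the profiler");
  }
  return found;
}
```

The point of the shape is that the strict check stays in place for every caller except the profiler, so relaxing it does not hide real errors in the normal execution flow.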
src/hotspot/share/runtime/thread.hpp line 647:
> 645: #endif // __APPLE__ && AARCH64
> 646:
> 647: // support ASGCT
Nit: the abbreviation for AsyncGetCallTrace is AGCT, not ASGCT.
src/hotspot/share/runtime/thread.hpp line 649:
> 647: // support ASGCT
> 648: private:
> 649: bool _in_asgct;
The position of this field may be significant. See if there are gaps in the Thread structure which this bool might be able to fill.
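To illustrate why the position can matter, here is a small sketch with two hypothetical structs (not the real Thread layout). The member names are made up for the example, and the exact sizes depend on the ABI, so none are asserted here.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical layouts for illustration only; NOT the real Thread class.

// Each bool sits between pointer-aligned members, so on typical ABIs the
// compiler pads after each one to realign the next pointer.
struct ScatteredFlags {
  void* first;
  bool  existing_flag;
  void* second;
  bool  in_agct;
};

// The new bool is placed next to an existing bool, filling what would
// otherwise be padding after existing_flag.
struct GroupedFlags {
  void* first;
  void* second;
  bool  existing_flag;
  bool  in_agct;
};

// Bytes saved by grouping the flags; ABI-dependent, computed at compile time.
constexpr std::size_t bytes_saved() {
  return sizeof(ScatteredFlags) - sizeof(GroupedFlags);
}
```

Comparing `sizeof` and `offsetof` for candidate positions like this is one way to check whether a new field fills an existing gap or grows the structure.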
-------------
PR: https://git.openjdk.java.net/jdk/pull/8549