RFR: 8331735: UpcallLinker::on_exit races with GC when copying frame anchor
Erik Österlund
eosterlund at openjdk.org
Wed Nov 27 09:59:41 UTC 2024
On Mon, 28 Oct 2024 13:53:58 GMT, Jorn Vernee <jvernee at openjdk.org> wrote:
> There is a subtle race in `UpcallLinker::on_exit` between copying of the old frame anchor back into place, and the GC. Since this copy is not atomic, it may briefly appear as if a thread has no last Java frame, while still in the `_thread_in_native` state, which leads to the GC skipping processing of any active Java frames.
>
> This code was originally adapted from `JavaCallWrapper::!JavaCallWrapper` - the JNI mechanism for upcalls - but in that case the frame anchor copy happens in the `_thread_in_vm` state, which means the GC will wait for the thread to get to a safepoint.
>
> The solution proposed here is to do the frame anchor copy in the java thread state, before transitioning back to the native state. The java thread state, like the vm thread state, is also 'safe' i.e. the GC will wait for the thread to get to a safepoint, so we can safely do our non-atomic copy of the frame anchor.
>
> Additionally, this PR resolves a similar issue in `on_entry`, by moving the clearing of the pending exception (in case native code use a JNI API and didn't handle the exception afterwards). We now also skip checking for async exceptions when transitioning from native to java, so we don't immediately clear them. Any async exceptions will be picked up at the next safepoint instead.
>
> Special thanks to @stefank and @fisk for finding the root cause, and @jaikiran for testing and debugging.
>
> Testing: tier 1-4, 20k runs of the failing test on linux-aarch64.
The fix looks good to me.
-------------
Marked as reviewed by eosterlund (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/21742#pullrequestreview-2464531609
More information about the core-libs-dev
mailing list