RFR: 8331735: UpcallLinker::on_exit races with GC when copying frame anchor
Jorn Vernee
jvernee at openjdk.org
Wed Nov 20 22:53:15 UTC 2024
On Mon, 28 Oct 2024 13:53:58 GMT, Jorn Vernee <jvernee at openjdk.org> wrote:
> There is a subtle race in `UpcallLinker::on_exit` between copying of the old frame anchor back into place, and the GC. Since this copy is not atomic, it may briefly appear as if a thread has no last Java frame, while still in the `_thread_in_native` state, which leads to the GC skipping processing of any active Java frames.
>
> This code was originally adapted from `JavaCallWrapper::!JavaCallWrapper` - the JNI mechanism for upcalls - but in that case the frame anchor copy happens in the `_thread_in_vm` state, which means the GC will wait for the thread to get to a safepoint.
>
> The solution proposed here is to do the frame anchor copy in the java thread state, before transitioning back to the native state. The java thread state, like the vm thread state, is also 'safe' i.e. the GC will wait for the thread to get to a safepoint, so we can safely do our non-atomic copy of the frame anchor.
>
> Additionally, this PR resolves a similar issue in `on_entry`, by moving the clearing of the pending exception (in case native code use a JNI API and didn't handle the exception afterwards). We now also skip checking for async exceptions when transitioning from native to java, so we don't immediately clear them. Any async exceptions will be picked up at the next safepoint instead.
>
> Special thanks to @stefank and @fisk for finding the root cause, and @jaikiran for testing and debugging.
>
> Testing: tier 1-4, 20k runs of the failing test on linux-aarch64.
> > I wonder if we can assert we are in a safepoint-safe state when doing so?
>
> I think we can do this. I've prototyped this here: [pr/21742...JornVernee:jdk:SafeFrameAnchor+assert](https://github.com/openjdk/jdk/compare/pr/21742...JornVernee:jdk:SafeFrameAnchor+assert)
>
> This catches the issue fixed by this patch, and it passes at least tier 1. We'd need something similar in assembly where we touch the frame anchor, is `MacroAssembler::set_last_Java_frame` and `MacroAssembler::reset_last_Java_frame`.
Thinking some more about this: there might be other instances of `JavaFrameAnchor` that are fine to touch when the thread is in the native state. It's just the frame anchor inside a `JavaThread` that can not be touched if that thread is in a certain state. It might be possible to encapsulate the `JavaFrameAnchor` instance inside the thread, and then guard any accesses to it. But, that seems like a much more invasive change, so I'll hold off on that and focus this PR on fixing the issue.
-------------
PR Comment: https://git.openjdk.org/jdk/pull/21742#issuecomment-2489692928
More information about the core-libs-dev
mailing list