RFR: 8340313: Crash due to invalid oop in nmethod after C1 patching
Tobias Hartmann
thartmann at openjdk.org
Tue Oct 8 14:34:03 UTC 2024
On Mon, 7 Oct 2024 13:39:28 GMT, Tobias Hartmann <thartmann at openjdk.org> wrote:
> C1 patching (explained in detail [here](https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L835)) works by rewriting the [patch body](https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L860), usually a move, and then copying it into place over top of the jmp instruction being careful to flush caches and doing it in an MP-safe way.
>
> Now the problem is that although the patch body is not executed, one thread can update the oop immediate of the `mov` in the patch body:
> https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L1185
>
> While another thread walks the nmethod oops via `Universe::heap()->register_nmethod(nm)` to notify the GC and then encounters a half-written oop if the immediate crosses a page or cache-line boundary:
> https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L1282
>
> Updating the oop immediate is not atomic because the address of the immediate is not 8-byte aligned.
>
> I propose to simply align it in `PatchingStub::emit_code`.
>
> The new regression test triggers the issue reliably for the `load_mirror_patching_id` case but unfortunately, I was not able to trigger the `load_appendix_patching_id` case which should be affected as well:
> https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L1197
>
> I still added a corresponding test case `testIndy` and a [new assert that checks proper alignment](https://github.com/openjdk/jdk/commit/f418bc01c946b4c76f4bceac1ad503dabe182df7) triggers in both scenarios. I had to remove them again because they are x64 specific and I don't think it's worth the effort of adding a platform independent way of alignment checking.
>
> AArch64 is not affected because we always deopt instead of patching but other platforms might be affected as well. Should this fix be accepted, I'll ping the maintainers of the respective platforms.
>
> Thanks,
> Tobias
Thanks for looking at this, Vladimir and Dean!
> Wouldn't it be better to get rid of the concurrency? We could grab CodeCache_lock and Patching_lock in the same block, so we serialize the patching and register_nmethod.
Yes, that would be an alternative solution. I went with the alignment because I thought it has the least impact. I'll ping the GC team, maybe they want to have a say in this.
-------------
PR Comment: https://git.openjdk.org/jdk/pull/21389#issuecomment-2400023164
More information about the hotspot-compiler-dev
mailing list