RFR: 8340313: Crash due to invalid oop in nmethod after C1 patching

Tobias Hartmann thartmann at openjdk.org
Mon Oct 7 16:22:05 UTC 2024


C1 patching (explained in detail [here](https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L835)) works by rewriting the [patch body](https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L860), usually a move,  and then copying it into place over top of the jmp instruction being careful to flush caches and doing it in an MP-safe way.

Now the problem is that although the patch body is not executed, one thread can update the oop immediate of the `mov` in the patch body:
https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L1185

While another thread walks the nmethod oops via `Universe::heap()->register_nmethod(nm)` to notify the GC and then encounters a half-written oop if the immediate crosses a page or cache-line boundary:
https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L1282

Updating the oop immediate is not atomic because the address of the immediate is not 8-byte aligned.

I propose to simply align it in `PatchingStub::emit_code`.

The new regression test triggers the issue reliably for the `load_mirror_patching_id` case but unfortunately, I was not able to trigger the `load_appendix_patching_id` case which should be affected as well:
https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L1197

I still added a corresponding test case `testIndy` and a [new assert that checks proper alignment](https://github.com/openjdk/jdk/commit/f418bc01c946b4c76f4bceac1ad503dabe182df7) triggers in both scenarios. I had to remove them again because they are x64 specific and I don't think it's worth the effort of adding a platform independent way of alignment checking.

AArch64 is not affected because we always deopt instead of patching but other platforms might be affected as well. Should this fix be accepted, I'll ping the maintainers of the respective platforms.

Thanks,
Tobias

-------------

Commit messages:
 - Increased timeout
 - Removed platform specific asserts from shared code
 - 8340313: Crash due to invalid oop in nmethod after C1 patching

Changes: https://git.openjdk.org/jdk/pull/21389/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21389&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8340313
  Stats: 152 lines in 3 files changed: 147 ins; 0 del; 5 mod
  Patch: https://git.openjdk.org/jdk/pull/21389.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/21389/head:pull/21389

PR: https://git.openjdk.org/jdk/pull/21389


More information about the hotspot-compiler-dev mailing list