RFR: 8340313: Crash due to invalid oop in nmethod after C1 patching [v2]

Tobias Hartmann thartmann at openjdk.org
Tue Oct 15 09:17:50 UTC 2024


> C1 patching (explained in detail [here](https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L835)) works by rewriting the [patch body](https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L860), usually a move,  and then copying it into place over top of the jmp instruction, being careful to flush caches and doing it in an MP-safe way.
> 
> Now the problem is that there can be multiple patch sides in one nmethod (for example, one for each field access in `TestConcurrentPatching::test`) and multiple threads executing that nmethod can trigger patching concurrently. Although the patch body is not executed, one `Thread A` can update the oop immediate of the `mov` in the patch body:
> https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L1185
> 
> While another `Thread B` is done with patching another patch side and already walks the nmethod oops via `Universe::heap()->register_nmethod(nm)` to notify the GC. `Thread B` might then encounter a half-written oop from `Thread A` if the immediate crosses a page or cache-line boundary:
> https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L1282
> 
> In short:
> - `Thread A`: Patches location 1 in an nmethod and executes `register_nmethod()`, walking all immediate oops.
> - `Thread B`: Patches location 2 in the same nmethod and just wrote half of the oop immediate of an `mov` in the patch body because the store is not atomic.
> - `Thread A`: Crashes when walking the immediate oops of the nmethod and encountering the oop just partially written by `Thread B` concurrently.
> 
> Updating the oop immediate is not atomic on x86_64 because the address of the immediate is not 8-byte aligned.
> I propose to simply align it in `PatchingStub::emit_code` to guarantee atomicity.
> 
> The new regression test triggers the issue reliably for the `load_mirror_patching_id` case but unfortunately, I was not able to trigger the `load_appendix_patching_id` case which should be affected as well:
> https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L1197
> 
> I still added a corresponding test case `testIndy` and a [new assert that checks proper alignment](https://github.com/openjdk/jdk/commit/f418bc01c946b4c76f4bceac1ad503dabe182df7) triggers in both scenarios. I had to remo...

Tobias Hartmann has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision:

 - Extend the Patching_lock instead
 - Merge branch 'master' into 8340313
 - Extending patching lock
 - Increased timeout
 - Removed platform specific asserts from shared code
 - 8340313: Crash due to invalid oop in nmethod after C1 patching

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/21389/files
  - new: https://git.openjdk.org/jdk/pull/21389/files/050e2c8f..ec5d105b

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=21389&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21389&range=00-01

  Stats: 16085 lines in 245 files changed: 13548 ins; 1161 del; 1376 mod
  Patch: https://git.openjdk.org/jdk/pull/21389.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/21389/head:pull/21389

PR: https://git.openjdk.org/jdk/pull/21389


More information about the hotspot-compiler-dev mailing list