RFR: 8371046: Segfault in compiler/whitebox/StressNMethodRelocation.java with -XX:+UseZGC
Chad Rakoczy
duke at openjdk.org
Fri Nov 21 22:25:55 UTC 2025
On Fri, 21 Nov 2025 20:43:33 GMT, Vladimir Kozlov <kvn at openjdk.org> wrote:
>> [JDK-8371046](https://bugs.openjdk.org/browse/JDK-8371046)
>>
>> This pull request fixes two crashes (see below) and adds `InvalidationReason::RELOCATED` to better describe why an nmethod is marked not entrant during relocation.
>>
>> ---
>>
>> #### 1. Test Bug
>>
>> It’s possible for an `nmethod` to be unloaded without its `_state` being explicitly set to `not_entrant`. Checking only `is_in_use()` isn’t sufficient, since the `nmethod` may already be in the process of unloading and therefore may not have a lock (as with ZGC, where `nmethods` are locked individually).
>>
>> The fix adds an additional `is_unloading()` check in WhiteBox before acquiring the lock.
>>
>> This issue was reproducible fairly consistently (every few runs) by executing `compiler/whitebox/StressNMethodRelocation.java` with `-XX:+UseZGC -XX:ReservedCodeCacheSize=32m`
>>
>>
>> After applying this patch, the original crash stopped occurring, though a more infrequent crash was still observed.
>>
>> ---
>>
>> #### 2. Implementation Bug
>>
>> `nmethod::relocate` works by copying the instructions of an `nmethod` and then adjusting the call sites to account for new PC-relative offsets.
>>
>> Previously, this fix-up happened *after* calling `post_init()`, which registers the `nmethod` and makes it visible to the GC. This introduced a race condition where the GC might attempt to resolve a call site before it had been fixed.
>>
>> The fix ensures that all call sites are patched **before** the `nmethod` is registered.
>>
>> In testing, the crash previously occurred roughly 60 times in 5,000 runs (~1.2%). With this patch, no crashes were observed in the same number of runs.
>
> src/hotspot/share/code/nmethod.cpp line 1508:
>
>> 1506: #ifdef USE_TRAMPOLINE_STUB_FIX_OWNER
>> 1507: // Direct calls may no longer be in range and the use of a trampoline may now be required.
>> 1508: // Instead, allow trampoline relocations to update their owners and perform the necessary checks.
>
> `Instead` is wrong word here I think. May be `Otherwise`.
>
> Also where you add trampoline in new nmethod's copy if needed? I don't see it in `fix_relocation_after_move()`.
We do not add trampolines to the new nmethod if they were not present in the original.
Does this comment better describe the need to do this?
// A direct call whose destination was within the maximum branch range may now
// be out of range after the nmethod is moved.
//
// CallRelocation::fix_relocation_after_move() does not perform range checks and
// assumes that the call target is always directly reachable. If we were to call
// it unconditionally, it could incorrectly rewrite a call site whose target now
// requires a trampoline, leaving the call out of range.
//
// When a call site has an associated trampoline, we skip the normal call
// relocation here. The corresponding trampoline_stub_Relocation will handle both
// the call site and the trampoline, including performing the required range
// checks and updating the call to branch through the trampoline if required.
//
// If no trampoline exists for the call, we know the target remains within the
// direct-branch range and CallRelocation::fix_relocation_after_move() is safe.
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/28241#discussion_r2551119626
More information about the graal-dev
mailing list