RFR: 8303612: runtime/StackGuardPages/TestStackGuardPagesNative.java fails with exit code 139
Johan Sjölen
jsjolen at openjdk.org
Mon Aug 25 18:17:38 UTC 2025
On Wed, 9 Jul 2025 01:41:17 GMT, mazhen <duke at openjdk.org> wrote:
>> This pull request addresses an issue in `runtime/StackGuardPages/TestStackGuardPagesNative` where the native test component (`exeinvoke.c`) exhibited platform-dependent behavior and did not fully align with the intended test objectives for verifying stack guard page removal on thread detachment.
>>
>> **Summary of the Problem:**
>>
>> The `test_native_overflow_initial` scenario within `TestStackGuardPagesNative` showed inconsistent results:
>> * On certain Linux distributions (e.g., CentOS 7), the test would hang and eventually time out during its second phase of stack allocation.
>> * On other distributions (e.g., Ubuntu 24), the test would pass, but this pass was found to be coincidental, relying on an unintended `SEGV_MAPERR` to terminate a loop that should have had a defined exit condition.
>>
>> The core issue was that the native code's second stack overflow attempt, designed to check for guard page removal, used an unbounded loop. Its termination (and thus the test's outcome) depended on platform-specific OS behavior regarding extensive stack allocation after guard pages are supposedly modified.
>>
>> **Test Objective Analysis:**
>>
>> The primary goal of `TestStackGuardPagesNative`, particularly for the initial thread (`test_native_overflow_initial`), is to:
>> 1. **Verify Guard Page Presence:** Confirm that when a native thread is attached to the JVM, a deliberate stack overflow triggers a `SIGSEGV` with `si_code = SEGV_ACCERR`, indicating an active stack guard page.
>> 2. **Verify Guard Page Removal/Modification:** After the thread detaches from the JVM via `DetachCurrentThread()`, confirm that the previously active stack guard page is no longer enforcing the same strict protection. This is ideally demonstrated by successfully allocating stack space up to the depth that previously caused the `SEGV_ACCERR`, **without encountering any signal**.
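
As a minimal sketch of the bounded second phase described in item 2, the loop could stop at the depth recorded during the first overflow instead of running until a signal arrives. This is not the code from `exeinvoke.c`; `touch_stack_bounded`, `CHUNK_BYTES`, and the `max_chunks` parameter are hypothetical names chosen for illustration, and `<alloca.h>` is assumed to be available (as it is on Linux with glibc).

```c
#include <alloca.h>
#include <assert.h>
#include <stddef.h>

/* Hypothetical chunk size for each stack allocation step. */
#define CHUNK_BYTES 128

/*
 * Hypothetical helper: grow the stack in CHUNK_BYTES steps, touching each
 * new chunk, and stop after max_chunks steps. In a real test, max_chunks
 * would be the step count recorded when the first phase hit SEGV_ACCERR.
 * Returns the number of chunks that were allocated and touched.
 */
static size_t touch_stack_bounded(size_t max_chunks) {
    volatile char sink = 0;
    size_t chunks = 0;

    while (chunks < max_chunks) {
        /* alloca memory is released only when this function returns,
         * so each iteration pushes the stack pointer further down. */
        volatile char *p = (volatile char *)alloca(CHUNK_BYTES);
        p[0] = 0;       /* touch the chunk so the page is really accessed */
        sink = p[0];
        chunks++;
    }

    (void)sink;
    /* The loop ends only because the bound was reached. */
    assert(chunks == max_chunks);
    return chunks;
}
```

If the function returns normally for the recorded depth, no signal was raised up to the point that previously faulted, which is the condition item 2 asks for. An unbounded `for(;;)` has no such exit and depends on the OS eventually delivering a signal.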
>>
>> **How the Original Implementation Deviated from the Test Intent:**
>>
>> The native `do_overflow` function, when invoked for the second phase (to check guard page removal), implemented an unconditional `for(;;)` loop.
>> * **Intended Logic vs. Actual Behavior:** The test intended for this second phase to demonstrate that allocations up to the prior failure depth are now "clean" (no `SEGV_ACCERR`). However, the unbounded loop meant:
>>     * On systems like CentOS 7, where deep stack allocation without an immediate `SEGV_MAPERR` was possible, this loop ran for an excessive duration, leading to a hang.
>> ...
>
> Hi @jdksjolen ,
>
> Following up on this. As per the bot's message two weeks ago, I've been waiting for my OCA to be processed, but it seems to be stuck.
> I understand that PRs are not reviewed until the OCA is cleared, but since the suggested two-week waiting period has passed, I was hoping someone could check on or escalate the status of my OCA application internally.
>
> My Oracle Account Email: mz1999 at gmail.com
>
> Any help would be greatly appreciated. Thank you!
Ping @mz1999
-------------
PR Comment: https://git.openjdk.org/jdk/pull/25689#issuecomment-3221274470