RFR: 8373128: Stack overflow handling for native stack overflows
David Holmes
dholmes at openjdk.org
Mon Mar 2 02:10:39 UTC 2026
On Wed, 4 Feb 2026 07:19:03 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:
> Still Draft, pls ignore for now. Patch is not done yet.
>
> This patch enables hs-err file generation for native out-of-stack cases. It is an optional analysis feature one can use when JVMs mysteriously vanish - typically, vanishing JVMs are either native stack overflows or OOM kills.
>
> This was motivated by the analysis difficulties of bugs like https://bugs.openjdk.org/browse/JDK-8371630. There are many more examples.
>
> ### Motivation
>
> Today, when native stack overflows, the JVM dies immediately without an hs-err file. This is because C++-compiled code does not bang - if the stack is too small, we walk right into whatever caps the stack. That might be our own yellow/red guard pages, native guard pages placed by libc or kernel, or possibly unmapped area after the end of the stack.
>
> Since we don't have a stack left to run the signal handler on, we cannot produce the hs-err file. If one is very lucky, the libc writes a short "Stack overflow" to stderr. But usually not: if it is a JavaThread and we run into our own yellow/red pages, it counts as a simple segmentation fault from the OS's point of view, since the fault address is inside of what it thinks is a valid pthread stack. So, typically, you just see "Segmentation fault" on stderr.
>
> ***Why do we need this patch? Don't we bang enough space for native code we call?***
>
> We bang when entering a native function from Java. The maximum stack size we assume at that time might not be enough; moreover, the native code may be buggy or just too deeply or infinitely recursive.
>
> ***We could just increase `ShadowPages`, right?***
>
> Sure, but the point is we have no hs-err file, so we don't even know it was a stack overflow. One would have to start debugging, which is work-intensive and may not even be possible in a customer scenario. And for buggy recursive code, any `ShadowPages` value might be too small. The code would need to be fixed.
>
> ### Implementation
>
> The patch uses alternative signal stacks. That is a simple, robust solution with few moving parts. It works out of the box for all cases:
> - Stack overflows inside native JNI code from Java
> - Stack overflows inside Hotspot-internal JavaThread children (e.g. CompilerThread, AttachListenerThread etc)
> - Stack overflows in non-Java threads (e.g. VMThread, ConcurrentGCThread)
> - Stack overflows in outside threads that are attached to the JVM, e.g. third-party JVMTI threads
>
> The drawback of this simplicity is that it is not suitable for always-on production use. That is du...
Hi Thomas,
My main concern with this is that it is so hard to test, and we will never know to what extent it is getting used. Its usefulness depends entirely on support organizations knowing about it and telling customers to enable it (in production, which they might balk at) to try to better diagnose mystery crashes.
I will run it through our testing, enabled by default, to see if anything crops up while I continue to think about it.
A few minor nits/comments below.
Thanks
src/hotspot/os/posix/threadAltSigStack_posix.cpp line 115:
> 113:
> 114: if (success) {
> 115: step ++;
Suggestion:
step++;
src/hotspot/os/posix/threadAltSigStack_posix.cpp line 120:
> 118:
> 119: if (success) {
> 120: step ++;
Suggestion:
step++;
src/hotspot/os/posix/threadAltSigStack_posix.cpp line 140:
> 138: sigaltstack_and_log(&ss, &oss);
> 139:
> 140: // --- From here on, if we receive a signal, we'll run on the alternative stack ----
Only for SIGSEGV/SIGBUS, right?
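For context on the question above: under POSIX, a handler runs on the alternate stack only if it was installed with SA_ONSTACK, so which signals are affected depends on how each handler is registered, not on sigaltstack() alone. The standalone sketch below is not code from the patch. The names handler_ran_on_alt_stack and probe_handler and the 64 KB stack size are assumptions made for illustration. It installs a handler with and without SA_ONSTACK and reports which stack the handler actually ran on.

```cpp
#include <signal.h>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical globals for this sketch; the patch keeps its own per-thread state.
static void* g_alt_base = nullptr;
static size_t g_alt_size = 0;
static volatile sig_atomic_t g_on_alt_stack = -1;

// Records whether the handler's own frame lies inside the alternate stack.
static void probe_handler(int) {
  char probe;
  const uintptr_t p = reinterpret_cast<uintptr_t>(&probe);
  const uintptr_t base = reinterpret_cast<uintptr_t>(g_alt_base);
  g_on_alt_stack = (p >= base && p < base + g_alt_size) ? 1 : 0;
}

// Returns 1 if the handler ran on the alternate stack, 0 if it ran on the
// normal thread stack, and -1 if setup failed.
int handler_ran_on_alt_stack(int signo, bool use_onstack) {
  if (g_alt_base == nullptr) {
    g_alt_size = 64 * 1024;  // assumption: comfortably above MINSIGSTKSZ
    assert(g_alt_size >= static_cast<size_t>(MINSIGSTKSZ));
    g_alt_base = malloc(g_alt_size);
    if (g_alt_base == nullptr) return -1;
    stack_t ss{};
    ss.ss_sp = g_alt_base;
    ss.ss_size = g_alt_size;
    ss.ss_flags = 0;
    // Registers the alternate stack for the calling thread only.
    if (sigaltstack(&ss, nullptr) != 0) return -1;
  }

  struct sigaction sa{};
  struct sigaction old{};
  sa.sa_handler = probe_handler;
  sigemptyset(&sa.sa_mask);
  // Without SA_ONSTACK the registered alternate stack is ignored for this signal.
  sa.sa_flags = use_onstack ? SA_ONSTACK : 0;
  if (sigaction(signo, &sa, &old) != 0) return -1;

  g_on_alt_stack = -1;
  raise(signo);                     // handler runs before raise() returns
  sigaction(signo, &old, nullptr);  // restore the previous disposition
  return g_on_alt_stack;
}
```

Called as handler_ran_on_alt_stack(SIGUSR1, true), this should return 1. With use_onstack set to false, the handler stays on the thread's normal stack and the result should be 0.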
src/hotspot/os/posix/threadAltSigStack_posix.cpp line 155:
> 153:
> 154: assert(this == Thread::current_or_null_safe(), "Only for current thread");
> 155: assert(_altsigstack != nullptr, "Not enabled?");
This is redundant given you checked for null above.
src/hotspot/os/posix/threadAltSigStack_posix.cpp line 170:
> 168:
> 169: assert(oss.ss_sp == _altsigstack, "Different stack? " PTR_FORMAT " vs " PTR_FORMAT, p2i(oss.ss_sp), p2i(_altsigstack));
> 170: assert(oss.ss_size == stacksize, "Different size?");
Please report the two sizes in the assert message.
src/hotspot/share/code/nmethod.cpp line 947:
> 945: // Buffering to a stringStream, disable internal buffering so it's not done twice.
> 946: method()->print_codes_on(&ss, 0, false);
> 947: }
So we lose some debugging information whenever alt-stacks are enabled?
src/hotspot/share/nmt/memTag.hpp line 62:
> 60: f(mtObjectMonitor, "Object Monitors") \
> 61: f(mtJNI, "JNI") \
> 62: f(mtAltStack, "Alternate Stacks") \
Suggestion:
f(mtAltStack, "Alternate Signal Stacks") \
src/hotspot/share/runtime/globals.hpp line 2011:
> 2009: \
> 2010: product(bool, UseAltSigStacks, false, DIAGNOSTIC, \
> 2011: "Enable the use of alternative signal stacks.") \
Should state that it has no effect on Windows (or non-POSIX platforms).
src/hotspot/share/runtime/stackOverflow.hpp line 2:
> 1: /*
> 2: * Copyright (c) 2020, 2026, Oracle and/or its affiliates. All rights reserved.
There are no changes in this file.
test/hotspot/jtreg/gtest/NativeStackOverflowGtest.java line 2:
> 1: /*
> 2: * Copyright (c) 2025, 2026, Oracle and/or its affiliates. All rights reserved.
Suggestion:
* Copyright (c) 2026, Oracle and/or its affiliates. All rights reserved.
-------------
PR Review: https://git.openjdk.org/jdk/pull/29559#pullrequestreview-3873717807
PR Review Comment: https://git.openjdk.org/jdk/pull/29559#discussion_r2870155311
PR Review Comment: https://git.openjdk.org/jdk/pull/29559#discussion_r2870155758
PR Review Comment: https://git.openjdk.org/jdk/pull/29559#discussion_r2870159957
PR Review Comment: https://git.openjdk.org/jdk/pull/29559#discussion_r2870162567
PR Review Comment: https://git.openjdk.org/jdk/pull/29559#discussion_r2870167765
PR Review Comment: https://git.openjdk.org/jdk/pull/29559#discussion_r2870172171
PR Review Comment: https://git.openjdk.org/jdk/pull/29559#discussion_r2870172810
PR Review Comment: https://git.openjdk.org/jdk/pull/29559#discussion_r2870175316
PR Review Comment: https://git.openjdk.org/jdk/pull/29559#discussion_r2870176469
PR Review Comment: https://git.openjdk.org/jdk/pull/29559#discussion_r2870184900