Backporting stack guard fixes from JDK-9 (8169373+8159335+8139864)
Jan Kratochvil (Azul)
jkratochvil at azul.com
Wed Dec 6 14:33:00 UTC 2023
Hello,
I got a crash report for an OpenJDK-8 derivative (Azul's Zulu-8 build)
on aarch64. Details are at the end. I believe it could be fixed by the
backports below, but the resulting patch gets too big (>10k LoC = Lines of
Code).
Do you have an idea how to make the backport feasible for JDK-8?
JDK-8169373: Work around linux NPTL stack guard error
13 files changed, 165 insertions(+), 425 deletions(-)
Unfortunately, this patch depends on a lot of other code from JDK-9, and
I believe the following fixes are related anyway, so they also need a backport:
JDK-8159335: Fix problems with stack overflow handling
33 files changed, 183 insertions(+), 236 deletions(-)
JDK-8139864: Improve handling of stack protection zones
43 files changed, 312 insertions(+), 226 deletions(-)
To satisfy the dependencies of the patches above, I also had to backport:
8078513: [linux] Clean up code relevant to LinuxThreads implementation
8080298: Clean up os::...::supports_variable_stack_size()
8037842: Failing to allocate MethodCounters and MDO causes a serious performance drop
8059847: complement JDK-8055286 and JDK-8056964 changes
8059606: Enable per-method usage of CompileThresholdScaling (per-method compilation thresholds)
8074119: [AARCH64] stage repo misses fixes from several Hotspot changes
8013393: Merge template interpreter files for x86 _32 and _64
8122937: [JEP 245] Validate JVM Command-Line Flag Arguments
8078556: Runtime: implement ranges (optionally constraints) for those flags that have them missing
8048241: Introduce umbrella header os.inline.hpp and clean up includes
^^^ 8139864: Improve handling of stack protection zones
^^^ 8159335: Fix problems with stack overflow handling
8140520: segfault on solaris-amd64 with "-XX:VMThreadStackSize=1" option
^^^ 8169373: Work around linux NPTL stack guard error
8049325: Introduce and clean up umbrella headers for the files in the cpu subdirectories
8064611: AARCH64: Changes to HotSpot shared code
8160189: Fix for 8159335 breaks AArch64
8130858: CICompilerCount=1 when tiered is off is not allowed any more
8072931: JEP-JDK-8059557: Test task: test framework development
228 files changed, 13311 insertions(+), 8831 deletions(-)
That is simply too big a patch for a backport into JDK-8. There are some
possibilities for a minor reduction of the whole patch, but it would still be
around 10k LoC. None of the patches above has been backported to JDK-8.
Unfortunately, due to time constraints, I have not yet confirmed that these
backports really fix this crash. It needs to be tested at a customer site, as I do
not have a reproducer (and it is difficult to reproduce even on the target
system).
I can publish the whole patchset above, but I would need to rebase it first
from the Zulu-8 derivative to plain OpenJDK-8.
Thanks for your opinion,
Jan Kratochvil
------------------------------------------------------------------------------
The crashing memory access to __resp is in TLS (thread-local storage), which
points to unmapped memory during very early thread startup, still inside glibc:
#1 <signal handler called>
#2 start_thread (arg=0x7ef47cd160) at /usr/src/debug/glibc/2.23-r0/git/nptl/pthread_create.c:265
265 __resp = &pd->res;
=> 0x0000007f836e7f1c <+36>: str x0, [x2, x1]
Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
LOAD 0x10e75000 0x0000007ef45ce000 0x0000000000000000 0x1ff000 0x1ff000 RW 0x1000
0x7ef47cd000=end
7ef47cd850=accessed memory __resp
LOAD 0x11074000 0x0000007ef47ce000 0x0000000000000000 0x003000 0x003000 0x1000
A different thread was creating the thread above:
[Current thread is 13 (LWP 834)]
#5 0x0000007f811b83a4 in os::create_thread (thread=0x7f20010420, thr_type=<optimized out>, stack_size=<optimized out>)
at zulu8-arm64-dev/hotspot/src/os/linux/vm/os_linux.cpp:939
tid = 545267700064 = 0x7ef47cd160
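As a quick sanity check of the numbers above, here is a minimal Python sketch that converts the decimal tid to hex and compares its page with the page of the faulting __resp access. The 4 KiB page size is an assumption taken from the Align column of the program headers, and page_base is only a helper name used for this illustration.

```python
PAGE_SIZE = 0x1000  # assumption: 4 KiB pages, as suggested by the Align column

def page_base(addr: int) -> int:
    """Round an address down to the start of its page."""
    return addr & ~(PAGE_SIZE - 1)

tid = 545267700064          # decimal value printed by gdb in os::create_thread
resp_addr = 0x7ef47cd850    # faulting address of the __resp store

print(hex(tid))                   # hex form of the tid
print(hex(page_base(tid)))        # page holding the tid address
print(hex(page_base(resp_addr)))  # page holding the faulting address
print(hex(resp_addr - tid))       # distance between the two addresses
```

Both addresses land in the same page, which starts exactly where the first RW mapping ends.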
Someone unmapped the page where the TLS of the new thread is located before
the thread really started.
Given that there is always a gap of 0x3000 between the mappings, it should be the guard
pages.
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
LOAD 0x10e75000 0x0000007ef45ce000 0x0000000000000000 0x1ff000 0x1ff000 RW 0x1000
0x7ef47cb6c0=$sp of Thread 1 (LWP 1358)
0x7ef47cd000=mapped area end
0x7ef47cd160=thread tid
0x7ef47cd850=accessed memory __resp
LOAD 0x11074000 0x0000007ef47ce000 0x0000000000000000 0x003000 0x003000 0x1000
LOAD 0x11077000 0x0000007ef47d1000 0x0000000000000000 0x1fd000 0x1fd000 RW 0x1000
0x7ef49cc240=$sp of Thread 37 (LWP 1349)
LOAD 0x11274000 0x0000007ef49ce000 0x0000000000000000 0x003000 0x003000 0x1000
LOAD 0x11277000 0x0000007ef49d1000 0x0000000000000000 0x1fd000 0x1fd000 RW 0x1000
0x7ef4bc8fa0=$sp of Thread 36 (LWP 1290)
LOAD 0x11474000 0x0000007ef4bce000 0x0000000000000000 0x000000 0x003000 0x1000
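The layout above can be checked mechanically. The following is only a sketch: the segment values are copied from the VirtAddr, MemSiz and Flg columns of the listing, while Segment, find_segment and find_holes are hypothetical helper names invented for this illustration and are not part of any tool.

```python
from typing import List, NamedTuple, Optional, Tuple

class Segment(NamedTuple):
    start: int   # VirtAddr column
    size: int    # MemSiz column
    flags: str   # Flg column; empty string means no access flags

    @property
    def end(self) -> int:
        return self.start + self.size

# Values copied from the program header listing above.
SEGMENTS = [
    Segment(0x7ef45ce000, 0x1ff000, "RW"),
    Segment(0x7ef47ce000, 0x003000, ""),
    Segment(0x7ef47d1000, 0x1fd000, "RW"),
    Segment(0x7ef49ce000, 0x003000, ""),
    Segment(0x7ef49d1000, 0x1fd000, "RW"),
    Segment(0x7ef4bce000, 0x003000, ""),
]

def find_segment(addr: int, segments: List[Segment]) -> Optional[Segment]:
    """Return the segment containing addr, or None if addr is not in the dump."""
    for seg in segments:
        if seg.start <= addr < seg.end:
            return seg
    return None

def find_holes(segments: List[Segment]) -> List[Tuple[int, int]]:
    """Return (start, end) ranges that lie between consecutive segments."""
    ordered = sorted(segments)  # tuples sort by start address first
    return [(a.end, b.start) for a, b in zip(ordered, ordered[1:]) if b.start > a.end]

for lo, hi in find_holes(SEGMENTS):
    print(f"hole {lo:#x}-{hi:#x} size {hi - lo:#x}")
print(find_segment(0x7ef47cd850, SEGMENTS))  # faulting __resp address
print(find_segment(0x7ef47cb6c0, SEGMENTS))  # $sp of Thread 1
```

With these inputs the only range not covered by any listed segment is the single page directly after the first RW mapping, and the faulting address falls inside it, while $sp of Thread 1 is still inside the RW mapping.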