Backporting stack guard fixes from JDK-9 (8169373+8159335+8139864)

Jan Kratochvil (Azul) jkratochvil at azul.com
Wed Dec 6 14:33:00 UTC 2023


Hello,

I got a crash report for an OpenJDK-8 derivation (Azul's Zulu-8 build)
on aarch64. Details are at the end. I believe it could be fixed by the
backports listed below, but the resulting patch gets too big (>10k LoC =
Lines of Code).

Do you have an idea how to make the backport feasible for JDK-8?

JDK-8169373: Work around linux NPTL stack guard error
 13 files changed, 165 insertions(+), 425 deletions(-)

Unfortunately this patch depends on a lot of other code from JDK-9 and
I believe these fixes are related anyway so they also need a backport:

JDK-8159335: Fix problems with stack overflow handling
 33 files changed, 183 insertions(+), 236 deletions(-)
JDK-8139864: Improve handling of stack protection zones
 43 files changed, 312 insertions(+), 226 deletions(-)

To satisfy the dependencies of the patches above I also had to backport the
following (the three patches named above are marked with ^^^):
    8078513: [linux] Clean up code relevant to LinuxThreads implementation
    8080298: Clean up os::...::supports_variable_stack_size()
    8037842: Failing to allocate MethodCounters and MDO causes a serious performance drop
    8059847: complement JDK-8055286 and JDK-8056964 changes
    8059606: Enable per-method usage of CompileThresholdScaling (per-method compilation thresholds)
    8074119: [AARCH64] stage repo misses fixes from several Hotspot changes
    8013393: Merge template interpreter files for x86 _32 and _64
    8122937: [JEP 245] Validate JVM Command-Line Flag Arguments
    8078556: Runtime: implement ranges (optionally constraints) for those flags that have them missing
    8048241: Introduce umbrella header os.inline.hpp and clean up includes
^^^ 8139864: Improve handling of stack protection zones
^^^ 8159335: Fix problems with stack overflow handling
    8140520: segfault on solaris-amd64 with "-XX:VMThreadStackSize=1" option
^^^ 8169373: Work around linux NPTL stack guard error
    8049325: Introduce and clean up umbrella headers for the files in the cpu subdirectories
    8064611: AARCH64: Changes to HotSpot shared code
    8160189: Fix for 8159335 breaks AArch64
    8130858: CICompilerCount=1 when tiered is off is not allowed any more
    8072931: JEP-JDK-8059557: Test task: test framework development
 228 files changed, 13311 insertions(+), 8831 deletions(-)

That is simply too big a patch for a backport into JDK-8. There is some room
for minor reduction of the whole patch, but it will still be around 10k LoC.
None of the patches above has been backported to JDK-8.

Unfortunately, due to time constraints, I have not yet confirmed that these
backports really fix this crash. That needs to be tested at the customer's
site, as I do not have a reproducer (and the crash is difficult to reproduce
even on the target system).

I can publish the whole patchset above but I would need to rebase it first
from the Zulu-8 derivation to plain OpenJDK-8.


Thanks for your opinion,
Jan Kratochvil

------------------------------------------------------------------------------

The crashing memory access is to __resp, which is in TLS (thread-local
storage). During very early thread startup, still inside glibc, that address
points to unmapped memory:

#1  <signal handler called>
#2  start_thread (arg=0x7ef47cd160) at /usr/src/debug/glibc/2.23-r0/git/nptl/pthread_create.c:265
265      __resp = &pd->res;
=> 0x0000007f836e7f1c <+36>:    str    x0, [x2, x1]

Program Headers:
  Type           Offset     VirtAddr           PhysAddr           FileSiz  MemSiz   Flg Align
  LOAD           0x10e75000 0x0000007ef45ce000 0x0000000000000000 0x1ff000 0x1ff000 RW  0x1000
                                  0x7ef47cd000=end
                                    7ef47cd850=accessed memory __resp
  LOAD           0x11074000 0x0000007ef47ce000 0x0000000000000000 0x003000 0x003000     0x1000
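
As a quick cross-check of the addresses above, here is a minimal Python
sketch. It is not part of the crash report, and the names SEGMENTS and
find_segment are made up for illustration. It takes the two LOAD segments
listed above and tests whether the start_thread argument and the accessed
__resp address fall inside either of them.

```python
# Illustration only; (VirtAddr, MemSiz, Flg) values are copied from the
# program headers above. SEGMENTS and find_segment are hypothetical names.
SEGMENTS = [
    (0x7ef45ce000, 0x1ff000, "RW"),
    (0x7ef47ce000, 0x003000, ""),
]

def find_segment(addr):
    """Return the (start, size, flags) segment containing addr, or None."""
    for start, size, flags in SEGMENTS:
        if start <= addr < start + size:
            return (start, size, flags)
    return None

pd = 0x7ef47cd160    # arg of start_thread (the thread tid in the dump)
resp = 0x7ef47cd850  # accessed memory (__resp)

# End of the RW mapping and the size of the hole before the next segment.
rw_end = SEGMENTS[0][0] + SEGMENTS[0][1]
hole_size = SEGMENTS[1][0] - rw_end

print(hex(rw_end), hex(hole_size))
print(find_segment(pd), find_segment(resp))
```

With these inputs both lookups come back empty: the two addresses sit in the
0x1000 hole between the end of the RW mapping (0x7ef47cd000) and the start of
the following segment.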

A different thread was creating the thread above:

[Current thread is 13 (LWP 834)]
#5  0x0000007f811b83a4 in os::create_thread (thread=0x7f20010420, thr_type=<optimized out>, stack_size=<optimized out>)
    at zulu8-arm64-dev/hotspot/src/os/linux/vm/os_linux.cpp:939
        tid = 545267700064 = 0x7ef47cd160

Someone unmapped the page holding the new thread's TLS before the thread
really started.

Given that there is always a 0x3000 gap between the RW mappings, those gaps
should be the guard pages.

  Type           Offset     VirtAddr           PhysAddr           FileSiz  MemSiz   Flg Align
  LOAD           0x10e75000 0x0000007ef45ce000 0x0000000000000000 0x1ff000 0x1ff000 RW  0x1000
                                  0x7ef47cb6c0=$sp of Thread 1 (LWP 1358)
                                  0x7ef47cd000=mapped area end
                                  0x7ef47cd160=thread tid
                                  0x7ef47cd850=accessed memory __resp
  LOAD           0x11074000 0x0000007ef47ce000 0x0000000000000000 0x003000 0x003000     0x1000
  LOAD           0x11077000 0x0000007ef47d1000 0x0000000000000000 0x1fd000 0x1fd000 RW  0x1000
                                  0x7ef49cc240=$sp of Thread 37 (LWP 1349)
  LOAD           0x11274000 0x0000007ef49ce000 0x0000000000000000 0x003000 0x003000     0x1000
  LOAD           0x11277000 0x0000007ef49d1000 0x0000000000000000 0x1fd000 0x1fd000 RW  0x1000
                                  0x7ef4bc8fa0=$sp of Thread 36 (LWP 1290)
  LOAD           0x11474000 0x0000007ef4bce000 0x0000000000000000 0x000000 0x003000     0x1000
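
The layout above can be checked mechanically. The following Python sketch is
only an illustration under stated assumptions: the helper names are made up,
and pairing each flagless 0x3000 segment with the RW segment directly above
it as "guard pages + stack" is my reading of the table, not something the
core file states.

```python
# Illustration only; (VirtAddr, MemSiz, Flg) values are copied from the
# program headers above. All helper names are hypothetical.
SEGMENTS = [
    (0x7ef45ce000, 0x1ff000, "RW"),
    (0x7ef47ce000, 0x003000, ""),
    (0x7ef47d1000, 0x1fd000, "RW"),
    (0x7ef49ce000, 0x003000, ""),
    (0x7ef49d1000, 0x1fd000, "RW"),
    (0x7ef4bce000, 0x003000, ""),
]

def holes(segments):
    """Return (hole_start, hole_size) for each unmapped hole between neighbours."""
    result = []
    for (start, size, _), (next_start, _, _) in zip(segments, segments[1:]):
        end = start + size
        if next_start > end:
            result.append((end, next_start - end))
    return result

def guarded_stack_sizes(segments):
    """Size of each flagless segment plus the RW segment directly above it.

    Assumption: a flagless segment followed immediately by an RW segment is
    one thread stack with its guard pages at the low end.
    """
    sizes = []
    for (g_start, g_size, g_flags), (s_start, s_size, s_flags) in zip(segments, segments[1:]):
        if g_flags == "" and s_flags == "RW" and g_start + g_size == s_start:
            sizes.append(g_size + s_size)
    return sizes

for start, size in holes(SEGMENTS):
    print(f"hole at {start:#x}, size {size:#x}")
print([hex(s) for s in guarded_stack_sizes(SEGMENTS)])
```

Under these assumptions the only hole is the 0x1000 page at 0x7ef47cd000,
which contains both the thread tid and the accessed __resp address. The two
complete guard + RW pairs each add up to 0x200000, and the first RW mapping
(0x1ff000) plus that missing 0x1000 page adds up to 0x200000 as well.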

