RFR(M): 8169373: Work around linux NPTL stack guard error.

Lindenmaier, Goetz goetz.lindenmaier at sap.com
Tue Nov 22 10:55:58 UTC 2016


Hi Dan, 

I ran into a row of issues, some errors on the platforms.

What I meant with that comment is that
os::Linux::min_stack_allowed = MAX2(os::Linux::min_stack_allowed,
                                      JavaThread::stack_guard_zone_size() +
                                      JavaThread::stack_shadow_zone_size() +
                                      (4*BytesPerWord COMPILER2_PRESENT(+2)) * 4 * K);
was executed, and min_stack_allowed used for all stacks. Now, compiler and 
vm minimum stack sizes are not increased by these sizes.
At least the _compiler_thread_min_stack_allowed should be increased similarly
by red/yellow zone size.  The compiler thread is a Java thread and must have space 
for these zones.

Also, the change added a test that is failing now.

Best regards,
  Goetz.

> -----Original Message-----
> From: Daniel D. Daugherty [mailto:daniel.daugherty at oracle.com]
> Sent: Montag, 21. November 2016 17:28
> To: David Holmes <david.holmes at oracle.com>; Lindenmaier, Goetz
> <goetz.lindenmaier at sap.com>; hotspot-runtime-dev at openjdk.java.net
> Subject: Re: RFR(M): 8169373: Work around linux NPTL stack guard error.
> 
> Sorry for the delayed responses to this thread. I've been on vacation...
> 
> One comment/query embedded below...
> 
> 
> On 11/10/16 8:40 PM, David Holmes wrote:
> > Hi Goetz,
> >
> > On 11/11/2016 8:00 AM, Lindenmaier, Goetz wrote:
> >> Hi David,
> >>
> >> This issue is different to 6675312, see also my comment in the bug.
> >>
> >> It appears running jtreg test runtime/Thread/TooSmallStackSize.java,
> >> with my patch below you can reproduce it on linuxx86_64. You can not
> >> do that with 6675312. Also, I would assume there are systems out there
> >> on x86 that uses 64-K pages, did you run the tests on these? I would
> >> assume you get hard crashes with stack overflows in the first C2
> >> compilation if there is only 64K usable CompilerThreadStack.
> >>
> >> My fix does not affect Java threads, which are the largest amount
> >> of threads used by the VM. It affects only the non-Java threads.
> >> It adds one page to these threads. The page does not require memory,
> >> as it's protected. The stack will only require more space if the thread
> >> ran into a stack overflow before the fix as else the pages are not
> >> mapped.
> >> This are stack overflows that cause hard crashes, at least on ppc the VM
> >> does not properly catch these stack overflows, so any setup working now
> >> will not run into the additional space. Altogether there should be no
> >> effect on running systems besides requiring one more entry in the
> >> page table per non-Java thread.
> >>
> >> The problem is caused by a rather recent change (8140520: segfault on
> >> solaris-amd64
> >> with "-XX:VMThreadStackSize=1" option) which was pushed after
> >> feature-close. As this was a rather recent change, it must be
> >> possible to
> >> fix this follow up issue. What else is this period in the project good
> >> for if not fixing issues?
> >
> > So I am seeing a number of factors here.
> >
> > First, 8140520, set:
> >
> > size_t os::Posix::_compiler_thread_min_stack_allowed = 128 * K;
> 
> So I'm confused by the above comment:
> 
>  > The problem is caused by a rather recent change (8140520: segfault
>  > on solaris-amd64 with "-XX:VMThreadStackSize=1" option)
> 
> http://cr.openjdk.java.net/~dcubed/8140520-webrev/5-jdk9-hs-
> open/hotspot/src/os_cpu/linux_ppc/vm/os_linux_ppc.cpp.frames.html
> 
> shows this change:
> 
> @@ -531,19 +531,17 @@
>   }
> 
> 
> //////////////////////////////////////////////////////////////////////////////
> //
>   // thread stack
> 
> -size_t os::Linux::min_stack_allowed = 128*K;
> +size_t os::Posix::_compiler_thread_min_stack_allowed = 128 * K;
> +size_t os::Posix::_java_thread_min_stack_allowed = 128 * K;
> +size_t os::Posix::_vm_internal_thread_min_stack_allowed = 128 * K;
> 
> so the existing single variable of 'min_stack_allowed' was
> replaced by three variables: _compiler_thread_min_stack_allowed,
> _java_thread_min_stack_allowed, and
> _vm_internal_thread_min_stack_allowed.
> 
> The old single variable and the three new variables are all
> initialized to the same value (128K) so the fix for 8140520
> did not change stack sizes for this platform. In fact, only
> one platform had a size change (Solaris X64).
> 
> So I'm confused about how the fix for 8140520 caused this problem.
> 
> Based on David's analysis below, it looks to me like this 64K stack
> guard page problem should also exist prior to the fix for 8140520.
> 
> Goetz, can you please explain how 8140520 caused this problem?
> 
> Dan
> 
> 
> 
> >
> > Second on linux PPC it is hardwired to use 2 guard pages:
> >
> >  return 2 * page_size();
> >
> > Third, you had a pagesize of 64K.
> >
> > Fourth, NPTL takes the guard space from the stack space - hence with 2
> > x 64K guard, and a 128K stack it was all consumed.
> >
> > ---
> >
> > In the proposed changes you now only use page_size() for the guard, so
> > that alone would have fixed the observed problem.
> >
> > But in addition you want to address the NPTL problem by adding back
> > the guard space to the stack size requested. That alone would also
> > have fixed the observed problem. :)
> >
> > But in addition you have increased the minimum stack size:
> >
> > ! size_t os::Posix::_compiler_thread_min_stack_allowed = 192 * K;
> >
> > which again, on its own would have fixed the original problem. :)
> >
> > Did you really intend to increase the real minimum stack from 128K to
> > 256K ? (on a 64K page system)
> >
> > ---
> >
> > Focusing simply on the shared code change to adjust the requested
> > stacksize by the amount of guard space (if any), this does not seem
> > unreasonable. As you note it is restricted to non-JavaThreads and only
> > adds a page to reserved stack space.
> >
> > My only query now is whether the minimum stacksize detection logic
> > will correctly report the real minimum stack size (taking into account
> > the need for the guard page) ?
> >
> > Thanks,
> > David
> >
> >> So I really think this issue should be fixed.
> >>
> >> Best regards,
> >>   Goetz.
> >>
> >>
> >>> -----Original Message-----
> >>> From: David Holmes [mailto:david.holmes at oracle.com]
> >>> Sent: Thursday, November 10, 2016 10:02 PM
> >>> To: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>; hotspot-
> runtime-
> >>> dev at openjdk.java.net
> >>> Subject: Re: RFR(M): 8169373: Work around linux NPTL stack guard error.
> >>>
> >>> Hi Goetz,
> >>>
> >>> As per the bug report, this issue was already known (6675312) and we
> >>> chose not to try and address it due to no reported issues at the time.
> >>> While I see that you have encountered an issue (is it real or
> >>> fabricated?) I think this change is too intrusive to be applied at this
> >>> stage of the JDK 9 release cycle, as it will change the stack
> >>> requirements of every application running on Linux.
> >>>
> >>> Thanks,
> >>> David
> >>>
> >>> On 11/11/2016 1:58 AM, Lindenmaier, Goetz wrote:
> >>>> Hi,
> >>>>
> >>>>
> >>>>
> >>>> Please review this change. I please need a sponsor:
> >>>>
> >>>> http://cr.openjdk.java.net/~goetz/wr16/8169373-ppc-
> stackFix/webrev.01/
> >>>>
> >>>>
> >>>>
> >>>> In the Linux NPTL pthread implementation the guard size mechanism
> >>>> is not
> >>>> implemented properly. The posix standard requires to add the size
> >>>> of the
> >>>> guard pages to the stack size, instead Linux takes the space out of
> >>>> 'stacksize'.
> >>>>
> >>>> The Posix standard
> http://pubs.opengroup.org/onlinepubs/9699919799/
> >>>>  says "the implementation allocates extra memory at the overflow
> >>>> end of
> >>>> the stack". The linux man page
> >>>> https://linux.die.net/man/3/pthread_attr_setguardsize says "As at
> >>>> glibc
> >>>> 2.8, the NPTL threading implementation includes the guard area within
> >>>> the stack size allocation, rather than allocating extra space at
> >>>> the end
> >>>> of the stack, as POSIX.1 requires".
> >>>>
> >>>> I encounter this problem in runtime/Thread/TooSmallStackSize.java
> >>>> on ppc
> >>>> with 64K pages. _compiler_thread_min_stack_allowed is 128K on ppc,
> and
> >>>> ppc specifies two OS guard pages. The VM crashes in pthread creation
> >>>> because there is no usable space in the thread stack after allocating
> >>>> the guard pages.
> >>>>
> >>>> But TooSmallStackSize.java requires that the VM comes up with the
> >>>> stack
> >>>> size mentioned in the error message.
> >>>>
> >>>> This fix adapts the requested stack size on Linux by the size of the
> >>>> guard pages to mimick proper behaviour, see change to os_linux.cpp.
> >>>>
> >>>>
> >>>>
> >>>> The change also streamlines usage of stack_guard_page on linuxppc,
> >>>> linuxppcle, aixppc and linuxs390.
> >>>>
> >>>>
> >>>>
> >>>> To reproduce the error on linux_x86_64, apply below patch and call the
> >>>> VM with -XX:CompilerThreadStackSize=64.
> >>>>
> >>>>
> >>>>
> >>>> I'm still exploring why I had to choose such big compiler stacks on
> >>>> ppc
> >>>> to get -version passing, but I wanted to send the RFR now as people
> >>>> obviously looked at the bug I opened (Thanks David!).
> >>>>
> >>>>
> >>>>
> >>>> Best regards,
> >>>>
> >>>>   Goetz.
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> diff -r b7ae012c55c3 src/os_cpu/linux_x86/vm/os_linux_x86.cpp
> >>>>
> >>>> --- a/src/os_cpu/linux_x86/vm/os_linux_x86.cpp  Mon Nov 07 12:37:28
> >>> 2016
> >>>> +0100
> >>>>
> >>>> +++ b/src/os_cpu/linux_x86/vm/os_linux_x86.cpp  Thu Nov 10
> 16:52:17
> >>> 2016
> >>>> +0100
> >>>>
> >>>> @@ -701,7 +701,7 @@
> >>>>
> >>>> size_t os::Linux::default_guard_size(os::ThreadType thr_type) {
> >>>>
> >>>>    // Creating guard page is very expensive. Java thread has HotSpot
> >>>>
> >>>>    // guard page, only enable glibc guard page for non-Java threads.
> >>>>
> >>>> -  return (thr_type == java_thread ? 0 : page_size());
> >>>>
> >>>> +  return (thr_type == java_thread ? 0 : 64*K);
> >>>>
> >>>> }
> >>>>
> >>>>
> >>>>
> >>>> // Java thread:
> >>>>



More information about the hotspot-runtime-dev mailing list