RFR(M): 8169373: Work around linux NPTL stack guard error.
Lindenmaier, Goetz
goetz.lindenmaier at sap.com
Fri Nov 11 08:12:55 UTC 2016
Hi David,
thanks for having a closer look!
> -----Original Message-----
> From: David Holmes [mailto:david.holmes at oracle.com]
> Sent: Friday, November 11, 2016 04:41
> To: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>; hotspot-runtime-dev at openjdk.java.net
> Subject: Re: RFR(M): 8169373: Work around linux NPTL stack guard error.
>
> Hi Goetz,
>
> On 11/11/2016 8:00 AM, Lindenmaier, Goetz wrote:
> > Hi David,
> >
> > This issue is different to 6675312, see also my comment in the bug.
> >
> > It appears when running jtreg test runtime/Thread/TooSmallStackSize.java;
> > with my patch below you can reproduce it on linuxx86_64. You cannot
> > do that with 6675312. Also, I would assume there are systems out there
> > on x86 that use 64K pages; did you run the tests on these? I would
> > assume you get hard crashes with stack overflows in the first C2
> > compilation if there is only 64K usable CompilerThreadStack.
> >
> > My fix does not affect Java threads, which are the largest amount
> > of threads used by the VM. It affects only the non-Java threads.
> > It adds one page to these threads. The page does not require memory,
> > as it's protected. The stack will only require more space if the thread
> > ran into a stack overflow before the fix, as otherwise the pages are not
> > mapped. These are stack overflows that cause hard crashes; at least on ppc
> > the VM does not properly catch these stack overflows, so any setup working
> > now will not run into the additional space. Altogether there should be no
> > effect on running systems besides requiring one more entry in the
> > page table per non-Java thread.
> >
> > The problem is caused by a rather recent change (8140520: segfault on
> > solaris-amd64 with "-XX:VMThreadStackSize=1" option) which was pushed after
> > feature-close. As this was a rather recent change, it must be possible to
> > fix this follow-up issue. What else is this period in the project good
> > for if not fixing issues?
>
> So I am seeing a number of factors here.
>
> First, 8140520, set:
>
> size_t os::Posix::_compiler_thread_min_stack_allowed = 128 * K;
>
> Second on linux PPC it is hardwired to use 2 guard pages:
>
> return 2 * page_size();
>
> Third, you had a pagesize of 64K.
>
> Fourth, NPTL takes the guard space from the stack space - hence with 2 x
> 64K guard, and a 128K stack it was all consumed.
Yes.
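To spell out the arithmetic of the four points above, here is a minimal sketch. The helper name usable_stack_nptl is hypothetical and is not HotSpot or glibc code; it only models the behaviour where the guard area is taken out of the requested stack size.

```cpp
#include <cstddef>

constexpr std::size_t K = 1024;

// Hypothetical helper (not HotSpot code): stack space left for the thread
// when the guard area is carved out of the requested size, as NPTL does.
constexpr std::size_t usable_stack_nptl(std::size_t requested, std::size_t guard) {
  return requested > guard ? requested - guard : 0;
}

// linuxppc case from this thread: 128K minimum, two guard pages, 64K pages.
static_assert(usable_stack_nptl(128 * K, 2 * 64 * K) == 0,
              "the guard pages consume the whole stack");

// Same minimum and guard count with 4K pages still leaves 120K.
static_assert(usable_stack_nptl(128 * K, 2 * 4 * K) == 120 * K,
              "4K pages leave 120K usable");
```

The checks are compile-time only, so the snippet builds with any C++11 compiler and needs no pthread calls.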
>
> ---
>
> In the proposed changes you now only use page_size() for the guard, so
> that alone would have fixed the observed problem.
2*page_size() is a setting we inherited from our ia64 port, from which we branched
the original ppc port somewhere around 2006/7. I took the chance to adapt
this.
> But in addition you want to address the NPTL problem by adding back the
> guard space to the stack size requested. That alone would also have
> fixed the observed problem. :)
Yes, this is the real problem. _compiler_thread_min_stack_allowed & friends
are minimal values required to start up the VM. If you reduce them
slightly, the VM will run into stack overflow. I would assume they are tuned
on linuxx86_64 for 4K page size. As the value there is 64K, the VM probably
utilizes around 50K in a simple -version run. Thus, if you start the VM
on a system with 32K page size, there is only 32K usable space on the stack,
and -version will crash (I verified this by setting
_compiler_thread_min_stack_allowed = 32K on linuxx86_64).
The lower bound is pointless if it depends that much on the system.
It should reflect the size of the frames of the binary, which is basically
the same regardless of page size.
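As an illustration of why the requested size should be adjusted, here is a rough sketch. Both helper names are hypothetical and this is not the code in the webrev; it only models "guard taken out of the stack" versus "guard requested on top".

```cpp
#include <cstddef>

constexpr std::size_t KiB = 1024;

// Hypothetical model of NPTL: the guard is carved out of the requested size.
constexpr std::size_t usable_after_nptl(std::size_t requested, std::size_t guard) {
  return requested > guard ? requested - guard : 0;
}

// Hypothetical workaround: request the guard size on top of the wanted stack.
constexpr std::size_t adjusted_request(std::size_t wanted, std::size_t guard) {
  return wanted + guard;
}

// Without adjustment, a 64K minimum leaves 60K with a 4K guard page,
// but only 32K with a 32K guard page.
static_assert(usable_after_nptl(64 * KiB, 4 * KiB) == 60 * KiB, "4K page");
static_assert(usable_after_nptl(64 * KiB, 32 * KiB) == 32 * KiB, "32K page");

// With adjustment, the usable space equals the wanted size for either page size.
static_assert(usable_after_nptl(adjusted_request(64 * KiB, 4 * KiB), 4 * KiB) == 64 * KiB, "4K page");
static_assert(usable_after_nptl(adjusted_request(64 * KiB, 32 * KiB), 32 * KiB) == 64 * KiB, "32K page");
```

With the adjustment, the minimum value only has to cover the frames of the binary and no longer depends on the page size.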
> But in addition you have increased the minimum stack size:
>
> ! size_t os::Posix::_compiler_thread_min_stack_allowed = 192 * K;
I had to do so because, after fixing the other issues, the compiler thread
still ran into a stack overflow when starting with 128K. I'm still looking for
the cause. I thought we might have more aggressive compiler flag settings, but
these flags look quite reasonable so far.
> which again, on its own would have fixed the original problem. :)
Well, I would have needed even more ...
>
> Did you really intend to increase the real minimum stack from 128K to
> 256K ? (on a 64K page system)
>
> ---
>
> Focusing simply on the shared code change to adjust the requested
> stacksize by the amount of guard space (if any), this does not seem
> unreasonable. As you note it is restricted to non-JavaThreads and only
> adds a page to reserved stack space.
>
> My only query now is whether the minimum stacksize detection logic will
> correctly report the real minimum stack size (taking into account the
> need for the guard page) ?
It will report the proper value that the user must set on the command line
(i.e., the test mentioned above passes). This value does not include the
size required for the pthread guard page.
> Thanks,
> David
>
> > So I really think this issue should be fixed.
> >
> > Best regards,
> > Goetz.
> >
> >
> >> -----Original Message-----
> >> From: David Holmes [mailto:david.holmes at oracle.com]
> >> Sent: Thursday, November 10, 2016 10:02 PM
> >> To: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>; hotspot-runtime-dev at openjdk.java.net
> >> Subject: Re: RFR(M): 8169373: Work around linux NPTL stack guard error.
> >>
> >> Hi Goetz,
> >>
> >> As per the bug report, this issue was already known (6675312) and we
> >> chose not to try and address it due to no reported issues at the time.
> >> While I see that you have encountered an issue (is it real or
> >> fabricated?) I think this change is too intrusive to be applied at this
> >> stage of the JDK 9 release cycle, as it will change the stack
> >> requirements of every application running on Linux.
> >>
> >> Thanks,
> >> David
> >>
> >> On 11/11/2016 1:58 AM, Lindenmaier, Goetz wrote:
> >>> Hi,
> >>>
> >>> Please review this change. I also need a sponsor:
> >>>
> >>> http://cr.openjdk.java.net/~goetz/wr16/8169373-ppc-stackFix/webrev.01/
> >>>
> >>> In the Linux NPTL pthread implementation the guard size mechanism is not
> >>> implemented properly. The POSIX standard requires adding the size of the
> >>> guard pages to the stack size; instead, Linux takes the space out of
> >>> 'stacksize'.
> >>>
> >>> The POSIX standard http://pubs.opengroup.org/onlinepubs/9699919799/
> >>> says "the implementation allocates extra memory at the overflow end of
> >>> the stack". The Linux man page
> >>> https://linux.die.net/man/3/pthread_attr_setguardsize says "As at glibc
> >>> 2.8, the NPTL threading implementation includes the guard area within
> >>> the stack size allocation, rather than allocating extra space at the end
> >>> of the stack, as POSIX.1 requires".
> >>>
> >>> I encounter this problem in runtime/Thread/TooSmallStackSize.java on ppc
> >>> with 64K pages. _compiler_thread_min_stack_allowed is 128K on ppc, and
> >>> ppc specifies two OS guard pages. The VM crashes in pthread creation
> >>> because there is no usable space in the thread stack after allocating
> >>> the guard pages.
> >>>
> >>> But TooSmallStackSize.java requires that the VM comes up with the stack
> >>> size mentioned in the error message.
> >>>
> >>> This fix adapts the requested stack size on Linux by the size of the
> >>> guard pages to mimic proper behaviour, see change to os_linux.cpp.
> >>>
> >>> The change also streamlines usage of stack_guard_page on linuxppc,
> >>> linuxppcle, aixppc and linuxs390.
> >>>
> >>> To reproduce the error on linux_x86_64, apply the patch below and call the
> >>> VM with -XX:CompilerThreadStackSize=64.
> >>>
> >>> I'm still exploring why I had to choose such big compiler stacks on ppc
> >>> to get -version passing, but I wanted to send the RFR now as people
> >>> obviously looked at the bug I opened (Thanks David!).
> >>>
> >>> Best regards,
> >>> Goetz.
> >>>
> >>> diff -r b7ae012c55c3 src/os_cpu/linux_x86/vm/os_linux_x86.cpp
> >>> --- a/src/os_cpu/linux_x86/vm/os_linux_x86.cpp Mon Nov 07 12:37:28 2016 +0100
> >>> +++ b/src/os_cpu/linux_x86/vm/os_linux_x86.cpp Thu Nov 10 16:52:17 2016 +0100
> >>> @@ -701,7 +701,7 @@
> >>>  size_t os::Linux::default_guard_size(os::ThreadType thr_type) {
> >>>    // Creating guard page is very expensive. Java thread has HotSpot
> >>>    // guard page, only enable glibc guard page for non-Java threads.
> >>> -  return (thr_type == java_thread ? 0 : page_size());
> >>> +  return (thr_type == java_thread ? 0 : 64*K);
> >>>  }
> >>>
> >>>  // Java thread:
> >>>