RFR(M): 8169373: Work around linux NPTL stack guard error.

Lindenmaier, Goetz goetz.lindenmaier at sap.com
Thu Nov 24 12:15:34 UTC 2016


Hi, 

I now edited the stuff I had proposed below:
http://cr.openjdk.java.net/~goetz/wr16/8169373-ppc-stackFix/webrev.03/
This includes
  - the NPTL fix from webrev.02
  - merging code on linux
  - not adding OS guard to compiler threads.

I think this should be pushed for this bug ID. For the other changes I'll 
make another bug.

Best regards,
  Goetz.

> -----Original Message-----
> From: Lindenmaier, Goetz
> Sent: Wednesday, November 23, 2016 8:11 AM
> To: David Holmes <david.holmes at oracle.com>;
> daniel.daugherty at oracle.com; hotspot-runtime-dev at openjdk.java.net
> Subject: RE: RFR(M): 8169373: Work around linux NPTL stack guard error.
> 
> Hi,
> 
> > Bzzzt! Sorry Goetz and Dan but that is no longer correct with JVMCI. The
> > ability for a CompilerThread to execute Java code (can_call_java()) is
> > now a dynamic property depending on whether the current compiler is the
> > JVMCI compiler.
> Ah, then I should also leave space for shadow pages in the minimal stack size
> of comiler threads.
> 
> Do we agree on the cleanup and on leaving out the OS guard page on
> compiler threads?
> Then I would edit a change comprising the NPTL workaround and these
> additional changes, and split the other issue into a new bug?  I think this
> will easy the reviewing.
> 
> Best regards,
>   Goetz.
> 
> > -----Original Message-----
> > From: David Holmes [mailto:david.holmes at oracle.com]
> > Sent: Mittwoch, 23. November 2016 02:50
> > To: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>;
> > daniel.daugherty at oracle.com; hotspot-runtime-dev at openjdk.java.net
> > Subject: Re: RFR(M): 8169373: Work around linux NPTL stack guard error.
> >
> > On 22/11/2016 11:19 PM, Lindenmaier, Goetz wrote:
> > > Hi Dan,
> > >
> > >> -----Original Message-----
> > >> From: Daniel D. Daugherty [mailto:daniel.daugherty at oracle.com]
> > >> Sent: Dienstag, 22. November 2016 14:01
> > >> To: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>; David Holmes
> > >> <david.holmes at oracle.com>; hotspot-runtime-dev at openjdk.java.net
> > >> Subject: Re: RFR(M): 8169373: Work around linux NPTL stack guard
> error.
> > >>
> > >> On 11/22/16 3:55 AM, Lindenmaier, Goetz wrote:
> > >>> Hi Dan,
> > >>>
> > >>> I ran into a row of issues, some errors on the platforms.
> > >>>
> > >>> What I meant with that comment is that
> > >>> os::Linux::min_stack_allowed = MAX2(os::Linux::min_stack_allowed,
> > >>>                                        JavaThread::stack_guard_zone_size() +
> > >>>                                        JavaThread::stack_shadow_zone_size() +
> > >>>                                        (4*BytesPerWord COMPILER2_PRESENT(+2)) * 4
> *
> > K);
> > >>> was executed, and min_stack_allowed used for all stacks. Now,
> compiler
> > >> and
> > >>> vm minimum stack sizes are not increased by these sizes.
> > >>
> > >> Now I see what you mean. Thanks for clearing this up.
> > >>
> > >> I should have remembered that part of the change because we went
> back
> > >> and forth about removing the red/yellow zone pages from the other
> > >> threads. In particular, we discussed the compiler thread because it
> > >> is-a JavaThread. Our conclusion was that a compiler thread doesn't
> > >> execute Java bytecode so we could remove the red/yellow pages...
> > > Yes, it does not execute java byte code.
> >
> > Bzzzt! Sorry Goetz and Dan but that is no longer correct with JVMCI. The
> > ability for a CompilerThread to execute Java code (can_call_java()) is
> > now a dynamic property depending on whether the current compiler is the
> > JVMCI compiler.
> >
> > David
> > -----
> >
> > > Therefore you can remove the shadow pages.  There is no code that
> > > will bang.
> > > But red/yellow pages are protected right during thread startup.
> > > Therefore you must have enough space for them.
> > > On ppc, we try to protect three 64K pages out of the 128K compiler stack.
> > > That obviously fails.
> > >
> > > Therefore I propose:
> > >      size_t os::Posix::_java_thread_min_stack_allowed = 48 * K;   // Set
> > platform dependent.
> > > in  os::Posix::set_minimum_stack_sizes():
> > >   _java_thread_min_stack_allowed = _java_thread_min_stack_allowed +
> > >                                    JavaThread::stack_guard_zone_size() +
> > >                                    JavaThread::stack_shadow_zone_size();
> > >
> > > (Similar for _compiler_thread_min_stack_allowed).
> > >
> > > The minimal stack size is made up of three components:
> > >    1. Sizes of interpreter/C1/C2 frames. Depends on HotSpot
> > implementation/platform/os.
> > >    2. Sizes of C++ frames: depends on C++ compiler.
> > > These are fixed at compile time.
> > >    3. Sizes of red/yellow/reserved/shadow pages. Depends on the system
> > the
> > >        VM is used on.  This is not fixed at compile time. (Our ppc machines
> > differ
> > >       in page size.)
> > > Therefore 3. should not be included in a compile time constant.
> > >
> > >
> > >
> > >> And that decision allowed us to be exposed to the 64K page issue
> > >> because the "extra" space isn't there anymore.
> > >>
> > >>> At least the _compiler_thread_min_stack_allowed should be increased
> > >>> similarly by red/yellow zone size.  The compiler thread is a Java
> > >>> thread and must have space for these zones.
> > >>
> > >> I'm not sure that I completely agree (yet). To me, the red/yellow
> > >> pages are there for Java thread stack overflow semantics. Yes, the
> > >> compiler thread needs extra space when 64K pages are used, but I
> > >> would prefer that we add that space via a different calculation.
> > > Yes they are. But compiler threads happen tob e a subclass of
> > > Java threads and use the same run() method that puts the pages
> > > There.
> > > I agree that they are not needed for Compiler threads, nor for
> > > CodeCacheSweeperThreads. I don't really now about
> > >  JvmtiAgentThreads and ServiceThreads, but all of the get the guard
> pages
> > > because they are derived from JavaThread.
> > >
> > >> To put it another way, I'd like to see us add extra space to solve
> > >> the 64K page issue directly instead of as a side effect of the
> > >> red/yellow page addition.
> > > I don't understand.  What do you mean by 'directly'?
> > >
> > >>> Also, the change added a test that is failing now.
> > >>
> > >> And that's a "good thing" (TM), right? :-)
> > > Yes, it showed a bug and thus raised the need to fix it! :)
> > >
> > > Best regards,
> > >   Goetz.
> > >
> > >
> > >> Again, thanks for clarifying 8140520's role in this issue.
> > >>
> > >> Dan
> > >>
> > >>
> > >>>
> > >>> Best regards,
> > >>>    Goetz.
> > >>>
> > >>>> -----Original Message-----
> > >>>> From: Daniel D. Daugherty [mailto:daniel.daugherty at oracle.com]
> > >>>> Sent: Montag, 21. November 2016 17:28
> > >>>> To: David Holmes <david.holmes at oracle.com>; Lindenmaier, Goetz
> > >>>> <goetz.lindenmaier at sap.com>; hotspot-runtime-
> > dev at openjdk.java.net
> > >>>> Subject: Re: RFR(M): 8169373: Work around linux NPTL stack guard
> > error.
> > >>>>
> > >>>> Sorry for the delayed responses to this thread. I've been on vacation...
> > >>>>
> > >>>> One comment/query embedded below...
> > >>>>
> > >>>>
> > >>>> On 11/10/16 8:40 PM, David Holmes wrote:
> > >>>>> Hi Goetz,
> > >>>>>
> > >>>>> On 11/11/2016 8:00 AM, Lindenmaier, Goetz wrote:
> > >>>>>> Hi David,
> > >>>>>>
> > >>>>>> This issue is different to 6675312, see also my comment in the bug.
> > >>>>>>
> > >>>>>> It appears running jtreg test
> > runtime/Thread/TooSmallStackSize.java,
> > >>>>>> with my patch below you can reproduce it on linuxx86_64. You can
> > not
> > >>>>>> do that with 6675312. Also, I would assume there are systems out
> > there
> > >>>>>> on x86 that uses 64-K pages, did you run the tests on these? I
> would
> > >>>>>> assume you get hard crashes with stack overflows in the first C2
> > >>>>>> compilation if there is only 64K usable CompilerThreadStack.
> > >>>>>>
> > >>>>>> My fix does not affect Java threads, which are the largest amount
> > >>>>>> of threads used by the VM. It affects only the non-Java threads.
> > >>>>>> It adds one page to these threads. The page does not require
> > memory,
> > >>>>>> as it's protected. The stack will only require more space if the
> thread
> > >>>>>> ran into a stack overflow before the fix as else the pages are not
> > >>>>>> mapped.
> > >>>>>> This are stack overflows that cause hard crashes, at least on ppc
> the
> > VM
> > >>>>>> does not properly catch these stack overflows, so any setup
> working
> > >> now
> > >>>>>> will not run into the additional space. Altogether there should be
> no
> > >>>>>> effect on running systems besides requiring one more entry in the
> > >>>>>> page table per non-Java thread.
> > >>>>>>
> > >>>>>> The problem is caused by a rather recent change (8140520:
> segfault
> > on
> > >>>>>> solaris-amd64
> > >>>>>> with "-XX:VMThreadStackSize=1" option) which was pushed after
> > >>>>>> feature-close. As this was a rather recent change, it must be
> > >>>>>> possible to
> > >>>>>> fix this follow up issue. What else is this period in the project good
> > >>>>>> for if not fixing issues?
> > >>>>> So I am seeing a number of factors here.
> > >>>>>
> > >>>>> First, 8140520, set:
> > >>>>>
> > >>>>> size_t os::Posix::_compiler_thread_min_stack_allowed = 128 * K;
> > >>>> So I'm confused by the above comment:
> > >>>>
> > >>>>   > The problem is caused by a rather recent change (8140520:
> segfault
> > >>>>   > on solaris-amd64 with "-XX:VMThreadStackSize=1" option)
> > >>>>
> > >>>> http://cr.openjdk.java.net/~dcubed/8140520-webrev/5-jdk9-hs-
> > >>>>
> > open/hotspot/src/os_cpu/linux_ppc/vm/os_linux_ppc.cpp.frames.html
> > >>>>
> > >>>> shows this change:
> > >>>>
> > >>>> @@ -531,19 +531,17 @@
> > >>>>    }
> > >>>>
> > >>>>
> > >>>>
> > >>
> > //////////////////////////////////////////////////////////////////////////////
> > >>>> //
> > >>>>    // thread stack
> > >>>>
> > >>>> -size_t os::Linux::min_stack_allowed = 128*K;
> > >>>> +size_t os::Posix::_compiler_thread_min_stack_allowed = 128 * K;
> > >>>> +size_t os::Posix::_java_thread_min_stack_allowed = 128 * K;
> > >>>> +size_t os::Posix::_vm_internal_thread_min_stack_allowed = 128 * K;
> > >>>>
> > >>>> so the existing single variable of 'min_stack_allowed' was
> > >>>> replaced by three variables: _compiler_thread_min_stack_allowed,
> > >>>> _java_thread_min_stack_allowed, and
> > >>>> _vm_internal_thread_min_stack_allowed.
> > >>>>
> > >>>> The old single variable and the three new variables are all
> > >>>> initialized to the same value (128K) so the fix for 8140520
> > >>>> did not change stack sizes for this platform. In fact, only
> > >>>> one platform had a size change (Solaris X64).
> > >>>>
> > >>>> So I'm confused about how the fix for 8140520 caused this problem.
> > >>>>
> > >>>> Based on David's analysis below, it looks to me like this 64K stack
> > >>>> guard page problem should also exist prior to the fix for 8140520.
> > >>>>
> > >>>> Goetz, can you please explain how 8140520 caused this problem?
> > >>>>
> > >>>> Dan
> > >>>>
> > >>>>
> > >>>>
> > >>>>> Second on linux PPC it is hardwired to use 2 guard pages:
> > >>>>>
> > >>>>>   return 2 * page_size();
> > >>>>>
> > >>>>> Third, you had a pagesize of 64K.
> > >>>>>
> > >>>>> Fourth, NPTL takes the guard space from the stack space - hence
> with
> > 2
> > >>>>> x 64K guard, and a 128K stack it was all consumed.
> > >>>>>
> > >>>>> ---
> > >>>>>
> > >>>>> In the proposed changes you now only use page_size() for the guard,
> > so
> > >>>>> that alone would have fixed the observed problem.
> > >>>>>
> > >>>>> But in addition you want to address the NPTL problem by adding
> back
> > >>>>> the guard space to the stack size requested. That alone would also
> > >>>>> have fixed the observed problem. :)
> > >>>>>
> > >>>>> But in addition you have increased the minimum stack size:
> > >>>>>
> > >>>>> ! size_t os::Posix::_compiler_thread_min_stack_allowed = 192 * K;
> > >>>>>
> > >>>>> which again, on its own would have fixed the original problem. :)
> > >>>>>
> > >>>>> Did you really intend to increase the real minimum stack from 128K
> to
> > >>>>> 256K ? (on a 64K page system)
> > >>>>>
> > >>>>> ---
> > >>>>>
> > >>>>> Focusing simply on the shared code change to adjust the requested
> > >>>>> stacksize by the amount of guard space (if any), this does not seem
> > >>>>> unreasonable. As you note it is restricted to non-JavaThreads and
> only
> > >>>>> adds a page to reserved stack space.
> > >>>>>
> > >>>>> My only query now is whether the minimum stacksize detection logic
> > >>>>> will correctly report the real minimum stack size (taking into account
> > >>>>> the need for the guard page) ?
> > >>>>>
> > >>>>> Thanks,
> > >>>>> David
> > >>>>>
> > >>>>>> So I really think this issue should be fixed.
> > >>>>>>
> > >>>>>> Best regards,
> > >>>>>>    Goetz.
> > >>>>>>
> > >>>>>>
> > >>>>>>> -----Original Message-----
> > >>>>>>> From: David Holmes [mailto:david.holmes at oracle.com]
> > >>>>>>> Sent: Thursday, November 10, 2016 10:02 PM
> > >>>>>>> To: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>; hotspot-
> > >>>> runtime-
> > >>>>>>> dev at openjdk.java.net
> > >>>>>>> Subject: Re: RFR(M): 8169373: Work around linux NPTL stack
> guard
> > >> error.
> > >>>>>>>
> > >>>>>>> Hi Goetz,
> > >>>>>>>
> > >>>>>>> As per the bug report, this issue was already known (6675312) and
> > we
> > >>>>>>> chose not to try and address it due to no reported issues at the
> > time.
> > >>>>>>> While I see that you have encountered an issue (is it real or
> > >>>>>>> fabricated?) I think this change is too intrusive to be applied at this
> > >>>>>>> stage of the JDK 9 release cycle, as it will change the stack
> > >>>>>>> requirements of every application running on Linux.
> > >>>>>>>
> > >>>>>>> Thanks,
> > >>>>>>> David
> > >>>>>>>
> > >>>>>>> On 11/11/2016 1:58 AM, Lindenmaier, Goetz wrote:
> > >>>>>>>> Hi,
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>> Please review this change. I please need a sponsor:
> > >>>>>>>>
> > >>>>>>>> http://cr.openjdk.java.net/~goetz/wr16/8169373-ppc-
> > >>>> stackFix/webrev.01/
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>> In the Linux NPTL pthread implementation the guard size
> > mechanism
> > >>>>>>>> is not
> > >>>>>>>> implemented properly. The posix standard requires to add the
> > size
> > >>>>>>>> of the
> > >>>>>>>> guard pages to the stack size, instead Linux takes the space out
> of
> > >>>>>>>> 'stacksize'.
> > >>>>>>>>
> > >>>>>>>> The Posix standard
> > >>>> http://pubs.opengroup.org/onlinepubs/9699919799/
> > >>>>>>>>   says "the implementation allocates extra memory at the
> > overflow
> > >>>>>>>> end of
> > >>>>>>>> the stack". The linux man page
> > >>>>>>>> https://linux.die.net/man/3/pthread_attr_setguardsize says "As
> > at
> > >>>>>>>> glibc
> > >>>>>>>> 2.8, the NPTL threading implementation includes the guard area
> > >> within
> > >>>>>>>> the stack size allocation, rather than allocating extra space at
> > >>>>>>>> the end
> > >>>>>>>> of the stack, as POSIX.1 requires".
> > >>>>>>>>
> > >>>>>>>> I encounter this problem in
> > runtime/Thread/TooSmallStackSize.java
> > >>>>>>>> on ppc
> > >>>>>>>> with 64K pages. _compiler_thread_min_stack_allowed is 128K
> on
> > >> ppc,
> > >>>> and
> > >>>>>>>> ppc specifies two OS guard pages. The VM crashes in pthread
> > >> creation
> > >>>>>>>> because there is no usable space in the thread stack after
> > allocating
> > >>>>>>>> the guard pages.
> > >>>>>>>>
> > >>>>>>>> But TooSmallStackSize.java requires that the VM comes up with
> > the
> > >>>>>>>> stack
> > >>>>>>>> size mentioned in the error message.
> > >>>>>>>>
> > >>>>>>>> This fix adapts the requested stack size on Linux by the size of
> the
> > >>>>>>>> guard pages to mimick proper behaviour, see change to
> > >> os_linux.cpp.
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>> The change also streamlines usage of stack_guard_page on
> > linuxppc,
> > >>>>>>>> linuxppcle, aixppc and linuxs390.
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>> To reproduce the error on linux_x86_64, apply below patch and
> > call
> > >> the
> > >>>>>>>> VM with -XX:CompilerThreadStackSize=64.
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>> I'm still exploring why I had to choose such big compiler stacks
> on
> > >>>>>>>> ppc
> > >>>>>>>> to get -version passing, but I wanted to send the RFR now as
> > people
> > >>>>>>>> obviously looked at the bug I opened (Thanks David!).
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>> Best regards,
> > >>>>>>>>
> > >>>>>>>>    Goetz.
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>> diff -r b7ae012c55c3 src/os_cpu/linux_x86/vm/os_linux_x86.cpp
> > >>>>>>>>
> > >>>>>>>> --- a/src/os_cpu/linux_x86/vm/os_linux_x86.cpp  Mon Nov 07
> > >> 12:37:28
> > >>>>>>> 2016
> > >>>>>>>> +0100
> > >>>>>>>>
> > >>>>>>>> +++ b/src/os_cpu/linux_x86/vm/os_linux_x86.cpp  Thu Nov 10
> > >>>> 16:52:17
> > >>>>>>> 2016
> > >>>>>>>> +0100
> > >>>>>>>>
> > >>>>>>>> @@ -701,7 +701,7 @@
> > >>>>>>>>
> > >>>>>>>> size_t os::Linux::default_guard_size(os::ThreadType thr_type) {
> > >>>>>>>>
> > >>>>>>>>     // Creating guard page is very expensive. Java thread has
> > HotSpot
> > >>>>>>>>
> > >>>>>>>>     // guard page, only enable glibc guard page for non-Java
> > threads.
> > >>>>>>>>
> > >>>>>>>> -  return (thr_type == java_thread ? 0 : page_size());
> > >>>>>>>>
> > >>>>>>>> +  return (thr_type == java_thread ? 0 : 64*K);
> > >>>>>>>>
> > >>>>>>>> }
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>> // Java thread:
> > >>>>>>>>
> > >


More information about the hotspot-runtime-dev mailing list