RFR(M): 8169373: Work around linux NPTL stack guard error.
Lindenmaier, Goetz
goetz.lindenmaier at sap.com
Wed Nov 23 07:11:17 UTC 2016
Hi,
> Bzzzt! Sorry Goetz and Dan but that is no longer correct with JVMCI. The
> ability for a CompilerThread to execute Java code (can_call_java()) is
> now a dynamic property depending on whether the current compiler is the
> JVMCI compiler.
Ah, then I should also leave space for shadow pages in the minimal stack size
of comiler threads.
Do we agree on the cleanup and on leaving out the OS guard page on
compiler threads?
Then I would edit a change comprising the NPTL workaround and these
additional changes, and split the other issue into a new bug? I think this
will easy the reviewing.
Best regards,
Goetz.
> -----Original Message-----
> From: David Holmes [mailto:david.holmes at oracle.com]
> Sent: Mittwoch, 23. November 2016 02:50
> To: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>;
> daniel.daugherty at oracle.com; hotspot-runtime-dev at openjdk.java.net
> Subject: Re: RFR(M): 8169373: Work around linux NPTL stack guard error.
>
> On 22/11/2016 11:19 PM, Lindenmaier, Goetz wrote:
> > Hi Dan,
> >
> >> -----Original Message-----
> >> From: Daniel D. Daugherty [mailto:daniel.daugherty at oracle.com]
> >> Sent: Dienstag, 22. November 2016 14:01
> >> To: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>; David Holmes
> >> <david.holmes at oracle.com>; hotspot-runtime-dev at openjdk.java.net
> >> Subject: Re: RFR(M): 8169373: Work around linux NPTL stack guard error.
> >>
> >> On 11/22/16 3:55 AM, Lindenmaier, Goetz wrote:
> >>> Hi Dan,
> >>>
> >>> I ran into a row of issues, some errors on the platforms.
> >>>
> >>> What I meant with that comment is that
> >>> os::Linux::min_stack_allowed = MAX2(os::Linux::min_stack_allowed,
> >>> JavaThread::stack_guard_zone_size() +
> >>> JavaThread::stack_shadow_zone_size() +
> >>> (4*BytesPerWord COMPILER2_PRESENT(+2)) * 4 *
> K);
> >>> was executed, and min_stack_allowed used for all stacks. Now, compiler
> >> and
> >>> vm minimum stack sizes are not increased by these sizes.
> >>
> >> Now I see what you mean. Thanks for clearing this up.
> >>
> >> I should have remembered that part of the change because we went back
> >> and forth about removing the red/yellow zone pages from the other
> >> threads. In particular, we discussed the compiler thread because it
> >> is-a JavaThread. Our conclusion was that a compiler thread doesn't
> >> execute Java bytecode so we could remove the red/yellow pages...
> > Yes, it does not execute java byte code.
>
> Bzzzt! Sorry Goetz and Dan but that is no longer correct with JVMCI. The
> ability for a CompilerThread to execute Java code (can_call_java()) is
> now a dynamic property depending on whether the current compiler is the
> JVMCI compiler.
>
> David
> -----
>
> > Therefore you can remove the shadow pages. There is no code that
> > will bang.
> > But red/yellow pages are protected right during thread startup.
> > Therefore you must have enough space for them.
> > On ppc, we try to protect three 64K pages out of the 128K compiler stack.
> > That obviously fails.
> >
> > Therefore I propose:
> > size_t os::Posix::_java_thread_min_stack_allowed = 48 * K; // Set
> platform dependent.
> > in os::Posix::set_minimum_stack_sizes():
> > _java_thread_min_stack_allowed = _java_thread_min_stack_allowed +
> > JavaThread::stack_guard_zone_size() +
> > JavaThread::stack_shadow_zone_size();
> >
> > (Similar for _compiler_thread_min_stack_allowed).
> >
> > The minimal stack size is made up of three components:
> > 1. Sizes of interpreter/C1/C2 frames. Depends on HotSpot
> implementation/platform/os.
> > 2. Sizes of C++ frames: depends on C++ compiler.
> > These are fixed at compile time.
> > 3. Sizes of red/yellow/reserved/shadow pages. Depends on the system
> the
> > VM is used on. This is not fixed at compile time. (Our ppc machines
> differ
> > in page size.)
> > Therefore 3. should not be included in a compile time constant.
> >
> >
> >
> >> And that decision allowed us to be exposed to the 64K page issue
> >> because the "extra" space isn't there anymore.
> >>
> >>> At least the _compiler_thread_min_stack_allowed should be increased
> >>> similarly by red/yellow zone size. The compiler thread is a Java
> >>> thread and must have space for these zones.
> >>
> >> I'm not sure that I completely agree (yet). To me, the red/yellow
> >> pages are there for Java thread stack overflow semantics. Yes, the
> >> compiler thread needs extra space when 64K pages are used, but I
> >> would prefer that we add that space via a different calculation.
> > Yes they are. But compiler threads happen tob e a subclass of
> > Java threads and use the same run() method that puts the pages
> > There.
> > I agree that they are not needed for Compiler threads, nor for
> > CodeCacheSweeperThreads. I don't really now about
> > JvmtiAgentThreads and ServiceThreads, but all of the get the guard pages
> > because they are derived from JavaThread.
> >
> >> To put it another way, I'd like to see us add extra space to solve
> >> the 64K page issue directly instead of as a side effect of the
> >> red/yellow page addition.
> > I don't understand. What do you mean by 'directly'?
> >
> >>> Also, the change added a test that is failing now.
> >>
> >> And that's a "good thing" (TM), right? :-)
> > Yes, it showed a bug and thus raised the need to fix it! :)
> >
> > Best regards,
> > Goetz.
> >
> >
> >> Again, thanks for clarifying 8140520's role in this issue.
> >>
> >> Dan
> >>
> >>
> >>>
> >>> Best regards,
> >>> Goetz.
> >>>
> >>>> -----Original Message-----
> >>>> From: Daniel D. Daugherty [mailto:daniel.daugherty at oracle.com]
> >>>> Sent: Montag, 21. November 2016 17:28
> >>>> To: David Holmes <david.holmes at oracle.com>; Lindenmaier, Goetz
> >>>> <goetz.lindenmaier at sap.com>; hotspot-runtime-
> dev at openjdk.java.net
> >>>> Subject: Re: RFR(M): 8169373: Work around linux NPTL stack guard
> error.
> >>>>
> >>>> Sorry for the delayed responses to this thread. I've been on vacation...
> >>>>
> >>>> One comment/query embedded below...
> >>>>
> >>>>
> >>>> On 11/10/16 8:40 PM, David Holmes wrote:
> >>>>> Hi Goetz,
> >>>>>
> >>>>> On 11/11/2016 8:00 AM, Lindenmaier, Goetz wrote:
> >>>>>> Hi David,
> >>>>>>
> >>>>>> This issue is different to 6675312, see also my comment in the bug.
> >>>>>>
> >>>>>> It appears running jtreg test
> runtime/Thread/TooSmallStackSize.java,
> >>>>>> with my patch below you can reproduce it on linuxx86_64. You can
> not
> >>>>>> do that with 6675312. Also, I would assume there are systems out
> there
> >>>>>> on x86 that uses 64-K pages, did you run the tests on these? I would
> >>>>>> assume you get hard crashes with stack overflows in the first C2
> >>>>>> compilation if there is only 64K usable CompilerThreadStack.
> >>>>>>
> >>>>>> My fix does not affect Java threads, which are the largest amount
> >>>>>> of threads used by the VM. It affects only the non-Java threads.
> >>>>>> It adds one page to these threads. The page does not require
> memory,
> >>>>>> as it's protected. The stack will only require more space if the thread
> >>>>>> ran into a stack overflow before the fix as else the pages are not
> >>>>>> mapped.
> >>>>>> This are stack overflows that cause hard crashes, at least on ppc the
> VM
> >>>>>> does not properly catch these stack overflows, so any setup working
> >> now
> >>>>>> will not run into the additional space. Altogether there should be no
> >>>>>> effect on running systems besides requiring one more entry in the
> >>>>>> page table per non-Java thread.
> >>>>>>
> >>>>>> The problem is caused by a rather recent change (8140520: segfault
> on
> >>>>>> solaris-amd64
> >>>>>> with "-XX:VMThreadStackSize=1" option) which was pushed after
> >>>>>> feature-close. As this was a rather recent change, it must be
> >>>>>> possible to
> >>>>>> fix this follow up issue. What else is this period in the project good
> >>>>>> for if not fixing issues?
> >>>>> So I am seeing a number of factors here.
> >>>>>
> >>>>> First, 8140520, set:
> >>>>>
> >>>>> size_t os::Posix::_compiler_thread_min_stack_allowed = 128 * K;
> >>>> So I'm confused by the above comment:
> >>>>
> >>>> > The problem is caused by a rather recent change (8140520: segfault
> >>>> > on solaris-amd64 with "-XX:VMThreadStackSize=1" option)
> >>>>
> >>>> http://cr.openjdk.java.net/~dcubed/8140520-webrev/5-jdk9-hs-
> >>>>
> open/hotspot/src/os_cpu/linux_ppc/vm/os_linux_ppc.cpp.frames.html
> >>>>
> >>>> shows this change:
> >>>>
> >>>> @@ -531,19 +531,17 @@
> >>>> }
> >>>>
> >>>>
> >>>>
> >>
> //////////////////////////////////////////////////////////////////////////////
> >>>> //
> >>>> // thread stack
> >>>>
> >>>> -size_t os::Linux::min_stack_allowed = 128*K;
> >>>> +size_t os::Posix::_compiler_thread_min_stack_allowed = 128 * K;
> >>>> +size_t os::Posix::_java_thread_min_stack_allowed = 128 * K;
> >>>> +size_t os::Posix::_vm_internal_thread_min_stack_allowed = 128 * K;
> >>>>
> >>>> so the existing single variable of 'min_stack_allowed' was
> >>>> replaced by three variables: _compiler_thread_min_stack_allowed,
> >>>> _java_thread_min_stack_allowed, and
> >>>> _vm_internal_thread_min_stack_allowed.
> >>>>
> >>>> The old single variable and the three new variables are all
> >>>> initialized to the same value (128K) so the fix for 8140520
> >>>> did not change stack sizes for this platform. In fact, only
> >>>> one platform had a size change (Solaris X64).
> >>>>
> >>>> So I'm confused about how the fix for 8140520 caused this problem.
> >>>>
> >>>> Based on David's analysis below, it looks to me like this 64K stack
> >>>> guard page problem should also exist prior to the fix for 8140520.
> >>>>
> >>>> Goetz, can you please explain how 8140520 caused this problem?
> >>>>
> >>>> Dan
> >>>>
> >>>>
> >>>>
> >>>>> Second on linux PPC it is hardwired to use 2 guard pages:
> >>>>>
> >>>>> return 2 * page_size();
> >>>>>
> >>>>> Third, you had a pagesize of 64K.
> >>>>>
> >>>>> Fourth, NPTL takes the guard space from the stack space - hence with
> 2
> >>>>> x 64K guard, and a 128K stack it was all consumed.
> >>>>>
> >>>>> ---
> >>>>>
> >>>>> In the proposed changes you now only use page_size() for the guard,
> so
> >>>>> that alone would have fixed the observed problem.
> >>>>>
> >>>>> But in addition you want to address the NPTL problem by adding back
> >>>>> the guard space to the stack size requested. That alone would also
> >>>>> have fixed the observed problem. :)
> >>>>>
> >>>>> But in addition you have increased the minimum stack size:
> >>>>>
> >>>>> ! size_t os::Posix::_compiler_thread_min_stack_allowed = 192 * K;
> >>>>>
> >>>>> which again, on its own would have fixed the original problem. :)
> >>>>>
> >>>>> Did you really intend to increase the real minimum stack from 128K to
> >>>>> 256K ? (on a 64K page system)
> >>>>>
> >>>>> ---
> >>>>>
> >>>>> Focusing simply on the shared code change to adjust the requested
> >>>>> stacksize by the amount of guard space (if any), this does not seem
> >>>>> unreasonable. As you note it is restricted to non-JavaThreads and only
> >>>>> adds a page to reserved stack space.
> >>>>>
> >>>>> My only query now is whether the minimum stacksize detection logic
> >>>>> will correctly report the real minimum stack size (taking into account
> >>>>> the need for the guard page) ?
> >>>>>
> >>>>> Thanks,
> >>>>> David
> >>>>>
> >>>>>> So I really think this issue should be fixed.
> >>>>>>
> >>>>>> Best regards,
> >>>>>> Goetz.
> >>>>>>
> >>>>>>
> >>>>>>> -----Original Message-----
> >>>>>>> From: David Holmes [mailto:david.holmes at oracle.com]
> >>>>>>> Sent: Thursday, November 10, 2016 10:02 PM
> >>>>>>> To: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>; hotspot-
> >>>> runtime-
> >>>>>>> dev at openjdk.java.net
> >>>>>>> Subject: Re: RFR(M): 8169373: Work around linux NPTL stack guard
> >> error.
> >>>>>>>
> >>>>>>> Hi Goetz,
> >>>>>>>
> >>>>>>> As per the bug report, this issue was already known (6675312) and
> we
> >>>>>>> chose not to try and address it due to no reported issues at the
> time.
> >>>>>>> While I see that you have encountered an issue (is it real or
> >>>>>>> fabricated?) I think this change is too intrusive to be applied at this
> >>>>>>> stage of the JDK 9 release cycle, as it will change the stack
> >>>>>>> requirements of every application running on Linux.
> >>>>>>>
> >>>>>>> Thanks,
> >>>>>>> David
> >>>>>>>
> >>>>>>> On 11/11/2016 1:58 AM, Lindenmaier, Goetz wrote:
> >>>>>>>> Hi,
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Please review this change. I please need a sponsor:
> >>>>>>>>
> >>>>>>>> http://cr.openjdk.java.net/~goetz/wr16/8169373-ppc-
> >>>> stackFix/webrev.01/
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> In the Linux NPTL pthread implementation the guard size
> mechanism
> >>>>>>>> is not
> >>>>>>>> implemented properly. The posix standard requires to add the
> size
> >>>>>>>> of the
> >>>>>>>> guard pages to the stack size, instead Linux takes the space out of
> >>>>>>>> 'stacksize'.
> >>>>>>>>
> >>>>>>>> The Posix standard
> >>>> http://pubs.opengroup.org/onlinepubs/9699919799/
> >>>>>>>> says "the implementation allocates extra memory at the
> overflow
> >>>>>>>> end of
> >>>>>>>> the stack". The linux man page
> >>>>>>>> https://linux.die.net/man/3/pthread_attr_setguardsize says "As
> at
> >>>>>>>> glibc
> >>>>>>>> 2.8, the NPTL threading implementation includes the guard area
> >> within
> >>>>>>>> the stack size allocation, rather than allocating extra space at
> >>>>>>>> the end
> >>>>>>>> of the stack, as POSIX.1 requires".
> >>>>>>>>
> >>>>>>>> I encounter this problem in
> runtime/Thread/TooSmallStackSize.java
> >>>>>>>> on ppc
> >>>>>>>> with 64K pages. _compiler_thread_min_stack_allowed is 128K on
> >> ppc,
> >>>> and
> >>>>>>>> ppc specifies two OS guard pages. The VM crashes in pthread
> >> creation
> >>>>>>>> because there is no usable space in the thread stack after
> allocating
> >>>>>>>> the guard pages.
> >>>>>>>>
> >>>>>>>> But TooSmallStackSize.java requires that the VM comes up with
> the
> >>>>>>>> stack
> >>>>>>>> size mentioned in the error message.
> >>>>>>>>
> >>>>>>>> This fix adapts the requested stack size on Linux by the size of the
> >>>>>>>> guard pages to mimick proper behaviour, see change to
> >> os_linux.cpp.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> The change also streamlines usage of stack_guard_page on
> linuxppc,
> >>>>>>>> linuxppcle, aixppc and linuxs390.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> To reproduce the error on linux_x86_64, apply below patch and
> call
> >> the
> >>>>>>>> VM with -XX:CompilerThreadStackSize=64.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> I'm still exploring why I had to choose such big compiler stacks on
> >>>>>>>> ppc
> >>>>>>>> to get -version passing, but I wanted to send the RFR now as
> people
> >>>>>>>> obviously looked at the bug I opened (Thanks David!).
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Best regards,
> >>>>>>>>
> >>>>>>>> Goetz.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> diff -r b7ae012c55c3 src/os_cpu/linux_x86/vm/os_linux_x86.cpp
> >>>>>>>>
> >>>>>>>> --- a/src/os_cpu/linux_x86/vm/os_linux_x86.cpp Mon Nov 07
> >> 12:37:28
> >>>>>>> 2016
> >>>>>>>> +0100
> >>>>>>>>
> >>>>>>>> +++ b/src/os_cpu/linux_x86/vm/os_linux_x86.cpp Thu Nov 10
> >>>> 16:52:17
> >>>>>>> 2016
> >>>>>>>> +0100
> >>>>>>>>
> >>>>>>>> @@ -701,7 +701,7 @@
> >>>>>>>>
> >>>>>>>> size_t os::Linux::default_guard_size(os::ThreadType thr_type) {
> >>>>>>>>
> >>>>>>>> // Creating guard page is very expensive. Java thread has
> HotSpot
> >>>>>>>>
> >>>>>>>> // guard page, only enable glibc guard page for non-Java
> threads.
> >>>>>>>>
> >>>>>>>> - return (thr_type == java_thread ? 0 : page_size());
> >>>>>>>>
> >>>>>>>> + return (thr_type == java_thread ? 0 : 64*K);
> >>>>>>>>
> >>>>>>>> }
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> // Java thread:
> >>>>>>>>
> >
More information about the hotspot-runtime-dev
mailing list