RFR(M): 8169373: Work around linux NPTL stack guard error.

Daniel D. Daugherty daniel.daugherty at oracle.com
Wed Nov 30 15:20:14 UTC 2016


 > Would you mind sponsoring this change?

Yes, I will sponsor this change. However, I would like to get a
thumbs up from David H. on the latest version and I don't see
any review from someone on the Compiler Team.

Vladimir! We need someone on the Compiler Team to look at these
CompilerThread stack size changes...

Dan


On 11/30/16 12:57 AM, Lindenmaier, Goetz wrote:
> Hi Dan,
>
>> Thumbs up! I don't need to see a new webrev if you choose
>> to fix the minor comments above.
> I anyways did a new one fixing the comments:
> http://cr.openjdk.java.net/~goetz/wr16/8169373-ppc-stackFix/webrev.06/
> Would you mind sponsoring this change?
>
> I took the minimum stack sizes from my experimental runs where
> I had removed the automatic resizing to find the really needed space.
> If I put something smaller there, I could as well put '0' ... as
> _java_thread_min_stack_allowed = MAX2(_java_thread_min_stack_allowed,
>                                          JavaThread::stack_guard_zone_size() +
>                                          JavaThread::stack_shadow_zone_size() +
>                                          (4 * BytesPerWord COMPILER2_PRESENT(+ 2)) * 4 * K);
> will fix it.
> This code effectively limits the usable stack size to
>     (4 * BytesPerWord COMPILER2_PRESENT(+ 2)) * 4 * K)
> which makes the initialization of _java_thread_min_stack_allowed with platform
> values pointless.
>
> I'll open a new bug for the follow-up stack issue, probably tomorrow.
> I don't feel like addressing testing all the possible error codes, as
> they probably should be fixed in more places, and there is no issue
> pending currently.  Maybe it should be fixed in jdk10 at some point.
>
> Best regards,
>    Goetz
>
>
>> -----Original Message-----
>> From: Daniel D. Daugherty [mailto:daniel.daugherty at oracle.com]
>> Sent: Dienstag, 29. November 2016 20:04
>> To: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>; hotspot-compiler-
>> dev at openjdk.java.net; 'hotspot-runtime-dev at openjdk.java.net' <hotspot-
>> runtime-dev at openjdk.java.net>
>> Subject: Re: RFR(M): 8169373: Work around linux NPTL stack guard error.
>>
>> On 11/29/16 2:30 AM, Lindenmaier, Goetz wrote:
>>> Hi Dan,
>>>
>>> see my replies inline ...
>>> New webrev:
>>> http://cr.openjdk.java.net/~goetz/wr16/8169373-ppc-stackFix/webrev.05/
>> src/os/aix/vm/os_aix.cpp
>>       L887:   // libc guard page
>>           nit - You made other existing comments into sentences (leading
>>                 capital and trailing '.'), but not this new comment. Why?
>>
>> src/os/aix/vm/os_aix.hpp
>>       No comments.
>>
>> src/os/linux/vm/os_linux.cpp
>>       L6096: //    |                        |/  1 page glibc guard.
>>           nit - "1 page glibc guard" -> "1 glibc guard page."
>>
>> src/os/posix/vm/os_posix.cpp
>>       No comments.
>>
>> src/os_cpu/aix_ppc/vm/os_aix_ppc.cpp
>>       No comments.
>>
>> src/os_cpu/bsd_x86/vm/os_bsd_x86.cpp
>>       L875: //    |                        |/  1 page glibc guard.
>>           nit - "1 page glibc guard" -> "1 glibc guard page."
>>
>> src/os_cpu/linux_aarch64/vm/os_linux_aarch64.cpp
>>       No comments.
>>
>> src/os_cpu/linux_ppc/vm/os_linux_ppc.cpp
>>       No comments.
>>
>> src/os_cpu/linux_s390/vm/os_linux_s390.cpp
>>       No comments.
>>
>> src/os_cpu/linux_sparc/vm/os_linux_sparc.cpp
>>       No comments.
>>
>> src/os_cpu/linux_x86/vm/os_linux_x86.cpp
>>       No comments.
>>
>> src/os_cpu/linux_zero/vm/os_linux_zero.cpp
>>       No comments.
>>
>>
>> Thumbs up! I don't need to see a new webrev if you choose
>> to fix the minor comments above.
>>
>> Some replies embedded below.
>>
>>
>>>> -----Original Message-----
>>>> From: Daniel D. Daugherty [mailto:daniel.daugherty at oracle.com]
>>>> Sent: Dienstag, 29. November 2016 01:38
>>>> To: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>; hotspot-compiler-
>>>> dev at openjdk.java.net; 'hotspot-runtime-dev at openjdk.java.net' <hotspot-
>>>> runtime-dev at openjdk.java.net>
>>>> Subject: Re: RFR(M): 8169373: Work around linux NPTL stack guard error.
>>>>
>>>> On 11/28/16 2:08 AM, Lindenmaier, Goetz wrote:
>>>>> Hi,
>>>>>
>>>>> I'm working on a fix for OS guard pages on stacks.  I figured there
>>>>> are VM guard pages (reserved, yellow, red) on the compiler stacks
>>>>> _and_ OS guard pages.  For Java threads, the OS guard pages are left
>>>>> out.  I think this should be done for compiler threads, too. Please
>>>>> confirm.
>>>>> Webrev:
>>>>> http://cr.openjdk.java.net/~goetz/wr16/8169373-ppc-stackFix/webrev.04/
>>>> src/os/aix/vm/os_aix.cpp
>>>>        L888:   pthread_attr_setguardsize(&attr,
>>>> os::Aix::default_guard_size(thr_type));
>>>>            No check or assert on the return status of this call.
>>>>            Is one needed?
>>> I implemented this as the existing code on linux which has
>>> no check either.  I think a failure is quite theoretical.
>>> Because of your comment below I'll leave it as-is.
>> OK.
>>
>>
>>>>        L3044:   // guard pages, so only enable libc guard pages for
>>>> non-Java threads.
>>>>            src/os/linux/vm/os_linux.cpp also has this comment:
>>>>            // (Remember: compiler thread is a Java thread, too!)
>>> Fixed.
>>>
>>>>        L3051:   return ((thr_type == java_thread || thr_type ==
>>>> compiler_thread) ? 0 : 4*K);
>>>>            nit - please add spaces around the '*' so '4 * K'.'
>>> Fixed.
>>>
>>>> src/os/aix/vm/os_aix.hpp
>>>>        No comments.
>>>>
>>>> src/os/linux/vm/os_linux.cpp
>>>>        L729:   // is not implemented properly. The posix standard requires
>>>> to add
>>>>            Typo: 'to add' -> 'adding'
>>> Fixed.
>>>
>>>>        L738:   pthread_attr_setguardsize(&attr,
>>>> os::Linux::default_guard_size(thr_type));
>>>>            No check or assert on the return status of this call.
>>>>            Is one needed?
>>> See above.
>>>
>>>>        L2851:   // Creating guard page is very expensive. Java thread has
>>>> HotSpot
>>>>        L2852:   // guard page, only enable glibc guard page for non-Java
>>>> threads.
>>>>        L2853:   // (Remember: compiler thread is a java thread, too!)
>>>>            Typo: "java thread" -> "Java thread" (consistency)
>>>>
>>>>            This comment block should be common to all the platforms that
>>>>            define default_guard_size(). Yes, I can see that AIX needs to
>>>>            add another paragraph, but we should make the core comment
>> common
>>>>            if possible.
>>> I made the first three lines look alike.
>>>
>>>>        L6090: // Java/Compiler thread:
>>>>            Thanks for making this common in os_linux.cpp.
>>>>
>>>>        L6095: //    |    glibc guard page    | - guard, attached Java
>>>> thread usually has
>>>>            Clarity: "guard," -> "guard page,"
>>> Fixed.
>>>
>>>>            Typo: "Java thread" -> "JavaThread" (consistency)
>>> I changed both to Java thread as at the other locations.
>>>
>>>>        L6099: //    |  HotSpot Guard Pages   | - red and yellow pages
>>>>            The fairly recently added reserved page should be mentioned
>>>>            here also?
>>> Yes. Fixed.   Also fixed it to say JavaThread::stack_reserved_zone_base().
>>> Also fixed comment on bsd.
>> Thanks for also fixing BSD.
>>
>>
>>>>        L6120 // ** P1 (aka bottom) and size ( P2 = P1 - size) are the
>>>> address and stack size returned from
>>>>            Typo: "( P2 = ..." -> "(P2 = ..."
>>> Fixed.
>>>
>>>>        L6148:       fatal("Can not locate current stack attributes!");
>>>>            Typo: "Can not" -> "Cannot"
>>> Fixed.
>>>
>>>>        L6175:   // stack size includes normal stack and HotSpot guard pages
>>>>            Perhaps add to the comment:
>>>>            "for the threads that have HotSpot guard pages."
>>> Fixed.  I also checked my comments for "OS guard pages" and
>>> replaced it by "glibc guard pages" which is used in several places
>>> already, same for "VM guard page" --> "HotSpot guard page". I
>>> think this is also more consistent.
>> I agree!
>>
>>
>>>> src/os/posix/vm/os_posix.cpp
>>>>        L1097:   pthread_attr_getstacksize(attr, &stack_size);
>>>>        L1098:   pthread_attr_getguardsize(attr, &guard_size);
>>>>            Do these two calls need to have their result checked?
>>>>
>>>>            Now I'm starting to wonder if all the uses of these
>>>>            two APIs need to be checked? Separate bug?
>>> It would be more consistent with the specification of the methods,
>>> On the other side it's quite unlikely that these fail if attr != NULL.
>> So should we file a new bug? Or does this fall into the realm of
>> other OS/libc code that we call and assume never fails? :-)
>>
>>
>>>> src/os_cpu/aix_ppc/vm/os_aix_ppc.cpp
>>>>        L540: size_t os::Posix::_compiler_thread_min_stack_allowed = 512 * K;
>>>>        L541: size_t os::Posix::_java_thread_min_stack_allowed = 512 * K;
>>>>            So prior to the fix for 8140520, src/os/aix/vm/os_aix.cpp had
>>>>            this single min_stack_allowed value:
>>>>
>>>>            L3601:   os::Aix::min_stack_allowed =
>>>> MAX2(os::Aix::min_stack_allowed,
>>>>            L3602: JavaThread::stack_guard_zone_size() +
>>>>            L3603: JavaThread::stack_shadow_zone_size() +
>>>>            L3604:                                     (4*BytesPerWord
>>>> COMPILER2_PRESENT(+2)) * 4 * K);
>>>>
>>>>            and the fix for 8140520 changed that for *NIX platforms to
>>>>            three mins in src/os/posix/vm/os_posix.cpp:
>>>>
>>>>            L1108:   _java_thread_min_stack_allowed =
>>>> MAX2(_java_thread_min_stack_allowed,
>>>>            L1109: JavaThread::stack_guard_zone_size() +
>>>>            L1110: JavaThread::stack_shadow_zone_size() +
>>>>            L1111:                                         (4 *
>>>> BytesPerWord COMPILER2_PRESENT(+ 2)) * 4 * K);
>>>>
>>>>            L1150:   _compiler_thread_min_stack_allowed =
>>>> align_size_up(_compiler_thread_min_stack_allowed, vm_page_size());
>>>>
>>>>            L1161   _vm_internal_thread_min_stack_allowed =
>>>> align_size_up(_vm_internal_thread_min_stack_allowed, vm_page_size());
>>>>
>>>>            Which means that the compiler_thread no longer benefits from
>>>>            the extra space for quard and shadow pages. The thinking in
>>>>            8140520 was that the compiler_thread and vm_internal_threads
>>>>            don't need the quard and shadow pages since they don't run
>>>>            Java code (ignoring JVMCI for now).
>>>>
>>>>            So I can see bumping _compiler_thread_min_stack_allowed from
>>>>            128 -> 512 as one solution for getting that extra space back.
>>>>            However, I don't understand why _java_thread_min_stack_allowed
>>>>            has changed from 128 -> 512.
>>> Because it was never correct before.
>> OK. That sounds like the new test that I included with 8140520 would
>> have failed with JavaThread stack sizes even before the product code
>> changes from 8140520 were made.
>>
>> Since the size calculation for JavaThread stack sizes wasn't changed
>> for any platform in 8140520, that tends to indicate that the more
>> limited JDK test (test/tools/launcher/TooSmallStackSize.java) should
>> also have failed before the fix for 8140520.
>>
>> Please clarify the need for the _java_thread_min_stack_allowed change
>> from 128 -> 512. Unless test/tools/launcher/TooSmallStackSize.java
>> is never run in your testing, I'm having troubling seeing why the
>> _java_thread_min_stack_allowed increase is needed.
>>
>>
>>>>            I had previously made this comment:
>>>>            > To put it another way, I'd like to see us add extra space to
>>>>            > solve the 64K page issue directly instead of as a side effect
>>>>            > of the red/yellow page addition.
>>>>            And Goetz replied with:
>>>>            > I don't understand.  What do you mean by 'directly'?
>>>>
>>>>            So prior to the fix for 8140520, src/os/solaris/vm/os_solaris.cpp
>>>>            had a block like this:
>>>>
>>>>            L4468:   // For 64kbps there will be a 64kb page size, which makes
>>>>            L4469:   // the usable default stack size quite a bit less.
>>>> Increase the
>>>>            L4470:   // stack for 64kb (or any > than 8kb) pages, this
>>>> increases
>>>>            L4471:   // virtual memory fragmentation (since we're not
>>>> creating the
>>>>            L4472   // stack on a power of 2 boundary.  The real fix for this
>>>>            L4473   // should be to fix the guard page mechanism.
>>>>            L4474
>>>>            L4475   if (vm_page_size() > 8*K) {
>>>>            L4476     threadStackSizeInBytes = (threadStackSizeInBytes != 0)
>>>>            L4477        ? threadStackSizeInBytes +
>>>>            L4478          JavaThread::stack_red_zone_size() +
>>>>            L4479          JavaThread::stack_yellow_zone_size()
>>>>            L4480        : 0;
>>>>            L4481     ThreadStackSize = threadStackSizeInBytes/K;
>>>>            L4482   }
>>>>
>>>>            The above is an example of what I mean by solving the 64K
>>>>            page issue directly. In the fix for 8140520, that code block
>>>>            was moved to os::Posix::set_minimum_stack_sizes() in
>>>>            src/os/posix/vm/os_posix.cpp and put in a "#ifdef SOLARIS...
>>>>            #endif // SOLARIS" block. Coleen filed a bug to determine
>>>>            whether that code can be deleted:
>>>>
>>>>            JDK-8161093 Solaris for >8k pagesize adds extra guard pages
>>>>            https://bugs.openjdk.java.net/browse/JDK-8161093
>>>>
>>>>            but perhaps this bug shows that the code is needed?
>>>>
>>>>
>>>>            OK so this is probably the longest code review comment
>>>>            I have ever written, but the summary is:
>>>>
>>>>            - I understand bumping _compiler_thread_min_stack_allowed,
>>>>              but should it be solved in a different way?
>>>>            - I don't understand bumping _java_thread_min_stack_allowed
>>> I plan to do a follow up change to fix this.  Let's leave this discussion
>>> to that review.  Here I just want to fix the NPTL issue and the basic
>>> sizing that is needed to pass the new test on ppc/s390.
>> Same question here about the simpler JDK version of the test.
>>
>> Does test/tools/launcher/TooSmallStackSize.java get run in
>> your test environments?
>>
>>
>>>> src/os_cpu/linux_aarch64/vm/os_linux_aarch64.cpp
>>>>        No comments.
>>>>
>>>> src/os_cpu/linux_ppc/vm/os_linux_ppc.cpp
>>>>        L538: size_t os::Posix::_compiler_thread_min_stack_allowed = 384 * K;
>>>>        L539: size_t os::Posix::_java_thread_min_stack_allowed = 384 * K;
>>>>
>>>>            Same monster comment as src/os_cpu/aix_ppc/vm/os_aix_ppc.cpp
>>>>            and the same summary:
>>>>
>>>>            - I understand bumping _compiler_thread_min_stack_allowed,
>>>>              but should it be solved in a different way?
>>>>            - I don't understand bumping _java_thread_min_stack_allowed
>>>>
>>>> src/os_cpu/linux_s390/vm/os_linux_s390.cpp
>>>>        L478: size_t os::Posix::_compiler_thread_min_stack_allowed = 128 * K;
>>>>        L479: size_t os::Posix::_java_thread_min_stack_allowed = 236 * K;
>>>>            Bumping _java_thread_min_stack_allowed but not bumping
>>>>            _compiler_thread_min_stack_allowed. I'm confused here.
>>> The numbers are what I need to startup on the machines.  128 is just
>>> fine on the machines we have.  (That's the problem of the
>>> current setup: you have to tune this compile time constant for the
>>> page size of the machine you are running on. But let's discuss this
>>> in the follow up change.)
>> OK about discussing this with a follow-up change. I guess I see
>> the compile time initialization as a "minimum setting assuming the
>> smallest page size". If we discover (at runtime) that the page
>> size is bigger, then we adjust the minimum that we need...
>>
>> Again, defer to another bug. Do we have a bug ID yet?
>>
>>
>>>> src/os_cpu/linux_sparc/vm/os_linux_sparc.cpp
>>>>        No comments.
>>>>
>>>> src/os_cpu/linux_x86/vm/os_linux_x86.cpp
>>>>        No comments.
>>>>
>>>> src/os_cpu/linux_zero/vm/os_linux_zero.cpp
>>>>        No comments.
>>>>
>>>> Sorry it took me so long to write this up...
>>> No matter, thanks for this thorough review!
>> You are very welcome. Thanks for being willing to dive into such
>> a complicated area (thread stack sizes)...
>>
>> Dan
>>
>>
>>
>>> Best regards,
>>>     Goetz.
>>>
>>>> Dan
>>>>
>>>>> The change affecting the compier threads is in os_linux.cpp,
>>>> default_guard_size(),
>>>>> where '|| thr_type == compiler_thread' has been added. The function
>>>>> was also moved from the os_cpu files, as it's identical on all cpus.
>>>>>
>>>>> Best regards,
>>>>>      Goetz.
>>>>>
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: David Holmes [mailto:david.holmes at oracle.com]
>>>>>> Sent: Montag, 28. November 2016 00:25
>>>>>> To: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>;
>>>>>> 'daniel.daugherty at oracle.com' <daniel.daugherty at oracle.com>;
>> 'hotspot-
>>>>>> runtime-dev at openjdk.java.net' <hotspot-runtime-dev at openjdk.java.net>
>>>>>> Subject: Re: RFR(M): 8169373: Work around linux NPTL stack guard error.
>>>>>>
>>>>>> Hi Goetz,
>>>>>>
>>>>>> On 24/11/2016 10:15 PM, Lindenmaier, Goetz wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> I now edited the stuff I had proposed below:
>>>>>>> http://cr.openjdk.java.net/~goetz/wr16/8169373-ppc-
>> stackFix/webrev.03/
>>>>>>> This includes
>>>>>>>      - the NPTL fix from webrev.02
>>>>>> Okay in principle. As discussed this only impacts non-JavaThreads so the
>>>>>> change should be minimal.
>>>>>>
>>>>>>>      - merging code on linux
>>>>>> Went a bit further than I had expected but if this truly isn't CPU
>>>>>> dependent code then great!
>>>>>>
>>>>>>>      - not adding OS guard to compiler threads.
>>>>>> Okay in principle. IIUC we will now save the OS guard page for compiler
>>>>>> thread stacks. Is that the only impact? The hotspot-compiler-dev folk
>>>>>> may want to sign off on this part.
>>>>>>
>>>>>>
>>>>>> A few minor comments:
>>>>>>
>>>>>> src/os/linux/vm/os_linux.cpp
>>>>>>
>>>>>> 2854   return ((thr_type == java_thread || thr_type ==
>>>>>> os::compiler_thread) ...
>>>>>>
>>>>>> os:: should be used for both types or none.
>>>>>>
>>>>>> 6153     pthread_attr_getguardsize(&attr, &guard_size);
>>>>>>
>>>>>> Can you at least verify a zero return code in an assert/assert_status
>>>>>> please.
>>>>>>
>>>>>> ---
>>>>>>
>>>>>> src/os_cpu/aix_ppc/vm/os_aix_ppc.cpp
>>>>>> src/os_cpu/linux_ppc/vm/os_linux_ppc.cpp
>>>>>> src/os_cpu/linux_s390/vm/os_linux_s390.cpp
>>>>>>
>>>>>> Are the changes to min_stack_allowed just fixing an existing bug?
>>>>>>
>>>>>> ---
>>>>>>
>>>>>> Thanks,
>>>>>> David
>>>>>> -----
>>>>>>
>>>>>>
>>>>>>> I think this should be pushed for this bug ID. For the other changes I'll
>>>>>>> make another bug.
>>>>>>>
>>>>>>> Best regards,
>>>>>>>      Goetz.
>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: Lindenmaier, Goetz
>>>>>>>> Sent: Wednesday, November 23, 2016 8:11 AM
>>>>>>>> To: David Holmes <david.holmes at oracle.com>;
>>>>>>>> daniel.daugherty at oracle.com; hotspot-runtime-dev at openjdk.java.net
>>>>>>>> Subject: RE: RFR(M): 8169373: Work around linux NPTL stack guard
>> error.
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>>> Bzzzt! Sorry Goetz and Dan but that is no longer correct with JVMCI.
>> The
>>>>>>>>> ability for a CompilerThread to execute Java code (can_call_java()) is
>>>>>>>>> now a dynamic property depending on whether the current compiler
>> is
>>>> the
>>>>>>>>> JVMCI compiler.
>>>>>>>> Ah, then I should also leave space for shadow pages in the minimal
>> stack
>>>>>> size
>>>>>>>> of comiler threads.
>>>>>>>>
>>>>>>>> Do we agree on the cleanup and on leaving out the OS guard page on
>>>>>>>> compiler threads?
>>>>>>>> Then I would edit a change comprising the NPTL workaround and these
>>>>>>>> additional changes, and split the other issue into a new bug?  I think this
>>>>>>>> will easy the reviewing.
>>>>>>>>
>>>>>>>> Best regards,
>>>>>>>>      Goetz.
>>>>>>>>
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: David Holmes [mailto:david.holmes at oracle.com]
>>>>>>>>> Sent: Mittwoch, 23. November 2016 02:50
>>>>>>>>> To: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>;
>>>>>>>>> daniel.daugherty at oracle.com; hotspot-runtime-dev at openjdk.java.net
>>>>>>>>> Subject: Re: RFR(M): 8169373: Work around linux NPTL stack guard
>>>> error.
>>>>>>>>> On 22/11/2016 11:19 PM, Lindenmaier, Goetz wrote:
>>>>>>>>>> Hi Dan,
>>>>>>>>>>
>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>> From: Daniel D. Daugherty [mailto:daniel.daugherty at oracle.com]
>>>>>>>>>>> Sent: Dienstag, 22. November 2016 14:01
>>>>>>>>>>> To: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>; David
>> Holmes
>>>>>>>>>>> <david.holmes at oracle.com>; hotspot-runtime-
>> dev at openjdk.java.net
>>>>>>>>>>> Subject: Re: RFR(M): 8169373: Work around linux NPTL stack guard
>>>>>>>> error.
>>>>>>>>>>> On 11/22/16 3:55 AM, Lindenmaier, Goetz wrote:
>>>>>>>>>>>> Hi Dan,
>>>>>>>>>>>>
>>>>>>>>>>>> I ran into a row of issues, some errors on the platforms.
>>>>>>>>>>>>
>>>>>>>>>>>> What I meant with that comment is that
>>>>>>>>>>>> os::Linux::min_stack_allowed =
>> MAX2(os::Linux::min_stack_allowed,
>>>>>>>>>>>>                                           JavaThread::stack_guard_zone_size() +
>>>>>>>>>>>>                                           JavaThread::stack_shadow_zone_size() +
>>>>>>>>>>>>                                           (4*BytesPerWord
>>>> COMPILER2_PRESENT(+2)) *
>>>>>> 4
>>>>>>>> *
>>>>>>>>> K);
>>>>>>>>>>>> was executed, and min_stack_allowed used for all stacks. Now,
>>>>>>>> compiler
>>>>>>>>>>> and
>>>>>>>>>>>> vm minimum stack sizes are not increased by these sizes.
>>>>>>>>>>> Now I see what you mean. Thanks for clearing this up.
>>>>>>>>>>>
>>>>>>>>>>> I should have remembered that part of the change because we
>> went
>>>>>>>> back
>>>>>>>>>>> and forth about removing the red/yellow zone pages from the other
>>>>>>>>>>> threads. In particular, we discussed the compiler thread because it
>>>>>>>>>>> is-a JavaThread. Our conclusion was that a compiler thread doesn't
>>>>>>>>>>> execute Java bytecode so we could remove the red/yellow pages...
>>>>>>>>>> Yes, it does not execute java byte code.
>>>>>>>>> Bzzzt! Sorry Goetz and Dan but that is no longer correct with JVMCI.
>> The
>>>>>>>>> ability for a CompilerThread to execute Java code (can_call_java()) is
>>>>>>>>> now a dynamic property depending on whether the current compiler
>> is
>>>> the
>>>>>>>>> JVMCI compiler.
>>>>>>>>>
>>>>>>>>> David
>>>>>>>>> -----
>>>>>>>>>
>>>>>>>>>> Therefore you can remove the shadow pages.  There is no code that
>>>>>>>>>> will bang.
>>>>>>>>>> But red/yellow pages are protected right during thread startup.
>>>>>>>>>> Therefore you must have enough space for them.
>>>>>>>>>> On ppc, we try to protect three 64K pages out of the 128K compiler
>>>> stack.
>>>>>>>>>> That obviously fails.
>>>>>>>>>>
>>>>>>>>>> Therefore I propose:
>>>>>>>>>>         size_t os::Posix::_java_thread_min_stack_allowed = 48 * K;   //
>> Set
>>>>>>>>> platform dependent.
>>>>>>>>>> in  os::Posix::set_minimum_stack_sizes():
>>>>>>>>>>      _java_thread_min_stack_allowed =
>> _java_thread_min_stack_allowed
>>>> +
>>>>>>>>>>                                       JavaThread::stack_guard_zone_size() +
>>>>>>>>>>                                       JavaThread::stack_shadow_zone_size();
>>>>>>>>>>
>>>>>>>>>> (Similar for _compiler_thread_min_stack_allowed).
>>>>>>>>>>
>>>>>>>>>> The minimal stack size is made up of three components:
>>>>>>>>>>       1. Sizes of interpreter/C1/C2 frames. Depends on HotSpot
>>>>>>>>> implementation/platform/os.
>>>>>>>>>>       2. Sizes of C++ frames: depends on C++ compiler.
>>>>>>>>>> These are fixed at compile time.
>>>>>>>>>>       3. Sizes of red/yellow/reserved/shadow pages. Depends on the
>>>> system
>>>>>>>>> the
>>>>>>>>>>           VM is used on.  This is not fixed at compile time. (Our ppc
>>>> machines
>>>>>>>>> differ
>>>>>>>>>>          in page size.)
>>>>>>>>>> Therefore 3. should not be included in a compile time constant.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> And that decision allowed us to be exposed to the 64K page issue
>>>>>>>>>>> because the "extra" space isn't there anymore.
>>>>>>>>>>>
>>>>>>>>>>>> At least the _compiler_thread_min_stack_allowed should be
>>>> increased
>>>>>>>>>>>> similarly by red/yellow zone size.  The compiler thread is a Java
>>>>>>>>>>>> thread and must have space for these zones.
>>>>>>>>>>> I'm not sure that I completely agree (yet). To me, the red/yellow
>>>>>>>>>>> pages are there for Java thread stack overflow semantics. Yes, the
>>>>>>>>>>> compiler thread needs extra space when 64K pages are used, but I
>>>>>>>>>>> would prefer that we add that space via a different calculation.
>>>>>>>>>> Yes they are. But compiler threads happen tob e a subclass of
>>>>>>>>>> Java threads and use the same run() method that puts the pages
>>>>>>>>>> There.
>>>>>>>>>> I agree that they are not needed for Compiler threads, nor for
>>>>>>>>>> CodeCacheSweeperThreads. I don't really now about
>>>>>>>>>>     JvmtiAgentThreads and ServiceThreads, but all of the get the guard
>>>>>>>> pages
>>>>>>>>>> because they are derived from JavaThread.
>>>>>>>>>>
>>>>>>>>>>> To put it another way, I'd like to see us add extra space to solve
>>>>>>>>>>> the 64K page issue directly instead of as a side effect of the
>>>>>>>>>>> red/yellow page addition.
>>>>>>>>>> I don't understand.  What do you mean by 'directly'?
>>>>>>>>>>
>>>>>>>>>>>> Also, the change added a test that is failing now.
>>>>>>>>>>> And that's a "good thing" (TM), right? :-)
>>>>>>>>>> Yes, it showed a bug and thus raised the need to fix it! :)
>>>>>>>>>>
>>>>>>>>>> Best regards,
>>>>>>>>>>      Goetz.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> Again, thanks for clarifying 8140520's role in this issue.
>>>>>>>>>>>
>>>>>>>>>>> Dan
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>       Goetz.
>>>>>>>>>>>>
>>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>>> From: Daniel D. Daugherty
>> [mailto:daniel.daugherty at oracle.com]
>>>>>>>>>>>>> Sent: Montag, 21. November 2016 17:28
>>>>>>>>>>>>> To: David Holmes <david.holmes at oracle.com>; Lindenmaier,
>> Goetz
>>>>>>>>>>>>> <goetz.lindenmaier at sap.com>; hotspot-runtime-
>>>>>>>>> dev at openjdk.java.net
>>>>>>>>>>>>> Subject: Re: RFR(M): 8169373: Work around linux NPTL stack
>> guard
>>>>>>>>> error.
>>>>>>>>>>>>> Sorry for the delayed responses to this thread. I've been on
>>>> vacation...
>>>>>>>>>>>>> One comment/query embedded below...
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 11/10/16 8:40 PM, David Holmes wrote:
>>>>>>>>>>>>>> Hi Goetz,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 11/11/2016 8:00 AM, Lindenmaier, Goetz wrote:
>>>>>>>>>>>>>>> Hi David,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> This issue is different to 6675312, see also my comment in the
>>>> bug.
>>>>>>>>>>>>>>> It appears running jtreg test
>>>>>>>>> runtime/Thread/TooSmallStackSize.java,
>>>>>>>>>>>>>>> with my patch below you can reproduce it on linuxx86_64. You
>>>> can
>>>>>>>>> not
>>>>>>>>>>>>>>> do that with 6675312. Also, I would assume there are systems
>> out
>>>>>>>>> there
>>>>>>>>>>>>>>> on x86 that uses 64-K pages, did you run the tests on these? I
>>>>>>>> would
>>>>>>>>>>>>>>> assume you get hard crashes with stack overflows in the first
>> C2
>>>>>>>>>>>>>>> compilation if there is only 64K usable CompilerThreadStack.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> My fix does not affect Java threads, which are the largest
>> amount
>>>>>>>>>>>>>>> of threads used by the VM. It affects only the non-Java threads.
>>>>>>>>>>>>>>> It adds one page to these threads. The page does not require
>>>>>>>>> memory,
>>>>>>>>>>>>>>> as it's protected. The stack will only require more space if the
>>>>>>>> thread
>>>>>>>>>>>>>>> ran into a stack overflow before the fix as else the pages are
>> not
>>>>>>>>>>>>>>> mapped.
>>>>>>>>>>>>>>> This are stack overflows that cause hard crashes, at least on
>> ppc
>>>>>>>> the
>>>>>>>>> VM
>>>>>>>>>>>>>>> does not properly catch these stack overflows, so any setup
>>>>>>>> working
>>>>>>>>>>> now
>>>>>>>>>>>>>>> will not run into the additional space. Altogether there should
>> be
>>>>>>>> no
>>>>>>>>>>>>>>> effect on running systems besides requiring one more entry in
>>>> the
>>>>>>>>>>>>>>> page table per non-Java thread.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The problem is caused by a rather recent change (8140520:
>>>>>>>> segfault
>>>>>>>>> on
>>>>>>>>>>>>>>> solaris-amd64
>>>>>>>>>>>>>>> with "-XX:VMThreadStackSize=1" option) which was pushed
>> after
>>>>>>>>>>>>>>> feature-close. As this was a rather recent change, it must be
>>>>>>>>>>>>>>> possible to
>>>>>>>>>>>>>>> fix this follow up issue. What else is this period in the project
>>>> good
>>>>>>>>>>>>>>> for if not fixing issues?
>>>>>>>>>>>>>> So I am seeing a number of factors here.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> First, 8140520, set:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> size_t os::Posix::_compiler_thread_min_stack_allowed = 128 *
>> K;
>>>>>>>>>>>>> So I'm confused by the above comment:
>>>>>>>>>>>>>
>>>>>>>>>>>>>      > The problem is caused by a rather recent change (8140520:
>>>>>>>> segfault
>>>>>>>>>>>>>      > on solaris-amd64 with "-XX:VMThreadStackSize=1" option)
>>>>>>>>>>>>>
>>>>>>>>>>>>> http://cr.openjdk.java.net/~dcubed/8140520-webrev/5-jdk9-hs-
>>>>>>>>>>>>>
>>>>>>>>> open/hotspot/src/os_cpu/linux_ppc/vm/os_linux_ppc.cpp.frames.html
>>>>>>>>>>>>> shows this change:
>>>>>>>>>>>>>
>>>>>>>>>>>>> @@ -531,19 +531,17 @@
>>>>>>>>>>>>>       }
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>> //////////////////////////////////////////////////////////////////////////////
>>>>>>>>>>>>> //
>>>>>>>>>>>>>       // thread stack
>>>>>>>>>>>>>
>>>>>>>>>>>>> -size_t os::Linux::min_stack_allowed = 128*K;
>>>>>>>>>>>>> +size_t os::Posix::_compiler_thread_min_stack_allowed = 128 *
>> K;
>>>>>>>>>>>>> +size_t os::Posix::_java_thread_min_stack_allowed = 128 * K;
>>>>>>>>>>>>> +size_t os::Posix::_vm_internal_thread_min_stack_allowed = 128
>> *
>>>> K;
>>>>>>>>>>>>> so the existing single variable of 'min_stack_allowed' was
>>>>>>>>>>>>> replaced by three variables:
>> _compiler_thread_min_stack_allowed,
>>>>>>>>>>>>> _java_thread_min_stack_allowed, and
>>>>>>>>>>>>> _vm_internal_thread_min_stack_allowed.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The old single variable and the three new variables are all
>>>>>>>>>>>>> initialized to the same value (128K) so the fix for 8140520
>>>>>>>>>>>>> did not change stack sizes for this platform. In fact, only
>>>>>>>>>>>>> one platform had a size change (Solaris X64).
>>>>>>>>>>>>>
>>>>>>>>>>>>> So I'm confused about how the fix for 8140520 caused this
>> problem.
>>>>>>>>>>>>> Based on David's analysis below, it looks to me like this 64K stack
>>>>>>>>>>>>> guard page problem should also exist prior to the fix for
>> 8140520.
>>>>>>>>>>>>> Goetz, can you please explain how 8140520 caused this problem?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Dan
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Second on linux PPC it is hardwired to use 2 guard pages:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>      return 2 * page_size();
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Third, you had a pagesize of 64K.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Fourth, NPTL takes the guard space from the stack space -
>> hence
>>>>>>>> with
>>>>>>>>> 2
>>>>>>>>>>>>>> x 64K guard, and a 128K stack it was all consumed.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> ---
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> In the proposed changes you now only use page_size() for the
>>>> guard,
>>>>>>>>> so
>>>>>>>>>>>>>> that alone would have fixed the observed problem.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> But in addition you want to address the NPTL problem by adding
>>>>>>>> back
>>>>>>>>>>>>>> the guard space to the stack size requested. That alone would
>> also
>>>>>>>>>>>>>> have fixed the observed problem. :)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> But in addition you have increased the minimum stack size:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> ! size_t os::Posix::_compiler_thread_min_stack_allowed = 192 *
>> K;
>>>>>>>>>>>>>> which again, on its own would have fixed the original problem.
>> :)
>>>>>>>>>>>>>> Did you really intend to increase the real minimum stack from
>>>> 128K
>>>>>>>> to
>>>>>>>>>>>>>> 256K ? (on a 64K page system)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> ---
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Focusing simply on the shared code change to adjust the
>> requested
>>>>>>>>>>>>>> stacksize by the amount of guard space (if any), this does not
>> seem
>>>>>>>>>>>>>> unreasonable. As you note it is restricted to non-JavaThreads
>> and
>>>>>>>> only
>>>>>>>>>>>>>> adds a page to reserved stack space.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> My only query now is whether the minimum stacksize detection
>>>> logic
>>>>>>>>>>>>>> will correctly report the real minimum stack size (taking into
>>>> account
>>>>>>>>>>>>>> the need for the guard page) ?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> David
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> So I really think this issue should be fixed.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>>>>       Goetz.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>>>>>> From: David Holmes [mailto:david.holmes at oracle.com]
>>>>>>>>>>>>>>>> Sent: Thursday, November 10, 2016 10:02 PM
>>>>>>>>>>>>>>>> To: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>;
>> hotspot-
>>>>>>>>>>>>> runtime-
>>>>>>>>>>>>>>>> dev at openjdk.java.net
>>>>>>>>>>>>>>>> Subject: Re: RFR(M): 8169373: Work around linux NPTL stack
>>>>>>>> guard
>>>>>>>>>>> error.
>>>>>>>>>>>>>>>> Hi Goetz,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> As per the bug report, this issue was already known
>> (6675312)
>>>> and
>>>>>>>>> we
>>>>>>>>>>>>>>>> chose not to try and address it due to no reported issues at
>> the
>>>>>>>>> time.
>>>>>>>>>>>>>>>> While I see that you have encountered an issue (is it real or
>>>>>>>>>>>>>>>> fabricated?) I think this change is too intrusive to be applied
>> at
>>>> this
>>>>>>>>>>>>>>>> stage of the JDK 9 release cycle, as it will change the stack
>>>>>>>>>>>>>>>> requirements of every application running on Linux.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>> David
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On 11/11/2016 1:58 AM, Lindenmaier, Goetz wrote:
>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Please review this change. I please need a sponsor:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> http://cr.openjdk.java.net/~goetz/wr16/8169373-ppc-
>>>>>>>>>>>>> stackFix/webrev.01/
>>>>>>>>>>>>>>>>> In the Linux NPTL pthread implementation the guard size
>>>>>>>>> mechanism
>>>>>>>>>>>>>>>>> is not
>>>>>>>>>>>>>>>>> implemented properly. The posix standard requires to add
>> the
>>>>>>>>> size
>>>>>>>>>>>>>>>>> of the
>>>>>>>>>>>>>>>>> guard pages to the stack size, instead Linux takes the space
>> out
>>>>>>>> of
>>>>>>>>>>>>>>>>> 'stacksize'.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> The Posix standard
>>>>>>>>>>>>> http://pubs.opengroup.org/onlinepubs/9699919799/
>>>>>>>>>>>>>>>>>      says "the implementation allocates extra memory at the
>>>>>>>>> overflow
>>>>>>>>>>>>>>>>> end of
>>>>>>>>>>>>>>>>> the stack". The linux man page
>>>>>>>>>>>>>>>>> https://linux.die.net/man/3/pthread_attr_setguardsize says
>> "As
>>>>>>>>> at
>>>>>>>>>>>>>>>>> glibc
>>>>>>>>>>>>>>>>> 2.8, the NPTL threading implementation includes the guard
>>>> area
>>>>>>>>>>> within
>>>>>>>>>>>>>>>>> the stack size allocation, rather than allocating extra space
>> at
>>>>>>>>>>>>>>>>> the end
>>>>>>>>>>>>>>>>> of the stack, as POSIX.1 requires".
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I encounter this problem in
>>>>>>>>> runtime/Thread/TooSmallStackSize.java
>>>>>>>>>>>>>>>>> on ppc
>>>>>>>>>>>>>>>>> with 64K pages. _compiler_thread_min_stack_allowed is
>> 128K
>>>>>>>> on
>>>>>>>>>>> ppc,
>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>> ppc specifies two OS guard pages. The VM crashes in
>> pthread
>>>>>>>>>>> creation
>>>>>>>>>>>>>>>>> because there is no usable space in the thread stack after
>>>>>>>>> allocating
>>>>>>>>>>>>>>>>> the guard pages.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> But TooSmallStackSize.java requires that the VM comes up
>>>> with
>>>>>>>>> the
>>>>>>>>>>>>>>>>> stack
>>>>>>>>>>>>>>>>> size mentioned in the error message.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> This fix adapts the requested stack size on Linux by the size
>> of
>>>>>>>> the
>>>>>>>>>>>>>>>>> guard pages to mimick proper behaviour, see change to
>>>>>>>>>>> os_linux.cpp.
>>>>>>>>>>>>>>>>> The change also streamlines usage of stack_guard_page on
>>>>>>>>> linuxppc,
>>>>>>>>>>>>>>>>> linuxppcle, aixppc and linuxs390.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> To reproduce the error on linux_x86_64, apply below patch
>>>> and
>>>>>>>>> call
>>>>>>>>>>> the
>>>>>>>>>>>>>>>>> VM with -XX:CompilerThreadStackSize=64.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I'm still exploring why I had to choose such big compiler
>> stacks
>>>>>>>> on
>>>>>>>>>>>>>>>>> ppc
>>>>>>>>>>>>>>>>> to get -version passing, but I wanted to send the RFR now as
>>>>>>>>> people
>>>>>>>>>>>>>>>>> obviously looked at the bug I opened (Thanks David!).
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>       Goetz.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> diff -r b7ae012c55c3
>>>> src/os_cpu/linux_x86/vm/os_linux_x86.cpp
>>>>>>>>>>>>>>>>> --- a/src/os_cpu/linux_x86/vm/os_linux_x86.cpp  Mon Nov
>> 07
>>>>>>>>>>> 12:37:28
>>>>>>>>>>>>>>>> 2016
>>>>>>>>>>>>>>>>> +0100
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> +++ b/src/os_cpu/linux_x86/vm/os_linux_x86.cpp  Thu Nov
>> 10
>>>>>>>>>>>>> 16:52:17
>>>>>>>>>>>>>>>> 2016
>>>>>>>>>>>>>>>>> +0100
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> @@ -701,7 +701,7 @@
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> size_t os::Linux::default_guard_size(os::ThreadType
>> thr_type) {
>>>>>>>>>>>>>>>>>        // Creating guard page is very expensive. Java thread
>> has
>>>>>>>>> HotSpot
>>>>>>>>>>>>>>>>>        // guard page, only enable glibc guard page for non-Java
>>>>>>>>> threads.
>>>>>>>>>>>>>>>>> -  return (thr_type == java_thread ? 0 : page_size());
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> +  return (thr_type == java_thread ? 0 : 64*K);
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> // Java thread:
>>>>>>>>>>>>>>>>>



More information about the hotspot-runtime-dev mailing list