RFR(M): 8169373: Work around linux NPTL stack guard error.
David Holmes
david.holmes at oracle.com
Sun Nov 27 23:24:43 UTC 2016
Hi Goetz,
On 24/11/2016 10:15 PM, Lindenmaier, Goetz wrote:
> Hi,
>
> I now edited the stuff I had proposed below:
> http://cr.openjdk.java.net/~goetz/wr16/8169373-ppc-stackFix/webrev.03/
> This includes
> - the NPTL fix from webrev.02
Okay in principle. As discussed this only impacts non-JavaThreads so the
change should be minimal.
> - merging code on linux
Went a bit further than I had expected but if this truly isn't CPU
dependent code then great!
> - not adding OS guard to compiler threads.
Okay in principle. IIUC we will now save the OS guard page for compiler
thread stacks. Is that the only impact? The hotspot-compiler-dev folk
may want to sign off on this part.
A few minor comments:
src/os/linux/vm/os_linux.cpp
2854 return ((thr_type == java_thread || thr_type ==
os::compiler_thread) ...
os:: should be used for both types or none.
6153 pthread_attr_getguardsize(&attr, &guard_size);
Can you at least verify a zero return code in an assert/assert_status
please.
---
src/os_cpu/aix_ppc/vm/os_aix_ppc.cpp
src/os_cpu/linux_ppc/vm/os_linux_ppc.cpp
src/os_cpu/linux_s390/vm/os_linux_s390.cpp
Are the changes to min_stack_allowed just fixing an existing bug?
---
Thanks,
David
-----
> I think this should be pushed for this bug ID. For the other changes I'll
> make another bug.
>
> Best regards,
> Goetz.
>
>> -----Original Message-----
>> From: Lindenmaier, Goetz
>> Sent: Wednesday, November 23, 2016 8:11 AM
>> To: David Holmes <david.holmes at oracle.com>;
>> daniel.daugherty at oracle.com; hotspot-runtime-dev at openjdk.java.net
>> Subject: RE: RFR(M): 8169373: Work around linux NPTL stack guard error.
>>
>> Hi,
>>
>>> Bzzzt! Sorry Goetz and Dan but that is no longer correct with JVMCI. The
>>> ability for a CompilerThread to execute Java code (can_call_java()) is
>>> now a dynamic property depending on whether the current compiler is the
>>> JVMCI compiler.
>> Ah, then I should also leave space for shadow pages in the minimal stack size
>> of comiler threads.
>>
>> Do we agree on the cleanup and on leaving out the OS guard page on
>> compiler threads?
>> Then I would edit a change comprising the NPTL workaround and these
>> additional changes, and split the other issue into a new bug? I think this
>> will easy the reviewing.
>>
>> Best regards,
>> Goetz.
>>
>>> -----Original Message-----
>>> From: David Holmes [mailto:david.holmes at oracle.com]
>>> Sent: Mittwoch, 23. November 2016 02:50
>>> To: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>;
>>> daniel.daugherty at oracle.com; hotspot-runtime-dev at openjdk.java.net
>>> Subject: Re: RFR(M): 8169373: Work around linux NPTL stack guard error.
>>>
>>> On 22/11/2016 11:19 PM, Lindenmaier, Goetz wrote:
>>>> Hi Dan,
>>>>
>>>>> -----Original Message-----
>>>>> From: Daniel D. Daugherty [mailto:daniel.daugherty at oracle.com]
>>>>> Sent: Dienstag, 22. November 2016 14:01
>>>>> To: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>; David Holmes
>>>>> <david.holmes at oracle.com>; hotspot-runtime-dev at openjdk.java.net
>>>>> Subject: Re: RFR(M): 8169373: Work around linux NPTL stack guard
>> error.
>>>>>
>>>>> On 11/22/16 3:55 AM, Lindenmaier, Goetz wrote:
>>>>>> Hi Dan,
>>>>>>
>>>>>> I ran into a row of issues, some errors on the platforms.
>>>>>>
>>>>>> What I meant with that comment is that
>>>>>> os::Linux::min_stack_allowed = MAX2(os::Linux::min_stack_allowed,
>>>>>> JavaThread::stack_guard_zone_size() +
>>>>>> JavaThread::stack_shadow_zone_size() +
>>>>>> (4*BytesPerWord COMPILER2_PRESENT(+2)) * 4
>> *
>>> K);
>>>>>> was executed, and min_stack_allowed used for all stacks. Now,
>> compiler
>>>>> and
>>>>>> vm minimum stack sizes are not increased by these sizes.
>>>>>
>>>>> Now I see what you mean. Thanks for clearing this up.
>>>>>
>>>>> I should have remembered that part of the change because we went
>> back
>>>>> and forth about removing the red/yellow zone pages from the other
>>>>> threads. In particular, we discussed the compiler thread because it
>>>>> is-a JavaThread. Our conclusion was that a compiler thread doesn't
>>>>> execute Java bytecode so we could remove the red/yellow pages...
>>>> Yes, it does not execute java byte code.
>>>
>>> Bzzzt! Sorry Goetz and Dan but that is no longer correct with JVMCI. The
>>> ability for a CompilerThread to execute Java code (can_call_java()) is
>>> now a dynamic property depending on whether the current compiler is the
>>> JVMCI compiler.
>>>
>>> David
>>> -----
>>>
>>>> Therefore you can remove the shadow pages. There is no code that
>>>> will bang.
>>>> But red/yellow pages are protected right during thread startup.
>>>> Therefore you must have enough space for them.
>>>> On ppc, we try to protect three 64K pages out of the 128K compiler stack.
>>>> That obviously fails.
>>>>
>>>> Therefore I propose:
>>>> size_t os::Posix::_java_thread_min_stack_allowed = 48 * K; // Set
>>> platform dependent.
>>>> in os::Posix::set_minimum_stack_sizes():
>>>> _java_thread_min_stack_allowed = _java_thread_min_stack_allowed +
>>>> JavaThread::stack_guard_zone_size() +
>>>> JavaThread::stack_shadow_zone_size();
>>>>
>>>> (Similar for _compiler_thread_min_stack_allowed).
>>>>
>>>> The minimal stack size is made up of three components:
>>>> 1. Sizes of interpreter/C1/C2 frames. Depends on HotSpot
>>> implementation/platform/os.
>>>> 2. Sizes of C++ frames: depends on C++ compiler.
>>>> These are fixed at compile time.
>>>> 3. Sizes of red/yellow/reserved/shadow pages. Depends on the system
>>> the
>>>> VM is used on. This is not fixed at compile time. (Our ppc machines
>>> differ
>>>> in page size.)
>>>> Therefore 3. should not be included in a compile time constant.
>>>>
>>>>
>>>>
>>>>> And that decision allowed us to be exposed to the 64K page issue
>>>>> because the "extra" space isn't there anymore.
>>>>>
>>>>>> At least the _compiler_thread_min_stack_allowed should be increased
>>>>>> similarly by red/yellow zone size. The compiler thread is a Java
>>>>>> thread and must have space for these zones.
>>>>>
>>>>> I'm not sure that I completely agree (yet). To me, the red/yellow
>>>>> pages are there for Java thread stack overflow semantics. Yes, the
>>>>> compiler thread needs extra space when 64K pages are used, but I
>>>>> would prefer that we add that space via a different calculation.
>>>> Yes they are. But compiler threads happen tob e a subclass of
>>>> Java threads and use the same run() method that puts the pages
>>>> There.
>>>> I agree that they are not needed for Compiler threads, nor for
>>>> CodeCacheSweeperThreads. I don't really now about
>>>> JvmtiAgentThreads and ServiceThreads, but all of the get the guard
>> pages
>>>> because they are derived from JavaThread.
>>>>
>>>>> To put it another way, I'd like to see us add extra space to solve
>>>>> the 64K page issue directly instead of as a side effect of the
>>>>> red/yellow page addition.
>>>> I don't understand. What do you mean by 'directly'?
>>>>
>>>>>> Also, the change added a test that is failing now.
>>>>>
>>>>> And that's a "good thing" (TM), right? :-)
>>>> Yes, it showed a bug and thus raised the need to fix it! :)
>>>>
>>>> Best regards,
>>>> Goetz.
>>>>
>>>>
>>>>> Again, thanks for clarifying 8140520's role in this issue.
>>>>>
>>>>> Dan
>>>>>
>>>>>
>>>>>>
>>>>>> Best regards,
>>>>>> Goetz.
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Daniel D. Daugherty [mailto:daniel.daugherty at oracle.com]
>>>>>>> Sent: Montag, 21. November 2016 17:28
>>>>>>> To: David Holmes <david.holmes at oracle.com>; Lindenmaier, Goetz
>>>>>>> <goetz.lindenmaier at sap.com>; hotspot-runtime-
>>> dev at openjdk.java.net
>>>>>>> Subject: Re: RFR(M): 8169373: Work around linux NPTL stack guard
>>> error.
>>>>>>>
>>>>>>> Sorry for the delayed responses to this thread. I've been on vacation...
>>>>>>>
>>>>>>> One comment/query embedded below...
>>>>>>>
>>>>>>>
>>>>>>> On 11/10/16 8:40 PM, David Holmes wrote:
>>>>>>>> Hi Goetz,
>>>>>>>>
>>>>>>>> On 11/11/2016 8:00 AM, Lindenmaier, Goetz wrote:
>>>>>>>>> Hi David,
>>>>>>>>>
>>>>>>>>> This issue is different to 6675312, see also my comment in the bug.
>>>>>>>>>
>>>>>>>>> It appears running jtreg test
>>> runtime/Thread/TooSmallStackSize.java,
>>>>>>>>> with my patch below you can reproduce it on linuxx86_64. You can
>>> not
>>>>>>>>> do that with 6675312. Also, I would assume there are systems out
>>> there
>>>>>>>>> on x86 that uses 64-K pages, did you run the tests on these? I
>> would
>>>>>>>>> assume you get hard crashes with stack overflows in the first C2
>>>>>>>>> compilation if there is only 64K usable CompilerThreadStack.
>>>>>>>>>
>>>>>>>>> My fix does not affect Java threads, which are the largest amount
>>>>>>>>> of threads used by the VM. It affects only the non-Java threads.
>>>>>>>>> It adds one page to these threads. The page does not require
>>> memory,
>>>>>>>>> as it's protected. The stack will only require more space if the
>> thread
>>>>>>>>> ran into a stack overflow before the fix as else the pages are not
>>>>>>>>> mapped.
>>>>>>>>> This are stack overflows that cause hard crashes, at least on ppc
>> the
>>> VM
>>>>>>>>> does not properly catch these stack overflows, so any setup
>> working
>>>>> now
>>>>>>>>> will not run into the additional space. Altogether there should be
>> no
>>>>>>>>> effect on running systems besides requiring one more entry in the
>>>>>>>>> page table per non-Java thread.
>>>>>>>>>
>>>>>>>>> The problem is caused by a rather recent change (8140520:
>> segfault
>>> on
>>>>>>>>> solaris-amd64
>>>>>>>>> with "-XX:VMThreadStackSize=1" option) which was pushed after
>>>>>>>>> feature-close. As this was a rather recent change, it must be
>>>>>>>>> possible to
>>>>>>>>> fix this follow up issue. What else is this period in the project good
>>>>>>>>> for if not fixing issues?
>>>>>>>> So I am seeing a number of factors here.
>>>>>>>>
>>>>>>>> First, 8140520, set:
>>>>>>>>
>>>>>>>> size_t os::Posix::_compiler_thread_min_stack_allowed = 128 * K;
>>>>>>> So I'm confused by the above comment:
>>>>>>>
>>>>>>> > The problem is caused by a rather recent change (8140520:
>> segfault
>>>>>>> > on solaris-amd64 with "-XX:VMThreadStackSize=1" option)
>>>>>>>
>>>>>>> http://cr.openjdk.java.net/~dcubed/8140520-webrev/5-jdk9-hs-
>>>>>>>
>>> open/hotspot/src/os_cpu/linux_ppc/vm/os_linux_ppc.cpp.frames.html
>>>>>>>
>>>>>>> shows this change:
>>>>>>>
>>>>>>> @@ -531,19 +531,17 @@
>>>>>>> }
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>
>>> //////////////////////////////////////////////////////////////////////////////
>>>>>>> //
>>>>>>> // thread stack
>>>>>>>
>>>>>>> -size_t os::Linux::min_stack_allowed = 128*K;
>>>>>>> +size_t os::Posix::_compiler_thread_min_stack_allowed = 128 * K;
>>>>>>> +size_t os::Posix::_java_thread_min_stack_allowed = 128 * K;
>>>>>>> +size_t os::Posix::_vm_internal_thread_min_stack_allowed = 128 * K;
>>>>>>>
>>>>>>> so the existing single variable of 'min_stack_allowed' was
>>>>>>> replaced by three variables: _compiler_thread_min_stack_allowed,
>>>>>>> _java_thread_min_stack_allowed, and
>>>>>>> _vm_internal_thread_min_stack_allowed.
>>>>>>>
>>>>>>> The old single variable and the three new variables are all
>>>>>>> initialized to the same value (128K) so the fix for 8140520
>>>>>>> did not change stack sizes for this platform. In fact, only
>>>>>>> one platform had a size change (Solaris X64).
>>>>>>>
>>>>>>> So I'm confused about how the fix for 8140520 caused this problem.
>>>>>>>
>>>>>>> Based on David's analysis below, it looks to me like this 64K stack
>>>>>>> guard page problem should also exist prior to the fix for 8140520.
>>>>>>>
>>>>>>> Goetz, can you please explain how 8140520 caused this problem?
>>>>>>>
>>>>>>> Dan
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> Second on linux PPC it is hardwired to use 2 guard pages:
>>>>>>>>
>>>>>>>> return 2 * page_size();
>>>>>>>>
>>>>>>>> Third, you had a pagesize of 64K.
>>>>>>>>
>>>>>>>> Fourth, NPTL takes the guard space from the stack space - hence
>> with
>>> 2
>>>>>>>> x 64K guard, and a 128K stack it was all consumed.
>>>>>>>>
>>>>>>>> ---
>>>>>>>>
>>>>>>>> In the proposed changes you now only use page_size() for the guard,
>>> so
>>>>>>>> that alone would have fixed the observed problem.
>>>>>>>>
>>>>>>>> But in addition you want to address the NPTL problem by adding
>> back
>>>>>>>> the guard space to the stack size requested. That alone would also
>>>>>>>> have fixed the observed problem. :)
>>>>>>>>
>>>>>>>> But in addition you have increased the minimum stack size:
>>>>>>>>
>>>>>>>> ! size_t os::Posix::_compiler_thread_min_stack_allowed = 192 * K;
>>>>>>>>
>>>>>>>> which again, on its own would have fixed the original problem. :)
>>>>>>>>
>>>>>>>> Did you really intend to increase the real minimum stack from 128K
>> to
>>>>>>>> 256K ? (on a 64K page system)
>>>>>>>>
>>>>>>>> ---
>>>>>>>>
>>>>>>>> Focusing simply on the shared code change to adjust the requested
>>>>>>>> stacksize by the amount of guard space (if any), this does not seem
>>>>>>>> unreasonable. As you note it is restricted to non-JavaThreads and
>> only
>>>>>>>> adds a page to reserved stack space.
>>>>>>>>
>>>>>>>> My only query now is whether the minimum stacksize detection logic
>>>>>>>> will correctly report the real minimum stack size (taking into account
>>>>>>>> the need for the guard page) ?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> David
>>>>>>>>
>>>>>>>>> So I really think this issue should be fixed.
>>>>>>>>>
>>>>>>>>> Best regards,
>>>>>>>>> Goetz.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> -----Original Message-----
>>>>>>>>>> From: David Holmes [mailto:david.holmes at oracle.com]
>>>>>>>>>> Sent: Thursday, November 10, 2016 10:02 PM
>>>>>>>>>> To: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>; hotspot-
>>>>>>> runtime-
>>>>>>>>>> dev at openjdk.java.net
>>>>>>>>>> Subject: Re: RFR(M): 8169373: Work around linux NPTL stack
>> guard
>>>>> error.
>>>>>>>>>>
>>>>>>>>>> Hi Goetz,
>>>>>>>>>>
>>>>>>>>>> As per the bug report, this issue was already known (6675312) and
>>> we
>>>>>>>>>> chose not to try and address it due to no reported issues at the
>>> time.
>>>>>>>>>> While I see that you have encountered an issue (is it real or
>>>>>>>>>> fabricated?) I think this change is too intrusive to be applied at this
>>>>>>>>>> stage of the JDK 9 release cycle, as it will change the stack
>>>>>>>>>> requirements of every application running on Linux.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> David
>>>>>>>>>>
>>>>>>>>>> On 11/11/2016 1:58 AM, Lindenmaier, Goetz wrote:
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Please review this change. I please need a sponsor:
>>>>>>>>>>>
>>>>>>>>>>> http://cr.openjdk.java.net/~goetz/wr16/8169373-ppc-
>>>>>>> stackFix/webrev.01/
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> In the Linux NPTL pthread implementation the guard size
>>> mechanism
>>>>>>>>>>> is not
>>>>>>>>>>> implemented properly. The posix standard requires to add the
>>> size
>>>>>>>>>>> of the
>>>>>>>>>>> guard pages to the stack size, instead Linux takes the space out
>> of
>>>>>>>>>>> 'stacksize'.
>>>>>>>>>>>
>>>>>>>>>>> The Posix standard
>>>>>>> http://pubs.opengroup.org/onlinepubs/9699919799/
>>>>>>>>>>> says "the implementation allocates extra memory at the
>>> overflow
>>>>>>>>>>> end of
>>>>>>>>>>> the stack". The linux man page
>>>>>>>>>>> https://linux.die.net/man/3/pthread_attr_setguardsize says "As
>>> at
>>>>>>>>>>> glibc
>>>>>>>>>>> 2.8, the NPTL threading implementation includes the guard area
>>>>> within
>>>>>>>>>>> the stack size allocation, rather than allocating extra space at
>>>>>>>>>>> the end
>>>>>>>>>>> of the stack, as POSIX.1 requires".
>>>>>>>>>>>
>>>>>>>>>>> I encounter this problem in
>>> runtime/Thread/TooSmallStackSize.java
>>>>>>>>>>> on ppc
>>>>>>>>>>> with 64K pages. _compiler_thread_min_stack_allowed is 128K
>> on
>>>>> ppc,
>>>>>>> and
>>>>>>>>>>> ppc specifies two OS guard pages. The VM crashes in pthread
>>>>> creation
>>>>>>>>>>> because there is no usable space in the thread stack after
>>> allocating
>>>>>>>>>>> the guard pages.
>>>>>>>>>>>
>>>>>>>>>>> But TooSmallStackSize.java requires that the VM comes up with
>>> the
>>>>>>>>>>> stack
>>>>>>>>>>> size mentioned in the error message.
>>>>>>>>>>>
>>>>>>>>>>> This fix adapts the requested stack size on Linux by the size of
>> the
>>>>>>>>>>> guard pages to mimick proper behaviour, see change to
>>>>> os_linux.cpp.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> The change also streamlines usage of stack_guard_page on
>>> linuxppc,
>>>>>>>>>>> linuxppcle, aixppc and linuxs390.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> To reproduce the error on linux_x86_64, apply below patch and
>>> call
>>>>> the
>>>>>>>>>>> VM with -XX:CompilerThreadStackSize=64.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> I'm still exploring why I had to choose such big compiler stacks
>> on
>>>>>>>>>>> ppc
>>>>>>>>>>> to get -version passing, but I wanted to send the RFR now as
>>> people
>>>>>>>>>>> obviously looked at the bug I opened (Thanks David!).
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Best regards,
>>>>>>>>>>>
>>>>>>>>>>> Goetz.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> diff -r b7ae012c55c3 src/os_cpu/linux_x86/vm/os_linux_x86.cpp
>>>>>>>>>>>
>>>>>>>>>>> --- a/src/os_cpu/linux_x86/vm/os_linux_x86.cpp Mon Nov 07
>>>>> 12:37:28
>>>>>>>>>> 2016
>>>>>>>>>>> +0100
>>>>>>>>>>>
>>>>>>>>>>> +++ b/src/os_cpu/linux_x86/vm/os_linux_x86.cpp Thu Nov 10
>>>>>>> 16:52:17
>>>>>>>>>> 2016
>>>>>>>>>>> +0100
>>>>>>>>>>>
>>>>>>>>>>> @@ -701,7 +701,7 @@
>>>>>>>>>>>
>>>>>>>>>>> size_t os::Linux::default_guard_size(os::ThreadType thr_type) {
>>>>>>>>>>>
>>>>>>>>>>> // Creating guard page is very expensive. Java thread has
>>> HotSpot
>>>>>>>>>>>
>>>>>>>>>>> // guard page, only enable glibc guard page for non-Java
>>> threads.
>>>>>>>>>>>
>>>>>>>>>>> - return (thr_type == java_thread ? 0 : page_size());
>>>>>>>>>>>
>>>>>>>>>>> + return (thr_type == java_thread ? 0 : 64*K);
>>>>>>>>>>>
>>>>>>>>>>> }
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> // Java thread:
>>>>>>>>>>>
>>>>
More information about the hotspot-runtime-dev
mailing list