RFR(M): 8169373: Work around linux NPTL stack guard error.

David Holmes david.holmes at oracle.com
Wed Nov 23 01:49:35 UTC 2016


On 22/11/2016 11:19 PM, Lindenmaier, Goetz wrote:
> Hi Dan,
>
>> -----Original Message-----
>> From: Daniel D. Daugherty [mailto:daniel.daugherty at oracle.com]
>> Sent: Dienstag, 22. November 2016 14:01
>> To: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>; David Holmes
>> <david.holmes at oracle.com>; hotspot-runtime-dev at openjdk.java.net
>> Subject: Re: RFR(M): 8169373: Work around linux NPTL stack guard error.
>>
>> On 11/22/16 3:55 AM, Lindenmaier, Goetz wrote:
>>> Hi Dan,
>>>
>>> I ran into a row of issues, some errors on the platforms.
>>>
>>> What I meant with that comment is that
>>> os::Linux::min_stack_allowed = MAX2(os::Linux::min_stack_allowed,
>>>                                        JavaThread::stack_guard_zone_size() +
>>>                                        JavaThread::stack_shadow_zone_size() +
>>>                                        (4*BytesPerWord COMPILER2_PRESENT(+2)) * 4 * K);
>>> was executed, and min_stack_allowed used for all stacks. Now, compiler
>> and
>>> vm minimum stack sizes are not increased by these sizes.
>>
>> Now I see what you mean. Thanks for clearing this up.
>>
>> I should have remembered that part of the change because we went back
>> and forth about removing the red/yellow zone pages from the other
>> threads. In particular, we discussed the compiler thread because it
>> is-a JavaThread. Our conclusion was that a compiler thread doesn't
>> execute Java bytecode so we could remove the red/yellow pages...
> Yes, it does not execute java byte code.

Bzzzt! Sorry Goetz and Dan but that is no longer correct with JVMCI. The 
ability for a CompilerThread to execute Java code (can_call_java()) is 
now a dynamic property depending on whether the current compiler is the 
JVMCI compiler.

David
-----

> Therefore you can remove the shadow pages.  There is no code that
> will bang.
> But red/yellow pages are protected right during thread startup.
> Therefore you must have enough space for them.
> On ppc, we try to protect three 64K pages out of the 128K compiler stack.
> That obviously fails.
>
> Therefore I propose:
>      size_t os::Posix::_java_thread_min_stack_allowed = 48 * K;   // Set platform dependent.
> in  os::Posix::set_minimum_stack_sizes():
>   _java_thread_min_stack_allowed = _java_thread_min_stack_allowed +
>                                    JavaThread::stack_guard_zone_size() +
>                                    JavaThread::stack_shadow_zone_size();
>
> (Similar for _compiler_thread_min_stack_allowed).
>
> The minimal stack size is made up of three components:
>    1. Sizes of interpreter/C1/C2 frames. Depends on HotSpot implementation/platform/os.
>    2. Sizes of C++ frames: depends on C++ compiler.
> These are fixed at compile time.
>    3. Sizes of red/yellow/reserved/shadow pages. Depends on the system the
>        VM is used on.  This is not fixed at compile time. (Our ppc machines differ
>       in page size.)
> Therefore 3. should not be included in a compile time constant.
>
>
>
>> And that decision allowed us to be exposed to the 64K page issue
>> because the "extra" space isn't there anymore.
>>
>>> At least the _compiler_thread_min_stack_allowed should be increased
>>> similarly by red/yellow zone size.  The compiler thread is a Java
>>> thread and must have space for these zones.
>>
>> I'm not sure that I completely agree (yet). To me, the red/yellow
>> pages are there for Java thread stack overflow semantics. Yes, the
>> compiler thread needs extra space when 64K pages are used, but I
>> would prefer that we add that space via a different calculation.
> Yes they are. But compiler threads happen tob e a subclass of
> Java threads and use the same run() method that puts the pages
> There.
> I agree that they are not needed for Compiler threads, nor for
> CodeCacheSweeperThreads. I don't really now about
>  JvmtiAgentThreads and ServiceThreads, but all of the get the guard pages
> because they are derived from JavaThread.
>
>> To put it another way, I'd like to see us add extra space to solve
>> the 64K page issue directly instead of as a side effect of the
>> red/yellow page addition.
> I don't understand.  What do you mean by 'directly'?
>
>>> Also, the change added a test that is failing now.
>>
>> And that's a "good thing" (TM), right? :-)
> Yes, it showed a bug and thus raised the need to fix it! :)
>
> Best regards,
>   Goetz.
>
>
>> Again, thanks for clarifying 8140520's role in this issue.
>>
>> Dan
>>
>>
>>>
>>> Best regards,
>>>    Goetz.
>>>
>>>> -----Original Message-----
>>>> From: Daniel D. Daugherty [mailto:daniel.daugherty at oracle.com]
>>>> Sent: Montag, 21. November 2016 17:28
>>>> To: David Holmes <david.holmes at oracle.com>; Lindenmaier, Goetz
>>>> <goetz.lindenmaier at sap.com>; hotspot-runtime-dev at openjdk.java.net
>>>> Subject: Re: RFR(M): 8169373: Work around linux NPTL stack guard error.
>>>>
>>>> Sorry for the delayed responses to this thread. I've been on vacation...
>>>>
>>>> One comment/query embedded below...
>>>>
>>>>
>>>> On 11/10/16 8:40 PM, David Holmes wrote:
>>>>> Hi Goetz,
>>>>>
>>>>> On 11/11/2016 8:00 AM, Lindenmaier, Goetz wrote:
>>>>>> Hi David,
>>>>>>
>>>>>> This issue is different to 6675312, see also my comment in the bug.
>>>>>>
>>>>>> It appears running jtreg test runtime/Thread/TooSmallStackSize.java,
>>>>>> with my patch below you can reproduce it on linuxx86_64. You can not
>>>>>> do that with 6675312. Also, I would assume there are systems out there
>>>>>> on x86 that uses 64-K pages, did you run the tests on these? I would
>>>>>> assume you get hard crashes with stack overflows in the first C2
>>>>>> compilation if there is only 64K usable CompilerThreadStack.
>>>>>>
>>>>>> My fix does not affect Java threads, which are the largest amount
>>>>>> of threads used by the VM. It affects only the non-Java threads.
>>>>>> It adds one page to these threads. The page does not require memory,
>>>>>> as it's protected. The stack will only require more space if the thread
>>>>>> ran into a stack overflow before the fix as else the pages are not
>>>>>> mapped.
>>>>>> This are stack overflows that cause hard crashes, at least on ppc the VM
>>>>>> does not properly catch these stack overflows, so any setup working
>> now
>>>>>> will not run into the additional space. Altogether there should be no
>>>>>> effect on running systems besides requiring one more entry in the
>>>>>> page table per non-Java thread.
>>>>>>
>>>>>> The problem is caused by a rather recent change (8140520: segfault on
>>>>>> solaris-amd64
>>>>>> with "-XX:VMThreadStackSize=1" option) which was pushed after
>>>>>> feature-close. As this was a rather recent change, it must be
>>>>>> possible to
>>>>>> fix this follow up issue. What else is this period in the project good
>>>>>> for if not fixing issues?
>>>>> So I am seeing a number of factors here.
>>>>>
>>>>> First, 8140520, set:
>>>>>
>>>>> size_t os::Posix::_compiler_thread_min_stack_allowed = 128 * K;
>>>> So I'm confused by the above comment:
>>>>
>>>>   > The problem is caused by a rather recent change (8140520: segfault
>>>>   > on solaris-amd64 with "-XX:VMThreadStackSize=1" option)
>>>>
>>>> http://cr.openjdk.java.net/~dcubed/8140520-webrev/5-jdk9-hs-
>>>> open/hotspot/src/os_cpu/linux_ppc/vm/os_linux_ppc.cpp.frames.html
>>>>
>>>> shows this change:
>>>>
>>>> @@ -531,19 +531,17 @@
>>>>    }
>>>>
>>>>
>>>>
>> //////////////////////////////////////////////////////////////////////////////
>>>> //
>>>>    // thread stack
>>>>
>>>> -size_t os::Linux::min_stack_allowed = 128*K;
>>>> +size_t os::Posix::_compiler_thread_min_stack_allowed = 128 * K;
>>>> +size_t os::Posix::_java_thread_min_stack_allowed = 128 * K;
>>>> +size_t os::Posix::_vm_internal_thread_min_stack_allowed = 128 * K;
>>>>
>>>> so the existing single variable of 'min_stack_allowed' was
>>>> replaced by three variables: _compiler_thread_min_stack_allowed,
>>>> _java_thread_min_stack_allowed, and
>>>> _vm_internal_thread_min_stack_allowed.
>>>>
>>>> The old single variable and the three new variables are all
>>>> initialized to the same value (128K) so the fix for 8140520
>>>> did not change stack sizes for this platform. In fact, only
>>>> one platform had a size change (Solaris X64).
>>>>
>>>> So I'm confused about how the fix for 8140520 caused this problem.
>>>>
>>>> Based on David's analysis below, it looks to me like this 64K stack
>>>> guard page problem should also exist prior to the fix for 8140520.
>>>>
>>>> Goetz, can you please explain how 8140520 caused this problem?
>>>>
>>>> Dan
>>>>
>>>>
>>>>
>>>>> Second on linux PPC it is hardwired to use 2 guard pages:
>>>>>
>>>>>   return 2 * page_size();
>>>>>
>>>>> Third, you had a pagesize of 64K.
>>>>>
>>>>> Fourth, NPTL takes the guard space from the stack space - hence with 2
>>>>> x 64K guard, and a 128K stack it was all consumed.
>>>>>
>>>>> ---
>>>>>
>>>>> In the proposed changes you now only use page_size() for the guard, so
>>>>> that alone would have fixed the observed problem.
>>>>>
>>>>> But in addition you want to address the NPTL problem by adding back
>>>>> the guard space to the stack size requested. That alone would also
>>>>> have fixed the observed problem. :)
>>>>>
>>>>> But in addition you have increased the minimum stack size:
>>>>>
>>>>> ! size_t os::Posix::_compiler_thread_min_stack_allowed = 192 * K;
>>>>>
>>>>> which again, on its own would have fixed the original problem. :)
>>>>>
>>>>> Did you really intend to increase the real minimum stack from 128K to
>>>>> 256K ? (on a 64K page system)
>>>>>
>>>>> ---
>>>>>
>>>>> Focusing simply on the shared code change to adjust the requested
>>>>> stacksize by the amount of guard space (if any), this does not seem
>>>>> unreasonable. As you note it is restricted to non-JavaThreads and only
>>>>> adds a page to reserved stack space.
>>>>>
>>>>> My only query now is whether the minimum stacksize detection logic
>>>>> will correctly report the real minimum stack size (taking into account
>>>>> the need for the guard page) ?
>>>>>
>>>>> Thanks,
>>>>> David
>>>>>
>>>>>> So I really think this issue should be fixed.
>>>>>>
>>>>>> Best regards,
>>>>>>    Goetz.
>>>>>>
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: David Holmes [mailto:david.holmes at oracle.com]
>>>>>>> Sent: Thursday, November 10, 2016 10:02 PM
>>>>>>> To: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>; hotspot-
>>>> runtime-
>>>>>>> dev at openjdk.java.net
>>>>>>> Subject: Re: RFR(M): 8169373: Work around linux NPTL stack guard
>> error.
>>>>>>>
>>>>>>> Hi Goetz,
>>>>>>>
>>>>>>> As per the bug report, this issue was already known (6675312) and we
>>>>>>> chose not to try and address it due to no reported issues at the time.
>>>>>>> While I see that you have encountered an issue (is it real or
>>>>>>> fabricated?) I think this change is too intrusive to be applied at this
>>>>>>> stage of the JDK 9 release cycle, as it will change the stack
>>>>>>> requirements of every application running on Linux.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> David
>>>>>>>
>>>>>>> On 11/11/2016 1:58 AM, Lindenmaier, Goetz wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Please review this change. I please need a sponsor:
>>>>>>>>
>>>>>>>> http://cr.openjdk.java.net/~goetz/wr16/8169373-ppc-
>>>> stackFix/webrev.01/
>>>>>>>>
>>>>>>>>
>>>>>>>> In the Linux NPTL pthread implementation the guard size mechanism
>>>>>>>> is not
>>>>>>>> implemented properly. The posix standard requires to add the size
>>>>>>>> of the
>>>>>>>> guard pages to the stack size, instead Linux takes the space out of
>>>>>>>> 'stacksize'.
>>>>>>>>
>>>>>>>> The Posix standard
>>>> http://pubs.opengroup.org/onlinepubs/9699919799/
>>>>>>>>   says "the implementation allocates extra memory at the overflow
>>>>>>>> end of
>>>>>>>> the stack". The linux man page
>>>>>>>> https://linux.die.net/man/3/pthread_attr_setguardsize says "As at
>>>>>>>> glibc
>>>>>>>> 2.8, the NPTL threading implementation includes the guard area
>> within
>>>>>>>> the stack size allocation, rather than allocating extra space at
>>>>>>>> the end
>>>>>>>> of the stack, as POSIX.1 requires".
>>>>>>>>
>>>>>>>> I encounter this problem in runtime/Thread/TooSmallStackSize.java
>>>>>>>> on ppc
>>>>>>>> with 64K pages. _compiler_thread_min_stack_allowed is 128K on
>> ppc,
>>>> and
>>>>>>>> ppc specifies two OS guard pages. The VM crashes in pthread
>> creation
>>>>>>>> because there is no usable space in the thread stack after allocating
>>>>>>>> the guard pages.
>>>>>>>>
>>>>>>>> But TooSmallStackSize.java requires that the VM comes up with the
>>>>>>>> stack
>>>>>>>> size mentioned in the error message.
>>>>>>>>
>>>>>>>> This fix adapts the requested stack size on Linux by the size of the
>>>>>>>> guard pages to mimick proper behaviour, see change to
>> os_linux.cpp.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> The change also streamlines usage of stack_guard_page on linuxppc,
>>>>>>>> linuxppcle, aixppc and linuxs390.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> To reproduce the error on linux_x86_64, apply below patch and call
>> the
>>>>>>>> VM with -XX:CompilerThreadStackSize=64.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> I'm still exploring why I had to choose such big compiler stacks on
>>>>>>>> ppc
>>>>>>>> to get -version passing, but I wanted to send the RFR now as people
>>>>>>>> obviously looked at the bug I opened (Thanks David!).
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Best regards,
>>>>>>>>
>>>>>>>>    Goetz.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> diff -r b7ae012c55c3 src/os_cpu/linux_x86/vm/os_linux_x86.cpp
>>>>>>>>
>>>>>>>> --- a/src/os_cpu/linux_x86/vm/os_linux_x86.cpp  Mon Nov 07
>> 12:37:28
>>>>>>> 2016
>>>>>>>> +0100
>>>>>>>>
>>>>>>>> +++ b/src/os_cpu/linux_x86/vm/os_linux_x86.cpp  Thu Nov 10
>>>> 16:52:17
>>>>>>> 2016
>>>>>>>> +0100
>>>>>>>>
>>>>>>>> @@ -701,7 +701,7 @@
>>>>>>>>
>>>>>>>> size_t os::Linux::default_guard_size(os::ThreadType thr_type) {
>>>>>>>>
>>>>>>>>     // Creating guard page is very expensive. Java thread has HotSpot
>>>>>>>>
>>>>>>>>     // guard page, only enable glibc guard page for non-Java threads.
>>>>>>>>
>>>>>>>> -  return (thr_type == java_thread ? 0 : page_size());
>>>>>>>>
>>>>>>>> +  return (thr_type == java_thread ? 0 : 64*K);
>>>>>>>>
>>>>>>>> }
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> // Java thread:
>>>>>>>>
>


More information about the hotspot-runtime-dev mailing list