RFR(M): 8169373: Work around linux NPTL stack guard error.
Daniel D. Daugherty
daniel.daugherty at oracle.com
Tue Nov 22 13:00:37 UTC 2016
On 11/22/16 3:55 AM, Lindenmaier, Goetz wrote:
> Hi Dan,
>
> I ran into a row of issues, some errors on the platforms.
>
> What I meant with that comment is that
> os::Linux::min_stack_allowed = MAX2(os::Linux::min_stack_allowed,
> JavaThread::stack_guard_zone_size() +
> JavaThread::stack_shadow_zone_size() +
> (4*BytesPerWord COMPILER2_PRESENT(+2)) * 4 * K);
> was executed, and min_stack_allowed used for all stacks. Now, compiler and
> vm minimum stack sizes are not increased by these sizes.
Now I see what you mean. Thanks for clearing this up.
I should have remembered that part of the change because we went back
and forth about removing the red/yellow zone pages from the other
threads. In particular, we discussed the compiler thread because it
is-a JavaThread. Our conclusion was that a compiler thread doesn't
execute Java bytecode so we could remove the red/yellow pages...
And that decision allowed us to be exposed to the 64K page issue
because the "extra" space isn't there anymore.
> At least the _compiler_thread_min_stack_allowed should be increased similarly
> by red/yellow zone size. The compiler thread is a Java thread and must have space
> for these zones.
I'm not sure that I completely agree (yet). To me, the red/yellow
pages are there for Java thread stack overflow semantics. Yes, the
compiler thread needs extra space when 64K pages are used, but I
would prefer that we add that space via a different calculation.
To put it another way, I'd like to see us add extra space to solve
the 64K page issue directly instead of as a side effect of the
red/yellow page addition.
> Also, the change added a test that is failing now.
And that's a "good thing" (TM), right? :-)
Again, thanks for clarifying 8140520's role in this issue.
Dan
>
> Best regards,
> Goetz.
>
>> -----Original Message-----
>> From: Daniel D. Daugherty [mailto:daniel.daugherty at oracle.com]
>> Sent: Montag, 21. November 2016 17:28
>> To: David Holmes <david.holmes at oracle.com>; Lindenmaier, Goetz
>> <goetz.lindenmaier at sap.com>; hotspot-runtime-dev at openjdk.java.net
>> Subject: Re: RFR(M): 8169373: Work around linux NPTL stack guard error.
>>
>> Sorry for the delayed responses to this thread. I've been on vacation...
>>
>> One comment/query embedded below...
>>
>>
>> On 11/10/16 8:40 PM, David Holmes wrote:
>>> Hi Goetz,
>>>
>>> On 11/11/2016 8:00 AM, Lindenmaier, Goetz wrote:
>>>> Hi David,
>>>>
>>>> This issue is different to 6675312, see also my comment in the bug.
>>>>
>>>> It appears running jtreg test runtime/Thread/TooSmallStackSize.java,
>>>> with my patch below you can reproduce it on linuxx86_64. You can not
>>>> do that with 6675312. Also, I would assume there are systems out there
>>>> on x86 that uses 64-K pages, did you run the tests on these? I would
>>>> assume you get hard crashes with stack overflows in the first C2
>>>> compilation if there is only 64K usable CompilerThreadStack.
>>>>
>>>> My fix does not affect Java threads, which are the largest amount
>>>> of threads used by the VM. It affects only the non-Java threads.
>>>> It adds one page to these threads. The page does not require memory,
>>>> as it's protected. The stack will only require more space if the thread
>>>> ran into a stack overflow before the fix as else the pages are not
>>>> mapped.
>>>> This are stack overflows that cause hard crashes, at least on ppc the VM
>>>> does not properly catch these stack overflows, so any setup working now
>>>> will not run into the additional space. Altogether there should be no
>>>> effect on running systems besides requiring one more entry in the
>>>> page table per non-Java thread.
>>>>
>>>> The problem is caused by a rather recent change (8140520: segfault on
>>>> solaris-amd64
>>>> with "-XX:VMThreadStackSize=1" option) which was pushed after
>>>> feature-close. As this was a rather recent change, it must be
>>>> possible to
>>>> fix this follow up issue. What else is this period in the project good
>>>> for if not fixing issues?
>>> So I am seeing a number of factors here.
>>>
>>> First, 8140520, set:
>>>
>>> size_t os::Posix::_compiler_thread_min_stack_allowed = 128 * K;
>> So I'm confused by the above comment:
>>
>> > The problem is caused by a rather recent change (8140520: segfault
>> > on solaris-amd64 with "-XX:VMThreadStackSize=1" option)
>>
>> http://cr.openjdk.java.net/~dcubed/8140520-webrev/5-jdk9-hs-
>> open/hotspot/src/os_cpu/linux_ppc/vm/os_linux_ppc.cpp.frames.html
>>
>> shows this change:
>>
>> @@ -531,19 +531,17 @@
>> }
>>
>>
>> //////////////////////////////////////////////////////////////////////////////
>> //
>> // thread stack
>>
>> -size_t os::Linux::min_stack_allowed = 128*K;
>> +size_t os::Posix::_compiler_thread_min_stack_allowed = 128 * K;
>> +size_t os::Posix::_java_thread_min_stack_allowed = 128 * K;
>> +size_t os::Posix::_vm_internal_thread_min_stack_allowed = 128 * K;
>>
>> so the existing single variable of 'min_stack_allowed' was
>> replaced by three variables: _compiler_thread_min_stack_allowed,
>> _java_thread_min_stack_allowed, and
>> _vm_internal_thread_min_stack_allowed.
>>
>> The old single variable and the three new variables are all
>> initialized to the same value (128K) so the fix for 8140520
>> did not change stack sizes for this platform. In fact, only
>> one platform had a size change (Solaris X64).
>>
>> So I'm confused about how the fix for 8140520 caused this problem.
>>
>> Based on David's analysis below, it looks to me like this 64K stack
>> guard page problem should also exist prior to the fix for 8140520.
>>
>> Goetz, can you please explain how 8140520 caused this problem?
>>
>> Dan
>>
>>
>>
>>> Second on linux PPC it is hardwired to use 2 guard pages:
>>>
>>> return 2 * page_size();
>>>
>>> Third, you had a pagesize of 64K.
>>>
>>> Fourth, NPTL takes the guard space from the stack space - hence with 2
>>> x 64K guard, and a 128K stack it was all consumed.
>>>
>>> ---
>>>
>>> In the proposed changes you now only use page_size() for the guard, so
>>> that alone would have fixed the observed problem.
>>>
>>> But in addition you want to address the NPTL problem by adding back
>>> the guard space to the stack size requested. That alone would also
>>> have fixed the observed problem. :)
>>>
>>> But in addition you have increased the minimum stack size:
>>>
>>> ! size_t os::Posix::_compiler_thread_min_stack_allowed = 192 * K;
>>>
>>> which again, on its own would have fixed the original problem. :)
>>>
>>> Did you really intend to increase the real minimum stack from 128K to
>>> 256K ? (on a 64K page system)
>>>
>>> ---
>>>
>>> Focusing simply on the shared code change to adjust the requested
>>> stacksize by the amount of guard space (if any), this does not seem
>>> unreasonable. As you note it is restricted to non-JavaThreads and only
>>> adds a page to reserved stack space.
>>>
>>> My only query now is whether the minimum stacksize detection logic
>>> will correctly report the real minimum stack size (taking into account
>>> the need for the guard page) ?
>>>
>>> Thanks,
>>> David
>>>
>>>> So I really think this issue should be fixed.
>>>>
>>>> Best regards,
>>>> Goetz.
>>>>
>>>>
>>>>> -----Original Message-----
>>>>> From: David Holmes [mailto:david.holmes at oracle.com]
>>>>> Sent: Thursday, November 10, 2016 10:02 PM
>>>>> To: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>; hotspot-
>> runtime-
>>>>> dev at openjdk.java.net
>>>>> Subject: Re: RFR(M): 8169373: Work around linux NPTL stack guard error.
>>>>>
>>>>> Hi Goetz,
>>>>>
>>>>> As per the bug report, this issue was already known (6675312) and we
>>>>> chose not to try and address it due to no reported issues at the time.
>>>>> While I see that you have encountered an issue (is it real or
>>>>> fabricated?) I think this change is too intrusive to be applied at this
>>>>> stage of the JDK 9 release cycle, as it will change the stack
>>>>> requirements of every application running on Linux.
>>>>>
>>>>> Thanks,
>>>>> David
>>>>>
>>>>> On 11/11/2016 1:58 AM, Lindenmaier, Goetz wrote:
>>>>>> Hi,
>>>>>>
>>>>>>
>>>>>>
>>>>>> Please review this change. I please need a sponsor:
>>>>>>
>>>>>> http://cr.openjdk.java.net/~goetz/wr16/8169373-ppc-
>> stackFix/webrev.01/
>>>>>>
>>>>>>
>>>>>> In the Linux NPTL pthread implementation the guard size mechanism
>>>>>> is not
>>>>>> implemented properly. The posix standard requires to add the size
>>>>>> of the
>>>>>> guard pages to the stack size, instead Linux takes the space out of
>>>>>> 'stacksize'.
>>>>>>
>>>>>> The Posix standard
>> http://pubs.opengroup.org/onlinepubs/9699919799/
>>>>>> says "the implementation allocates extra memory at the overflow
>>>>>> end of
>>>>>> the stack". The linux man page
>>>>>> https://linux.die.net/man/3/pthread_attr_setguardsize says "As at
>>>>>> glibc
>>>>>> 2.8, the NPTL threading implementation includes the guard area within
>>>>>> the stack size allocation, rather than allocating extra space at
>>>>>> the end
>>>>>> of the stack, as POSIX.1 requires".
>>>>>>
>>>>>> I encounter this problem in runtime/Thread/TooSmallStackSize.java
>>>>>> on ppc
>>>>>> with 64K pages. _compiler_thread_min_stack_allowed is 128K on ppc,
>> and
>>>>>> ppc specifies two OS guard pages. The VM crashes in pthread creation
>>>>>> because there is no usable space in the thread stack after allocating
>>>>>> the guard pages.
>>>>>>
>>>>>> But TooSmallStackSize.java requires that the VM comes up with the
>>>>>> stack
>>>>>> size mentioned in the error message.
>>>>>>
>>>>>> This fix adapts the requested stack size on Linux by the size of the
>>>>>> guard pages to mimick proper behaviour, see change to os_linux.cpp.
>>>>>>
>>>>>>
>>>>>>
>>>>>> The change also streamlines usage of stack_guard_page on linuxppc,
>>>>>> linuxppcle, aixppc and linuxs390.
>>>>>>
>>>>>>
>>>>>>
>>>>>> To reproduce the error on linux_x86_64, apply below patch and call the
>>>>>> VM with -XX:CompilerThreadStackSize=64.
>>>>>>
>>>>>>
>>>>>>
>>>>>> I'm still exploring why I had to choose such big compiler stacks on
>>>>>> ppc
>>>>>> to get -version passing, but I wanted to send the RFR now as people
>>>>>> obviously looked at the bug I opened (Thanks David!).
>>>>>>
>>>>>>
>>>>>>
>>>>>> Best regards,
>>>>>>
>>>>>> Goetz.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> diff -r b7ae012c55c3 src/os_cpu/linux_x86/vm/os_linux_x86.cpp
>>>>>>
>>>>>> --- a/src/os_cpu/linux_x86/vm/os_linux_x86.cpp Mon Nov 07 12:37:28
>>>>> 2016
>>>>>> +0100
>>>>>>
>>>>>> +++ b/src/os_cpu/linux_x86/vm/os_linux_x86.cpp Thu Nov 10
>> 16:52:17
>>>>> 2016
>>>>>> +0100
>>>>>>
>>>>>> @@ -701,7 +701,7 @@
>>>>>>
>>>>>> size_t os::Linux::default_guard_size(os::ThreadType thr_type) {
>>>>>>
>>>>>> // Creating guard page is very expensive. Java thread has HotSpot
>>>>>>
>>>>>> // guard page, only enable glibc guard page for non-Java threads.
>>>>>>
>>>>>> - return (thr_type == java_thread ? 0 : page_size());
>>>>>>
>>>>>> + return (thr_type == java_thread ? 0 : 64*K);
>>>>>>
>>>>>> }
>>>>>>
>>>>>>
>>>>>>
>>>>>> // Java thread:
>>>>>>
More information about the hotspot-runtime-dev
mailing list