RFR: 8170307: Stack size option -Xss is ignored

Fri Dec 2 05:51:00 UTC 2016

Investigating this further the history is quite complex, especially when 
we start looking at other platforms. E.g. see

https://bugs.openjdk.java.net/browse/JDK-6269555

Solaris actually hard-wires an 8MB limit for the primordial thread.

I'm very tempted to do the same on Linux.

David
-----

On 30/11/2016 6:46 PM, David Holmes wrote:
> On 30/11/2016 6:17 PM, Thomas Stüfe wrote:
>> On Wed, Nov 30, 2016 at 8:35 AM, David Holmes <david.holmes at oracle.com
>> <mailto:david.holmes at oracle.com>> wrote:
>>
>>     On 29/11/2016 10:25 PM, David Holmes wrote:
>>
>>         I just realized I overlooked the case where ThreadStackSize=0
>>         and the
>>         stack is unlimited. In that case it isn't clear where the guard
>>         pages
>>         will get inserted - I do know that I don't get a stackoverflow
>>         error.
>>
>>         This needs further investigation.
>>
>>
>>     So what happens here is that the massive stack-size causes
>>     stack-bottom to be higher than stack-top! So we will set a
>>     guard-page goodness knows where, and we can consume the current
>>     stack until such time as we hit an unmapped or protected region at
>>     which point we are killed.
>>
>>     I'm not sure what to do here. My gut feel is that in such a case we
>>     should not attempt to create a guard page in the initial thread.
>>     That would require using a sentinel value for the stack-size. Though
>>     it also presents a problem for stack-bottom - which is implicitly
>>     zero. It may also give false positives in the is_initial_thread()
>> check!
>>
>>     Thoughts? Suggestions?
>>
>>
>> Maybe I am overlooking something, but should
>> os::capture_initial_thread() not call pthread_getattr_np() first to
>> handle the case where the VM was created on a pthread which is not the
>> primordial thread and may have a different stack size than what
>> getrlimit returns? And fall back to getrlimit only if
>> pthread_getattr_np() fails?
>
> My understanding of the problem (which likely no longer exists) is that
> pthread_getattr_np didn't fail as such but returned bogus values - so
> the problem was not detectable and so we just had to not use
> pthread_getattr_np.
>
>> And then we also should handle
>> RLIM_INFINITY. For that case, I also think not setting guard pages would
>> be safest.
>>
>> We also may just refuse to run in that case, because the workaround for
>> the user is easy - just set the limit before process start. Note that on
>> AIX, we currently refuse to run on the primordial thread because it may
>> have different page sizes than pthreads and it is impossible to get the
>> exact stack locations.
>
> I was wondering why the AIX set up seemed so simple in comparison :)
>
> Thanks,
> David
>
>>
>> Thomas
>>
>>
>>
>>         David
>>
>>         On 29/11/2016 9:59 PM, David Holmes wrote:
>>
>>             Hi Thomas,
>>
>>             On 29/11/2016 8:39 PM, Thomas Stüfe wrote:
>>
>>                 Hi David,
>>
>>                 thanks for the good explanation. Change looks good, I
>>                 really like the
>>                 comment in capture_initial_stack().
>>
>>                 Question, with -Xss given and being smaller than current
>>                 thread stack
>>                 size, guard pages may appear in the middle of the
>>                 invoking thread stack?
>>                 I always thought this is a bit dangerous. If your model
>>                 is to have the
>>                 VM created from the main thread, which then goes off to
>>                 do different
>>                 things, and have other threads then attach and run java
>>                 code, main
>>                 thread later may crash in unrelated native code just
>>                 because it reached
>>                 the stack depth of the hava threads? Or am I
>>                 misunderstanding something?
>>
>>
>>             There is no change to the general behaviour other than
>>             allowing a
>>             primordial process thread that launches the VM, to now not
>>             have an
>>             effective stack limited at 2MB. The current logic will
>>             insert guard
>>             pages where ever -Xss states (as long as less than 2MB else
>>             2MB), while
>>             with the fix the guard pages will be inserted above 2MB - as
>>             dictated by
>>             -Xss.
>>
>>             David
>>             -----
>>
>>                 Thanks, Thomas
>>
>>
>>                 On Fri, Nov 25, 2016 at 11:38 AM, David Holmes
>>                 <david.holmes at oracle.com <mailto:david.holmes at oracle.com>
>>                 <mailto:david.holmes at oracle.com
>>                 <mailto:david.holmes at oracle.com>>> wrote:
>>
>>                     Bug:
>>                 https://bugs.openjdk.java.net/browse/JDK-8170307
>>                 <https://bugs.openjdk.java.net/browse/JDK-8170307>
>>                     <https://bugs.openjdk.java.net/browse/JDK-8170307
>>                 <https://bugs.openjdk.java.net/browse/JDK-8170307>>
>>
>>                     The bug is not public unfortunately for
>>                 non-technical reasons - but
>>                     see my eval below.
>>
>>                     Background: if you load the JVM from the primordial
>>                 thread of a
>>                     process (not done by the java launcher since JDK 6),
>>                 there is an
>>                     artificial stack limit imposed on the initial thread
>>                 (by sticking
>>                     the guard page at the limit position of the actual
>>                 stack) of the
>>                     minimum of the -Xss setting and 2M. So if you set
>>                 -Xss to > 2M it is
>>                     ignored for the main thread even if the true stack
>>                 is, say, 8M. This
>>                     limitation dates back 10-15 years and is no longer
>>                 relevant today
>>                     and should be removed (see below). I've also added
>>                 additional
>>                     explanatory notes.
>>
>>                     webrev:
>>                 http://cr.openjdk.java.net/~dholmes/8170307/webrev/
>>                 <http://cr.openjdk.java.net/~dholmes/8170307/webrev/>
>>                     <http://cr.openjdk.java.net/~dholmes/8170307/webrev/
>>                 <http://cr.openjdk.java.net/~dholmes/8170307/webrev/>>
>>
>>                     Testing was manually done by modifying the launcher
>>                 to not run the
>>                     VM in a new thread, and checking the resulting stack
>>                 size used.
>>
>>                     This change will only affect hosted JVMs launched
>>                 with a -Xss value
>>                     > 2M.
>>
>>                     Thanks,
>>                     David
>>                     -----
>>
>>                     Bug eval:
>>
>>                     JDK-4441425 limits the stack to 8M as a safeguard
>>                 against an
>>                     unlimited value from getrlimit in 1.3.1, but further
>>                 constrained
>>                     that to 2M in 1.4.0 due to JDK-4466587.
>>
>>                     By 1.4.2 we have the basic form of the current
>>                 problematic code:
>>
>>                     #ifndef IA64
>>                       if (rlim.rlim_cur > 2 * K * K) rlim.rlim_cur = 2 *
>>                 K * K;
>>                     #else
>>                       // Problem still exists RH7.2 (IA64 anyway) but
>>                 2MB is a little
>>                 small
>>                       if (rlim.rlim_cur > 4 * K * K) rlim.rlim_cur = 4 *
>>                 K * K;
>>                     #endif
>>
>>                       _initial_thread_stack_size = rlim.rlim_cur &
>>                 ~(page_size() - 1);
>>
>>                       if (max_size && _initial_thread_stack_size >
>>                 max_size) {
>>                          _initial_thread_stack_size = max_size;
>>                       }
>>
>>                     This was added by JDK-4678676 to allow the stack of
>>                 the main thread
>>                     to be _reduced_ below the default 2M/4M if the -Xss
>>                 value was
>>                     smaller than that.** There was no intent to allow
>>                 the stack size to
>>                     follow -Xss arbitrarily due to the operational
>>                 constraints imposed
>>                     by the OS/glibc at the time when dealing with the
>>                 primordial process
>>                     thread.
>>
>>                     ** It could not actually change the actual stack
>>                 size of course, but
>>                     set the guard pages to limit use to the expected
>>                 stack size.
>>
>>                     In JDK 6, under JDK-6316197, the launcher was
>>                 changed to create the
>>                     JVM in a new thread, so that it was not limited by
>> the
>>                     idiosyncracies of the OS or thread library
>>                 primordial thread
>>                     handling. However, the stack size limitations
>>                 remained in place in
>>                     case the VM was launched from the primordial thread
>>                 of a user
>>                     application via the JNI invocation API.
>>
>>                     I believe it should be safe to remove the 2M
>>                 limitation now.
>>
>>
>>