RFR: 8170307: Stack size option -Xss is ignored

Wed Nov 30 08:46:47 UTC 2016

On 30/11/2016 6:17 PM, Thomas Stüfe wrote:
> On Wed, Nov 30, 2016 at 8:35 AM, David Holmes
> <david.holmes at oracle.com> wrote:
>
>     On 29/11/2016 10:25 PM, David Holmes wrote:
>
>         I just realized I overlooked the case where ThreadStackSize=0 and the
>         stack is unlimited. In that case it isn't clear where the guard pages
>         will get inserted - I do know that I don't get a stackoverflow error.
>
>         This needs further investigation.
>
>
>     So what happens here is that the massive stack-size causes
>     stack-bottom to be higher than stack-top! So we will set a
>     guard-page goodness knows where, and we can consume the current
>     stack until such time as we hit an unmapped or protected region at
>     which point we are killed.
>
>     I'm not sure what to do here. My gut feel is that in such a case we
>     should not attempt to create a guard page in the initial thread.
>     That would require using a sentinel value for the stack-size. Though
>     it also presents a problem for stack-bottom - which is implicitly
>     zero. It may also give false positives in the is_initial_thread() check!
>
>     Thoughts? Suggestions?
>
>
> Maybe I am overlooking something, but should
> os::capture_initial_thread() not call pthread_getattr_np() first to
> handle the case where the VM was created on a pthread which is not the
> primordial thread and may have a different stack size than what
> getrlimit returns? And fall back to getrlimit only if
> pthread_getattr_np() fails?

My understanding of the problem (which likely no longer exists) is that
pthread_getattr_np didn't fail as such but returned bogus values, so the
problem was not detectable and we simply could not use pthread_getattr_np.
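
As a rough illustration of that concern (a sketch only, not the HotSpot
code): on Linux/glibc one could query the current thread's stack with
pthread_getattr_np and accept the answer only if it passes a basic sanity
check, here that the reported range contains the address of a local
variable. The type stack_range and the helpers range_contains and
query_current_stack are hypothetical names used for illustration. Such a
check would only catch grossly bogus values; a caller would fall back to
getrlimit when it returns false.

```c
#define _GNU_SOURCE            /* pthread_getattr_np is a glibc extension */
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type: a stack occupying the addresses [low, low + size). */
typedef struct {
    uintptr_t low;
    size_t    size;
} stack_range;

/* True when addr lies inside the half-open range described by r. */
static bool range_contains(const stack_range *r, uintptr_t addr) {
    return r->size != 0 && addr >= r->low && addr - r->low < r->size;
}

/*
 * Ask the thread library for the current thread's stack and reject the
 * answer unless it contains the address of one of our own locals.
 * Returns false on failure or on an implausible answer.
 */
static bool query_current_stack(stack_range *out) {
    pthread_attr_t attr;
    void  *low  = NULL;
    size_t size = 0;
    int    probe = 0;          /* lives on the stack we are asking about */

    if (pthread_getattr_np(pthread_self(), &attr) != 0) {
        return false;
    }
    int rc = pthread_attr_getstack(&attr, &low, &size);
    pthread_attr_destroy(&attr);
    if (rc != 0) {
        return false;
    }

    out->low  = (uintptr_t)low;
    out->size = size;
    return range_contains(out, (uintptr_t)&probe);
}
```

On older glibc versions this needs to be linked with -pthread.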

> And then we also should handle
> RLIM_INFINITY. For that case, I also think not setting guard pages would
> be safest.
>
> We also may just refuse to run in that case, because the workaround for
> the user is easy - just set the limit before process start. Note that on
> AIX, we currently refuse to run on the primordial thread because it may
> have different page sizes than pthreads and it is impossible to get the
> exact stack locations.

I was wondering why the AIX set up seemed so simple in comparison :)
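
On the RLIM_INFINITY case, the "stack-bottom higher than stack-top" effect
is plain unsigned wrap-around. A minimal sketch of the arithmetic, assuming
a downward-growing stack; stack_bottom and can_place_guard are hypothetical
helpers, not functions in the VM, and the policy shown (skip the guard page)
is only one of the options discussed above:

```c
#include <stdbool.h>
#include <stdint.h>
#include <sys/resource.h>

/*
 * Lowest address of a downward-growing stack. The subtraction is unsigned,
 * so a size larger than stack_top wraps around and yields a "bottom" that
 * is numerically above the top.
 */
static uintptr_t stack_bottom(uintptr_t stack_top, uintptr_t stack_size) {
    return stack_top - stack_size;
}

/*
 * Decide whether a guard page can be placed sensibly for the given limit.
 * An unlimited, zero, or larger-than-address limit gives no usable bottom.
 */
static bool can_place_guard(uintptr_t stack_top, rlim_t limit) {
    if (limit == RLIM_INFINITY) {
        return false;          /* unlimited: no meaningful stack bottom */
    }
    if (limit == 0 || limit > stack_top) {
        return false;          /* the subtraction would be empty or wrap */
    }
    return true;
}
```

A caller would obtain the limit from getrlimit(RLIMIT_STACK, ...) and check
can_place_guard before computing the guard page address.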

Thanks,
David

>
> Thomas
>
>
>
>         David
>
>         On 29/11/2016 9:59 PM, David Holmes wrote:
>
>             Hi Thomas,
>
>             On 29/11/2016 8:39 PM, Thomas Stüfe wrote:
>
>                 Hi David,
>
>                 thanks for the good explanation. Change looks good, I really
>                 like the comment in capture_initial_stack().
>
>                 Question: with -Xss given and being smaller than the current
>                 thread stack size, guard pages may appear in the middle of the
>                 invoking thread's stack? I always thought this is a bit
>                 dangerous. If your model is to have the VM created from the
>                 main thread, which then goes off to do different things, and
>                 have other threads then attach and run Java code, the main
>                 thread may later crash in unrelated native code just because
>                 it reached the stack depth of the Java threads? Or am I
>                 misunderstanding something?
>
>
>             There is no change to the general behaviour other than allowing a
>             primordial process thread that launches the VM to no longer have
>             an effective stack limit of 2MB. The current logic will insert
>             guard pages wherever -Xss states (as long as that is less than
>             2MB, else at 2MB), while with the fix the guard pages can be
>             inserted above 2MB - as dictated by -Xss.
>
>             David
>             -----
>
>                 Thanks, Thomas
>
>
>                 On Fri, Nov 25, 2016 at 11:38 AM, David Holmes
>                 <david.holmes at oracle.com> wrote:
>
>                     Bug: https://bugs.openjdk.java.net/browse/JDK-8170307
>
>                     The bug is not public unfortunately for non-technical
>                     reasons - but see my eval below.
>
>                     Background: if you load the JVM from the primordial thread
>                     of a process (not done by the java launcher since JDK 6),
>                     there is an artificial stack limit imposed on the initial
>                     thread (by sticking the guard page at the limit position
>                     of the actual stack) of the minimum of the -Xss setting
>                     and 2M. So if you set -Xss to > 2M it is ignored for the
>                     main thread even if the true stack is, say, 8M. This
>                     limitation dates back 10-15 years and is no longer
>                     relevant today and should be removed (see below). I've
>                     also added additional explanatory notes.
>
>                     webrev: http://cr.openjdk.java.net/~dholmes/8170307/webrev/
>
>                     Testing was manually done by modifying the launcher to not
>                     run the VM in a new thread, and checking the resulting
>                     stack size used.
>
>                     This change will only affect hosted JVMs launched with a
>                     -Xss value > 2M.
>
>                     Thanks,
>                     David
>                     -----
>
>                     Bug eval:
>
>                     JDK-4441425 limited the stack to 8M in 1.3.1 as a
>                     safeguard against an unlimited value from getrlimit; that
>                     was further constrained to 2M in 1.4.0 due to JDK-4466587.
>
>                     By 1.4.2 we have the basic form of the current
>                     problematic code:
>
>                     #ifndef IA64
>                       if (rlim.rlim_cur > 2 * K * K) rlim.rlim_cur = 2 * K * K;
>                     #else
>                       // Problem still exists RH7.2 (IA64 anyway) but 2MB is a little small
>                       if (rlim.rlim_cur > 4 * K * K) rlim.rlim_cur = 4 * K * K;
>                     #endif
>
>                       _initial_thread_stack_size = rlim.rlim_cur & ~(page_size() - 1);
>
>                       if (max_size && _initial_thread_stack_size > max_size) {
>                          _initial_thread_stack_size = max_size;
>                       }
>
>                     This was added by JDK-4678676 to allow the stack of the
>                     main thread to be _reduced_ below the default 2M/4M if the
>                     -Xss value was smaller than that.** There was no intent to
>                     allow the stack size to follow -Xss arbitrarily due to the
>                     operational constraints imposed by the OS/glibc at the
>                     time when dealing with the primordial process thread.
>
>                     ** It could not actually change the actual stack size of
>                     course, but set the guard pages to limit use to the
>                     expected stack size.
>
>                     In JDK 6, under JDK-6316197, the launcher was changed to
>                     create the JVM in a new thread, so that it was not limited
>                     by the idiosyncrasies of the OS or thread library
>                     primordial thread handling. However, the stack size
>                     limitations remained in place in case the VM was launched
>                     from the primordial thread of a user application via the
>                     JNI invocation API.
>
>                     I believe it should be safe to remove the 2M limitation
>                     now.
>
>
>