RFR: 8170307: Stack size option -Xss is ignored

Daniel D. Daugherty daniel.daugherty at oracle.com
Fri Dec 2 23:12:52 UTC 2016


On 12/1/16 10:51 PM, David Holmes wrote:
> Investigating this further the history is quite complex, especially 
> when we start looking at other platforms. E.g. see
>
> https://bugs.openjdk.java.net/browse/JDK-6269555
>
> Solaris actually hard-wires an 8MB limit for the primordial thread.
>
> I'm very tempted to do the same on Linux.

Vote: yes

This latest problem only comes up with -XX:ThreadStackSize=0 when the
stack is unlimited right?

When -XX:ThreadStackSize=0 is specified, is taking the smaller of
8MB or the ulimit a viable option?

Also, it looks like Hui had some things to say about not setting the
red/yellow zone pages on the primordial thread when we aren't using the
'java' launcher because we don't know the environment of the code that
is using the JNI invocation API...

Dan


>
> David
> -----
>
> On 30/11/2016 6:46 PM, David Holmes wrote:
>> On 30/11/2016 6:17 PM, Thomas Stüfe wrote:
>>> On Wed, Nov 30, 2016 at 8:35 AM, David Holmes <david.holmes at oracle.com
>>> <mailto:david.holmes at oracle.com>> wrote:
>>>
>>>     On 29/11/2016 10:25 PM, David Holmes wrote:
>>>
>>>         I just realized I overlooked the case where ThreadStackSize=0
>>>         and the
>>>         stack is unlimited. In that case it isn't clear where the guard
>>>         pages
>>>         will get inserted - I do know that I don't get a stackoverflow
>>>         error.
>>>
>>>         This needs further investigation.
>>>
>>>
>>>     So what happens here is that the massive stack-size causes
>>>     stack-bottom to be higher than stack-top! So we will set a
>>>     guard-page goodness knows where, and we can consume the current
>>>     stack until such time as we hit an unmapped or protected region at
>>>     which point we are killed.
>>>
>>>     I'm not sure what to do here. My gut feel is that in such a case we
>>>     should not attempt to create a guard page in the initial thread.
>>>     That would require using a sentinel value for the stack-size. 
>>> Though
>>>     it also presents a problem for stack-bottom - which is implicitly
>>>     zero. It may also give false positives in the is_initial_thread()
>>> check!
>>>
>>>     Thoughts? Suggestions?
>>>
>>>
>>> Maybe I am overlooking something, but should
>>> os::capture_initial_thread() not call pthread_getattr_np() first to
>>> handle the case where the VM was created on a pthread which is not the
>>> primordial thread and may have a different stack size than what
>>> getrlimit returns? And fall back to getrlimit only if
>>> pthread_getattr_np() fails?
>>
>> My understanding of the problem (which likely no longer exists) is that
>> pthread_getattr_np didn't fail as such but returned bogus values - so
>> the problem was not detectable and so we just had to not use
>> pthread_getattr_np.
>>
>>> And then we also should handle
>>> RLIM_INFINITY. For that case, I also think not setting guard pages 
>>> would
>>> be safest.
>>>
>>> We also may just refuse to run in that case, because the workaround for
>>> the user is easy - just set the limit before process start. Note 
>>> that on
>>> AIX, we currently refuse to run on the primordial thread because it may
>>> have different page sizes than pthreads and it is impossible to get the
>>> exact stack locations.
>>
>> I was wondering why the AIX set up seemed so simple in comparison :)
>>
>> Thanks,
>> David
>>
>>>
>>> Thomas
>>>
>>>
>>>
>>>         David
>>>
>>>         On 29/11/2016 9:59 PM, David Holmes wrote:
>>>
>>>             Hi Thomas,
>>>
>>>             On 29/11/2016 8:39 PM, Thomas Stüfe wrote:
>>>
>>>                 Hi David,
>>>
>>>                 thanks for the good explanation. Change looks good, I
>>>                 really like the
>>>                 comment in capture_initial_stack().
>>>
>>>                 Question, with -Xss given and being smaller than 
>>> current
>>>                 thread stack
>>>                 size, guard pages may appear in the middle of the
>>>                 invoking thread stack?
>>>                 I always thought this is a bit dangerous. If your model
>>>                 is to have the
>>>                 VM created from the main thread, which then goes off to
>>>                 do different
>>>                 things, and have other threads then attach and run java
>>>                 code, main
>>>                 thread later may crash in unrelated native code just
>>>                 because it reached
>>>                 the stack depth of the hava threads? Or am I
>>>                 misunderstanding something?
>>>
>>>
>>>             There is no change to the general behaviour other than
>>>             allowing a
>>>             primordial process thread that launches the VM, to now not
>>>             have an
>>>             effective stack limited at 2MB. The current logic will
>>>             insert guard
>>>             pages where ever -Xss states (as long as less than 2MB else
>>>             2MB), while
>>>             with the fix the guard pages will be inserted above 2MB 
>>> - as
>>>             dictated by
>>>             -Xss.
>>>
>>>             David
>>>             -----
>>>
>>>                 Thanks, Thomas
>>>
>>>
>>>                 On Fri, Nov 25, 2016 at 11:38 AM, David Holmes
>>>                 <david.holmes at oracle.com 
>>> <mailto:david.holmes at oracle.com>
>>>                 <mailto:david.holmes at oracle.com
>>>                 <mailto:david.holmes at oracle.com>>> wrote:
>>>
>>>                     Bug:
>>> https://bugs.openjdk.java.net/browse/JDK-8170307
>>> <https://bugs.openjdk.java.net/browse/JDK-8170307>
>>> <https://bugs.openjdk.java.net/browse/JDK-8170307
>>> <https://bugs.openjdk.java.net/browse/JDK-8170307>>
>>>
>>>                     The bug is not public unfortunately for
>>>                 non-technical reasons - but
>>>                     see my eval below.
>>>
>>>                     Background: if you load the JVM from the primordial
>>>                 thread of a
>>>                     process (not done by the java launcher since JDK 
>>> 6),
>>>                 there is an
>>>                     artificial stack limit imposed on the initial 
>>> thread
>>>                 (by sticking
>>>                     the guard page at the limit position of the actual
>>>                 stack) of the
>>>                     minimum of the -Xss setting and 2M. So if you set
>>>                 -Xss to > 2M it is
>>>                     ignored for the main thread even if the true stack
>>>                 is, say, 8M. This
>>>                     limitation dates back 10-15 years and is no longer
>>>                 relevant today
>>>                     and should be removed (see below). I've also added
>>>                 additional
>>>                     explanatory notes.
>>>
>>>                     webrev:
>>> http://cr.openjdk.java.net/~dholmes/8170307/webrev/
>>> <http://cr.openjdk.java.net/~dholmes/8170307/webrev/>
>>> <http://cr.openjdk.java.net/~dholmes/8170307/webrev/
>>> <http://cr.openjdk.java.net/~dholmes/8170307/webrev/>>
>>>
>>>                     Testing was manually done by modifying the launcher
>>>                 to not run the
>>>                     VM in a new thread, and checking the resulting 
>>> stack
>>>                 size used.
>>>
>>>                     This change will only affect hosted JVMs launched
>>>                 with a -Xss value
>>>                     > 2M.
>>>
>>>                     Thanks,
>>>                     David
>>>                     -----
>>>
>>>                     Bug eval:
>>>
>>>                     JDK-4441425 limits the stack to 8M as a safeguard
>>>                 against an
>>>                     unlimited value from getrlimit in 1.3.1, but 
>>> further
>>>                 constrained
>>>                     that to 2M in 1.4.0 due to JDK-4466587.
>>>
>>>                     By 1.4.2 we have the basic form of the current
>>>                 problematic code:
>>>
>>>                     #ifndef IA64
>>>                       if (rlim.rlim_cur > 2 * K * K) rlim.rlim_cur = 
>>> 2 *
>>>                 K * K;
>>>                     #else
>>>                       // Problem still exists RH7.2 (IA64 anyway) but
>>>                 2MB is a little
>>>                 small
>>>                       if (rlim.rlim_cur > 4 * K * K) rlim.rlim_cur = 
>>> 4 *
>>>                 K * K;
>>>                     #endif
>>>
>>>                       _initial_thread_stack_size = rlim.rlim_cur &
>>>                 ~(page_size() - 1);
>>>
>>>                       if (max_size && _initial_thread_stack_size >
>>>                 max_size) {
>>>                          _initial_thread_stack_size = max_size;
>>>                       }
>>>
>>>                     This was added by JDK-4678676 to allow the stack of
>>>                 the main thread
>>>                     to be _reduced_ below the default 2M/4M if the -Xss
>>>                 value was
>>>                     smaller than that.** There was no intent to allow
>>>                 the stack size to
>>>                     follow -Xss arbitrarily due to the operational
>>>                 constraints imposed
>>>                     by the OS/glibc at the time when dealing with the
>>>                 primordial process
>>>                     thread.
>>>
>>>                     ** It could not actually change the actual stack
>>>                 size of course, but
>>>                     set the guard pages to limit use to the expected
>>>                 stack size.
>>>
>>>                     In JDK 6, under JDK-6316197, the launcher was
>>>                 changed to create the
>>>                     JVM in a new thread, so that it was not limited by
>>> the
>>>                     idiosyncracies of the OS or thread library
>>>                 primordial thread
>>>                     handling. However, the stack size limitations
>>>                 remained in place in
>>>                     case the VM was launched from the primordial thread
>>>                 of a user
>>>                     application via the JNI invocation API.
>>>
>>>                     I believe it should be safe to remove the 2M
>>>                 limitation now.
>>>
>>>
>>>



More information about the hotspot-dev mailing list