(RFR)(S)(10): 8176768: hotspot ignores PTHREAD_STACK_MIN when creating new threads
Thomas Stüfe
thomas.stuefe at gmail.com
Sat Mar 18 07:23:17 UTC 2017
Hi Chris,
On Sat, Mar 18, 2017 at 7:37 AM, Chris Plummer <chris.plummer at oracle.com>
wrote:
> On 3/17/17 8:17 PM, Chris Plummer wrote:
>
>> On 3/17/17 7:01 PM, David Holmes wrote:
>>
>>> On 18/03/2017 9:11 AM, Chris Plummer wrote:
>>>
>>>> Looks like this will need some more work since the added asserts are
>>>> triggering on mac os x (which is the only place we'd currently expect
>>>> them to assert).
>>>>
>>>> The problem is that the code I found that rounds up to the page size is
>>>> only applied to java threads created by the VM for which the java user
>>>> specified no stack size. The VM and Compiler thread sizes are not
>>>> rounded. The failure I saw was with
>>>> runtime/CommandLine/OptionsValidation/TestOptionsWithRanges.java when
>>>> is
>>>> specified -XX:CompilerThreadStackSize=9007199254740991. I hit the
>>>> assert
>>>> with an EINVAL. The size is not aligned, but it could also be
>>>> complaining because it is too big. I haven't tried aligning it yet to
>>>> see.
>>>>
>>>> On Linux we do the following:
>>>>
>>>> stack_size = align_size_up(stack_size +
>>>> os::Linux::default_guard_size(thr_type), vm_page_size());
>>>>
>>>> We don't do this on BSD. I think the same on BSD would solve this
>>>> problem. I'm not sure about adding the guard size. I'll need to see if
>>>> mac os x has the same pthread bug as linux does.
>>>>
>>>
>>> At this stage I would only deal with alignment issues. IIRC the guard
>>> issue only affected Linux.
>>>
>> Yes, that's what I eventually concluded. I put the fix in
>> os::Posix::get_initial_stack_size() in os_posix.cpp, and only did the
>> page aligning, not add the guard page. That way all Posix ports are fixed
>> in one place. It seems to be working.
>>
>>>
>>> BTW, did you know java users can specify the size of the a new thread's
>>>> stack:
>>>>
>>>
>>> Yes I mentioned that in another reply - wondering whether we suitably
>>> check and aligned such requests.
>>>
>> No we don't. Below I mentioned I was able to trigger the assert with a
>> 257k stack size. I guess I wasn't clear that I did that from Java. I have a
>> new test to add that will be testing this, plus the 9007199254740991 stack
>> size (which fails to create the thread with an OOME, but that's
>> acceptable). The fix I mention above in os::Posix::get_initial_stack_size()
>> takes care of this issue also.
>>
> Rounding up triggers a new assert, this time on 32-bit x86 and arm.
>
> I should have clarified it's 9007199254740991 "K", which is
> 9223372036854774784. Unfortunately on 32bit systems that is asserting with
> EINVAL. I think we need to do a better job of dealing with 32-bit size_t
> values:
>
> jlong java_lang_Thread::stackSize(oop java_thread) {
> if (_stackSize_offset > 0) {
> return java_thread->long_field(_stackSize_offset);
> } else {
> return 0;
> }
> }
>
> jlong size =
> java_lang_Thread::stackSize(JNIHandles::resolve_non_null(jthread));
> // Allocate the C++ Thread structure and create the native thread.
> The
> // stack size retrieved from java is signed, but the constructor
> takes
> // size_t (an unsigned type), so avoid passing negative values which
> would
> // result in really large stacks.
> size_t sz = size > 0 ? (size_t) size : 0;
> native_thread = new JavaThread(&thread_entry, sz);
>
> 9223372036854774784 is 0x7ffffffffffffc00 (close to 64 bit MAX_INT), which
> is 0xfffffc00 when cast to a size_t on a 32-bit system (close to 32-bit
> MAX_UINT). Round it up to the 4k page size and you get 0, which I guess
> pthread_attr_setstacksize() doesn't like. So I think more processing of the
> size is needed here. Maybe in os::create_thread() we should check for 0
> after rounding up, and subtract the os page size if it is 0. However, I
> think we should also avoid truncating on 32-bit to what is basically some
> random number. Maybe if "size" (a jlong) is greater than UINT_MAX, then
> just set "sz" (a size_t) it to UINT_MAX.
>
> Chris
>
>
IMHO both solutions are fine: if cast to size_t would truncate the value,
either to cap the value to SIZE_MAX (to be portable, instead of UINT_MAX),
maybe minus one page size to forestall any rounding up. Then let the OS
fail with ENOMEM and handle that as usual. Alternatively fail immediately
with OOME, which will happen anyway, so we save some cycles.
Funny, I never get suspicious of (size_t) casts, because I am too used to
64bit. I did a quick grep for (size_t) casts in the hotspot sources, there
are quite a few. Probably overwhelmingly harmless, but it may be worth
checking them nevertheless.
... (took a cursory glance at the grep results, found one example of
casting jlong to size_t, see JVM_ENTRY
<http://ld8443:8080/source/s?defs=JVM_ENTRY&project=openjdk-jdk10-hs>(jlong
<http://ld8443:8080/source/s?defs=jlong&project=openjdk-jdk10-hs>,
jmm_SetPoolThreshold
<http://ld8443:8080/source/s?defs=jmm_SetPoolThreshold&project=openjdk-jdk10-hs>
(JNIEnv <http://ld8443:8080/source/s?defs=JNIEnv&project=openjdk-jdk10-hs>*
env
<http://ld8443:8080/source/xref/openjdk-jdk10-hs/hotspot/src/share/vm/services/management.cpp#env>,
jobject <http://ld8443:8080/source/s?defs=jobject&project=openjdk-jdk10-hs>
obj <http://ld8443:8080/source/s?defs=obj&project=openjdk-jdk10-hs>,
jmmThresholdType
<http://ld8443:8080/source/s?defs=jmmThresholdType&project=openjdk-jdk10-hs>
type <http://ld8443:8080/source/s?defs=type&project=openjdk-jdk10-hs>, jlong
<http://ld8443:8080/source/s?defs=jlong&project=openjdk-jdk10-hs> threshold
<http://ld8443:8080/source/s?defs=threshold&project=openjdk-jdk10-hs>)) in
management.cpp). So, its an issue.
Kind Regards, Thomas
>
>
>>> public Thread(ThreadGroup group, Runnable target, String name,
>>>> long stackSize) {
>>>> init(group, target, name, stackSize);
>>>> }
>>>>
>>>> Fortunately we already force the stackSize to be at least
>>>> _java_thread_min_stack_allowed. However, we don't do any OS page
>>>> rounding on Mac OS X as noted above, and I was able to trigger the
>>>> assert by creating a thread with size 257k.
>>>>
>>>
>>> Note this means that OSX stack logic is broken because it will currently
>>> be silently failing due to EINVAL!
>>>
>> Yes, that is correct.
>>
>> Chris
>>
>>>
>>> I'll get another webrev out once I've made the needed fixes. I also have
>>>> a new test I'd like to add.
>>>>
>>>
>>> Ok.
>>>
>>> Thanks,
>>> David
>>>
>>> Chris
>>>>
>>>> On 3/16/17 9:27 PM, Chris Plummer wrote:
>>>>
>>>>> Ok, time for a new webrev:
>>>>>
>>>>> http://cr.openjdk.java.net/~cjplummer/8176768/webrev.01/webrev.hotspot
>>>>>
>>>>> The only thing that has changed since the first webrev are the asserts
>>>>> added to os_linux.cpp and os_bsd.cpp. And to summarize what we discuss
>>>>> already:
>>>>>
>>>>> - The assert should never happen due to the stack size being too
>>>>> small since it will be at least PTHREAD_STACK_MIN.
>>>>> - The assert should never happen due to an unaligned stack size
>>>>> because we always align it to the page size.
>>>>> - Any assert would therefore be a VM bug and not due to user error.
>>>>> - No fixing the java launcher. If the user specifies a stack that is
>>>>> too small, hotspot will already detect this. If the user specifies a
>>>>> stack size that is large enough but not page aligned, then we just
>>>>> ignore any error (if the platform doth protest) and the user gets a
>>>>> main thread with a stack size set to whatever the OS default is.
>>>>>
>>>>> I still need to retest (I only ran TooSmallStackSize.java), but figure
>>>>> getting agreement on the changes first would be best before I bog down
>>>>> our testing resources.
>>>>>
>>>>> thanks,
>>>>>
>>>>> Chris
>>>>>
>>>>> On 3/15/17 10:03 PM, Chris Plummer wrote:
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> Please review the following:
>>>>>>
>>>>>> https://bugs.openjdk.java.net/browse/JDK-8176768
>>>>>> http://cr.openjdk.java.net/~cjplummer/8176768/webrev.00/webr
>>>>>> ev.hotspot
>>>>>>
>>>>>> While working on 8175342 I noticed our stack size on xgene was 8mb
>>>>>> even though I was specifying -Xss72k. It turns out the following code
>>>>>> was failing:
>>>>>>
>>>>>> pthread_attr_setstacksize(&attr, stack_size);
>>>>>>
>>>>>> Although we computed a minimum stack size of 72k, so -Xss72k should
>>>>>> be fine, pthreads on this platform requires the stack be at least
>>>>>> 128k, so it failed the pthread_attr_setstacksize() call. The end
>>>>>> result is pthread_attr_setstacksize() had no impact on the thread's
>>>>>> stack size, and we ended up with the platform default of 8mb. The fix
>>>>>> is to round up the following variables to PTHREAD_STACK_MIN after
>>>>>> computing their new values:
>>>>>>
>>>>>> _java_thread_min_stack_allowed
>>>>>> _compiler_thread_min_stack_allowed
>>>>>> _vm_internal_thread_min_stack_allowed
>>>>>>
>>>>>> For solaris, there was an issue using PTHREAD_STACK_MIN. You need to
>>>>>> #define _POSIX_C_SOURCE >= 199506L in order to get PTHREAD_STACK_MIN
>>>>>> #defined, and this needs to be done before including OS header files.
>>>>>> I noticed that on solaris we were using thr_min_stack() elsewhere
>>>>>> instead of PTHREAD_STACK_MIN, so I decided to do the same with this
>>>>>> fix. Either way is ugly (the #define or using thr_min_stack()).
>>>>>>
>>>>>> And speaking of the existing use of thr_min_stack(), I deleted it. It
>>>>>> was being applied before any adjustments to the stack sizes had been
>>>>>> made (rounding and adding red, yellow, and shadow zones). This mean
>>>>>> the stack ended up being larger than necessary. With the above fix in
>>>>>> place, we are now applying thr_min_stack() after recomputing the
>>>>>> minimum stack sizes. If for any reason one of those stack sizes is
>>>>>> now too small, the correct fix is to adjust the initial stack sizes,
>>>>>> not apply thr_min_stack() to the initial stack sizes. However, it
>>>>>> looks like no adjustment is needed. I did something close to our
>>>>>> nightly testing on all affect platforms, and no new problems turned
>>>>>> up.
>>>>>>
>>>>>> thanks,
>>>>>>
>>>>>> Chris
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>
>
More information about the hotspot-dev
mailing list