RFR: 8225035: Thread stack size issue caused by large TLS size

David Holmes david.holmes at oracle.com
Wed Jul 3 06:59:51 UTC 2019


Hi Jiangli,

On 2/07/2019 8:33 am, Jiangli Zhou wrote:
> Hi David,
> 
> On Mon, Jul 1, 2019 at 1:22 AM David Holmes <david.holmes at oracle.com> wrote:
>>
>> Hi Jiangli,
>>
>> On 29/06/2019 9:42 am, Jiangli Zhou wrote:
>>> Hi David,
>>>
>>> Thanks for the detailed comments! Here is the latest webrev:
>>> http://cr.openjdk.java.net/~jiangli/8225035/webrev.04/. Apologizing
>>> for not including an incremental webrev (realized that when I almost
>>> done edits).
>>
>> That all looks fine - thanks for making the changes.
>>
>> However ... now that I see the logging output it occurred to me that
>> checking for the TLS adjustment is something that should only happen
>> once and we should be storing the adjustment amount in a static for
>> direct use. Sorry I didn't think about this earlier. Something like:
> 
> No problem at all! I actually thought in the same direction as well
> when making the change initially but didn't go with it as I had some
> concerns. Florian's latest reply is reassuring (thanks again!). So
> here is the update:
> http://cr.openjdk.java.net/~jiangli/8225035/webrev_inc.05/.

The incremental change looks fine to me. To address Florian's concern I 
suggest adding an additional comment:

   // Returns the size of the static TLS area glibc puts on thread stacks.
+ // The value is cached on first use, which occurs when the first thread
+ // is created during VM initialization.
   static size_t get_static_tls_area_size(const pthread_attr_t *attr) {

> Since we are settling down on the approach and final implementation
> details, it would be a good idea to get the CSR ball rolling. Could
> you please review and I'll finalize the CSR. Thanks!

Done.

Thanks,
David

> Best regards,
> Jiangli
> 
>>
>> static size_t tls_size = 0;
>> static bool tls_size_inited = false;
>>
>> static size_t get_static_tls_area_size(const pthread_attr_t *attr) {
>> +  if (!tls_size_inited) {
>> +    tls_size_inited = true;
>>        if (_get_minstack_func != NULL) {
>>          size_t minstack_size = _get_minstack_func(attr);
>>          ...
>>          if (minstack_size > (size_t)os::vm_page_size() +
>> PTHREAD_STACK_MIN) {
>>            tls_size = minstack_size - os::vm_page_size() - PTHREAD_STACK_MIN;
>>          }
>>       }
>> + }
>>     log_info(os, thread)("Stack size adjustment for TLS is " SIZE_FORMAT,
>>                          tls_size);
>>     return tls_size;
>> }
>>
>> Or even fold it all into get_minstack_init() ?
>>
>> I'm assuming that the result of __pthread_get_minstack wont' change over
>> time of course.
>>
>> Thanks,
>> David
>> -----
>>
>>> On Fri, Jun 28, 2019 at 8:57 AM David Holmes <david.holmes at oracle.com> wrote:
>>>>
>>>> Hi Jiangli,
>>>>
>>>> This is very well written up - thanks.
>>>>
>>>> I apologize in advance that I'm about to traveling for a few days so
>>>> won't be able to respond further until next week.
>>>
>>> Hope you have a relaxed and safe travel. This can wait.
>>>
>>>>
>>>> On 27/06/2019 11:58 pm, Jiangli Zhou wrote:
>>>>> Updated webrev:
>>>>> http://cr.openjdk.java.net/~jiangli/8225035/webrev.03/
>>>>
>>>> Overall changes look good. I also have a concern around this code:
>>>>
>>>> +     tls_size = minstack_size - os::vm_page_size() - PTHREAD_STACK_MIN;
>>>> +     assert(tls_size > 0, "unexpected size");
>>>>
>>>> In addition to Thomas's comments re-signedness, can't the result == 0 if
>>>> there is no sttaic TLS in use? Or is there always some static TLS in use?
>>>
>>> On both glibc 2.24 and 2.28, by default I see there is one page memory
>>> for static TLS, without explicitly defining any __thread variables in
>>> user code.
>>>
>>>>
>>>> A few other comments/requests:
>>>>
>>>>     822 static void get_minstack_init() {
>>>>     823   _get_minstack_func =
>>>>     824         (GetMinStack)dlsym(RTLD_DEFAULT, "__pthread_get_minstack");
>>>>     825 }
>>>>
>>>> Can you add a logging statement please:
>>>>
>>>> log(os, thread)("Lookup of __pthread_get_minstack %s",
>>>>                    _get_minstack_func == NULL ? "failed" : "succeeded");
>>>
>>> Added log info.
>>>
>>>>
>>>> --
>>>>
>>>>     884   // In the Linux NPTL pthread implementation the guard size mechanism
>>>>
>>>> Now that we have additional information on this could you update this
>>>> old comment to say
>>>>
>>>> // In glibc versions prior to 2.7 the guard size mechanism
>>>
>>> Done.
>>>
>>>>
>>>> --
>>>>
>>>>     897     // Adjust the stack_size by adding the on-stack TLS size if
>>>>     898     // AdjustStackSizeForTLS is true. The guard size is already
>>>>     899     // accounted in this case, please see comments in
>>>>     900     // get_static_tls_area_size().
>>>>
>>>> Given the extensive commentary in get_static_tls_area_size() can we be
>>>> more brief here and just say:
>>>>
>>>> // Adjust the stack size for on-stack TLS - see get_static_tls_area_size().
>>>
>>> Done.
>>>
>>>>
>>>> --
>>>>
>>>> 5193   get_minstack_init();
>>>>
>>>> Can you make this conditional on AdjustStackSizeForTLS please so there
>>>> is no affect when not using the flag - thanks.
>>>
>>> Done.
>>>
>>>>
>>>> --
>>>>
>>>> 855   }
>>>> 856   return tls_size;
>>>>
>>>> Can you insert a logging statement:
>>>>
>>>> log(os, thread)("Stack size adjustment for TLS is " SIZE_T_FORMAT,
>>>> tls_size);
>>>
>>> Added.
>>>
>>>>
>>>> ---
>>>>
>>>> test/hotspot/jtreg/runtime/TLS/T.java
>>>>
>>>> 37             // Starting a ProcessBuilder causes the process reaper
>>>> thread being
>>>>
>>>> s/being/to be/
>>>
>>> Fixed.
>>>
>>>>
>>>>     43             // failure mode the VM fails to create thread with
>>>> error message
>>>>
>>>> s/create thread/create a thread/
>>>
>>> Fixed.
>>>
>>>>
>>>>      53                 System.out.println("Unexpected Echo output: " +
>>>> echoOutput +
>>>>      54                                    ", expects: " + echoInput);
>>>>
>>>> should this be an exception so that test fails? I can't imagine how echo
>>>> would fail but probably better to fail the test if something unexpected
>>>> happens.
>>>
>>> If no expected output is obtained from echo (due to ProcessBuilder
>>> failure caused by TLS issue), the test does fail and reports to the
>>> caller (returns false). If we throw an explicit expectation, it will
>>> be caught by the outer try/catch, which seems to be unnecessary.
>>>
>>>>
>>>> 66         try {
>>>>      67             br = new BufferedReader(new
>>>> InputStreamReader(inputStream));
>>>>      68             s = br.readLine();
>>>>      69         } finally {
>>>>      70             br.close();
>>>>      71         }
>>>>
>>>> This could use try-with-resources for both streams.
>>>
>>> Sounds good. Done.
>>>
>>>>
>>>> ---
>>>>
>>>> exestack-tls.c
>>>>
>>>> It's simpler if no argument means no-tls and an argument means tls.
>>>>
>>>>      42     char classpath[4096];
>>>>      43     snprintf(classpath, sizeof classpath,
>>>>      44              "-Djava.class.path=%s", getenv("CLASSPATH"));
>>>>      45     options[0].optionString = classpath;
>>>>
>>>> Do we need to explicitly set the classpath? I'm concerned that our test
>>>> environment uses really, really long paths and a number of them.
>>>> (Probably not 4096 but still ...)
>>>
>>> The classpath needs to be set so we know where to load the test class.
>>> Given that we use 4096 for the same type of usage in other existing
>>> test(s) (for example StackGap), it probably is okay for our test
>>> environments? I can increase the array size if we want to be extra
>>> safe ...
>>>
>>>>
>>>>
>>>> test/hotspot/jtreg/runtime/TLS/testtls.sh
>>>>
>>>>      40 if [ "${VM_OS}" != "linux" ]
>>>>      41 then
>>>>      42   echo "Test is only valid for Linux"
>>>>      43   exit 0
>>>>      44 fi
>>>>
>>>> This should be done via "@requires os.family != Linux"
>>>
>>> Done.
>>>
>>> Thanks!
>>>
>>> Best regards,
>>> Jiangli
>>>>
>>>> Thanks,
>>>> David
>>>> -----
>>>>
>>>>> Thanks for everyone's contribution on carving out the current workaround!
>>>>>
>>>>> Best regards,
>>>>> Jiangli
>>>>>
>>>>> On Thu, Jun 27, 2019 at 11:42 AM Jiangli Zhou <jianglizhou at google.com> wrote:
>>>>>>
>>>>>> Thank you Thomas and David! Glad to see that we are converging on an
>>>>>> acceptable approach here. I'll try to factor in all the latest inputs
>>>>>> from everyone and send out a new update.
>>>>>>
>>>>>> Thanks and best regards,
>>>>>> Jiangli
>>>>>>
>>>>>> On Thu, Jun 27, 2019 at 11:13 AM David Holmes <david.holmes at oracle.com> wrote:
>>>>>>>
>>>>>>> Trimming ....
>>>>>>>
>>>>>>> On 27/06/2019 12:35 pm, Jiangli Zhou wrote:
>>>>>>>> On Thu, Jun 27, 2019 at 9:23 AM Florian Weimer <fweimer at redhat.com> wrote:
>>>>>>>>> I think you can handle the guard size in this way:
>>>>>>>>>
>>>>>>>>>       pthread_attr_setguardsize(&attr, guard_size);
>>>>>>>>>
>>>>>>>>>       size_t stack_adjust_size = 0;
>>>>>>>>>       if (AdjustStackSizeForTLS) {
>>>>>>>>>         size_t minstack_size = get_minstack(&attr);
>>>>>>>>>         size_t tls_size = minstack_size - vm_page_size() - PTHREAD_STACK_MIN;
>>>>>>>>>         // In glibc before 2.27, tls_size still includes guard_size.
>>>>>>>>>         // In glibc 2.27 and later, guard_size is automatically
>>>>>>>>>         // added to the stack size by pthread_create.
>>>>>>>>>         // In both cases, the guard size is taken into account.
>>>>>>>>>         stack_adjust_size += tls_size;
>>>>>>>>>       } else {
>>>>>>>>>         stack_adjust_size += guard_size;
>>>>>>>>>       }
>>>>>>>>
>>>>>>>> Is the vm_page_size() counted for the dl_pagesize? As long as others
>>>>>>>> are okay with the above suggested adjustment, it looks good to me.
>>>>>>>> Thomas, David and others, any objection?
>>>>>>>
>>>>>>> I find the above acceptable. I've been waiting for the dust to settle.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> David
>>>>>>>
>>>>>>>> Thanks and best regards,
>>>>>>>> Jiangli
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Florian


More information about the hotspot-runtime-dev mailing list