RFR: 8225035: Thread stack size issue caused by large TLS size

Jiangli Zhou jianglizhou at google.com
Fri Jul 5 04:08:07 UTC 2019


Hi Thomas and David,

Acknowledged.

Best regards,
Jiangli

On Wed, Jul 3, 2019, 11:53 PM David Holmes <david.holmes at oracle.com> wrote:

>
>
> On 4/07/2019 4:22 am, Jiangli Zhou wrote:
> > Hi David,
> >
> > On Wed, Jul 3, 2019 at 12:00 AM David Holmes <david.holmes at oracle.com>
> wrote:
> >>
> >> Hi Jiangli,
> >>
> >> On 2/07/2019 8:33 am, Jiangli Zhou wrote:
> >>> Hi David,
> >>>
> >>> On Mon, Jul 1, 2019 at 1:22 AM David Holmes <david.holmes at oracle.com>
> wrote:
> >>>>
> >>>> Hi Jiangli,
> >>>>
> >>>> On 29/06/2019 9:42 am, Jiangli Zhou wrote:
> >>>>> Hi David,
> >>>>>
> >>>>> Thanks for the detailed comments! Here is the latest webrev:
> >>>>> http://cr.openjdk.java.net/~jiangli/8225035/webrev.04/. Apologizing
> >>>>> for not including an incremental webrev (realized that when I almost
> >>>>> done edits).
> >>>>
> >>>> That all looks fine - thanks for making the changes.
> >>>>
> >>>> However ... now that I see the logging output it occurred to me that
> >>>> checking for the TLS adjustment is something that should only happen
> >>>> once and we should be storing the adjustment amount in a static for
> >>>> direct use. Sorry I didn't think about this earlier. Something like:
> >>>
> >>> No problem at all! I actually thought in the same direction as well
> >>> when making the change initially but didn't go with it as I had some
> >>> concerns. Florian's latest reply is reassuring (thanks again!). So
> >>> here is the update:
> >>> http://cr.openjdk.java.net/~jiangli/8225035/webrev_inc.05/.
> >>
> >> The incremental change looks fine to me. To address Florian's concern I
> >> suggest adding an additional comment:
> >>
> >>     // Returns the size of the static TLS area glibc puts on thread
> stacks.
> >> + // The value is cached on first use, which occurs when the first
> thread
> >> + // is created during VM initialization.
> >>     static size_t get_static_tls_area_size(const pthread_attr_t *attr) {
> >
> >
> > Done.
> >
> >>
> >>> Since we are settling down on the approach and final implementation
> >>> details, it would be a good idea to get the CSR ball rolling. Could
> >>> you please review and I'll finalize the CSR. Thanks!
> >>
> >> Done.
> >
> > Thank you!
> >
> > As you, Florian, Thomas all made great contributions to this
> > workaround, I should list all of you as both contributors and
> > reviewers in the changeset. If there is any objection, please let me
> > know.
>
> It's not necessary for me, but I appreciate the consideration.
>
> Cheers,
> David
>
> >
> > Best regards,
> > Jiangli
> >
> >>
> >> Thanks,
> >> David
> >>
> >>> Best regards,
> >>> Jiangli
> >>>
> >>>>
> >>>> static size_t tls_size = 0;
> >>>> static bool tls_size_inited = false;
> >>>>
> >>>> static size_t get_static_tls_area_size(const pthread_attr_t *attr) {
> >>>> +  if (!tls_size_inited) {
> >>>> +    tls_size_inited = true;
> >>>>         if (_get_minstack_func != NULL) {
> >>>>           size_t minstack_size = _get_minstack_func(attr);
> >>>>           ...
> >>>>           if (minstack_size > (size_t)os::vm_page_size() +
> >>>> PTHREAD_STACK_MIN) {
> >>>>             tls_size = minstack_size - os::vm_page_size() -
> PTHREAD_STACK_MIN;
> >>>>           }
> >>>>        }
> >>>> + }
> >>>>      log_info(os, thread)("Stack size adjustment for TLS is "
> SIZE_FORMAT,
> >>>>                           tls_size);
> >>>>      return tls_size;
> >>>> }
> >>>>
> >>>> Or even fold it all into get_minstack_init() ?
> >>>>
> >>>> I'm assuming that the result of __pthread_get_minstack wont' change
> over
> >>>> time of course.
> >>>>
> >>>> Thanks,
> >>>> David
> >>>> -----
> >>>>
> >>>>> On Fri, Jun 28, 2019 at 8:57 AM David Holmes <
> david.holmes at oracle.com> wrote:
> >>>>>>
> >>>>>> Hi Jiangli,
> >>>>>>
> >>>>>> This is very well written up - thanks.
> >>>>>>
> >>>>>> I apologize in advance that I'm about to traveling for a few days so
> >>>>>> won't be able to respond further until next week.
> >>>>>
> >>>>> Hope you have a relaxed and safe travel. This can wait.
> >>>>>
> >>>>>>
> >>>>>> On 27/06/2019 11:58 pm, Jiangli Zhou wrote:
> >>>>>>> Updated webrev:
> >>>>>>> http://cr.openjdk.java.net/~jiangli/8225035/webrev.03/
> >>>>>>
> >>>>>> Overall changes look good. I also have a concern around this code:
> >>>>>>
> >>>>>> +     tls_size = minstack_size - os::vm_page_size() -
> PTHREAD_STACK_MIN;
> >>>>>> +     assert(tls_size > 0, "unexpected size");
> >>>>>>
> >>>>>> In addition to Thomas's comments re-signedness, can't the result ==
> 0 if
> >>>>>> there is no sttaic TLS in use? Or is there always some static TLS
> in use?
> >>>>>
> >>>>> On both glibc 2.24 and 2.28, by default I see there is one page
> memory
> >>>>> for static TLS, without explicitly defining any __thread variables in
> >>>>> user code.
> >>>>>
> >>>>>>
> >>>>>> A few other comments/requests:
> >>>>>>
> >>>>>>      822 static void get_minstack_init() {
> >>>>>>      823   _get_minstack_func =
> >>>>>>      824         (GetMinStack)dlsym(RTLD_DEFAULT,
> "__pthread_get_minstack");
> >>>>>>      825 }
> >>>>>>
> >>>>>> Can you add a logging statement please:
> >>>>>>
> >>>>>> log(os, thread)("Lookup of __pthread_get_minstack %s",
> >>>>>>                     _get_minstack_func == NULL ? "failed" :
> "succeeded");
> >>>>>
> >>>>> Added log info.
> >>>>>
> >>>>>>
> >>>>>> --
> >>>>>>
> >>>>>>      884   // In the Linux NPTL pthread implementation the guard
> size mechanism
> >>>>>>
> >>>>>> Now that we have additional information on this could you update
> this
> >>>>>> old comment to say
> >>>>>>
> >>>>>> // In glibc versions prior to 2.7 the guard size mechanism
> >>>>>
> >>>>> Done.
> >>>>>
> >>>>>>
> >>>>>> --
> >>>>>>
> >>>>>>      897     // Adjust the stack_size by adding the on-stack TLS
> size if
> >>>>>>      898     // AdjustStackSizeForTLS is true. The guard size is
> already
> >>>>>>      899     // accounted in this case, please see comments in
> >>>>>>      900     // get_static_tls_area_size().
> >>>>>>
> >>>>>> Given the extensive commentary in get_static_tls_area_size() can we
> be
> >>>>>> more brief here and just say:
> >>>>>>
> >>>>>> // Adjust the stack size for on-stack TLS - see
> get_static_tls_area_size().
> >>>>>
> >>>>> Done.
> >>>>>
> >>>>>>
> >>>>>> --
> >>>>>>
> >>>>>> 5193   get_minstack_init();
> >>>>>>
> >>>>>> Can you make this conditional on AdjustStackSizeForTLS please so
> there
> >>>>>> is no affect when not using the flag - thanks.
> >>>>>
> >>>>> Done.
> >>>>>
> >>>>>>
> >>>>>> --
> >>>>>>
> >>>>>> 855   }
> >>>>>> 856   return tls_size;
> >>>>>>
> >>>>>> Can you insert a logging statement:
> >>>>>>
> >>>>>> log(os, thread)("Stack size adjustment for TLS is " SIZE_T_FORMAT,
> >>>>>> tls_size);
> >>>>>
> >>>>> Added.
> >>>>>
> >>>>>>
> >>>>>> ---
> >>>>>>
> >>>>>> test/hotspot/jtreg/runtime/TLS/T.java
> >>>>>>
> >>>>>> 37             // Starting a ProcessBuilder causes the process
> reaper
> >>>>>> thread being
> >>>>>>
> >>>>>> s/being/to be/
> >>>>>
> >>>>> Fixed.
> >>>>>
> >>>>>>
> >>>>>>      43             // failure mode the VM fails to create thread
> with
> >>>>>> error message
> >>>>>>
> >>>>>> s/create thread/create a thread/
> >>>>>
> >>>>> Fixed.
> >>>>>
> >>>>>>
> >>>>>>       53                 System.out.println("Unexpected Echo
> output: " +
> >>>>>> echoOutput +
> >>>>>>       54                                    ", expects: " +
> echoInput);
> >>>>>>
> >>>>>> should this be an exception so that test fails? I can't imagine how
> echo
> >>>>>> would fail but probably better to fail the test if something
> unexpected
> >>>>>> happens.
> >>>>>
> >>>>> If no expected output is obtained from echo (due to ProcessBuilder
> >>>>> failure caused by TLS issue), the test does fail and reports to the
> >>>>> caller (returns false). If we throw an explicit expectation, it will
> >>>>> be caught by the outer try/catch, which seems to be unnecessary.
> >>>>>
> >>>>>>
> >>>>>> 66         try {
> >>>>>>       67             br = new BufferedReader(new
> >>>>>> InputStreamReader(inputStream));
> >>>>>>       68             s = br.readLine();
> >>>>>>       69         } finally {
> >>>>>>       70             br.close();
> >>>>>>       71         }
> >>>>>>
> >>>>>> This could use try-with-resources for both streams.
> >>>>>
> >>>>> Sounds good. Done.
> >>>>>
> >>>>>>
> >>>>>> ---
> >>>>>>
> >>>>>> exestack-tls.c
> >>>>>>
> >>>>>> It's simpler if no argument means no-tls and an argument means tls.
> >>>>>>
> >>>>>>       42     char classpath[4096];
> >>>>>>       43     snprintf(classpath, sizeof classpath,
> >>>>>>       44              "-Djava.class.path=%s", getenv("CLASSPATH"));
> >>>>>>       45     options[0].optionString = classpath;
> >>>>>>
> >>>>>> Do we need to explicitly set the classpath? I'm concerned that our
> test
> >>>>>> environment uses really, really long paths and a number of them.
> >>>>>> (Probably not 4096 but still ...)
> >>>>>
> >>>>> The classpath needs to be set so we know where to load the test
> class.
> >>>>> Given that we use 4096 for the same type of usage in other existing
> >>>>> test(s) (for example StackGap), it probably is okay for our test
> >>>>> environments? I can increase the array size if we want to be extra
> >>>>> safe ...
> >>>>>
> >>>>>>
> >>>>>>
> >>>>>> test/hotspot/jtreg/runtime/TLS/testtls.sh
> >>>>>>
> >>>>>>       40 if [ "${VM_OS}" != "linux" ]
> >>>>>>       41 then
> >>>>>>       42   echo "Test is only valid for Linux"
> >>>>>>       43   exit 0
> >>>>>>       44 fi
> >>>>>>
> >>>>>> This should be done via "@requires os.family != Linux"
> >>>>>
> >>>>> Done.
> >>>>>
> >>>>> Thanks!
> >>>>>
> >>>>> Best regards,
> >>>>> Jiangli
> >>>>>>
> >>>>>> Thanks,
> >>>>>> David
> >>>>>> -----
> >>>>>>
> >>>>>>> Thanks for everyone's contribution on carving out the current
> workaround!
> >>>>>>>
> >>>>>>> Best regards,
> >>>>>>> Jiangli
> >>>>>>>
> >>>>>>> On Thu, Jun 27, 2019 at 11:42 AM Jiangli Zhou <
> jianglizhou at google.com> wrote:
> >>>>>>>>
> >>>>>>>> Thank you Thomas and David! Glad to see that we are converging on
> an
> >>>>>>>> acceptable approach here. I'll try to factor in all the latest
> inputs
> >>>>>>>> from everyone and send out a new update.
> >>>>>>>>
> >>>>>>>> Thanks and best regards,
> >>>>>>>> Jiangli
> >>>>>>>>
> >>>>>>>> On Thu, Jun 27, 2019 at 11:13 AM David Holmes <
> david.holmes at oracle.com> wrote:
> >>>>>>>>>
> >>>>>>>>> Trimming ....
> >>>>>>>>>
> >>>>>>>>> On 27/06/2019 12:35 pm, Jiangli Zhou wrote:
> >>>>>>>>>> On Thu, Jun 27, 2019 at 9:23 AM Florian Weimer <
> fweimer at redhat.com> wrote:
> >>>>>>>>>>> I think you can handle the guard size in this way:
> >>>>>>>>>>>
> >>>>>>>>>>>        pthread_attr_setguardsize(&attr, guard_size);
> >>>>>>>>>>>
> >>>>>>>>>>>        size_t stack_adjust_size = 0;
> >>>>>>>>>>>        if (AdjustStackSizeForTLS) {
> >>>>>>>>>>>          size_t minstack_size = get_minstack(&attr);
> >>>>>>>>>>>          size_t tls_size = minstack_size - vm_page_size() -
> PTHREAD_STACK_MIN;
> >>>>>>>>>>>          // In glibc before 2.27, tls_size still includes
> guard_size.
> >>>>>>>>>>>          // In glibc 2.27 and later, guard_size is
> automatically
> >>>>>>>>>>>          // added to the stack size by pthread_create.
> >>>>>>>>>>>          // In both cases, the guard size is taken into
> account.
> >>>>>>>>>>>          stack_adjust_size += tls_size;
> >>>>>>>>>>>        } else {
> >>>>>>>>>>>          stack_adjust_size += guard_size;
> >>>>>>>>>>>        }
> >>>>>>>>>>
> >>>>>>>>>> Is the vm_page_size() counted for the dl_pagesize? As long as
> others
> >>>>>>>>>> are okay with the above suggested adjustment, it looks good to
> me.
> >>>>>>>>>> Thomas, David and others, any objection?
> >>>>>>>>>
> >>>>>>>>> I find the above acceptable. I've been waiting for the dust to
> settle.
> >>>>>>>>>
> >>>>>>>>> Thanks,
> >>>>>>>>> David
> >>>>>>>>>
> >>>>>>>>>> Thanks and best regards,
> >>>>>>>>>> Jiangli
> >>>>>>>>>>>
> >>>>>>>>>>> Thanks,
> >>>>>>>>>>> Florian
>


More information about the hotspot-runtime-dev mailing list