RFR: 8225035: Thread stack size issue caused by large TLS size

Thomas Stüfe thomas.stuefe at gmail.com
Wed Jul 3 20:09:16 UTC 2019


Hi Jiangli,

On Wed 3. Jul 2019 at 20:23, Jiangli Zhou <jianglizhou at google.com> wrote:

> Hi David,
>
> On Wed, Jul 3, 2019 at 12:00 AM David Holmes <david.holmes at oracle.com>
> wrote:
> >
> > Hi Jiangli,
> >
> > On 2/07/2019 8:33 am, Jiangli Zhou wrote:
> > > Hi David,
> > >
> > > On Mon, Jul 1, 2019 at 1:22 AM David Holmes <david.holmes at oracle.com>
> wrote:
> > >>
> > >> Hi Jiangli,
> > >>
> > >> On 29/06/2019 9:42 am, Jiangli Zhou wrote:
> > >>> Hi David,
> > >>>
> > >>> Thanks for the detailed comments! Here is the latest webrev:
> > >>> http://cr.openjdk.java.net/~jiangli/8225035/webrev.04/. Apologizing
> > >>> for not including an incremental webrev (realized that when I almost
> > >>> done edits).
> > >>
> > >> That all looks fine - thanks for making the changes.
> > >>
> > >> However ... now that I see the logging output it occurred to me that
> > >> checking for the TLS adjustment is something that should only happen
> > >> once and we should be storing the adjustment amount in a static for
> > >> direct use. Sorry I didn't think about this earlier. Something like:
> > >
> > > No problem at all! I actually thought in the same direction as well
> > > when making the change initially but didn't go with it as I had some
> > > concerns. Florian's latest reply is reassuring (thanks again!). So
> > > here is the update:
> > > http://cr.openjdk.java.net/~jiangli/8225035/webrev_inc.05/.
> >
> > The incremental change looks fine to me. To address Florian's concern I
> > suggest adding an additional comment:
> >
> >    // Returns the size of the static TLS area glibc puts on thread
> stacks.
> > + // The value is cached on first use, which occurs when the first thread
> > + // is created during VM initialization.
> >    static size_t get_static_tls_area_size(const pthread_attr_t *attr) {
>
>
> Done.
>
> >
> > > Since we are settling down on the approach and final implementation
> > > details, it would be a good idea to get the CSR ball rolling. Could
> > > you please review and I'll finalize the CSR. Thanks!
> >
> > Done.
>
> Thank you!
>
> As you, Florian, Thomas all made great contributions to this
> workaround, I should list all of you as both contributors and
> reviewers in the changeset. If there is any objection, please let me
> know.
>

Appreciated, but not for me. May make sense for Florian if he intends to
get author status at some point.

Thanks for your perseverance.

Cheers, Thomas



> Best regards,
> Jiangli
>
> >
> > Thanks,
> > David
> >
> > > Best regards,
> > > Jiangli
> > >
> > >>
> > >> static size_t tls_size = 0;
> > >> static bool tls_size_inited = false;
> > >>
> > >> static size_t get_static_tls_area_size(const pthread_attr_t *attr) {
> > >> +  if (!tls_size_inited) {
> > >> +    tls_size_inited = true;
> > >>        if (_get_minstack_func != NULL) {
> > >>          size_t minstack_size = _get_minstack_func(attr);
> > >>          ...
> > >>          if (minstack_size > (size_t)os::vm_page_size() +
> > >> PTHREAD_STACK_MIN) {
> > >>            tls_size = minstack_size - os::vm_page_size() -
> PTHREAD_STACK_MIN;
> > >>          }
> > >>       }
> > >> + }
> > >>     log_info(os, thread)("Stack size adjustment for TLS is "
> SIZE_FORMAT,
> > >>                          tls_size);
> > >>     return tls_size;
> > >> }
> > >>
> > >> Or even fold it all into get_minstack_init() ?
> > >>
> > >> I'm assuming that the result of __pthread_get_minstack wont' change
> over
> > >> time of course.
> > >>
> > >> Thanks,
> > >> David
> > >> -----
> > >>
> > >>> On Fri, Jun 28, 2019 at 8:57 AM David Holmes <
> david.holmes at oracle.com> wrote:
> > >>>>
> > >>>> Hi Jiangli,
> > >>>>
> > >>>> This is very well written up - thanks.
> > >>>>
> > >>>> I apologize in advance that I'm about to traveling for a few days so
> > >>>> won't be able to respond further until next week.
> > >>>
> > >>> Hope you have a relaxed and safe travel. This can wait.
> > >>>
> > >>>>
> > >>>> On 27/06/2019 11:58 pm, Jiangli Zhou wrote:
> > >>>>> Updated webrev:
> > >>>>> http://cr.openjdk.java.net/~jiangli/8225035/webrev.03/
> > >>>>
> > >>>> Overall changes look good. I also have a concern around this code:
> > >>>>
> > >>>> +     tls_size = minstack_size - os::vm_page_size() -
> PTHREAD_STACK_MIN;
> > >>>> +     assert(tls_size > 0, "unexpected size");
> > >>>>
> > >>>> In addition to Thomas's comments re-signedness, can't the result ==
> 0 if
> > >>>> there is no sttaic TLS in use? Or is there always some static TLS
> in use?
> > >>>
> > >>> On both glibc 2.24 and 2.28, by default I see there is one page
> memory
> > >>> for static TLS, without explicitly defining any __thread variables in
> > >>> user code.
> > >>>
> > >>>>
> > >>>> A few other comments/requests:
> > >>>>
> > >>>>     822 static void get_minstack_init() {
> > >>>>     823   _get_minstack_func =
> > >>>>     824         (GetMinStack)dlsym(RTLD_DEFAULT,
> "__pthread_get_minstack");
> > >>>>     825 }
> > >>>>
> > >>>> Can you add a logging statement please:
> > >>>>
> > >>>> log(os, thread)("Lookup of __pthread_get_minstack %s",
> > >>>>                    _get_minstack_func == NULL ? "failed" :
> "succeeded");
> > >>>
> > >>> Added log info.
> > >>>
> > >>>>
> > >>>> --
> > >>>>
> > >>>>     884   // In the Linux NPTL pthread implementation the guard
> size mechanism
> > >>>>
> > >>>> Now that we have additional information on this could you update
> this
> > >>>> old comment to say
> > >>>>
> > >>>> // In glibc versions prior to 2.7 the guard size mechanism
> > >>>
> > >>> Done.
> > >>>
> > >>>>
> > >>>> --
> > >>>>
> > >>>>     897     // Adjust the stack_size by adding the on-stack TLS
> size if
> > >>>>     898     // AdjustStackSizeForTLS is true. The guard size is
> already
> > >>>>     899     // accounted in this case, please see comments in
> > >>>>     900     // get_static_tls_area_size().
> > >>>>
> > >>>> Given the extensive commentary in get_static_tls_area_size() can we
> be
> > >>>> more brief here and just say:
> > >>>>
> > >>>> // Adjust the stack size for on-stack TLS - see
> get_static_tls_area_size().
> > >>>
> > >>> Done.
> > >>>
> > >>>>
> > >>>> --
> > >>>>
> > >>>> 5193   get_minstack_init();
> > >>>>
> > >>>> Can you make this conditional on AdjustStackSizeForTLS please so
> there
> > >>>> is no affect when not using the flag - thanks.
> > >>>
> > >>> Done.
> > >>>
> > >>>>
> > >>>> --
> > >>>>
> > >>>> 855   }
> > >>>> 856   return tls_size;
> > >>>>
> > >>>> Can you insert a logging statement:
> > >>>>
> > >>>> log(os, thread)("Stack size adjustment for TLS is " SIZE_T_FORMAT,
> > >>>> tls_size);
> > >>>
> > >>> Added.
> > >>>
> > >>>>
> > >>>> ---
> > >>>>
> > >>>> test/hotspot/jtreg/runtime/TLS/T.java
> > >>>>
> > >>>> 37             // Starting a ProcessBuilder causes the process
> reaper
> > >>>> thread being
> > >>>>
> > >>>> s/being/to be/
> > >>>
> > >>> Fixed.
> > >>>
> > >>>>
> > >>>>     43             // failure mode the VM fails to create thread
> with
> > >>>> error message
> > >>>>
> > >>>> s/create thread/create a thread/
> > >>>
> > >>> Fixed.
> > >>>
> > >>>>
> > >>>>      53                 System.out.println("Unexpected Echo output:
> " +
> > >>>> echoOutput +
> > >>>>      54                                    ", expects: " +
> echoInput);
> > >>>>
> > >>>> should this be an exception so that test fails? I can't imagine how
> echo
> > >>>> would fail but probably better to fail the test if something
> unexpected
> > >>>> happens.
> > >>>
> > >>> If no expected output is obtained from echo (due to ProcessBuilder
> > >>> failure caused by TLS issue), the test does fail and reports to the
> > >>> caller (returns false). If we throw an explicit expectation, it will
> > >>> be caught by the outer try/catch, which seems to be unnecessary.
> > >>>
> > >>>>
> > >>>> 66         try {
> > >>>>      67             br = new BufferedReader(new
> > >>>> InputStreamReader(inputStream));
> > >>>>      68             s = br.readLine();
> > >>>>      69         } finally {
> > >>>>      70             br.close();
> > >>>>      71         }
> > >>>>
> > >>>> This could use try-with-resources for both streams.
> > >>>
> > >>> Sounds good. Done.
> > >>>
> > >>>>
> > >>>> ---
> > >>>>
> > >>>> exestack-tls.c
> > >>>>
> > >>>> It's simpler if no argument means no-tls and an argument means tls.
> > >>>>
> > >>>>      42     char classpath[4096];
> > >>>>      43     snprintf(classpath, sizeof classpath,
> > >>>>      44              "-Djava.class.path=%s", getenv("CLASSPATH"));
> > >>>>      45     options[0].optionString = classpath;
> > >>>>
> > >>>> Do we need to explicitly set the classpath? I'm concerned that our
> test
> > >>>> environment uses really, really long paths and a number of them.
> > >>>> (Probably not 4096 but still ...)
> > >>>
> > >>> The classpath needs to be set so we know where to load the test
> class.
> > >>> Given that we use 4096 for the same type of usage in other existing
> > >>> test(s) (for example StackGap), it probably is okay for our test
> > >>> environments? I can increase the array size if we want to be extra
> > >>> safe ...
> > >>>
> > >>>>
> > >>>>
> > >>>> test/hotspot/jtreg/runtime/TLS/testtls.sh
> > >>>>
> > >>>>      40 if [ "${VM_OS}" != "linux" ]
> > >>>>      41 then
> > >>>>      42   echo "Test is only valid for Linux"
> > >>>>      43   exit 0
> > >>>>      44 fi
> > >>>>
> > >>>> This should be done via "@requires os.family != Linux"
> > >>>
> > >>> Done.
> > >>>
> > >>> Thanks!
> > >>>
> > >>> Best regards,
> > >>> Jiangli
> > >>>>
> > >>>> Thanks,
> > >>>> David
> > >>>> -----
> > >>>>
> > >>>>> Thanks for everyone's contribution on carving out the current
> workaround!
> > >>>>>
> > >>>>> Best regards,
> > >>>>> Jiangli
> > >>>>>
> > >>>>> On Thu, Jun 27, 2019 at 11:42 AM Jiangli Zhou <
> jianglizhou at google.com> wrote:
> > >>>>>>
> > >>>>>> Thank you Thomas and David! Glad to see that we are converging on
> an
> > >>>>>> acceptable approach here. I'll try to factor in all the latest
> inputs
> > >>>>>> from everyone and send out a new update.
> > >>>>>>
> > >>>>>> Thanks and best regards,
> > >>>>>> Jiangli
> > >>>>>>
> > >>>>>> On Thu, Jun 27, 2019 at 11:13 AM David Holmes <
> david.holmes at oracle.com> wrote:
> > >>>>>>>
> > >>>>>>> Trimming ....
> > >>>>>>>
> > >>>>>>> On 27/06/2019 12:35 pm, Jiangli Zhou wrote:
> > >>>>>>>> On Thu, Jun 27, 2019 at 9:23 AM Florian Weimer <
> fweimer at redhat.com> wrote:
> > >>>>>>>>> I think you can handle the guard size in this way:
> > >>>>>>>>>
> > >>>>>>>>>       pthread_attr_setguardsize(&attr, guard_size);
> > >>>>>>>>>
> > >>>>>>>>>       size_t stack_adjust_size = 0;
> > >>>>>>>>>       if (AdjustStackSizeForTLS) {
> > >>>>>>>>>         size_t minstack_size = get_minstack(&attr);
> > >>>>>>>>>         size_t tls_size = minstack_size - vm_page_size() -
> PTHREAD_STACK_MIN;
> > >>>>>>>>>         // In glibc before 2.27, tls_size still includes
> guard_size.
> > >>>>>>>>>         // In glibc 2.27 and later, guard_size is automatically
> > >>>>>>>>>         // added to the stack size by pthread_create.
> > >>>>>>>>>         // In both cases, the guard size is taken into account.
> > >>>>>>>>>         stack_adjust_size += tls_size;
> > >>>>>>>>>       } else {
> > >>>>>>>>>         stack_adjust_size += guard_size;
> > >>>>>>>>>       }
> > >>>>>>>>
> > >>>>>>>> Is the vm_page_size() counted for the dl_pagesize? As long as
> others
> > >>>>>>>> are okay with the above suggested adjustment, it looks good to
> me.
> > >>>>>>>> Thomas, David and others, any objection?
> > >>>>>>>
> > >>>>>>> I find the above acceptable. I've been waiting for the dust to
> settle.
> > >>>>>>>
> > >>>>>>> Thanks,
> > >>>>>>> David
> > >>>>>>>
> > >>>>>>>> Thanks and best regards,
> > >>>>>>>> Jiangli
> > >>>>>>>>>
> > >>>>>>>>> Thanks,
> > >>>>>>>>> Florian
>


More information about the hotspot-runtime-dev mailing list