RFR: 8225035: Thread stack size issue caused by large TLS size
Jiangli Zhou
jianglizhou at google.com
Wed Jul 3 18:22:53 UTC 2019
Hi David,
On Wed, Jul 3, 2019 at 12:00 AM David Holmes <david.holmes at oracle.com> wrote:
>
> Hi Jiangli,
>
> On 2/07/2019 8:33 am, Jiangli Zhou wrote:
> > Hi David,
> >
> > On Mon, Jul 1, 2019 at 1:22 AM David Holmes <david.holmes at oracle.com> wrote:
> >>
> >> Hi Jiangli,
> >>
> >> On 29/06/2019 9:42 am, Jiangli Zhou wrote:
> >>> Hi David,
> >>>
> >>> Thanks for the detailed comments! Here is the latest webrev:
> >>> http://cr.openjdk.java.net/~jiangli/8225035/webrev.04/. Apologizing
> >>> for not including an incremental webrev (realized that when I almost
> >>> done edits).
> >>
> >> That all looks fine - thanks for making the changes.
> >>
> >> However ... now that I see the logging output it occurred to me that
> >> checking for the TLS adjustment is something that should only happen
> >> once and we should be storing the adjustment amount in a static for
> >> direct use. Sorry I didn't think about this earlier. Something like:
> >
> > No problem at all! I actually thought in the same direction as well
> > when making the change initially but didn't go with it as I had some
> > concerns. Florian's latest reply is reassuring (thanks again!). So
> > here is the update:
> > http://cr.openjdk.java.net/~jiangli/8225035/webrev_inc.05/.
>
> The incremental change looks fine to me. To address Florian's concern I
> suggest adding an additional comment:
>
> // Returns the size of the static TLS area glibc puts on thread stacks.
> + // The value is cached on first use, which occurs when the first thread
> + // is created during VM initialization.
> static size_t get_static_tls_area_size(const pthread_attr_t *attr) {
Done.
>
> > Since we are settling down on the approach and final implementation
> > details, it would be a good idea to get the CSR ball rolling. Could
> > you please review and I'll finalize the CSR. Thanks!
>
> Done.
Thank you!
As you, Florian, Thomas all made great contributions to this
workaround, I should list all of you as both contributors and
reviewers in the changeset. If there is any objection, please let me
know.
Best regards,
Jiangli
>
> Thanks,
> David
>
> > Best regards,
> > Jiangli
> >
> >>
> >> static size_t tls_size = 0;
> >> static bool tls_size_inited = false;
> >>
> >> static size_t get_static_tls_area_size(const pthread_attr_t *attr) {
> >> + if (!tls_size_inited) {
> >> + tls_size_inited = true;
> >> if (_get_minstack_func != NULL) {
> >> size_t minstack_size = _get_minstack_func(attr);
> >> ...
> >> if (minstack_size > (size_t)os::vm_page_size() +
> >> PTHREAD_STACK_MIN) {
> >> tls_size = minstack_size - os::vm_page_size() - PTHREAD_STACK_MIN;
> >> }
> >> }
> >> + }
> >> log_info(os, thread)("Stack size adjustment for TLS is " SIZE_FORMAT,
> >> tls_size);
> >> return tls_size;
> >> }
> >>
> >> Or even fold it all into get_minstack_init() ?
> >>
> >> I'm assuming that the result of __pthread_get_minstack wont' change over
> >> time of course.
> >>
> >> Thanks,
> >> David
> >> -----
> >>
> >>> On Fri, Jun 28, 2019 at 8:57 AM David Holmes <david.holmes at oracle.com> wrote:
> >>>>
> >>>> Hi Jiangli,
> >>>>
> >>>> This is very well written up - thanks.
> >>>>
> >>>> I apologize in advance that I'm about to traveling for a few days so
> >>>> won't be able to respond further until next week.
> >>>
> >>> Hope you have a relaxed and safe travel. This can wait.
> >>>
> >>>>
> >>>> On 27/06/2019 11:58 pm, Jiangli Zhou wrote:
> >>>>> Updated webrev:
> >>>>> http://cr.openjdk.java.net/~jiangli/8225035/webrev.03/
> >>>>
> >>>> Overall changes look good. I also have a concern around this code:
> >>>>
> >>>> + tls_size = minstack_size - os::vm_page_size() - PTHREAD_STACK_MIN;
> >>>> + assert(tls_size > 0, "unexpected size");
> >>>>
> >>>> In addition to Thomas's comments re-signedness, can't the result == 0 if
> >>>> there is no sttaic TLS in use? Or is there always some static TLS in use?
> >>>
> >>> On both glibc 2.24 and 2.28, by default I see there is one page memory
> >>> for static TLS, without explicitly defining any __thread variables in
> >>> user code.
> >>>
> >>>>
> >>>> A few other comments/requests:
> >>>>
> >>>> 822 static void get_minstack_init() {
> >>>> 823 _get_minstack_func =
> >>>> 824 (GetMinStack)dlsym(RTLD_DEFAULT, "__pthread_get_minstack");
> >>>> 825 }
> >>>>
> >>>> Can you add a logging statement please:
> >>>>
> >>>> log(os, thread)("Lookup of __pthread_get_minstack %s",
> >>>> _get_minstack_func == NULL ? "failed" : "succeeded");
> >>>
> >>> Added log info.
> >>>
> >>>>
> >>>> --
> >>>>
> >>>> 884 // In the Linux NPTL pthread implementation the guard size mechanism
> >>>>
> >>>> Now that we have additional information on this could you update this
> >>>> old comment to say
> >>>>
> >>>> // In glibc versions prior to 2.7 the guard size mechanism
> >>>
> >>> Done.
> >>>
> >>>>
> >>>> --
> >>>>
> >>>> 897 // Adjust the stack_size by adding the on-stack TLS size if
> >>>> 898 // AdjustStackSizeForTLS is true. The guard size is already
> >>>> 899 // accounted in this case, please see comments in
> >>>> 900 // get_static_tls_area_size().
> >>>>
> >>>> Given the extensive commentary in get_static_tls_area_size() can we be
> >>>> more brief here and just say:
> >>>>
> >>>> // Adjust the stack size for on-stack TLS - see get_static_tls_area_size().
> >>>
> >>> Done.
> >>>
> >>>>
> >>>> --
> >>>>
> >>>> 5193 get_minstack_init();
> >>>>
> >>>> Can you make this conditional on AdjustStackSizeForTLS please so there
> >>>> is no affect when not using the flag - thanks.
> >>>
> >>> Done.
> >>>
> >>>>
> >>>> --
> >>>>
> >>>> 855 }
> >>>> 856 return tls_size;
> >>>>
> >>>> Can you insert a logging statement:
> >>>>
> >>>> log(os, thread)("Stack size adjustment for TLS is " SIZE_T_FORMAT,
> >>>> tls_size);
> >>>
> >>> Added.
> >>>
> >>>>
> >>>> ---
> >>>>
> >>>> test/hotspot/jtreg/runtime/TLS/T.java
> >>>>
> >>>> 37 // Starting a ProcessBuilder causes the process reaper
> >>>> thread being
> >>>>
> >>>> s/being/to be/
> >>>
> >>> Fixed.
> >>>
> >>>>
> >>>> 43 // failure mode the VM fails to create thread with
> >>>> error message
> >>>>
> >>>> s/create thread/create a thread/
> >>>
> >>> Fixed.
> >>>
> >>>>
> >>>> 53 System.out.println("Unexpected Echo output: " +
> >>>> echoOutput +
> >>>> 54 ", expects: " + echoInput);
> >>>>
> >>>> should this be an exception so that test fails? I can't imagine how echo
> >>>> would fail but probably better to fail the test if something unexpected
> >>>> happens.
> >>>
> >>> If no expected output is obtained from echo (due to ProcessBuilder
> >>> failure caused by TLS issue), the test does fail and reports to the
> >>> caller (returns false). If we throw an explicit expectation, it will
> >>> be caught by the outer try/catch, which seems to be unnecessary.
> >>>
> >>>>
> >>>> 66 try {
> >>>> 67 br = new BufferedReader(new
> >>>> InputStreamReader(inputStream));
> >>>> 68 s = br.readLine();
> >>>> 69 } finally {
> >>>> 70 br.close();
> >>>> 71 }
> >>>>
> >>>> This could use try-with-resources for both streams.
> >>>
> >>> Sounds good. Done.
> >>>
> >>>>
> >>>> ---
> >>>>
> >>>> exestack-tls.c
> >>>>
> >>>> It's simpler if no argument means no-tls and an argument means tls.
> >>>>
> >>>> 42 char classpath[4096];
> >>>> 43 snprintf(classpath, sizeof classpath,
> >>>> 44 "-Djava.class.path=%s", getenv("CLASSPATH"));
> >>>> 45 options[0].optionString = classpath;
> >>>>
> >>>> Do we need to explicitly set the classpath? I'm concerned that our test
> >>>> environment uses really, really long paths and a number of them.
> >>>> (Probably not 4096 but still ...)
> >>>
> >>> The classpath needs to be set so we know where to load the test class.
> >>> Given that we use 4096 for the same type of usage in other existing
> >>> test(s) (for example StackGap), it probably is okay for our test
> >>> environments? I can increase the array size if we want to be extra
> >>> safe ...
> >>>
> >>>>
> >>>>
> >>>> test/hotspot/jtreg/runtime/TLS/testtls.sh
> >>>>
> >>>> 40 if [ "${VM_OS}" != "linux" ]
> >>>> 41 then
> >>>> 42 echo "Test is only valid for Linux"
> >>>> 43 exit 0
> >>>> 44 fi
> >>>>
> >>>> This should be done via "@requires os.family != Linux"
> >>>
> >>> Done.
> >>>
> >>> Thanks!
> >>>
> >>> Best regards,
> >>> Jiangli
> >>>>
> >>>> Thanks,
> >>>> David
> >>>> -----
> >>>>
> >>>>> Thanks for everyone's contribution on carving out the current workaround!
> >>>>>
> >>>>> Best regards,
> >>>>> Jiangli
> >>>>>
> >>>>> On Thu, Jun 27, 2019 at 11:42 AM Jiangli Zhou <jianglizhou at google.com> wrote:
> >>>>>>
> >>>>>> Thank you Thomas and David! Glad to see that we are converging on an
> >>>>>> acceptable approach here. I'll try to factor in all the latest inputs
> >>>>>> from everyone and send out a new update.
> >>>>>>
> >>>>>> Thanks and best regards,
> >>>>>> Jiangli
> >>>>>>
> >>>>>> On Thu, Jun 27, 2019 at 11:13 AM David Holmes <david.holmes at oracle.com> wrote:
> >>>>>>>
> >>>>>>> Trimming ....
> >>>>>>>
> >>>>>>> On 27/06/2019 12:35 pm, Jiangli Zhou wrote:
> >>>>>>>> On Thu, Jun 27, 2019 at 9:23 AM Florian Weimer <fweimer at redhat.com> wrote:
> >>>>>>>>> I think you can handle the guard size in this way:
> >>>>>>>>>
> >>>>>>>>> pthread_attr_setguardsize(&attr, guard_size);
> >>>>>>>>>
> >>>>>>>>> size_t stack_adjust_size = 0;
> >>>>>>>>> if (AdjustStackSizeForTLS) {
> >>>>>>>>> size_t minstack_size = get_minstack(&attr);
> >>>>>>>>> size_t tls_size = minstack_size - vm_page_size() - PTHREAD_STACK_MIN;
> >>>>>>>>> // In glibc before 2.27, tls_size still includes guard_size.
> >>>>>>>>> // In glibc 2.27 and later, guard_size is automatically
> >>>>>>>>> // added to the stack size by pthread_create.
> >>>>>>>>> // In both cases, the guard size is taken into account.
> >>>>>>>>> stack_adjust_size += tls_size;
> >>>>>>>>> } else {
> >>>>>>>>> stack_adjust_size += guard_size;
> >>>>>>>>> }
> >>>>>>>>
> >>>>>>>> Is the vm_page_size() counted for the dl_pagesize? As long as others
> >>>>>>>> are okay with the above suggested adjustment, it looks good to me.
> >>>>>>>> Thomas, David and others, any objection?
> >>>>>>>
> >>>>>>> I find the above acceptable. I've been waiting for the dust to settle.
> >>>>>>>
> >>>>>>> Thanks,
> >>>>>>> David
> >>>>>>>
> >>>>>>>> Thanks and best regards,
> >>>>>>>> Jiangli
> >>>>>>>>>
> >>>>>>>>> Thanks,
> >>>>>>>>> Florian
More information about the hotspot-runtime-dev
mailing list