RFR: 8225035: Thread stack size issue caused by large TLS size

Jiangli Zhou jianglizhou at google.com
Fri Jun 28 23:42:47 UTC 2019


Hi David,

Thanks for the detailed comments! Here is the latest webrev:
http://cr.openjdk.java.net/~jiangli/8225035/webrev.04/. Apologizing
for not including an incremental webrev (realized that when I almost
done edits).

On Fri, Jun 28, 2019 at 8:57 AM David Holmes <david.holmes at oracle.com> wrote:
>
> Hi Jiangli,
>
> This is very well written up - thanks.
>
> I apologize in advance that I'm about to traveling for a few days so
> won't be able to respond further until next week.

Hope you have a relaxed and safe travel. This can wait.

>
> On 27/06/2019 11:58 pm, Jiangli Zhou wrote:
> > Updated webrev:
> > http://cr.openjdk.java.net/~jiangli/8225035/webrev.03/
>
> Overall changes look good. I also have a concern around this code:
>
> +     tls_size = minstack_size - os::vm_page_size() - PTHREAD_STACK_MIN;
> +     assert(tls_size > 0, "unexpected size");
>
> In addition to Thomas's comments re-signedness, can't the result == 0 if
> there is no sttaic TLS in use? Or is there always some static TLS in use?

On both glibc 2.24 and 2.28, by default I see there is one page memory
for static TLS, without explicitly defining any __thread variables in
user code.

>
> A few other comments/requests:
>
>   822 static void get_minstack_init() {
>   823   _get_minstack_func =
>   824         (GetMinStack)dlsym(RTLD_DEFAULT, "__pthread_get_minstack");
>   825 }
>
> Can you add a logging statement please:
>
> log(os, thread)("Lookup of __pthread_get_minstack %s",
>                  _get_minstack_func == NULL ? "failed" : "succeeded");

Added log info.

>
> --
>
>   884   // In the Linux NPTL pthread implementation the guard size mechanism
>
> Now that we have additional information on this could you update this
> old comment to say
>
> // In glibc versions prior to 2.7 the guard size mechanism

Done.

>
> --
>
>   897     // Adjust the stack_size by adding the on-stack TLS size if
>   898     // AdjustStackSizeForTLS is true. The guard size is already
>   899     // accounted in this case, please see comments in
>   900     // get_static_tls_area_size().
>
> Given the extensive commentary in get_static_tls_area_size() can we be
> more brief here and just say:
>
> // Adjust the stack size for on-stack TLS - see get_static_tls_area_size().

Done.

>
> --
>
> 5193   get_minstack_init();
>
> Can you make this conditional on AdjustStackSizeForTLS please so there
> is no affect when not using the flag - thanks.

Done.

>
> --
>
> 855   }
> 856   return tls_size;
>
> Can you insert a logging statement:
>
> log(os, thread)("Stack size adjustment for TLS is " SIZE_T_FORMAT,
> tls_size);

Added.

>
> ---
>
> test/hotspot/jtreg/runtime/TLS/T.java
>
> 37             // Starting a ProcessBuilder causes the process reaper
> thread being
>
> s/being/to be/

Fixed.

>
>   43             // failure mode the VM fails to create thread with
> error message
>
> s/create thread/create a thread/

Fixed.

>
>    53                 System.out.println("Unexpected Echo output: " +
> echoOutput +
>    54                                    ", expects: " + echoInput);
>
> should this be an exception so that test fails? I can't imagine how echo
> would fail but probably better to fail the test if something unexpected
> happens.

If no expected output is obtained from echo (due to ProcessBuilder
failure caused by TLS issue), the test does fail and reports to the
caller (returns false). If we throw an explicit expectation, it will
be caught by the outer try/catch, which seems to be unnecessary.

>
> 66         try {
>    67             br = new BufferedReader(new
> InputStreamReader(inputStream));
>    68             s = br.readLine();
>    69         } finally {
>    70             br.close();
>    71         }
>
> This could use try-with-resources for both streams.

Sounds good. Done.

>
> ---
>
> exestack-tls.c
>
> It's simpler if no argument means no-tls and an argument means tls.
>
>    42     char classpath[4096];
>    43     snprintf(classpath, sizeof classpath,
>    44              "-Djava.class.path=%s", getenv("CLASSPATH"));
>    45     options[0].optionString = classpath;
>
> Do we need to explicitly set the classpath? I'm concerned that our test
> environment uses really, really long paths and a number of them.
> (Probably not 4096 but still ...)

The classpath needs to be set so we know where to load the test class.
Given that we use 4096 for the same type of usage in other existing
test(s) (for example StackGap), it probably is okay for our test
environments? I can increase the array size if we want to be extra
safe ...

>
>
> test/hotspot/jtreg/runtime/TLS/testtls.sh
>
>    40 if [ "${VM_OS}" != "linux" ]
>    41 then
>    42   echo "Test is only valid for Linux"
>    43   exit 0
>    44 fi
>
> This should be done via "@requires os.family != Linux"

Done.

Thanks!

Best regards,
Jiangli
>
> Thanks,
> David
> -----
>
> > Thanks for everyone's contribution on carving out the current workaround!
> >
> > Best regards,
> > Jiangli
> >
> > On Thu, Jun 27, 2019 at 11:42 AM Jiangli Zhou <jianglizhou at google.com> wrote:
> >>
> >> Thank you Thomas and David! Glad to see that we are converging on an
> >> acceptable approach here. I'll try to factor in all the latest inputs
> >> from everyone and send out a new update.
> >>
> >> Thanks and best regards,
> >> Jiangli
> >>
> >> On Thu, Jun 27, 2019 at 11:13 AM David Holmes <david.holmes at oracle.com> wrote:
> >>>
> >>> Trimming ....
> >>>
> >>> On 27/06/2019 12:35 pm, Jiangli Zhou wrote:
> >>>> On Thu, Jun 27, 2019 at 9:23 AM Florian Weimer <fweimer at redhat.com> wrote:
> >>>>> I think you can handle the guard size in this way:
> >>>>>
> >>>>>     pthread_attr_setguardsize(&attr, guard_size);
> >>>>>
> >>>>>     size_t stack_adjust_size = 0;
> >>>>>     if (AdjustStackSizeForTLS) {
> >>>>>       size_t minstack_size = get_minstack(&attr);
> >>>>>       size_t tls_size = minstack_size - vm_page_size() - PTHREAD_STACK_MIN;
> >>>>>       // In glibc before 2.27, tls_size still includes guard_size.
> >>>>>       // In glibc 2.27 and later, guard_size is automatically
> >>>>>       // added to the stack size by pthread_create.
> >>>>>       // In both cases, the guard size is taken into account.
> >>>>>       stack_adjust_size += tls_size;
> >>>>>     } else {
> >>>>>       stack_adjust_size += guard_size;
> >>>>>     }
> >>>>
> >>>> Is the vm_page_size() counted for the dl_pagesize? As long as others
> >>>> are okay with the above suggested adjustment, it looks good to me.
> >>>> Thomas, David and others, any objection?
> >>>
> >>> I find the above acceptable. I've been waiting for the dust to settle.
> >>>
> >>> Thanks,
> >>> David
> >>>
> >>>> Thanks and best regards,
> >>>> Jiangli
> >>>>>
> >>>>> Thanks,
> >>>>> Florian


More information about the hotspot-runtime-dev mailing list