Thread stack size issue related to glibc TLS bug
Jiangli Zhou
jianglizhou at google.com
Mon Jun 3 16:04:08 UTC 2019
Hi David,
Using a flag for setting the percentage (with the default set to 10%)
sounds plausible to me. I'll work on the CSR and also update the patch
to add the flag. Thank you!
Best regards,
Jiangli
On Mon, Jun 3, 2019 at 12:00 AM David Holmes <david.holmes at oracle.com> wrote:
>
> Hi Jiangli,
>
> On 31/05/2019 1:23 am, Jiangli Zhou wrote:
> > Hi David,
> >
> > This is a link to __pthread_get_minstack that I find in the public
> > domain: https://code.woboq.org/userspace/glibc/nptl/nptl-init.c.html.
> > It has copyright of 2002-2019, so it's probably the latest version.
> >
> > size_t
> > __pthread_get_minstack (const pthread_attr_t *attr)
> > {
> > return GLRO(dl_pagesize) + __static_tls_size + PTHREAD_STACK_MIN;
> > }
> >
> > The PTHREAD_STACK_MIN appears to be 4-pages for the glibc version
> > (2.24) that I'm linking with. The dl_pagesize is 1-page.
> >
> > We could go with the suggestion that you brought up earlier by only
> > adding the min_stack_size to the current thread stack size if it is
> > certain percent of the stack size. With the added dl_pagesize and
> > PTHREAD_STACK_MIN, 25% probably is reasonable? I've made changes
> > (including the percent suggestion) on top the existing patch and also
> > added a jtreg test based on the test case attached in JDK-8130425. On
> > JDK 13, the test could fail with different symptoms (segfault or hang)
> > depending on the TLS sizes without the fix.
>
> We will need to do a CSR request for this behaviour change. I'm not sure
> if 25% may be reasonable or not. I would have opted for something
> smaller though as a starting point - say 10%. [Update: just saw the bug
> comment you added :) ]. So if the TLS is going to steal more than 10% of
> the requested stack then we add on the TLS size. Do we have anything to
> use as a guide here? How much TLS does that profiler use? Makes me think
> the percentage itself needs to be tunable via a flag ... that would also
> allow for it to be disabled completely.
>
> Thanks,
> David
> -----
>
> > BTW, I noticed that Florian had already made the comment change in
> > glibc for __pthread_get_minstack. Thanks!
> >
> > Best regards,
> > Jiangli
> >
> > On Wed, May 29, 2019 at 10:42 PM David Holmes <david.holmes at oracle.com> wrote:
> >>
> >> Hi Florian,
> >>
> >> On 25/05/2019 6:50 am, Florian Weimer wrote:
> >>> * Jiangli Zhou:
> >>>
> >>>> Hi Florian,
> >>>>
> >>>> On Fri, May 24, 2019 at 2:46 AM Florian Weimer <fweimer at redhat.com> wrote:
> >>>>>
> >>>>> * Jiangli Zhou:
> >>>>>
> >>>>>> [3] change: http://cr.openjdk.java.net/~jiangli/tls_size/webrev/
> >>>>>> (contributed by Jeremy Manson)
> >>>>>
> >>>>> _dl_get_tls_static_info is an internal symbol (it carries a
> >>>>> GLIBC_PRIVATE symbol version). Its implementation can change at any
> >>>>> time. Please do not do this.
> >>>>
> >>>> Point taken. Thanks.
> >>>>
> >>>> It appears that asan is already carrying the same type of fix:
> >>>>
> >>>> https://github.com/gcc-mirror/gcc/blob/master/libsanitizer/sanitizer_common/sanitizer_linux_libcdep.cc
> >>>>
> >>>> As the issue has not been able to be addressed in the glibc layer, all
> >>>> the others have to workaround it (and possibly by using the glibc
> >>>> private APIs, _dl_get_tls_static_info or __pthread_get_minstack). That
> >>>> would cause more dependencies on the private APIs.
> >>>
> >>> _dl_get_tls_static_info changed ABI on i386 in glibc 2.27 already, so
> >>> you really don't want to use that (in case someone backported that
> >>> cleanup to an earlier version, although that might be bit unlikely).
> >>>
> >>> The sanitizers are special and have a much shorter shelf life than the
> >>> OpenJDK binaries.
> >>>
> >>>> One alternative (besides fixing glibc) may be making
> >>>> _dl_get_tls_static_info public. Would that be possible?
> >>>
> >>> For __pthread_get_minstack, I can add a comment to the glibc sources
> >>> that if the ABI changes (or if TLS allocations are no longer considered
> >>> part of the stack), we should rename the function. Then, as long as you
> >>> use a weak reference or dlsym (the latter is preferable for the sack of
> >>> RPM-based distributions which require special steps to reference
> >>> GLIBC_PRIVATE symbols directly), old binaries would keep working if we
> >>> make further changes.
> >>>
> >>> Since _dl_get_tls_static_info already changed ABI once, I really can't
> >>> recommend its use. Especially if you can work with
> >>> __pthread_get_minstack instead.
> >>
> >> Can you explain for me what value this __pthread_get_minstack is defined
> >> to return? Will it accommodate any required TLS plus some other glibc
> >> specific overhead? How much memory are we talking about?
> >>
> >>> The value of PTHREAD_STACK_MIN is rather problematic on x86-64 (for the
> >>> reasons I explained earlier), but it's not likely we are going to change
> >>> the value of the constant any time soon. It's more likely that we are
> >>> going to work around the AVX-512 register file issue by providing
> >>> exactly four usable stack pages with PTHREAD_STACK_MIN, and not less
> >>> than two as we did until recently. So you can assume that it's indeed
> >>> possible to use PTHREAD_STACK_MIN and sysconf (_SC_PAGESIZE) to compute
> >>> the static TLS area size.
> >>
> >> Sorry can you elaborate on that calculation please?
> >>
> >> Thanks,
> >> David
> >> -----
> >>
> >>> Thanks,
> >>> Florian
> >>>
More information about the core-libs-dev
mailing list