Thread stack size issue related to glibc TLS bug
Jiangli Zhou
jianglizhou at google.com
Fri May 24 17:26:42 UTC 2019
Hi David,
Thanks for the feedback. One use case where the issue surfaces is a
build with FDO (feedback-directed optimization) instrumentation
enabled [0]. That could be a common use case.
_dl_get_tls_static_info is a glibc private symbol. I see Florian has
concerns about the usage of the API.
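
For reference, here is a minimal sketch of how the static TLS size could
be queried through that private symbol. It is an illustration only, not
the code in the webrev: the helper name static_tls_size and the typedef
are hypothetical, the two-pointer signature is assumed from glibc's
internal declaration, and the symbol may be absent or change between
glibc versions, so the lookup has to tolerate failure.

```cpp
#include <dlfcn.h>
#include <cstddef>

// Assumed signature of the glibc-private function; not a public API.
using GetTlsStaticInfo = void (*)(size_t* sizep, size_t* alignp);

// Hypothetical helper: returns the static TLS size in bytes, or 0 when
// the private symbol cannot be resolved (or cannot be called safely).
size_t static_tls_size() {
#if defined(__i386__)
  // Older 32-bit x86 glibc builds use a non-default calling convention
  // for internal functions, so this sketch does not attempt the call.
  return 0;
#else
  void* sym = dlsym(RTLD_DEFAULT, "_dl_get_tls_static_info");
  if (sym == nullptr) {
    return 0;  // symbol not exported by this libc
  }
  size_t size = 0;
  size_t align = 0;
  reinterpret_cast<GetTlsStaticInfo>(sym)(&size, &align);
  return size;
#endif
}
```

A caller would treat a return value of 0 as "unknown" and leave the
requested stack size untouched. On glibc versions before 2.34 this needs
to be linked with -ldl.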
[0] https://en.wikipedia.org/wiki/Profile-guided_optimization
Thanks,
Jiangli
On Thu, May 23, 2019 at 7:08 PM David Holmes <david.holmes at oracle.com> wrote:
>
> Hi Jiangli,
>
> On 24/05/2019 9:21 am, Jiangli Zhou wrote:
> > Hi David (and others),
> >
> > There was a discussion [1] (between you, Jeremy, Martin and others)
> > back in 2015 regarding a stack size issue caused by a glibc bug
> > related to TLS (thread-local storage) [2]. The issue manifested as
> > a StackOverflowError with the reported test in JDK-8130425 [0] when
> > a large TLS size is used. A workaround was introduced with
> > -Djdk.lang.processReaperUseDefaultStackSize. Based on the glibc
> > discussion thread [2], Rust implemented a fix that takes the TLS
> > size into account. From one of the comments in the OpenJDK discussion
> > archive [1], it looks like you thought a similar fix could be applied
> > to the JVM. I talked to Jeremy about sharing his fix for this particular
> > issue today. The fix appears to be a more general solution than the
> > processReaperUseDefaultStackSize workaround. It has been tested/used
> > for several years and seems to be stable. The link to the changeset is
> > listed below. Please let me know your thoughts on taking the change in
> > OpenJDK.
>
> My thoughts haven't really changed since 2015 - and sadly neither has
> there been any change in glibc in that time. Nor, to my recollection,
> have there been any other reported issues with this.
>
> If this were to be taken into hotspot then I think it has to be opt-in
> via a flag so that it doesn't make sudden and unexpected differences in
> the number of threads an application can create. It may also be worth
> considering, from the bugzilla discussion, only adding in the TLS size
> if it is greater than a certain percentage of the stack size being
> requested. That would limit the impact to threads with small stacks
> without forcing every thread to have to grow by the TLS size.
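
A minimal sketch of that percentage rule, for illustration only. The
helper name adjusted_stack_size and the threshold parameter are
hypothetical, and the 10% used in the usage note is an arbitrary
assumption, not a value proposed anywhere in this thread.

```cpp
#include <cstddef>

// Hypothetical helper: grow the requested stack by the TLS size only
// when TLS would take more than threshold_percent of the request.
size_t adjusted_stack_size(size_t requested, size_t tls_size,
                           unsigned threshold_percent) {
  // Integer form of: tls_size / requested > threshold_percent / 100.
  // Dividing first avoids overflow for very large requests.
  size_t limit = requested / 100 * threshold_percent;
  if (tls_size > limit) {
    return requested + tls_size;  // small stack: pad by the TLS size
  }
  return requested;               // large stack: leave unchanged
}
```

With a 10% threshold and 16K of TLS, a 128K request grows to 144K while
a 1M request is left as is, so only threads with small stacks pay for
the extra space.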
>
> But I'd want to know how often this is actually needed. As Andrew Haley
> said in the original discussion thread "I think we're rather looking at
> abuse of TLS here.".
>
> And I'd need to understand better what versions of glibc this would work
> for (and how they relate to current distros).
>
> Cheers,
> David
>
> > [0] JDK bug: https://bugs.openjdk.java.net/browse/JDK-8130425
> > [1] OpenJDK discussion archive:
> > http://mail.openjdk.java.net/pipermail/core-libs-dev/2015-December/037558.html
> > [2] glibc discussion archive:
> > http://sourceware.org/bugzilla/show_bug.cgi?id=11787
> > [3] change: http://cr.openjdk.java.net/~jiangli/tls_size/webrev/
> > (contributed by Jeremy Manson)
> >
> > The #ifdef __GLIBC__ in the change could be removed, as os_linux.cpp
> > already assumes the use of glibc.
> >
> > Best regards,
> > Jiangli
> >