Thread stack size issue related to glibc TLS bug
David Holmes
david.holmes at oracle.com
Mon Jun 3 07:00:18 UTC 2019
Hi Jiangli,
On 31/05/2019 1:23 am, Jiangli Zhou wrote:
> Hi David,
>
> This is a link to __pthread_get_minstack that I find in the public
> domain: https://code.woboq.org/userspace/glibc/nptl/nptl-init.c.html.
> It has copyright of 2002-2019, so it's probably the latest version.
>
> size_t
> __pthread_get_minstack (const pthread_attr_t *attr)
> {
> return GLRO(dl_pagesize) + __static_tls_size + PTHREAD_STACK_MIN;
> }
>
> The PTHREAD_STACK_MIN appears to be 4-pages for the glibc version
> (2.24) that I'm linking with. The dl_pagesize is 1-page.
>
> We could go with the suggestion that you brought up earlier by only
> adding the min_stack_size to the current thread stack size if it is
> certain percent of the stack size. With the added dl_pagesize and
> PTHREAD_STACK_MIN, 25% probably is reasonable? I've made changes
> (including the percent suggestion) on top the existing patch and also
> added a jtreg test based on the test case attached in JDK-8130425. On
> JDK 13, the test could fail with different symptoms (segfault or hang)
> depending on the TLS sizes without the fix.
We will need to do a CSR request for this behaviour change. I'm not sure
if 25% may be reasonable or not. I would have opted for something
smaller though as a starting point - say 10%. [Update: just saw the bug
comment you added :) ]. So if the TLS is going to steal more than 10% of
the requested stack then we add on the TLS size. Do we have anything to
use as a guide here? How much TLS does that profiler use? Makes me think
the percentage itself needs to be tunable via a flag ... that would also
allow for it to be disabled completely.
Thanks,
David
-----
> BTW, I noticed that Florian had already made the comment change in
> glibc for __pthread_get_minstack. Thanks!
>
> Best regards,
> Jiangli
>
> On Wed, May 29, 2019 at 10:42 PM David Holmes <david.holmes at oracle.com> wrote:
>>
>> Hi Florian,
>>
>> On 25/05/2019 6:50 am, Florian Weimer wrote:
>>> * Jiangli Zhou:
>>>
>>>> Hi Florian,
>>>>
>>>> On Fri, May 24, 2019 at 2:46 AM Florian Weimer <fweimer at redhat.com> wrote:
>>>>>
>>>>> * Jiangli Zhou:
>>>>>
>>>>>> [3] change: http://cr.openjdk.java.net/~jiangli/tls_size/webrev/
>>>>>> (contributed by Jeremy Manson)
>>>>>
>>>>> _dl_get_tls_static_info is an internal symbol (it carries a
>>>>> GLIBC_PRIVATE symbol version). Its implementation can change at any
>>>>> time. Please do not do this.
>>>>
>>>> Point taken. Thanks.
>>>>
>>>> It appears that asan is already carrying the same type of fix:
>>>>
>>>> https://github.com/gcc-mirror/gcc/blob/master/libsanitizer/sanitizer_common/sanitizer_linux_libcdep.cc
>>>>
>>>> As the issue has not been able to be addressed in the glibc layer, all
>>>> the others have to workaround it (and possibly by using the glibc
>>>> private APIs, _dl_get_tls_static_info or __pthread_get_minstack). That
>>>> would cause more dependencies on the private APIs.
>>>
>>> _dl_get_tls_static_info changed ABI on i386 in glibc 2.27 already, so
>>> you really don't want to use that (in case someone backported that
>>> cleanup to an earlier version, although that might be bit unlikely).
>>>
>>> The sanitizers are special and have a much shorter shelf life than the
>>> OpenJDK binaries.
>>>
>>>> One alternative (besides fixing glibc) may be making
>>>> _dl_get_tls_static_info public. Would that be possible?
>>>
>>> For __pthread_get_minstack, I can add a comment to the glibc sources
>>> that if the ABI changes (or if TLS allocations are no longer considered
>>> part of the stack), we should rename the function. Then, as long as you
>>> use a weak reference or dlsym (the latter is preferable for the sack of
>>> RPM-based distributions which require special steps to reference
>>> GLIBC_PRIVATE symbols directly), old binaries would keep working if we
>>> make further changes.
>>>
>>> Since _dl_get_tls_static_info already changed ABI once, I really can't
>>> recommend its use. Especially if you can work with
>>> __pthread_get_minstack instead.
>>
>> Can you explain for me what value this __pthread_get_minstack is defined
>> to return? Will it accommodate any required TLS plus some other glibc
>> specific overhead? How much memory are we talking about?
>>
>>> The value of PTHREAD_STACK_MIN is rather problematic on x86-64 (for the
>>> reasons I explained earlier), but it's not likely we are going to change
>>> the value of the constant any time soon. It's more likely that we are
>>> going to work around the AVX-512 register file issue by providing
>>> exactly four usable stack pages with PTHREAD_STACK_MIN, and not less
>>> than two as we did until recently. So you can assume that it's indeed
>>> possible to use PTHREAD_STACK_MIN and sysconf (_SC_PAGESIZE) to compute
>>> the static TLS area size.
>>
>> Sorry can you elaborate on that calculation please?
>>
>> Thanks,
>> David
>> -----
>>
>>> Thanks,
>>> Florian
>>>
More information about the core-libs-dev
mailing list