Thread stack size issue related to glibc TLS bug

David Holmes david.holmes at oracle.com
Thu May 30 05:39:19 UTC 2019


Hi Florian,

On 24/05/2019 8:13 pm, Florian Weimer wrote:
> * David Holmes:
> 
>> My thoughts haven't really changed since 2015 - and sadly neither has
>> there been any change in glibc in that time. Nor, to my recollection,
>> have there been any other reported issues with this.
> 
> The issue gets occasionally reported by people who use small stacks with
> large initial-exec TLS consumers (such as jemalloc).  On the glibc side,
> we aren't entirely sure what to do about this.  We have recently tweaked
> the stack size computation, so that in many cases, threads now receive
> an additional page.  This was necessary to work around a kernel/hardware
> change where context switches started to push substantially more data on
> the stack than before, and minimal stack sizes did not work anymore on
> x86-64 (leading to ntpd crashing during startup, among other things).
> 
> The main concern is that for workloads with carefully tuned stack sizes,
> revamping the stack size computation so that TLS is no longer
> effectively allocated on the stack might result in address space
> exhaustion.  (This should only be a concern on 32-bit architectures.)
> 
> Even if we changed this today (or had changed it in 2015), it would take
> a long time for the change to end up with end users, so it's unclear how
> much help it would be.

If it had been fixed in 2012 it wouldn't be an issue today. If it gets 
fixed today then it may not be an issue in 2025. If it is not fixed then 
it will always be an issue. Stealing the TLS space out of the stack 
requested by the user is just not a reasonable thing to do IMHO.

> Maybe OpenJDK can add a property specifying a stack size reserve, and
> this number is added to all stack size requests?  This will at least
> allow users to work around the issue locally.

This would be a low-impact workaround, though as Jiangli points out it 
is a bit hard on the end-user as they first have to hit the problem, 
then recognize what it is, then realize there's a potential solution and 
then determine the right magic number to use. Better than nothing but 
not ideal.

Further follow-up coming in response to your later email.

Cheers,
David
-----

> 
> If we change the accounting in glibc, we will have to add a similar
> tunable on the glibc side, too.
> 
> Thanks,
> Florian
> 


More information about the core-libs-dev mailing list