RFR: 8225035: Thread stack size issue caused by large TLS size

David Holmes david.holmes at oracle.com
Fri Jun 14 05:46:01 UTC 2019


Hi Thomas,

On 14/06/2019 3:40 am, Thomas Stüfe wrote:
> Hi David,
> 
> On Thu, Jun 13, 2019 at 2:06 AM David Holmes
> <david.holmes at oracle.com> wrote:
> 
>     Hi Thomas,
> 
>       > I am not sure, but is adding the TLS-sized buffer to just small
>       > thread stacks the right thing to do? Why is it okay for thread
>       > stacks slightly larger than the cut-off point to live without
>       > this additional space? Will they not be hurt as well by the
>       > large TLS subtraction from their stack sizes?
> 
>     This was discussed previously:
> 
>     http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2019-May/034460.html
> 
>     The premise is that the smaller the requested stack, the more
>     likely unexpected TLS usage will lead to stack overflows. If I set
>     32K stacks and TLS uses 16K, I likely have a problem at runtime. If
>     I have 2M stacks and TLS uses 16K, I probably don't really care. So
>     the proposal is to use the ratio of TLS-size to requested-stack-size
>     as the trigger - configurable by the end user. That way, for the
>     expected case of large stacks with small TLS, where we're not
>     running on the edge of stack overflow, there will be no change in
>     stack usage or general behaviour. In contrast, really small stacks
>     will be expanded by the TLS size to avoid stack overflow - with some
>     user control. In the latter case, if TLS can be accommodated without
>     overflow then the user can increase the ratio to avoid the extra
>     stack usage. If the TLS size causes overflow then the user has no
>     choice if they want their application to run.
> 
>     It's not perfect but nothing is.
> 
> 
> I understand now, thank you.
> 
> I have two thoughts:
> 
> - So we are worrying about disrupting customers with carefully tuned 
> thread stacks, who may suddenly get thread start errors after applying 
> this patch since they need more stack. I can see that. But I would have 
> expected this effect to be strongest in small-stack scenarios - e.g. a 
> customer has lots of threads with carefully tuned mini stacks. For 
> larger thread stacks I would have thought this patch would not matter 
> so much.

It's not so much that the thread stacks are tuned, but that the number 
of threads (with whatever stack sizes) is tuned. If the customer has 
tuned a system so that they can crank up the threads knob to a given 
value and fully utilize the system, then growing all stacks by the TLS 
amount will push them into resource exhaustion errors.

If the system is underutilized in terms of memory, it doesn't matter if 
we take more memory for additional stack space.

> I am not against leaving threads with large stacks out; it just makes 
> the matter a bit more complicated and the switch more difficult to 
> understand than a simple "-XX:+AddStackReserveForTLS" or similar.

Such an on/off switch was an earlier consideration. But given the 
concerns about impact on existing tuned apps, such a switch would have 
to be off by default so that users opt in after they see failures. That 
was my initial position. The use of a ratio to trigger the stack 
increase comes from the bugzilla discussion on how glibc might solve 
the problem.

The 10% is out-of-thin-air; we don't have any data to really support 
it. The larger the cut-off, the more risk of failure with small stacks. 
The smaller the cut-off, the more chance of applying it to all threads 
and using too much memory.
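
For illustration only - this is not the webrev code, just the idea - 
here is a minimal sketch of such a ratio trigger, assuming glibc's 
internal __pthread_get_minstack(const pthread_attr_t*) can be resolved 
via dlsym (all names below are invented for the sketch):

#define _GNU_SOURCE            /* for RTLD_DEFAULT in <dlfcn.h> */
#include <dlfcn.h>
#include <pthread.h>
#include <stddef.h>

/* glibc-internal symbol: the minimum stack a thread needs, including
 * its static TLS block plus pthread overhead. */
typedef size_t (*minstack_func_t)(const pthread_attr_t*);

static size_t adjusted_stack_size(size_t requested,
                                  const pthread_attr_t* attr,
                                  double cutoff /* e.g. 0.10 */) {
  minstack_func_t p_minstack =
    (minstack_func_t) dlsym(RTLD_DEFAULT, "__pthread_get_minstack");
  if (p_minstack == NULL) {
    return requested;  /* symbol absent (e.g. non-glibc): leave as-is */
  }
  size_t min_stack = p_minstack(attr);
  /* Grow only stacks where TLS is a significant fraction of the
   * request. */
  if ((double)min_stack / (double)requested >= cutoff) {
    return requested + min_stack;
  }
  return requested;
}

With the proposed AdjustWithMinStackAt option, the cutoff in a sketch 
like this would be the user-configurable value, with 10 (percent) as 
the default.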

As I said, this is not perfect, nor can it be. This is a function of 
the number of threads, thread stack size, thread stack slack, TLS size, 
and total memory. You can always construct a scenario where a given 
setting won't work.

> - I am unsure about the sudden cutoff point. I may have understood 
> this wrong, but let's assume we have 6K of TLS to accommodate, for 
> which __pthread_get_minstack would return 6K + some 4K pages (2?) = 14K.
> 
> Requested stack size to real stack size, after the proposed correction, 
> with the default cut-off for the thread stack correction at 10%:
> 100K -> 114K
> 140K -> 154K (10% point)
> 141K -> 141K (sudden dip? why?)
> 
> My point is that the penalty for TLS we have to pay is as big at 9.9% 
> as it is at 10%, so maybe we do not want a sudden cutoff?

The cut-off is unfair but also simple. Can you suggest some other way?

> ---
> 
> Another point, but probably just a work item for us at SAP: we should 
> test the patch on platforms with large page sizes. I am not sure what 
> __pthread_get_minstack would return e.g. on linux ppc64 with 64K pages.

That's a good point. We may have to factor in whether or not large 
pages are in use.
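
As a concrete (invented) illustration of why the page size matters 
here: if the TLS reservation is rounded up to the page size before 
being added to the stack, the per-thread cost of that rounding scales 
with the page size:

#include <stddef.h>
#include <unistd.h>

/* Round a size up to the runtime page size. A 6K TLS segment rounds up
 * to 8K with 4K pages, but to a full 64K with 64K pages (e.g. on
 * linux-ppc64), so the rounding cost alone can dwarf the TLS itself. */
static size_t round_up_to_page(size_t size) {
  size_t page = (size_t) sysconf(_SC_PAGESIZE);
  return (size + page - 1) & ~(page - 1);
}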

Thanks,
David

> 
> Thanks, Thomas
> 
>     Cheers,
>     David
>     -----
> 
>     On 13/06/2019 6:45 am, Thomas Stüfe wrote:
>      > Hi Jiangli,
>      >
>      > after learning a lot of interesting things today about TLS :-) I
>      > took a look at your change.
>      >
>      > First another question, just to be sure I understand this: from
>      > what I think I understood about TLS, and from the way you wrote
>      > your test (writing your own launcher and all), I assume the
>      > problem only happens for TLS of modules which are loaded at
>      > program start, right? So libjvm.so and dynamically loaded JNI
>      > libs can allocate as much __thread space as they like, and it
>      > would not go on the thread stacks? And this is only a problem
>      > when you embed libjvm into other launchers which use a lot of
>      > TLS?
>      >
>      > ---
>      >
>      > I do not understand why the dynamic lookup of
>      > __pthread_get_minstack is so complicated. What is the advantage
>      > of this two-level approach vs. just dlsym'ing the pointer like
>      > we do for all other dynamically loaded functions? Also, would
>      > this not cause problems when building for other libc
>      > implementations, e.g. musl?
>      >
>      > And why RTLD_NEXT, why not RTLD_DEFAULT? Any special reason? The
>      > latter seems to be how we normally load dynamic symbols from the
>      > libc.
>      >
>      > About the function pointer: in contrast to David, I would
>      > prefer it to be recognizable as a pointer, not a function, so I
>      > would have liked your first choice of name better, which I
>      > assume was "p_..".
>      >
>      > ---
>      >
>      > +  size_t min_stack_size = align_up(pthread_get_minstack(&attr),
>      > vm_page_size());
>      >
>      > Why the align-up? The only condition we have to meet is for the
>      > total stack size to be page-aligned. On some platforms the page
>      > size can be large (I think we have 64K pages on ppc) and the TLS
>      > size could be smaller than that. Could you move the align-up to
>      > right before the call to pthread_attr_setstacksize() (and
>      > probably remove the assert there)?
>      >
>      > ---
>      >
>      > What is the logic behind giving attr to pthread_get_minstack?
>      > If Florian is right and this:
>      >
>      > http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2019-May/034593.html
>      >
>      > is the implementation, that parameter seems not to be used. In
>      > that case, just to be sure, I would feed it a dummy attr.
>      >
>      > ---
>      >
>      > I am not sure, but is adding the TLS-sized buffer to just small
>      > thread stacks the right thing to do? Why is it okay for thread
>      > stacks slightly larger than the cut-off point to live without
>      > this additional space? Will they not be hurt as well by the
>      > large TLS subtraction from their stack sizes?
>      >
>      > Thank you!
>      >
>      > Cheers, Thomas
>      >
>      >
>      >
>      >
>      > On Tue, Jun 11, 2019 at 12:30 AM Jiangli Zhou
>      > <jianglizhou at google.com> wrote:
>      >
>      >     Hi David and Florian,
>      >
>      >     This is a follow-up to the discussion regarding the TLS
>      >     (Thread Local Storage) issue:
>      >     http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2019-May/034459.html
>      >     The following is an updated patch with your suggestions
>      >     incorporated:
>      >
>      >     1) Use __pthread_get_minstack
>      >     2) By default only adjust the thread stack size if the
>      >     minstack-to-stacksize ratio is >= 10%
>      >     3) Introduce a new command-line option (Linux only),
>      >     AdjustWithMinStackAt, which can be used to change the default 10%
>      >     ratio for triggering stack size adjustment
>      >
>      >     CSR: https://bugs.openjdk.java.net/browse/JDK-8225498
>      >     webrev: http://cr.openjdk.java.net/~jiangli/8225035/webrev.00/
>      >     bug: https://bugs.openjdk.java.net/browse/JDK-8225035
>      >
>      >     Best regards,
>      >
>      >     Jiangli
>      >
> 

