RFR: 8225035: Thread stack size issue caused by large TLS size
Jiangli Zhou
jianglizhou at google.com
Sat Jun 15 23:09:36 UTC 2019
I'm with David on all points.
Yes, the 10% ratio cut-off is not backed by real data. Using my Linux
test environment (glibc 2.24) as an example, __pthread_get_minstack
returns 28K (7 pages) when no explicit __thread variable is declared in
the executable. With the 10% ratio, only threads with a requested stack
size <= 280K would be adjusted to include the extra size. That means
most threads are not affected by the patch by default. When there are
larger on-stack TLS blocks, the minstack value returned by
__pthread_get_minstack increases, and the cut-off stack size for
adjustment increases with it. That may not solve all cases, but it is a
best-effort attempt, particularly since the cut-off ratio can be changed
with the command-line option.
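To make the trigger concrete, here is a minimal sketch of the proposed
cut-off logic; the function name and the exact comparison are
illustrative, not taken from the webrev:

```cpp
#include <cassert>
#include <cstddef>

// Illustrative sketch of the proposed trigger: grow the stack by the
// minstack amount only when minstack is at least cutoff_percent of the
// requested size. Names here are hypothetical, not from the patch.
static size_t adjusted_stack_size(size_t requested, size_t minstack,
                                  unsigned cutoff_percent) {
  if (minstack * 100 >= requested * cutoff_percent) {
    return requested + minstack;  // small stack relative to TLS: add room
  }
  return requested;  // large stack: leave current behavior unchanged
}
```

With a 28K minstack and the default 10% ratio, a 280K request becomes
308K (28K/280K is exactly 10%), while a 281K request is left unchanged.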
The on/off switch also starts to sound appealing to me now because it's
simple: when enabled it covers all cases, and when disabled it preserves
the current behavior.
Any additional input and comments (or a vote on one of the two
approaches) are highly appreciated. I'll update the patch once we
finalize the approach. Thank you all!
Best regards,
Jiangli
On Thu, Jun 13, 2019 at 10:48 PM David Holmes <david.holmes at oracle.com> wrote:
>
> Hi Thomas,
>
> On 14/06/2019 3:40 am, Thomas Stüfe wrote:
> > Hi David,
> >
> > On Thu, Jun 13, 2019 at 2:06 AM David Holmes
> > <david.holmes at oracle.com> wrote:
> >
> > Hi Thomas,
> >
> > > I am not sure, but is adding the TLS-sized buffer to just small
> > > thread stacks the right thing to do? Why is it okay for thread
> > > stacks slightly larger than the cut-off point to live without this
> > > additional space? Will they not be hurt as well by the large TLS
> > > subtraction from their stack sizes?
> >
> > This was discussed previously:
> >
> > http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2019-May/034460.html
> >
> > The premise is that the smaller the requested stack, the more likely
> > unexpected TLS usage will lead to stack overflows. If I set 32K
> > stacks and TLS uses 16K, I likely have a problem at runtime. If I
> > have 2M stacks and TLS uses 16K, I probably don't really care. So
> > the proposal is to use the ratio of TLS-size to requested-stack-size
> > as the trigger, configurable by the end user. That way, for the
> > expected case of large stacks with small TLS where we're not running
> > on the edge of stack overflow, there will be no change in stack
> > usage or general behaviour. In contrast, really small stacks will be
> > expanded by the TLS size to avoid stack overflow, with some user
> > control. In the latter case, if TLS can be accommodated without
> > overflow then the user can increase the ratio to avoid the extra
> > stack usage. If the TLS size causes overflow then the user has no
> > choice if they want their application to run.
> >
> > It's not perfect but nothing is.
> >
> >
> > I understand now, thank you.
> >
> > I have two thoughts:
> >
> > - So we are worrying about disrupting customers with carefully tuned
> > thread stacks, which may suddenly get thread start errors after
> > applying this patch since they need more stack. I can see that. But
> > I would have expected this effect to be strongest in small-stack
> > scenarios - e.g. a customer has lots of threads with carefully tuned
> > mini stacks. For larger thread stacks I would have thought this
> > patch would not matter so much.
>
> It's not so much that the thread stacks are tuned, but that the number
> of threads (with whatever stack sizes) is tuned. If the customer has
> tuned a system so that they can crank up the threads knob to a given
> value and are fully utilizing the system, then causing all stacks to
> grow by the TLS amount will cause them to get resource exhaustion errors.
>
> If the system is underutilized in terms of memory, it doesn't matter
> if we take more memory for additional stack space.
>
> > I am not against leaving threads with large stacks out, it just makes
> > the matter a bit more complicated and the switch more difficult to
> > understand than a simple "-XX:+AddStackReserveForTLS" or similar.
>
> Such an on/off switch was an earlier consideration. But given the
> concerns about impact on existing tuned apps such a switch would have to
> be off by default so that users opt-in after they see failures. That was
> my initial position. The use of a ratio to trigger the stack increase
> comes from the bugzilla discussion on how glibc might solve the problem.
>
> The 10% is out of thin air; we don't have any data to really support
> it. The larger the cut-off, the more risk of failure with small
> stacks. The smaller the cut-off, the more chance of applying it to all
> threads and using too much memory.
>
> As I said, this is not perfect, nor can it be. This is a function of
> the number of threads, thread stack size, thread stack slack, TLS
> size, and total memory. You can always construct a scenario where a
> given setting won't work.
>
>
> > - I am unsure about the sudden cutoff point. I may have understood
> > wrong, but let's assume we have 6K of TLS to accommodate, for which
> > pthread_minstack would return 6K + some 4K pages (2?) = 14K.
> >
> > Requested stack size to real stack size, after the proposed
> > correction, with the default cut-off for the thread stack correction
> > of 10%:
> > 100K -> 114K
> > 140K -> 154K (the 10% point, since 14K / 140K = 10%)
> > 141K -> 141K (sudden dip? why?)
> >
> > My point is that the penalty for TLS we have to pay is as big at
> > 9.9% as it is at 10%, so maybe we do not want a sudden cutoff?
>
> The cut-off is unfair but also simple. Can you suggest some other way?
>
> > ---
> >
> > Another point, but probably just a work item for us at SAP, is that we
> > should test the patch on platforms with large page sizes. I am not sure
> > what pthread_minstack would return e.g on linux ppc64 with 64K pages.
>
> That's a good point. We may have to factor in whether or not large
> pages are in use.
>
> Thanks,
> David
>
> >
> > Thanks, Thomas
> >
> > Cheers,
> > David
> > -----
> >
> > On 13/06/2019 6:45 am, Thomas Stüfe wrote:
> > > Hi Jiangli,
> > >
> > > after learning a lot of interesting things today about TLS :-) I
> > took a
> > > look at your change.
> > >
> > > First another question, just to be sure I understand this: from
> > > what I think I understood about TLS, and from the way you wrote
> > > your test (writing your own launcher and all), I assume the
> > > problem only happens for TLS of modules which are loaded at
> > > program start, right? So libjvm.so and dynamically loaded JNI libs
> > > can allocate as much __thread space as they like, and it would not
> > > go on the thread stacks? And this is only a problem when you embed
> > > the libjvm into other launchers which use a lot of TLS?
> > >
> > > ---
> > >
> > > I do not understand why the complicated dynamic lookup of
> > > __pthread_get_minstack is needed. What is the advantage of this
> > > two-level approach vs. just dlsym'ing the pointer like we do for
> > > all other dynamically loaded functions? Also, would this not cause
> > > problems when building for other libc implementations, e.g. musl?
> > >
> > > And why RTLD_NEXT, why not RTLD_DEFAULT? Any special reason? The
> > > latter seems to be how we normally load dynamic symbols from the
> > > libc.
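A minimal sketch of the plain-dlsym lookup Thomas suggests, assuming
glibc; the function-pointer name and the PTHREAD_STACK_MIN fallback are
illustrative, not taken from the webrev:

```cpp
#include <dlfcn.h>
#include <pthread.h>
#include <cassert>
#include <cstddef>

// Hypothetical signature matching glibc's private __pthread_get_minstack.
typedef size_t (*GetMinStack)(const pthread_attr_t*);

// Look the symbol up in the global scope. On non-glibc libcs (e.g.
// musl) the symbol is absent, dlsym returns NULL, and we fall back to
// PTHREAD_STACK_MIN; that fallback choice is an assumption made here.
static size_t minstack_or_fallback(const pthread_attr_t* attr) {
  GetMinStack fn =
      (GetMinStack)dlsym(RTLD_DEFAULT, "__pthread_get_minstack");
  if (fn != NULL) {
    return fn(attr);
  }
  return PTHREAD_STACK_MIN;
}
```

RTLD_DEFAULT searches the whole global scope in load order, which is
what is normally wanted for a libc symbol; RTLD_NEXT instead searches
only the objects loaded after the caller, which matters for
interposition wrappers but not for a one-off lookup like this.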
> > >
> > > About the function pointer: in contrast to David, I would prefer
> > > the name to show that it is a pointer, not a function, so I would
> > > have liked your first name for it, which I assume was "p_..",
> > > more.
> > >
> > > ---
> > >
> > > + size_t min_stack_size = align_up(pthread_get_minstack(&attr),
> > >                                    vm_page_size());
> > >
> > > Why the align-up? The only condition we have to meet is for the
> > > total stack size to be page-aligned. On some platforms the page
> > > size can be large (I think we have 64K pages on ppc) and the TLS
> > > size could be smaller than that. Could you move the align-up to
> > > right before the call to pthread_attr_setstacksize() (and probably
> > > remove the assert there)?
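Thomas's concern can be made concrete with numbers. align_up below is
the usual round-up-to-a-multiple helper; the 64K page size and 6K
minstack figures are hypothetical:

```cpp
#include <cassert>
#include <cstddef>

// The usual align-up helper: round size up to the next multiple of
// alignment, where alignment is a power of two.
static size_t align_up(size_t size, size_t alignment) {
  return (size + alignment - 1) & ~(alignment - 1);
}
```

With 64K pages and a 6K minstack added to a 100K stack request:
aligning the minstack value first gives align_up(6K, 64K) = 64K and a
final size of align_up(100K + 64K, 64K) = 192K, whereas aligning only
the final sum gives align_up(106K, 64K) = 128K, a full 64K page less.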
> > >
> > > ---
> > >
> > > What is the logic behind giving attr to pthread_get_minstack? If
> > Florian
> > > is right and this:
> > >
> > http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2019-May/034593.html
> >
> > > is the implementation, that parameter seems not to be used. In that
> > > case, just to be sure, I would feed it a dummy attr.
> > >
> > > ---
> > >
> > > I am not sure, but is adding the TLS-sized buffer to just small
> > > thread stacks the right thing to do? Why is it okay for thread
> > > stacks slightly larger than the cut-off point to live without this
> > > additional space? Will they not be hurt as well by the large TLS
> > > subtraction from their stack sizes?
> > >
> > > Thank you!
> > >
> > > Cheers, Thomas
> > >
> > >
> > >
> > >
> > > On Tue, Jun 11, 2019 at 12:30 AM Jiangli Zhou
> > > <jianglizhou at google.com> wrote:
> > >
> > > Hi David and Florian,
> > >
> > > This is a follow-up of the discussion regarding the TLS (Thread
> > > Local Storage) issue:
> > >
> > http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2019-May/034459.html.
> > > The following is an updated patch with your suggestions
> > incorporated:
> > >
> > > 1) Use __pthread_get_minstack
> > > 2) By default only adjust the thread stack size if the
> > > minstack-to-stacksize ratio is >= 10%
> > > 3) Introduce a new command-line option (Linux only),
> > > AdjustWithMinStackAt, which can be used to change the default 10%
> > > ratio for triggering stack size adjustment
> > >
> > > CSR: https://bugs.openjdk.java.net/browse/JDK-8225498
> > > webrev: http://cr.openjdk.java.net/~jiangli/8225035/webrev.00/
> > > bug: https://bugs.openjdk.java.net/browse/JDK-8225035
> > >
> > > Best regards,
> > >
> > > Jiangli
> > >
> >