RFR: 8225035: Thread stack size issue caused by large TLS size

Thomas Stüfe thomas.stuefe at gmail.com
Wed Jun 26 14:40:17 UTC 2019


Hi Jiangli,

On Tue, Jun 25, 2019 at 8:35 PM Jiangli Zhou <jianglizhou at google.com> wrote:

> Since the approach of using a simple on/off switch to control the
> stack size adjustment seems to be the cleanest, I've updated the CSR
> and webrev to go with that. Please help review both of the following:
>
> CSR: https://bugs.openjdk.java.net/browse/JDK-8225498
> webrev: http://cr.openjdk.java.net/~jiangli/8225035/webrev.01/
>
> I've also summarized all approaches brought up in the mailing list
> discussion and added to the comment section in
> https://bugs.openjdk.java.net/browse/JDK-8225035.
>
> A note on passing a dummy 'attr' to __pthread_get_minstack (regarding
> Thomas' comment): I've rechecked with that, and did get the correct
> minstack size. I think it's safer to pass a valid 'attr' however.
>
> <snip>

http://cr.openjdk.java.net/~jiangli/8225035/webrev.01/src/hotspot/os/linux/globals_linux.hpp.udiff.html

Proposed text slightly tweaked:

"Increase thread stack size to include space for native TLS."

I would not mention anything about pthread_min_stack in this text; that is
an implementation detail.

---------------

http://cr.openjdk.java.net/~jiangli/8225035/webrev.01/src/hotspot/os/linux/os_linux.cpp.udiff.html

Since this issue was quite hard to understand, and easy to misunderstand
(at least by me :), I'd like to tweak this comment a bit:

- clarify that this is compiler level TLS we talk about (space for __thread
variables), not Posix TLS nor Java TLS
- making clear that we call an undocumented glibc function which we hope
will return something sensible - a size sufficiently large to hold the TLS
area while small enough to not be a bother.
- I am not sure that mentioning _dl_get_tls_static_info is really useful
here - if people need information about rejected implementation
alternatives, they can read the mail thread

Here a proposal - but feel free to use it as you see please:

<proposal>
On Linux, the glibc places TLS areas (space needed to accommodate compiler
level TLS -> __thread variables) on the thread stack. This decreases the
stack size actually available to threads.

For large TLS sizes, this may cause threads to malfunction due to
insufficient stack space. This is a well-known issue in glibc:
http://sourceware.org/bugzilla/show_bug.cgi?id=11787.

As a workaround, we call __pthread_get_minstack(), which is an undocumented
but assumed-stable glibc API which we hope returns a memory size large
enough to hold the TLS area. We then increase the stack size the user
requested by this size.

Due to compatibility concerns, this size adjustment is opt-in and
controlled via AdjustStackSizeForTLS.
</proposal>

---

I agree with Florian about the other points (potential name clashes with
glibc and not weak linking and using RTLD_DEFAULT).

---

About the attr pointer to pthread_get_minstack(): Since this function is
undocumented, we have no idea what it does to attr. We know that the
implementation Florian posted seems not to do anything with it, but is that
true for all glibc implementations, over all past versions we want to
function on, and on all platforms?

And if in some variant it actually needs an attr pointer, what does it do
with it? Does it have expectations about the content of attr - e.g. the
order by which we fill it? E.g. should we fill in the stack size beforehand?

Since I cannot answer the question, I would feel safer by passing a dummy
pthread_attr structure. I am paranoid and would even zero it out beforehand
and assert afterwards that it is unchanged (all debug only of course), to
get notified quickly when we encounter a glibc version which decides to use
this parameter, but maybe that's overdoing it.

I really do not like this calling of an undocumented function, with an
undocumented parameter...

---

The rest looks okay in this file. Not your change, but that
truncation-aware stack size adjustment seems a bit overkill :) - I have no
idea how this could ever get triggered..

-------------

About the test: Could you please add a comment on why we need this
complicated embedding scenario and why we cannot just put the __thread
array into a native JNI library?

Thank you,

Thomas


More information about the hotspot-runtime-dev mailing list