RFR: 8225035: Thread stack size issue caused by large TLS size

Jiangli Zhou jianglizhou at google.com
Sat Jun 15 19:32:23 UTC 2019


Hi Thomas,

Thanks a lot for spending the time going through all the information! The
questions and discussions certainly helped me to understand this further.

On Wed, Jun 12, 2019 at 1:46 PM Thomas Stüfe <thomas.stuefe at gmail.com>
wrote:

> Hi Jiangli,
>
> after learning a lot of interesting things today about TLS :-) I took a
> look at your change.
>
> First another question, just to be sure I understand this: From what think
> I understood about TLS, and from the way you wrote your test (writing an
> own launcher and all) I assume the problem only happens for TLS of modules
> which are loaded at program start, right? So Libjvm.so and dynamically
> loaded JNI libs can allocate as many __thread space as they like, they
> would not go on the thread stacks? And this is only a problem when you
> embed the libjvm into other launchers which use a lot of TLS?
>

If my understanding is correct, TLS variables from dlopen'ed shared
objects, may be allocated lazily and do not go on the stack. But there are
usages where we want to statically link natives into the executable, in
which case I assume the TLS variables from the module would be part of the
static TLS block. I'll defer it to Florian for correction.



> ---
>
> I do not understand why the complicated dynamic lookup of
> __pthread_get_minstack. What is the advantage of this two-level approach vs
> just dlsym'ing the pointer like we do for all other dynamically loaded
> functions?
>

It can avoid the lookup if the symbol is already available. In my test
environment, the symbol is available.


> Also, would this not cause problems when building for other libc
> implementations, e.g. muslc?
>

Could you please give some examples on the potential problems?


>
> And why RTLD_NEXT, why not RTLD_DEFAULT? Any special reason? Since that
> seems to be how we normally load dynamic symbols from the libc.
>

The symbol handling code is inherited from the original patch. I don't know
if there was any specific reason.


>
> About the fkt pointer, in contrast to David I would prefer to see it is a
> pointer, not a function, so your first name which I assume was "p_.." I
> would have liked more.
>

Looking at the existing code in os_linux.cpp, using 'p_' doesn't appear to
be a convention as David pointed out. I don't have a strong preference on
the naming in this case...


>
> ---
>
> +  size_t min_stack_size = align_up(pthread_get_minstack(&attr),
> vm_page_size());
>
> Why the align up? The only condition we have to meet is for the total
> stack size to be page aligned. On some platforms page size can be large (I
> think we have 64K pages on ppc) and TLS size could be smaller than that.
> Could you move the align up do right before the call to
> pthread_attr_setstacksize() (and probably remove the assert there)?
>

Sounds good. Will do.


>
> ---
>
> What is the logic behind giving attr to pthread_get_minstack? If Florian
> is right and this:
> http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2019-May/034593.html
> is the implementation, that parameter seems not to be used. In that case,
> just to be sure, I would feed it a dummy attr.
>

I think I did experiment with a dummy attr initially (based on the same
reasoning as yours above) and was getting bogus minstack value, if I
remember correctly. I can re-look into that.


>
> ---
>
> I am not sure, but is adding the TLS sized buffer to just small thread
> stacks the right thing to do? Why it is okay for thread stacks slightly
> larger than the cut-off point to live without this additional space? Will
> they not as well be hurt by the large TLS subtraction from their stack
> sizes?
>

Some additional thoughts on top of Daivid's reply. Giving that TLS related
issue is not a broadly reported problem with common usages of today for
Java, using a tunable ratio (with a cut-off point as you described) is a
fine idea to address the problem while not affecting 'normal' cases.

Best regards,
Jiangli



>
> Thank you!
>
> Cheers, Thomas
>
>
>
>
> On Tue, Jun 11, 2019 at 12:30 AM Jiangli Zhou <jianglizhou at google.com>
> wrote:
>
>> Hi David and Florian,
>>
>> This the a follow up of the discussion regarding the TLS (Thread Local
>> Storage) issue:
>>
>> http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2019-May/034459.html
>> .
>> The following is an updated patch with your suggestions incorporated:
>>
>> 1) Use __pthread_get_minstack
>> 2) By default only adjust the thread stack size if the
>> minstack-to-stacksize ratio is >= 10%
>> 3) Introduce a new command-line option (Linux only),
>> AdjustWithMinStackAt, which can be used to change the default 10%
>> ratio for triggering stack size adjustment
>>
>> CSR: https://bugs.openjdk.java.net/browse/JDK-8225498
>> webrev: http://cr.openjdk.java.net/~jiangli/8225035/webrev.00/
>> bug: https://bugs.openjdk.java.net/browse/JDK-8225035
>>
>> Best regards,
>>
>> Jiangli
>>
>


More information about the hotspot-runtime-dev mailing list