RFR: 8225035: Thread stack size issue caused by large TLS size
Jiangli Zhou
jianglizhou at google.com
Wed Jun 12 03:55:37 UTC 2019
Hi David,
On Tue, Jun 11, 2019 at 7:25 PM David Holmes <david.holmes at oracle.com> wrote:
>
> Hi Jiangli,
>
> On 12/06/2019 3:51 am, Jiangli Zhou wrote:
> > Hi David,
> >
> > On Tue, Jun 11, 2019 at 12:56 AM David Holmes <david.holmes at oracle.com> wrote:
> >>
> >> Hi Jiangli,
> >>
> >> Some initial feedback ...
> >>
> >> On 11/06/2019 8:29 am, Jiangli Zhou wrote:
> >>> Hi David and Florian,
> >>>
> >>> This the a follow up of the discussion regarding the TLS (Thread Local
> >>> Storage) issue:
> >>> http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2019-May/034459.html.
> >>> The following is an updated patch with your suggestions incorporated:
> >>>
> >>> 1) Use __pthread_get_minstack
> >>> 2) By default only adjust the thread stack size if the
> >>> minstack-to-stacksize ratio is >= 10%
> >>> 3) Introduce a new command-line option (Linux only),
> >>> AdjustWithMinStackAt, which can be used to change the default 10%
> >>> ratio for triggering stack size adjustment
> >>
> >> I don't understand why the flag range is 0.0 to 50.0. If 0.1 = 10% then
> >> 50 = 500%, which means the minstack is 5x greater than the actual
> >> requested stack! The flag has to be limited between 0 and 100% AFAICS
> >> else the requested stack size is too small to accommodate the "min
> >> stack" and the attempt to create the thread will presumably fail. ??
> >
> > The minstack size is added on top of the current calculated size and
> > the resulting value is passed to pthread_attr_setstacksize before the
> > thread creation. We will be able to create a thread with stack size
> > that's large enough. I initially thought about using [0%, 100%] range,
> > which is 0 to 1 as well. However, that doesn't apply to the cases
> > where the minstack size is greater then the requested size. Not sure
> > how rare those cases are in reality, it is fairly easy to create a
> > test case with a minstack > request_size. In the added jtreg test, the
> > minstack is 540k with the TLS values.
> >
> > I picked 50.0, but I'm not too certain about using it as the upper
> > bound. We probably want to use a larger value to cover more cases.
>
> My point is that if TLS > requested stack then without enabling this
> workaround the thread creation will presumably fail - right? And it
> makes no sense to me to say "use the workaround if TLS is 5x greater
> than normal stack but don't if its only 4x greater" as both cases are
> equivalent from the perspective of "can the thread run" (answer: no!).
>
> I think this only needs to be a simple percentage flag: if TLS-size is >
> X% of requested stack size then add TLS-size. So 0% would always enable
> it and 100% would disable it until TLS-size was equal to requested stack
> size. I don't think it make sense to consider the case where TLS >
> requested stack size.
My reasoning was based on the ratio of
minstack_to_stack-size-without-minstack-adjustment. We would have to
consider the cases where the minstack > the requested size (without
adjustment), which is > 100%. It's a bit awkward, I admit.
Using the ratio of minstack_to_total-size, we can use the simple
percentage flag as you suggested, with 0% would always enable it and
100% would disable it. Sounds good.
> >>
> >> 827 static void pthread_get_minstack_init() {
> >> 828 // The code below uses the static symbol if available.
> >> 829 // If not, it will use dlsym to search for the symbol at runtime.
> >> 830 p_pthread_get_minstack = &__pthread_get_minstack;
> >>
> >> I don't understand this part. If at runtime the "static" symbol doesn't
> >> exist, won't this generate an error from the loader and prevent the VM
> >> from starting?
> >
> > I see Florian has provided detailed information on this question (thanks!).
> >
> >>
> >> Style nit: p_pthread_get_minstack
> >>
> >> Can you drop the "p" prefix and just use _pthread_get_minstack similar
> >> to other dynamically determined functions please.
> >
> > Done.
> >
> >>
> >> + min_stack_size = (*p_pthread_get_minstack)(attr);
> >>
> >> When you have a function pointer you don't need to dereference it to
> >> invoke it, you can just do:
> >>
> >> min_stack_size = p_pthread_get_minstack(attr);
> >
> > Dropped dereferencing.
> >
> >>
> >> + if (p_pthread_get_minstack) {
> >>
> >> Style nit: Please avoid implicit booleans and check != NULL.
> >
> > Done.
> >
> >>
> >> I don't quite see what the test is doing - why does the Java code run
> >> "echo" ??
> >
> > The TLS issue can be easily demonstrated by starting a process using
> > ProcessBuilder. It could be any valid command that can be passed to
> > ProcessBuilder. "echo" is inherited from the original test case
> > attached to JDK-8130425.
>
> Sorry I'm missing something here. What has an external process got to do
> with TLS stealing stack from a newly created thread ??
Starting a ProcessBuilder causes the process reaper thread being
created. The reaper thread can fail with StackOverflow in one of the
failure mode (as the originally reported issue) with certain TLS
sizes, because it has small stack size. With JDK 13, the
REAPER_DEFAULT_STACKSIZE is 128K. With JDK 8, it is 32K. Using the
process reaper thread is easy to demonstrate the problem.
>
> >> I expected to see different values of AdjustWithMinStackAt
> >> being used and threads then checking whether their stacks have been
> >> adjusted or not; and in some cases to get errors. ??
> >
> > It's a bit tricky, as we may not know the exact failure behavior
> > caused by insufficient stack space in different environments...
>
> Nevertheless I expect that a test in this area would be combining
> different amounts of TLS with different values of the flag to check
> whether thread's can be created and run. At the moment I don't get what
> this test is actually testing - what constitutes a pass and what
> constitutes a fail?
In the added jtreg test, it checks the output from the started
process. The test passes when the output matches with the expected
string (that means the process is started successfully).
The set up of the test is capable of reporting failures by verifying
the output from the started process. The part that I'm not too sure is
what happens if it's a hang (or other strange failure mode that I'm
currently not aware of). I guess the test would fail with a timeout if
it's a hang.
Best regard,
Jiangli
>
> Thanks,
> David
>
> >>
> >>> CSR: https://bugs.openjdk.java.net/browse/JDK-8225498
> >>
> >> I'm still thinking about the right name, type and range for the flag,
> >> which will impact the CSR.
> >
> > Thanks! Please let me know your suggestions, I'll be more than happy
> > to incorporate them.
> >
> > Best regards,
> > Jiangli
> >
> >>
> >> Thanks,
> >> David
> >> -----
> >>
> >>> webrev: http://cr.openjdk.java.net/~jiangli/8225035/webrev.00/
> >>> bug: https://bugs.openjdk.java.net/browse/JDK-8225035
> >>>
> >>> Best regards,
> >>>
> >>> Jiangli
> >>>
More information about the hotspot-runtime-dev
mailing list