RFR: Avoid jtreg test timeout in aarch64 due to a stackoverflow in process reaper
Jie He
github.com+10233373+jhe33 at openjdk.java.net
Fri May 8 02:43:19 UTC 2020
On Thu, 7 May 2020 16:24:11 GMT, Arthur Eubanks <aeubanks at openjdk.org> wrote:
> I read through https://groups.google.com/forum/#!topic/thread-sanitizer/RsPcxUXBokg (nice investigation btw!) but don't
> quite understand
> > and found the different GLIBC behaviors between x86 and aarch64 stack
> > allocation due to default stack size in openjdk. x86 will get the stack from
> > glibc cached stack because it matches the threshold to allocate a stack from
> > cached stack, but aarch64 not.
>
> Do you mean that on x86 the stack size of the thread is larger than requested because glibc happened to have something
> larger lying around to use? So we are currently getting lucky in x86 with stack sizes?
Yes, but not exactly. The fact that x86 gets its stack from the glibc stack cache is a separate story.
Initially, when I started to investigate the SOE failure, I noticed that x86 and aarch64 behaved differently. TSAN
increases the stack size to 384K, but x86 could always get a 1M stack, whereas aarch64 couldn't. I thought that might
be the reason there was no SOE on x86. In fact, it's not the root cause, as you already know: even if x86 gets a 384K
stack by bypassing the glibc allocation, it won't incur an SOE in this case. Still, I had to spend time investigating
glibc.
By default, the stack size in openjdk is 1M on x86 and 2M on aarch64. I assume aarch64 consumes more stack than x86
in most cases. glibc sometimes allocates the stack from its cached stacks, depending on whether the requested stack
size is larger than 1/4 of the cached stack. Here, 384K > 1/4 * 1M on x86, but 384K is not > 1/4 * 2M on aarch64.
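To make the arithmetic above concrete, here is a minimal sketch in Python. It only models the rule as described in this mail (a cached stack is reused when it is big enough and the request is larger than 1/4 of it); it is not glibc's actual code, and the function name and constants are hypothetical names chosen for illustration.

```python
KIB = 1024

# Values taken from the discussion above.
TSAN_REQUESTED_STACK = 384 * KIB      # stack size requested after TSAN's increase
X86_DEFAULT_STACK = 1024 * KIB        # openjdk default on x86 (1M)
AARCH64_DEFAULT_STACK = 2048 * KIB    # openjdk default on aarch64 (2M)


def can_reuse_cached_stack(requested: int, cached: int) -> bool:
    """Hypothetical model of the reuse rule described in this mail.

    Assumption: a cached stack is handed out only if it is at least as
    large as the request AND the request is larger than 1/4 of the
    cached stack (written as 4 * requested > cached to stay in integers).
    """
    return cached >= requested and 4 * requested > cached


if __name__ == "__main__":
    # x86: 384K > 1/4 * 1M (256K), so the cached 1M stack is reused.
    print("x86    :", can_reuse_cached_stack(TSAN_REQUESTED_STACK, X86_DEFAULT_STACK))
    # aarch64: 384K is not > 1/4 * 2M (512K), so a fresh 384K stack is allocated.
    print("aarch64:", can_reuse_cached_stack(TSAN_REQUESTED_STACK, AARCH64_DEFAULT_STACK))
```

Under these assumptions the check passes for the x86 sizes and fails for the aarch64 sizes, which matches the observation that x86 ends up with a 1M stack while aarch64 gets only 384K.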
Anyway, I think the TSAN issue also reduces the effective usable stack on x86, which could make an SOE more likely,
even though it doesn't happen in this case.
-------------
PR: https://git.openjdk.java.net/tsan/pull/8