RFR for JDK-8030284 TEST_BUG: intermittent StackOverflow in RMI bench/serial test
stuart.marks at oracle.com
Tue Dec 24 17:28:04 UTC 2013
Aha! Mystery solved. So it's not that the error started occurring after the
conversion from shell script to Java, it merely started *appearing* after this
conversion. This makes sense now. Thanks for doing this investigation. As if we
needed any more reason to convert shell script tests to Java....
You had previously posted a one-line patch to raise the stack size, so I'll just
go ahead and push that for you.
On 12/23/13 9:59 PM, Tristan Yan wrote:
> Hi Stuart
> I ran an experiment setting a small thread stack size using -Xss228k or
> -Xss512k. The surprising result is that jtreg reports the test as passed,
> even though I can see the StackOverflowError in the log when I set the
> thread stack size to 512k. So the problem is that the old shell script
> doesn't report the error even when there is a StackOverflowError.
> Thank you.
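The effect Tristan describes can be reproduced with a minimal sketch (a hypothetical probe class, not the actual bench/serial test): recurse until the stack is exhausted and report the depth reached. Running it with different -Xss values shows the overflow depth shifting with the thread stack size, and a wrapper script that ignores the output would still "pass" despite the error being logged.

```java
// Hypothetical probe, not the actual test: recurse until StackOverflowError
// and print how deep we got. Run with e.g. -Xss512k vs. a larger -Xss to see
// the depth change with the thread stack size.
public class StackDepthProbe {
    static int depth = 0;

    static void recurse() {
        depth++;
        recurse();
    }

    public static void main(String[] args) {
        try {
            recurse();
        } catch (StackOverflowError e) {
            // The error is caught and logged; a driver that only checks the
            // process exit status would never notice it happened.
            System.out.println("StackOverflowError at depth " + depth);
        }
    }
}
```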
> On 12/21/2013 08:01 AM, Stuart Marks wrote:
>> On 12/19/13 8:29 PM, David Holmes wrote:
>>> If you were always one frame from the end then it is not so surprising that a
>>> simple change pushes you past the limit :) Try running the shell test with
>>> additional recursive loads and see when it fails.
>> David doesn't seem surprised, but I guess I still am. :-)
>> Tristan, do you think you could do some investigation here, regarding the
>> shell script based test's stack consumption? Run the shell-based test with
>> some different values for -Xss and see how low you have to set it before it
>> generates a stack overflow.
>>>> It's also kind of strange that in the two stack traces I've seen (I
>>>> think I managed to capture only one in the bug report though) the
>>>> StackOverflowError occurs on loading exactly the 50th class. Since we're
>>>> observing intermittent behavior (happens sometimes but not others) the
>>>> stack size is apparently variable. Since it's variable I'd expect to see
>>>> it failing at different times, possibly the 49th or 48th recursive
>>>> classload, not just the 50th. And in such circumstances, do we know what
>>>> the default stack size is?
>>> Classloading consumes a reasonable chunk of stack so if the variance elsewhere
>>> is quite small it is not that surprising that the test always fails on the 50th
>>> class. I would not expect run-to-run stack usage variance to be high unless
>>> there is some random component to the test.
>> Hm. There should be no variance in stack usage coming from the test itself. I
>> believe the test does the same thing every time.
>> The thing I'm concerned about is whether the Java-based test is doing
>> something different from the shell-based test, because of the execution
>> environment (jtreg or other). We may end up simply raising the stack limit
>> anyway, but I still find it hard to believe that the shell-based test was
>> consistently just a few frames shy of a stack overflow.
>> The failure is intermittent; we've seen it twice in JPRT (our internal
>> build&test system). Possible sources of the intermittency are from the
>> different machines on which the test executes. So environmental factors could
>> be at play. How does the JVM determine the default stack size? Could
>> different test runs on different machines be running with different stack sizes?
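One concrete way per-thread stack sizes can differ, independent of -Xss, is the four-argument Thread constructor, which lets code request a stack size explicitly; the JVM is free to treat the value as a hint on some platforms. A small sketch (hypothetical class name) of a thread running with a deliberately small requested stack:

```java
// Sketch: request a specific stack size for one thread. Per the Thread
// javadoc, the stackSize argument is platform-dependent and may be ignored,
// which is one way different machines can end up with different effective
// stack sizes for the same code.
public class StackSizeHint {
    static int depth = 0;

    static void recurse() {
        depth++;
        recurse();
    }

    public static void main(String[] args) throws InterruptedException {
        Runnable probe = () -> {
            try {
                recurse();
            } catch (StackOverflowError e) {
                // expected once the (requested) stack is exhausted
            }
        };
        // Request a 256 KiB stack for the probe thread.
        Thread t = new Thread(null, probe, "small-stack", 256 * 1024);
        t.start();
        t.join();
        System.out.println("overflowed at depth " + depth);
    }
}
```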
>> Another source of variance is the JIT. I believe JIT-compiled code consumes
>> stack differently from interpreted code. At least, I've seen differences in
>> stack usage between -Xint and -Xcomp runs, and in the absence of these
>> options (which means -Xmixed, I guess) the results sometimes vary
>> unpredictably. I guess this might have to do with when the JIT compiler
>> decides to kick in.
>> This test does perform a bunch of iterations, so JIT compilation could be a
>> factor.
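The interpreted-vs-compiled variance Stuart mentions can be sketched in a single run (hypothetical probe class, not the actual test): measure the maximum recursion depth once cold, then again after many invocations, by which point the JIT may have compiled the recursive method. Compiled frames can consume a different amount of stack than interpreted frames, so the two depths need not match.

```java
// Sketch: measure maximum recursion depth twice in the same JVM. After many
// invocations the recursive method may be JIT-compiled, and compiled frames
// may use a different frame size than interpreted ones, so the depths can
// differ between the first and last probe.
public class JitStackProbe {
    static int depth;

    static void recurse() {
        depth++;
        recurse();
    }

    static int probe() {
        depth = 0;
        try {
            recurse();
        } catch (StackOverflowError e) {
            // expected: we are measuring where the overflow occurs
        }
        return depth;
    }

    public static void main(String[] args) {
        int first = probe();
        for (int i = 0; i < 20; i++) {
            probe();    // warm up; may trigger JIT compilation of recurse()
        }
        int last = probe();
        System.out.println("first=" + first + " last=" + last);
    }
}
```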
>>>> I don't know if you were able to reproduce this issue. If you were, it
>>>> would be good to understand in more detail exactly what's going on.
>>> FWIW there was a recent change in 7u to bump up the number of stack shadow
>>> pages in hotspot, as "suddenly" StackOverflow tests were crashing instead of
>>> throwing StackOverflowError. So something started using more stack in a way
>>> that caused there to not be enough space to process a stack overflow
>>> properly. Finding the exact cause can be somewhat tedious.
>> This seems like a different problem. We're seeing actual StackOverflowErrors,
>> not crashes. Good to look out for this, though.
More information about the core-libs-dev mailing list