RFR for JDK-8030284 TEST_BUG: intermittent StackOverflow in RMI bench/serial test
david.holmes at oracle.com
Fri Dec 20 04:29:25 UTC 2013
On 20/12/2013 1:06 PM, Stuart Marks wrote:
> On 12/18/13 10:25 PM, Tristan Yan wrote:
>> Hi Everyone
>> Please help to review the fix for JDK-8030284.
>> This is a one-line fix that adds -Xss to prevent StackOverflowError.
> Hi, I guess this might make sense, but this still seems like a mystery
> to me.
> Do we have any evidence that this test hit the stack limit but otherwise
> is behaving identically? It does load 50 classes recursively. It seems
> strange that this test apparently ran for years without problems as a
> shell test, but when run in a jtreg environment, adding the additional
> six or so stack frames for jtreg would have pushed it over the limit.
If you were always one frame from the end then it is not so surprising
that a simple change pushes you past the limit :) Try running the shell
test with additional recursive loads and see when it fails.
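A rough way to see how much headroom a given stack size leaves is to recurse until StackOverflowError on a thread created with an explicit stack size. The sketch below is only an illustration and not part of the RMI test: the class name StackDepthProbe and the sizes tried are made up, and the stackSize argument to the Thread constructor is documented as a hint that some platforms may ignore. The depth reached can also differ between runs and between interpreted and compiled code.

```java
public class StackDepthProbe {
    // Number of recursive calls made before the stack ran out.
    private int depth;

    private void recurse() {
        depth++;
        recurse();
    }

    // Runs the recursion on a fresh thread whose requested stack size is
    // stackBytes (a hint to the VM, not a guarantee) and returns the depth
    // reached when StackOverflowError was thrown.
    static int maxDepth(long stackBytes) throws InterruptedException {
        StackDepthProbe probe = new StackDepthProbe();
        Thread t = new Thread(null, () -> {
            try {
                probe.recurse();
            } catch (StackOverflowError expected) {
                // The depth counter already holds the result.
            }
        }, "stack-probe", stackBytes);
        t.start();
        t.join(); // join() makes the probe thread's writes visible here
        return probe.depth;
    }

    public static void main(String[] args) throws InterruptedException {
        // Hypothetical sizes; compare the depths to judge the headroom.
        for (long kb : new long[] {256, 512, 1024}) {
            System.out.println(kb + "K -> depth " + maxDepth(kb * 1024));
        }
    }
}
```

Running it a few times with the same size gives a feel for how much the achievable depth varies from run to run.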
> It's also kind of strange that in the two stack traces I've seen (I
> think I managed to capture only one in the bug report though) the
> StackOverflowError occurs on loading exactly the 50th class. Since we're
> observing intermittent behavior (happens sometimes but not others) the
> stack size is apparently variable. Since it's variable I'd expect to see
> it failing at different times, possibly the 49th or 48th recursive
> classload, not just the 50th. And in such circumstances, do we know what
> the default stack size is?
Classloading consumes a reasonable chunk of stack so if the variance
elsewhere is quite small it is not that surprising that the test always
fails on the 50th class. I would not expect run-to-run stack usage
variance to be high unless there is some random component to the test.
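The arithmetic behind this can be sketched with made-up numbers. Assume a stack budget of 1000 units, a cost of 20 units per recursive class load, and a small baseline for whatever frames sit below the test. All three figures are hypothetical and chosen only to show the effect; they are not measurements of the real test.

```java
public class StackBudgetSketch {
    // Returns the 1-based index of the first recursive load that no longer
    // fits, given a total budget, a baseline already in use, and a fixed
    // cost per load. All units are arbitrary and hypothetical.
    static int firstFailingLoad(int budget, int baseline, int perLoad) {
        if (perLoad <= 0) {
            throw new IllegalArgumentException("perLoad must be positive");
        }
        int used = baseline;
        int n = 0;
        while (true) {
            n++;
            used += perLoad;
            if (used > budget) {
                return n;
            }
        }
    }

    public static void main(String[] args) {
        // No extra frames: exactly 50 loads fit, the 51st would overflow.
        System.out.println(firstFailingLoad(1000, 0, 20));
        // A few extra frames below the test: now the 50th load overflows.
        System.out.println(firstFailingLoad(1000, 6, 20));
        // A larger baseline, still below the per-load cost: still the 50th.
        System.out.println(firstFailingLoad(1000, 19, 20));
    }
}
```

With these numbers any baseline from 1 to 20 units fails on the 50th load, so variance smaller than the cost of one class load does not move the failure point.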
> I don't know if you were able to reproduce this issue. If you were, it
> would be good to understand in more detail exactly what's going on.
FWIW there was a recent change in 7u to bump up the number of stack
shadow pages in hotspot as "suddenly" StackOverflow tests were crashing
instead of triggering StackOverflowError. So something started using
more stack in a way that left too little space to process a
stackoverflow properly. Finding the exact cause can be somewhat tedious.