strange behavior with stack overflow on windows

Thu Jun 6 03:08:57 PDT 2013

Hi Roland,

IMO, the problem here is a conflict between the HotSpot stack overflow detection and the Windows internal stack overflow detection:

Windows uses reserved but only partially committed memory for its stacks. To detect when to commit more stack, it installs a one-shot guard page (btw, the same type of guard page that is used for the HotSpot yellow and red zones) right at the edge of the currently committed stack zone. When a thread accesses this guard page, an exception is raised which Windows catches internally; it then commits more stack and re-establishes the one-shot guard page at the new edge of the committed zone. When Windows detects such an exception inside the _last 4 pages_ of a stack (I couldn't find any documentation for that on MSDN; I found this value by manually testing on several Windows machines with 4k stack pages), it throws a STACK_OVERFLOW_EXCEPTION.

This implies:
- If you only have 3 guard pages, a stack overflow will actually occur one page _ahead_ of the yellow zone the first time.
- If you have more than 4 guard pages, the extra ones will be of no use because Windows will treat them as its normal guard pages used for stack committing and will not throw a STACK_OVERFLOW_EXCEPTION until the stack has reached its last 4 pages.
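
To make the two cases above concrete, here is a toy model in Java. It is only a sketch of my reading of the behavior: it calls no Windows API, the class and method names are made up for illustration, and the 4-page threshold is the value from my manual tests, not a documented constant.

```java
// Toy model of the two cases listed above. Hypothetical names, no Windows API involved.
final class GuardPageModel {
    // Assumption: observed by manual testing with 4k pages, not documented on MSDN.
    static final int WINDOWS_OVERFLOW_PAGES = 4;

    // Number of pages by which the first reported overflow lies ahead of
    // (above) the HotSpot guard zone. 0 means it lands inside the guard zone.
    static int pagesAheadOfGuardZone(int hotspotGuardPages) {
        return Math.max(0, WINDOWS_OVERFLOW_PAGES - hotspotGuardPages);
    }

    // Number of HotSpot guard pages that Windows would treat as ordinary
    // stack-growth guard pages, so that they trigger no overflow exception.
    static int guardPagesWithoutEffect(int hotspotGuardPages) {
        return Math.max(0, hotspotGuardPages - WINDOWS_OVERFLOW_PAGES);
    }

    public static void main(String[] args) {
        for (int g = 3; g <= 6; g++) {
            System.out.println(g + " guard pages: first overflow "
                + pagesAheadOfGuardZone(g) + " page(s) ahead of the guard zone, "
                + guardPagesWithoutEffect(g) + " guard page(s) without effect");
        }
    }
}
```

Under this model only a guard zone of exactly 4 pages lines up with the Windows threshold, which is why I ask about 4 guard pages below.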

Does your problem also occur if you have 4 guard pages? Also, we'll have to test how Windows behaves for pages > 4k...

Regards,
Andreas

-----Original Message-----
From: hotspot-runtime-dev-bounces at openjdk.java.net [mailto:hotspot-runtime-dev-bounces at openjdk.java.net] On Behalf Of Roland Westrelin
Sent: Wednesday, June 5, 2013 18:37
To: hotspot-runtime-dev at openjdk.java.net
Subject: strange behavior with stack overflow on windows

Runtime folks,

I'm investigating:
JDK-8015660 Test8009761.java "Failed: init recursive calls: 24. After deopt 25"
and I see a strange behavior on windows.

The test case calls an interpreted method recursively until a stack overflow and counts the number of invocations. Then it forces a compilation and a deoptimization. The test's goal is to check that a deoptimization rebuilds the interpreter frames correctly when there's one inlinee, which wasn't the case on SPARC. So the test calls the interpreted method recursively again until a stack overflow and counts the number of invocations again.

It then compares the before and after counts, which should be the same. On Windows (x64 at least), they are not: the second count is greater by one.

I looked at the stack layout and the faulting address and what I see is that the first access violation is triggered by an address in the page right above the yellow zone. For instance: stack guards are 3 pages from 0xdc20000 and the exception happens at 0xdc232a8. The second violation occurs in the yellow zone. If I add a third overflow to the test, then it happens in the yellow zone as well.
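
The page arithmetic for that example can be checked with a few lines of Java. This is only a sketch: it assumes 4 KiB pages and that 0xdc20000 is the lowest address of the guard zone, and the class and method names are made up for illustration.

```java
// Locates the faulting address relative to the guard zone from the example above.
final class FaultPage {
    static final long PAGE_SIZE = 4096; // assumption: 4 KiB pages

    // Index of the page that contains addr, counted upward from guardZoneStart.
    static long pageIndex(long guardZoneStart, long addr) {
        return (addr - guardZoneStart) / PAGE_SIZE;
    }

    public static void main(String[] args) {
        long guardZoneStart = 0xdc20000L; // assumed lowest address of the guard zone
        long fault = 0xdc232a8L;          // faulting address of the first violation
        int guardPages = 3;

        long index = pageIndex(guardZoneStart, fault);
        // Pages 0..2 form the guard zone, so index 3 is the first page above it.
        System.out.println("fault page index: " + index);
        System.out.println("inside guard zone: " + (index < guardPages));
    }
}
```

The offset is 0x32a8, so the fault falls in the page at index 3, the first page above the 3 guard pages.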

Is it a known issue?

I looked at the code and it looks like Windows uses guard pages that are disabled on first violation. So maybe the Windows thread creation code adds a guard page at the last page of the stack on thread creation, but it's hard to tell from the documentation. I didn't find anything wrong in the HotSpot code.

Does it look like a bug that's worth fixing?

Otherwise, I can change my test so that it triggers 3 overflows and only keep the invocation counts for the last 2.
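
Roughly, the changed test would look like the sketch below. It is not the actual Test8009761.java: the names are made up, and it leaves out forcing the compilation and the deoptimization and keeping the method interpreted, so on a stock JVM the two counts may differ for unrelated reasons (e.g. JIT compilation).

```java
// Sketch of the workaround: trigger three overflows, compare only the last two.
final class OverflowCount {
    private static int depth;

    private static void recurse() {
        depth++;
        recurse();
    }

    // Recurses until a StackOverflowError and returns the number of invocations.
    static int countUntilOverflow() {
        depth = 0;
        try {
            recurse();
        } catch (StackOverflowError e) {
            // Expected: this is how the recursion terminates.
        }
        return depth;
    }

    public static void main(String[] args) {
        // First overflow: result discarded, since on Windows it may be reported
        // one page ahead of the yellow zone.
        countUntilOverflow();

        int before = countUntilOverflow();
        // The real test would force the compilation and the deoptimization here.
        int after = countUntilOverflow();

        System.out.println("before: " + before + ", after: " + after);
    }
}
```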

Roland.