RFR: 8335362: [Windows] Stack pointer increment in _cont_thaw stub can cause program to terminate with exit code 0xc0000005

Fri Sep 6 06:24:52 UTC 2024

On Wed, 4 Sep 2024 21:05:10 GMT, Patricio Chilano Mateo <pchilanomate at openjdk.org> wrote:

> Please review the following fix. In stub routine cont_thaw() we bump the stack pointer by the maximum size required to copy the frames currently stored in the top stackChunk. On Windows this increment of the stack pointer doesn't play nice with the way Windows sets up and manages stack pages. When a thread is created the stack is divided in 3 memory regions: regular committed pages, guard pages, reserved pages. The first pages are committed and the thread can read/write to them with no issues. The next pages(~2/3) are guard pages, which are committed but have the PAGE_GUARD attribute. When the thread tries to access a guard page the first time, the PAGE_GUARD attribute is removed and a new guard page from the reserved region is added. The rest of the stack are reserved pages and if we try to access it directly we get an EXCEPTION_ACCESS_VIOLATION (see bug for more details). So the problem is that we can bump the stack pointer too much and set it to point somewhere in the reserved r
 egion. When we then execute the call instruction for method thaw(), we get an EXCEPTION_ACCESS_VIOLATION exception, but because we cannot access the memory at the current stack pointer, we cannot call any method anymore, including the exception handler and the program terminates abruptly with exit code 0xc0000005.
> 
> The fix implemented is to bang the stack pages one by one to let the Windows page protection take over. This is what we already do in os::map_stack_shadow_pages() in JavaCalls::call_helper(), and also in interpreter (bang_stack_shadow_pages()) and compiler (generate_stack_overflow_check()) code. It's actually also the same mechanism that Windows routine _chkstk used by the compiler employs (see bug comments with assembly code).
> 
> I added new test BigStackChunk.java that reproduces the issue. The test fails without this fix and passes with it. I also tested the patch by running in mach5 tiers1-7.
> 
> Thanks,
> Patricio

src/hotspot/share/runtime/continuationFreezeThaw.cpp line 289:

> 287:     address last_touched_page = watermark - StackOverflow::stack_shadow_zone_size();
> 288:     size_t pages_to_touch = align_up(watermark - new_sp, page_size) / page_size;
> 289:     while (pages_to_touch--) {

Suggestion:

    while (pages_to_touch-- > 0) {

src/hotspot/share/runtime/continuationFreezeThaw.cpp line 293:

> 291:       *last_touched_page = 0;
> 292:     }
> 293:     thread->stack_overflow_state()->set_shadow_zone_growth_watermark(new_sp);

I'm not familiar with the details of this stack management code and am unclear about the role of the `shadow_zone_growth_watermark` here. The banging code in `os::map_stack_shadow_pages` doesn't access it.

test/jdk/java/lang/Thread/virtual/BigStackChunk.java line 47:

> 45:         int i6 = i5 + 1;
> 46:         int i7 = i6 + 1;
> 47:         long ll = 2*(long)i1;

Suggestion:

        long ll = 2 * (long)i1;

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20862#discussion_r1746543746
PR Review Comment: https://git.openjdk.org/jdk/pull/20862#discussion_r1746568373
PR Review Comment: https://git.openjdk.org/jdk/pull/20862#discussion_r1746554210