RFR: 8273373: Zero: Handle thread stack sizes with a generic Linux code

Thomas Stuefe stuefe at openjdk.java.net
Wed Sep 8 07:46:06 UTC 2021


On Mon, 6 Sep 2021 09:45:12 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

> See the bug for more discussion.
> 
> Additional testing:
>  - [x] Linux Zero x86_64 passes most gtests now (there are unrelated failures)
>  - [x] Linux Zero x86_32 passes most gtests now (there are unrelated failures)

I got curious and analyzed this further. Let's see if I got this right.

When we create the VM on the primordial thread, we locate the stack boundaries for the primordial thread in `os::Linux::capture_initial_stack()`.

There, we basically look for the boundaries of the primordial thread stack in `/proc/pid/maps`. We manage to locate the correct VMA for the primordial stack. But then we do not simply take the boundary of that VMA as the stack boundary. Instead, we take the high VMA address, subtract an assumed stack size (calculated from -Xss and RLIMIT_STACK), then take that address as the stack bottom (`_initial_thread_stack_bottom`).

It seems that the assumed stack size can be smaller than the real size of the stack, resulting in an offset between the real stack VMA low address and `_initial_thread_stack_bottom`.
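To make the arithmetic concrete, here is a minimal sketch of the calculation described above. It is not the HotSpot code: `StackVma`, `assumed_stack_bottom()` and `bottom_offset()` are hypothetical names, and the addresses are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical model of the stack mapping found in /proc/<pid>/maps.
struct StackVma {
  std::uint64_t low;   // lowest address of the mapping
  std::uint64_t high;  // one past the highest address of the mapping
};

// Stack bottom as described in the text: high VMA address minus an
// assumed stack size (which in the VM comes from -Xss and RLIMIT_STACK).
inline std::uint64_t assumed_stack_bottom(const StackVma& vma,
                                          std::uint64_t assumed_size) {
  assert(assumed_size <= vma.high);  // avoid wrapping below address zero
  return vma.high - assumed_size;
}

// Distance between the assumed bottom and the real low address of the VMA.
// Positive when the assumed size is smaller than the real mapping.
inline std::int64_t bottom_offset(const StackVma& vma,
                                  std::uint64_t assumed_size) {
  return static_cast<std::int64_t>(assumed_stack_bottom(vma, assumed_size)) -
         static_cast<std::int64_t>(vma.low);
}
```

With an 8 MB mapping and an assumed size of 1 MB, the assumed bottom sits 7 MB above the real low address of the VMA; the offset only disappears when the assumed size matches the mapping.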

That in itself would not be a problem, but in `StackOverflow::create_stack_guard_pages()` we place the guard pages at an address which we calculate from `os::current_stack_base()`. `os::current_stack_base()` calls `current_stack_region()`, which on Zero just calls `pthread_attr_getstack(3)`, ignoring the previously captured `_initial_thread_stack_bottom`.

In `os::pd_create_stack_guard_pages()` we then essentially use both values, `os::current_stack_base()` and `_initial_thread_stack_bottom`, to calculate the arguments to `get_stack_commited_bottom()`. The resulting assert is caused by mixing those conflicting values.

Bottom line: we should use either `pthread_attr_getstack(3)` or our `_initial_thread_stack_(bottom|size)`, but not a mix of both, since they may differ. Therefore the proposed patch is correct. The non-Zero version of `current_stack_region()` in os_linux.cpp handles the primordial thread stack correctly.

---

There are a number of additional beauty spots, like the fact that in `get_stack_commited_bottom()` the `pages` variable is 32-bit unsigned. So the assert itself is random in that it depends on whether or not the overflow produces a negative value.
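As an illustration of the truncation problem, here is a sketch that assumes the page count is derived from the difference of two addresses. This is not the HotSpot code: `pages_as_u32()` and its parameters are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: page count between two addresses, stored the way a
// 32-bit unsigned variable would hold it.
inline std::uint32_t pages_as_u32(std::uint64_t bottom, std::uint64_t top,
                                  std::uint64_t page_size) {
  assert(page_size != 0);
  // Unsigned subtraction wraps around (well-defined) when top < bottom,
  // e.g. when the two addresses come from inconsistent sources.
  std::uint64_t bytes = top - bottom;
  // Narrowing to 32 bits silently drops the upper bits of the quotient.
  return static_cast<std::uint32_t>(bytes / page_size);
}
```

For a consistent pair of addresses four pages apart the result is 4. If the two addresses are swapped, the difference wraps around and the truncated result is `0xFFFFFFFC`, which tells you nothing about the real stack; what a later check makes of such a value depends on the exact bit pattern.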

Also, I don't understand the logic in `os::pd_create_stack_guard_pages()` where the "fallback" is to call mincore() with what would basically be a zero size, since both `addr` and `stack_extent` are the stack bottom.

---

I would probably rename the issue to "Cannot invoke JVM in primordial threads on Zero".

Cheers, Thomas

-------------

PR: https://git.openjdk.java.net/jdk/pull/5376

