RFR (preliminary): 8202772: "NMT erroneously assumes thread stack boundaries to be page aligned"

Thomas Stüfe thomas.stuefe at gmail.com
Thu Jun 7 10:49:49 UTC 2018


Hi all,

I could use some input/advice for:

https://bugs.openjdk.java.net/browse/JDK-8202772

This is my - yet incomplete - fix attempt:

http://cr.openjdk.java.net/~stuefe/webrevs/8202772-NMT-erroneously-assumes-thread-stack-boundaries-to-be-page-aligned/current_work/webrev/

---------------

Problem:

NMT assumes thread stack boundaries to be page aligned. This is the
case on most OSes, but does not necessarily have to be. POSIX
certainly does not guarantee any alignment for pthread stack
boundaries. Implementors of pthread libraries are free to provide
stack memory as they see fit. Since some form of commit management of
thread stacks makes sense, and that has to be page size based, usually
thread stack boundaries happen to be on page borders, but this is not
a requirement.

On AIX, stack boundaries (which we get reported by the pthread
library) are not aligned to page size. For the stack end, this does
not matter: Thread->current_stack_{base|size} in the VM is, after all,
only our own notion of the real thread stack size. We can move up that
imaginary border in our head to the next larger page boundary with
impunity - since this only affects the part of the thread stack not
yet used. We just lose a bit of thread stack range.

In fact, on AIX, we do just that - align up the end of the stack to
the next page boundary, to be able to place VM guard pages.
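
A minimal sketch of that adjustment, assuming a downward-growing stack
(so the stack end is the lowest address) and a power-of-two page size.
The names align_up_sketch and usable_stack_end are made up for
illustration; this is not the actual AIX port code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper, not HotSpot's own align_up: rounds addr up to the
// next multiple of alignment, which must be a power of two.
inline std::uintptr_t align_up_sketch(std::uintptr_t addr, std::size_t alignment) {
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  const std::uintptr_t mask = static_cast<std::uintptr_t>(alignment) - 1;
  return (addr + mask) & ~mask;
}

// The stack end is the lowest stack address. Moving it up to a page
// boundary gives up less than one page of not-yet-used stack.
inline std::uintptr_t usable_stack_end(std::uintptr_t reported_end,
                                       std::size_t page_size) {
  return align_up_sketch(reported_end, page_size);
}
```

An end that is already page aligned is returned unchanged, so nothing
is lost on platforms that hand out aligned stacks.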

However, wrt the thread stack base the matter is different. This part
of the stack is already in use by the time we initialize the VM. So,
we cannot just move our notion of the stack base up or down as we
please (well, maybe we could, but we do not want to). That means that
on AIX, thread stack base can be located in the middle of a page.

Now, NMT assumes stack base to be page aligned. If not, it will assert
or crash when printing the NMT report.

My first attempt at fixing this (see above webrev) was to feed NMT a
corrected version of the thread stack size - just the page-aligned
inner portion of the stack - that way we lose a bit of fidelity in NMT
thread stack accounting, but at least we do not crash. That makes
runtime errors go away, but there is a gtest which stubbornly refuses
to heal.
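
Roughly, the idea is the following. This is only a sketch, not the code
from the webrev: StackRangeSketch and page_aligned_inner_range are
hypothetical names, and it assumes a downward-growing stack (base is
the highest address) and a power-of-two page size.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical type for illustration: a stack range described by its
// lowest address and its size in bytes.
struct StackRangeSketch {
  std::uintptr_t low;
  std::size_t size;
};

// Returns the largest page-aligned range fully contained in the stack
// [base - size, base). Partial pages at either edge are dropped.
inline StackRangeSketch page_aligned_inner_range(std::uintptr_t base,
                                                 std::size_t size,
                                                 std::size_t page_size) {
  assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
  const std::uintptr_t mask = static_cast<std::uintptr_t>(page_size) - 1;
  const std::uintptr_t low = base - size;
  const std::uintptr_t aligned_low = (low + mask) & ~mask;  // round end up
  const std::uintptr_t aligned_high = base & ~mask;         // round base down
  if (aligned_high <= aligned_low) {
    return StackRangeSketch{aligned_low, 0};  // no full page inside
  }
  return StackRangeSketch{aligned_low,
                          static_cast<std::size_t>(aligned_high - aligned_low)};
}
```

The range handed to NMT is then smaller than the real stack by less
than one page at each edge, which is the loss of fidelity mentioned
above.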

See CommittedVirtualMemoryTest
(test/hotspot/gtest/runtime/test_committed_virtualmemory.cpp): I admit
I do not fully understand this test. It seems to record the current
thread's stack base and size - ok - and then query the virtual regions
as perceived by NMT, expecting that the stack top is at the end of a
committed region. But even without the matter of unaligned stack base,
could it not be that virtual regions in NMT are fused, e.g. if
multiple thread stacks are placed next to each other? So, I am not
sure the test is correct.

Would be nice if someone with more NMT knowledge could comment.

--

Please note: Since I do most of my development on Linux, I modified
the stack base in the preliminary patch a bit to emulate the same
error on Linux I get on AIX. Because AIX is a terrible platform to
debug on :)

Note that the VM usually is fine with unaligned stack bases - NMT is
the only part I know of which has problems with that.

--

Thanks a lot,

Best Regards, Thomas


More information about the hotspot-runtime-dev mailing list