RFR (preliminary): 8202772: "NMT erroneously assumes thread stack boundaries to be page aligned"
Thomas Stüfe
thomas.stuefe at gmail.com
Thu Jun 7 12:57:03 UTC 2018
Hi Zhengyu,
On Thu, Jun 7, 2018 at 2:42 PM, Zhengyu Gu <zgu at redhat.com> wrote:
> Hi Thomas,
>
>
> On 06/07/2018 06:49 AM, Thomas Stüfe wrote:
>>
>> Hi all,
>>
>> I could use some input/advice for:
>>
>> https://bugs.openjdk.java.net/browse/JDK-8202772
>>
>> This is my - yet incomplete - fix attempt:
>>
>>
>> http://cr.openjdk.java.net/~stuefe/webrevs/8202772-NMT-erroneously-assumes-thread-stack-boundaries-to-be-page-aligned/current_work/webrev/
>>
>> ---------------
>>
>> Problem:
>>
>> NMT assumes thread stack boundaries to be page aligned. This is the
>> case on most OSes, but does not necessarily have to be. POSIX
>> certainly does not guarantee any alignment for pthread stack
>> boundaries. Implementors of pthread libraries are free to provide
>> stack memory as they see fit. Since some form of commit management of
>> thread stacks makes sense, and that has to be page size based, usually
>> thread stack boundaries happen to be on page borders, but this is not
>> a requirement.
>>
>> On AIX, stack boundaries (which we get reported by the pthread
>> library) are not aligned to page size. For the stack end, this does
>> not matter: Thread->current_stack_{base|size} in the VM is, after all,
>> only our own notion of the real thread stack size. We can move up that
>> imaginary border in our head to the next larger page boundary with
>> impunity - since this only affects the part of the thread stack not
>> yet used. We just loose a bit of thread stack range.
>>
>> In fact, on AIX, we do just that - align up the end of the stack to
>> the next page boundary, to be able place VM guard pages.
>
>
> NMT virtual memory tracking is under assumption that the memory address and
> range are page aligned. I guess we made wrong assumption about thread stack
> from very beginning, which does not seem to belong to this category now.
>
Well lets see. If problems keep coming up, it may make sense to remove
thread stacks from NMT.
But on the other hand, accounting for thread stacks is very useful,
especially when committed size is accounted correctly. So, I'll spend
some more time trying to make this work.
>>
>> However, wrt the thread stack base the matter is different. This part
>> of the stack is already in use by the time we initialize the VM. So,
>> we cannot just move our notion of the stack base up or down as we
>> please (well, maybe we could, but we do not want to). That means that
>> on AIX, thread stack base can be located in the middle of a page.
>>
>> Now, NMT assumes stack base to be page aligned. If not, it will assert
>> or crash when printing the NMT report.
>
>
> Will this work for you? if we round down base address and size to page
> aligned in MemTracker::record/release_thread_stack() if they are not page
> aligned. Of course, it will lost some accuracy.
Yes, that was my original idea: rounding down stack base to next page
start. I was just worried that that would trigger other errors down
the road.
Next escalation would be to disable thead stack accounting just on
AIX. Or, platforms where thread stacks have not-page-aligned
boundaries. Or, dynamically on a by-thread base.
Are thread stack regions merged in NMT? So, two threads laying side by
side, could they be merged to one region with mtThreadStack attribute?
..Thomas
>
>
> Thanks,
>
> -Zhengyu
>
>
>>
>> My first attempt at fixing this (see above webrev) was to feed NMT a
>> corrected version of the thread stack size - just the page-aligned
>> inner portion of the stack - that way we loose a bit fidelity in NMT
>> thread stack accounting, but at least we do not crash. That makes
>> runtime errors go away, but there is a gtest which stubbornly refuses
>> to heal.
>>
>> See CommittedVirtualMemoryTest
>> (test/hotspot/gtest/runtime/test_committed_virtualmemory.cpp): I admit
>> I do not fully understand this test. It seems to record the current
>> threads stack base and size - ok - and then query the virtual regions
>> as perceived by NMT, expecting that the stack top is at the end of a
>> committed region. But even without the matter of unaligned stack base,
>> could it not be that virtual regions in NMT are fused, e.g. if
>> multiple thread stacks are placed next to each other? So, I am not
>> sure the test if correct.
>>
>> Would be nice if someone with more NMT knowledge could comment.
>>
>> --
>>
>> Please note: Since I do most of my development on Linux, I modified
>> the stack base in the preliminary patch a bit to emulate the same
>> error on Linux I get on AIX. Because AIX is a terrible platform to
>> debug on :)
>>
>> Note that the VM usually is fine with unaligned stack bases - NMT is
>> the only part I know of which has problems with that.
>>
>> --
>>
>> Thanks a lot,
>>
>> Best Regards, Thomas
>>
>
More information about the hotspot-runtime-dev
mailing list