RFR: 8252921: NMT overwrite memory type for region assert when building dynamic archive [v4]
Zhengyu Gu
zgu at openjdk.java.net
Thu Sep 17 15:54:56 UTC 2020
On Thu, 17 Sep 2020 12:47:13 GMT, Zhengyu Gu <zgu at openjdk.org> wrote:
>>> > > > > > This code in
>>> > > > > > [virtualMemoryTracker.cpp](https://github.com/openjdk/jdk/blob/9a7dcdcdbad34e061a8988287fe691abfd4df305/src/hotspot/share/services/virtualMemoryTracker.cpp#L350)
>>> > > > > > still looks buggy to me: ```
>>> > > > > > if (reserved_rgn->same_region(base_addr, size)) {
>>> > > > > > reserved_rgn->set_call_stack(stack);
>>> > > > > > reserved_rgn->set_flag(flag);
>>> > > > > > return true;
>>> > > > > > } else {
>>> > > > > > assert(reserved_rgn->overlap_region(base_addr, size), "Must be");
>>> > > > > >
>>> > > > > > // Overlapped reservation.
>>> > > > > > // It can happen when the regions are thread stacks, as JNI
>>> > > > > > // thread does not detach from VM before exits, and leads to
>>> > > > > > // leak JavaThread object
>>> > > > > > if (reserved_rgn->flag() == mtThreadStack) {
>>> > > > > > guarantee(!CheckJNICalls, "Attached JNI thread exited without being detached");
>>> > > > > > // Overwrite with new region
>>> > > > > > ```
>>> > > > > >
>>> > > > > >
>>> > > > > > Why is the "Overlapped reservation" fix not done when the two regions are exactly the same?
>>> > > > >
>>> > > > >
>>> > > > > The same region branch was to deal with different scenario: recursive reservation, where os::reserve() ->
>>> > > > > pd_reserve() -> os::reserve()
>>> > > > > > If a JNI thread has exited without detaching, and its stack happens to be picked for a future memory reservation of the
>>> > > > > > exact size, I think we will get the same assert as in this bug report.
>>> > > > >
>>> > > > >
>>> > > > > Right, there is possibility. We can put the same guarantee there.
>>> > > >
>>> > > >
>>> > > > I think the order of the checks should be changed:
>>> > > > ```
>>> > > > if (reserved_rgn->overlap_region(base_addr, size)) {
>>> > > > check for overlapped reservation
>>> > > > }
>>> > > > if (reserved_rgn->same_region(base_addr, size)) {
>>> > > > check for recursive reservation
>>> > > > ```
>>> > > >
>>> > > >
>>> > > > In fact, if the code had been written this way, this bug would not have happened.
>>> > >
>>> > >
>>> > > Well, I don't know which bug you refer to, but overlap_region() covers same_region(), so reversing order is a no go.
>>> >
>>> >
>>> > I am talking the original assert reported by [JDK-8252921](https://bugs.openjdk.java.net/browse/JDK-8252921)
>>> > ```
>>> > assert((flag() == mtNone || flag() == f)) failed: Overwrite memory type for region
>>> > [0x000000a41cf40000-0x000000a41d040000), 3->23. ```
>>> >
>>> >
>>> > We get there because we have a old stack region which matches exactly as a newly reserved region. This case should also
>>> > be handled by this code ```
>>> > // Overlapped reservation.
>>> > // It can happen when the regions are thread stacks, as JNI
>>> > // thread does not detach from VM before exits, and leads to
>>> > ```
>>> >
>>> >
>>> > The current code assumes that if you have an exact match, it must be from a recursive reveration. This is wrong.
>>>
>>> In that case, it just hides the real problem. Because the thread stack region can be remapped to any other types and
>>> maybe different size.
>>
>> Let me clarify. I think your fix is good, but it only fixes part of the problems found by this bug.
>>
>> I am requesting that the following should **also** be fixed, so that the behavior is the same whether the new region
>> partially overlaps with an old stack, or is exactly the same size as an old stack.
>> Here's my proposed additional fix. I think your fix alone will not solve the problem where a JNI-attached thread was
>> not properly detached before the thread exits.
>> if (reserved_rgn == NULL) {
>> VirtualMemorySummary::record_reserved_memory(size, flag);
>> return _reserved_regions->add(rgn) != NULL;
>> } else {
>> if (reserved_rgn->overlap_region(base_addr, size) && reserved_rgn->flag() == mtThreadStack) {
>> // Overlapped reservation.
>> // It can happen when the regions are thread stacks, as JNI
>> // thread does not detach from VM before exits, and leads to
>> // leak JavaThread object
>> guarantee(!CheckJNICalls, "Attached JNI thread exited without being detached");
>> // Overwrite with new region
>>
>> // Release old region
>> VirtualMemorySummary::record_uncommitted_memory(reserved_rgn->committed_size(), reserved_rgn->flag());
>> VirtualMemorySummary::record_released_memory(reserved_rgn->size(), reserved_rgn->flag());
>>
>> // Add new region
>> VirtualMemorySummary::record_reserved_memory(rgn.size(), flag);
>>
>> *reserved_rgn = rgn;
>> return true;
>> } else if (reserved_rgn->same_region(base_addr, size)) {
>> reserved_rgn->set_call_stack(stack);
>> reserved_rgn->set_flag(flag);
>> return true;
>> } else {
>> assert(reserved_rgn->overlap_region(base_addr, size), "Must be");
>>
>> // CDS mapping region.
>
>> Okay I see the problem now - sorry for the confusion. The issue is when
>> printing the other threads. And that means the issue is in the GC thread
>> management code as if the GC worker has terminated it should have
>> already removed itself from the set of worker threads. Though as this is
>> a crash, and we are not at a safepoint, then there could still be a race
>> there. :( But note that if we actually deleted the thread we could still
>> crash accessing it at that point. So this just seems to be a rare race
>> condition that we are unlikely to hit - and if we did then the secondary
>> assertion failure would not be too bad IMO.
>>
>> But the modified assertion still makes no sense as Thread::current() is
>> not relevant AFAICS. I would prefer to keep the assertion to guard the
>> regular case of accidental misuse of a thread before the stack_base()
>> has been set.
>>
> The reported crash actually is at a safepoint, but it is irrelevant.
> The problem is that 'Thread' object can outlive thread itself.
>
> "G1 Main Marker" thread is not a worker, but a ConcurrentGCThread, a member of G1CollectedHeap.
> <code>G1ConcurrentMarkThread* _cm_thread;</code>
> and during error report
> <code>Universe::heap()->gc_threads_do(&print_closure);</code>
> For example, G1 always tries to report its internal states:
> <pre><code>
> void G1CollectedHeap::gc_threads_do(ThreadClosure* tc) const {
> workers()->threads_do(tc);
> tc->do_thread(_cm_thread); <====
> _cm->threads_do(tc);
> _cr->threads_do(tc);
> tc->do_thread(_service_thread);
> if (G1StringDedup::is_enabled()) {
> G1StringDedup::threads_do(tc);
> }
> }
> </code></pre>
>
> So it is more likely to hit the secondary assertion than you described.
>
> The modified assertion only suggests that you may query thread states from other thread, the result is the best effort
> and it keeps the semantics unchanged when queries from 'current' thread.
> Thanks.
Updated to address JNI detach thread issue. Ran tier1, vmTestbase_nsk_monitoring, vmTestbase_nsk_jdi and
vmTestbase_nsk_jdwp with NMT on.
-------------
PR: https://git.openjdk.java.net/jdk/pull/185
More information about the hotspot-runtime-dev
mailing list