RFR: 8252921: NMT overwrite memory type for region assert when building dynamic archive [v4]

Zhengyu Gu zgu at openjdk.java.net
Thu Sep 17 15:54:56 UTC 2020


On Thu, 17 Sep 2020 12:47:13 GMT, Zhengyu Gu <zgu at openjdk.org> wrote:

>>> > > > > > This code in
>>> > > > > > [virtualMemoryTracker.cpp](https://github.com/openjdk/jdk/blob/9a7dcdcdbad34e061a8988287fe691abfd4df305/src/hotspot/share/services/virtualMemoryTracker.cpp#L350)
>>> > > > > > still looks buggy to me: ```
>>> > > > > >     if (reserved_rgn->same_region(base_addr, size)) {
>>> > > > > >       reserved_rgn->set_call_stack(stack);
>>> > > > > >       reserved_rgn->set_flag(flag);
>>> > > > > >       return true;
>>> > > > > >     } else {
>>> > > > > >       assert(reserved_rgn->overlap_region(base_addr, size), "Must be");
>>> > > > > > 
>>> > > > > >       // Overlapped reservation.
>>> > > > > >       // It can happen when the regions are thread stacks, as JNI
>>> > > > > >       // thread does not detach from VM before exits, and leads to
>>> > > > > >       // leak JavaThread object
>>> > > > > >       if (reserved_rgn->flag() == mtThreadStack) {
>>> > > > > >         guarantee(!CheckJNICalls, "Attached JNI thread exited without being detached");
>>> > > > > >         // Overwrite with new region
>>> > > > > > ```
>>> > > > > > 
>>> > > > > > 
>>> > > > > > Why is the "Overlapped reservation" fix not done when the two regions are exactly the same?
>>> > > > > 
>>> > > > > 
>>> > > > > The same region branch was to deal with different scenario: recursive reservation, where os::reserve() ->
>>> > > > > pd_reserve() -> os::reserve()
>>> > > > > > If a JNI thread has exited without detaching, and its stack happens to be picked for a future memory reservation of the
>>> > > > > > exact size, I think we will get the same assert as in this bug report.
>>> > > > > 
>>> > > > > 
>>> > > > > Right, there is possibility. We can put the same guarantee there.
>>> > > > 
>>> > > > 
>>> > > > I think the order of the checks should be changed:
>>> > > > ```
>>> > > > if (reserved_rgn->overlap_region(base_addr, size)) {
>>> > > >     check for overlapped reservation
>>> > > > }
>>> > > > if (reserved_rgn->same_region(base_addr, size)) {
>>> > > >    check for recursive reservation
>>> > > > ```
>>> > > > 
>>> > > > 
>>> > > > In fact, if the code had been written this way, this bug would not have happened.
>>> > > 
>>> > > 
>>> > > Well, I don't know which bug you refer to, but overlap_region() covers same_region(), so reversing order is a no go.
>>> > 
>>> > 
>>> > I am talking the original assert reported by [JDK-8252921](https://bugs.openjdk.java.net/browse/JDK-8252921)
>>> > ```
>>> > assert((flag() == mtNone || flag() == f)) failed: Overwrite memory type for region
>>> > [0x000000a41cf40000-0x000000a41d040000), 3->23. ```
>>> > 
>>> > 
>>> > We get there because we have a old stack region which matches exactly as a newly reserved region. This case should also
>>> > be handled by this code ```
>>> > // Overlapped reservation.
>>> > // It can happen when the regions are thread stacks, as JNI
>>> > // thread does not detach from VM before exits, and leads to
>>> > ```
>>> > 
>>> > 
>>> > The current code assumes that if you have an exact match, it must be from a recursive reveration. This is wrong.
>>> 
>>> In that case, it just hides the real problem. Because the thread stack region can be remapped to any other types and
>>> maybe different size.
>> 
>> Let me clarify. I think your fix is good, but it only fixes part of the problems found by this bug.
>> 
>> I am requesting that the following should **also** be fixed, so that the behavior is the same whether the new region
>> partially overlaps with an old stack, or is exactly the same size as an old stack.
>> Here's my proposed additional fix. I think your fix alone will not solve the problem where a JNI-attached thread was
>> not properly detached before the thread exits.
>>   if (reserved_rgn == NULL) {
>>     VirtualMemorySummary::record_reserved_memory(size, flag);
>>     return _reserved_regions->add(rgn) != NULL;
>>   } else {
>>     if (reserved_rgn->overlap_region(base_addr, size) && reserved_rgn->flag() == mtThreadStack) {
>>       // Overlapped reservation.
>>       // It can happen when the regions are thread stacks, as JNI
>>       // thread does not detach from VM before exits, and leads to
>>       // leak JavaThread object
>>       guarantee(!CheckJNICalls, "Attached JNI thread exited without being detached");
>>       // Overwrite with new region
>> 
>>       // Release old region
>>       VirtualMemorySummary::record_uncommitted_memory(reserved_rgn->committed_size(), reserved_rgn->flag());
>>       VirtualMemorySummary::record_released_memory(reserved_rgn->size(), reserved_rgn->flag());
>> 
>>       // Add new region
>>       VirtualMemorySummary::record_reserved_memory(rgn.size(), flag);
>> 
>>       *reserved_rgn = rgn;
>>       return true;
>>     } else if (reserved_rgn->same_region(base_addr, size)) {
>>       reserved_rgn->set_call_stack(stack);
>>       reserved_rgn->set_flag(flag);
>>       return true;
>>     } else {
>>       assert(reserved_rgn->overlap_region(base_addr, size), "Must be");
>> 
>>       // CDS mapping region.
>
>> Okay I see the problem now - sorry for the confusion. The issue is when
>> printing the other threads. And that means the issue is in the GC thread
>> management code as if the GC worker has terminated it should have
>> already removed itself from the set of worker threads. Though as this is
>> a crash, and we are not at a safepoint, then there could still be a race
>> there. :( But note that if we actually deleted the thread we could still
>> crash accessing it at that point. So this just seems to be a rare race
>> condition that we are unlikely to hit - and if we did then the secondary
>> assertion failure would not be too bad IMO.
>> 
>> But the modified assertion still makes no sense as Thread::current() is
>> not relevant AFAICS. I would prefer to keep the assertion to guard the
>> regular case of accidental misuse of a thread before the stack_base()
>> has been set.
>> 
> The reported crash actually is at a safepoint, but it is irrelevant.
> The problem is that 'Thread' object can outlive thread itself.
> 
> "G1 Main Marker" thread is not a worker, but a ConcurrentGCThread, a member of G1CollectedHeap.
> <code>G1ConcurrentMarkThread* _cm_thread;</code>
> and during error report
> <code>Universe::heap()->gc_threads_do(&print_closure);</code>
> For example, G1 always tries to report its internal states:
> <pre><code>
> void G1CollectedHeap::gc_threads_do(ThreadClosure* tc) const {
>   workers()->threads_do(tc);
>   tc->do_thread(_cm_thread);   <====
>   _cm->threads_do(tc);
>   _cr->threads_do(tc);
>   tc->do_thread(_service_thread);
>   if (G1StringDedup::is_enabled()) {
>     G1StringDedup::threads_do(tc);
>   }
> }
> </code></pre>
> 
> So it is more likely to hit the secondary assertion than you described.
> 
> The modified assertion only suggests that you may query thread states from other thread, the result is the best effort
> and it keeps the semantics unchanged when queries from 'current' thread.
> Thanks.

Updated to address JNI detach thread issue. Ran tier1, vmTestbase_nsk_monitoring, vmTestbase_nsk_jdi and
vmTestbase_nsk_jdwp with NMT on.

-------------

PR: https://git.openjdk.java.net/jdk/pull/185


More information about the hotspot-runtime-dev mailing list