RFR: 8252921: NMT overwrite memory type for region assert when building dynamic archive [v3]
Zhengyu Gu
zgu at openjdk.java.net
Thu Sep 17 12:50:20 UTC 2020
On Thu, 17 Sep 2020 03:38:41 GMT, Ioi Lam <iklam at openjdk.org> wrote:
>>> > > > > This code in
>>> > > > > [virtualMemoryTracker.cpp](https://github.com/openjdk/jdk/blob/9a7dcdcdbad34e061a8988287fe691abfd4df305/src/hotspot/share/services/virtualMemoryTracker.cpp#L350)
>>> > > > > still looks buggy to me: ```
>>> > > > > if (reserved_rgn->same_region(base_addr, size)) {
>>> > > > > reserved_rgn->set_call_stack(stack);
>>> > > > > reserved_rgn->set_flag(flag);
>>> > > > > return true;
>>> > > > > } else {
>>> > > > > assert(reserved_rgn->overlap_region(base_addr, size), "Must be");
>>> > > > >
>>> > > > > // Overlapped reservation.
>>> > > > > // It can happen when the regions are thread stacks, as JNI
>>> > > > > // thread does not detach from VM before exits, and leads to
>>> > > > > // leak JavaThread object
>>> > > > > if (reserved_rgn->flag() == mtThreadStack) {
>>> > > > > guarantee(!CheckJNICalls, "Attached JNI thread exited without being detached");
>>> > > > > // Overwrite with new region
>>> > > > > ```
>>> > > > >
>>> > > > >
>>> > > > > Why is the "Overlapped reservation" fix not done when the two regions are exactly the same?
>>> > > >
>>> > > >
>>> > > > The same region branch was to deal with different scenario: recursive reservation, where os::reserve() ->
>>> > > > pd_reserve() -> os::reserve()
>>> > > > > If a JNI thread has exited without detaching, and its stack happens to be picked for a future memory reservation of the
>>> > > > > exact size, I think we will get the same assert as in this bug report.
>>> > > >
>>> > > >
>>> > > > Right, there is possibility. We can put the same guarantee there.
>>> > >
>>> > >
>>> > > I think the order of the checks should be changed:
>>> > > ```
>>> > > if (reserved_rgn->overlap_region(base_addr, size)) {
>>> > > check for overlapped reservation
>>> > > }
>>> > > if (reserved_rgn->same_region(base_addr, size)) {
>>> > > check for recursive reservation
>>> > > ```
>>> > >
>>> > >
>>> > > In fact, if the code had been written this way, this bug would not have happened.
>>> >
>>> >
>>> > Well, I don't know which bug you refer to, but overlap_region() covers same_region(), so reversing order is a no go.
>>>
>>> I am talking the original assert reported by [JDK-8252921](https://bugs.openjdk.java.net/browse/JDK-8252921)
>>>
>>> ```
>>> assert((flag() == mtNone || flag() == f)) failed: Overwrite memory type for region
>>> [0x000000a41cf40000-0x000000a41d040000), 3->23. ```
>>>
>>> We get there because we have a old stack region which matches exactly as a newly reserved region. This case should also
>>> be handled by this code
>>> ```
>>> // Overlapped reservation.
>>> // It can happen when the regions are thread stacks, as JNI
>>> // thread does not detach from VM before exits, and leads to
>>> ```
>>>
>>> The current code assumes that if you have an exact match, it must be from a recursive reveration. This is wrong.
>>
>> In that case, it just hides the real problem. Because the thread stack region can be remapped to any other types and
>> maybe different size.
>
>> > > > > > This code in
>> > > > > > [virtualMemoryTracker.cpp](https://github.com/openjdk/jdk/blob/9a7dcdcdbad34e061a8988287fe691abfd4df305/src/hotspot/share/services/virtualMemoryTracker.cpp#L350)
>> > > > > > still looks buggy to me: ```
>> > > > > > if (reserved_rgn->same_region(base_addr, size)) {
>> > > > > > reserved_rgn->set_call_stack(stack);
>> > > > > > reserved_rgn->set_flag(flag);
>> > > > > > return true;
>> > > > > > } else {
>> > > > > > assert(reserved_rgn->overlap_region(base_addr, size), "Must be");
>> > > > > >
>> > > > > > // Overlapped reservation.
>> > > > > > // It can happen when the regions are thread stacks, as JNI
>> > > > > > // thread does not detach from VM before exits, and leads to
>> > > > > > // leak JavaThread object
>> > > > > > if (reserved_rgn->flag() == mtThreadStack) {
>> > > > > > guarantee(!CheckJNICalls, "Attached JNI thread exited without being detached");
>> > > > > > // Overwrite with new region
>> > > > > > ```
>> > > > > >
>> > > > > >
>> > > > > > Why is the "Overlapped reservation" fix not done when the two regions are exactly the same?
>> > > > >
>> > > > >
>> > > > > The same region branch was to deal with different scenario: recursive reservation, where os::reserve() ->
>> > > > > pd_reserve() -> os::reserve()
>> > > > > > If a JNI thread has exited without detaching, and its stack happens to be picked for a future memory reservation of the
>> > > > > > exact size, I think we will get the same assert as in this bug report.
>> > > > >
>> > > > >
>> > > > > Right, there is possibility. We can put the same guarantee there.
>> > > >
>> > > >
>> > > > I think the order of the checks should be changed:
>> > > > ```
>> > > > if (reserved_rgn->overlap_region(base_addr, size)) {
>> > > > check for overlapped reservation
>> > > > }
>> > > > if (reserved_rgn->same_region(base_addr, size)) {
>> > > > check for recursive reservation
>> > > > ```
>> > > >
>> > > >
>> > > > In fact, if the code had been written this way, this bug would not have happened.
>> > >
>> > >
>> > > Well, I don't know which bug you refer to, but overlap_region() covers same_region(), so reversing order is a no go.
>> >
>> >
>> > I am talking the original assert reported by [JDK-8252921](https://bugs.openjdk.java.net/browse/JDK-8252921)
>> > ```
>> > assert((flag() == mtNone || flag() == f)) failed: Overwrite memory type for region
>> > [0x000000a41cf40000-0x000000a41d040000), 3->23. ```
>> >
>> >
>> > We get there because we have a old stack region which matches exactly as a newly reserved region. This case should also
>> > be handled by this code ```
>> > // Overlapped reservation.
>> > // It can happen when the regions are thread stacks, as JNI
>> > // thread does not detach from VM before exits, and leads to
>> > ```
>> >
>> >
>> > The current code assumes that if you have an exact match, it must be from a recursive reveration. This is wrong.
>>
>> In that case, it just hides the real problem. Because the thread stack region can be remapped to any other types and
>> maybe different size.
>
> Let me clarify. I think your fix is good, but it only fixes part of the problems found by this bug.
>
> I am requesting that the following should **also** be fixed, so that the behavior is the same whether the new region
> partially overlaps with an old stack, or is exactly the same size as an old stack.
> Here's my proposed additional fix. I think your fix alone will not solve the problem where a JNI-attached thread was
> not properly detached before the thread exits.
> if (reserved_rgn == NULL) {
> VirtualMemorySummary::record_reserved_memory(size, flag);
> return _reserved_regions->add(rgn) != NULL;
> } else {
> if (reserved_rgn->overlap_region(base_addr, size) && reserved_rgn->flag() == mtThreadStack) {
> // Overlapped reservation.
> // It can happen when the regions are thread stacks, as JNI
> // thread does not detach from VM before exits, and leads to
> // leak JavaThread object
> guarantee(!CheckJNICalls, "Attached JNI thread exited without being detached");
> // Overwrite with new region
>
> // Release old region
> VirtualMemorySummary::record_uncommitted_memory(reserved_rgn->committed_size(), reserved_rgn->flag());
> VirtualMemorySummary::record_released_memory(reserved_rgn->size(), reserved_rgn->flag());
>
> // Add new region
> VirtualMemorySummary::record_reserved_memory(rgn.size(), flag);
>
> *reserved_rgn = rgn;
> return true;
> } else if (reserved_rgn->same_region(base_addr, size)) {
> reserved_rgn->set_call_stack(stack);
> reserved_rgn->set_flag(flag);
> return true;
> } else {
> assert(reserved_rgn->overlap_region(base_addr, size), "Must be");
>
> // CDS mapping region.
> Okay I see the problem now - sorry for the confusion. The issue is when
> printing the other threads. And that means the issue is in the GC thread
> management code as if the GC worker has terminated it should have
> already removed itself from the set of worker threads. Though as this is
> a crash, and we are not at a safepoint, then there could still be a race
> there. :( But note that if we actually deleted the thread we could still
> crash accessing it at that point. So this just seems to be a rare race
> condition that we are unlikely to hit - and if we did then the secondary
> assertion failure would not be too bad IMO.
>
> But the modified assertion still makes no sense as Thread::current() is
> not relevant AFAICS. I would prefer to keep the assertion to guard the
> regular case of accidental misuse of a thread before the stack_base()
> has been set.
>
The reported crash actually is at a safepoint, but it is irrelevant.
The problem is that 'Thread' object can outlive thread itself.
"G1 Main Marker" thread is not a worker, but a ConcurrentGCThread, a member of G1CollectedHeap.
<code>G1ConcurrentMarkThread* _cm_thread;</code>
and during error report
<code>Universe::heap()->gc_threads_do(&print_closure);</code>
For example, G1 always tries to report its internal states:
<pre><code>
void G1CollectedHeap::gc_threads_do(ThreadClosure* tc) const {
workers()->threads_do(tc);
tc->do_thread(_cm_thread); <====
_cm->threads_do(tc);
_cr->threads_do(tc);
tc->do_thread(_service_thread);
if (G1StringDedup::is_enabled()) {
G1StringDedup::threads_do(tc);
}
}
</code></pre>
So it is more likely to hit the secondary assertion than you described.
The modified assertion only suggests that you may query thread states from other thread, the result is the best effort
and it keeps the semantics unchanged when queries from 'current' thread.
Thanks.
-------------
PR: https://git.openjdk.java.net/jdk/pull/185
More information about the hotspot-runtime-dev
mailing list