RFR: 8252921: NMT overwrite memory type for region assert when building dynamic archive [v3]
Zhengyu Gu
zgu at openjdk.java.net
Thu Sep 17 01:18:22 UTC 2020
On Thu, 17 Sep 2020 00:46:33 GMT, Zhengyu Gu <zgu at openjdk.org> wrote:
>>> > This code in
>>> > [virtualMemoryTracker.cpp](https://github.com/openjdk/jdk/blob/9a7dcdcdbad34e061a8988287fe691abfd4df305/src/hotspot/share/services/virtualMemoryTracker.cpp#L350)
>>> > still looks buggy to me: ```
>>> > if (reserved_rgn->same_region(base_addr, size)) {
>>> > reserved_rgn->set_call_stack(stack);
>>> > reserved_rgn->set_flag(flag);
>>> > return true;
>>> > } else {
>>> > assert(reserved_rgn->overlap_region(base_addr, size), "Must be");
>>> >
>>> > // Overlapped reservation.
>>> > // It can happen when the regions are thread stacks, as JNI
>>> > // thread does not detach from VM before exits, and leads to
>>> > // leak JavaThread object
>>> > if (reserved_rgn->flag() == mtThreadStack) {
>>> > guarantee(!CheckJNICalls, "Attached JNI thread exited without being detached");
>>> > // Overwrite with new region
>>> > ```
>>> >
>>> >
>>> > Why is the "Overlapped reservation" fix not done when the two regions are exactly the same?
>>>
>>> The same region branch was to deal with different scenario: recursive reservation, where os::reserve() ->
>>> pd_reserve() -> os::reserve()
>>> > If a JNI thread has exited without detaching, and its stack happens to be picked for a future memory reservation of the
>>> > exact size, I think we will get the same assert as in this bug report.
>>>
>>> Right, there is possibility. We can put the same guarantee there.
>>
>> I think the order of the checks should be changed:
>>
>> if (reserved_rgn->overlap_region(base_addr, size)) {
>> check for overlapped reservation
>> }
>> if (reserved_rgn->same_region(base_addr, size)) {
>> check for recursive reservation
>> In fact, if the code had been written this way, this bug would not have happened.
>
>> > > This code in
>> > > [virtualMemoryTracker.cpp](https://github.com/openjdk/jdk/blob/9a7dcdcdbad34e061a8988287fe691abfd4df305/src/hotspot/share/services/virtualMemoryTracker.cpp#L350)
>> > > still looks buggy to me: ```
>> > > if (reserved_rgn->same_region(base_addr, size)) {
>> > > reserved_rgn->set_call_stack(stack);
>> > > reserved_rgn->set_flag(flag);
>> > > return true;
>> > > } else {
>> > > assert(reserved_rgn->overlap_region(base_addr, size), "Must be");
>> > >
>> > > // Overlapped reservation.
>> > > // It can happen when the regions are thread stacks, as JNI
>> > > // thread does not detach from VM before exits, and leads to
>> > > // leak JavaThread object
>> > > if (reserved_rgn->flag() == mtThreadStack) {
>> > > guarantee(!CheckJNICalls, "Attached JNI thread exited without being detached");
>> > > // Overwrite with new region
>> > > ```
>> > >
>> > >
>> > > Why is the "Overlapped reservation" fix not done when the two regions are exactly the same?
>> >
>> >
>> > The same region branch was to deal with different scenario: recursive reservation, where os::reserve() ->
>> > pd_reserve() -> os::reserve()
>> > > If a JNI thread has exited without detaching, and its stack happens to be picked for a future memory reservation of the
>> > > exact size, I think we will get the same assert as in this bug report.
>> >
>> >
>> > Right, there is possibility. We can put the same guarantee there.
>>
>> I think the order of the checks should be changed:
>>
>> ```
>> if (reserved_rgn->overlap_region(base_addr, size)) {
>> check for overlapped reservation
>> }
>> if (reserved_rgn->same_region(base_addr, size)) {
>> check for recursive reservation
>> ```
>>
>> In fact, if the code had been written this way, this bug would not have happened.
>
> Well, I don't know which bug you refer to, but overlap_region() covers same_region(), so reversing order is a no go.
> _Mailing list message from [David Holmes](mailto:david.holmes at oracle.com) on
> [hotspot-runtime-dev](mailto:hotspot-runtime-dev at openjdk.java.net):_
> Hi Zhengyu,
>
> On 16/09/2020 9:28 pm, Zhengyu Gu wrote:
>
> > On Wed, 16 Sep 2020 02:49:45 GMT, David Holmes <dholmes at openjdk.org> wrote:
> > > > Thread stack is currently unregistered with NMT in Thread's destructor. Apparently, only Java thread invokes destructor
> > > > before thread exits. For NonJavaThread, e.g. ConcurrentGCThread, thread may exit while its "Thread" object continues
> > > > alive, therefore, its thread stack is still "alive" from NMT perspective. Once thread exits, the virtual memory for the
> > > > thread stack can be reserved again, that confused NMT. The solution is to move thread stack unregistration code to
> > > > post_run() method.
> > >
> > >
> > > src/hotspot/share/runtime/thread.hpp line 762:
> > > > 760: public:
> > > > 761: // Stack overflow support
> > > > 762: address stack_base() const { return _stack_base; }
> > >
> > >
> > > Why did you remove the assertion? We want the assertion in general to ensure there are no improper uses of stack_base().
> >
> >
> > We now reset NonJavaThread's stack base and size, so _stack_base == NULL is possible, e.g. when generating hs_err
> > report, as we should now see empty stack for "G1 Main Marker" thread in original bug report.
>
> Resetting the stack base and size seems pointless given we follow that
> immediately with
>
> Thread::clear_thread_current();
>
> And once there is no thread-current for this thread its base and size
> can never be queried, so I don't think this change, or the change to the
> assertion in stack_base() is needed.
>
> > You do have a valid point for the assertion. I think following assertion is more precise:
> > assert(_stack_base != NULL || Thread::current() != NULL, "Sanity check");
> > What you think?
>
> That assertion doesn't make sense to me as the stack_base() being
> queried doesn't have to belong to the current thread.
In regular case, yes. But there is an exception, during error reporting (Threads::print_on_error()).
In the original bug report, "G1 Main Marker" thread already exited and should not have stack, but hs_err file shows:
0x000000a47fdee080 ConcurrentGCThread "G1 Main Marker" [stack: 0x000000a41cf40000,0x000000a41d040000] [id=17068]
Thanks,
-Zhengyu
>
> Thanks,
> David
> -----
-------------
PR: https://git.openjdk.java.net/jdk/pull/185
More information about the hotspot-runtime-dev
mailing list