RFR: 8252921: NMT overwrite memory type for region assert when building dynamic archive [v3]

David Holmes david.holmes at oracle.com
Thu Sep 17 04:13:58 UTC 2020


On 17/09/2020 1:45 pm, David Holmes wrote:
> On 17/09/2020 11:18 am, Zhengyu Gu wrote:
>> On Thu, 17 Sep 2020 00:46:33 GMT, Zhengyu Gu <zgu at openjdk.org> wrote:
>>>>> src/hotspot/share/runtime/thread.hpp line 762:
>>>>>> 760:  public:
>>>>>> 761:   // Stack overflow support
>>>>>> 762:   address stack_base() const           { return _stack_base; }
>>>>>
>>>>>
>>>>> Why did you remove the assertion? We want the assertion in general 
>>>>> to ensure there are no improper uses of stack_base().
>>>>
>>>>
>>>> We now reset NonJavaThread's stack base and size,  so _stack_base == 
>>>> NULL is possible, e.g. when generating hs_err
>>>> report, as we should now see empty stack for "G1 Main Marker" thread 
>>>> in original bug report.
>>>
>>> Resetting the stack base and size seems pointless given we follow that
>>> immediately with
>>>
>>> Thread::clear_thread_current();
>>>
>>> And once there is no thread-current for this thread its base and size
>>> can never be queried, so I don't think this change, or the change to the
>>> assertion in stack_base() is needed.
>>>
>>>> You do have a valid point for the assertion. I think following 
>>>> assertion is more precise:
>>>> assert(_stack_base != NULL || Thread::current() != NULL, "Sanity 
>>>> check");
>>>> What you think?
>>>
>>> That assertion doesn't make sense to me as the stack_base() being
>>> queried doesn't have to belong to the current thread.
>>
>> In regular case, yes. But there is an exception, during error 
>> reporting (Threads::print_on_error()).
>> In the original bug report, "G1 Main Marker" thread already exited and 
>> should not have stack, but hs_err file shows:
>>
>> 0x000000a47fdee080 ConcurrentGCThread "G1 Main Marker" [stack: 
>> 0x000000a41cf40000,0x000000a41d040000] [id=17068]
> 
> In the original bug report the crash is in the VMThread! Subsequent 
> snippets from Ioi show something happening with a G1 thread but I can't 
> see the actual hs_err files for those.
> 
> But if hs_err shows a stack it means that _thread was non-NULL, which 
> means that Thread::current() was non-NULL, which means we haven't 
> executed the Thread::clear_thread_current() nor cleared the stack_base 
> or size. The only time there could be a problem is if we were to crash 
> in Thread::clear_thread_current() itself - which is unlikely enough that 
> I am not concerned about that one extreme case versus the change being 
> made.

Okay I see the problem now - sorry for the confusion. The issue is when 
printing the other threads. And that means the issue is in the GC thread 
management code as if the GC worker has terminated it should have 
already removed itself from the set of worker threads. Though as this is 
a crash, and we are not at a safepoint, then there could still be a race 
there. :( But note that if we actually deleted the thread we could still 
crash accessing it at that point. So this just seems to be a rare race 
condition that we are unlikely to hit - and if we did then the secondary 
assertion failure would not be too bad IMO.

But the modified assertion still makes no sense as Thread::current() is 
not relevant AFAICS. I would prefer to keep the assertion to guard the 
regular case of accidental misuse of a thread before the stack_base() 
has been set.

David
-----

> 
> David
> -----
> 
> 
>> Thanks,
>>
>> -Zhengyu
>>
>>
>>>
>>> Thanks,
>>> David
>>> -----
>>
>> -------------
>>
>> PR: https://git.openjdk.java.net/jdk/pull/185
>>


More information about the hotspot-runtime-dev mailing list