Code review request: three native memory tracking related bugs

Zhengyu Gu zhengyu.gu at oracle.com
Tue Jul 17 06:40:44 PDT 2012


http://cr.openjdk.java.net/~zgu/7181986/webrev.01/
>
> src/share/vm/services/memSnapshot.cpp
>
> I don't understand your memory-management strategy in the MemSnapshot 
> code. In the constructor you don't check if any of the allocations are 
> successful. You check for null and bypass the action in some places ( 
> eg merge() wrt _staging_area), but assert that it is not null in 
> others (eg. promote()), and don't check at all in others (eg 
> print_snapshot_stats()). It is far from clear if those latter 
> functions will never be reached if some of those values are null. Also 
> if _lock is null, you will simply not lock! - which doesn't seem right.
>
As a requirement, NMT is a secondary functionality, which means that it 
should not cause VM to exit when it encounters error.

In MemTracker::start() method, it uses 'nothrow new' to instantiate 
MemSnapshot and performs out of memory checking, which checks 
MemSnapshot object and all allocations inside constructor (snapshot == 
NULL || snapshot->out_of_memory()), if OOM is encountered, it shuts down 
NMT. In short, when NMT state >= started, snapshot lock != NULL.

merge() merges recorders into _staging_area and _staging_area is 
expandable, so OOM checking is needed.  promote() promotes _staging_area 
into snapshot, can only be reached when staging phase is successfully 
completed, and _staging_area is read-only during promotion, that's why 
OOM checking on _staging_area is not necessary, while any insertion 
failures to snapshot indicates OOM condition, that also results shutting 
down NMT.

Again, if _lock == null, NMT can never transition to started state, so 
locking can never happen.


>> http://cr.openjdk.java.net/~zgu/7182543/webrev.01/
>
> It is difficult to gauge the full implications of using 
> ThreadCritical. I would not be surprised is there is a potential 
> problem with assertion failures while holding the ThreadCritical. But 
> this is little different to other debug option combinations that can 
> cause secondary failures/hangs during error reporting.
>
ThreadCritical is also used to block none JavaThread, if it is the case, 
we have to rethink what lock is proper.

Thanks,

-Zhengyu


> David
> -----
>
>>
>>
>> Thanks,
>>
>> -Zhengyu
>>
>> On 7/13/2012 3:43 PM, Zhengyu Gu wrote:
>>> 7181989: NMT ON: Assertion failure when NMT checks thread's native
>>> CR: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7181989
>>> Webrev: http://cr.openjdk.java.net/~zgu/7181989/webrev.00/
>>>
>>> We try to assert Thread's stack base to ensure that
>>> Thread::record_stack_base_and_size() to record native stack to NMT,
>>> but there is scenario that the thread fails to start, which no native
>>> stack is created and should not be asserted.
>>>
>>> 7181986: NMT ON: Assertion failure when running jdi
>>> ExpiredRequestDeletionTest
>>> CR: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7181986
>>> Webrev: http://cr.openjdk.java.net/~zgu/7181986/webrev.00/
>>>
>>> This is a racing condition when C/C++ runtime exit handler is ran
>>> before NMT worker thread exits. The exit handler calls _query_lock's
>>> destructor while NMT worker thread is still holding it. The fix is to
>>> make _query_lock a heap object, instead of static object, but the
>>> drawback is that, it does not seem that _query_lock can be safely
>>> deleted.
>>>
>>> Also, I reassigned MemSnaphot lock and query lock, so they participate
>>> in the deadlock detection logic.
>>>
>>> 7182543: NMT ON: Aggregate a few NMT related bugs
>>> CRhttp://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7182543
>>> Webrev: http://cr.openjdk.java.net/~zgu/7182543/webrev.00/
>>>
>>> - Fixed generations_in_used calculation
>>> - Wait MemRecorder instance count to drop to zero before completely
>>> shutdown NMT
>>> - Added assertion for JavaThread in _thread_blocked state.
>>>
>>>
>>> Thanks,
>>>
>>> -Zhengyu
>>>


More information about the hotspot-dev mailing list