RFR: 8320515: assert(monitor->object_peek() != nullptr) failed: Owned monitors should not have a dead object

Thu Nov 23 10:14:09 UTC 2023

On Thu, 23 Nov 2023 00:48:19 GMT, David Holmes <dholmes at openjdk.org> wrote:

>> In the rewrites made for:
>> [JDK-8318757](https://bugs.openjdk.org/browse/JDK-8318757) `VM_ThreadDump asserts in interleaved ObjectMonitor::deflate_monitor calls`
>> 
>> I removed the filtering of *owned ObjectMonitors with dead objects*. The reasoning was that you should never have an owned ObjectMonitor with a dead object. I added an assert to check this assumption. It turns out that the assumption was wrong *if* you use JNI to call MonitorEnter and then remove all references to the locked object.
>> 
>> The provided tests provoke this assert form:
>> * the JNI thread detach code
>> * thread dumping with locked monitors, and
>> * the JVMTI GetOwnedMonitorInfo API.
>> 
>> While investigating this we've found that the thread detach code becomes more correct when this filter was removed. Previously, the locked monitors never got unlocked because the ObjectMonitor iterator never exposed these monitors to the JNI detach code that unlocks the thread's monitors. That bug caused an ObjectMonitor leak. So, for this case I'm leaving these ObjectMonitors unfiltered so that we don't reintroduce the leak.
>> 
>> The thread dumping case doesn't tolerate ObjectMonitor with dead objects, so I'm filtering those in the closure that collects ObjectMonitor. Side note: We have discussions about ways to completely rewrite this by letting each thread have thread-local information about JNI held locks. If we have this we could probably throw away the entire ObjectMonitorDump hashtable, and its walk of the `_in_use_list.`.
>> 
>> For GetOwnedMonitorInfo it is unclear if we should expose these weird ObjectMonitor. If we do, then the users can detect that a thread holds a lock with a dead object, and the code will return NULL as one of the "owned monitors" returned. I don't think that's a good idea, so I'm filtering out these ObjectMonitor for those calls.
>> 
>> Test: the written tests with and without the fix. Tier1-Tier3, so far.
>
>>  Previously, the locked monitors never got unlocked because the ObjectMonitor iterator never exposed these monitors to the JNI detach code
> 
> I had not realized that. It explains some confusion in  a separate issue I had been looking into! It is important that these monitors are exposed and unlocked at detach time, otherwise it also messes up the `held_monitor_count`.
> 
>> The thread dumping case doesn't tolerate ObjectMonitor with dead objects, so I'm filtering those in the closure
> 
> I think we may need to make that code tolerate the absence of an object.
> 
>> For GetOwnedMonitorInfo it is unclear if we should expose these weird ObjectMonitor.
> 
> I think we probably should expose this to be accurate, but I think this needs investigation on the JVMTI side to ensure that the null entry is tolerated okay. So a separate RFE to handle this would be fine.
> 
> Thanks

While addressing @dholmes-ora's comments about the tests I found that the JNI implementation was incorrect and caused a failure, that seems to prevent the thread detach, and then the Java layer thread dumping caught the thread dumping bug. I'm going to see if I can split the various test cases so that they can be tested individually. Don't look too closely at the current test until it has been updated.

> test/hotspot/jtreg/runtime/Monitor/libIterateMonitorWithDeadObjectTest.c line 94:
> 
>> 92: 
>> 93:   // Let the GC clear the weak reference to the object.
>> 94:   system_gc(env);
> 
> AFAIK there is no guarantee that one call to `System.gc()` will suffice to clear the weakRef. We tend use a loop with a few iterations in other tests, or use a WhiteBox method to achieve it. In my testing I used the finalizer to observe that the objects had been finalized but even then, and with a loop, I did not always see them collected with G1.

ZGC will clear the weak reference. G1 clears the weak reference, except for in a few specific situations. I've verified that G1 clears this weak reference, so I don't think it would be worth making this test longer or more complicated.

> test/hotspot/jtreg/runtime/Monitor/libIterateMonitorWithDeadObjectTest.c line 104:
> 
>> 102:   // source of at least two bugs:
>> 103:   // - When the object reference in the monitor was made weak, the code
>> 104:   //   didn't unlock the monitor, leaving it lingering in the system.
> 
> Suggestion:
> 
> // - When the object reference in the monitor was cleared, the monitor
> //   iterator code would skip it, preventing it from being unlocked when
> //   the owner thread detached, leaving it lingering in the system.
> 
> the original made it sound to me like the code that cleared the reference (i.e. the GC) was expected to do the unlocking.

Thanks.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/16783#issuecomment-1824116115
PR Review Comment: https://git.openjdk.org/jdk/pull/16783#discussion_r1403168662
PR Review Comment: https://git.openjdk.org/jdk/pull/16783#discussion_r1403169323