RFR(L): 8235795: replace monitor list mux{Acquire,Release}(&gListLock) with spin locks

Fri Jan 24 22:57:47 UTC 2020

On 1/24/20 5:42 PM, David Holmes wrote:
> On 25/01/2020 4:18 am, Daniel D. Daugherty wrote:
>> A follow up on audit_and_print_stats()...
>>
>> On 1/24/20 11:48 AM, Daniel D. Daugherty wrote:
>>> On 1/23/20 9:09 PM, David Holmes wrote:
>>
>> <snip>
>>
>>>
>>>>
>>>>>
>>>>>> ---
>>>>>>
>>>>>> 2189     // With fine grained locks on the monitor lists, it is 
>>>>>> possible for
>>>>>> 2190     // log_monitor_list_counts() to return a value that 
>>>>>> doesn't match
>>>>>> 2191     // LVars.population. So far a higher value has been seen 
>>>>>> in testing
>>>>>> 2192     // so something is being double counted by 
>>>>>> log_monitor_list_counts().
>>>>>>
>>>>>> AFAICS this code is always executed at a safepoint so I don't 
>>>>>> understand how the counts could be out-of-sync. Ditto for the 
>>>>>> comments here:
>>>>>>
>>>>>> 2315     // With fine grained locks on LVars.free_list, it is 
>>>>>> possible for an
>>>>>> 2316     // ObjectMonitor to be prepended to LVars.free_list 
>>>>>> after we started
>>>>>> 2317     // calculating chk_om_free_count so LVars.free_count may 
>>>>>> not
>>>>>> 2318     // match anymore.
>>>>>>
>>>>>> and here:
>>>>>>
>>>>>> 2346     // With fine grained locks on the monitor lists, it is 
>>>>>> possible for
>>>>>> 2347     // an exiting JavaThread to put its in-use 
>>>>>> ObjectMonitors on the
>>>>>> 2348     // global in-use list after chk_om_in_use_count is 
>>>>>> calculated above.
>>>>>>
>>>>>> ???
>>>>>
>>>>> This header comment was intended to cover that rationale:
>>>>>
>>>>>      L2152: // This function can be called at a safepoint or it 
>>>>> can be called when
>>>>>      L2153: // we are trying to exit the VM. When we are trying to 
>>>>> exit the VM, the
>>>>>      L2154: // list walker functions can run in parallel with the 
>>>>> other list
>>>>>      L2155: // operations so spin-locking is used for safety.
>>>>>      L2156: //
>>>>>      L2157: // Calls to this function can be added in various 
>>>>> places as a debugging
>>>>>      L2158: // aid; pass 'true' for the 'on_exit' parameter to 
>>>>> have in-use monitor
>>>>>      L2159: // details logged at the Info level and 'false' for 
>>>>> the 'on_exit'
>>>>>      L2160: // parameter to have in-use monitor details logged at 
>>>>> the Trace level.
>>>>>      L2161: //
>>>>>      L2162: void ObjectSynchronizer::audit_and_print_stats(bool 
>>>>> on_exit) {
>>>>>
>>>>> so no the audit_and_print_stats() function and the various chk_*()
>>>>> functions can be called at a non-safepoint.
>>>>
>>>> When we are exiting the VM this is called from exit_globals, which 
>>>> does execute at a safepoint - either:
>>>>
>>>> - from VM_Exit::do_it() (a safepoint VM op); or
>>>> - from Threads::destroy_vm() after the VMThread has been 
>>>> terminated, which brings the VM to a safepoint first.
>>>
>>> Hmmm... I have this assert here:
>>>
>>> 2162 void ObjectSynchronizer::audit_and_print_stats(bool on_exit) {
>>> 2163   assert(on_exit || SafepointSynchronize::is_at_safepoint(), 
>>> "invariant");
>>>
>>> and I remember adding the "on_exit" conditional so the assert wouldn't
>>> fire. I'll investigate more...
>>>
>>> Also, audit_and_print_stats() calls can be added as a debugging
>>> aide when chasing ObjectMonitor list issues. The way that it is
>>> written, it doesn't require a safepoint (when on_exit == true).
>>>
>>> I'll make the minor edits mentioned above before I send out the
>>> webrev for the next round of review.
>>>
>>> Thanks again for being such a thorough reviewer.
>>>
>>> Dan
>>
>> audit_and_print_stats() was added to the Async Monitor Deflation
>> project by this changeset:
>>
>> changeset:   34:64accf02d468
>> user:        dcubed
>> date:        Fri Dec 21 13:52:56 2018 -0500
>> summary:     dcubed.monitor_deflate_conc.23, 
>> diff.txt.dcubed.monitor_deflate_conc.23 - Add support for 
>> ObjectSynchronizer::audit_and_print_stats(), called by exit_globals() 
>> at Info level and by finish_deflate_idle_monitors() at Debug level; 
>> add is_busy_async() for use by audit_and_print_stats(); restore a 
>> guarantee(take->object() == NULL) check in omAlloc() now that 
>> AsyncDeflateIdleMonitors cleanup of _object field is finished.
>>
>>
>> Looking at my notes for that week, I was chasing tests that were failing
>> due to ObjectMonitors that were keeping objects pinned. The tests were
>> expecting that enough System.gc() calls would cause an unreferenced
>> and collectible object to be GC'ed. This was back when we were having
>> the JavaThread do its own deflation...
>>
>> It looks like I was using audit_and_print_stats(true) calls to probe
>> the VM at different points trying to determine what was keeping the
>> object pinned and why... So I was using audit_and_print_stats() at
>> non-safepoint locations for debugging purposes...
>
> Okay. If you want this ability that is fine then.

Thanks! (It has been useful...)

>
>> Dan
>>
>> P.S.
>>
>> In src/hotspot/share/compiler/compileBroker.cpp:
>> CompileBroker::handle_full_code_cache() calls exit_globals()
>> when the ExitOnFullCodeCache is true in non-PRODUCT bits.
>> I don't think we're at a safepoint for that call... :-)
>> I don't see any tests that enable that flag and the default
>> is false so I couldn't have run into it in my testing...
>
> Yikes! That's not the way to exit the VM :) But we probably don't 
> really care. That termination path seems mid-way between a graceful 
> exit and an abort. With no termination safepoint it can be quite 
> likely to run into shutdown crashes anyway.

Does seem to be a scary piece of code... it was added by this changeset:

$ hg log -r 71fd601907dc hotspot/src/share/vm/compiler/compileBroker.cpp
changeset:   4750:71fd601907dc
user:        kvn
date:        Fri Jan 29 09:27:22 2010 -0800
summary:     4360113: Evict nmethods when code cache gets full

so almost 10 years ago... :-)  Dunno if anyone still uses this
option (ExitOnFullCodeCache)...

Dan

>
> Cheers,
> David