RFR(L): 8235795: replace monitor list mux{Acquire,Release}(&gListLock) with spin locks

David Holmes david.holmes at oracle.com
Fri Jan 24 02:09:28 UTC 2020


Hi Dan,

A few responses inline, with trimming ...

On 24/01/2020 4:31 am, Daniel D. Daugherty wrote:
> On 1/6/20 1:37 AM, David Holmes wrote:
>> ---
>>
>> 1401     // We use the simpler lock-mid-as-we-go protocol to prevent races
>>
>> "simpler" than what? Somewhere (in the code) we are missing a simple 
>> description of the chained-locking/hand-over-hand-locking pattern that 
>> is being used to allow safe traversal. The comment block at L2275 
>> touches on it but I think this needs to be clearly explained in the 
>> meta comments for the whole OM lifecycle process. Perhaps some list 
>> management notes immediately after the comment block at L1233-L1237 ?
> 
> Of course, due to this comment, I took a closer look at
> ObjectSynchronizer::om_release() and I figured out that I could greatly
> simplify the locking protocol that's used in that function so we no
> longer use the lock-mid-as-we-go protocol.
> 
> This simplification solves a bug in om_release() where this call:
> 
>      L1415:         } else {
>      L1416:           // mid is locked. Switch cur_mid_in_use's next field to next
>      L1417:           // which is safe because we have no parallel list deletions,
>      L1418:           // but we leave mid locked:
>      L1419:           set_next(cur_mid_in_use, next);
>      L1420:         }
> 
> can accidentally unlock a cur_mid_in_use that was just locked by a list
> walker (see lock_next_for_traversal()). When the list walker calls
> om_unlock() we fail:
> 
>      L194:   guarantee(((intptr_t)next & OM_LOCK_BIT) == OM_LOCK_BIT, "next=" INTPTR_FORMAT
>      L195:             " must have OM_LOCK_BIT=%x set.", p2i(next), OM_LOCK_BIT);
> 
> This bug doesn't exist in the 8153224 version of om_release() because
> we had to use the lock-cur_mid_in_use-and-mid-as-we-go protocol due to
> async deflation. So when I extracted and simplified the logic for this
> bug fix (8235795), I introduced the bug (bad Dan!).
> 
> 
> Of course, there is one other place where we used the simpler
> lock-mid-as-we-go protocol so I also reexamined deflate_monitor_list().
> It turns out that deflate_monitor_list() doesn't race with anything
> else when accessing the in-use lists so I was able to drop all of the
> locking calls from that code. It's not quite a reversion back to the
> baseline version because of other changes, but it is close.
> 
> deflate_monitor_list() is only called at a safepoint, so it doesn't race
> with any non-safepoint audit_and_print_stats() calls. The VMThread
> deflates the global in-use list by itself. The per-thread in-use lists
> are handled either by the VMThread alone or by worker threads, each of
> which handles a single JavaThread's per-thread in-use list. So there is
> no overlap from multiple threads calling deflate_monitor_list() on the
> same list. Lastly, the VMThread calls audit_and_print_stats() only after
> all the deflate_monitor_list() calls have finished, so no overlap there.
> 
> At this point, I don't know what I would/should write about list management
> protocols after L1233-L1237. We no longer have the lock-mid-as-we-go protocol
> as the minimum. Each place that does list walking now uses locking in a way
> that's very specific to the function that's doing the list walking and what
> it might be racing with.
> 
> So while I've rewritten the list management locking in both
> ObjectSynchronizer::om_release() and deflate_monitor_list(), I haven't
> really updated header comments to talk about list management locking
> protocols.

Okay, I'll need to reconsider that when I see the updated code.
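
For the archives, the hand-over-hand (lock-cur_mid_in_use-and-mid-as-we-go)
pattern we keep referring to looks roughly like this - a sketch only, with
om_lock()/om_unlock()/unmarked_next() standing in for the helpers in the
patch:

   // Traverse a singly linked ObjectMonitor list whose next field
   // doubles as a spin lock (low bit == OM_LOCK_BIT). The successor
   // is locked before the current node is released, so a racing
   // deleter can never splice a node out from under a walker.
   ObjectMonitor* cur = list_head;
   if (cur != NULL) {
     om_lock(cur);
   }
   while (cur != NULL) {
     ObjectMonitor* next = unmarked_next(cur);  // next ptr sans lock bit
     if (next != NULL) {
       om_lock(next);   // lock the successor first ...
     }
     om_unlock(cur);    // ... then release the current node
     cur = next;
   }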

>> ---
>>
>> 1525         // s is locked so there must be a racing walker thread ahead
>> 1526         // of us so we'll give it a chance to finish.
>> 1527         while (is_locked(s)) {
>> 1528           os::naked_short_sleep(1);
>> 1529         }
>>
>> So this implies that any walking code is safepoint-check free (or more 
>> generally non-blocking). Is that asserted where possible?
> 
> If I understand correctly, what you're worried about here is whether
> a thread can lock an ObjectMonitor ('s' in this case) and then block
> for a safepoint which would cause the thread running this code to loop
> forever. This is similar to my worry that an algorithm error can lead
> to a thread forgetting to unlock an ObjectMonitor.
> 
> I don't really see a way to assert() for this situation other than the
> existing safepoint-takes-too-long checks done when SafepointTimeout
> is enabled. So by default, the VM would appear to hang here...
> 
> Do you have a suggestion or two here?

NoSafepointVerifier in the walking code paths?
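
Something like this, I mean (a sketch; the walker function is hypothetical
and NoSafepointVerifier is debug-only):

   #include "runtime/safepointVerifiers.hpp"

   void walk_in_use_list(ObjectMonitor* head) {
     // Asserts (in debug builds) if this thread reaches a possible
     // safepoint while it may be holding a node's spin lock.
     NoSafepointVerifier nsv;
     // ... hand-over-hand walk as in the earlier sketch; nothing in
     // here may allocate, take a blocking Mutex, or otherwise reach
     // a safepoint check while nsv is live ...
   }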

> 
>> ---
>>
>> 1947   OrderAccess::storestore();           // flush inits for worker threads
>>
>> A storestore barrier doesn't flush anything, it simply maintains an 
>> ordering between stores astride the barrier. If you needed to ensure 
>> visibility of these updates to the GC worker threads then this would 
>> not suffice. But any communication with the GC worker threads (i.e. 
>> telling them there is work to be done) requires explicit 
>> synchronization anyway, so this barrier should not be needed as it 
>> should be subsumed by that existing mechanism.
> 
> So this particular "storestore()" call was added at your suggestion
> back on 2019.10.22:
> 
> On 10/22/19 3:40 AM, David Holmes wrote:
>> And again:
>>
>> 2470 void ObjectSynchronizer::prepare_deflate_idle_monitors(DeflateMonitorCounters* counters) {
>> 2471   OrderAccess::release_store(&counters->n_in_use, 0); // currently associated with objects
>> 2472   OrderAccess::release_store(&counters->n_in_circulation, 0); // extant
>> 2473   OrderAccess::release_store(&counters->n_scavenged, 0); // reclaimed (global and per-thread)
>> 2474   OrderAccess::release_store(&counters->per_thread_scavenged, 0); // per-thread scavenge total
>> 2475   counters->per_thread_times = 0.0; // per-thread scavenge times
>> 2476 }
>>
>> A release_store is not needed for each write. If "counters" is to be 
>> published somewhere for another thread to examine then it suffices to 
>> publish it with a suitable memory barrier. If you wanted to 
>> internalize it you could add a storeStore() barrier at the end. 
> 
> My reply to that comment was:
> 
>> I'll switch these counter inits to a simple store, e.g.:
>>
>>     L2471:   counters->n_in_use = 0;  // currently associated with objects
>>
>> with a single storeStore() after them all or a comment stating which
>> memory sync we're relying on. It may be safer for future safepoint work
>> juggling to put the storeStore() there rather than relying on a comment
>> and hoping for unchanging code... 
> 
> so I thought I was doing what you requested back on 2019.10.22...
> I added a storestore() barrier as you said to do...

Yep, that is all well and good. But now that I see the comment, it raises 
questions about the broader context of how this code interacts with the GC 
worker threads. It seems to me the storestore is either insufficient 
(because it only affects ordering, not visibility), or else redundant 
(because the sync with the worker threads already handles everything).

> We want to make sure those fields are initialized before the
> worker threads start doing their work and atomically incrementing
> those fields.
> 
> ObjectSynchronizer::prepare_deflate_idle_monitors() is called by
> SafepointSynchronize::do_cleanup_tasks() before it uses a group of:
> 
>      WorkGang* cleanup_workers = heap->get_safepoint_workers();
> 
> to get the work done:
> 
>    if (cleanup_workers != NULL) {
>      // Parallel cleanup using GC provided thread pool.
>      uint num_cleanup_workers = cleanup_workers->active_workers();
>      ParallelSPCleanupTask cleanup(num_cleanup_workers, &deflate_counters);
>      StrongRootsScope srs(num_cleanup_workers);
>      cleanup_workers->run_task(&cleanup);
>    } else {
> 
> I can't imagine there isn't a lock grab with a proper memory
> barrier before we dispatch all the worker threads to do the work...

The main synchronization mechanism appears to be Semaphores, and their 
implementation does ensure memory synchronization - see e.g.:

https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap04.html#tag_04_12
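
i.e. sem_post() and sem_wait() both "synchronize memory", so the
publication argument in miniature looks like this (plain POSIX, not
HotSpot code; just a sketch):

   #include <pthread.h>
   #include <semaphore.h>
   #include <stdio.h>

   // The semaphore used to dispatch the work already synchronizes
   // memory, so the plain stores to 'counters' before sem_post() are
   // visible to the worker after sem_wait() - no separate storestore
   // barrier is needed for that.
   struct Counters { int n_in_use; int n_scavenged; };
   static struct Counters counters;
   static sem_t task_ready;

   static void* worker(void* arg) {
     sem_wait(&task_ready);   // pairs with sem_post() below
     printf("worker sees n_in_use=%d\n", counters.n_in_use);
     return NULL;
   }

   int main() {
     sem_init(&task_ready, 0, 0);
     pthread_t t;
     pthread_create(&t, NULL, worker, NULL);
     counters.n_in_use = 0;   // plain stores by the "VMThread"
     counters.n_scavenged = 0;
     sem_post(&task_ready);   // dispatch: synchronizes memory
     pthread_join(t, NULL);
     return 0;
   }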

> Of course, if there are no worker threads, then the VMThread
> does it all and no memory barrier is needed.

Right so ...

> So how should I change this line:
> 
>      L1947:   OrderAccess::storestore();           // flush inits for worker threads

... it seems this is actually redundant in this context and can be 
removed. Perhaps replace it with a clarifying comment about how we 
synchronise with the worker threads or the VMThread.
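
Something along these lines, say (wording just a suggestion):

   // The counter inits above are plain stores. They are published to
   // the worker threads by the run_task() dispatch (the semaphore
   // signal/wait synchronizes memory), or are trivially visible when
   // the VMThread does all the deflation work itself.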

> 
>> ---
>>
>> 2047     // The working list is local so no need for a memory barrier.
>>
>> Unclear what kind of barrier you thought you might need. Did you mean 
>> "no need for locking/synchronization"?
> 
> This comment appears in three places:
> 
>      L1846:     // defined by free_head_p and free_tail_p. The working list is
>      L1847:     // local so no need for a memory barrier.
>      L1848:     if (*free_head_p == NULL) *free_head_p = mid;
> 
>      L1977:     // The working list is local so no need for a memory barrier.
>      L1978:     guarantee(free_tail_p != NULL && deflated_count > 0, "invariant");
> 
>      L2047:     // The working list is local so no need for a memory barrier.
>      L2048:     guarantee(free_tail_p != NULL && deflated_count > 0, "invariant");
> 
> I _think_ those comments were added when I had too many load_acquire() and
> release_store() calls. So in these three cases, I'm saying that I don't
> need memory barriers for load() or store() because these are local lists
> that are not visible to other threads.

While I can see the comments aided you when you were figuring out the 
barrier placement, I think these ones will confuse future readers of 
the code.
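
If you want to keep a comment there, I'd spell out what "local" buys
you instead, e.g. (wording just a suggestion):

   // *free_head_p/*free_tail_p form a working list that only this
   // thread can reach (it hasn't been published anywhere), so plain
   // loads and stores suffice - no Atomic/OrderAccess needed.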

> 
>> ---
>>
>> 2189     // With fine grained locks on the monitor lists, it is possible for
>> 2190     // log_monitor_list_counts() to return a value that doesn't match
>> 2191     // LVars.population. So far a higher value has been seen in testing
>> 2192     // so something is being double counted by log_monitor_list_counts().
>>
>> AFAICS this code is always executed at a safepoint so I don't 
>> understand how the counts could be out-of-sync. Ditto for the comments 
>> here:
>>
>> 2315     // With fine grained locks on LVars.free_list, it is possible for an
>> 2316     // ObjectMonitor to be prepended to LVars.free_list after we started
>> 2317     // calculating chk_om_free_count so LVars.free_count may not
>> 2318     // match anymore.
>>
>> and here:
>>
>> 2346     // With fine grained locks on the monitor lists, it is possible for
>> 2347     // an exiting JavaThread to put its in-use ObjectMonitors on the
>> 2348     // global in-use list after chk_om_in_use_count is calculated above.
>>
>> ???
> 
> This header comment was intended to cover that rationale:
> 
>      L2152: // This function can be called at a safepoint or it can be called when
>      L2153: // we are trying to exit the VM. When we are trying to exit the VM, the
>      L2154: // list walker functions can run in parallel with the other list
>      L2155: // operations so spin-locking is used for safety.
>      L2156: //
>      L2157: // Calls to this function can be added in various places as a debugging
>      L2158: // aid; pass 'true' for the 'on_exit' parameter to have in-use monitor
>      L2159: // details logged at the Info level and 'false' for the 'on_exit'
>      L2160: // parameter to have in-use monitor details logged at the Trace level.
>      L2161: //
>      L2162: void ObjectSynchronizer::audit_and_print_stats(bool on_exit) {
> 
> So, no: the audit_and_print_stats() function and the various chk_*()
> functions can also be called at a non-safepoint.

When we are exiting the VM, this is called from exit_globals(), which does 
execute at a safepoint - either:

- from VM_Exit::do_it() (a safepoint VM op); or
- from Threads::destroy_vm() after the VMThread has been terminated, 
which brings the VM to a safepoint first.
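
If that's right, and any ad-hoc debugging call sites are also at
safepoints, then an assert would document and check it, e.g.:

   void ObjectSynchronizer::audit_and_print_stats(bool on_exit) {
     assert(SafepointSynchronize::is_at_safepoint(), "must be at safepoint");
     // ... existing body ...
   }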

Cheers,
David
-----

> 
>>
>> ---
>>
>> That's it. :)
> 
> As always, thanks for the thorough review.
> 
> Dan

