RFR(L) 8153224 Monitor deflation prolong safepoints (CR8/v2.08/11-for-jdk14)

Tue Nov 12 02:20:37 UTC 2019

Hi Robbin,

tl;dr I can see us moving to a new style of using Atomic::load/store to 
replace plain load/store and declaring variables as volatile. But I'd 
like to see it discussed and agreed upon and written up clearly in the 
wiki so we can consistently apply it. Only the new lock-free queue 
management code should attempt that in this set of changes IMO.

On 12/11/2019 6:24 am, Robbin Ehn wrote:
> Hi David,
> 
> On 2019-11-11 15:03, David Holmes wrote:
>> Word-tearing is only a potential issue for 16-bit or smaller accesses, 
>> or unaligned 32-bit or 64-bit accesses. But we don't (shouldn't) use 
>> unaligned 32-bit and 64-bit accesses to ensure there is no possibility 
>> of word-tearing. Otherwise we would need to use Atomic::load/store for 
>> every lock-free algorithm and data-structure that we have.
> 
> Yes, but we must be make sure compiler generates such accesses.
> 
> Here is what Linux docs says.
> https://www.kernel.org/doc/Documentation/memory-barriers.txt:
> 
>   (*) For aligned memory locations whose size allows them to be accessed
>       with a single memory-reference instruction, prevents "load tearing"
>       and "store tearing," in which a single large access is replaced by
>       multiple smaller accesses.  For example, given an architecture having
>       16-bit store instructions with 7-bit immediate fields, the compiler
>       might be tempted to use two 16-bit store-immediate instructions to
>       implement the following 32-bit store:
> 
>      p = 0x00010002;
> 
>       Please note that GCC really does use this sort of optimization,
>       which is not surprising given that it would likely take more
>       than two instructions to build the constant and then store it.
>       This optimization can therefore be a win in single-threaded code.
>       In fact, a recent bug (since fixed) caused GCC to incorrectly use
>       this optimization in a volatile store.  In the absence of such bugs,
>       use of WRITE_ONCE() prevents store tearing in the following example:
> 
>      WRITE_ONCE(p, 0x00010002);
> 
> 
> WRITE_ONCE implemented as cast to volatile then store.
> https://elixir.bootlin.com/linux/latest/source/include/linux/compiler.h#L220 
> 
> 
> WRITE_ONCE and Atomic::store is the same.

The document you quote goes on to say:

"All that aside, it is never necessary to use READ_ONCE() and
WRITE_ONCE() on a variable that has been marked volatile. "

So our use of "volatile" is already addressing this aspect.

>>
>> Atomic::load/store was primarily needed for 64-bit values on 32-bit 
>> platforms.
>>
>>
>> The point is we shouldn't need to guarantee atomic load/store for 
>> 32-bit or 64-bit values using the Atomic class because it is the 
>> implicit mode in which we operate.
> 
> Not sure, what the implicit mode is.

The implicit mode is where we assume/require/demand that accesses to 
aligned 32-bit and 64-bit fields are atomic - as per the comment from 
the wiki. The wiki may have been referring to Java variables but it was 
not specifically about C2 code, and in fact the comment about atomic 
accesses was a fundamental expectation of the compiler across the whole VM.

> Compiler may generate basically whatever it fells like as long as a 
> single-threaded execution have the same final result.
> The compiler may elide the stores/loads completely, it may fuse 
> accesses, it may add accesses, etc...

Sure. I hadn't registered the fact it was the "volatile" that was 
ensuring this but now I do, so that is fine. Compilers may not guarantee 
this but we were always working with very specific compilers, and 
specific versions, and only updated after verifying the compiler 
appeared to be doing the expected "right thing".

> E.g.
> 
> if (b == 7) a = 7;
> else a = 0;
> c = 42;
> 
> Compiler may emit:
> a = 0; // extra store
> c = 42;
> if (b == 7) a = 7;
> 
> Making it a volatile will prevents that and turn it into a atomic store 
> in both branches.
> But volatile might also force the compiler to order it with c (depending 
> on if c is volatile or not).
> If we use Atomic::load/store it's much more obvious that we allow the 
> compiler to move store of c before store of a.

I would be surprised if we ever actually care about this aspect. Most of 
the time we will have ordering dependencies that require specific 
OrderAccess operations.

>>
>> But I can see quite a number of uses have crept in to the code base. 
>> :( Go back to Java 7 and the only use of Atomic's was for dealing with 
>> 64-bit on 32-bit platforms.
>>
>> I also can't see how/where the Atomic class is in fact doing anything 
>> to guarantee atomicity for 32-bit or 64-bit values.
> 
> It casts to volatile AFAIK.

Okay and that is sufficient (albeit still a non-guaranteed property IIUC).

> When we can it should be C++ Atomic store with memory_order_relaxed.
> So compiler may re-order atomic stores/loads.

I can buy into this approach ...

>> volatile is only used to prevent basic compiler optimizations from 
>> being applied. 
> 
> And this prevention is either to weak or to strong, it's always wrong.
> I guess that is also the reason for C++ proposal of deprecating volatile.
> 
>> It is used for any concurrently modified variable that is accessed
>> lock-free to at least request the compiler to not try to be clever 
>> when accessing this variable. This may not have any well specified 
>> semantics according to language specifications but we have always used 
>> compilers in good faith that they do the right thing. Use of volatile 
>> has nothing to do with any perceived atomicity of access, nor does it 
>> suggest anything about hardware reordering.
> 
> volatile forces the compiler to generate plain store/load instructions 
> which gives us the atomic property we want, but the reordering 
> constraints are an unwanted side-effect. Using Atomic::load/store gives 
> us exactly what we need.

... as I said I'm not sure these reordering side-effects ever really 
adversely affect us, but I can buy into an approach that separates the 
two concerns.

>>
>> Atomicity of access for 32-bit and 64-bit values is implicitly 
>> obtained by using plain load/stores and having suitable aligned 
>> variables. That's the way it is supposed to work so that we don't need 
>> to write Atomic::load and Atomic::store on every variables used in 
>> lock-free contexts. But it seems that message has not been passed on 
>> through the years. I can point you to an internal wiki that I wrote up 
>> on 2010 where it states:
> 
> The only way to guaratee that the compiler will emit plain stores/load 
> is to use Atomic:: or volatile. Often if you have taken the time to 
> write something lock-free you have done that for preformance, and any 
> un-needed ordering just reduce performance. So using volatile semantics 
> is wrong, either you have a bug or to strict ordering.
> 
>>
>> "In addition the Java platform requires that basic accesses (simple 
>> loads and stores, but not compound operations like increment) are 
>> atomic for all 32-bit Java data types (all except long and double). 
>> This is usually trivially achieved by aligning values on 32-bit 
>> boundaries on 32-bit, or 64-bit systems."
> 
> We are compiling with gcc/clang, not C2, which do not care about above.

The above is not specifically about C2.

> Now, I'm not suggestion changing anything, but I think new code should 
> use correct semantic. This code is a bit mixed since the prexisting code 
> use volatile.

As I said I can buy into this style of programming _but_ this is 
something that should be discussed and agreed upon and implemented using 
clear and consistent styles, not applied ad-hoc here and there based on 
the preference of the current developer. Our use of "volatile" is not 
incorrect but may be suboptimal (though I doubt is it really an issue).

In reference to the async monitor deflation code I could see the new 
lock-free list management using this new style (a style we need to 
clearly document on the hotspot wiki), but I would not want to see a mix 
of Atomic and plain accesses on existing volatile variables at this 
stage. (We can adapt existing code to the new style as a later 
enhancement.) [ Note: I'm making an assumption about how well isolated 
the list management code and it may not quite be what I think.]

> We don't have consume yet and acquire is to strong, I suggested relaxed, 
> Atomic::load(). 

I don't think Atomic::load/store should have any memory-ordering 
properties - so yes "relaxed". We have OrderAccess for imposing true 
memory ordering and load_acquire/release_store etc use 
Atomic::load/store internally.

> Which I think is correct for all usecases of ref_count() 
> except the ADIM_guarantee where we double load ref count, where I think 
> consume is correct. But consume does not help, since if the ref count is 
> wrong who knows what the second load will be.

I'm not at all clear on "consume" (didn't they decide to scrap that 
access mode?). Anyway this is way too much overthinking in relation to 
the ADIM_guarantee. The guarantee initially checked the value of the 
local variable but reported the current value of ref_count() which could 
have changed - so you could see an inconsistent message of the form 
"assert failed: expected > 0 got 1". So Dan fixed that to report the 
value of the local, but also report the current value of ref_count(). 
This is useful in the case that ref_count() has in fact changed as you 
can more obviously see there is a race - but it does depend on the two 
calls to ref_count() not being compacted into one (which is certainly 
not an issue with the way it is/was implemented).

Cheers,
David
-----

> /Robbin
> 
>>
>> David
>> -----
>>
>>> Thanks, Robbin
>>>
>>>>
>>>> Thanks,
>>>> David
>>>>
>>>>
>>>> On 8/11/2019 11:35 pm, Robbin Ehn wrote:
>>>>> Hi Dan,
>>>>>
>>>>> Thanks for looking into this, some comments on v8:
>>>>>
>>>>> ##################
>>>>> src/hotspot/cpu/sparc/globalDefinitions_sparc.hpp
>>>>> src/hotspot/cpu/x86/globalDefinitions_x86.hpp
>>>>> src/hotspot/share/logging/logTag.hpp
>>>>> src/hotspot/share/oops/markWord.hpp
>>>>> src/hotspot/share/runtime/basicLock.cpp
>>>>> src/hotspot/share/runtime/safepoint.cpp
>>>>> src/hotspot/share/runtime/serviceThread.cpp
>>>>> src/hotspot/share/runtime/sharedRuntime.cpp
>>>>> src/hotspot/share/runtime/synchronizer.hpp
>>>>> src/hotspot/share/runtime/vmOperations.cpp
>>>>> src/hotspot/share/runtime/vmOperations.hpp
>>>>> src/hotspot/share/runtime/vmStructs.cpp
>>>>> src/hotspot/share/runtime/vmThread.cpp
>>>>> test/hotspot/gtest/oops/test_markWord.cpp
>>>>>
>>>>> No comments.
>>>>>
>>>>> ##################
>>>>> I don't see the benefit of having the 
>>>>> -HandshakeAfterDeflateIdleMonitors code paths.
>>>>> Removing that option would mean these files can be reverted:
>>>>> src/hotspot/cpu/aarch64/globals_aarch64.hpp
>>>>> src/hotspot/cpu/arm/globals_arm.hpp
>>>>> src/hotspot/cpu/ppc/globals_ppc.hpp
>>>>> src/hotspot/cpu/s390/globals_s390.hpp
>>>>> src/hotspot/cpu/sparc/globals_sparc.hpp
>>>>> src/hotspot/cpu/x86/globals_x86.hpp
>>>>> src/hotspot/cpu/x86/macroAssembler_x86.cpp
>>>>> src/hotspot/cpu/x86/macroAssembler_x86.hpp
>>>>> src/hotspot/cpu/zero/globals_zero.hpp
>>>>>
>>>>> And one less option here:
>>>>> src/hotspot/share/runtime/globals.hpp
>>>>>
>>>>> ##################
>>>>> src/hotspot/share/prims/jvm.cpp
>>>>>
>>>>> Unclear if this is a good idea.
>>>>>
>>>>> ##################
>>>>> src/hotspot/share/prims/whitebox.cpp
>>>>>
>>>>> This would assume the test expects the right thing, but that is not 
>>>>> obvious.
>>>>>
>>>>> ##################
>>>>> src/hotspot/share/prims/jvmtiEnvBase.cpp
>>>>>
>>>>> The current pending and waiting monitor is only changed by the 
>>>>> JavaThread itself.
>>>>> It only sets it after _contentions is increased.
>>>>> It clears it before _contentions is decreased.
>>>>> We are depending on safepoint or the thread is suspended, so it 
>>>>> can't be deflated since _contentions are > 0.
>>>>> Plus the thread have already increased the ref count and can't 
>>>>> decrease it (since at safepoint or suspended).
>>>>>
>>>>> ##################
>>>>> src/hotspot/share/runtime/objectMonitor.cpp
>>>>>
>>>>> ###1
>>>>> You have several these (and in other files):
>>>>> 242   jint l_ref_count = ref_count();
>>>>> 243   ADIM_guarantee(l_ref_count > 0, "must be positive: 
>>>>> l_ref_count=%d, ref_count=%d", l_ref_count, ref_count());
>>>>> Please use Atomic::load() in ref_count.
>>>>> Since this is dependent on ref_count being volatile, otherwise the 
>>>>> compiler may only do one load.
>>>>>
>>>>> ###2
>>>>> 307   // Prevent deflation. See ObjectSynchronizer::deflate_monitor(),
>>>>> ...
>>>>> 311   Atomic::add(1, &_contentions);
>>>>> In ObjectSynchronizer::deflate_monitor if you would check ref count 
>>>>> instead of _contetion, we could remove contention.
>>>>> Since all waiters also have a ref count it looks like we don't need 
>>>>> waiters either.
>>>>> In ObjectSynchronizer::deflate_monitor:
>>>>> if (mid->_contentions != 0 || mid->_waiters != 0) {
>>>>> Why not just do:
>>>>> if (mid->ref_count()) {
>>>>> ?
>>>>>
>>>>> ##################
>>>>> src/hotspot/share/runtime/objectMonitor.hpp
>>>>>
>>>>> ###1
>>>>>   252   intptr_t is_busy() const {
>>>>>   253     // TODO-FIXME: assert _owner == null implies _recursions = 0
>>>>>   254     // We do not include _ref_count in the is_busy() check 
>>>>> because
>>>>>   255     // _ref_count is for indicating that the ObjectMonitor* 
>>>>> is in
>>>>>   256     // use which is orthogonal to whether the ObjectMonitor 
>>>>> itself
>>>>>   257     // is in use for a locking operation.
>>>>>
>>>>> But in the non-debug code we always check:
>>>>> +  if (mid->is_busy() || mid->ref_count() != 0) {
>>>>>
>>>>> So it seem like you should have a method including ref count.
>>>>>
>>>>> ##################
>>>>> src/hotspot/share/runtime/objectMonitor.inline.hpp
>>>>>
>>>>> Use Atomic::load for ref count.
>>>>>
>>>>> ##################
>>>>> src/hotspot/share/runtime/synchronizer.cpp
>>>>>
>>>>> ###1
>>>>>   139 static volatile int g_om_free_count = 0;    // # on g_free_list
>>>>>   140 static volatile int g_om_in_use_count = 0;  // # on 
>>>>> g_om_in_use_list
>>>>>   141 static volatile int g_om_population = 0;    // # Extant -- in 
>>>>> circulation
>>>>>   142 static volatile int g_om_wait_count = 0;    // # on g_wait_list
>>>>> No padding here, aren't they more contended than the fields in the OM?
>>>>>
>>>>> ###2
>>>>> 151 static bool is_next_marked(ObjectMonitor* om) {
>>>>>
>>>>> Is only used in ObjectSynchronizer::om_flush.
>>>>> Here you fetch a OM and read the next field, this do not need LA 
>>>>> semantics on supported platforms.
>>>>> This would only need Atomic::load.
>>>>>
>>>>> ###3
>>>>> 191 static void set_next(ObjectMonitor* om, ObjectMonitor* value) {
>>>>>
>>>>> In no place you need SR, in the only places it would made a 
>>>>> difference:
>>>>>   345       OrderAccess::storestore();
>>>>>   346       set_next(cur, next);  // Unmark the previous list head.
>>>>> and
>>>>> 1714     OrderAccess::storestore();
>>>>> 1715     set_next(in_use_list, next);
>>>>>
>>>>> You have a storestore already!
>>>>>
>>>>> This code reads as:
>>>>> OrderAccess::storestore();
>>>>> OrderAccess::loadstore();
>>>>> OrderAccess::storestore();
>>>>> om->_next_om = value
>>>>>
>>>>> So it should be an Atomic::store.
>>>>>
>>>>> ###4
>>>>> 198 static bool mark_list_head(ObjectMonitor* volatile * list_p
>>>>>
>>>>> Since the mark is an embedded spinlock I think the terminology 
>>>>> should be changed. (that the spinlock is inside a the next pointer 
>>>>> should be abstracted away)
>>>>> E.g. mark_next_loop would just be lock.
>>>>> The load of the list heads should use Atmoic:load.
>>>>> It also seem a bit wired to return next for the locking method.
>>>>> And output parameter can just be returned, and return NULL if list 
>>>>> head is NULL.
>>>>> E.g.
>>>>>
>>>>>   198 static ObjectMonitor* get_list_head_locked(ObjectMonitor* 
>>>>> volatile * list_p) {
>>>>>   200   while (true) {
>>>>>   201     ObjectMonitor* mid = Atomic::load(list_p);
>>>>>   202     if (mid == NULL) {
>>>>>   203       return NULL;  // The list is empty.
>>>>>   204     }
>>>>>   205     if (try_lock(mid)) {
>>>>>   206       if (Atmoic::load(list_p) != mid) {
>>>>>   207         // The list head changed so we have to retry.
>>>>>   208         unlock(mid);
>>>>>   210       } else {
>>>>>               return mid;
>>>>>         }
>>>>>   214     }
>>>>>           // Yield ?
>>>>>   215   }
>>>>>   216 }
>>>>>
>>>>> With colleteral changes.
>>>>>
>>>>> ###5
>>>>> 220 static ObjectMonitor* unmarked_next(ObjectMonitor* om)
>>>>> Atomic::store is what needed.
>>>>>
>>>>> ###6
>>>>> 333 static void prepend_to_common(
>>>>>
>>>>>   345       OrderAccess::storestore();
>>>>>   346       set_next(cur, next);  // Unmark the previous list head.
>>>>> Double storestore. (fixed by changing set_next to Atomic::store)
>>>>>
>>>>> ###7
>>>>>   375 static ObjectMonitor* 
>>>>> take_from_start_of_common(ObjectMonitor* volatile * list_p,
>>>>>
>>>>> Triple storestore here.
>>>>>
>>>>>   386   Atomic::dec(count_p);
>>>>>   387   // mark_list_head() used cmpxchg() above, switching list 
>>>>> head can be lazier:
>>>>>   388   OrderAccess::storestore();
>>>>>   389   // Unmark take, but leave the next value for any lagging list
>>>>>   390   // walkers. It will get cleaned up when take is prepended to
>>>>>   391   // the in-use list:
>>>>>   392   set_next(take, next);
>>>>>   393   return take;
>>>>>
>>>>> Reads:
>>>>> count_p--
>>>>> OrderAccess::loadstore();
>>>>> OrderAccess::storestore();
>>>>> OrderAccess::storestore();
>>>>> OrderAccess::loadstore();
>>>>> OrderAccess::storestore();
>>>>> take->_next_om = next;
>>>>>
>>>>> Fixed by changing set_next to Atomic::store and removing the 
>>>>> OrderAccess::storestore();
>>>>>
>>>>> ###8
>>>>> ObjectSynchronizer::om_release(
>>>>>
>>>>> 1591       if (m == mid) {
>>>>> 1592         // We found 'm' on the per-thread in-use list so try 
>>>>> to extract it.
>>>>> 1593         if (cur_mid_in_use == NULL) {
>>>>> 1594           // mid is the list head and it is marked. Switch the 
>>>>> list head
>>>>> 1595           // to next which unmarks the list head, but leaves 
>>>>> mid marked:
>>>>> 1596           self->om_in_use_list = next;
>>>>> 1597           // mark_list_head() used cmpxchg() above, switching 
>>>>> list head can be lazier:
>>>>> 1598           OrderAccess::storestore();
>>>>> 1599         } else {
>>>>> 1600           // mid and cur_mid_in_use are marked. Switch 
>>>>> cur_mid_in_use's
>>>>> 1601           // next field to next which unmarks cur_mid_in_use, 
>>>>> but leaves
>>>>> 1602           // mid marked:
>>>>> 1603           
>>>>> OrderAccess::release_store(&cur_mid_in_use->_next_om, next);
>>>>> 1604         }
>>>>> 1605         extracted = true;
>>>>> 1606         Atomic::dec(&self->om_in_use_count);
>>>>> 1607         // Unmark mid, but leave the next value for any 
>>>>> lagging list
>>>>> 1608         // walkers. It will get cleaned up when mid is 
>>>>> prepended to
>>>>> 1609         // the thread's free list:
>>>>> 1610         set_next(mid, next);
>>>>> 1611         break;
>>>>> 1612       }
>>>>>
>>>>> This does not look correct. Before taking this branch we have done 
>>>>> a cmpxchg in mark_list_head or mark_next_loop.
>>>>> This is how it reads:
>>>>> OrderAccess::storestore(); // from previous cmpxchg
>>>>> OrderAccess::loadstore(); // from previous cmpxchg
>>>>> 1591       if (m == mid) {
>>>>> 1593         if (cur_mid_in_use == NULL) {
>>>>> 1596           self->om_in_use_list = next;
>>>>> 1598           OrderAccess::storestore();
>>>>> 1599         } else {
>>>>>                 OrderAccess::storestore();
>>>>>                 OrderAccess::loadstore();
>>>>> 1603           cur_mid_in_use->_next_om = next;
>>>>> 1604         }
>>>>> 1605         extracted = true;
>>>>>               OrderAccess::storestore();
>>>>>               OrderAccess::fence(); // 
>>>>> storestore|storeload|loadstore|loadload
>>>>>           self->om_in_use_count--; // Atomic::dec
>>>>>               OrderAccess::storestore();
>>>>>               OrderAccess::loadstore();
>>>>>               OrderAccess::storestore();
>>>>>               OrderAccess::loadstore();
>>>>>           mid->_next_om = next; // Atomic::store
>>>>> 1611         break;
>>>>> 1612       }
>>>>>
>>>>> extracted is local variable so you so not need any orderaccess 
>>>>> before it set.
>>>>> Fixed by changing set_next to Atomic::store, removing the 
>>>>> OrderAccess::storestore() and changing OrderAccess::release_store 
>>>>> to Atmoic::store();
>>>>>
>>>>> ###9
>>>>> 1653 void ObjectSynchronizer::om_flush(Thread* self) {
>>>>>
>>>>> 1714     OrderAccess::storestore();
>>>>> 1715     set_next(in_use_list, next);
>>>>> Fixed by changing set_next to Atomic::store.
>>>>>
>>>>> ###10
>>>>> 1737     self->om_free_list = NULL;
>>>>> 1738     OrderAccess::storestore();  // Lazier memory is okay for 
>>>>> list walkers.
>>>>>
>>>>> prepend_list_to_g_free_list/prepend_list_to_g_om_in_use_list does 
>>>>> first thing cmpxchg so there is no need for this storestore.
>>>>>
>>>>> ###11
>>>>> 1797 void ObjectSynchronizer::inflate(ObjectMonitorHandle* omh_p, 
>>>>> Thread* self,
>>>>>
>>>>> 1938       // Once ObjectMonitor is configured and the object is 
>>>>> associated
>>>>> 1939       // with the ObjectMonitor, it is safe to allow async 
>>>>> deflation:
>>>>> 1940       assert(m->is_new(), "freshly allocated monitor must be 
>>>>> new");
>>>>> 1941       m->set_allocation_state(ObjectMonitor::Old);
>>>>>
>>>>> So we use ref count, contention, waiter, owner and allocation state 
>>>>> to keep OM alive in different scenarios.
>>>>> There is not way for me to keep track of that. I don't see why you 
>>>>> would need more than owner and ref count.
>>>>> If you allocate the om with ref count 1 you can remove 
>>>>> _allocation_state and just decrease ref count here instead.
>>>>>
>>>>> ###12
>>>>> 2079 bool ObjectSynchronizer::deflate_monitor
>>>>>
>>>>> 2112     if (AsyncDeflateIdleMonitors) {
>>>>> 2113       // clear() expects the owner field to be NULL and we 
>>>>> won't race
>>>>> 2114       // with the simple C2 ObjectMonitor
>>>>>
>>>>> The macro assambler code is not just executed by C2, so this 
>>>>> comment is a bit misleading. (there are some more also)
>>>>>
>>>>> ###13
>>>>> 2306 int ObjectSynchronizer::deflate_monitor_list(
>>>>>
>>>>> Same issue as ObjectSynchronizer::om_release.
>>>>> Fixed by changing set_next to Atomic::store, removing the 
>>>>> OrderAccess::storestore() and changing OrderAccess::release_store 
>>>>> to Atmoic::store();
>>>>>
>>>>> ###14
>>>>> 2474       if (SafepointSynchronize::is_synchronizing() &&
>>>>>
>>>>> This is the wrong method to call, it should 
>>>>> SafepointMechanism::should_block(Thread* thread);
>>>>>
>>>>> ###15
>>>>> 2578 void ObjectSynchronizer::deflate_idle_monitors_using_JT() {
>>>>>
>>>>> 2616     g_wait_list = NULL;
>>>>> 2617     OrderAccess::storestore();  // Lazier memory sync is okay 
>>>>> for list walkers.
>>>>>
>>>>> I don't see that g_wait_list is ever simutainously read.
>>>>> Either it is accessed by serviceThread outside a safepoint or by 
>>>>> VMThread inside a safepoint?
>>>>>
>>>>> It looks like g_wait_list can just be a local in:
>>>>> void ObjectSynchronizer::deflate_idle_monitors_using_JT()
>>>>>
>>>>> (disregarding the debug code that might read it in a safepoint)
>>>>>
>>>>> ###16
>>>>> 2722         assert(SafepointSynchronize::is_synchronizing(), 
>>>>> "sanity check");
>>>>>
>>>>> This is the wrong method to call, it should 
>>>>> SafepointMechanism::should_block(Thread* thread);
>>>>>
>>>>> ##################
>>>>> src/hotspot/share/runtime/vframe.cpp
>>>>>
>>>>> We are at safepoint or current thread or in a handshake, current 
>>>>> pending and waiting monitor is already stable.
>>>>>
>>>>> ##################
>>>>> src/hotspot/share/services/threadService.cpp
>>>>>
>>>>> These changes are only needed for the 
>>>>> -HandshakeAfterDeflateIdleMonitors path.
>>>>>
>>>>> ##################
>>>>> test/jdk/java/rmi/server/UnicastRemoteObject/unexportObject/UnexportLeak.java 
>>>>>
>>>>>
>>>>> Note: if OM had a weak to object instead this would not be needed.
>>>>>
>>>>> Thanks, Robbin
>>>>>
>>>>>
>>>>> On 11/4/19 10:03 PM, Daniel D. Daugherty wrote:
>>>>>> Greetings,
>>>>>>
>>>>>> I have made changes to the Async Monitor Deflation code in 
>>>>>> response to
>>>>>> the CR7/v2.07/10-for-jdk14 code review cycle. Thanks to David H., 
>>>>>> Robbin
>>>>>> and Erik O. for their comments!
>>>>>>
>>>>>> JDK14 Rampdown phase one is coming on Dec. 12, 2019 and the Async 
>>>>>> Monitor
>>>>>> Deflation project needs to push before Nov. 12, 2019 in order to 
>>>>>> allow
>>>>>> for sufficient bake time for such a big change. Nov. 12 is _next_ 
>>>>>> Tuesday
>>>>>> so we have 8 days from today to finish this code review cycle and 
>>>>>> push
>>>>>> this code for JDK14.
>>>>>>
>>>>>> Carsten and Roman! Time for you guys to chime in again on the code 
>>>>>> reviews.
>>>>>>
>>>>>> I have attached the change list from CR7 to CR8 instead of putting 
>>>>>> it in
>>>>>> the body of this email. I've also added a link to the 
>>>>>> CR7-to-CR8-changes
>>>>>> file to the webrevs so it should be easy to find.
>>>>>>
>>>>>> Main bug URL:
>>>>>>
>>>>>>      JDK-8153224 Monitor deflation prolong safepoints
>>>>>>      https://bugs.openjdk.java.net/browse/JDK-8153224
>>>>>>
>>>>>> The project is currently baselined on jdk-14+21.
>>>>>>
>>>>>> Here's the full webrev URL for those folks that want to see all of 
>>>>>> the
>>>>>> current Async Monitor Deflation code in one go (v2.08 full):
>>>>>>
>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/11-for-jdk14.v2.08.full 
>>>>>>
>>>>>>
>>>>>> Some folks might want to see just what has changed since the last 
>>>>>> review
>>>>>> cycle so here's a webrev for that (v2.08 inc):
>>>>>>
>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/11-for-jdk14.v2.08.inc/ 
>>>>>>
>>>>>>
>>>>>> The OpenJDK wiki did not need any changes for this round:
>>>>>>
>>>>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation
>>>>>>
>>>>>> The jdk-14+21 based v2.08 version of the patch has been thru Mach5 
>>>>>> tier[1-8]
>>>>>> testing on Oracle's usual set of platforms. It has also been 
>>>>>> through my usual
>>>>>> set of stress testing on Linux-X64, macOSX and Solaris-X64 with 
>>>>>> the addition
>>>>>> of Robbin's "MoCrazy 1024" test running in parallel with the other 
>>>>>> tests in
>>>>>> my lab. Some testing is still running, but so far there are no new 
>>>>>> regressions.
>>>>>>
>>>>>> I have not yet done a SPECjbb2015 round on the 
>>>>>> CR8/v2.08/11-for-jdk14 bits.
>>>>>>
>>>>>> Thanks, in advance, for any questions, comments or suggestions.
>>>>>>
>>>>>> Dan
>>>>>>
>>>>>>
>>>>>> On 10/17/19 5:50 PM, Daniel D. Daugherty wrote:
>>>>>>> Greetings,
>>>>>>>
>>>>>>> The Async Monitor Deflation project is reaching the end game. I 
>>>>>>> have no
>>>>>>> changes planned for the project at this time so all that is left 
>>>>>>> is code
>>>>>>> review and any changes that results from those reviews.
>>>>>>>
>>>>>>> Carsten and Roman! Time for you guys to chime in again on the 
>>>>>>> code reviews.
>>>>>>>
>>>>>>> I have attached the list of fixes from CR6 to CR7 instead of 
>>>>>>> putting it
>>>>>>> in the main body of this email.
>>>>>>>
>>>>>>> Main bug URL:
>>>>>>>
>>>>>>>     JDK-8153224 Monitor deflation prolong safepoints
>>>>>>>     https://bugs.openjdk.java.net/browse/JDK-8153224
>>>>>>>
>>>>>>> The project is currently baselined on jdk-14+19.
>>>>>>>
>>>>>>> Here's the full webrev URL for those folks that want to see all 
>>>>>>> of the
>>>>>>> current Async Monitor Deflation code in one go (v2.07 full):
>>>>>>>
>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/10-for-jdk14.v2.07.full 
>>>>>>>
>>>>>>>
>>>>>>> Some folks might want to see just what has changed since the last 
>>>>>>> review
>>>>>>> cycle so here's a webrev for that (v2.07 inc):
>>>>>>>
>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/10-for-jdk14.v2.07.inc/ 
>>>>>>>
>>>>>>>
>>>>>>> The OpenJDK wiki has been updated to match the 
>>>>>>> CR7/v2.07/10-for-jdk14 changes:
>>>>>>>
>>>>>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation 
>>>>>>>
>>>>>>>
>>>>>>> The jdk-14+18 based v2.07 version of the patch has been thru 
>>>>>>> Mach5 tier[1-8]
>>>>>>> testing on Oracle's usual set of platforms. It has also been 
>>>>>>> through my usual
>>>>>>> set of stress testing on Linux-X64, macOSX and Solaris-X64 with 
>>>>>>> the addition
>>>>>>> of Robbin's "MoCrazy 1024" test running in parallel with the 
>>>>>>> other tests in
>>>>>>> my lab.
>>>>>>>
>>>>>>> The jdk-14+19 based v2.07 version of the patch has been thru 
>>>>>>> Mach5 tier[1-3]
>>>>>>> test on Oracle's usual set of platforms. Mach5 tier[4-8] are in 
>>>>>>> process.
>>>>>>>
>>>>>>> I did another round of SPECjbb2015 testing in Oracle's Aurora 
>>>>>>> Performance lab
>>>>>>> using using their tuned SPECjbb2015 Linux-X64 G1 configs:
>>>>>>>
>>>>>>>     - "base" is jdk-14+18
>>>>>>>     - "v2.07" is the latest version and includes C2 
>>>>>>> inc_om_ref_count() support
>>>>>>>       on LP64 X64 and the new HandshakeAfterDeflateIdleMonitors 
>>>>>>> option
>>>>>>>     - "off" is with -XX:-AsyncDeflateIdleMonitors specified
>>>>>>>     - "handshake" is with -XX:+HandshakeAfterDeflateIdleMonitors 
>>>>>>> specified
>>>>>>>
>>>>>>>          hbIR           hbIR
>>>>>>>     (max attempted)  (settled)  max-jOPS  critical-jOPS  runtime
>>>>>>>     ---------------  ---------  --------  -------------  -------
>>>>>>>            34282.00   30635.90  28831.30       20969.20  3841.30 
>>>>>>> base
>>>>>>>            34282.00   30973.00  29345.80       21025.20  3964.10 
>>>>>>> v2.07
>>>>>>>            34282.00   31105.60  29174.30       21074.00  3931.30 
>>>>>>> v2.07_handshake
>>>>>>>            34282.00   30789.70  27151.60       19839.10  3850.20 
>>>>>>> v2.07_off
>>>>>>>
>>>>>>>     - The Aurora Perf comparison tool reports:
>>>>>>>
>>>>>>>         Comparison              max-jOPS critical-jOPS
>>>>>>>         ----------------------  -------------------- 
>>>>>>> --------------------
>>>>>>>         base vs 2.07            +1.78% (s, p=0.000)   +0.27% (ns, 
>>>>>>> p=0.790)
>>>>>>>         base vs 2.07_handshake  +1.19% (s, p=0.007)   +0.58% (ns, 
>>>>>>> p=0.536)
>>>>>>>         base vs 2.07_off        -5.83% (ns, p=0.394)  -5.39% (ns, 
>>>>>>> p=0.347)
>>>>>>>
>>>>>>>         (s) - significant  (ns) - not-significant
>>>>>>>
>>>>>>>     - For historical comparison, the Aurora Perf comparision tool
>>>>>>>         reported for v2.06 with a baseline of jdk-13+31:
>>>>>>>
>>>>>>>         Comparison              max-jOPS critical-jOPS
>>>>>>>         ----------------------  -------------------- 
>>>>>>> --------------------
>>>>>>>         base vs 2.06            -0.32% (ns, p=0.345)  +0.71% (ns, 
>>>>>>> p=0.646)
>>>>>>>         base vs 2.06_off        +0.49% (ns, p=0.292)  -1.21% (ns, 
>>>>>>> p=0.481)
>>>>>>>
>>>>>>>         (s) - significant  (ns) - not-significant
>>>>>>>
>>>>>>> Thanks, in advance, for any questions, comments or suggestions.
>>>>>>>
>>>>>>> Dan
>>>>>>>
>>>>>>>
>>>>>>> On 8/28/19 5:02 PM, Daniel D. Daugherty wrote:
>>>>>>>> Greetings,
>>>>>>>>
>>>>>>>> The Async Monitor Deflation project has rebased to JDK14 so it's 
>>>>>>>> time
>>>>>>>> for our first code review in that new context!!
>>>>>>>>
>>>>>>>> I've been focused on changing the monitor list management code 
>>>>>>>> to be
>>>>>>>> lock-free in order to make SPECjbb2015 happier. Of course with a 
>>>>>>>> change
>>>>>>>> like that, it takes a while to chase down all the new and wonderful
>>>>>>>> races. At this point, I have the code back to the same stability 
>>>>>>>> that
>>>>>>>> I had with CR5/v2.05/8-for-jdk13.
>>>>>>>>
>>>>>>>> To lay the ground work for this round of review, I pushed the 
>>>>>>>> following
>>>>>>>> two fixes to jdk/jdk earlier today:
>>>>>>>>
>>>>>>>>     JDK-8230184 rename, whitespace, indent and comments changes 
>>>>>>>> in preparation
>>>>>>>>                 for lock free Monitor lists
>>>>>>>>     https://bugs.openjdk.java.net/browse/JDK-8230184
>>>>>>>>
>>>>>>>>     JDK-8230317 serviceability/sa/ClhsdbPrintStatics.java fails 
>>>>>>>> after 8230184
>>>>>>>>     https://bugs.openjdk.java.net/browse/JDK-8230317
>>>>>>>>
>>>>>>>> I have attached the list of fixes from CR5 to CR6 instead of 
>>>>>>>> putting
>>>>>>>> in the main body of this email.
>>>>>>>>
>>>>>>>> Main bug URL:
>>>>>>>>
>>>>>>>>     JDK-8153224 Monitor deflation prolong safepoints
>>>>>>>>     https://bugs.openjdk.java.net/browse/JDK-8153224
>>>>>>>>
>>>>>>>> The project is currently baselined on jdk-14+11 plus the fixes for
>>>>>>>> JDK-8230184 and JDK-8230317.
>>>>>>>>
>>>>>>>> Here's the full webrev URL for those folks that want to see all 
>>>>>>>> of the
>>>>>>>> current Async Monitor Deflation code in one go (v2.06 full):
>>>>>>>>
>>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/9-for-jdk14.v2.06.full/ 
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> The primary focus of this review cycle is on the lock-free 
>>>>>>>> Monitor List
>>>>>>>> management changes so here's a webrev for just that patch (v2.06c):
>>>>>>>>
>>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/9-for-jdk14.v2.06c.inc/ 
>>>>>>>>
>>>>>>>>
>>>>>>>> The secondary focus of this review cycle is on the bug fixes 
>>>>>>>> that have
>>>>>>>> been made since CR5/v2.05/8-for-jdk13 so here's a webrev for 
>>>>>>>> just that
>>>>>>>> patch (v2.06b):
>>>>>>>>
>>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/9-for-jdk14.v2.06b.inc/ 
>>>>>>>>
>>>>>>>>
>>>>>>>> The third and final bucket for this review cycle is the rename, 
>>>>>>>> whitespace,
>>>>>>>> indent and comments changes made in preparation for lock free 
>>>>>>>> Monitor list
>>>>>>>> management. Almost all of that was extracted into JDK-8230184 
>>>>>>>> for the
>>>>>>>> baseline so this bucket now has just a few comment changes 
>>>>>>>> relative to
>>>>>>>> CR5/v2.05/8-for-jdk13. Here's a webrev for the remainder (v2.06a):
>>>>>>>>
>>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/9-for-jdk14.v2.06a.inc/ 
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Some folks might want to see just what has changed since the 
>>>>>>>> last review
>>>>>>>> cycle so here's a webrev for that (v2.06 inc):
>>>>>>>>
>>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/9-for-jdk14.v2.06.inc/ 
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Last, but not least, some folks might want to see the code 
>>>>>>>> before the
>>>>>>>> addition of lock-free Monitor List management so here's a webrev 
>>>>>>>> for
>>>>>>>> that (v2.00 -> v2.05):
>>>>>>>>
>>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/9-for-jdk14.v2.05.inc/ 
>>>>>>>>
>>>>>>>>
>>>>>>>> The OpenJDK wiki will need minor updates to match the CR6 changes:
>>>>>>>>
>>>>>>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation 
>>>>>>>>
>>>>>>>>
>>>>>>>> but that should only be changes to describe per-thread list 
>>>>>>>> async monitor
>>>>>>>> deflation being done by the ServiceThread.
>>>>>>>>
>>>>>>>> (I did update the OpenJDK wiki for the CR5 changes back on 
>>>>>>>> 2019.08.14)
>>>>>>>>
>>>>>>>> This version of the patch has been thru Mach5 tier[1-8] testing on
>>>>>>>> Oracle's usual set of platforms. It has also been through my 
>>>>>>>> usual set
>>>>>>>> of stress testing on Linux-X64, macOSX and Solaris-X64.
>>>>>>>>
>>>>>>>> I did a bunch of SPECjbb2015 testing in Oracle's Aurora 
>>>>>>>> Performance lab
>>>>>>>> using using their tuned SPECjbb2015 Linux-X64 G1 configs. This 
>>>>>>>> was using
>>>>>>>> this patch baselined on jdk-13+31 (for stability):
>>>>>>>>
>>>>>>>>           hbIR           hbIR
>>>>>>>>      (max attempted)  (settled)  max-jOPS  critical-jOPS runtime
>>>>>>>>      ---------------  ---------  --------  ------------- -------
>>>>>>>>             34282.00   28837.20  27905.20       19817.40 3658.10 
>>>>>>>> base
>>>>>>>>             34965.70   29798.80  27814.90       19959.00 3514.60 
>>>>>>>> v2.06d
>>>>>>>>             34282.00   29100.70  28042.50       19577.00 3701.90 
>>>>>>>> v2.06d_off
>>>>>>>>             34282.00   29218.50  27562.80       19397.30 3657.60 
>>>>>>>> v2.06d_ocache
>>>>>>>>             34965.70   29838.30  26512.40       19170.60 3569.90 
>>>>>>>> v2.05
>>>>>>>>             34282.00   28926.10  27734.00       19835.10 3588.40 
>>>>>>>> v2.05_off
>>>>>>>>
>>>>>>>> The "off" configs are with -XX:-AsyncDeflateIdleMonitors 
>>>>>>>> specified and
>>>>>>>> the "ocache" config is with 128 byte cache line sizes instead of 
>>>>>>>> 64 byte
>>>>>>>> cache lines sizes. "v2.06d" is the last set of changes that I 
>>>>>>>> made before
>>>>>>>> those changes were distributed into the "v2.06a", "v2.06b" and 
>>>>>>>> "v2.06c"
>>>>>>>> buckets for this review recycle.
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks, in advance, for any questions, comments or suggestions.
>>>>>>>>
>>>>>>>> Dan
>>>>>>>>
>>>>>>>>
>>>>>>>> On 7/11/19 3:49 PM, Daniel D. Daugherty wrote:
>>>>>>>>> Greetings,
>>>>>>>>>
>>>>>>>>> I've been focused on chasing down and fixing the rare test 
>>>>>>>>> failures
>>>>>>>>> that only pop up rarely. So this round is primarily fixes for 
>>>>>>>>> races
>>>>>>>>> with a few additional fixes that came from Karen's review of CR4.
>>>>>>>>> Thanks Karen!
>>>>>>>>>
>>>>>>>>> I have attached the list of fixes from CR4 to CR5 instead of 
>>>>>>>>> putting
>>>>>>>>> in the main body of this email.
>>>>>>>>>
>>>>>>>>> Main bug URL:
>>>>>>>>>
>>>>>>>>>     JDK-8153224 Monitor deflation prolong safepoints
>>>>>>>>>     https://bugs.openjdk.java.net/browse/JDK-8153224
>>>>>>>>>
>>>>>>>>> The project is currently baselined on jdk-13+29. This will 
>>>>>>>>> likely be
>>>>>>>>> the last JDK13 baseline for this project and I'll roll to the 
>>>>>>>>> JDK14
>>>>>>>>> (jdk/jdk) repo soon...
>>>>>>>>>
>>>>>>>>> Here's the full webrev URL:
>>>>>>>>>
>>>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/8-for-jdk13.full/ 
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Here's the incremental webrev URL:
>>>>>>>>>
>>>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/8-for-jdk13.inc/
>>>>>>>>>
>>>>>>>>> I have not yet checked the OpenJDK wiki to see if it needs any 
>>>>>>>>> updates
>>>>>>>>> to match the CR5 changes:
>>>>>>>>>
>>>>>>>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation 
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> (I did update the OpenJDK wiki for the CR4 changes back on 
>>>>>>>>> 2019.06.26)
>>>>>>>>>
>>>>>>>>> This version of the patch has been thru Mach5 tier[1-3] testing on
>>>>>>>>> Oracle's usual set of platforms. Mach5 tier[4-6] is running now 
>>>>>>>>> and
>>>>>>>>> Mach5 tier[78] will follow. I'll kick off the usual stress testing
>>>>>>>>> on Linux-X64, macOSX and Solaris-X64 as those machines become 
>>>>>>>>> available.
>>>>>>>>> Since I haven't made any performance changes in this round, 
>>>>>>>>> I'll only
>>>>>>>>> be running SPECjbb2015 to gather the latest monitorinflation logs.
>>>>>>>>>
>>>>>>>>> Next up:
>>>>>>>>>
>>>>>>>>> - We're still seeing 4-5% lower performance with SPECjbb2015 on
>>>>>>>>>   Linux-X64 and we've determined that some of that comes from
>>>>>>>>>   contention on the gListLock. So I'm going to investigate 
>>>>>>>>> removing
>>>>>>>>>   the gListLock. Yes, another lock free set of changes is coming!
>>>>>>>>> - Of course, going lock free often causes new races and new 
>>>>>>>>> failures
>>>>>>>>>   so that's a good reason for make those changes isolated in their
>>>>>>>>>   own round (and not holding up CR5/v2.05/8-for-jdk13 anymore).
>>>>>>>>> - I finally have a potential fix for the Win* failure with
>>>>>>>>>     gc/g1/humongousObjects/TestHumongousClassLoader.java
>>>>>>>>>   but I haven't run it through Mach5 yet so it'll be in the 
>>>>>>>>> next round.
>>>>>>>>> - Some RTM tests were recently re-enabled in Mach5 and I'm 
>>>>>>>>> seeing some
>>>>>>>>>   monitor related failures there. I suspect that I need to go 
>>>>>>>>> take a
>>>>>>>>>   look at the C2 RTM macro assembler code and look for things 
>>>>>>>>> that might
>>>>>>>>>   conflict if Async Monitor Deflation. If you're interested in 
>>>>>>>>> that kind
>>>>>>>>>   of issue, then see the macroAssembler_x86.cpp sanity check 
>>>>>>>>> that I
>>>>>>>>>   added in this round!
>>>>>>>>>
>>>>>>>>> Thanks, in advance, for any questions, comments or suggestions.
>>>>>>>>>
>>>>>>>>> Dan
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 5/26/19 8:30 PM, Daniel D. Daugherty wrote:
>>>>>>>>>> Greetings,
>>>>>>>>>>
>>>>>>>>>> I have a fix for an issue that came up during performance 
>>>>>>>>>> testing.
>>>>>>>>>> Many thanks to Robbin for diagnosing the issue in his SPECjbb2015
>>>>>>>>>> experiments.
>>>>>>>>>>
>>>>>>>>>> Here's the list of changes from CR3 to CR4. The list is a bit
>>>>>>>>>> verbose due to the complexity of the issue, but the changes
>>>>>>>>>> themselves are not that big.
>>>>>>>>>>
>>>>>>>>>> Functional:
>>>>>>>>>>   - Change SafepointSynchronize::is_cleanup_needed() from calling
>>>>>>>>>>     ObjectSynchronizer::is_cleanup_needed() to calling
>>>>>>>>>>     ObjectSynchronizer::is_safepoint_deflation_needed():
>>>>>>>>>>     - is_safepoint_deflation_needed() returns the result of
>>>>>>>>>>       monitors_used_above_threshold() for safepoint based
>>>>>>>>>>    ��  monitor deflation (!AsyncDeflateIdleMonitors).
>>>>>>>>>>     - For AsyncDeflateIdleMonitors, it only returns true if
>>>>>>>>>>       there is a special deflation request, e.g., System.gc()
>>>>>>>>>>       - This solves a bug where there are a bunch of Cleanup
>>>>>>>>>>         safepoints that simply request async deflation which
>>>>>>>>>>         keeps the async JavaThreads from making progress on
>>>>>>>>>>         their async deflation work.
>>>>>>>>>>   - Add AsyncDeflationInterval diagnostic option. Description:
>>>>>>>>>>       Async deflate idle monitors every so many milliseconds when
>>>>>>>>>>       MonitorUsedDeflationThreshold is exceeded (0 is off).
>>>>>>>>>>   - Replace ObjectSynchronizer::gOmShouldDeflateIdleMonitors() 
>>>>>>>>>> with
>>>>>>>>>>     ObjectSynchronizer::is_async_deflation_needed():
>>>>>>>>>>     - is_async_deflation_needed() returns true when
>>>>>>>>>>       is_async_cleanup_requested() is true or when
>>>>>>>>>>       monitors_used_above_threshold() is true (but no more 
>>>>>>>>>> often than
>>>>>>>>>>       AsyncDeflationInterval).
>>>>>>>>>>     - if AsyncDeflateIdleMonitors Service_lock->wait() now 
>>>>>>>>>> waits for
>>>>>>>>>>       at most GuaranteedSafepointInterval millis:
>>>>>>>>>>       - This allows is_async_deflation_needed() to be checked at
>>>>>>>>>>         the same interval as GuaranteedSafepointInterval.
>>>>>>>>>>         (default is 1000 millis/1 second)
>>>>>>>>>>       - Once is_async_deflation_needed() has returned true, it
>>>>>>>>>>         generally cannot return true for AsyncDeflationInterval.
>>>>>>>>>>         This is to prevent async deflation from swamping the
>>>>>>>>>>         ServiceThread.
>>>>>>>>>>   - The ServiceThread still handles async deflation of the global
>>>>>>>>>>     in-use list and now it also marks JavaThreads for async 
>>>>>>>>>> deflation
>>>>>>>>>>     of their in-use lists.
>>>>>>>>>>     - The ServiceThread will check for async deflation work every
>>>>>>>>>>       GuaranteedSafepointInterval.
>>>>>>>>>>     - A safepoint can still cause the ServiceThread to check for
>>>>>>>>>>       async deflation work via is_async_deflation_requested.
>>>>>>>>>>   - Refactor code from ObjectSynchronizer::is_cleanup_needed() 
>>>>>>>>>> into
>>>>>>>>>>     monitors_used_above_threshold() and remove 
>>>>>>>>>> is_cleanup_needed().
>>>>>>>>>>   - In addition to System.gc(), the VM_Exit VM op and the final
>>>>>>>>>>     VMThread safepoint now set the is_special_deflation_requested
>>>>>>>>>>     flag to reduce the in-use monitor population that is 
>>>>>>>>>> reported by
>>>>>>>>>>     ObjectSynchronizer::log_in_use_monitor_details() at VM exit.
>>>>>>>>>>
>>>>>>>>>> Test update:
>>>>>>>>>>   - test/hotspot/gtest/oops/test_markOop.cpp is updated to 
>>>>>>>>>> work with
>>>>>>>>>>     AsyncDeflateIdleMonitors.
>>>>>>>>>>
>>>>>>>>>> Collateral:
>>>>>>>>>>   - Add/clarify/update some logging messages.
>>>>>>>>>>
>>>>>>>>>> Cleanup:
>>>>>>>>>>   - Updated comments based on Karen's code review.
>>>>>>>>>>   - Change 'special cleanup' -> 'special deflation' and
>>>>>>>>>>     'async cleanup' -> 'async deflation'.
>>>>>>>>>>     - comment and function name changes
>>>>>>>>>>   - Clarify MonitorUsedDeflationThreshold description;
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Main bug URL:
>>>>>>>>>>
>>>>>>>>>>     JDK-8153224 Monitor deflation prolong safepoints
>>>>>>>>>>     https://bugs.openjdk.java.net/browse/JDK-8153224
>>>>>>>>>>
>>>>>>>>>> The project is currently baselined on jdk-13+22.
>>>>>>>>>>
>>>>>>>>>> Here's the full webrev URL:
>>>>>>>>>>
>>>>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/7-for-jdk13.full/ 
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Here's the incremental webrev URL:
>>>>>>>>>>
>>>>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/7-for-jdk13.inc/ 
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I have not updated the OpenJDK wiki to reflect the CR4 changes:
>>>>>>>>>>
>>>>>>>>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation 
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> The wiki doesn't say a whole lot about the async deflation 
>>>>>>>>>> invocation
>>>>>>>>>> mechanism so I have to figure out how to add that content.
>>>>>>>>>>
>>>>>>>>>> This version of the patch has been thru Mach5 tier[1-8] 
>>>>>>>>>> testing on
>>>>>>>>>> Oracle's usual set of platforms. My Solaris-X64 stress kit run is
>>>>>>>>>> running now. Kitchensink8H on product, fastdebug, and 
>>>>>>>>>> slowdebug bits
>>>>>>>>>> are running on Linux-X64, MacOSX and Solaris-X64. I still have 
>>>>>>>>>> to run
>>>>>>>>>> my stress kit on Linux-X64. I still have to run the SPECjbb2015
>>>>>>>>>> baseline and CR4 runs on Linux-X64, MacOSX and Solaris-X64.
>>>>>>>>>>
>>>>>>>>>> Thanks, in advance, for any questions, comments or suggestions.
>>>>>>>>>>
>>>>>>>>>> Dan
>>>>>>>>>>
>>>>>>>>>> On 5/6/19 11:52 AM, Daniel D. Daugherty wrote:
>>>>>>>>>>> Greetings,
>>>>>>>>>>>
>>>>>>>>>>> I had some discussions with Karen about a race that was in the
>>>>>>>>>>> ObjectMonitor::enter() code in CR2/v2.02/5-for-jdk13. This 
>>>>>>>>>>> race was
>>>>>>>>>>> theoretical and I had no test failures due to it. The fix is 
>>>>>>>>>>> pretty
>>>>>>>>>>> simple: remove the special case code for async deflation in the
>>>>>>>>>>> ObjectMonitor::enter() function and rely solely on the ref_count
>>>>>>>>>>> for ObjectMonitor::enter() protection.
>>>>>>>>>>>
>>>>>>>>>>> During those discussions Karen also floated the idea of using 
>>>>>>>>>>> the
>>>>>>>>>>> ref_count field instead of the contentions field for the Async
>>>>>>>>>>> Monitor Deflation protocol. I decided to go ahead and code up 
>>>>>>>>>>> that
>>>>>>>>>>> change and I have run it through the usual stress and Mach5 
>>>>>>>>>>> testing
>>>>>>>>>>> with no issues. It's also known as v2.03 (for those for with the
>>>>>>>>>>> patches) and as webrev/6-for-jdk13 (for those with webrev URLs).
>>>>>>>>>>> Sorry for all the names...
>>>>>>>>>>>
>>>>>>>>>>> Main bug URL:
>>>>>>>>>>>
>>>>>>>>>>>     JDK-8153224 Monitor deflation prolong safepoints
>>>>>>>>>>>     https://bugs.openjdk.java.net/browse/JDK-8153224
>>>>>>>>>>>
>>>>>>>>>>> The project is currently baselined on jdk-13+18.
>>>>>>>>>>>
>>>>>>>>>>> Here's the full webrev URL:
>>>>>>>>>>>
>>>>>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/6-for-jdk13.full/ 
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Here's the incremental webrev URL:
>>>>>>>>>>>
>>>>>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/6-for-jdk13.inc/ 
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> I have also updated the OpenJDK wiki to reflect the CR3 changes:
>>>>>>>>>>>
>>>>>>>>>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation 
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> This version of the patch has been thru Mach5 tier[1-8] 
>>>>>>>>>>> testing on
>>>>>>>>>>> Oracle's usual set of platforms. My Solaris-X64 stress kit 
>>>>>>>>>>> run had
>>>>>>>>>>> no issues. Kitchensink8H on product, fastdebug, and slowdebug 
>>>>>>>>>>> bits
>>>>>>>>>>> had no failures on Linux-X64; MacOSX fastdebug and slowdebug and
>>>>>>>>>>> Solaris-X64 release had the usual "Too large time diff" 
>>>>>>>>>>> complaints.
>>>>>>>>>>> 12 hour Inflate2 runs on product, fastdebug and slowdebug 
>>>>>>>>>>> bits on
>>>>>>>>>>> Linux-X64, MacOSX and Solaris-X64 had no failures. My Linux-X64
>>>>>>>>>>> stress kit is running right now.
>>>>>>>>>>>
>>>>>>>>>>> I've done the SPECjbb2015 baseline and CR3 runs. I need to 
>>>>>>>>>>> gather
>>>>>>>>>>> the results and analyze them.
>>>>>>>>>>>
>>>>>>>>>>> Thanks, in advance, for any questions, comments or suggestions.
>>>>>>>>>>>
>>>>>>>>>>> Dan
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 4/25/19 12:38 PM, Daniel D. Daugherty wrote:
>>>>>>>>>>>> Greetings,
>>>>>>>>>>>>
>>>>>>>>>>>> I have a small but important bug fix for the Async Monitor 
>>>>>>>>>>>> Deflation
>>>>>>>>>>>> project ready to go. It's also known as v2.02 (for those for 
>>>>>>>>>>>> with the
>>>>>>>>>>>> patches) and as webrev/5-for-jdk13 (for those with webrev 
>>>>>>>>>>>> URLs). Sorry
>>>>>>>>>>>> for all the names...
>>>>>>>>>>>>
>>>>>>>>>>>> JDK-8222295 was pushed to jdk/jdk two days ago so that 
>>>>>>>>>>>> baseline patch
>>>>>>>>>>>> is out of our hair.
>>>>>>>>>>>>
>>>>>>>>>>>> Main bug URL:
>>>>>>>>>>>>
>>>>>>>>>>>>     JDK-8153224 Monitor deflation prolong safepoints
>>>>>>>>>>>>     https://bugs.openjdk.java.net/browse/JDK-8153224
>>>>>>>>>>>>
>>>>>>>>>>>> The project is currently baselined on jdk-13+17.
>>>>>>>>>>>>
>>>>>>>>>>>> Here's the full webrev URL:
>>>>>>>>>>>>
>>>>>>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/5-for-jdk13.full/ 
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Here's the incremental webrev URL (JDK-8153224):
>>>>>>>>>>>>
>>>>>>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/5-for-jdk13.inc/ 
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> I still have to update the OpenJDK wiki to reflect the CR2 
>>>>>>>>>>>> changes:
>>>>>>>>>>>>
>>>>>>>>>>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation 
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> This version of the patch has been thru Mach5 tier[1-6] 
>>>>>>>>>>>> testing on
>>>>>>>>>>>> Oracle's usual set of platforms. Mach5 tier[7-8] is running 
>>>>>>>>>>>> now.
>>>>>>>>>>>> My stress kit is running on Solaris-X64 now. Kitchensink8H 
>>>>>>>>>>>> is running
>>>>>>>>>>>> now on product, fastdebug, and slowdebug bits on Linux-X64, 
>>>>>>>>>>>> MacOSX
>>>>>>>>>>>> and Solaris-X64. 12 hour Inflate2 runs are running now on 
>>>>>>>>>>>> product,
>>>>>>>>>>>> fastdebug and slowdebug bits on Linux-X64, MacOSX and 
>>>>>>>>>>>> Solaris-X64.
>>>>>>>>>>>> I'll start my my stress kit on Linux-X64 sometime on Sunday 
>>>>>>>>>>>> (after
>>>>>>>>>>>> my jdk-13+18 stress run is done).
>>>>>>>>>>>>
>>>>>>>>>>>> I'll do SPECjbb2015 baseline and CR2 runs after all the stress
>>>>>>>>>>>> testing is done.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks, in advance, for any questions, comments or suggestions.
>>>>>>>>>>>>
>>>>>>>>>>>> Dan
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 4/19/19 11:58 AM, Daniel D. Daugherty wrote:
>>>>>>>>>>>>> Greetings,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I finally have CR1 for the Async Monitor Deflation project 
>>>>>>>>>>>>> ready to
>>>>>>>>>>>>> go. It's also known as v2.01 (for those for with the 
>>>>>>>>>>>>> patches) and as
>>>>>>>>>>>>> webrev/4-for-jdk13 (for those with webrev URLs). Sorry for 
>>>>>>>>>>>>> all the
>>>>>>>>>>>>> names...
>>>>>>>>>>>>>
>>>>>>>>>>>>> Main bug URL:
>>>>>>>>>>>>>
>>>>>>>>>>>>>     JDK-8153224 Monitor deflation prolong safepoints
>>>>>>>>>>>>>     https://bugs.openjdk.java.net/browse/JDK-8153224
>>>>>>>>>>>>>
>>>>>>>>>>>>> Baseline bug fixes URL:
>>>>>>>>>>>>>
>>>>>>>>>>>>>     JDK-8222295 more baseline cleanups from Async Monitor 
>>>>>>>>>>>>> Deflation project
>>>>>>>>>>>>>     https://bugs.openjdk.java.net/browse/JDK-8222295
>>>>>>>>>>>>>
>>>>>>>>>>>>> The project is currently baselined on jdk-13+15.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Here's the webrev for the latest baseline changes 
>>>>>>>>>>>>> (JDK-8222295):
>>>>>>>>>>>>>
>>>>>>>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/4-for-jdk13.8222295 
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Here's the full webrev URL (JDK-8153224 only):
>>>>>>>>>>>>>
>>>>>>>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/4-for-jdk13.full/ 
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Here's the incremental webrev URL (JDK-8153224):
>>>>>>>>>>>>>
>>>>>>>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/4-for-jdk13.inc/ 
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> So I'm looking for reviews for both JDK-8222295 and the 
>>>>>>>>>>>>> latest version
>>>>>>>>>>>>> of JDK-8153224...
>>>>>>>>>>>>>
>>>>>>>>>>>>> I still have to update the OpenJDK wiki to reflect the CR 
>>>>>>>>>>>>> changes:
>>>>>>>>>>>>>
>>>>>>>>>>>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation 
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> This version of the patch has been thru Mach5 tier[1-3] 
>>>>>>>>>>>>> testing on
>>>>>>>>>>>>> Oracle's usual set of platforms. Mach5 tier[4-6] is running 
>>>>>>>>>>>>> now and
>>>>>>>>>>>>> Mach5 tier[78] will be run later today. My stress kit on 
>>>>>>>>>>>>> Solaris-X64
>>>>>>>>>>>>> is running now. Linux-X64 stress testing will start on 
>>>>>>>>>>>>> Sunday. I'm
>>>>>>>>>>>>> planning to do Kitchensink runs, SPECjbb2015 runs and my 
>>>>>>>>>>>>> monitor
>>>>>>>>>>>>> inflation stress tests on Linux-X64, MacOSX and Solaris-X64.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks, in advance, for any questions, comments or 
>>>>>>>>>>>>> suggestions.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Dan
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 3/24/19 9:57 AM, Daniel D. Daugherty wrote:
>>>>>>>>>>>>>> Greetings,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Welcome to the OpenJDK review thread for my port of 
>>>>>>>>>>>>>> Carsten's work on:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>     JDK-8153224 Monitor deflation prolong safepoints
>>>>>>>>>>>>>>     https://bugs.openjdk.java.net/browse/JDK-8153224
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Here's a link to the OpenJDK wiki that describes my port:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation 
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Here's the webrev URL:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/3-for-jdk13/ 
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Here's a link to Carsten's original webrev:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> http://cr.openjdk.java.net/~cvarming/monitor_deflate_conc/0/
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Earlier versions of this patch have been through several 
>>>>>>>>>>>>>> rounds of
>>>>>>>>>>>>>> preliminary review. Many thanks to Carsten, Coleen, 
>>>>>>>>>>>>>> Robbin, and
>>>>>>>>>>>>>> Roman for their preliminary code review comments. A very 
>>>>>>>>>>>>>> special
>>>>>>>>>>>>>> thanks to Robbin and Roman for building and testing the 
>>>>>>>>>>>>>> patch in
>>>>>>>>>>>>>> their own environments (including specJBB2015).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This version of the patch has been thru Mach5 tier[1-8] 
>>>>>>>>>>>>>> testing on
>>>>>>>>>>>>>> Oracle's usual set of platforms. Earlier versions have 
>>>>>>>>>>>>>> been run
>>>>>>>>>>>>>> through my stress kit on my Linux-X64 and Solaris-X64 servers
>>>>>>>>>>>>>> (product, fastdebug, slowdebug).Earlier versions have run 
>>>>>>>>>>>>>> Kitchensink
>>>>>>>>>>>>>> for 12 hours on MacOSX, Linux-X64 and Solaris-X64 
>>>>>>>>>>>>>> (product, fastdebug
>>>>>>>>>>>>>> and slowdebug). Earlier versions have run my monitor 
>>>>>>>>>>>>>> inflation stress
>>>>>>>>>>>>>> tests for 12 hours on MacOSX, Linux-X64 and Solaris-X64 
>>>>>>>>>>>>>> (product,
>>>>>>>>>>>>>> fastdebug and slowdebug).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> All of the testing done on earlier versions will be redone 
>>>>>>>>>>>>>> on the
>>>>>>>>>>>>>> latest version of the patch.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks, in advance, for any questions, comments or 
>>>>>>>>>>>>>> suggestions.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Dan
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> P.S.
>>>>>>>>>>>>>> One subtest in 
>>>>>>>>>>>>>> gc/g1/humongousObjects/TestHumongousClassLoader.java
>>>>>>>>>>>>>> is currently failing in -Xcomp mode on Win* only. I've 
>>>>>>>>>>>>>> been trying
>>>>>>>>>>>>>> to characterize/analyze this failure for more than a week 
>>>>>>>>>>>>>> now. At
>>>>>>>>>>>>>> this point I'm convinced that Async Monitor Deflation is 
>>>>>>>>>>>>>> aggravating
>>>>>>>>>>>>>> an existing bug. However, I plan to have a better handle 
>>>>>>>>>>>>>> on that
>>>>>>>>>>>>>> failure before these bits are pushed to the jdk/jdk repo.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>