RFR(L) 8153224 Monitor deflation prolong safepoints (CR8/v2.08/11-for-jdk14)

Daniel D. Daugherty daniel.daugherty at oracle.com
Fri Nov 8 14:10:22 UTC 2019


Robbin,

Thanks for doing such a thorough crawl thru review! I very much appreciate
the feedback. It will take a bit of time to go thru and address all of
these comments which I'll do in another reply.

So this is just an ACK that I've gotten the review email... :-)

Dan


On 11/8/19 8:35 AM, Robbin Ehn wrote:
> Hi Dan,
>
> Thanks for looking into this, some comments on v8:
>
> ##################
> src/hotspot/cpu/sparc/globalDefinitions_sparc.hpp
> src/hotspot/cpu/x86/globalDefinitions_x86.hpp
> src/hotspot/share/logging/logTag.hpp
> src/hotspot/share/oops/markWord.hpp
> src/hotspot/share/runtime/basicLock.cpp
> src/hotspot/share/runtime/safepoint.cpp
> src/hotspot/share/runtime/serviceThread.cpp
> src/hotspot/share/runtime/sharedRuntime.cpp
> src/hotspot/share/runtime/synchronizer.hpp
> src/hotspot/share/runtime/vmOperations.cpp
> src/hotspot/share/runtime/vmOperations.hpp
> src/hotspot/share/runtime/vmStructs.cpp
> src/hotspot/share/runtime/vmThread.cpp
> test/hotspot/gtest/oops/test_markWord.cpp
>
> No comments.
>
> ##################
> I don't see the benefit of having the -HandshakeAfterDeflateIdleMonitors code paths.
> Removing that option would mean these files can be reverted:
> src/hotspot/cpu/aarch64/globals_aarch64.hpp
> src/hotspot/cpu/arm/globals_arm.hpp
> src/hotspot/cpu/ppc/globals_ppc.hpp
> src/hotspot/cpu/s390/globals_s390.hpp
> src/hotspot/cpu/sparc/globals_sparc.hpp
> src/hotspot/cpu/x86/globals_x86.hpp
> src/hotspot/cpu/x86/macroAssembler_x86.cpp
> src/hotspot/cpu/x86/macroAssembler_x86.hpp
> src/hotspot/cpu/zero/globals_zero.hpp
>
> And one less option here:
> src/hotspot/share/runtime/globals.hpp
>
> ##################
> src/hotspot/share/prims/jvm.cpp
>
> Unclear if this is a good idea.
>
> ##################
> src/hotspot/share/prims/whitebox.cpp
>
> This would assume the test expects the right thing, but that is not obvious.
>
> ##################
> src/hotspot/share/prims/jvmtiEnvBase.cpp
>
> The current pending and waiting monitors are only changed by the JavaThread itself.
> It only sets them after _contentions is increased.
> It clears them before _contentions is decreased.
> We depend on being at a safepoint or on the thread being suspended, so the
> monitor can't be deflated since _contentions is > 0.
> Plus the thread has already increased the ref count and can't decrease it
> (since it is at a safepoint or suspended).
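The argument above rests on an ordering invariant: a thread publishes its pending monitor only after incrementing _contentions, and withdraws it before decrementing. A minimal sketch of that invariant, using std::atomic as a portable stand-in for HotSpot's Atomic/OrderAccess; MonitorSketch, ThreadSketch and pending_monitor_is_protected() are hypothetical names, not HotSpot code:

```cpp
#include <atomic>
#include <cassert>

struct MonitorSketch {                  // hypothetical stand-in for ObjectMonitor
  std::atomic<int> contentions{0};
};

struct ThreadSketch {                   // hypothetical stand-in for JavaThread
  std::atomic<MonitorSketch*> current_pending_monitor{nullptr};

  // Only the owning thread calls these. The count goes up before the
  // monitor is published...
  void begin_contend(MonitorSketch* m) {
    m->contentions.fetch_add(1);
    current_pending_monitor.store(m);
  }
  // ...and the monitor is withdrawn before the count goes down.
  void end_contend(MonitorSketch* m) {
    current_pending_monitor.store(nullptr);
    m->contentions.fetch_sub(1);
  }
};

// What an observer may rely on while the thread is stopped (safepoint or
// suspension): a published pending monitor always has contentions > 0, so a
// deflater that requires contentions == 0 leaves it alone.
bool pending_monitor_is_protected(const ThreadSketch& t) {
  MonitorSketch* m = t.current_pending_monitor.load();
  return m == nullptr || m->contentions.load() > 0;
}
```

Because both updates happen on the owning thread in a fixed order, a stopped thread can never expose the window where the monitor is published with a zero count.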
>
> ##################
> src/hotspot/share/runtime/objectMonitor.cpp
>
> ###1
> You have several of these (and in other files):
> 242   jint l_ref_count = ref_count();
> 243   ADIM_guarantee(l_ref_count > 0, "must be positive: l_ref_count=%d, ref_count=%d", l_ref_count, ref_count());
> Please use Atomic::load() in ref_count().
> The code depends on ref_count being volatile; otherwise the compiler
> may only do one load.
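A sketch of why the accessor needs a real load, with std::atomic standing in for HotSpot's Atomic::load() (the relaxed ordering is an assumption about what a plain Atomic::load() provides). MonitorSketch and check_positive() are hypothetical:

```cpp
#include <atomic>
#include <cassert>
#include <cstdio>

// Hypothetical stand-in for ObjectMonitor; not the HotSpot class.
class MonitorSketch {
  std::atomic<int> _ref_count{0};
 public:
  // Every call performs exactly one real load, the analogue of using
  // Atomic::load() inside ObjectMonitor::ref_count().
  int ref_count() const { return _ref_count.load(std::memory_order_relaxed); }
  void inc_ref_count() { _ref_count.fetch_add(1); }
  void dec_ref_count() { _ref_count.fetch_sub(1); }
};

// Snapshot once and test the snapshot. Calling ref_count() again in the
// message forces a second, real load, so a mismatch between the two printed
// values reveals a concurrent change. With a plain non-volatile field the
// compiler could fold both reads into one and hide that.
bool check_positive(const MonitorSketch& m) {
  int l_ref_count = m.ref_count();
  if (l_ref_count <= 0) {
    std::fprintf(stderr, "must be positive: l_ref_count=%d, ref_count=%d\n",
                 l_ref_count, m.ref_count());
    return false;
  }
  return true;
}
```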
>
> ###2
> 307   // Prevent deflation. See ObjectSynchronizer::deflate_monitor(),
> ...
> 311   Atomic::add(1, &_contentions);
> If ObjectSynchronizer::deflate_monitor checked the ref count instead of
> _contentions, we could remove _contentions.
> Since all waiters also have a ref count, it looks like we don't need the
> _waiters check either.
> In ObjectSynchronizer::deflate_monitor:
> if (mid->_contentions != 0 || mid->_waiters != 0) {
> Why not just do:
> if (mid->ref_count()) {
> ?
>
> ##################
> src/hotspot/share/runtime/objectMonitor.hpp
>
> ###1
>  252   intptr_t is_busy() const {
>  253     // TODO-FIXME: assert _owner == null implies _recursions = 0
>  254     // We do not include _ref_count in the is_busy() check because
>  255     // _ref_count is for indicating that the ObjectMonitor* is in
>  256     // use which is orthogonal to whether the ObjectMonitor itself
>  257     // is in use for a locking operation.
>
> But in the non-debug code we always check:
> +  if (mid->is_busy() || mid->ref_count() != 0) {
>
> So it seems like you should have a method that includes the ref count.
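A sketch of the suggested helper. MonitorSketch and is_busy_or_referenced() are hypothetical names, and the fields this stand-in is_busy() consults are a simplification of the real method:

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for ObjectMonitor.
struct MonitorSketch {
  std::atomic<void*> owner{nullptr};
  std::atomic<int> contentions{0};
  std::atomic<int> waiters{0};
  std::atomic<int> ref_count{0};

  // In use for a locking operation (owner, contenders or waiters present).
  bool is_busy() const {
    return owner.load() != nullptr || contentions.load() != 0 ||
           waiters.load() != 0;
  }

  // Hypothetical helper answering the combined question the quoted call
  // sites ask, so they stop spelling out "is_busy() || ref_count() != 0".
  bool is_busy_or_referenced() const {
    return is_busy() || ref_count.load() != 0;
  }
};
```

Keeping is_busy() unchanged preserves the orthogonality the comment describes, while the new name documents that both conditions are intended.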
>
> ##################
> src/hotspot/share/runtime/objectMonitor.inline.hpp
>
> Use Atomic::load for ref count.
>
> ##################
> src/hotspot/share/runtime/synchronizer.cpp
>
> ###1
>  139 static volatile int g_om_free_count = 0;    // # on g_free_list
>  140 static volatile int g_om_in_use_count = 0;  // # on g_om_in_use_list
>  141 static volatile int g_om_population = 0;    // # Extant -- in circulation
>  142 static volatile int g_om_wait_count = 0;    // # on g_wait_list
> No padding here, aren't they more contended than the fields in the OM?
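One way to pad such counters is to give each its own cache line. A sketch, assuming a 64-byte line (some platforms use 128, so a real change would use the platform's own constant); PaddedCounter and GlobalMonitorCounts are hypothetical:

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Assumed cache line size for this sketch.
constexpr std::size_t kCacheLineSize = 64;

// Each counter occupies its own cache line, so an atomic update to one
// counter does not invalidate the line holding its neighbours (false sharing).
struct alignas(kCacheLineSize) PaddedCounter {
  std::atomic<int> value{0};
};

// Hypothetical padded replacement for the four g_om_* counters.
struct GlobalMonitorCounts {
  PaddedCounter free_count;      // # on the global free list
  PaddedCounter in_use_count;    // # on the global in-use list
  PaddedCounter population;      // # extant, in circulation
  PaddedCounter wait_count;      // # on the wait list
};

static_assert(sizeof(PaddedCounter) == kCacheLineSize,
              "each counter fills exactly one cache line");
static_assert(sizeof(GlobalMonitorCounts) == 4 * kCacheLineSize,
              "no two counters share a line");
```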
>
> ###2
> 151 static bool is_next_marked(ObjectMonitor* om) {
>
> It is only used in ObjectSynchronizer::om_flush.
> Here you fetch an OM and read its next field; this does not need
> load-acquire (LA) semantics on supported platforms.
> This would only need Atomic::load.
>
> ###3
> 191 static void set_next(ObjectMonitor* om, ObjectMonitor* value) {
>
> You don't need store-release (SR) anywhere; the only places where it would make a difference are:
>  345       OrderAccess::storestore();
>  346       set_next(cur, next);  // Unmark the previous list head.
> and
> 1714     OrderAccess::storestore();
> 1715     set_next(in_use_list, next);
>
> You have a storestore already!
>
> This code reads as:
> OrderAccess::storestore();
> OrderAccess::loadstore();
> OrderAccess::storestore();
> om->_next_om = value
>
> So it should be an Atomic::store.
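The redundancy comes from set_next() carrying its own release semantics while its callers already issue OrderAccess::storestore(). A sketch of the suggested split in portable C++, with std::atomic and std::atomic_thread_fence standing in for HotSpot's Atomic and OrderAccess; Node and publish_then_link() are hypothetical:

```cpp
#include <atomic>
#include <cassert>

struct Node {                       // hypothetical list node
  std::atomic<Node*> next{nullptr};
};

// Proposed shape of set_next(): one atomic store with no ordering of its own,
// the analogue of Atomic::store() rather than OrderAccess::release_store().
inline void set_next(Node* n, Node* value) {
  n->next.store(value, std::memory_order_relaxed);
}

// A caller that must make earlier stores visible before the link store issues
// the barrier itself, exactly once. C++ has no standalone store-store fence,
// so a release fence (slightly stronger: it also orders earlier loads) stands
// in for OrderAccess::storestore().
inline void publish_then_link(Node* cur, Node* next, int* payload, int v) {
  *payload = v;                                         // earlier store
  std::atomic_thread_fence(std::memory_order_release);  // the only barrier
  set_next(cur, next);                                  // the link store
}
```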
>
> ###4
> 198 static bool mark_list_head(ObjectMonitor* volatile * list_p
>
> Since the mark is an embedded spinlock, I think the terminology should
> be changed (the fact that the spinlock is inside the next pointer should be
> abstracted away).
> E.g. mark_next_loop would just be lock.
> The loads of the list heads should use Atomic::load.
> It also seems a bit weird to return next from the locking method.
> The output parameter can just be returned instead, returning NULL if the
> list head is NULL.
> E.g.
>
>  static ObjectMonitor* get_list_head_locked(ObjectMonitor* volatile * list_p) {
>    while (true) {
>      ObjectMonitor* mid = Atomic::load(list_p);
>      if (mid == NULL) {
>        return NULL;  // The list is empty.
>      }
>      if (try_lock(mid)) {
>        if (Atomic::load(list_p) != mid) {
>          // The list head changed so we have to retry.
>          unlock(mid);
>        } else {
>          return mid;
>        }
>      }
>      // Yield ?
>    }
>  }
>
> With collateral changes.
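A self-contained sketch of the proposed locking helpers, assuming bit 0 of the next field is the embedded lock bit (the "mark"). Node, unmarked() and the use of fetch_and() for unlock() are illustrative choices, not the patch's code:

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical node: bit 0 of 'next' doubles as a per-node spinlock.
struct Node {
  std::atomic<std::uintptr_t> next{0};
};

// Strip the lock bit to recover the real next pointer.
inline Node* unmarked(std::uintptr_t v) {
  return reinterpret_cast<Node*>(v & ~std::uintptr_t(1));
}

// Set the lock bit if it is clear; fails if another thread holds it.
inline bool try_lock(Node* n) {
  std::uintptr_t v = n->next.load();
  if (v & 1) return false;
  return n->next.compare_exchange_strong(v, v | 1);
}

inline void unlock(Node* n) { n->next.fetch_and(~std::uintptr_t(1)); }

// Shape suggested in the review: return the locked head, or nullptr if empty.
inline Node* get_list_head_locked(std::atomic<Node*>* list_p) {
  while (true) {
    Node* mid = list_p->load();
    if (mid == nullptr) return nullptr;    // The list is empty.
    if (try_lock(mid)) {
      if (list_p->load() != mid) {
        unlock(mid);                       // The head changed; retry.
      } else {
        return mid;                        // Locked and still the head.
      }
    }
    // A real implementation might yield or pause here.
  }
}
```

Hiding the bit manipulation behind try_lock()/unlock() is the point of the renaming suggestion: callers reason about a lock, not about marked pointers.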
>
> ###5
> 220 static ObjectMonitor* unmarked_next(ObjectMonitor* om)
> Atomic::store is what is needed.
>
> ###6
> 333 static void prepend_to_common(
>
>  345       OrderAccess::storestore();
>  346       set_next(cur, next);  // Unmark the previous list head.
> Double storestore. (fixed by changing set_next to Atomic::store)
>
> ###7
>  375 static ObjectMonitor* take_from_start_of_common(ObjectMonitor* volatile * list_p,
>
> Triple storestore here.
>
>  386   Atomic::dec(count_p);
>  387   // mark_list_head() used cmpxchg() above, switching list head can be lazier:
>  388   OrderAccess::storestore();
>  389   // Unmark take, but leave the next value for any lagging list
>  390   // walkers. It will get cleaned up when take is prepended to
>  391   // the in-use list:
>  392   set_next(take, next);
>  393   return take;
>
> Reads:
> count_p--
> OrderAccess::loadstore();
> OrderAccess::storestore();
> OrderAccess::storestore();
> OrderAccess::loadstore();
> OrderAccess::storestore();
> take->_next_om = next;
>
> Fixed by changing set_next to Atomic::store and removing the 
> OrderAccess::storestore();
>
> ###8
> ObjectSynchronizer::om_release(
>
> 1591       if (m == mid) {
> 1592         // We found 'm' on the per-thread in-use list so try to extract it.
> 1593         if (cur_mid_in_use == NULL) {
> 1594           // mid is the list head and it is marked. Switch the list head
> 1595           // to next which unmarks the list head, but leaves mid marked:
> 1596           self->om_in_use_list = next;
> 1597           // mark_list_head() used cmpxchg() above, switching list head can be lazier:
> 1598           OrderAccess::storestore();
> 1599         } else {
> 1600           // mid and cur_mid_in_use are marked. Switch cur_mid_in_use's
> 1601           // next field to next which unmarks cur_mid_in_use, but leaves
> 1602           // mid marked:
> 1603           OrderAccess::release_store(&cur_mid_in_use->_next_om, next);
> 1604         }
> 1605         extracted = true;
> 1606         Atomic::dec(&self->om_in_use_count);
> 1607         // Unmark mid, but leave the next value for any lagging list
> 1608         // walkers. It will get cleaned up when mid is prepended to
> 1609         // the thread's free list:
> 1610         set_next(mid, next);
> 1611         break;
> 1612       }
>
> This does not look correct. Before taking this branch we have done a 
> cmpxchg in mark_list_head or mark_next_loop.
> This is how it reads:
> OrderAccess::storestore(); // from previous cmpxchg
> OrderAccess::loadstore(); // from previous cmpxchg
> 1591       if (m == mid) {
> 1593         if (cur_mid_in_use == NULL) {
> 1596           self->om_in_use_list = next;
> 1598           OrderAccess::storestore();
> 1599         } else {
>                OrderAccess::storestore();
>                OrderAccess::loadstore();
> 1603           cur_mid_in_use->_next_om = next;
> 1604         }
> 1605         extracted = true;
>              OrderAccess::storestore();
>              OrderAccess::fence(); // storestore|storeload|loadstore|loadload
>          self->om_in_use_count--; // Atomic::dec
>              OrderAccess::storestore();
>              OrderAccess::loadstore();
>              OrderAccess::storestore();
>              OrderAccess::loadstore();
>          mid->_next_om = next; // Atomic::store
> 1611         break;
> 1612       }
>
> extracted is a local variable, so you do not need any OrderAccess before
> it is set.
> Fixed by changing set_next to Atomic::store, removing the
> OrderAccess::storestore() and changing OrderAccess::release_store to
> Atomic::store().
>
> ###9
> 1653 void ObjectSynchronizer::om_flush(Thread* self) {
>
> 1714     OrderAccess::storestore();
> 1715     set_next(in_use_list, next);
> Fixed by changing set_next to Atomic::store.
>
> ###10
> 1737     self->om_free_list = NULL;
> 1738     OrderAccess::storestore();  // Lazier memory is okay for list walkers.
>
> prepend_list_to_g_free_list/prepend_list_to_g_om_in_use_list do a
> cmpxchg as the first thing, so there is no need for this storestore.
>
> ###11
> 1797 void ObjectSynchronizer::inflate(ObjectMonitorHandle* omh_p, Thread* self,
>
> 1938       // Once ObjectMonitor is configured and the object is associated
> 1939       // with the ObjectMonitor, it is safe to allow async deflation:
> 1940       assert(m->is_new(), "freshly allocated monitor must be new");
> 1941       m->set_allocation_state(ObjectMonitor::Old);
>
> So we use ref count, contention, waiters, owner and allocation state to
> keep the OM alive in different scenarios.
> There is no way for me to keep track of all of that. I don't see why you
> would need more than owner and ref count.
> If you allocate the OM with a ref count of 1, you can remove
> _allocation_state and just decrease the ref count here instead.
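A sketch of the suggestion, in which the initial reference held by the inflating thread replaces the New/Old allocation state. MonitorSketch, may_deflate() and finish_inflation() are hypothetical, and the real inflate() path holds other references (e.g. the ObjectMonitorHandle) that this ignores:

```cpp
#include <atomic>
#include <cassert>

// Hypothetical monitor: a new monitor starts with ref_count == 1, the
// reference held by the inflating thread.
struct MonitorSketch {
  std::atomic<int> ref_count{1};
  std::atomic<void*> object{nullptr};
};

// Deflation is only allowed once nobody holds a reference.
inline bool may_deflate(const MonitorSketch& m) {
  return m.ref_count.load() == 0;
}

// Inflation path: configure the monitor and associate it with the object
// while the initial reference keeps async deflation away, then drop that
// reference where set_allocation_state(ObjectMonitor::Old) happens today.
inline void finish_inflation(MonitorSketch& m, void* obj) {
  m.object.store(obj);
  assert(!may_deflate(m));   // still protected while being configured
  m.ref_count.fetch_sub(1);  // now it is safe to allow async deflation
}
```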
>
> ###12
> 2079 bool ObjectSynchronizer::deflate_monitor
>
> 2112     if (AsyncDeflateIdleMonitors) {
> 2113       // clear() expects the owner field to be NULL and we won't race
> 2114       // with the simple C2 ObjectMonitor
>
> The macro assembler code is not just executed by C2, so this comment
> is a bit misleading. (There are some more of these as well.)
>
> ###13
> 2306 int ObjectSynchronizer::deflate_monitor_list(
>
> Same issue as ObjectSynchronizer::om_release.
> Fixed by changing set_next to Atomic::store, removing the 
> OrderAccess::storestore() and changing OrderAccess::release_store to 
> Atomic::store();
>
> ###14
> 2474       if (SafepointSynchronize::is_synchronizing() &&
>
> This is the wrong method to call; it should be
> SafepointMechanism::should_block(Thread* thread);
>
> ###15
> 2578 void ObjectSynchronizer::deflate_idle_monitors_using_JT() {
>
> 2616     g_wait_list = NULL;
> 2617     OrderAccess::storestore();  // Lazier memory sync is okay for list walkers.
>
> I don't see that g_wait_list is ever read simultaneously.
> Either it is accessed by the ServiceThread outside a safepoint or by the
> VMThread inside a safepoint, right?
>
> It looks like g_wait_list can just be a local in:
> void ObjectSynchronizer::deflate_idle_monitors_using_JT()
>
> (disregarding the debug code that might read it in a safepoint)
>
> ###16
> 2722         assert(SafepointSynchronize::is_synchronizing(), "sanity check");
>
> This is the wrong method to call; it should be
> SafepointMechanism::should_block(Thread* thread);
>
> ##################
> src/hotspot/share/runtime/vframe.cpp
>
> We are at a safepoint, on the current thread, or in a handshake, so the
> current pending and waiting monitors are already stable.
>
> ##################
> src/hotspot/share/services/threadService.cpp
>
> These changes are only needed for the -HandshakeAfterDeflateIdleMonitors path.
>
> ##################
> test/jdk/java/rmi/server/UnicastRemoteObject/unexportObject/UnexportLeak.java 
>
>
> Note: if the OM had a weak reference to the object instead, this would not be needed.
>
> Thanks, Robbin
>
>
> On 11/4/19 10:03 PM, Daniel D. Daugherty wrote:
>> Greetings,
>>
>> I have made changes to the Async Monitor Deflation code in response to
>> the CR7/v2.07/10-for-jdk14 code review cycle. Thanks to David H., Robbin
>> and Erik O. for their comments!
>>
>> JDK14 Rampdown phase one is coming on Dec. 12, 2019 and the Async Monitor
>> Deflation project needs to push before Nov. 12, 2019 in order to allow
>> for sufficient bake time for such a big change. Nov. 12 is _next_ Tuesday
>> so we have 8 days from today to finish this code review cycle and push
>> this code for JDK14.
>>
>> Carsten and Roman! Time for you guys to chime in again on the code reviews.
>>
>> I have attached the change list from CR7 to CR8 instead of putting it in
>> the body of this email. I've also added a link to the CR7-to-CR8-changes
>> file to the webrevs so it should be easy to find.
>>
>> Main bug URL:
>>
>>      JDK-8153224 Monitor deflation prolong safepoints
>>      https://bugs.openjdk.java.net/browse/JDK-8153224
>>
>> The project is currently baselined on jdk-14+21.
>>
>> Here's the full webrev URL for those folks that want to see all of the
>> current Async Monitor Deflation code in one go (v2.08 full):
>>
>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/11-for-jdk14.v2.08.full 
>>
>>
>> Some folks might want to see just what has changed since the last review
>> cycle so here's a webrev for that (v2.08 inc):
>>
>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/11-for-jdk14.v2.08.inc/ 
>>
>>
>> The OpenJDK wiki did not need any changes for this round:
>>
>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation
>>
>> The jdk-14+21 based v2.08 version of the patch has been thru Mach5 tier[1-8]
>> testing on Oracle's usual set of platforms. It has also been through my usual
>> set of stress testing on Linux-X64, macOSX and Solaris-X64 with the addition
>> of Robbin's "MoCrazy 1024" test running in parallel with the other tests in
>> my lab. Some testing is still running, but so far there are no new regressions.
>>
>> I have not yet done a SPECjbb2015 round on the CR8/v2.08/11-for-jdk14 bits.
>>
>> Thanks, in advance, for any questions, comments or suggestions.
>>
>> Dan
>>
>>
>> On 10/17/19 5:50 PM, Daniel D. Daugherty wrote:
>>> Greetings,
>>>
>>> The Async Monitor Deflation project is reaching the end game. I have no
>>> changes planned for the project at this time so all that is left is code
>>> review and any changes that result from those reviews.
>>>
>>> Carsten and Roman! Time for you guys to chime in again on the code reviews.
>>>
>>> I have attached the list of fixes from CR6 to CR7 instead of putting it
>>> in the main body of this email.
>>>
>>> Main bug URL:
>>>
>>>     JDK-8153224 Monitor deflation prolong safepoints
>>>     https://bugs.openjdk.java.net/browse/JDK-8153224
>>>
>>> The project is currently baselined on jdk-14+19.
>>>
>>> Here's the full webrev URL for those folks that want to see all of the
>>> current Async Monitor Deflation code in one go (v2.07 full):
>>>
>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/10-for-jdk14.v2.07.full 
>>>
>>>
>>> Some folks might want to see just what has changed since the last review
>>> cycle so here's a webrev for that (v2.07 inc):
>>>
>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/10-for-jdk14.v2.07.inc/ 
>>>
>>>
>>> The OpenJDK wiki has been updated to match the CR7/v2.07/10-for-jdk14 changes:
>>>
>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation
>>>
>>> The jdk-14+18 based v2.07 version of the patch has been thru Mach5 tier[1-8]
>>> testing on Oracle's usual set of platforms. It has also been through my usual
>>> set of stress testing on Linux-X64, macOSX and Solaris-X64 with the addition
>>> of Robbin's "MoCrazy 1024" test running in parallel with the other tests in
>>> my lab.
>>>
>>> The jdk-14+19 based v2.07 version of the patch has been thru Mach5 tier[1-3]
>>> testing on Oracle's usual set of platforms. Mach5 tier[4-8] are in process.
>>>
>>> I did another round of SPECjbb2015 testing in Oracle's Aurora Performance lab
>>> using their tuned SPECjbb2015 Linux-X64 G1 configs:
>>>
>>>     - "base" is jdk-14+18
>>>     - "v2.07" is the latest version and includes C2 inc_om_ref_count() support
>>>       on LP64 X64 and the new HandshakeAfterDeflateIdleMonitors option
>>>     - "off" is with -XX:-AsyncDeflateIdleMonitors specified
>>>     - "handshake" is with -XX:+HandshakeAfterDeflateIdleMonitors specified
>>>
>>>          hbIR           hbIR
>>>     (max attempted)  (settled)  max-jOPS  critical-jOPS runtime
>>>     ---------------  ---------  --------  ------------- -------
>>>            34282.00   30635.90  28831.30       20969.20 3841.30 base
>>>            34282.00   30973.00  29345.80       21025.20 3964.10 v2.07
>>>            34282.00   31105.60  29174.30       21074.00 3931.30 v2.07_handshake
>>>            34282.00   30789.70  27151.60       19839.10 3850.20 v2.07_off
>>>
>>>     - The Aurora Perf comparison tool reports:
>>>
>>>         Comparison              max-jOPS              critical-jOPS
>>>         ----------------------  --------------------  --------------------
>>>         base vs 2.07            +1.78% (s, p=0.000)   +0.27% (ns, p=0.790)
>>>         base vs 2.07_handshake  +1.19% (s, p=0.007)   +0.58% (ns, p=0.536)
>>>         base vs 2.07_off        -5.83% (ns, p=0.394)  -5.39% (ns, p=0.347)
>>>
>>>         (s) - significant  (ns) - not-significant
>>>
>>>     - For historical comparison, the Aurora Perf comparison tool
>>>         reported for v2.06 with a baseline of jdk-13+31:
>>>
>>>         Comparison              max-jOPS              critical-jOPS
>>>         ----------------------  --------------------  --------------------
>>>         base vs 2.06            -0.32% (ns, p=0.345)  +0.71% (ns, p=0.646)
>>>         base vs 2.06_off        +0.49% (ns, p=0.292)  -1.21% (ns, p=0.481)
>>>
>>>         (s) - significant  (ns) - not-significant
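As a sanity check on the CR7 numbers, the max-jOPS deltas reported for v2.07, v2.07_handshake and v2.07_off can be reproduced from the max-jOPS column of the results table (the significance markers and p-values come from the comparison tool and cannot be derived this way). A small helper, offered only as worked arithmetic:

```cpp
#include <cassert>
#include <cmath>

// Relative change of 'value' against 'base', in percent.
double pct_change(double base, double value) {
  return (value - base) / base * 100.0;
}

// Round to two decimal places, matching how the deltas are reported.
double round2(double x) { return std::round(x * 100.0) / 100.0; }
```

For example, base vs v2.07 is (29345.80 - 28831.30) / 28831.30 * 100, which rounds to +1.78%.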
>>>
>>> Thanks, in advance, for any questions, comments or suggestions.
>>>
>>> Dan
>>>
>>>
>>> On 8/28/19 5:02 PM, Daniel D. Daugherty wrote:
>>>> Greetings,
>>>>
>>>> The Async Monitor Deflation project has rebased to JDK14 so it's time
>>>> for our first code review in that new context!!
>>>>
>>>> I've been focused on changing the monitor list management code to be
>>>> lock-free in order to make SPECjbb2015 happier. Of course with a change
>>>> like that, it takes a while to chase down all the new and wonderful
>>>> races. At this point, I have the code back to the same stability that
>>>> I had with CR5/v2.05/8-for-jdk13.
>>>>
>>>> To lay the groundwork for this round of review, I pushed the following
>>>> two fixes to jdk/jdk earlier today:
>>>>
>>>>     JDK-8230184 rename, whitespace, indent and comments changes in preparation
>>>>                 for lock free Monitor lists
>>>>     https://bugs.openjdk.java.net/browse/JDK-8230184
>>>>
>>>>     JDK-8230317 serviceability/sa/ClhsdbPrintStatics.java fails after 8230184
>>>>     https://bugs.openjdk.java.net/browse/JDK-8230317
>>>>
>>>> I have attached the list of fixes from CR5 to CR6 instead of putting them
>>>> in the main body of this email.
>>>>
>>>> Main bug URL:
>>>>
>>>>     JDK-8153224 Monitor deflation prolong safepoints
>>>>     https://bugs.openjdk.java.net/browse/JDK-8153224
>>>>
>>>> The project is currently baselined on jdk-14+11 plus the fixes for
>>>> JDK-8230184 and JDK-8230317.
>>>>
>>>> Here's the full webrev URL for those folks that want to see all of the
>>>> current Async Monitor Deflation code in one go (v2.06 full):
>>>>
>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/9-for-jdk14.v2.06.full/ 
>>>>
>>>>
>>>>
>>>> The primary focus of this review cycle is on the lock-free Monitor List
>>>> management changes so here's a webrev for just that patch (v2.06c):
>>>>
>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/9-for-jdk14.v2.06c.inc/ 
>>>>
>>>>
>>>> The secondary focus of this review cycle is on the bug fixes that have
>>>> been made since CR5/v2.05/8-for-jdk13 so here's a webrev for just that
>>>> patch (v2.06b):
>>>>
>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/9-for-jdk14.v2.06b.inc/ 
>>>>
>>>>
>>>> The third and final bucket for this review cycle is the rename, whitespace,
>>>> indent and comments changes made in preparation for lock free Monitor list
>>>> management. Almost all of that was extracted into JDK-8230184 for the
>>>> baseline so this bucket now has just a few comment changes relative to
>>>> CR5/v2.05/8-for-jdk13. Here's a webrev for the remainder (v2.06a):
>>>>
>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/9-for-jdk14.v2.06a.inc/ 
>>>>
>>>>
>>>>
>>>> Some folks might want to see just what has changed since the last review
>>>> cycle so here's a webrev for that (v2.06 inc):
>>>>
>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/9-for-jdk14.v2.06.inc/ 
>>>>
>>>>
>>>>
>>>> Last, but not least, some folks might want to see the code before the
>>>> addition of lock-free Monitor List management so here's a webrev for
>>>> that (v2.00 -> v2.05):
>>>>
>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/9-for-jdk14.v2.05.inc/ 
>>>>
>>>>
>>>> The OpenJDK wiki will need minor updates to match the CR6 changes:
>>>>
>>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation
>>>>
>>>> but that should only be changes to describe per-thread list async monitor
>>>> deflation being done by the ServiceThread.
>>>>
>>>> (I did update the OpenJDK wiki for the CR5 changes back on 2019.08.14)
>>>>
>>>> This version of the patch has been thru Mach5 tier[1-8] testing on
>>>> Oracle's usual set of platforms. It has also been through my usual set
>>>> of stress testing on Linux-X64, macOSX and Solaris-X64.
>>>>
>>>> I did a bunch of SPECjbb2015 testing in Oracle's Aurora Performance lab
>>>> using their tuned SPECjbb2015 Linux-X64 G1 configs. This was using
>>>> this patch baselined on jdk-13+31 (for stability):
>>>>
>>>>           hbIR           hbIR
>>>>      (max attempted)  (settled)  max-jOPS  critical-jOPS runtime
>>>>      ---------------  ---------  --------  ------------- -------
>>>>             34282.00   28837.20  27905.20       19817.40 3658.10 base
>>>>             34965.70   29798.80  27814.90       19959.00 3514.60 v2.06d
>>>>             34282.00   29100.70  28042.50       19577.00 3701.90 v2.06d_off
>>>>             34282.00   29218.50  27562.80       19397.30 3657.60 v2.06d_ocache
>>>>             34965.70   29838.30  26512.40       19170.60 3569.90 v2.05
>>>>             34282.00   28926.10  27734.00       19835.10 3588.40 v2.05_off
>>>>
>>>> The "off" configs are with -XX:-AsyncDeflateIdleMonitors specified and
>>>> the "ocache" config is with 128 byte cache line sizes instead of 64 byte
>>>> cache line sizes. "v2.06d" is the last set of changes that I made before
>>>> those changes were distributed into the "v2.06a", "v2.06b" and "v2.06c"
>>>> buckets for this review cycle.
>>>>
>>>>
>>>> Thanks, in advance, for any questions, comments or suggestions.
>>>>
>>>> Dan
>>>>
>>>>
>>>> On 7/11/19 3:49 PM, Daniel D. Daugherty wrote:
>>>>> Greetings,
>>>>>
>>>>> I've been focused on chasing down and fixing the test failures
>>>>> that only pop up rarely. So this round is primarily fixes for races
>>>>> with a few additional fixes that came from Karen's review of CR4.
>>>>> Thanks Karen!
>>>>>
>>>>> I have attached the list of fixes from CR4 to CR5 instead of putting them
>>>>> in the main body of this email.
>>>>>
>>>>> Main bug URL:
>>>>>
>>>>>     JDK-8153224 Monitor deflation prolong safepoints
>>>>>     https://bugs.openjdk.java.net/browse/JDK-8153224
>>>>>
>>>>> The project is currently baselined on jdk-13+29. This will likely be
>>>>> the last JDK13 baseline for this project and I'll roll to the JDK14
>>>>> (jdk/jdk) repo soon...
>>>>>
>>>>> Here's the full webrev URL:
>>>>>
>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/8-for-jdk13.full/
>>>>>
>>>>> Here's the incremental webrev URL:
>>>>>
>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/8-for-jdk13.inc/
>>>>>
>>>>> I have not yet checked the OpenJDK wiki to see if it needs any updates
>>>>> to match the CR5 changes:
>>>>>
>>>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation
>>>>>
>>>>> (I did update the OpenJDK wiki for the CR4 changes back on 2019.06.26)
>>>>>
>>>>> This version of the patch has been thru Mach5 tier[1-3] testing on
>>>>> Oracle's usual set of platforms. Mach5 tier[4-6] is running now and
>>>>> Mach5 tier[78] will follow. I'll kick off the usual stress testing
>>>>> on Linux-X64, macOSX and Solaris-X64 as those machines become available.
>>>>> Since I haven't made any performance changes in this round, I'll only
>>>>> be running SPECjbb2015 to gather the latest monitorinflation logs.
>>>>>
>>>>> Next up:
>>>>>
>>>>> - We're still seeing 4-5% lower performance with SPECjbb2015 on
>>>>>   Linux-X64 and we've determined that some of that comes from
>>>>>   contention on the gListLock. So I'm going to investigate removing
>>>>>   the gListLock. Yes, another lock free set of changes is coming!
>>>>> - Of course, going lock free often causes new races and new failures
>>>>>   so that's a good reason for making those changes isolated in their
>>>>>   own round (and not holding up CR5/v2.05/8-for-jdk13 anymore).
>>>>> - I finally have a potential fix for the Win* failure with
>>>>>     gc/g1/humongousObjects/TestHumongousClassLoader.java
>>>>>   but I haven't run it through Mach5 yet so it'll be in the next round.
>>>>> - Some RTM tests were recently re-enabled in Mach5 and I'm seeing some
>>>>>   monitor related failures there. I suspect that I need to go take a
>>>>>   look at the C2 RTM macro assembler code and look for things that might
>>>>>   conflict with Async Monitor Deflation. If you're interested in that kind
>>>>>   of issue, then see the macroAssembler_x86.cpp sanity check that I
>>>>>   added in this round!
>>>>>
>>>>> Thanks, in advance, for any questions, comments or suggestions.
>>>>>
>>>>> Dan
>>>>>
>>>>>
>>>>> On 5/26/19 8:30 PM, Daniel D. Daugherty wrote:
>>>>>> Greetings,
>>>>>>
>>>>>> I have a fix for an issue that came up during performance testing.
>>>>>> Many thanks to Robbin for diagnosing the issue in his SPECjbb2015
>>>>>> experiments.
>>>>>>
>>>>>> Here's the list of changes from CR3 to CR4. The list is a bit
>>>>>> verbose due to the complexity of the issue, but the changes
>>>>>> themselves are not that big.
>>>>>>
>>>>>> Functional:
>>>>>>   - Change SafepointSynchronize::is_cleanup_needed() from calling
>>>>>>     ObjectSynchronizer::is_cleanup_needed() to calling
>>>>>>     ObjectSynchronizer::is_safepoint_deflation_needed():
>>>>>>     - is_safepoint_deflation_needed() returns the result of
>>>>>>       monitors_used_above_threshold() for safepoint based
>>>>>>       monitor deflation (!AsyncDeflateIdleMonitors).
>>>>>>     - For AsyncDeflateIdleMonitors, it only returns true if
>>>>>>       there is a special deflation request, e.g., System.gc()
>>>>>>       - This solves a bug where there are a bunch of Cleanup
>>>>>>         safepoints that simply request async deflation which
>>>>>>         keeps the async JavaThreads from making progress on
>>>>>>         their async deflation work.
>>>>>>   - Add AsyncDeflationInterval diagnostic option. Description:
>>>>>>       Async deflate idle monitors every so many milliseconds when
>>>>>>       MonitorUsedDeflationThreshold is exceeded (0 is off).
>>>>>>   - Replace ObjectSynchronizer::gOmShouldDeflateIdleMonitors() with
>>>>>>     ObjectSynchronizer::is_async_deflation_needed():
>>>>>>     - is_async_deflation_needed() returns true when
>>>>>>       is_async_cleanup_requested() is true or when
>>>>>>       monitors_used_above_threshold() is true (but no more often than
>>>>>>       AsyncDeflationInterval).
>>>>>>     - If AsyncDeflateIdleMonitors is enabled, Service_lock->wait() now
>>>>>>       waits for at most GuaranteedSafepointInterval millis:
>>>>>>       - This allows is_async_deflation_needed() to be checked at
>>>>>>         the same interval as GuaranteedSafepointInterval.
>>>>>>         (default is 1000 millis/1 second)
>>>>>>       - Once is_async_deflation_needed() has returned true, it
>>>>>>         generally cannot return true for AsyncDeflationInterval.
>>>>>>         This is to prevent async deflation from swamping the
>>>>>>         ServiceThread.
>>>>>>   - The ServiceThread still handles async deflation of the global
>>>>>>     in-use list and now it also marks JavaThreads for async deflation
>>>>>>     of their in-use lists.
>>>>>>     - The ServiceThread will check for async deflation work every
>>>>>>       GuaranteedSafepointInterval.
>>>>>>     - A safepoint can still cause the ServiceThread to check for
>>>>>>       async deflation work via is_async_deflation_requested.
>>>>>>   - Refactor code from ObjectSynchronizer::is_cleanup_needed() into
>>>>>>     monitors_used_above_threshold() and remove is_cleanup_needed().
>>>>>>   - In addition to System.gc(), the VM_Exit VM op and the final
>>>>>>     VMThread safepoint now set the is_special_deflation_requested
>>>>>>     flag to reduce the in-use monitor population that is reported by
>>>>>>     ObjectSynchronizer::log_in_use_monitor_details() at VM exit.
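The invocation policy in the Functional list can be modeled as a small state machine. This is a hypothetical sketch, not HotSpot's implementation: AsyncDeflationPolicy, the caller-supplied millisecond clock, and the choice that an explicit request also restarts the interval are all assumptions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the CR4 invocation policy; not HotSpot code.
class AsyncDeflationPolicy {
 public:
  explicit AsyncDeflationPolicy(std::int64_t interval_ms)
      : _interval_ms(interval_ms) {}

  // Stands in for a System.gc()-style special deflation request.
  void request_cleanup() { _cleanup_requested = true; }

  // now_ms is supplied by the caller, e.g. the ServiceThread after each
  // bounded wait of at most GuaranteedSafepointInterval.
  bool is_async_deflation_needed(bool above_threshold, std::int64_t now_ms) {
    if (_cleanup_requested) {                // explicit requests always win
      _cleanup_requested = false;
      _last_ms = now_ms;                     // assumption: restarts the interval
      return true;
    }
    bool interval_elapsed = _last_ms < 0 || now_ms - _last_ms >= _interval_ms;
    if (_interval_ms > 0 && above_threshold && interval_elapsed) {
      _last_ms = now_ms;                     // no more often than the interval
      return true;
    }
    return false;
  }

 private:
  std::int64_t _interval_ms;                 // AsyncDeflationInterval; 0 is off
  std::int64_t _last_ms = -1;                // -1 means never returned true
  bool _cleanup_requested = false;
};
```

The rate limit is what keeps threshold-based requests from swamping the ServiceThread, while explicit requests still get through immediately.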
>>>>>>
>>>>>> Test update:
>>>>>>   - test/hotspot/gtest/oops/test_markOop.cpp is updated to work with
>>>>>>     AsyncDeflateIdleMonitors.
>>>>>>
>>>>>> Collateral:
>>>>>>   - Add/clarify/update some logging messages.
>>>>>>
>>>>>> Cleanup:
>>>>>>   - Updated comments based on Karen's code review.
>>>>>>   - Change 'special cleanup' -> 'special deflation' and
>>>>>>     'async cleanup' -> 'async deflation'.
>>>>>>     - comment and function name changes
>>>>>>   - Clarify MonitorUsedDeflationThreshold description.
>>>>>>
>>>>>>
>>>>>> Main bug URL:
>>>>>>
>>>>>>     JDK-8153224 Monitor deflation prolong safepoints
>>>>>>     https://bugs.openjdk.java.net/browse/JDK-8153224
>>>>>>
>>>>>> The project is currently baselined on jdk-13+22.
>>>>>>
>>>>>> Here's the full webrev URL:
>>>>>>
>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/7-for-jdk13.full/
>>>>>>
>>>>>> Here's the incremental webrev URL:
>>>>>>
>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/7-for-jdk13.inc/
>>>>>>
>>>>>> I have not updated the OpenJDK wiki to reflect the CR4 changes:
>>>>>>
>>>>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation 
>>>>>>
>>>>>>
>>>>>> The wiki doesn't say a whole lot about the async deflation invocation
>>>>>> mechanism, so I have to figure out how to add that content.
>>>>>>
>>>>>> This version of the patch has been thru Mach5 tier[1-8] testing on
>>>>>> Oracle's usual set of platforms. My Solaris-X64 stress kit run is
>>>>>> running now. Kitchensink8H is running on product, fastdebug, and
>>>>>> slowdebug bits on Linux-X64, MacOSX and Solaris-X64. I still have to
>>>>>> run my stress kit on Linux-X64. I still have to run the SPECjbb2015
>>>>>> baseline and CR4 runs on Linux-X64, MacOSX and Solaris-X64.
>>>>>>
>>>>>> Thanks, in advance, for any questions, comments or suggestions.
>>>>>>
>>>>>> Dan
>>>>>>
>>>>>> On 5/6/19 11:52 AM, Daniel D. Daugherty wrote:
>>>>>>> Greetings,
>>>>>>>
>>>>>>> I had some discussions with Karen about a race that was in the
>>>>>>> ObjectMonitor::enter() code in CR2/v2.02/5-for-jdk13. This race was
>>>>>>> theoretical and I had no test failures due to it. The fix is pretty
>>>>>>> simple: remove the special case code for async deflation in the
>>>>>>> ObjectMonitor::enter() function and rely solely on the ref_count
>>>>>>> for ObjectMonitor::enter() protection.
>>>>>>>
>>>>>>> During those discussions Karen also floated the idea of using the
>>>>>>> ref_count field instead of the contentions field for the Async
>>>>>>> Monitor Deflation protocol. I decided to go ahead and code up that
>>>>>>> change and I have run it through the usual stress and Mach5 testing
>>>>>>> with no issues. It's also known as v2.03 (for those with the
>>>>>>> patches) and as webrev/6-for-jdk13 (for those with webrev URLs).
>>>>>>> Sorry for all the names...
>>>>>>>
>>>>>>> Main bug URL:
>>>>>>>
>>>>>>>     JDK-8153224 Monitor deflation prolong safepoints
>>>>>>>     https://bugs.openjdk.java.net/browse/JDK-8153224
>>>>>>>
>>>>>>> The project is currently baselined on jdk-13+18.
>>>>>>>
>>>>>>> Here's the full webrev URL:
>>>>>>>
>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/6-for-jdk13.full/
>>>>>>>
>>>>>>> Here's the incremental webrev URL:
>>>>>>>
>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/6-for-jdk13.inc/
>>>>>>>
>>>>>>> I have also updated the OpenJDK wiki to reflect the CR3 changes:
>>>>>>>
>>>>>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation 
>>>>>>>
>>>>>>>
>>>>>>> This version of the patch has been thru Mach5 tier[1-8] testing on
>>>>>>> Oracle's usual set of platforms. My Solaris-X64 stress kit run had
>>>>>>> no issues. Kitchensink8H on product, fastdebug, and slowdebug bits
>>>>>>> had no failures on Linux-X64; MacOSX fastdebug and slowdebug and
>>>>>>> Solaris-X64 release had the usual "Too large time diff" complaints.
>>>>>>> 12-hour Inflate2 runs on product, fastdebug and slowdebug bits on
>>>>>>> Linux-X64, MacOSX and Solaris-X64 had no failures. My Linux-X64
>>>>>>> stress kit is running right now.
>>>>>>>
>>>>>>> I've done the SPECjbb2015 baseline and CR3 runs. I need to gather
>>>>>>> the results and analyze them.
>>>>>>>
>>>>>>> Thanks, in advance, for any questions, comments or suggestions.
>>>>>>>
>>>>>>> Dan
>>>>>>>
>>>>>>>
>>>>>>> On 4/25/19 12:38 PM, Daniel D. Daugherty wrote:
>>>>>>>> Greetings,
>>>>>>>>
>>>>>>>> I have a small but important bug fix for the Async Monitor Deflation
>>>>>>>> project ready to go. It's also known as v2.02 (for those with the
>>>>>>>> patches) and as webrev/5-for-jdk13 (for those with webrev URLs).
>>>>>>>> Sorry for all the names...
>>>>>>>>
>>>>>>>> JDK-8222295 was pushed to jdk/jdk two days ago, so that baseline patch
>>>>>>>> is out of our hair.
>>>>>>>>
>>>>>>>> Main bug URL:
>>>>>>>>
>>>>>>>>     JDK-8153224 Monitor deflation prolong safepoints
>>>>>>>>     https://bugs.openjdk.java.net/browse/JDK-8153224
>>>>>>>>
>>>>>>>> The project is currently baselined on jdk-13+17.
>>>>>>>>
>>>>>>>> Here's the full webrev URL:
>>>>>>>>
>>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/5-for-jdk13.full/ 
>>>>>>>>
>>>>>>>>
>>>>>>>> Here's the incremental webrev URL (JDK-8153224):
>>>>>>>>
>>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/5-for-jdk13.inc/
>>>>>>>>
>>>>>>>> I still have to update the OpenJDK wiki to reflect the CR2 changes:
>>>>>>>>
>>>>>>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation 
>>>>>>>>
>>>>>>>>
>>>>>>>> This version of the patch has been thru Mach5 tier[1-6] testing on
>>>>>>>> Oracle's usual set of platforms. Mach5 tier[7-8] is running now.
>>>>>>>> My stress kit is running on Solaris-X64 now. Kitchensink8H is running
>>>>>>>> now on product, fastdebug, and slowdebug bits on Linux-X64, MacOSX
>>>>>>>> and Solaris-X64. 12-hour Inflate2 runs are running now on product,
>>>>>>>> fastdebug and slowdebug bits on Linux-X64, MacOSX and Solaris-X64.
>>>>>>>> I'll start my stress kit on Linux-X64 sometime on Sunday (after
>>>>>>>> my jdk-13+18 stress run is done).
>>>>>>>>
>>>>>>>> I'll do SPECjbb2015 baseline and CR2 runs after all the stress
>>>>>>>> testing is done.
>>>>>>>>
>>>>>>>> Thanks, in advance, for any questions, comments or suggestions.
>>>>>>>>
>>>>>>>> Dan
>>>>>>>>
>>>>>>>>
>>>>>>>> On 4/19/19 11:58 AM, Daniel D. Daugherty wrote:
>>>>>>>>> Greetings,
>>>>>>>>>
>>>>>>>>> I finally have CR1 for the Async Monitor Deflation project ready to
>>>>>>>>> go. It's also known as v2.01 (for those with the patches) and as
>>>>>>>>> webrev/4-for-jdk13 (for those with webrev URLs). Sorry for all the
>>>>>>>>> names...
>>>>>>>>>
>>>>>>>>> Main bug URL:
>>>>>>>>>
>>>>>>>>>     JDK-8153224 Monitor deflation prolong safepoints
>>>>>>>>>     https://bugs.openjdk.java.net/browse/JDK-8153224
>>>>>>>>>
>>>>>>>>> Baseline bug fixes URL:
>>>>>>>>>
>>>>>>>>>     JDK-8222295 more baseline cleanups from Async Monitor Deflation project
>>>>>>>>>     https://bugs.openjdk.java.net/browse/JDK-8222295
>>>>>>>>>
>>>>>>>>> The project is currently baselined on jdk-13+15.
>>>>>>>>>
>>>>>>>>> Here's the webrev for the latest baseline changes (JDK-8222295):
>>>>>>>>>
>>>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/4-for-jdk13.8222295 
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Here's the full webrev URL (JDK-8153224 only):
>>>>>>>>>
>>>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/4-for-jdk13.full/ 
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Here's the incremental webrev URL (JDK-8153224):
>>>>>>>>>
>>>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/4-for-jdk13.inc/ 
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> So I'm looking for reviews for both JDK-8222295 and the latest version
>>>>>>>>> of JDK-8153224...
>>>>>>>>>
>>>>>>>>> I still have to update the OpenJDK wiki to reflect the CR changes:
>>>>>>>>>
>>>>>>>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation 
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> This version of the patch has been thru Mach5 tier[1-3] testing on
>>>>>>>>> Oracle's usual set of platforms. Mach5 tier[4-6] is running now and
>>>>>>>>> Mach5 tier[7-8] will be run later today. My stress kit on Solaris-X64
>>>>>>>>> is running now. Linux-X64 stress testing will start on Sunday. I'm
>>>>>>>>> planning to do Kitchensink runs, SPECjbb2015 runs and my monitor
>>>>>>>>> inflation stress tests on Linux-X64, MacOSX and Solaris-X64.
>>>>>>>>>
>>>>>>>>> Thanks, in advance, for any questions, comments or suggestions.
>>>>>>>>>
>>>>>>>>> Dan
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 3/24/19 9:57 AM, Daniel D. Daugherty wrote:
>>>>>>>>>> Greetings,
>>>>>>>>>>
>>>>>>>>>> Welcome to the OpenJDK review thread for my port of Carsten's work on:
>>>>>>>>>>
>>>>>>>>>>     JDK-8153224 Monitor deflation prolong safepoints
>>>>>>>>>>     https://bugs.openjdk.java.net/browse/JDK-8153224
>>>>>>>>>>
>>>>>>>>>> Here's a link to the OpenJDK wiki that describes my port:
>>>>>>>>>>
>>>>>>>>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation 
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Here's the webrev URL:
>>>>>>>>>>
>>>>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/3-for-jdk13/
>>>>>>>>>>
>>>>>>>>>> Here's a link to Carsten's original webrev:
>>>>>>>>>>
>>>>>>>>>> http://cr.openjdk.java.net/~cvarming/monitor_deflate_conc/0/
>>>>>>>>>>
>>>>>>>>>> Earlier versions of this patch have been through several rounds of
>>>>>>>>>> preliminary review. Many thanks to Carsten, Coleen, Robbin, and
>>>>>>>>>> Roman for their preliminary code review comments. A very special
>>>>>>>>>> thanks to Robbin and Roman for building and testing the patch in
>>>>>>>>>> their own environments (including SPECjbb2015).
>>>>>>>>>>
>>>>>>>>>> This version of the patch has been thru Mach5 tier[1-8] testing on
>>>>>>>>>> Oracle's usual set of platforms. Earlier versions have been run
>>>>>>>>>> through my stress kit on my Linux-X64 and Solaris-X64 servers
>>>>>>>>>> (product, fastdebug, slowdebug). Earlier versions have run Kitchensink
>>>>>>>>>> for 12 hours on MacOSX, Linux-X64 and Solaris-X64 (product, fastdebug
>>>>>>>>>> and slowdebug). Earlier versions have run my monitor inflation stress
>>>>>>>>>> tests for 12 hours on MacOSX, Linux-X64 and Solaris-X64 (product,
>>>>>>>>>> fastdebug and slowdebug).
>>>>>>>>>>
>>>>>>>>>> All of the testing done on earlier versions will be redone on the
>>>>>>>>>> latest version of the patch.
>>>>>>>>>>
>>>>>>>>>> Thanks, in advance, for any questions, comments or suggestions.
>>>>>>>>>>
>>>>>>>>>> Dan
>>>>>>>>>>
>>>>>>>>>> P.S.
>>>>>>>>>> One subtest in gc/g1/humongousObjects/TestHumongousClassLoader.java
>>>>>>>>>> is currently failing in -Xcomp mode on Win* only. I've been trying
>>>>>>>>>> to characterize/analyze this failure for more than a week now. At
>>>>>>>>>> this point I'm convinced that Async Monitor Deflation is aggravating
>>>>>>>>>> an existing bug. However, I plan to have a better handle on that
>>>>>>>>>> failure before these bits are pushed to the jdk/jdk repo.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>



More information about the hotspot-runtime-dev mailing list