RFR(L) 8153224 Monitor deflation prolong safepoints (CR4/v2.04/7-for-jdk13)

Karen Kinnear karen.kinnear at oracle.com
Mon Jun 3 03:20:44 UTC 2019


Thank you for all the explanations.

thanks,
Karen

> On May 31, 2019, at 4:17 PM, Daniel D. Daugherty <daniel.daugherty at oracle.com> wrote:
> 
> Hi Karen!
> 
> I see you were bored on a Friday afternoon! :-)
> 
> 
> On 5/31/19 1:31 PM, Karen Kinnear wrote:
>> Dan,
>> 
>> Looks good.
> 
> Thanks!
> 
> 
>> Thank you for the update - it helps a great deal to have the Functional list of changes!
> 
> I did it for CR0 -> CR1 and then forgot to include the CR1 -> CR2 and
> CR2 -> CR3 notes in those CR requests... hard to remember all these
> things that I'm trying to do to keep such a crazy project organized...
> I do have the CR1 -> CR2 and CR2 -> CR3 notes in my archive. I just
> never sent them to the alias...
> 
>> Very glad you are doing performance testing and stress testing.
> 
> Me too! I wish SPECjbb2015 was more stable/predictable/something,
> but it is what it is... Thank god for Robbin's help with it...
> I got some initial comments from Claes, but as far as I know, he
> never took the bits for a spin... Probably don't need that now
> that Robbin is pitching in...
> 
>> 1. So why are we spending time to deflate at VM_Exit VM op/final VMThread safepoint?
>> I get that our logging will more accurately reflect monitor use. However, our customers
>> will pay cpu cycles and elapsed time for this work on exit. Would it make sense to
>> only do this if the logging is enabled?
> 
> I thought about that... With safepoint based deflation we get that
> final cleanup with the final safepoint so I made the async deflation
> code do the same (apples vs. apples). However, I could change it so
> that we only do it when we're logging.
> 
> 
>> 2. ObjectSynchronizer::do_safepoint_work
>> This has a long helpful comment - which is all about !AsyncDeflateIdleMonitors.
>> Would it be worth adding a paragraph about AsyncDeflateIdleMonitors?
> 
> Good point! In the next patch, I've taken a pass at making MonitorBound
> work for async deflation and I went ahead and mostly added comments in
> that area. I strongly resisted the urge to make baseline comment changes...
> 
> I could do a paragraph in the do_safepoint_work() header comment to talk
> about async deflation.
> 
> 
>> 3. serviceThread.cpp lines 223-226
>> If count > 0, log: "requesting async deflation …"
>> line 226 sets set_is_async_deflation_requested(false) ? // async deflation has been requested
>> 
>> Is the point here - that the request has been honored, so you are turning it off now?
>> If so, could you possibly clarify the comment on line 226, e.g. // async deflation request has been processed
> 
> I rewrote that comment a couple of times. Perhaps this:
> 
>     // The ServiceThread's async deflation request has been processed.
>     ObjectSynchronizer::set_is_async_deflation_requested(false);
> 
>     // The global in-use list was handled above, but the request won't
>     // be complete until the JavaThreads have handled their in-use
>     // lists. This is the nature of an async deflation request.
>   }
Thank you - that is clearer to me at least.
> 
> 
>> 4. Why did you add marking the per-thread omShouldDeflateIdleMonitors to the ServiceThread?
>> Is this to cover the situation in which we don’t have frequent enough safepoints to trigger the
>> per-thread deflation?
> 
> Exactly. We changed SafepointSynchronize::is_cleanup_needed() to
> use ObjectSynchronizer::is_safepoint_deflation_needed(). When
> AsyncDeflateIdleMonitors is enabled, that function only returns
> true if a special deflation is requested. When !AsyncDeflateIdleMonitors,
> is_safepoint_deflation_needed() works just like the old
> is_cleanup_needed().
> 
> The old function ObjectSynchronizer::is_cleanup_needed() returned
> true whenever monitors_used_above_threshold() returned true and this
> resulted in a 'Cleanup' safepoint. With AsyncDeflateIdleMonitors,
> once we went above the 90% in-use threshold, we kept doing Cleanup
> safepoints until the async deflation mechanisms caught up. Not a
> good thing!
> 
> I had observed the higher Cleanup safepoint count in my earlier
> testing and thought it was due to longer lived objects. Robbin
> investigated and found the real reason. Thanks Robbin! (again)
> 
> So since we now have fewer Cleanup safepoints, we needed a way
> to trigger periodic async monitor deflation. I also put a limit
> on how frequently the ServiceThread would do the work so that we
> wouldn't swamp the ServiceThread with an inflation/deflation
> storm...
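
A minimal sketch of the decision Dan describes above, not the actual HotSpot code. The DeflationState struct and its fields are hypothetical stand-ins for VM state; only the two function names and the flag names come from this thread, and the percentage arithmetic is an assumption.

```cpp
#include <cassert>

// Hypothetical stand-in for the VM state consulted by the check. The
// struct is an assumption made for illustration; the comments name the
// flag from this thread that each field is meant to model.
struct DeflationState {
  bool async_deflate_idle_monitors;   // AsyncDeflateIdleMonitors
  bool special_deflation_requested;   // e.g. set by System.gc()
  int  monitors_in_use;
  int  monitors_total;
  int  used_threshold_percent;        // MonitorUsedDeflationThreshold
};

// True when the in-use percentage exceeds the threshold (assumed formula).
bool monitors_used_above_threshold(const DeflationState& s) {
  assert(s.monitors_in_use >= 0 && s.monitors_total >= 0);
  if (s.monitors_total == 0 || s.used_threshold_percent <= 0) {
    return false;
  }
  long long usage = (s.monitors_in_use * 100LL) / s.monitors_total;
  return usage > s.used_threshold_percent;
}

// Safepoint-based deflation keeps the old threshold behavior. With async
// deflation, only a special request asks for a Cleanup safepoint, so
// crossing the threshold no longer produces repeated Cleanup safepoints.
bool is_safepoint_deflation_needed(const DeflationState& s) {
  if (!s.async_deflate_idle_monitors) {
    return monitors_used_above_threshold(s);
  }
  return s.special_deflation_requested;
}
```

With 95 of 100 monitors in use and a threshold of 90, this sketch asks for a Cleanup safepoint only when AsyncDeflateIdleMonitors is off, or when a special deflation has been requested.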
> 
>> Or do not all GCs use the ParallelSPCleanupThreadClosure?
> 
> I think they all do. That part of the safepoint cleanup system
> is set up to work with a single thread (VMThread) or with the
> VMThread and N worker threads. So I think it works for all GCs.
> 
> 
> Thanks for reviewing! (again, and again, and...)
> 
> Dan
> 
> 
>> 
>> thanks,
>> Karen
>> 
>>> On May 26, 2019, at 8:30 PM, Daniel D. Daugherty <daniel.daugherty at oracle.com> wrote:
>>> 
>>> Greetings,
>>> 
>>> I have a fix for an issue that came up during performance testing.
>>> Many thanks to Robbin for diagnosing the issue in his SPECjbb2015
>>> experiments.
>>> 
>>> Here's the list of changes from CR3 to CR4. The list is a bit
>>> verbose due to the complexity of the issue, but the changes
>>> themselves are not that big.
>>> 
>>> Functional:
>>>   - Change SafepointSynchronize::is_cleanup_needed() from calling
>>>     ObjectSynchronizer::is_cleanup_needed() to calling
>>>     ObjectSynchronizer::is_safepoint_deflation_needed():
>>>     - is_safepoint_deflation_needed() returns the result of
>>>       monitors_used_above_threshold() for safepoint based
>>>       monitor deflation (!AsyncDeflateIdleMonitors).
>>>     - For AsyncDeflateIdleMonitors, it only returns true if
>>>       there is a special deflation request, e.g., System.gc()
>>>       - This solves a bug where there are a bunch of Cleanup
>>>         safepoints that simply request async deflation which
>>>         keeps the async JavaThreads from making progress on
>>>         their async deflation work.
>>>   - Add AsyncDeflationInterval diagnostic option. Description:
>>>       Async deflate idle monitors every so many milliseconds when
>>>       MonitorUsedDeflationThreshold is exceeded (0 is off).
>>>   - Replace ObjectSynchronizer::gOmShouldDeflateIdleMonitors() with
>>>     ObjectSynchronizer::is_async_deflation_needed():
>>>     - is_async_deflation_needed() returns true when
>>>       is_async_deflation_requested() is true or when
>>>       monitors_used_above_threshold() is true (but no more often than
>>>       AsyncDeflationInterval).
>>>     - If AsyncDeflateIdleMonitors is enabled, Service_lock->wait() now waits for
>>>       at most GuaranteedSafepointInterval millis:
>>>       - This allows is_async_deflation_needed() to be checked at
>>>         the same interval as GuaranteedSafepointInterval.
>>>         (default is 1000 millis/1 second)
>>>       - Once is_async_deflation_needed() has returned true, it
>>>         generally cannot return true for AsyncDeflationInterval.
>>>         This is to prevent async deflation from swamping the
>>>         ServiceThread.
>>>   - The ServiceThread still handles async deflation of the global
>>>     in-use list and now it also marks JavaThreads for async deflation
>>>     of their in-use lists.
>>>     - The ServiceThread will check for async deflation work every
>>>       GuaranteedSafepointInterval.
>>>     - A safepoint can still cause the ServiceThread to check for
>>>       async deflation work via is_async_deflation_requested.
>>>   - Refactor code from ObjectSynchronizer::is_cleanup_needed() into
>>>     monitors_used_above_threshold() and remove is_cleanup_needed().
>>>   - In addition to System.gc(), the VM_Exit VM op and the final
>>>     VMThread safepoint now set the is_special_deflation_requested
>>>     flag to reduce the in-use monitor population that is reported by
>>>     ObjectSynchronizer::log_in_use_monitor_details() at VM exit.
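
The rate limiting in the list above could look roughly like the following sketch. AsyncDeflationGate and its fields are hypothetical; the assumption that an explicit request bypasses the interval, while the threshold trigger is limited by AsyncDeflationInterval, is one reading of the description and not confirmed HotSpot behavior.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not a HotSpot class. Times are in milliseconds
// from any monotonic clock supplied by the caller.
struct AsyncDeflationGate {
  int64_t interval_ms;      // AsyncDeflationInterval (0 turns the threshold trigger off)
  int64_t last_trigger_ms;  // time of the last threshold-based trigger
  bool    has_triggered;    // false until the first threshold-based trigger

  // requested:       models is_async_deflation_requested
  // above_threshold: models monitors_used_above_threshold()
  bool is_async_deflation_needed(int64_t now_ms, bool requested,
                                 bool above_threshold) {
    assert(interval_ms >= 0);
    if (requested) {
      return true;  // assumption: explicit requests are not rate limited
    }
    if (interval_ms > 0 && above_threshold &&
        (!has_triggered || now_ms - last_trigger_ms >= interval_ms)) {
      // Remember when we fired so the next threshold-based trigger has
      // to wait at least interval_ms.
      last_trigger_ms = now_ms;
      has_triggered = true;
      return true;
    }
    return false;
  }
};
```

In this sketch the ServiceThread would call is_async_deflation_needed() each time its wait on Service_lock returns, which per the list above happens at least every GuaranteedSafepointInterval milliseconds.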
>>> 
>>> Test update:
>>>   - test/hotspot/gtest/oops/test_markOop.cpp is updated to work with
>>>     AsyncDeflateIdleMonitors.
>>> 
>>> Collateral:
>>>   - Add/clarify/update some logging messages.
>>> 
>>> Cleanup:
>>>   - Updated comments based on Karen's code review.
>>>   - Change 'special cleanup' -> 'special deflation' and
>>>     'async cleanup' -> 'async deflation'.
>>>     - comment and function name changes
>>>   - Clarify MonitorUsedDeflationThreshold description.
>>> 
>>> 
>>> Main bug URL:
>>> 
>>>     JDK-8153224 Monitor deflation prolong safepoints
>>>     https://bugs.openjdk.java.net/browse/JDK-8153224
>>> 
>>> The project is currently baselined on jdk-13+22.
>>> 
>>> Here's the full webrev URL:
>>> 
>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/7-for-jdk13.full/
>>> 
>>> Here's the incremental webrev URL:
>>> 
>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/7-for-jdk13.inc/
>>> 
>>> I have not updated the OpenJDK wiki to reflect the CR4 changes:
>>> 
>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation
>>> 
>>> The wiki doesn't say a whole lot about the async deflation invocation
>>> mechanism so I have to figure out how to add that content.
>>> 
>>> This version of the patch has been thru Mach5 tier[1-8] testing on
>>> Oracle's usual set of platforms. My Solaris-X64 stress kit run is
>>> running now. Kitchensink8H on product, fastdebug, and slowdebug bits
>>> are running on Linux-X64, MacOSX and Solaris-X64. I still have to run
>>> my stress kit on Linux-X64. I still have to run the SPECjbb2015
>>> baseline and CR4 runs on Linux-X64, MacOSX and Solaris-X64.
>>> 
>>> Thanks, in advance, for any questions, comments or suggestions.
>>> 
>>> Dan
>>> 
>>> On 5/6/19 11:52 AM, Daniel D. Daugherty wrote:
>>>> Greetings,
>>>> 
>>>> I had some discussions with Karen about a race that was in the
>>>> ObjectMonitor::enter() code in CR2/v2.02/5-for-jdk13. This race was
>>>> theoretical and I had no test failures due to it. The fix is pretty
>>>> simple: remove the special case code for async deflation in the
>>>> ObjectMonitor::enter() function and rely solely on the ref_count
>>>> for ObjectMonitor::enter() protection.
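
The message above does not spell out the ref_count protocol, so the following is only a sketch of one way a reference count can protect enter() from a concurrent deflater. MonitorSketch, the negative sentinel, and the retry-on-failure convention are all assumptions for illustration, not the patch's code.

```cpp
#include <atomic>
#include <cassert>
#include <climits>

// Hypothetical monitor with only the field needed for the sketch.
struct MonitorSketch {
  std::atomic<int> ref_count{0};

  // Enter path: pin the monitor before using it. Returns false if a
  // deflater has already claimed it; the caller would then retry with a
  // fresh monitor.
  bool try_pin() {
    int prev = ref_count.fetch_add(1);
    if (prev < 0) {
      ref_count.fetch_sub(1);  // undo our increment, monitor is going away
      return false;
    }
    return true;
  }

  // Drop a pin taken by a successful try_pin().
  void unpin() {
    int prev = ref_count.fetch_sub(1);
    assert(prev > 0);
    (void)prev;  // silence unused-variable warnings when NDEBUG is set
  }

  // Deflater path: claim the monitor only if nobody has it pinned. A
  // large negative value keeps the count negative even if a few racing
  // threads increment it briefly in try_pin().
  bool try_claim_for_deflation() {
    int expected = 0;
    return ref_count.compare_exchange_strong(expected, INT_MIN / 2);
  }
};
```

The point of the design is that a single atomic field decides the race: either the entering thread's increment lands first and the deflater's compare-and-swap fails, or the deflater wins and the entering thread sees a negative count and backs off.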
>>>> 
>>>> During those discussions Karen also floated the idea of using the
>>>> ref_count field instead of the contentions field for the Async
>>>> Monitor Deflation protocol. I decided to go ahead and code up that
>>>> change and I have run it through the usual stress and Mach5 testing
>>>> with no issues. It's also known as v2.03 (for those with the
>>>> patches) and as webrev/6-for-jdk13 (for those with webrev URLs).
>>>> Sorry for all the names...
>>>> 
>>>> Main bug URL:
>>>> 
>>>>     JDK-8153224 Monitor deflation prolong safepoints
>>>>     https://bugs.openjdk.java.net/browse/JDK-8153224
>>>> 
>>>> The project is currently baselined on jdk-13+18.
>>>> 
>>>> Here's the full webrev URL:
>>>> 
>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/6-for-jdk13.full/
>>>> 
>>>> Here's the incremental webrev URL:
>>>> 
>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/6-for-jdk13.inc/
>>>> 
>>>> I have also updated the OpenJDK wiki to reflect the CR3 changes:
>>>> 
>>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation
>>>> 
>>>> This version of the patch has been thru Mach5 tier[1-8] testing on
>>>> Oracle's usual set of platforms. My Solaris-X64 stress kit run had
>>>> no issues. Kitchensink8H on product, fastdebug, and slowdebug bits
>>>> had no failures on Linux-X64; MacOSX fastdebug and slowdebug and
>>>> Solaris-X64 release had the usual "Too large time diff" complaints.
>>>> 12 hour Inflate2 runs on product, fastdebug and slowdebug bits on
>>>> Linux-X64, MacOSX and Solaris-X64 had no failures. My Linux-X64
>>>> stress kit is running right now.
>>>> 
>>>> I've done the SPECjbb2015 baseline and CR3 runs. I need to gather
>>>> the results and analyze them.
>>>> 
>>>> Thanks, in advance, for any questions, comments or suggestions.
>>>> 
>>>> Dan
>>>> 
>>>> 
>>>> On 4/25/19 12:38 PM, Daniel D. Daugherty wrote:
>>>>> Greetings,
>>>>> 
>>>>> I have a small but important bug fix for the Async Monitor Deflation
>>>>> project ready to go. It's also known as v2.02 (for those with the
>>>>> patches) and as webrev/5-for-jdk13 (for those with webrev URLs). Sorry
>>>>> for all the names...
>>>>> 
>>>>> JDK-8222295 was pushed to jdk/jdk two days ago so that baseline patch
>>>>> is out of our hair.
>>>>> 
>>>>> Main bug URL:
>>>>> 
>>>>>     JDK-8153224 Monitor deflation prolong safepoints
>>>>>     https://bugs.openjdk.java.net/browse/JDK-8153224
>>>>> 
>>>>> The project is currently baselined on jdk-13+17.
>>>>> 
>>>>> Here's the full webrev URL:
>>>>> 
>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/5-for-jdk13.full/
>>>>> 
>>>>> Here's the incremental webrev URL (JDK-8153224):
>>>>> 
>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/5-for-jdk13.inc/
>>>>> 
>>>>> I still have to update the OpenJDK wiki to reflect the CR2 changes:
>>>>> 
>>>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation
>>>>> 
>>>>> This version of the patch has been thru Mach5 tier[1-6] testing on
>>>>> Oracle's usual set of platforms. Mach5 tier[7-8] is running now.
>>>>> My stress kit is running on Solaris-X64 now. Kitchensink8H is running
>>>>> now on product, fastdebug, and slowdebug bits on Linux-X64, MacOSX
>>>>> and Solaris-X64. 12 hour Inflate2 runs are running now on product,
>>>>> fastdebug and slowdebug bits on Linux-X64, MacOSX and Solaris-X64.
>>>>> I'll start my stress kit on Linux-X64 sometime on Sunday (after
>>>>> my jdk-13+18 stress run is done).
>>>>> 
>>>>> I'll do SPECjbb2015 baseline and CR2 runs after all the stress
>>>>> testing is done.
>>>>> 
>>>>> Thanks, in advance, for any questions, comments or suggestions.
>>>>> 
>>>>> Dan
>>>>> 
>>>>> 
>>>>> On 4/19/19 11:58 AM, Daniel D. Daugherty wrote:
>>>>>> Greetings,
>>>>>> 
>>>>>> I finally have CR1 for the Async Monitor Deflation project ready to
>>>>>> go. It's also known as v2.01 (for those with the patches) and as
>>>>>> webrev/4-for-jdk13 (for those with webrev URLs). Sorry for all the
>>>>>> names...
>>>>>> 
>>>>>> Main bug URL:
>>>>>> 
>>>>>>     JDK-8153224 Monitor deflation prolong safepoints
>>>>>>     https://bugs.openjdk.java.net/browse/JDK-8153224
>>>>>> 
>>>>>> Baseline bug fixes URL:
>>>>>> 
>>>>>>     JDK-8222295 more baseline cleanups from Async Monitor Deflation project
>>>>>>     https://bugs.openjdk.java.net/browse/JDK-8222295
>>>>>> 
>>>>>> The project is currently baselined on jdk-13+15.
>>>>>> 
>>>>>> Here's the webrev for the latest baseline changes (JDK-8222295):
>>>>>> 
>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/4-for-jdk13.8222295
>>>>>> 
>>>>>> Here's the full webrev URL (JDK-8153224 only):
>>>>>> 
>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/4-for-jdk13.full/
>>>>>> 
>>>>>> Here's the incremental webrev URL (JDK-8153224):
>>>>>> 
>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/4-for-jdk13.inc/
>>>>>> 
>>>>>> So I'm looking for reviews for both JDK-8222295 and the latest version
>>>>>> of JDK-8153224...
>>>>>> 
>>>>>> I still have to update the OpenJDK wiki to reflect the CR changes:
>>>>>> 
>>>>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation
>>>>>> 
>>>>>> This version of the patch has been thru Mach5 tier[1-3] testing on
>>>>>> Oracle's usual set of platforms. Mach5 tier[4-6] is running now and
>>>>>> Mach5 tier[7-8] will be run later today. My stress kit on Solaris-X64
>>>>>> is running now. Linux-X64 stress testing will start on Sunday. I'm
>>>>>> planning to do Kitchensink runs, SPECjbb2015 runs and my monitor
>>>>>> inflation stress tests on Linux-X64, MacOSX and Solaris-X64.
>>>>>> 
>>>>>> Thanks, in advance, for any questions, comments or suggestions.
>>>>>> 
>>>>>> Dan
>>>>>> 
>>>>>> 
>>>>>> On 3/24/19 9:57 AM, Daniel D. Daugherty wrote:
>>>>>>> Greetings,
>>>>>>> 
>>>>>>> Welcome to the OpenJDK review thread for my port of Carsten's work on:
>>>>>>> 
>>>>>>>     JDK-8153224 Monitor deflation prolong safepoints
>>>>>>>     https://bugs.openjdk.java.net/browse/JDK-8153224
>>>>>>> 
>>>>>>> Here's a link to the OpenJDK wiki that describes my port:
>>>>>>> 
>>>>>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation
>>>>>>> 
>>>>>>> Here's the webrev URL:
>>>>>>> 
>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/3-for-jdk13/
>>>>>>> 
>>>>>>> Here's a link to Carsten's original webrev:
>>>>>>> 
>>>>>>> http://cr.openjdk.java.net/~cvarming/monitor_deflate_conc/0/
>>>>>>> 
>>>>>>> Earlier versions of this patch have been through several rounds of
>>>>>>> preliminary review. Many thanks to Carsten, Coleen, Robbin, and
>>>>>>> Roman for their preliminary code review comments. A very special
>>>>>>> thanks to Robbin and Roman for building and testing the patch in
>>>>>>> their own environments (including SPECjbb2015).
>>>>>>> 
>>>>>>> This version of the patch has been thru Mach5 tier[1-8] testing on
>>>>>>> Oracle's usual set of platforms. Earlier versions have been run
>>>>>>> through my stress kit on my Linux-X64 and Solaris-X64 servers
>>>>>>> (product, fastdebug, slowdebug). Earlier versions have run Kitchensink
>>>>>>> for 12 hours on MacOSX, Linux-X64 and Solaris-X64 (product, fastdebug
>>>>>>> and slowdebug). Earlier versions have run my monitor inflation stress
>>>>>>> tests for 12 hours on MacOSX, Linux-X64 and Solaris-X64 (product,
>>>>>>> fastdebug and slowdebug).
>>>>>>> 
>>>>>>> All of the testing done on earlier versions will be redone on the
>>>>>>> latest version of the patch.
>>>>>>> 
>>>>>>> Thanks, in advance, for any questions, comments or suggestions.
>>>>>>> 
>>>>>>> Dan
>>>>>>> 
>>>>>>> P.S.
>>>>>>> One subtest in gc/g1/humongousObjects/TestHumongousClassLoader.java
>>>>>>> is currently failing in -Xcomp mode on Win* only. I've been trying
>>>>>>> to characterize/analyze this failure for more than a week now. At
>>>>>>> this point I'm convinced that Async Monitor Deflation is aggravating
>>>>>>> an existing bug. However, I plan to have a better handle on that
>>>>>>> failure before these bits are pushed to the jdk/jdk repo.
>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
> 


