RFR(L) 8153224 Monitor deflation prolong safepoints (CR4/v2.04/7-for-jdk13)

Karen Kinnear karen.kinnear at oracle.com
Fri May 31 17:31:42 UTC 2019


Dan,

Looks good.

Thank you for the update - it helps a great deal to have the Functional list of changes!
Very glad you are doing performance testing and stress testing.

1. Why are we spending time deflating at the VM_Exit VM op and the final VMThread
safepoint? I get that our logging will more accurately reflect monitor use. However,
our customers will pay CPU cycles and elapsed time for this work on exit. Would it
make sense to do this only if the logging is enabled?

2. ObjectSynchronizer::do_safepoint_work
This has a long helpful comment - which is all about !AsyncDeflateIdleMonitors.
Would it be worth adding a paragraph about AsyncDeflateIdleMonitors?

3. serviceThread.cpp lines 223-226
If count > 0, it logs: "requesting async deflation …"
Line 226 then calls set_is_async_deflation_requested(false), but its comment reads:
// async deflation has been requested

Is the point here that the request has been honored, so you are turning it off now?
If so, could you clarify the comment on line 226, e.g. // async deflation request has been processed
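
To make that reading concrete, here is a minimal sketch of a one-shot request flag.
It is not the code in serviceThread.cpp: the DeflationRequest type and its request()
and consume() members are hypothetical names used only for illustration, and the
sketch assumes the flag is cleared at the point where the request is picked up.

```cpp
#include <atomic>

// Hypothetical model of a one-shot "async deflation requested" flag.
// This is an illustration only, not the HotSpot implementation.
struct DeflationRequest {
  std::atomic<bool> requested{false};

  // Called by whoever wants deflation to happen (e.g. at a safepoint).
  void request() { requested.store(true); }

  // Called by the servicing thread. Returns true if a request was pending.
  // The exchange clears the flag in the same step, so the request is
  // treated as processed, not as newly requested.
  bool consume() {
    // async deflation request has been processed
    return requested.exchange(false);
  }
};
```

With this shape, a second consume() without an intervening request() returns false,
which is the behavior the suggested comment wording describes.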

4. Why did you add marking the per-thread omShouldDeflateIdleMonitors to the ServiceThread?
Is this to cover the situation in which we don’t have frequent enough safepoints to trigger the
per-thread deflation? Or do not all GCs use the ParallelSPCleanupThreadClosure?
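
For reference, this is how I read the throttling described in the CR4 list quoted
below, as a small sketch. It assumes a simulated clock in milliseconds. The
AsyncDeflationModel type, its fields, and count_deflations() are hypothetical and
are not taken from the webrev. Whether an explicit request bypasses the interval is
also an assumption of the sketch.

```cpp
#include <cstdint>

// Hypothetical model of the decision described for is_async_deflation_needed().
// Illustration only; field and type names are not from the webrev.
struct AsyncDeflationModel {
  int64_t interval_ms;              // models AsyncDeflationInterval (0 is off)
  bool requested = false;           // models an explicit async deflation request
  bool above_threshold = false;     // models monitors_used_above_threshold()
  bool has_deflated = false;        // has the threshold path fired yet?
  int64_t last_ms = 0;              // time the threshold path last fired

  explicit AsyncDeflationModel(int64_t interval) : interval_ms(interval) {}

  bool is_async_deflation_needed(int64_t now_ms) {
    if (requested) {
      return true;                  // assumption: requests are not throttled
    }
    if (interval_ms > 0 && above_threshold &&
        (!has_deflated || now_ms - last_ms >= interval_ms)) {
      has_deflated = true;
      last_ms = now_ms;             // start a new throttling window
      return true;
    }
    return false;
  }
};

// Simulates a service loop that wakes every poll_ms (the role played by
// GuaranteedSafepointInterval) and counts how often deflation would start.
int count_deflations(int64_t poll_ms, int64_t interval_ms, int polls) {
  AsyncDeflationModel m(interval_ms);
  m.above_threshold = true;
  int count = 0;
  for (int i = 0; i < polls; i++) {
    if (m.is_async_deflation_needed(i * poll_ms)) {
      count++;
    }
  }
  return count;
}
```

In this model, polling every 1000 millis with a 2500 millis interval starts
deflation on roughly every third wakeup, and an interval of 0 never starts it
from the threshold path alone.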

thanks,
Karen

> On May 26, 2019, at 8:30 PM, Daniel D. Daugherty <daniel.daugherty at oracle.com> wrote:
> 
> Greetings,
> 
> I have a fix for an issue that came up during performance testing.
> Many thanks to Robbin for diagnosing the issue in his SPECjbb2015
> experiments.
> 
> Here's the list of changes from CR3 to CR4. The list is a bit
> verbose due to the complexity of the issue, but the changes
> themselves are not that big.
> 
> Functional:
>   - Change SafepointSynchronize::is_cleanup_needed() from calling
>     ObjectSynchronizer::is_cleanup_needed() to calling
>     ObjectSynchronizer::is_safepoint_deflation_needed():
>     - is_safepoint_deflation_needed() returns the result of
>       monitors_used_above_threshold() for safepoint based
>       monitor deflation (!AsyncDeflateIdleMonitors).
>     - For AsyncDeflateIdleMonitors, it only returns true if
>       there is a special deflation request, e.g., System.gc()
>       - This solves a bug where there are a bunch of Cleanup
>         safepoints that simply request async deflation which
>         keeps the async JavaThreads from making progress on
>         their async deflation work.
>   - Add AsyncDeflationInterval diagnostic option. Description:
>       Async deflate idle monitors every so many milliseconds when
>       MonitorUsedDeflationThreshold is exceeded (0 is off).
>   - Replace ObjectSynchronizer::gOmShouldDeflateIdleMonitors() with
>     ObjectSynchronizer::is_async_deflation_needed():
>     - is_async_deflation_needed() returns true when
>       is_async_cleanup_requested() is true or when
>       monitors_used_above_threshold() is true (but no more often than
>       AsyncDeflationInterval).
>     - If AsyncDeflateIdleMonitors is enabled, Service_lock->wait()
>       now waits for at most GuaranteedSafepointInterval millis:
>       - This allows is_async_deflation_needed() to be checked at
>         the same interval as GuaranteedSafepointInterval.
>         (default is 1000 millis/1 second)
>       - Once is_async_deflation_needed() has returned true, it
>         generally cannot return true again until
>         AsyncDeflationInterval has elapsed. This prevents async
>         deflation from swamping the ServiceThread.
>   - The ServiceThread still handles async deflation of the global
>     in-use list and now it also marks JavaThreads for async deflation
>     of their in-use lists.
>     - The ServiceThread will check for async deflation work every
>       GuaranteedSafepointInterval.
>     - A safepoint can still cause the ServiceThread to check for
>       async deflation work via is_async_deflation_requested.
>   - Refactor code from ObjectSynchronizer::is_cleanup_needed() into
>     monitors_used_above_threshold() and remove is_cleanup_needed().
>   - In addition to System.gc(), the VM_Exit VM op and the final
>     VMThread safepoint now set the is_special_deflation_requested
>     flag to reduce the in-use monitor population that is reported by
>     ObjectSynchronizer::log_in_use_monitor_details() at VM exit.
> 
> Test update:
>   - test/hotspot/gtest/oops/test_markOop.cpp is updated to work with
>     AsyncDeflateIdleMonitors.
> 
> Collateral:
>   - Add/clarify/update some logging messages.
> 
> Cleanup:
>   - Updated comments based on Karen's code review.
>   - Change 'special cleanup' -> 'special deflation' and
>     'async cleanup' -> 'async deflation'.
>     - comment and function name changes
>   - Clarify MonitorUsedDeflationThreshold description.
> 
> 
> Main bug URL:
> 
>     JDK-8153224 Monitor deflation prolong safepoints
>     https://bugs.openjdk.java.net/browse/JDK-8153224
> 
> The project is currently baselined on jdk-13+22.
> 
> Here's the full webrev URL:
> 
> http://cr.openjdk.java.net/~dcubed/8153224-webrev/7-for-jdk13.full/
> 
> Here's the incremental webrev URL:
> 
> http://cr.openjdk.java.net/~dcubed/8153224-webrev/7-for-jdk13.inc/
> 
> I have not updated the OpenJDK wiki to reflect the CR4 changes:
> 
> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation
> 
> The wiki doesn't say a whole lot about the async deflation invocation
> mechanism so I have to figure out how to add that content.
> 
> This version of the patch has been thru Mach5 tier[1-8] testing on
> Oracle's usual set of platforms. My Solaris-X64 stress kit run is
> running now. Kitchensink8H on product, fastdebug, and slowdebug bits
> are running on Linux-X64, MacOSX and Solaris-X64. I still have to run
> my stress kit on Linux-X64. I still have to run the SPECjbb2015
> baseline and CR4 runs on Linux-X64, MacOSX and Solaris-X64.
> 
> Thanks, in advance, for any questions, comments or suggestions.
> 
> Dan
> 
> On 5/6/19 11:52 AM, Daniel D. Daugherty wrote:
>> Greetings,
>> 
>> I had some discussions with Karen about a race that was in the
>> ObjectMonitor::enter() code in CR2/v2.02/5-for-jdk13. This race was
>> theoretical and I had no test failures due to it. The fix is pretty
>> simple: remove the special case code for async deflation in the
>> ObjectMonitor::enter() function and rely solely on the ref_count
>> for ObjectMonitor::enter() protection.
>> 
>> During those discussions Karen also floated the idea of using the
>> ref_count field instead of the contentions field for the Async
>> Monitor Deflation protocol. I decided to go ahead and code up that
>> change and I have run it through the usual stress and Mach5 testing
>> with no issues. It's also known as v2.03 (for those with the
>> patches) and as webrev/6-for-jdk13 (for those with webrev URLs).
>> Sorry for all the names...
>> 
>> Main bug URL:
>> 
>>     JDK-8153224 Monitor deflation prolong safepoints
>>     https://bugs.openjdk.java.net/browse/JDK-8153224
>> 
>> The project is currently baselined on jdk-13+18.
>> 
>> Here's the full webrev URL:
>> 
>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/6-for-jdk13.full/
>> 
>> Here's the incremental webrev URL:
>> 
>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/6-for-jdk13.inc/
>> 
>> I have also updated the OpenJDK wiki to reflect the CR3 changes:
>> 
>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation
>> 
>> This version of the patch has been thru Mach5 tier[1-8] testing on
>> Oracle's usual set of platforms. My Solaris-X64 stress kit run had
>> no issues. Kitchensink8H on product, fastdebug, and slowdebug bits
>> had no failures on Linux-X64; MacOSX fastdebug and slowdebug and
>> Solaris-X64 release had the usual "Too large time diff" complaints.
>> 12 hour Inflate2 runs on product, fastdebug and slowdebug bits on
>> Linux-X64, MacOSX and Solaris-X64 had no failures. My Linux-X64
>> stress kit is running right now.
>> 
>> I've done the SPECjbb2015 baseline and CR3 runs. I need to gather
>> the results and analyze them.
>> 
>> Thanks, in advance, for any questions, comments or suggestions.
>> 
>> Dan
>> 
>> 
>> On 4/25/19 12:38 PM, Daniel D. Daugherty wrote:
>>> Greetings,
>>> 
>>> I have a small but important bug fix for the Async Monitor Deflation
>>> project ready to go. It's also known as v2.02 (for those with the
>>> patches) and as webrev/5-for-jdk13 (for those with webrev URLs). Sorry
>>> for all the names...
>>> 
>>> JDK-8222295 was pushed to jdk/jdk two days ago so that baseline patch
>>> is out of our hair.
>>> 
>>> Main bug URL:
>>> 
>>>     JDK-8153224 Monitor deflation prolong safepoints
>>>     https://bugs.openjdk.java.net/browse/JDK-8153224
>>> 
>>> The project is currently baselined on jdk-13+17.
>>> 
>>> Here's the full webrev URL:
>>> 
>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/5-for-jdk13.full/
>>> 
>>> Here's the incremental webrev URL (JDK-8153224):
>>> 
>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/5-for-jdk13.inc/
>>> 
>>> I still have to update the OpenJDK wiki to reflect the CR2 changes:
>>> 
>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation
>>> 
>>> This version of the patch has been thru Mach5 tier[1-6] testing on
>>> Oracle's usual set of platforms. Mach5 tier[7-8] is running now.
>>> My stress kit is running on Solaris-X64 now. Kitchensink8H is running
>>> now on product, fastdebug, and slowdebug bits on Linux-X64, MacOSX
>>> and Solaris-X64. 12 hour Inflate2 runs are running now on product,
>>> fastdebug and slowdebug bits on Linux-X64, MacOSX and Solaris-X64.
>>> I'll start my stress kit on Linux-X64 sometime on Sunday (after
>>> my jdk-13+18 stress run is done).
>>> 
>>> I'll do SPECjbb2015 baseline and CR2 runs after all the stress
>>> testing is done.
>>> 
>>> Thanks, in advance, for any questions, comments or suggestions.
>>> 
>>> Dan
>>> 
>>> 
>>> On 4/19/19 11:58 AM, Daniel D. Daugherty wrote:
>>>> Greetings,
>>>> 
>>>> I finally have CR1 for the Async Monitor Deflation project ready to
>>>> go. It's also known as v2.01 (for those with the patches) and as
>>>> webrev/4-for-jdk13 (for those with webrev URLs). Sorry for all the
>>>> names...
>>>> 
>>>> Main bug URL:
>>>> 
>>>>     JDK-8153224 Monitor deflation prolong safepoints
>>>>     https://bugs.openjdk.java.net/browse/JDK-8153224
>>>> 
>>>> Baseline bug fixes URL:
>>>> 
>>>>     JDK-8222295 more baseline cleanups from Async Monitor Deflation project
>>>>     https://bugs.openjdk.java.net/browse/JDK-8222295
>>>> 
>>>> The project is currently baselined on jdk-13+15.
>>>> 
>>>> Here's the webrev for the latest baseline changes (JDK-8222295):
>>>> 
>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/4-for-jdk13.8222295
>>>> 
>>>> Here's the full webrev URL (JDK-8153224 only):
>>>> 
>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/4-for-jdk13.full/
>>>> 
>>>> Here's the incremental webrev URL (JDK-8153224):
>>>> 
>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/4-for-jdk13.inc/
>>>> 
>>>> So I'm looking for reviews for both JDK-8222295 and the latest version
>>>> of JDK-8153224...
>>>> 
>>>> I still have to update the OpenJDK wiki to reflect the CR changes:
>>>> 
>>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation
>>>> 
>>>> This version of the patch has been thru Mach5 tier[1-3] testing on
>>>> Oracle's usual set of platforms. Mach5 tier[4-6] is running now and
>>>> Mach5 tier[7-8] will be run later today. My stress kit on Solaris-X64
>>>> is running now. Linux-X64 stress testing will start on Sunday. I'm
>>>> planning to do Kitchensink runs, SPECjbb2015 runs and my monitor
>>>> inflation stress tests on Linux-X64, MacOSX and Solaris-X64.
>>>> 
>>>> Thanks, in advance, for any questions, comments or suggestions.
>>>> 
>>>> Dan
>>>> 
>>>> 
>>>> On 3/24/19 9:57 AM, Daniel D. Daugherty wrote:
>>>>> Greetings,
>>>>> 
>>>>> Welcome to the OpenJDK review thread for my port of Carsten's work on:
>>>>> 
>>>>>     JDK-8153224 Monitor deflation prolong safepoints
>>>>>     https://bugs.openjdk.java.net/browse/JDK-8153224
>>>>> 
>>>>> Here's a link to the OpenJDK wiki that describes my port:
>>>>> 
>>>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation
>>>>> 
>>>>> Here's the webrev URL:
>>>>> 
>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/3-for-jdk13/
>>>>> 
>>>>> Here's a link to Carsten's original webrev:
>>>>> 
>>>>> http://cr.openjdk.java.net/~cvarming/monitor_deflate_conc/0/
>>>>> 
>>>>> Earlier versions of this patch have been through several rounds of
>>>>> preliminary review. Many thanks to Carsten, Coleen, Robbin, and
>>>>> Roman for their preliminary code review comments. A very special
>>>>> thanks to Robbin and Roman for building and testing the patch in
>>>>> their own environments (including SPECjbb2015).
>>>>> 
>>>>> This version of the patch has been thru Mach5 tier[1-8] testing on
>>>>> Oracle's usual set of platforms. Earlier versions have been run
>>>>> through my stress kit on my Linux-X64 and Solaris-X64 servers
>>>>> (product, fastdebug, slowdebug). Earlier versions have run Kitchensink
>>>>> for 12 hours on MacOSX, Linux-X64 and Solaris-X64 (product, fastdebug
>>>>> and slowdebug). Earlier versions have run my monitor inflation stress
>>>>> tests for 12 hours on MacOSX, Linux-X64 and Solaris-X64 (product,
>>>>> fastdebug and slowdebug).
>>>>> 
>>>>> All of the testing done on earlier versions will be redone on the
>>>>> latest version of the patch.
>>>>> 
>>>>> Thanks, in advance, for any questions, comments or suggestions.
>>>>> 
>>>>> Dan
>>>>> 
>>>>> P.S.
>>>>> One subtest in gc/g1/humongousObjects/TestHumongousClassLoader.java
>>>>> is currently failing in -Xcomp mode on Win* only. I've been trying
>>>>> to characterize/analyze this failure for more than a week now. At
>>>>> this point I'm convinced that Async Monitor Deflation is aggravating
>>>>> an existing bug. However, I plan to have a better handle on that
>>>>> failure before these bits are pushed to the jdk/jdk repo.
>>>>> 
>>>> 
>>>> 
>>> 
>>> 
>> 
>> 
> 


