RFR(L) 8153224 Monitor deflation prolong safepoints (CR3/v2.03/6-for-jdk13)

Daniel D. Daugherty daniel.daugherty at oracle.com
Tue May 7 00:20:34 UTC 2019


Karen,

Thanks for reviewing (yet again)!

Carsten, there's a query down below for you...


On 5/6/19 5:21 PM, Karen Kinnear wrote:
> Dan,
>
> Looks good to me

Thanks!


> - this makes the timing dance something I can get my head around.

Same here!


> I do not see any timing holes here.

That's good news! Thanks for taking the detailed look.


> I did not re-review the wiki - it said it was still WIP.

Oops. I forgot to remove the WIP markers when I uploaded the webrevs
and sent out the CR3/v2.03/6-for-jdk13 invite...


> Couple of minor comments:
> 1. synchronizer.cpp comment lines 1119-1120
> Could you possibly clarify in the comment that internal inflation happens at times
> when it is not safe to stop for a safepoint?

Good idea. The updated comment will look like this:

        // Deflate any per-thread idle monitors for this JavaThread if
        // this is not an internal inflation; internal inflations can
        // occur in places where it is not safe to pause for a safepoint.
        // Clean up your own mess. (Gibbs Rule 45) Otherwise, skip this 
cleanup.
        // deflate_global_idle_monitors_using_JT() is called by the 
ServiceThread.

> 2. synchronizer.cpp
> 1174 assertion: how could the ref_count be > 0 if this is on the free list? I get that we don’t
> “reset” it, but shouldn’t the ref_count inc/dec match once you’ve handled the -max_jint?

Here's the whole block:

src/hotspot/share/runtime/synchronizer.cpp:

L1106: ObjectMonitor* ObjectSynchronizer::omAlloc(
<snip>
L1169:           if (take->ref_count() < 0) {
L1170:             // Add back max_jint to restore the ref_count field 
to its
L1171:             // proper value.
L1172:             Atomic::add(max_jint, &take->_ref_count);
L1173:
L1174:             assert(take->ref_count() >= 0, "must not be negative: 
ref_count=%d",
L1175:                    take->ref_count());
L1176:           }

First, we only get to the assert on L1174 if ref_count() < 0 (L1169)
so we only try to adjust the count if it's negative and then we verify
that the resulting count is >= 0. This appears to be a slightly paranoid
assert on my part, but we need to make sure ref_count is not negative as
we are about to take the node off the global free list (and put it on the
calling thread's per-thread free list).

Now for the question you actually asked:

 > how could the ref_count be > 0 if this is on the free list?

Let's pick on one of the non-lock operations:

src/hotspot/share/runtime/synchronizer.cpp:

L928: JavaThread* ObjectSynchronizer::get_lock_owner(
<snip>
L942:     markOop mark = ReadStableMark(obj);
<snip>
L950:     else if (mark->has_monitor()) {
L951:       ObjectMonitorHandle omh;
L952:       if (!omh.save_om_ptr(obj, mark)) {
L953:         // Lost a race with async deflation so try again.
L954:         assert(AsyncDeflateIdleMonitors, "sanity check");
L955:         continue;
L956:       }

Imagine that the caller to get_lock_owner() stalls after the
ReadStableMark(obj) call. And the deflater thread manages to
finish its work...

When get_lock_owner() calls save_om_ptr(), it will increment the
ref_count:

src/hotspot/share/runtime/objectMonitor.cpp:

L2070: bool ObjectMonitorHandle::save_om_ptr(
<snip>
L2075:   om_ptr->inc_ref_count();
<snip>
L2081:     if (om_ptr->_owner == DEFLATER_MARKER && om_ptr->ref_count() 
<= 0) {
L2082:       // Async deflation is in progress and our ref_count increment
L2083:       // above lost the race to async deflation. Attempt to restore
L2084:       // the header/dmw to the object's header so that we only retry
L2085:       // once if the deflater thread happens to be slow.
L2086: om_ptr->install_displaced_markword_in_object(object);
L2087:       om_ptr->dec_ref_count();
L2088:       return false;
L2089:     }

and then it will detect that async deflation won the race so it will
decrement the ref_count and return false.

If the window between L2075 and L2087 overlaps with the code block
in omAlloc() above, then it's possible for this:

L1172:             Atomic::add(max_jint, &take->_ref_count);

to result in a ref_count > 0. Of course, when save_om_ptr() gets to
the dec_ref_count() call, all will be good, but...

So while this looks a little paranoid:

L1174:             assert(take->ref_count() >= 0, "must not be negative: 
ref_count=%d",
L1175:                    take->ref_count());

a ref_count > 0 is possible.


> 3. How did you decide which tryLock() calls to put the check for DEFLATER_MARKER after?

Hmmm... I don't think Carsten was looking at it in terms of putting the
checks after tryLock() calls, but of course I'm guessing. Carsten?

The way that I look at it is that EnterI() has two places where it makes
sense to detect the beginning of an async deflation cycle and ReenterI()
has one place. In all three of those places, the calling thread can make
forward progress without a retry (or blocking) if it can swap Self in for
the DEFLATER_MARKER and immediately gain/regain ownership of the monitor.

The "T-enter Wins By A-B-A" scenario in the wiki shows where the first
check in EnterI() comes into play.

For the check in ReenterI(), it might be the case that an A-B-A scenario
can't happen there due to a non-zero waiters or a non-zero ref_count,
but I haven't written up the scenario to prove it one way or the other.


> Looking forward to the performance numbers.

I posted my latest SPECjbb2015 results a little while ago. I'll leave
it up to folks with more experience with SPECjbb2015 to decide what
those results mean.


> I can’t resist any longer - link to our new superhero: DEFLATER MOUSE  - https://steamcommunity.com/sharedfiles/filedetails/?id=729760850

Nice!

Thanks again for the review!

Dan


>
> many thanks,
> Karen
>
>> On May 6, 2019, at 11:52 AM, Daniel D. Daugherty <daniel.daugherty at oracle.com> wrote:
>>
>> Greetings,
>>
>> I had some discussions with Karen about a race that was in the
>> ObjectMonitor::enter() code in CR2/v2.02/5-for-jdk13. This race was
>> theoretical and I had no test failures due to it. The fix is pretty
>> simple: remove the special case code for async deflation in the
>> ObjectMonitor::enter() function and rely solely on the ref_count
>> for ObjectMonitor::enter() protection.
>>
>> During those discussions Karen also floated the idea of using the
>> ref_count field instead of the contentions field for the Async
>> Monitor Deflation protocol. I decided to go ahead and code up that
>> change and I have run it through the usual stress and Mach5 testing
>> with no issues. It's also known as v2.03 (for those for with the
>> patches) and as webrev/6-for-jdk13 (for those with webrev URLs).
>> Sorry for all the names...
>>
>> Main bug URL:
>>
>>      JDK-8153224 Monitor deflation prolong safepoints
>>      https://bugs.openjdk.java.net/browse/JDK-8153224
>>
>> The project is currently baselined on jdk-13+18.
>>
>> Here's the full webrev URL:
>>
>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/6-for-jdk13.full/
>>
>> Here's the incremental webrev URL:
>>
>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/6-for-jdk13.inc/
>>
>> I have also updated the OpenJDK wiki to reflect the CR3 changes:
>>
>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation
>>
>> This version of the patch has been thru Mach5 tier[1-8] testing on
>> Oracle's usual set of platforms. My Solaris-X64 stress kit run had
>> no issues. Kitchensink8H on product, fastdebug, and slowdebug bits
>> had no failures on Linux-X64; MacOSX fastdebug and slowdebug and
>> Solaris-X64 release had the usual "Too large time diff" complaints.
>> 12 hour Inflate2 runs on product, fastdebug and slowdebug bits on
>> Linux-X64, MacOSX and Solaris-X64 had no failures. My Linux-X64
>> stress kit is running right now.
>>
>> I've done the SPECjbb2015 baseline and CR3 runs. I need to gather
>> the results and analyze them.
>>
>> Thanks, in advance, for any questions, comments or suggestions.
>>
>> Dan
>>
>>
>> On 4/25/19 12:38 PM, Daniel D. Daugherty wrote:
>>> Greetings,
>>>
>>> I have a small but important bug fix for the Async Monitor Deflation
>>> project ready to go. It's also known as v2.02 (for those for with the
>>> patches) and as webrev/5-for-jdk13 (for those with webrev URLs). Sorry
>>> for all the names...
>>>
>>> JDK-8222295 was pushed to jdk/jdk two days ago so that baseline patch
>>> is out of our hair.
>>>
>>> Main bug URL:
>>>
>>>      JDK-8153224 Monitor deflation prolong safepoints
>>>      https://bugs.openjdk.java.net/browse/JDK-8153224
>>>
>>> The project is currently baselined on jdk-13+17.
>>>
>>> Here's the full webrev URL:
>>>
>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/5-for-jdk13.full/
>>>
>>> Here's the incremental webrev URL (JDK-8153224):
>>>
>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/5-for-jdk13.inc/
>>>
>>> I still have to update the OpenJDK wiki to reflect the CR2 changes:
>>>
>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation
>>>
>>> This version of the patch has been thru Mach5 tier[1-6] testing on
>>> Oracle's usual set of platforms. Mach5 tier[7-8] is running now.
>>> My stress kit is running on Solaris-X64 now. Kitchensink8H is running
>>> now on product, fastdebug, and slowdebug bits on Linux-X64, MacOSX
>>> and Solaris-X64. 12 hour Inflate2 runs are running now on product,
>>> fastdebug and slowdebug bits on Linux-X64, MacOSX and Solaris-X64.
>>> I'll start my my stress kit on Linux-X64 sometime on Sunday (after
>>> my jdk-13+18 stress run is done).
>>>
>>> I'll do SPECjbb2015 baseline and CR2 runs after all the stress
>>> testing is done.
>>>
>>> Thanks, in advance, for any questions, comments or suggestions.
>>>
>>> Dan
>>>
>>>
>>> On 4/19/19 11:58 AM, Daniel D. Daugherty wrote:
>>>> Greetings,
>>>>
>>>> I finally have CR1 for the Async Monitor Deflation project ready to
>>>> go. It's also known as v2.01 (for those for with the patches) and as
>>>> webrev/4-for-jdk13 (for those with webrev URLs). Sorry for all the
>>>> names...
>>>>
>>>> Main bug URL:
>>>>
>>>>      JDK-8153224 Monitor deflation prolong safepoints
>>>>      https://bugs.openjdk.java.net/browse/JDK-8153224
>>>>
>>>> Baseline bug fixes URL:
>>>>
>>>>      JDK-8222295 more baseline cleanups from Async Monitor Deflation project
>>>>      https://bugs.openjdk.java.net/browse/JDK-8222295
>>>>
>>>> The project is currently baselined on jdk-13+15.
>>>>
>>>> Here's the webrev for the latest baseline changes (JDK-8222295):
>>>>
>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/4-for-jdk13.8222295
>>>>
>>>> Here's the full webrev URL (JDK-8153224 only):
>>>>
>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/4-for-jdk13.full/
>>>>
>>>> Here's the incremental webrev URL (JDK-8153224):
>>>>
>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/4-for-jdk13.inc/
>>>>
>>>> So I'm looking for reviews for both JDK-8222295 and the latest version
>>>> of JDK-8153224...
>>>>
>>>> I still have to update the OpenJDK wiki to reflect the CR changes:
>>>>
>>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation
>>>>
>>>> This version of the patch has been thru Mach5 tier[1-3] testing on
>>>> Oracle's usual set of platforms. Mach5 tier[4-6] is running now and
>>>> Mach5 tier[78] will be run later today. My stress kit on Solaris-X64
>>>> is running now. Linux-X64 stress testing will start on Sunday. I'm
>>>> planning to do Kitchensink runs, SPECjbb2015 runs and my monitor
>>>> inflation stress tests on Linux-X64, MacOSX and Solaris-X64.
>>>>
>>>> Thanks, in advance, for any questions, comments or suggestions.
>>>>
>>>> Dan
>>>>
>>>>
>>>> On 3/24/19 9:57 AM, Daniel D. Daugherty wrote:
>>>>> Greetings,
>>>>>
>>>>> Welcome to the OpenJDK review thread for my port of Carsten's work on:
>>>>>
>>>>>      JDK-8153224 Monitor deflation prolong safepoints
>>>>>      https://bugs.openjdk.java.net/browse/JDK-8153224
>>>>>
>>>>> Here's a link to the OpenJDK wiki that describes my port:
>>>>>
>>>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation
>>>>>
>>>>> Here's the webrev URL:
>>>>>
>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/3-for-jdk13/
>>>>>
>>>>> Here's a link to Carsten's original webrev:
>>>>>
>>>>> http://cr.openjdk.java.net/~cvarming/monitor_deflate_conc/0/
>>>>>
>>>>> Earlier versions of this patch have been through several rounds of
>>>>> preliminary review. Many thanks to Carsten, Coleen, Robbin, and
>>>>> Roman for their preliminary code review comments. A very special
>>>>> thanks to Robbin and Roman for building and testing the patch in
>>>>> their own environments (including specJBB2015).
>>>>>
>>>>> This version of the patch has been thru Mach5 tier[1-8] testing on
>>>>> Oracle's usual set of platforms. Earlier versions have been run
>>>>> through my stress kit on my Linux-X64 and Solaris-X64 servers
>>>>> (product, fastdebug, slowdebug).Earlier versions have run Kitchensink
>>>>> for 12 hours on MacOSX, Linux-X64 and Solaris-X64 (product, fastdebug
>>>>> and slowdebug). Earlier versions have run my monitor inflation stress
>>>>> tests for 12 hours on MacOSX, Linux-X64 and Solaris-X64 (product,
>>>>> fastdebug and slowdebug).
>>>>>
>>>>> All of the testing done on earlier versions will be redone on the
>>>>> latest version of the patch.
>>>>>
>>>>> Thanks, in advance, for any questions, comments or suggestions.
>>>>>
>>>>> Dan
>>>>>
>>>>> P.S.
>>>>> One subtest in gc/g1/humongousObjects/TestHumongousClassLoader.java
>>>>> is currently failing in -Xcomp mode on Win* only. I've been trying
>>>>> to characterize/analyze this failure for more than a week now. At
>>>>> this point I'm convinced that Async Monitor Deflation is aggravating
>>>>> an existing bug. However, I plan to have a better handle on that
>>>>> failure before these bits are pushed to the jdk/jdk repo.
>>>>>
>>>>
>>>



More information about the hotspot-runtime-dev mailing list