RFR(L) 8153224 Monitor deflation prolong safepoints

Tue Apr 9 15:53:36 UTC 2019

Hi Carsten,

Thanks for responding to Karen's code review comments.

Karen,

I have a query for you down at the end of my reply...

More below...

On 4/5/19 11:01 PM, Carsten Varming wrote:
> Dear Karen,
>
> Please see inline answers.
>
> On Fri, Apr 5, 2019 at 4:59 PM Karen Kinnear <karen.kinnear at oracle.com 
> <mailto:karen.kinnear at oracle.com>> wrote:
>
>
>     4. In EnterI: if _owner == DEFLATER_MARKER there are guarantees
>     that 0 < _count
>     with comments that caller ensured _count <= 0
>     In ReenterI: guarantee 0 <= _count, with comment not _count < 0
>     — Am I missing something subtle here or should they be the same
>     guarantees?
>
>
> In ::enter _count is incremented when the thread is trying to acquire 
> the monitor and decremented after the monitor has been acquired. The 0 
> < _count assertion is between those two point in the code. A thread 
> acquiring a monitor and then calling wait will increment _count and 
> then decrement _count as part of acquiring the monitor, thus _count 
> can be 0 by the time the thread calls wait and when ReenterI is called.

I had a similar answer and I'm planning to tweak the comments
and the guarantees a bit in the next round of code review (CR1);
please see my reply to Karen's CR for the proposed changes.

>
>     9. install_displaced_markword_in_object
>     What happens if the cas_set_mark fails?
>     I get that today this handles the race with enter and
>     deflate_monitor_using_JT. If we remove
>     the call from enter, is the expectation that we’ve blocked all
>     others who did not set is_marked themselves?
>     If we remove the call from enter would it make sense to ensure
>     that the cas_set_mark succeeds here?
>
>
> I designed my original patch such that no thread would ever wait for 
> the the deflating thread to finish deflating a monitor. If you remove 
> install_displaced_markword_in_object from enter, then the entering 
> thread can end up busy waiting by continuously reading the monitor 
> pointer from the object mark word and then realizing that the monitor 
> is being deflated and it should retry by going back to reading the 
> object mark word. This bad behavior is completely avoided by calling 
> install_displaced_markword_in_object.

Here's the code in question:

src/hotspot/share/runtime/objectMonitor.cpp:

> bool ObjectMonitor::enter(TRAPS) {
<snip>
>   // Prevent deflation. See ObjectSynchronizer::deflate_monitor() and 
> is_busy().
>   // Ensure the object-monitor relationship remains stable while 
> there's contention.
>   const jint count = Atomic::add(1, &_count);
>   if (count <= 0 && _owner == DEFLATER_MARKER) {
>     // Async deflation in progress. Help deflater thread install
>     // the mark word (in case deflater thread is slow).
>     install_displaced_markword_in_object();
>     Self->_Stalled = 0;
>     return false;  // Caller should retry. Never mind about _count as 
> this monitor has been deflated.
>   }

Our thread (T-enter) observes that the ObjectMonitor is being deflated
by T-deflate, calls install_displaced_markword_in_object() and returns
false to the caller which causes a retry.

Restoring the header/dmw from the ObjectMonitor to the object's header
here isn't needed for correctness so it could be dropped (and would
simplify the code). Your counterpoint is if we drop the call, then
T-enter could do retry after retry if T-deflate is slow to get to its
install_displaced_markword_in_object() call.

If T-enter calls install_displaced_markword_in_object(), then T-enter
will do a single retry because the object T-enter is trying to lock
will no longer have an ObjectMonitor. Okay I finally grok it...

I think we need to clarify the comment a bit:

 >   if (count <= 0 && _owner == DEFLATER_MARKER) {
 >     // Async deflation is in progress. Attempt to restore the
 >     // header/dmw to the object's header so that we only retry once
 >     // if the deflater thread happens to be slow.
 >     install_displaced_markword_in_object();

> In my original patch no thread would ever wait for a deflating thread 
> to finish. This property got lost in FastHashCode as that function 
> evolved since I wrote my patch, but I think this property is worth 
> preserving where possible. It might even be worth looking at 
> FastHashCode to see if we can re-establish this property.

Async Monitor Deflation causes races with FastHashCode() when the target
object has an existing ObjectMonitor. Here's the base code:

>   768   } else if (mark->has_monitor()) {
>   769     monitor = mark->monitor();
>   770     temp = monitor->header();
> 771 assert(temp->is_neutral(), "invariant");
>   772     hash = temp->hash();
>   773     if (hash) {
>   774       return hash;
>   775     }
>   776     // Skip to the following code to reduce code size

The 'monitor' fetched on L769 is unstable due to Async Monitor
Deflation and can cause an incorrect hash value to be returned.
The solution is to protect the ObjectMonitor*:

>   775   } else if (mark->has_monitor()) {
> 776 ObjectMonitorHandle omh;
> 777 if (!omh.save_om_ptr(obj, mark)) {
> 778 // Lost a race with async deflation so try again.
> 779 assert(AsyncDeflateIdleMonitors, "sanity check");
> 780 goto Retry;
> 781 }
> 782 monitor = omh.om_ptr();
>   783     temp = monitor->header();
>   784     assert(temp->is_neutral(), "invariant: header=" INTPTR_FORMAT, p2i((address)temp));
>   785     hash = temp->hash();
>   786     if (hash != 0) {
>   787       return hash;
>   788     }
>   789     // Skip to the following code to reduce code size

where L776-L782 handle the protection duty and possible retry. So we
have to protect the ObjectMonitor*, but, like enter(), we could call
install_displaced_markword_in_object() when we retry which would limit
T-hash to a single retry.

ObjectSynchronizer::inflate() has a similar collision and retry issue:

> 1456     // CASE: inflated
> 1457     if (mark->has_monitor()) {
> 1458 if (!omh_p->save_om_ptr(object, mark)) {
> 1459 // Lost a race with async deflation so try again.
> 1460 assert(AsyncDeflateIdleMonitors, "sanity check");
> 1461 continue;
> 1462 }

In this situation, inflate() discovers that the object already has an
ObjectMonitor; the object may not have had one when inflate() was
called, but it has one now. That particular race predates this project.

In any case, inflate() wants to return a stable ObjectMonitor* in the
ObjectMonitorHandle, but if save_om_ptr() returns false, then inflate()
has to retry. The only reason for save_om_ptr() to return false is due
to a collision with Async Monitor Deflation. Like enter, we could call
install_displaced_markword_in_object() when we retry which would limit
inflate() to a single retry.

Okay, I've evolved from thinking we could simplify the code by dropping
install_displaced_markword_in_object() to thinking that I understand
what install_displaced_markword_in_object() brings to the party. And now
I'm proposing that we add 2 more install_displaced_markword_in_object()
calls to limit retries on two more code paths.

Karen, are you convinced that install_displaced_markword_in_object() is
useful?

Dan

>
> I hope this helps.
>
> Best,
> Carsten
>
>>     On Apr 5, 2019, at 12:10 PM, Daniel D. Daugherty
>>     <daniel.daugherty at oracle.com
>>     <mailto:daniel.daugherty at oracle.com>> wrote:
>>
>>     Filed:
>>
>>         JDK-8222034 Thread-SMR functions should be updated to remove
>>     work around
>>     https://bugs.openjdk.java.net/browse/JDK-8222034
>>
>>     Martin and Robbin, please check it out and make sure that I captured
>>     things correctly...
>>
>>     Dan
>>
>>
>>
>>     On 4/5/19 12:01 PM, Daniel D. Daugherty wrote:
>>>     On 4/5/19 8:37 AM, Doerr, Martin wrote:
>>>>     Hi everybody,
>>>>
>>>>>     I think was fixed with:
>>>>>     8202080: Introduce ordering semantics for Atomic::add/inc and
>>>>>     other RMW atomics
>>>>>     You should get a leading sync and trailing one with the
>>>>>     default conservative
>>>>>     model and thus get proper memory ordering.
>>>>>     Martin, I'm I correct?
>>>>     Exactly. Thanks for pointing this out. PPC uses the strongest
>>>>     possible ordering semantics with memory_order_conservative
>>>>     (default parameter).
>>>>     I've seen that comment about PPC in "void
>>>>     ThreadsList::inc_nested_handle_cnt()". This function could get
>>>>     replaced.
>>>
>>>     Okay so we need a new bug to update these two Thread-SMR functions:
>>>
>>>     src/hotspot/share/runtime/threadSMR.cpp:
>>>
>>>     void ThreadsList::dec_nested_handle_cnt() {
>>>       // The decrement needs to be MO_ACQ_REL. At the moment, the
>>>     Atomic::dec
>>>       // backend on PPC does not yet conform to these requirements.
>>>     Therefore
>>>       // the decrement is simulated with an Atomic::sub(1, &addr).
>>>       // Without this MO_ACQ_REL Atomic::dec simulation, the nested
>>>     SMR mechanism
>>>       // is not generally safe to use.
>>>       Atomic::sub(1, &_nested_handle_cnt);
>>>     }
>>>
>>>     void ThreadsList::inc_nested_handle_cnt() {
>>>       // The increment needs to be MO_SEQ_CST. At the moment, the
>>>     Atomic::inc
>>>       // backend on PPC does not yet conform to these requirements.
>>>     Therefore
>>>       // the increment is simulated with a load phi; cas phi + 1; loop.
>>>       // Without this MO_SEQ_CST Atomic::inc simulation, the nested
>>>     SMR mechanism
>>>       // is not generally safe to use.
>>>       intx sample = OrderAccess::load_acquire(&_nested_handle_cnt);
>>>       for (;;) {
>>>         if (Atomic::cmpxchg(sample + 1, &_nested_handle_cnt, sample)
>>>     == sample) {
>>>           return;
>>>         } else {
>>>           sample = OrderAccess::load_acquire(&_nested_handle_cnt);
>>>         }
>>>       }
>>>     }
>>>
>>>     I'll file a new bug, loop in Robbin, Erik O and Martin, and make
>>>     sure we're all in agreement. Once we decide that Thread-SMR's
>>>     functions look like, I'll adapt my Async Monitor Deflation
>>>     functions...
>>>
>>>     Dan
>>>
>>>
>>>>
>>>>     Best regards,
>>>>     Martin
>>>>
>>>>
>>>>     -----Original Message-----
>>>>     From: Robbin Ehn <robbin.ehn at oracle.com
>>>>     <mailto:robbin.ehn at oracle.com>>
>>>>     Sent: Freitag, 5. April 2019 14:07
>>>>     To: daniel.daugherty at oracle.com
>>>>     <mailto:daniel.daugherty at oracle.com>;
>>>>     hotspot-runtime-dev at openjdk.java.net
>>>>     <mailto:hotspot-runtime-dev at openjdk.java.net>; Carsten Varming
>>>>     <varming at gmail.com <mailto:varming at gmail.com>>; Roman Kennke
>>>>     <rkennke at redhat.com <mailto:rkennke at redhat.com>>; Doerr, Martin
>>>>     <martin.doerr at sap.com <mailto:martin.doerr at sap.com>>
>>>>     Subject: Re: RFR(L) 8153224 Monitor deflation prolong safepoints
>>>>
>>>>     Hi Dan,
>>>>
>>>>     (Martin there is question for you last in this email)
>>>>
>>>>     After first pass I did not find any real issues.
>>>>     Considering what you had to work with, it looks good!
>>>>
>>>>     #1
>>>>     There are some assert which are redundant (to me at least) like:
>>>>     src/hotspot/share/runtime/objectMonitor.cpp
>>>>     L445
>>>>         if (!dmw->is_marked() && dmw->hash() == 0) {
>>>>           // This dmw is neutral and has not yet started the
>>>>     restoration
>>>>           // protocol so we mark a copy of the dmw to begin the
>>>>     protocol.
>>>>           markOop marked_dmw = dmw->set_marked();
>>>>           assert(marked_dmw->is_marked() && marked_dmw->hash() == 0,
>>>>                  "sanity_check: is_marked=%d, hash=" INTPTR_FORMAT,
>>>>                  marked_dmw->is_marked(), marked_dmw->hash());
>>>>
>>>>     That assert is basically a test that set_marked worked?
>>>>
>>>>     L505
>>>>           if (Atomic::cmpxchg(Self, &_owner, DEFLATER_MARKER) ==
>>>>     DEFLATER_MARKER) {
>>>>             assert(_succ != Self, "invariant");
>>>>             assert(_owner == Self, "invariant");
>>>>
>>>>     Assert on _owner checks that our cmpxchg is not broken?
>>>>
>>>>     I think it's easier to read the code if some on the most
>>>>     obvious asserts are
>>>>     removed. Maybe comments instead.
>>>>
>>>>     #2
>>>>     Not your doing but I think we should remove TRAPS/Thread * Self
>>>>     and use
>>>>     JavaThread* instead.
>>>>     E.g. so we can change:
>>>>     void ObjectMonitor::EnterI(TRAPS) {
>>>>         Thread * const Self = THREAD;
>>>>         assert(Self->is_Java_thread(), "invariant");
>>>>         assert(((JavaThread *) Self)->thread_state() ==
>>>>     _thread_blocked, "invariant");
>>>>
>>>>     to:
>>>>
>>>>     void ObjectMonitor::EnterI(JavaThread* Self) {
>>>>         assert(Self->thread_state() == _thread_blocked, "invariant");
>>>>
>>>>     #3
>>>>     src/hotspot/share/runtime/objectMonitor.inline.hpp
>>>>        164 inline void ObjectMonitor::inc_ref_count() {
>>>>        165   // The increment needs to be MO_SEQ_CST. At the
>>>>     moment, the Atomic::inc
>>>>        166   // backend on PPC does not yet conform to these
>>>>     requirements. Therefore
>>>>        167   // the increment is simulated with a load phi; cas phi
>>>>     + 1; loop.
>>>>        168   // Without this MO_SEQ_CST Atomic::inc simulation,
>>>>     AsyncDeflateIdleMonitors
>>>>        169   // is not safe.
>>>>
>>>>     I think was fixed with:
>>>>     8202080: Introduce ordering semantics for Atomic::add/inc and
>>>>     other RMW atomics
>>>>     You should get a leading sync and trailing one with the default
>>>>     conservative
>>>>     model and thus get proper memory ordering.
>>>>     Martin, I'm I correct?
>>>>
>>>>     Thanks, Robbin
>>>>
>>>>     On 3/24/19 2:57 PM, Daniel D. Daugherty wrote:
>>>>>     Greetings,
>>>>>
>>>>>     Welcome to the OpenJDK review thread for my port of Carsten's
>>>>>     work on:
>>>>>
>>>>>           JDK-8153224 Monitor deflation prolong safepoints
>>>>>     https://bugs.openjdk.java.net/browse/JDK-8153224
>>>>>
>>>>>     Here's a link to the OpenJDK wiki that describes my port:
>>>>>
>>>>>     https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation
>>>>>
>>>>>     Here's the webrev URL:
>>>>>
>>>>>     http://cr.openjdk.java.net/~dcubed/8153224-webrev/3-for-jdk13/
>>>>>
>>>>>     Here's a link to Carsten's original webrev:
>>>>>
>>>>>     http://cr.openjdk.java.net/~cvarming/monitor_deflate_conc/0/
>>>>>
>>>>>     Earlier versions of this patch have been through several rounds of
>>>>>     preliminary review. Many thanks to Carsten, Coleen, Robbin, and
>>>>>     Roman for their preliminary code review comments. A very special
>>>>>     thanks to Robbin and Roman for building and testing the patch in
>>>>>     their own environments (including specJBB2015).
>>>>>
>>>>>     This version of the patch has been thru Mach5 tier[1-8] testing on
>>>>>     Oracle's usual set of platforms. Earlier versions have been run
>>>>>     through my stress kit on my Linux-X64 and Solaris-X64 servers
>>>>>     (product, fastdebug, slowdebug).Earlier versions have run
>>>>>     Kitchensink
>>>>>     for 12 hours on MacOSX, Linux-X64 and Solaris-X64 (product,
>>>>>     fastdebug
>>>>>     and slowdebug). Earlier versions have run my monitor inflation
>>>>>     stress
>>>>>     tests for 12 hours on MacOSX, Linux-X64 and Solaris-X64 (product,
>>>>>     fastdebug and slowdebug).
>>>>>
>>>>>     All of the testing done on earlier versions will be redone on the
>>>>>     latest version of the patch.
>>>>>
>>>>>     Thanks, in advance, for any questions, comments or suggestions.
>>>>>
>>>>>     Dan
>>>>>
>>>>>     P.S.
>>>>>     One subtest in
>>>>>     gc/g1/humongousObjects/TestHumongousClassLoader.java
>>>>>     is currently failing in -Xcomp mode on Win* only. I've been trying
>>>>>     to characterize/analyze this failure for more than a week now. At
>>>>>     this point I'm convinced that Async Monitor Deflation is
>>>>>     aggravating
>>>>>     an existing bug. However, I plan to have a better handle on that
>>>>>     failure before these bits are pushed to the jdk/jdk repo.
>>>
>>>
>>
>