RFR(L) 8153224 Monitor deflation prolong safepoints

Mon Apr 15 19:25:01 UTC 2019

On 4/15/19 3:19 PM, Karen Kinnear wrote:
> Dan,
>
> Thank you for delving into this
>
>> On Apr 9, 2019, at 11:53 AM, Daniel D. Daugherty 
>> <daniel.daugherty at oracle.com <mailto:daniel.daugherty at oracle.com>> wrote:
>>
>> Hi Carsten,
>>
>> Thanks for responding to Karen's code review comments.
>>
>> Karen,
>>
>> I have a query for you down at the end of my reply...
>>
>> More below...
>>
>> On 4/5/19 11:01 PM, Carsten Varming wrote:
>>> Dear Karen,
>>>
>>> Please see inline answers.
>>>
>>> On Fri, Apr 5, 2019 at 4:59 PM Karen Kinnear 
>>> <karen.kinnear at oracle.com <mailto:karen.kinnear at oracle.com>> wrote:
>>>
>>>
>>>     4. In EnterI: if _owner == DEFLATER_MARKER there are guarantees
>>>     that 0 < _count
>>>     with comments that caller ensured _count <= 0
>>>     In ReenterI: guarantee 0 <= _count, with comment not _count < 0
>>>     — Am I missing something subtle here or should they be the same
>>>     guarantees?
>>>
>>>
>>> In ::enter _count is incremented when the thread is trying to 
>>> acquire the monitor and decremented after the monitor has been 
>>> acquired. The 0 < _count assertion is between those two point in the 
>>> code. A thread acquiring a monitor and then calling wait will 
>>> increment _count and then decrement _count as part of acquiring the 
>>> monitor, thus _count can be 0 by the time the thread calls wait and 
>>> when ReenterI is called.
>>
>> I had a similar answer and I'm planning to tweak the comments
>> and the guarantees a bit in the next round of code review (CR1);
>> please see my reply to Karen's CR for the proposed changes.
> Thanks again.
>>
>>
>>>
>>>     9. install_displaced_markword_in_object
>>>     What happens if the cas_set_mark fails?
>>>     I get that today this handles the race with enter and
>>>     deflate_monitor_using_JT. If we remove
>>>     the call from enter, is the expectation that we’ve blocked all
>>>     others who did not set is_marked themselves?
>>>     If we remove the call from enter would it make sense to ensure
>>>     that the cas_set_mark succeeds here?
>>>
>>>
>>> I designed my original patch such that no thread would ever wait for 
>>> the the deflating thread to finish deflating a monitor. If you 
>>> remove install_displaced_markword_in_object from enter, then the 
>>> entering thread can end up busy waiting by continuously reading the 
>>> monitor pointer from the object mark word and then realizing that 
>>> the monitor is being deflated and it should retry by going back to 
>>> reading the object mark word. This bad behavior is completely 
>>> avoided by calling install_displaced_markword_in_object.
>>
>> Here's the code in question:
>>
>> src/hotspot/share/runtime/objectMonitor.cpp:
>>
>>> bool ObjectMonitor::enter(TRAPS) {
>> <snip>
>>>   // Prevent deflation. See ObjectSynchronizer::deflate_monitor() 
>>> and is_busy().
>>>   // Ensure the object-monitor relationship remains stable while 
>>> there's contention.
>>>   const jint count = Atomic::add(1, &_count);
>>>   if (count <= 0 && _owner == DEFLATER_MARKER) {
>>>     // Async deflation in progress. Help deflater thread install
>>>     // the mark word (in case deflater thread is slow).
>>>     install_displaced_markword_in_object();
>>>     Self->_Stalled = 0;
>>>     return false;  // Caller should retry. Never mind about _count 
>>> as this monitor has been deflated.
>>>   }
>>
>> Our thread (T-enter) observes that the ObjectMonitor is being deflated
>> by T-deflate, calls install_displaced_markword_in_object() and returns
>> false to the caller which causes a retry.
>>
>> Restoring the header/dmw from the ObjectMonitor to the object's header
>> here isn't needed for correctness so it could be dropped (and would
>> simplify the code). Your counterpoint is if we drop the call, then
>> T-enter could do retry after retry if T-deflate is slow to get to its
>> install_displaced_markword_in_object() call.
>>
>> If T-enter calls install_displaced_markword_in_object(), then T-enter
>> will do a single retry because the object T-enter is trying to lock
>> will no longer have an ObjectMonitor. Okay I finally grok it...
>>
>> I think we need to clarify the comment a bit:
>>
>> >   if (count <= 0 && _owner == DEFLATER_MARKER) {
>> >     // Async deflation is in progress. Attempt to restore the
>> >     // header/dmw to the object's header so that we only retry once
>> >     // if the deflater thread happens to be slow.
>> >     install_displaced_markword_in_object();
>>
>>
>>> In my original patch no thread would ever wait for a deflating 
>>> thread to finish. This property got lost in FastHashCode as that 
>>> function evolved since I wrote my patch, but I think this property 
>>> is worth preserving where possible. It might even be worth looking 
>>> at FastHashCode to see if we can re-establish this property.
>>
>> Async Monitor Deflation causes races with FastHashCode() when the target
>> object has an existing ObjectMonitor. Here's the base code:
>>
>>>   768   } else if (mark->has_monitor()) {
>>>   769     monitor = mark->monitor();
>>>   770     temp = monitor->header();
>>> 771 assert(temp->is_neutral(), "invariant");
>>>   772     hash = temp->hash();
>>>   773     if (hash) {
>>>   774       return hash;
>>>   775     }
>>>   776     // Skip to the following code to reduce code size
>>
>> The 'monitor' fetched on L769 is unstable due to Async Monitor
>> Deflation and can cause an incorrect hash value to be returned.
>> The solution is to protect the ObjectMonitor*:
>>
>>>   775   } else if (mark->has_monitor()) {
>>> 776 ObjectMonitorHandle omh;
>>> 777 if (!omh.save_om_ptr(obj, mark)) {
>>> 778 // Lost a race with async deflation so try again.
>>> 779 assert(AsyncDeflateIdleMonitors, "sanity check");
>>> 780 goto Retry;
>>> 781 }
>>> 782 monitor = omh.om_ptr();
>>>   783     temp = monitor->header();
>>>   784     assert(temp->is_neutral(), "invariant: header=" INTPTR_FORMAT, p2i((address)temp));
>>>   785     hash = temp->hash();
>>>   786     if (hash != 0) {
>>>   787       return hash;
>>>   788     }
>>>   789     // Skip to the following code to reduce code size
>>
>> where L776-L782 handle the protection duty and possible retry. So we
>> have to protect the ObjectMonitor*, but, like enter(), we could call
>> install_displaced_markword_in_object() when we retry which would limit
>> T-hash to a single retry.
>
>>
>> ObjectSynchronizer::inflate() has a similar collision and retry issue:
>>
>>> 1456     // CASE: inflated
>>> 1457     if (mark->has_monitor()) {
>>> 1458 if (!omh_p->save_om_ptr(object, mark)) {
>>> 1459 // Lost a race with async deflation so try again.
>>> 1460 assert(AsyncDeflateIdleMonitors, "sanity check");
>>> 1461 continue;
>>> 1462 }
>>
>> In this situation, inflate() discovers that the object already has an
>> ObjectMonitor; the object may not have had one when inflate() was
>> called, but it has one now. That particular race predates this project.
>>
>> In any case, inflate() wants to return a stable ObjectMonitor* in the
>> ObjectMonitorHandle, but if save_om_ptr() returns false, then inflate()
>> has to retry. The only reason for save_om_ptr() to return false is due
>> to a collision with Async Monitor Deflation. Like enter, we could call
>> install_displaced_markword_in_object() when we retry which would limit
>> inflate() to a single retry.
>>
>>
>> Okay, I've evolved from thinking we could simplify the code by dropping
>> install_displaced_markword_in_object() to thinking that I understand
>> what install_displaced_markword_in_object() brings to the party. And now
>> I'm proposing that we add 2 more install_displaced_markword_in_object()
>> calls to limit retries on two more code paths.
>>
>> Karen, are you convinced that install_displaced_markword_in_object() is
>> useful?
>
> Dan - between Carsten’s and your explanation - I get that
> 1) there is value in being able to make forward progress on the object 
> itself sooner, once we know
> that DEFLATE_MARKER is trying to deflate it
> 2) since an enter() can CAS an _owner after DEFLATER_MARKER set but 
> before _count -MAX_JINT,
> then anyone else waiting for deflation to finish could wait a while 
> until they actually get a turn
> 3) If the other callers were to use install_dmw… then they could have 
> shorter retry cycles (although
> with the enter, and now potentially other equivalent callers - more 
> recontenders).
>
> So I see the benefit in others trying this approach as well - to free 
> the object from the not-currently-used
> inflated monitor and retry.

Glad we're all on the same page.

> You may have already sent out a new webrev with that approach - I have 
> not worked through in my
> head how that changes the details, since there are now more players, 
> with different gates.

I have not sent out CR1 yet. I'm still testing. I had some new test failures
on Friday. It looks like adding more install_displaced_markword_in_object()
calls has revealed yet another race. I'm testing a fix now. Once I have 
stable
bits again, I'll start the CR1 cycle...

Dan

>
> thanks for walking us through this,
> Karen
>> Dan
>>
>>
>>
>>>
>>> I hope this helps.
>>>
>>> Best,
>>> Carsten
>>>
>>>>     On Apr 5, 2019, at 12:10 PM, Daniel D. Daugherty
>>>>     <daniel.daugherty at oracle.com
>>>>     <mailto:daniel.daugherty at oracle.com>> wrote:
>>>>
>>>>     Filed:
>>>>
>>>>         JDK-8222034 Thread-SMR functions should be updated to
>>>>     remove work around
>>>>     https://bugs.openjdk.java.net/browse/JDK-8222034
>>>>
>>>>     Martin and Robbin, please check it out and make sure that I
>>>>     captured
>>>>     things correctly...
>>>>
>>>>     Dan
>>>>
>>>>
>>>>
>>>>     On 4/5/19 12:01 PM, Daniel D. Daugherty wrote:
>>>>>     On 4/5/19 8:37 AM, Doerr, Martin wrote:
>>>>>>     Hi everybody,
>>>>>>
>>>>>>>     I think was fixed with:
>>>>>>>     8202080: Introduce ordering semantics for Atomic::add/inc
>>>>>>>     and other RMW atomics
>>>>>>>     You should get a leading sync and trailing one with the
>>>>>>>     default conservative
>>>>>>>     model and thus get proper memory ordering.
>>>>>>>     Martin, I'm I correct?
>>>>>>     Exactly. Thanks for pointing this out. PPC uses the strongest
>>>>>>     possible ordering semantics with memory_order_conservative
>>>>>>     (default parameter).
>>>>>>     I've seen that comment about PPC in "void
>>>>>>     ThreadsList::inc_nested_handle_cnt()". This function could
>>>>>>     get replaced.
>>>>>
>>>>>     Okay so we need a new bug to update these two Thread-SMR
>>>>>     functions:
>>>>>
>>>>>     src/hotspot/share/runtime/threadSMR.cpp:
>>>>>
>>>>>     void ThreadsList::dec_nested_handle_cnt() {
>>>>>       // The decrement needs to be MO_ACQ_REL. At the moment, the
>>>>>     Atomic::dec
>>>>>       // backend on PPC does not yet conform to these
>>>>>     requirements. Therefore
>>>>>       // the decrement is simulated with an Atomic::sub(1, &addr).
>>>>>       // Without this MO_ACQ_REL Atomic::dec simulation, the
>>>>>     nested SMR mechanism
>>>>>       // is not generally safe to use.
>>>>>       Atomic::sub(1, &_nested_handle_cnt);
>>>>>     }
>>>>>
>>>>>     void ThreadsList::inc_nested_handle_cnt() {
>>>>>       // The increment needs to be MO_SEQ_CST. At the moment, the
>>>>>     Atomic::inc
>>>>>       // backend on PPC does not yet conform to these
>>>>>     requirements. Therefore
>>>>>       // the increment is simulated with a load phi; cas phi + 1;
>>>>>     loop.
>>>>>       // Without this MO_SEQ_CST Atomic::inc simulation, the
>>>>>     nested SMR mechanism
>>>>>       // is not generally safe to use.
>>>>>       intx sample = OrderAccess::load_acquire(&_nested_handle_cnt);
>>>>>       for (;;) {
>>>>>         if (Atomic::cmpxchg(sample + 1, &_nested_handle_cnt,
>>>>>     sample) == sample) {
>>>>>           return;
>>>>>         } else {
>>>>>           sample = OrderAccess::load_acquire(&_nested_handle_cnt);
>>>>>         }
>>>>>       }
>>>>>     }
>>>>>
>>>>>     I'll file a new bug, loop in Robbin, Erik O and Martin, and make
>>>>>     sure we're all in agreement. Once we decide that Thread-SMR's
>>>>>     functions look like, I'll adapt my Async Monitor Deflation
>>>>>     functions...
>>>>>
>>>>>     Dan
>>>>>
>>>>>
>>>>>>
>>>>>>     Best regards,
>>>>>>     Martin
>>>>>>
>>>>>>
>>>>>>     -----Original Message-----
>>>>>>     From: Robbin Ehn <robbin.ehn at oracle.com
>>>>>>     <mailto:robbin.ehn at oracle.com>>
>>>>>>     Sent: Freitag, 5. April 2019 14:07
>>>>>>     To: daniel.daugherty at oracle.com
>>>>>>     <mailto:daniel.daugherty at oracle.com>;
>>>>>>     hotspot-runtime-dev at openjdk.java.net
>>>>>>     <mailto:hotspot-runtime-dev at openjdk.java.net>; Carsten
>>>>>>     Varming <varming at gmail.com <mailto:varming at gmail.com>>; Roman
>>>>>>     Kennke <rkennke at redhat.com <mailto:rkennke at redhat.com>>;
>>>>>>     Doerr, Martin <martin.doerr at sap.com
>>>>>>     <mailto:martin.doerr at sap.com>>
>>>>>>     Subject: Re: RFR(L) 8153224 Monitor deflation prolong safepoints
>>>>>>
>>>>>>     Hi Dan,
>>>>>>
>>>>>>     (Martin there is question for you last in this email)
>>>>>>
>>>>>>     After first pass I did not find any real issues.
>>>>>>     Considering what you had to work with, it looks good!
>>>>>>
>>>>>>     #1
>>>>>>     There are some assert which are redundant (to me at least) like:
>>>>>>     src/hotspot/share/runtime/objectMonitor.cpp
>>>>>>     L445
>>>>>>         if (!dmw->is_marked() && dmw->hash() == 0) {
>>>>>>           // This dmw is neutral and has not yet started the
>>>>>>     restoration
>>>>>>           // protocol so we mark a copy of the dmw to begin the
>>>>>>     protocol.
>>>>>>           markOop marked_dmw = dmw->set_marked();
>>>>>>     assert(marked_dmw->is_marked() && marked_dmw->hash() == 0,
>>>>>>                  "sanity_check: is_marked=%d, hash=" INTPTR_FORMAT,
>>>>>>     marked_dmw->is_marked(), marked_dmw->hash());
>>>>>>
>>>>>>     That assert is basically a test that set_marked worked?
>>>>>>
>>>>>>     L505
>>>>>>           if (Atomic::cmpxchg(Self, &_owner, DEFLATER_MARKER) ==
>>>>>>     DEFLATER_MARKER) {
>>>>>>             assert(_succ != Self, "invariant");
>>>>>>             assert(_owner == Self, "invariant");
>>>>>>
>>>>>>     Assert on _owner checks that our cmpxchg is not broken?
>>>>>>
>>>>>>     I think it's easier to read the code if some on the most
>>>>>>     obvious asserts are
>>>>>>     removed. Maybe comments instead.
>>>>>>
>>>>>>     #2
>>>>>>     Not your doing but I think we should remove TRAPS/Thread *
>>>>>>     Self and use
>>>>>>     JavaThread* instead.
>>>>>>     E.g. so we can change:
>>>>>>     void ObjectMonitor::EnterI(TRAPS) {
>>>>>>         Thread * const Self = THREAD;
>>>>>>     assert(Self->is_Java_thread(), "invariant");
>>>>>>         assert(((JavaThread *) Self)->thread_state() ==
>>>>>>     _thread_blocked, "invariant");
>>>>>>
>>>>>>     to:
>>>>>>
>>>>>>     void ObjectMonitor::EnterI(JavaThread* Self) {
>>>>>>     assert(Self->thread_state() == _thread_blocked, "invariant");
>>>>>>
>>>>>>     #3
>>>>>>     src/hotspot/share/runtime/objectMonitor.inline.hpp
>>>>>>        164 inline void ObjectMonitor::inc_ref_count() {
>>>>>>        165   // The increment needs to be MO_SEQ_CST. At the
>>>>>>     moment, the Atomic::inc
>>>>>>        166   // backend on PPC does not yet conform to these
>>>>>>     requirements. Therefore
>>>>>>        167   // the increment is simulated with a load phi; cas
>>>>>>     phi + 1; loop.
>>>>>>        168   // Without this MO_SEQ_CST Atomic::inc simulation,
>>>>>>     AsyncDeflateIdleMonitors
>>>>>>        169   // is not safe.
>>>>>>
>>>>>>     I think was fixed with:
>>>>>>     8202080: Introduce ordering semantics for Atomic::add/inc and
>>>>>>     other RMW atomics
>>>>>>     You should get a leading sync and trailing one with the
>>>>>>     default conservative
>>>>>>     model and thus get proper memory ordering.
>>>>>>     Martin, I'm I correct?
>>>>>>
>>>>>>     Thanks, Robbin
>>>>>>
>>>>>>     On 3/24/19 2:57 PM, Daniel D. Daugherty wrote:
>>>>>>>     Greetings,
>>>>>>>
>>>>>>>     Welcome to the OpenJDK review thread for my port of
>>>>>>>     Carsten's work on:
>>>>>>>
>>>>>>>           JDK-8153224 Monitor deflation prolong safepoints
>>>>>>>     https://bugs.openjdk.java.net/browse/JDK-8153224
>>>>>>>
>>>>>>>     Here's a link to the OpenJDK wiki that describes my port:
>>>>>>>
>>>>>>>     https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation
>>>>>>>
>>>>>>>     Here's the webrev URL:
>>>>>>>
>>>>>>>     http://cr.openjdk.java.net/~dcubed/8153224-webrev/3-for-jdk13/
>>>>>>>
>>>>>>>     Here's a link to Carsten's original webrev:
>>>>>>>
>>>>>>>     http://cr.openjdk.java.net/~cvarming/monitor_deflate_conc/0/
>>>>>>>
>>>>>>>     Earlier versions of this patch have been through several
>>>>>>>     rounds of
>>>>>>>     preliminary review. Many thanks to Carsten, Coleen, Robbin, and
>>>>>>>     Roman for their preliminary code review comments. A very special
>>>>>>>     thanks to Robbin and Roman for building and testing the patch in
>>>>>>>     their own environments (including specJBB2015).
>>>>>>>
>>>>>>>     This version of the patch has been thru Mach5 tier[1-8]
>>>>>>>     testing on
>>>>>>>     Oracle's usual set of platforms. Earlier versions have been run
>>>>>>>     through my stress kit on my Linux-X64 and Solaris-X64 servers
>>>>>>>     (product, fastdebug, slowdebug).Earlier versions have run
>>>>>>>     Kitchensink
>>>>>>>     for 12 hours on MacOSX, Linux-X64 and Solaris-X64 (product,
>>>>>>>     fastdebug
>>>>>>>     and slowdebug). Earlier versions have run my monitor
>>>>>>>     inflation stress
>>>>>>>     tests for 12 hours on MacOSX, Linux-X64 and Solaris-X64
>>>>>>>     (product,
>>>>>>>     fastdebug and slowdebug).
>>>>>>>
>>>>>>>     All of the testing done on earlier versions will be redone
>>>>>>>     on the
>>>>>>>     latest version of the patch.
>>>>>>>
>>>>>>>     Thanks, in advance, for any questions, comments or suggestions.
>>>>>>>
>>>>>>>     Dan
>>>>>>>
>>>>>>>     P.S.
>>>>>>>     One subtest in
>>>>>>>     gc/g1/humongousObjects/TestHumongousClassLoader.java
>>>>>>>     is currently failing in -Xcomp mode on Win* only. I've been
>>>>>>>     trying
>>>>>>>     to characterize/analyze this failure for more than a week
>>>>>>>     now. At
>>>>>>>     this point I'm convinced that Async Monitor Deflation is
>>>>>>>     aggravating
>>>>>>>     an existing bug. However, I plan to have a better handle on that
>>>>>>>     failure before these bits are pushed to the jdk/jdk repo.
>>>>>
>>>>>
>>>>
>>>
>>
>