RFR(L) 8153224 Monitor deflation prolong safepoints

Mon Apr 15 16:17:08 UTC 2019

Dan,
Sorry to be so slow to get back to you,

> On Apr 8, 2019, at 9:04 PM, Daniel D. Daugherty <daniel.daugherty at oracle.com> wrote:
> 
> On 4/5/19 4:59 PM, Karen Kinnear wrote:
>> Dan,
>> 
>> Some more minor comments from reading the code:
>  
> Thanks for the additional comments. I'm gathering changes for the next
> round of code review (CR1) so these will be resolved in that round...
> 
> More below...
> 
> 
>> 1. Could you add comments to markOop.hpp about
>> the use in the displaced_mark_word of is_marked to prevent any users of is_marked
>> here from needing to have that information saved/restored?
>  
> I _think_ I know what you're looking for here... Perhaps this:
> 
> src/hotspot/share/oops/markOop.hpp:
>   // ObjectMonitor::install_displaced_markword_in_object() uses
>   // is_marked() on ObjectMonitor::_header as part of the restoration
>   // protocol for an object's header. In this usage, the mark bit is
>   // only ever set (and cleared) on the ObjectMonitor::_header field.
>   bool is_marked()   const {
>     return (mask_bits(value(), lock_mask_in_place) == marked_value);
>   }
Yes - thank you. Potentially prevents future overlaps.
> 
> 
>> 2. In objectMonitor.hpp
>>   in is_busy you clarify the difference in use between _count (which I think you may be changing
>> to _contended) and _ref_count. Could you possibly also comment where you declare them?
>  
> I'll do the rename of _count -> _contentions in a subtask of 8153224
> like the other cleanups of the monitor subsystem.
> 
> Here's the comment in question:
> 
> src/hotspot/share/runtime/objectMonitor.hpp:
>   intptr_t is_busy() const {
>     // TODO-FIXME: merge _count and _waiters.
>     // TODO-FIXME: assert _owner == null implies _recursions = 0
>     // TODO-FIXME: assert _WaitSet != null implies _count > 0
>     // We do not include _ref_count in the is_busy() check because
>     // _ref_count is for indicating that the ObjectMonitor* is in
>     // use which is orthogonal to whether the ObjectMonitor itself
>     // is in use for a locking operation.
>     return _count|_waiters|intptr_t(_owner)|intptr_t(_cxq)|intptr_t(_EntryList);
>   }
> 
> I don't think this comment clarifies _count vs. _ref_count.
> I added the last four lines of the comment and their purpose is to
> describe why _ref_count isn't used by is_busy(). The TODO-FIXME lines
> need to be revisited since (at least) the third one is wrong.
> 
> Here's the existing comment for _ref_count:
> 
>   volatile jint _ref_count;         // ref count for ObjectMonitor*
> 
> Here's the existing comment for _count:
> 
>   volatile jint  _count;            // reference count to prevent reclamation/deflation
>                                     // at stop-the-world time. See ObjectSynchronizer::deflate_monitor().
>                                     // _count is approximately |_WaitSet| + |_EntryList|
> 
> And here's what I proposed to change it to in my reply to your design
> review notes:
> 
>   volatile jint  _contentions;      // Number of active contentions in enter(). It is used by is_busy()
>                                     // along with other fields to determine if an ObjectMonitor can be
>                                     // deflated. See ObjectSynchronizer::deflate_monitor().
> 
> I think we're good here with the proposed change of comment (and the
> rename) for the _contentions field along with existing comment for
> _ref_count and the existing comment for is_busy(). I may delete the
> third TODO-FIXME line as part of the next cleanup.
Thank you for the rename and comment
> 
> 
>> 3. clear_using_JT: would it make sense to have an assertion that  _owner is either null or DEFLATER_MARKER?
>  
> We could add something like:
> 
>   assert(_owner == NULL ||
>          (AsyncDeflateIdleMonitors && _owner == DEFLATER_MARKER),
>          "Fatal logic error in ObjectMonitor owner!");
> 
> and that will catch any races in async monitor deflation where the
> _owner field is set to a monitor owner value (stack addr or thread*).
> For monitor deflation at a safepoint, the non-NULL _owner field is
> caught in clear() (which calls clear_using_JT()).
That covers my concern.
> 
> 
>> 4. In EnterI: if _owner == DEFLATER_MARKER there are guarantees that 0 < _count
>> with comments that caller ensured _count <= 0
>> In ReenterI: guarantee 0 <= _count, with comment not _count < 0
>> — Am I missing something subtle here or should they be the same guarantees?
>  
> Here's the code in question:
> 
> src/hotspot/share/runtime/objectMonitor.cpp:
> 
> void ObjectMonitor::EnterI(TRAPS) {
> <snip>
>   if (_owner == DEFLATER_MARKER) {
>     guarantee(0 < _count, "_owner == DEFLATER_MARKER && _count <= 0 should have been handled by the caller");
>     // Deflater thread tried to lock this monitor, but it failed to make _count negative and gave up.
> 
> void ObjectMonitor::ReenterI(Thread * Self, ObjectWaiter * SelfNode) {
> <snip>
>     if (_owner == DEFLATER_MARKER) {
>       guarantee(0 <= _count, "Impossible: _owner == DEFLATER_MARKER && _count < 0, monitor must not be owned by deflater thread here");
> 
> 
> Reading these two guarantee() calls always throws me off stride
> because I would have written them like this:
> 
>     guarantee(_count > 0, "_owner == DEFLATER_MARKER && _count <= 0 should have been handled by the caller");
> 
> and
> 
>       guarantee(_count >= 0, "Impossible: _owner == DEFLATER_MARKER && _count < 0, monitor must not be owned by deflater thread here");
> 
> When rewritten like the above, you have:
> 
>     "_count > 0" ... _count <= 0
> 
> and:
> 
>     "_count >= 0" ... "_count < 0"
> 
> which is easier for my brain to read... okay... enough sidebar...
> 
> Short answer: No the guarantees should not be the same.
> 
> Longer answer: EnterI() is called by enter() after enter() has
> incremented the _count field to indicate the contended state of
> things. So in EnterI(), "_count > 0" is the right check.
> ReenterI() is called after wait() has returned (notified or
> timedout), and the _count field is not used on reentry ops so
> "_count >= 0" is the right check.
Thank you for walking this through.
Wait also can cause a call to enter() -> EnterI(), but in that case _waiters was set while
the monitor was still owned, so we should never get to the logic where EnterI sees
DEFLATER_MARKER.
> 
> I'm going to tweak the ObjectMonitor::EnterI() code like this (yes,
> there are two places in EnterI() that do this):
> 
>     L501:   if (_owner == DEFLATER_MARKER) {
>               // The deflation protocol finished the first part (setting _owner),
>               // but it failed the second part (making _count negative) and bailed.
>               // Because we're called from enter() we have at least one contention.
>               guarantee(count > 0, "_owner == DEFLATER_MARKER && _count <= 0 should have been handled by the caller");
>     L504:     // Try to acquire monitor.
>     L505:     if (Atomic::cmpxchg(Self, &_owner, DEFLATER_MARKER) == DEFLATER_MARKER) {
> 
>     L629:     if (_owner == DEFLATER_MARKER) {
>                 // The deflation protocol finished the first part (setting _owner),
>                 // but it failed the second part (making _count negative) and bailed.
>                 // Because we're called from enter() we have at least one contention.
>                 guarantee(count> 0 , "_owner == DEFLATER_MARKER && _count <= 0 should have been handled by the caller");
>     L632:       if (Atomic::cmpxchg(Self, &_owner, DEFLATER_MARKER) == DEFLATER_MARKER) {
> 
> And I'm going to tweak the ReenterI() code like this:
> 
>     L759:     if (_owner == DEFLATER_MARKER) {
>                 // The deflation protocol finished the first part (setting _owner),
>                 // but it will observe _waiters != 0 and will bail out. Because we're
>                 // called from wait() we may or may not have any contentions.
>                 guarantee(count >= 0, "Impossible: _owner == DEFLATER_MARKER && _count < 0 should have been handled by the caller");
>     L761:       if (Atomic::cmpxchg(Self, &_owner, DEFLATER_MARKER) == DEFLATER_MARKER) {
> 
> 
> You didn't ask this, but it is okay that _count is only used to track
> contentions in enter()/EnterI() and is not used to track contentions
> in wait()/ReenterI(). For the wait()/ReenterI() code path, _waiters is
> used by is_busy() to observe the busy state for an ObjectMonitor that
> is being wait()'ed for. The _waiters field is decremented after a
> waiter has returned from ReenterI() so the _owner field takes over
> answering the is_busy() question…
Yes - that was my confusion and I had not walked it through carefully enough.
And thank you - the guarantees are easier to read this way.
> 
> 
>> 5. I could use a little help with allocation state transitions, 
>> e.g. in deflate_monitor_list_using_JT
>>   you see is_new with object set so you mark it as old so next deflation will check it
>  
> Here's the code in question:
> 
> src/hotspot/share/runtime/synchronizer.cpp:
> 
> int ObjectSynchronizer::deflate_monitor_list_using_JT(ObjectMonitor** listHeadp,
>                                                       ObjectMonitor** freeHeadp,
>                                                       ObjectMonitor** freeTailp,
>                                                       ObjectMonitor** savedMidInUsep) {
> <snip>
>     // Only try to deflate if there is an associated Java object and if
>     // mid is old (is not newly allocated and is not newly freed).
>     if (mid->object() != NULL && mid->is_old() &&
>         deflate_monitor_using_JT(mid, freeHeadp, freeTailp)) {
>       // Deflation succeeded so update the in-use list.
> <snip>
>     } else {
>       // mid is considered in-use if it does not have an associated
>       // Java object or mid is not old or deflation did not succeed.
>       // A mid->is_new() node can be seen here when it is freshly returned
>       // by omAlloc() (and skips the deflation code path).
>       // A mid->is_old() node can be seen here when deflation failed.
>       // A mid->is_free() node can be seen here when a fresh node from
>       // omAlloc() is released by omRelease() due to losing the race
>       // in inflate().
> 
>       if (mid->object() != NULL && mid->is_new()) {
>         // mid has an associated Java object and has now been seen
>         // as newly allocated so mark it as "old".
>         mid->set_allocation_state(ObjectMonitor::Old);
>       }
> 
>>   - why do you set it to old here rather than in inflate once we set values?
> 
> Inflation is used in quite a few places. If we marked the
> ObjectMonitor as "Old" in inflate(), then that would make the
> ObjectMonitor available for deflation by deflate_monitor_using_JT()
> earlier:
> 
> src/hotspot/share/runtime/synchronizer.cpp:
>> bool ObjectSynchronizer::deflate_monitor_using_JT(ObjectMonitor* mid,
>>                                                   ObjectMonitor** freeHeadp,
>>                                                   ObjectMonitor** freeTailp) {
>>   assert(AsyncDeflateIdleMonitors, "sanity check");
>>   assert(Thread::current()->is_Java_thread(), "precondition");
>>   // A newly allocated ObjectMonitor should not be seen here so we
>>   // avoid an endless inflate/deflate cycle.
>>   assert(mid->is_old(), "precondition");
> 
> So the idea behind only deflating ObjectMonitors that have reached
> allocation state "Old" is to prevent "an endless inflate/deflate cycle".
> Here's the relevant section from Carsten's JEP:
> 
>> To avoid endless inflation / deflation cycles in the prototype, monitor
>> deflation is only attempted the second time a monitor is seen by the
>> thread marking monitors as deflatable: If the thread (the only thread
>> marking monitors as deflatable; might be service thread or some GC
>> related thread or even a dedicated thread) sees a monitor in state New,
>> then the thread marks the monitor as Old and moves on. So there is
>> little interaction between a thread inflating a lock to a monitor and
>> the deflating thread, the inflating thread just has to make sure the
>> monitor is marked New and this marker is published using appropriate
>> barriers.
> 
> There isn't an explicit example in the JEP of what Carsten was thinking
> of with "an endless inflate/deflate cycle". I didn't try to think of
> such an example for the OpenJDK wiki either. I simple wrote:
> 
>> ObjectMonitor has a new allocation_state field that supports three
>> states: 'Free', 'New', 'Old'. Async Monitor Deflation is only applied
>> to ObjectMonitors that have reached the 'Old' state. When the Async
>> Monitor Deflation code sees an ObjectMonitor in the 'New' state, it
>> is changed to the 'Old' state, but is not deflated. This prevents a
>> newly allocated ObjectMonitor from being immediately deflated which
>> could cause an inflation<->deflation oscillation.
> 
> So let's think about what might happen if an ObjectMonitor is marked
> as "Old" in inflate(). Here's an example use of inflate() in the
> "slow enter" code path:
> 
> src/hotspot/share/runtime/synchronizer.cpp:
> > void ObjectSynchronizer::slow_enter(Handle obj, BasicLock* lock, TRAPS) {
> <snip>
> base<    inflate(THREAD, obj(), inflate_cause_monitor_enter)->enter(THREAD);
> 
> new>     ObjectMonitorHandle omh;
> new>     inflate(&omh, THREAD, obj(), inflate_cause_monitor_enter);
> new>     do_loop = !omh.om_ptr()->enter(THREAD);
> 
> In the "base" code, we took the return from inflate() and used it to call
> ObjectMonitor::enter(). If we never changed that bit of code and inflate()
> marked the ObjectMonitor as "Old", then deflate_monitor_using_JT() could
> async deflate the ObjectMonitor while we were trying to call enter() on
> it... Boom! So we might think that holding off marking an ObjectMonitor
> as "Old" can save us... and it can, but not in all cases... :-(
> 
> It is entirely possible that our call to slow_enter() is made on an
> ObjectMonitor that's already marked "Old". In that case, our thread
> (T-enter) calls inflate() which returns the existing ObjectMonitor*
> and we use it to call enter(). If the thread (T-deflate) calling
> deflate_monitor_using_JT() does its magic before T-enter sets the
> owner field or the count field... Boom!
> 
> The previous paragraph is exactly what motivated the _ref_count field,
> the ObjectMonitorHandle helper, and adding an ObjectMonitorHandle*
> parameter to inflate(). inflate() calls ObjectMonitorHandle::save_om_ptr()
> which increments the ObjectMonitor's ref_count and then checks for async
> deflation protocol collisions. If there's a collision, then save_om_ptr()
> returns false and the caller (inflate() in this case) has to retry. When
> inflate() returns, the ObjectMonitor in the ObjectMonitorHandle cannot
> be deflated and is safe until the ObjectMonitorHandle is destroyed.
> 
> So by changing T-enter to use an ObjectMonitorHandle, T-deflate cannot
> deflate the ObjectMonitor in the window after inflate() returns and
> before T-enter sets the owner field or increments the count field. But
> you know all that already!
> So let's bring this back to having inflate() mark the ObjectMonitor as
> "Old"... Since inflate() returns an ObjectMonitor with the ref_count > 0,
> it doesn't matter if the ObjectMonitor is marked as "Old" in inflate().
> T-deflate cannot deflate it due to ref_count > 0.
> 
> Here's another crazy thought... inflate() is the only function that
> calls omAlloc(), and omAlloc() is the only function that sets "New".
> If we move the setting of "Old" from deflate_monitor_list_using_JT()
> to inflate(), then the change from "New" -> "Old" never happens
> outside of the inflate() call so why do we need the allocation state?
That was my next question.
> 
> Small dose of reality: I've found having the allocation state to be
> very helpful when debugging race related crashes. We could make the
> allocation state be DEBUG_ONLY, but then what about race debugging of
> product bits... sigh...
> 
> 
>> 6. Could you get rid of the new goto’s? 
>  
> I believe there is only one left from Carsten's prototype:
> 
> src/hotspot/share/runtime/synchronizer.cpp:
> 
>> intptr_t ObjectSynchronizer::FastHashCode(Thread * Self, oop obj) {
> <snip>
>>   } else if (mark->has_monitor()) {
>>     ObjectMonitorHandle omh;
>>     if (!omh.save_om_ptr(obj, mark)) {
>>       // Lost a race with async deflation so try again.
>>       assert(AsyncDeflateIdleMonitors, "sanity check");
>>       goto Retry;
>>     }
> 
> I can change FastHashCode() to use the same "while (do_loop)" as the
> other code that needs to do retries...
> 
Thank you.
> 
>> 7. On the updated wiki for the hash race example:
>> Racing Threads: “T-hash is about to inc the ref_count field”
>> actually - T-hash just did - ref_count == 1 - so maybe change middle values
>  
> Actually, we're talking about the set up for the race and the
> diagram shows "ref_count == 1" and should show "ref_count == 0".
> So I have fixed that on the "Racing Threads" diagram.
> 
> In the following "T-deflate Wins" and "T-hash Wins" diagrams,
> "ref_count == 1" is shown in both initial race results ObjectMonitor
> box. In "T-deflate Wins", it shows ref_count being restored to
> 0 in the second ObjectMonitor box.
> 
> Thanks for catching this error. I've fixed it on the wiki.
Thanks.
> 
> 
>> 
>> 8. There is an old comment in FastHashCode 
>> that 
>>  // WARNING:
>>     //   The displaced header is strictly immutable.
>>     // It can NOT be changed in ANY cases.
>> 
>> I presume that only applies to the displaced header for a stack lock - could you
>> possibly update that while you are in the code?
>  
> Here's the whole comment:
> 
>>     // WARNING:
>>     //   The displaced header is strictly immutable.
>>     // It can NOT be changed in ANY cases. So we have
>>     // to inflate the header into heavyweight monitor
>>     // even the current thread owns the lock. The reason
>>     // is the BasicLock (stack slot) will be asynchronously
>>     // read by other threads during the inflate() function.
>>     // Any change to stack may not propagate to other threads
>>     // correctly.
> 
> That comment applies the displaced header that's in the BasicLock
> on the thread's stack and it definitely needs some cleaning up
> independent of the Async Monitor Deflation project.
Thank you.
> 
> 
>> Also in FastHashCode
>>       // The only update to the header in the monitor (outside GC)
>>  823       // is install the hash code. If someone add new usage of
>>  824       // displaced header, please update this code
>> Can you update that comment as well? I know you’ve already updated the code logic.
>  
> I'll revisit that comment as well. I believe Carsten updated it in
> his prototype, but when I backed out that change when I simplified
> the hashcode stuff due to ObjectMonitorHandles/ref_count.
Thank you. And if you are revisiting this to potentially call install_displaced_markword then this 
would change yet again.
> 
> 
>> So I walked the logic for the hashcode interactions - I didn’t find any holes. Thank you for walking most of it in email/wiki.
>> In particular, inflate does the save_om_ptr dance to inc_ref_count, so this code above will
>> be called while preventing async deflation.
>  
> Right.
> 
> 
>> 9. install_displaced_markword_in_object
>> What happens if the cas_set_mark fails?
>  
> Here's the code in question:
> 
> src/hotspot/share/runtime/objectMonitor.cpp:
> 
>> void ObjectMonitor::install_displaced_markword_in_object() {
> <snip>
>>   if (dmw->is_marked()) {
>>     // The dmw copy is marked which means a hash was not set by a racing
>>     // thread. Clear the mark from the copy in preparation for possible
>>     // restoration from this thread.
>>     assert(dmw->hash() == 0, "must be 0: hash=" INTPTR_FORMAT, dmw->hash());
>>     dmw = dmw->set_unmarked();
>>   }
>>   assert(dmw->is_neutral(), "must be a neutral markword");
>> 
>>   oop const obj = (oop) object();
>>   // Install displaced markword if object markword still points to this
>>   // monitor. Both the mutator trying to enter() and the thread deflating
>>   // the monitor will reach this point, but only one can win.
>>   // Note: If a mutator won the cmpxchg() race above and installed a hash
>>   // in _header, then the updated dmw contains that hash and we'll install
>>   // it in the object's markword here.
>>   obj->cas_set_mark(dmw, markOopDesc::encode(this));
> 
> We don't check the return from cas_set_mark() here intentionally.
> If we have just T-enter and T-deflate racing through this code,
> then after the "if (dmw->is_marked()) {" block, both threads
> will have the same 'dmw' value. One thread will set it and the
> other thread will fail to set it, but we don't care because both
> threads wanted to set the same value... As a result of the
> cas_set_mark() call in both threads, both threads will see the
> same value in the object's header (if they happen to look).
> 
> I talk about this in the "Either Wins the Second Race" sub-section
> on the wiki.
Yes for the current model of only these two callers, neither modifying the dmw other than
set/clear is_marked) bit. If we extend to FastHashCode - this gets trickier. 
> 
> 
>> I get that today this handles the race with enter and deflate_monitor_using_JT. If we remove
>> the call from enter, is the expectation that we’ve blocked all others who did not set is_marked themselves?
>> If we remove the call from enter would it make sense to ensure that the cas_set_mark succeeds here?
>  
> If we remove the install_displaced_markword_in_object() call from enter(),
> then I don't think we need install_displaced_markword_in_object() at
> all and can restore the object's header with:
> 
>     // Restore the header back to obj
>     obj->release_set_mark(mid->header());
> 
> just like ObjectSynchronizer::deflate_monitor(). The question is
> whether we think install_displaced_markword_in_object() buys us
> something other than "help" in restoring the object's header.
Yes - that was the question - it adds complexity. Does it help threads make progress.
Thinking about this more and reading Carsten’s reply - I think it does - especially
since seeing DEFLATION_MARKER can make the checker loop, but in the meantime someone
else can acquire the monitor - so that might be a non-trivial spin time.
> 
> 
>> 10. Is there any benefit in a bit of stress testing with something like a temporary flag that deflates in
>> mAlloc each time it is called?
>  
> Maybe? :-) Something like DeflateAsyncMonitorsALot? Can you eloborate
> on your thinking a bit?
Just if you thought it might help shake multi-thread timing testing. If you think it is all shaken out
already - no need.
> 
> 
>> Looking forward to the performance runs as well as the latency numbers.
>  
> I posted the SPECjbb2015 numbers from this past weekend earlier today.
> Rather disappointing on my T7600's... Neutral on my MacMini…
Thank you. And Claus’ response was to be sure they are meaningful before deep-diving - so
reproduce multiple times kind of thing.
> 
> When you say "latency numbers", what do you mean? Do you mean how long
> ObjectMonitors that could be deflated are kept inflated? Or do you mean
> something else?
latency I was thinking about was the reduced time during safepoint cleanup - which is one of the goals of this
exercise.
> 
> I think I've responded to everything. Please let me know if I missed
> something…
You got it all.

many thanks,
Karen
> 
> Dan
> 
> 
> 
>> 
>> thanks,
>> Karen
>> 
>>> On Apr 5, 2019, at 12:10 PM, Daniel D. Daugherty <daniel.daugherty at oracle.com <mailto:daniel.daugherty at oracle.com>> wrote:
>>> 
>>> Filed:
>>> 
>>>     JDK-8222034 Thread-SMR functions should be updated to remove work around
>>>     https://bugs.openjdk.java.net/browse/JDK-8222034 <https://bugs.openjdk.java.net/browse/JDK-8222034>
>>> 
>>> Martin and Robbin, please check it out and make sure that I captured
>>> things correctly...
>>> 
>>> Dan
>>> 
>>> 
>>> 
>>> On 4/5/19 12:01 PM, Daniel D. Daugherty wrote:
>>>> On 4/5/19 8:37 AM, Doerr, Martin wrote:
>>>>> Hi everybody,
>>>>> 
>>>>>> I think was fixed with:
>>>>>> 8202080: Introduce ordering semantics for Atomic::add/inc and other RMW atomics
>>>>>> You should get a leading sync and trailing one with the default conservative
>>>>>> model and thus get proper memory ordering.
>>>>>> Martin, I'm I correct?
>>>>> Exactly. Thanks for pointing this out. PPC uses the strongest possible ordering semantics with memory_order_conservative (default parameter).
>>>>> I've seen that comment about PPC in "void ThreadsList::inc_nested_handle_cnt()". This function could get replaced.
>>>> 
>>>> Okay so we need a new bug to update these two Thread-SMR functions:
>>>> 
>>>> src/hotspot/share/runtime/threadSMR.cpp:
>>>> 
>>>> void ThreadsList::dec_nested_handle_cnt() {
>>>>   // The decrement needs to be MO_ACQ_REL. At the moment, the Atomic::dec
>>>>   // backend on PPC does not yet conform to these requirements. Therefore
>>>>   // the decrement is simulated with an Atomic::sub(1, &addr).
>>>>   // Without this MO_ACQ_REL Atomic::dec simulation, the nested SMR mechanism
>>>>   // is not generally safe to use.
>>>>   Atomic::sub(1, &_nested_handle_cnt);
>>>> }
>>>> 
>>>> void ThreadsList::inc_nested_handle_cnt() {
>>>>   // The increment needs to be MO_SEQ_CST. At the moment, the Atomic::inc
>>>>   // backend on PPC does not yet conform to these requirements. Therefore
>>>>   // the increment is simulated with a load phi; cas phi + 1; loop.
>>>>   // Without this MO_SEQ_CST Atomic::inc simulation, the nested SMR mechanism
>>>>   // is not generally safe to use.
>>>>   intx sample = OrderAccess::load_acquire(&_nested_handle_cnt);
>>>>   for (;;) {
>>>>     if (Atomic::cmpxchg(sample + 1, &_nested_handle_cnt, sample) == sample) {
>>>>       return;
>>>>     } else {
>>>>       sample = OrderAccess::load_acquire(&_nested_handle_cnt);
>>>>     }
>>>>   }
>>>> }
>>>> 
>>>> I'll file a new bug, loop in Robbin, Erik O and Martin, and make
>>>> sure we're all in agreement. Once we decide that Thread-SMR's
>>>> functions look like, I'll adapt my Async Monitor Deflation
>>>> functions...
>>>> 
>>>> Dan
>>>> 
>>>> 
>>>>> 
>>>>> Best regards,
>>>>> Martin
>>>>> 
>>>>> 
>>>>> -----Original Message-----
>>>>> From: Robbin Ehn <robbin.ehn at oracle.com <mailto:robbin.ehn at oracle.com>>
>>>>> Sent: Freitag, 5. April 2019 14:07
>>>>> To: daniel.daugherty at oracle.com <mailto:daniel.daugherty at oracle.com>; hotspot-runtime-dev at openjdk.java.net <mailto:hotspot-runtime-dev at openjdk.java.net>; Carsten Varming <varming at gmail.com <mailto:varming at gmail.com>>; Roman Kennke <rkennke at redhat.com <mailto:rkennke at redhat.com>>; Doerr, Martin <martin.doerr at sap.com <mailto:martin.doerr at sap.com>>
>>>>> Subject: Re: RFR(L) 8153224 Monitor deflation prolong safepoints
>>>>> 
>>>>> Hi Dan,
>>>>> 
>>>>> (Martin there is question for you last in this email)
>>>>> 
>>>>> After first pass I did not find any real issues.
>>>>> Considering what you had to work with, it looks good!
>>>>> 
>>>>> #1
>>>>> There are some assert which are redundant (to me at least) like:
>>>>> src/hotspot/share/runtime/objectMonitor.cpp
>>>>> L445
>>>>>     if (!dmw->is_marked() && dmw->hash() == 0) {
>>>>>       // This dmw is neutral and has not yet started the restoration
>>>>>       // protocol so we mark a copy of the dmw to begin the protocol.
>>>>>       markOop marked_dmw = dmw->set_marked();
>>>>>       assert(marked_dmw->is_marked() && marked_dmw->hash() == 0,
>>>>>              "sanity_check: is_marked=%d, hash=" INTPTR_FORMAT,
>>>>>              marked_dmw->is_marked(), marked_dmw->hash());
>>>>> 
>>>>> That assert is basically a test that set_marked worked?
>>>>> 
>>>>> L505
>>>>>       if (Atomic::cmpxchg(Self, &_owner, DEFLATER_MARKER) == DEFLATER_MARKER) {
>>>>>         assert(_succ != Self, "invariant");
>>>>>         assert(_owner == Self, "invariant");
>>>>> 
>>>>> Assert on _owner checks that our cmpxchg is not broken?
>>>>> 
>>>>> I think it's easier to read the code if some on the most obvious asserts are
>>>>> removed. Maybe comments instead.
>>>>> 
>>>>> #2
>>>>> Not your doing but I think we should remove TRAPS/Thread * Self and use
>>>>> JavaThread* instead.
>>>>> E.g. so we can change:
>>>>> void ObjectMonitor::EnterI(TRAPS) {
>>>>>     Thread * const Self = THREAD;
>>>>>     assert(Self->is_Java_thread(), "invariant");
>>>>>     assert(((JavaThread *) Self)->thread_state() == _thread_blocked, "invariant");
>>>>> 
>>>>> to:
>>>>> 
>>>>> void ObjectMonitor::EnterI(JavaThread* Self) {
>>>>>     assert(Self->thread_state() == _thread_blocked, "invariant");
>>>>> 
>>>>> #3
>>>>> src/hotspot/share/runtime/objectMonitor.inline.hpp
>>>>>    164 inline void ObjectMonitor::inc_ref_count() {
>>>>>    165   // The increment needs to be MO_SEQ_CST. At the moment, the Atomic::inc
>>>>>    166   // backend on PPC does not yet conform to these requirements. Therefore
>>>>>    167   // the increment is simulated with a load phi; cas phi + 1; loop.
>>>>>    168   // Without this MO_SEQ_CST Atomic::inc simulation, AsyncDeflateIdleMonitors
>>>>>    169   // is not safe.
>>>>> 
>>>>> I think was fixed with:
>>>>> 8202080: Introduce ordering semantics for Atomic::add/inc and other RMW atomics
>>>>> You should get a leading sync and trailing one with the default conservative
>>>>> model and thus get proper memory ordering.
>>>>> Martin, I'm I correct?
>>>>> 
>>>>> Thanks, Robbin
>>>>> 
>>>>> On 3/24/19 2:57 PM, Daniel D. Daugherty wrote:
>>>>>> Greetings,
>>>>>> 
>>>>>> Welcome to the OpenJDK review thread for my port of Carsten's work on:
>>>>>> 
>>>>>>       JDK-8153224 Monitor deflation prolong safepoints
>>>>>>       https://bugs.openjdk.java.net/browse/JDK-8153224 <https://bugs.openjdk.java.net/browse/JDK-8153224>
>>>>>> 
>>>>>> Here's a link to the OpenJDK wiki that describes my port:
>>>>>> 
>>>>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation <https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation>
>>>>>> 
>>>>>> Here's the webrev URL:
>>>>>> 
>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/3-for-jdk13/ <http://cr.openjdk.java.net/~dcubed/8153224-webrev/3-for-jdk13/>
>>>>>> 
>>>>>> Here's a link to Carsten's original webrev:
>>>>>> 
>>>>>> http://cr.openjdk.java.net/~cvarming/monitor_deflate_conc/0/ <http://cr.openjdk.java.net/~cvarming/monitor_deflate_conc/0/>
>>>>>> 
>>>>>> Earlier versions of this patch have been through several rounds of
>>>>>> preliminary review. Many thanks to Carsten, Coleen, Robbin, and
>>>>>> Roman for their preliminary code review comments. A very special
>>>>>> thanks to Robbin and Roman for building and testing the patch in
>>>>>> their own environments (including specJBB2015).
>>>>>> 
>>>>>> This version of the patch has been thru Mach5 tier[1-8] testing on
>>>>>> Oracle's usual set of platforms. Earlier versions have been run
>>>>>> through my stress kit on my Linux-X64 and Solaris-X64 servers
>>>>>> (product, fastdebug, slowdebug).Earlier versions have run Kitchensink
>>>>>> for 12 hours on MacOSX, Linux-X64 and Solaris-X64 (product, fastdebug
>>>>>> and slowdebug). Earlier versions have run my monitor inflation stress
>>>>>> tests for 12 hours on MacOSX, Linux-X64 and Solaris-X64 (product,
>>>>>> fastdebug and slowdebug).
>>>>>> 
>>>>>> All of the testing done on earlier versions will be redone on the
>>>>>> latest version of the patch.
>>>>>> 
>>>>>> Thanks, in advance, for any questions, comments or suggestions.
>>>>>> 
>>>>>> Dan
>>>>>> 
>>>>>> P.S.
>>>>>> One subtest in gc/g1/humongousObjects/TestHumongousClassLoader.java
>>>>>> is currently failing in -Xcomp mode on Win* only. I've been trying
>>>>>> to characterize/analyze this failure for more than a week now. At
>>>>>> this point I'm convinced that Async Monitor Deflation is aggravating
>>>>>> an existing bug. However, I plan to have a better handle on that
>>>>>> failure before these bits are pushed to the jdk/jdk repo.
>>>> 
>>>> 
>>> 
>> 
>