RFR(L) 8153224 Monitor deflation prolong safepoints

Tue Apr 9 01:04:34 UTC 2019

On 4/5/19 4:59 PM, Karen Kinnear wrote:
> Dan,
>
> Some more minor comments from reading the code:

Thanks for the additional comments. I'm gathering changes for the next
round of code review (CR1) so these will be resolved in that round...

More below...

> 1. Could you add comments to markOop.hpp about
> the use in the displaced_mark_word of is_marked to prevent any users 
> of is_marked
> here from needing to have that information saved/restored?

I _think_ I know what you're looking for here... Perhaps this:

src/hotspot/share/oops/markOop.hpp:
   // ObjectMonitor::install_displaced_markword_in_object() uses
   // is_marked() on ObjectMonitor::_header as part of the restoration
   // protocol for an object's header. In this usage, the mark bit is
   // only ever set (and cleared) on the ObjectMonitor::_header field.
   bool is_marked()   const {
     return (mask_bits(value(), lock_mask_in_place) == marked_value);
   }

> 2. In objectMonitor.hpp
>   in is_busy you clarify the difference in use between _count (which I 
> think you may be changing
> to _contended) and _ref_count. Could you possibly also comment where 
> you declare them?

I'll do the rename of _count -> _contentions in a subtask of 8153224
like the other cleanups of the monitor subsystem.

Here's the comment in question:

src/hotspot/share/runtime/objectMonitor.hpp:
   intptr_t is_busy() const {
     // TODO-FIXME: merge _count and _waiters.
     // TODO-FIXME: assert _owner == null implies _recursions = 0
     // TODO-FIXME: assert _WaitSet != null implies _count > 0
     // We do not include _ref_count in the is_busy() check because
     // _ref_count is for indicating that the ObjectMonitor* is in
     // use which is orthogonal to whether the ObjectMonitor itself
     // is in use for a locking operation.
     return 
_count|_waiters|intptr_t(_owner)|intptr_t(_cxq)|intptr_t(_EntryList);
   }

I don't think this comment clarifies _count vs. _ref_count.
I added the last four lines of the comment and their purpose is to
describe why _ref_count isn't used by is_busy(). The TODO-FIXME lines
need to be revisited since (at least) the third one is wrong.

Here's the existing comment for _ref_count:

   volatile jint _ref_count;         // ref count for ObjectMonitor*

Here's the existing comment for _count:

   volatile jint  _count;            // reference count to prevent 
reclamation/deflation
                                     // at stop-the-world time. See 
ObjectSynchronizer::deflate_monitor().
                                     // _count is approximately 
|_WaitSet| + |_EntryList|

And here's what I proposed to change it to in my reply to your design
review notes:

   volatile jint  _contentions;      // Number of active contentions in 
enter(). It is used by is_busy()
                                     // along with other fields to 
determine if an ObjectMonitor can be
                                     // deflated. See 
ObjectSynchronizer::deflate_monitor().

I think we're good here with the proposed change of comment (and the
rename) for the _contentions field along with existing comment for
_ref_count and the existing comment for is_busy(). I may delete the
third TODO-FIXME line as part of the next cleanup.

> 3. clear_using_JT: would it make sense to have an assertion that 
>  _owner is either null or DEFLATER_MARKER?

We could add something like:

   assert(_owner == NULL ||
          (AsyncDeflateIdleMonitors && _owner == DEFLATER_MARKER),
          "Fatal logic error in ObjectMonitor owner!");

and that will catch any races in async monitor deflation where the
_owner field is set to a monitor owner value (stack addr or thread*).
For monitor deflation at a safepoint, the non-NULL _owner field is
caught in clear() (which calls clear_using_JT()).

> 4. In EnterI: if _owner == DEFLATER_MARKER there are guarantees that 0 
> < _count
> with comments that caller ensured _count <= 0
> In ReenterI: guarantee 0 <= _count, with comment not _count < 0
> — Am I missing something subtle here or should they be the same 
> guarantees?

Here's the code in question:

src/hotspot/share/runtime/objectMonitor.cpp:

void ObjectMonitor::EnterI(TRAPS) {
<snip>
   if (_owner == DEFLATER_MARKER) {
     guarantee(0 < _count, "_owner == DEFLATER_MARKER && _count <= 0 
should have been handled by the caller");
     // Deflater thread tried to lock this monitor, but it failed to 
make _count negative and gave up.

void ObjectMonitor::ReenterI(Thread * Self, ObjectWaiter * SelfNode) {
<snip>
     if (_owner == DEFLATER_MARKER) {
       guarantee(0 <= _count, "Impossible: _owner == DEFLATER_MARKER && 
_count < 0, monitor must not be owned by deflater thread here");

Reading these two guarantee() calls always throws me off stride
because I would have written them like this:

     guarantee(_count > 0, "_owner == DEFLATER_MARKER && _count <= 0 
should have been handled by the caller");

and

       guarantee(_count >= 0, "Impossible: _owner == DEFLATER_MARKER && 
_count < 0, monitor must not be owned by deflater thread here");

When rewritten like the above, you have:

     "_count > 0" ... _count <= 0

and:

     "_count >= 0" ... "_count < 0"

which is easier for my brain to read... okay... enough sidebar...

Short answer: No the guarantees should not be the same.

Longer answer: EnterI() is called by enter() after enter() has
incremented the _count field to indicate the contended state of
things. So in EnterI(), "_count > 0" is the right check.
ReenterI() is called after wait() has returned (notified or
timedout), and the _count field is not used on reentry ops so
"_count >= 0" is the right check.

I'm going to tweak the ObjectMonitor::EnterI() code like this (yes,
there are two places in EnterI() that do this):

     L501:   if (_owner == DEFLATER_MARKER) {
               // The deflation protocol finished the first part 
(setting _owner),
               // but it failed the second part (making _count negative) 
and bailed.
               // Because we're called from enter() we have at least one 
contention.
               guarantee(count > 0, "_owner == DEFLATER_MARKER && _count 
<= 0 should have been handled by the caller");
     L504:     // Try to acquire monitor.
     L505:     if (Atomic::cmpxchg(Self, &_owner, DEFLATER_MARKER) == 
DEFLATER_MARKER) {

     L629:     if (_owner == DEFLATER_MARKER) {
                 // The deflation protocol finished the first part 
(setting _owner),
                 // but it failed the second part (making _count 
negative) and bailed.
                 // Because we're called from enter() we have at least 
one contention.
                 guarantee(count> 0 , "_owner == DEFLATER_MARKER && 
_count <= 0 should have been handled by the caller");
     L632:       if (Atomic::cmpxchg(Self, &_owner, DEFLATER_MARKER) == 
DEFLATER_MARKER) {

And I'm going to tweak the ReenterI() code like this:

     L759:     if (_owner == DEFLATER_MARKER) {
                 // The deflation protocol finished the first part 
(setting _owner),
                 // but it will observe _waiters != 0 and will bail out. 
Because we're
                 // called from wait() we may or may not have any 
contentions.
                 guarantee(count >= 0, "Impossible: _owner == 
DEFLATER_MARKER && _count < 0 should have been handled by the caller");
     L761:       if (Atomic::cmpxchg(Self, &_owner, DEFLATER_MARKER) == 
DEFLATER_MARKER) {

You didn't ask this, but it is okay that _count is only used to track
contentions in enter()/EnterI() and is not used to track contentions
in wait()/ReenterI(). For the wait()/ReenterI() code path, _waiters is
used by is_busy() to observe the busy state for an ObjectMonitor that
is being wait()'ed for. The _waiters field is decremented after a
waiter has returned from ReenterI() so the _owner field takes over
answering the is_busy() question...

> 5. I could use a little help with allocation state transitions,
> e.g. in deflate_monitor_list_using_JT
>   you see is_new with object set so you mark it as old so next 
> deflation will check it

Here's the code in question:

src/hotspot/share/runtime/synchronizer.cpp:

int ObjectSynchronizer::deflate_monitor_list_using_JT(ObjectMonitor** 
listHeadp,
ObjectMonitor** freeHeadp,
ObjectMonitor** freeTailp,
ObjectMonitor** savedMidInUsep) {
<snip>
     // Only try to deflate if there is an associated Java object and if
     // mid is old (is not newly allocated and is not newly freed).
     if (mid->object() != NULL && mid->is_old() &&
         deflate_monitor_using_JT(mid, freeHeadp, freeTailp)) {
       // Deflation succeeded so update the in-use list.
<snip>
     } else {
       // mid is considered in-use if it does not have an associated
       // Java object or mid is not old or deflation did not succeed.
       // A mid->is_new() node can be seen here when it is freshly returned
       // by omAlloc() (and skips the deflation code path).
       // A mid->is_old() node can be seen here when deflation failed.
       // A mid->is_free() node can be seen here when a fresh node from
       // omAlloc() is released by omRelease() due to losing the race
       // in inflate().

       if (mid->object() != NULL && mid->is_new()) {
         // mid has an associated Java object and has now been seen
         // as newly allocated so mark it as "old".
         mid->set_allocation_state(ObjectMonitor::Old);
       }

>   - why do you set it to old here rather than in inflate once we set 
> values?

Inflation is used in quite a few places. If we marked the
ObjectMonitor as "Old" in inflate(), then that would make the
ObjectMonitor available for deflation by deflate_monitor_using_JT()
earlier:

src/hotspot/share/runtime/synchronizer.cpp:
> bool ObjectSynchronizer::deflate_monitor_using_JT(ObjectMonitor* mid,
> ObjectMonitor** freeHeadp,
> ObjectMonitor** freeTailp) {
>   assert(AsyncDeflateIdleMonitors, "sanity check");
>   assert(Thread::current()->is_Java_thread(), "precondition");
>   // A newly allocated ObjectMonitor should not be seen here so we
>   // avoid an endless inflate/deflate cycle.
>   assert(mid->is_old(), "precondition");

So the idea behind only deflating ObjectMonitors that have reached
allocation state "Old" is to prevent "an endless inflate/deflate cycle".
Here's the relevant section from Carsten's JEP:

> To avoid endless inflation / deflation cycles in the prototype, monitor
> deflation is only attempted the second time a monitor is seen by the
> thread marking monitors as deflatable: If the thread (the only thread
> marking monitors as deflatable; might be service thread or some GC
> related thread or even a dedicated thread) sees a monitor in state New,
> then the thread marks the monitor as Old and moves on. So there is
> little interaction between a thread inflating a lock to a monitor and
> the deflating thread, the inflating thread just has to make sure the
> monitor is marked New and this marker is published using appropriate
> barriers.

There isn't an explicit example in the JEP of what Carsten was thinking
of with "an endless inflate/deflate cycle". I didn't try to think of
such an example for the OpenJDK wiki either. I simple wrote:

> ObjectMonitor has a new allocation_state field that supports three
> states: 'Free', 'New', 'Old'. Async Monitor Deflation is only applied
> to ObjectMonitors that have reached the 'Old' state. When the Async
> Monitor Deflation code sees an ObjectMonitor in the 'New' state, it
> is changed to the 'Old' state, but is not deflated. This prevents a
> newly allocated ObjectMonitor from being immediately deflated which
> could cause an inflation<->deflation oscillation.

So let's think about what might happen if an ObjectMonitor is marked
as "Old" in inflate(). Here's an example use of inflate() in the
"slow enter" code path:

src/hotspot/share/runtime/synchronizer.cpp:
 > void ObjectSynchronizer::slow_enter(Handle obj, BasicLock* lock, TRAPS) {
<snip>
base<    inflate(THREAD, obj(), inflate_cause_monitor_enter)->enter(THREAD);

new>     ObjectMonitorHandle omh;
new>     inflate(&omh, THREAD, obj(), inflate_cause_monitor_enter);
new>     do_loop = !omh.om_ptr()->enter(THREAD);

In the "base" code, we took the return from inflate() and used it to call
ObjectMonitor::enter(). If we never changed that bit of code and inflate()
marked the ObjectMonitor as "Old", then deflate_monitor_using_JT() could
async deflate the ObjectMonitor while we were trying to call enter() on
it... Boom! So we might think that holding off marking an ObjectMonitor
as "Old" can save us... and it can, but not in all cases... :-(

It is entirely possible that our call to slow_enter() is made on an
ObjectMonitor that's already marked "Old". In that case, our thread
(T-enter) calls inflate() which returns the existing ObjectMonitor*
and we use it to call enter(). If the thread (T-deflate) calling
deflate_monitor_using_JT() does its magic before T-enter sets the
owner field or the count field... Boom!

The previous paragraph is exactly what motivated the _ref_count field,
the ObjectMonitorHandle helper, and adding an ObjectMonitorHandle*
parameter to inflate(). inflate() calls ObjectMonitorHandle::save_om_ptr()
which increments the ObjectMonitor's ref_count and then checks for async
deflation protocol collisions. If there's a collision, then save_om_ptr()
returns false and the caller (inflate() in this case) has to retry. When
inflate() returns, the ObjectMonitor in the ObjectMonitorHandle cannot
be deflated and is safe until the ObjectMonitorHandle is destroyed.

So by changing T-enter to use an ObjectMonitorHandle, T-deflate cannot
deflate the ObjectMonitor in the window after inflate() returns and
before T-enter sets the owner field or increments the count field. But
you know all that already!

So let's bring this back to having inflate() mark the ObjectMonitor as
"Old"... Since inflate() returns an ObjectMonitor with the ref_count > 0,
it doesn't matter if the ObjectMonitor is marked as "Old" in inflate().
T-deflate cannot deflate it due to ref_count > 0.

Here's another crazy thought... inflate() is the only function that
calls omAlloc(), and omAlloc() is the only function that sets "New".
If we move the setting of "Old" from deflate_monitor_list_using_JT()
to inflate(), then the change from "New" -> "Old" never happens
outside of the inflate() call so why do we need the allocation state?

Small dose of reality: I've found having the allocation state to be
very helpful when debugging race related crashes. We could make the
allocation state be DEBUG_ONLY, but then what about race debugging of
product bits... sigh...

> 6. Could you get rid of the new goto’s?

I believe there is only one left from Carsten's prototype:

src/hotspot/share/runtime/synchronizer.cpp:

> intptr_t ObjectSynchronizer::FastHashCode(Thread * Self, oop obj) {
<snip>
>   } else if (mark->has_monitor()) {
>     ObjectMonitorHandle omh;
>     if (!omh.save_om_ptr(obj, mark)) {
>       // Lost a race with async deflation so try again.
>       assert(AsyncDeflateIdleMonitors, "sanity check");
>       goto Retry;
>     }

I can change FastHashCode() to use the same "while (do_loop)" as the
other code that needs to do retries...

> 7. On the updated wiki for the hash race example:
> Racing Threads: “T-hash is about to inc the ref_count field”
> actually - T-hash just did - ref_count == 1 - so maybe change middle 
> values

Actually, we're talking about the set up for the race and the
diagram shows "ref_count == 1" and should show "ref_count == 0".
So I have fixed that on the "Racing Threads" diagram.

In the following "T-deflate Wins" and "T-hash Wins" diagrams,
"ref_count == 1" is shown in both initial race results ObjectMonitor
box. In "T-deflate Wins", it shows ref_count being restored to
0 in the second ObjectMonitor box.

Thanks for catching this error. I've fixed it on the wiki.

>
> 8. There is an old comment in FastHashCode
> that
>  // WARNING:
>     //   The displaced header is strictly immutable.
>     // It can NOT be changed in ANY cases.
>
> I presume that only applies to the displaced header for a stack lock - 
> could you
> possibly update that while you are in the code?

Here's the whole comment:

>     // WARNING:
>     //   The displaced header is strictly immutable.
>     // It can NOT be changed in ANY cases. So we have
>     // to inflate the header into heavyweight monitor
>     // even the current thread owns the lock. The reason
>     // is the BasicLock (stack slot) will be asynchronously
>     // read by other threads during the inflate() function.
>     // Any change to stack may not propagate to other threads
>     // correctly.

That comment applies the displaced header that's in the BasicLock
on the thread's stack and it definitely needs some cleaning up
independent of the Async Monitor Deflation project.

> Also in FastHashCode
>       // The only update to the header in the monitor (outside GC)
>   823       // is install the hash code. If someone add new usage of
>   824       // displaced header, please update this code
> Can you update that comment as well? I know you’ve already updated the 
> code logic.

I'll revisit that comment as well. I believe Carsten updated it in
his prototype, but when I backed out that change when I simplified
the hashcode stuff due to ObjectMonitorHandles/ref_count.

> So I walked the logic for the hashcode interactions - I didn’t find 
> any holes. Thank you for walking most of it in email/wiki.
> In particular, inflate does the save_om_ptr dance to inc_ref_count, so 
> this code above will
> be called while preventing async deflation.

Right.

> 9. install_displaced_markword_in_object
> What happens if the cas_set_mark fails?

Here's the code in question:

src/hotspot/share/runtime/objectMonitor.cpp:

> void ObjectMonitor::install_displaced_markword_in_object() {
<snip>
>   if (dmw->is_marked()) {
>     // The dmw copy is marked which means a hash was not set by a racing
>     // thread. Clear the mark from the copy in preparation for possible
>     // restoration from this thread.
>     assert(dmw->hash() == 0, "must be 0: hash=" INTPTR_FORMAT, 
> dmw->hash());
>     dmw = dmw->set_unmarked();
>   }
>   assert(dmw->is_neutral(), "must be a neutral markword");
>
>   oop const obj = (oop) object();
>   // Install displaced markword if object markword still points to this
>   // monitor. Both the mutator trying to enter() and the thread deflating
>   // the monitor will reach this point, but only one can win.
>   // Note: If a mutator won the cmpxchg() race above and installed a hash
>   // in _header, then the updated dmw contains that hash and we'll install
>   // it in the object's markword here.
>   obj->cas_set_mark(dmw, markOopDesc::encode(this));

We don't check the return from cas_set_mark() here intentionally.
If we have just T-enter and T-deflate racing through this code,
then after the "if (dmw->is_marked()) {" block, both threads
will have the same 'dmw' value. One thread will set it and the
other thread will fail to set it, but we don't care because both
threads wanted to set the same value... As a result of the
cas_set_mark() call in both threads, both threads will see the
same value in the object's header (if they happen to look).

I talk about this in the "Either Wins the Second Race" sub-section
on the wiki.

> I get that today this handles the race with enter and 
> deflate_monitor_using_JT. If we remove
> the call from enter, is the expectation that we’ve blocked all others 
> who did not set is_marked themselves?
> If we remove the call from enter would it make sense to ensure that 
> the cas_set_mark succeeds here?

If we remove the install_displaced_markword_in_object() call from enter(),
then I don't think we need install_displaced_markword_in_object() at
all and can restore the object's header with:

     // Restore the header back to obj
     obj->release_set_mark(mid->header());

just like ObjectSynchronizer::deflate_monitor(). The question is
whether we think install_displaced_markword_in_object() buys us
something other than "help" in restoring the object's header.

> 10. Is there any benefit in a bit of stress testing with something 
> like a temporary flag that deflates in
> mAlloc each time it is called?

Maybe? :-) Something like DeflateAsyncMonitorsALot? Can you eloborate
on your thinking a bit?

> Looking forward to the performance runs as well as the latency numbers.

I posted the SPECjbb2015 numbers from this past weekend earlier today.
Rather disappointing on my T7600's... Neutral on my MacMini...

When you say "latency numbers", what do you mean? Do you mean how long
ObjectMonitors that could be deflated are kept inflated? Or do you mean
something else?

I think I've responded to everything. Please let me know if I missed
something...

Dan

>
> thanks,
> Karen
>
>> On Apr 5, 2019, at 12:10 PM, Daniel D. Daugherty 
>> <daniel.daugherty at oracle.com <mailto:daniel.daugherty at oracle.com>> wrote:
>>
>> Filed:
>>
>>     JDK-8222034 Thread-SMR functions should be updated to remove work 
>> around
>> https://bugs.openjdk.java.net/browse/JDK-8222034
>>
>> Martin and Robbin, please check it out and make sure that I captured
>> things correctly...
>>
>> Dan
>>
>>
>>
>> On 4/5/19 12:01 PM, Daniel D. Daugherty wrote:
>>> On 4/5/19 8:37 AM, Doerr, Martin wrote:
>>>> Hi everybody,
>>>>
>>>>> I think was fixed with:
>>>>> 8202080: Introduce ordering semantics for Atomic::add/inc and 
>>>>> other RMW atomics
>>>>> You should get a leading sync and trailing one with the default 
>>>>> conservative
>>>>> model and thus get proper memory ordering.
>>>>> Martin, I'm I correct?
>>>> Exactly. Thanks for pointing this out. PPC uses the strongest 
>>>> possible ordering semantics with memory_order_conservative (default 
>>>> parameter).
>>>> I've seen that comment about PPC in "void 
>>>> ThreadsList::inc_nested_handle_cnt()". This function could get 
>>>> replaced.
>>>
>>> Okay so we need a new bug to update these two Thread-SMR functions:
>>>
>>> src/hotspot/share/runtime/threadSMR.cpp:
>>>
>>> void ThreadsList::dec_nested_handle_cnt() {
>>>   // The decrement needs to be MO_ACQ_REL. At the moment, the 
>>> Atomic::dec
>>>   // backend on PPC does not yet conform to these requirements. 
>>> Therefore
>>>   // the decrement is simulated with an Atomic::sub(1, &addr).
>>>   // Without this MO_ACQ_REL Atomic::dec simulation, the nested SMR 
>>> mechanism
>>>   // is not generally safe to use.
>>>   Atomic::sub(1, &_nested_handle_cnt);
>>> }
>>>
>>> void ThreadsList::inc_nested_handle_cnt() {
>>>   // The increment needs to be MO_SEQ_CST. At the moment, the 
>>> Atomic::inc
>>>   // backend on PPC does not yet conform to these requirements. 
>>> Therefore
>>>   // the increment is simulated with a load phi; cas phi + 1; loop.
>>>   // Without this MO_SEQ_CST Atomic::inc simulation, the nested SMR 
>>> mechanism
>>>   // is not generally safe to use.
>>>   intx sample = OrderAccess::load_acquire(&_nested_handle_cnt);
>>>   for (;;) {
>>>     if (Atomic::cmpxchg(sample + 1, &_nested_handle_cnt, sample) == 
>>> sample) {
>>>       return;
>>>     } else {
>>>       sample = OrderAccess::load_acquire(&_nested_handle_cnt);
>>>     }
>>>   }
>>> }
>>>
>>> I'll file a new bug, loop in Robbin, Erik O and Martin, and make
>>> sure we're all in agreement. Once we decide that Thread-SMR's
>>> functions look like, I'll adapt my Async Monitor Deflation
>>> functions...
>>>
>>> Dan
>>>
>>>
>>>>
>>>> Best regards,
>>>> Martin
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: Robbin Ehn <robbin.ehn at oracle.com <mailto:robbin.ehn at oracle.com>>
>>>> Sent: Freitag, 5. April 2019 14:07
>>>> To: daniel.daugherty at oracle.com 
>>>> <mailto:daniel.daugherty at oracle.com>; 
>>>> hotspot-runtime-dev at openjdk.java.net 
>>>> <mailto:hotspot-runtime-dev at openjdk.java.net>; Carsten Varming 
>>>> <varming at gmail.com <mailto:varming at gmail.com>>; Roman Kennke 
>>>> <rkennke at redhat.com <mailto:rkennke at redhat.com>>; Doerr, Martin 
>>>> <martin.doerr at sap.com <mailto:martin.doerr at sap.com>>
>>>> Subject: Re: RFR(L) 8153224 Monitor deflation prolong safepoints
>>>>
>>>> Hi Dan,
>>>>
>>>> (Martin there is question for you last in this email)
>>>>
>>>> After first pass I did not find any real issues.
>>>> Considering what you had to work with, it looks good!
>>>>
>>>> #1
>>>> There are some assert which are redundant (to me at least) like:
>>>> src/hotspot/share/runtime/objectMonitor.cpp
>>>> L445
>>>>     if (!dmw->is_marked() && dmw->hash() == 0) {
>>>>       // This dmw is neutral and has not yet started the restoration
>>>>       // protocol so we mark a copy of the dmw to begin the protocol.
>>>>       markOop marked_dmw = dmw->set_marked();
>>>>       assert(marked_dmw->is_marked() && marked_dmw->hash() == 0,
>>>>              "sanity_check: is_marked=%d, hash=" INTPTR_FORMAT,
>>>>              marked_dmw->is_marked(), marked_dmw->hash());
>>>>
>>>> That assert is basically a test that set_marked worked?
>>>>
>>>> L505
>>>>       if (Atomic::cmpxchg(Self, &_owner, DEFLATER_MARKER) == 
>>>> DEFLATER_MARKER) {
>>>>         assert(_succ != Self, "invariant");
>>>>         assert(_owner == Self, "invariant");
>>>>
>>>> Assert on _owner checks that our cmpxchg is not broken?
>>>>
>>>> I think it's easier to read the code if some on the most obvious 
>>>> asserts are
>>>> removed. Maybe comments instead.
>>>>
>>>> #2
>>>> Not your doing but I think we should remove TRAPS/Thread * Self and use
>>>> JavaThread* instead.
>>>> E.g. so we can change:
>>>> void ObjectMonitor::EnterI(TRAPS) {
>>>>     Thread * const Self = THREAD;
>>>>     assert(Self->is_Java_thread(), "invariant");
>>>>     assert(((JavaThread *) Self)->thread_state() == 
>>>> _thread_blocked, "invariant");
>>>>
>>>> to:
>>>>
>>>> void ObjectMonitor::EnterI(JavaThread* Self) {
>>>>     assert(Self->thread_state() == _thread_blocked, "invariant");
>>>>
>>>> #3
>>>> src/hotspot/share/runtime/objectMonitor.inline.hpp
>>>>    164 inline void ObjectMonitor::inc_ref_count() {
>>>>    165   // The increment needs to be MO_SEQ_CST. At the moment, 
>>>> the Atomic::inc
>>>>    166   // backend on PPC does not yet conform to these 
>>>> requirements. Therefore
>>>>    167   // the increment is simulated with a load phi; cas phi + 
>>>> 1; loop.
>>>>    168   // Without this MO_SEQ_CST Atomic::inc simulation, 
>>>> AsyncDeflateIdleMonitors
>>>>    169   // is not safe.
>>>>
>>>> I think was fixed with:
>>>> 8202080: Introduce ordering semantics for Atomic::add/inc and other 
>>>> RMW atomics
>>>> You should get a leading sync and trailing one with the default 
>>>> conservative
>>>> model and thus get proper memory ordering.
>>>> Martin, I'm I correct?
>>>>
>>>> Thanks, Robbin
>>>>
>>>> On 3/24/19 2:57 PM, Daniel D. Daugherty wrote:
>>>>> Greetings,
>>>>>
>>>>> Welcome to the OpenJDK review thread for my port of Carsten's work on:
>>>>>
>>>>>       JDK-8153224 Monitor deflation prolong safepoints
>>>>> https://bugs.openjdk.java.net/browse/JDK-8153224
>>>>>
>>>>> Here's a link to the OpenJDK wiki that describes my port:
>>>>>
>>>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation
>>>>>
>>>>> Here's the webrev URL:
>>>>>
>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/3-for-jdk13/
>>>>>
>>>>> Here's a link to Carsten's original webrev:
>>>>>
>>>>> http://cr.openjdk.java.net/~cvarming/monitor_deflate_conc/0/
>>>>>
>>>>> Earlier versions of this patch have been through several rounds of
>>>>> preliminary review. Many thanks to Carsten, Coleen, Robbin, and
>>>>> Roman for their preliminary code review comments. A very special
>>>>> thanks to Robbin and Roman for building and testing the patch in
>>>>> their own environments (including specJBB2015).
>>>>>
>>>>> This version of the patch has been thru Mach5 tier[1-8] testing on
>>>>> Oracle's usual set of platforms. Earlier versions have been run
>>>>> through my stress kit on my Linux-X64 and Solaris-X64 servers
>>>>> (product, fastdebug, slowdebug).Earlier versions have run Kitchensink
>>>>> for 12 hours on MacOSX, Linux-X64 and Solaris-X64 (product, fastdebug
>>>>> and slowdebug). Earlier versions have run my monitor inflation stress
>>>>> tests for 12 hours on MacOSX, Linux-X64 and Solaris-X64 (product,
>>>>> fastdebug and slowdebug).
>>>>>
>>>>> All of the testing done on earlier versions will be redone on the
>>>>> latest version of the patch.
>>>>>
>>>>> Thanks, in advance, for any questions, comments or suggestions.
>>>>>
>>>>> Dan
>>>>>
>>>>> P.S.
>>>>> One subtest in gc/g1/humongousObjects/TestHumongousClassLoader.java
>>>>> is currently failing in -Xcomp mode on Win* only. I've been trying
>>>>> to characterize/analyze this failure for more than a week now. At
>>>>> this point I'm convinced that Async Monitor Deflation is aggravating
>>>>> an existing bug. However, I plan to have a better handle on that
>>>>> failure before these bits are pushed to the jdk/jdk repo.
>>>
>>>
>>
>