RFR(L) 8153224 Monitor deflation prolong safepoints

Mon Apr 15 17:40:31 UTC 2019

On 4/15/19 12:17 PM, Karen Kinnear wrote:
> Dan,
> Sorry to be so slow to get back to you,

Just when I think I've caught up on all the email threads... :-)

No worries... I'm still testing my prelim CR1 changes...

>
>> On Apr 8, 2019, at 9:04 PM, Daniel D. Daugherty 
>> <daniel.daugherty at oracle.com <mailto:daniel.daugherty at oracle.com>> wrote:
>>
>> On 4/5/19 4:59 PM, Karen Kinnear wrote:
>>> Dan,
>>>
>>> Some more minor comments from reading the code:
>>
>> Thanks for the additional comments. I'm gathering changes for the next
>> round of code review (CR1) so these will be resolved in that round...
>>
>> More below...
>>
>>
>>> 1. Could you add comments to markOop.hpp about
>>> the use in the displaced_mark_word of is_marked to prevent any users 
>>> of is_marked
>>> here from needing to have that information saved/restored?
>>
>> I _think_ I know what you're looking for here... Perhaps this:
>>
>> src/hotspot/share/oops/markOop.hpp:
>>   // ObjectMonitor::install_displaced_markword_in_object() uses
>>   // is_marked() on ObjectMonitor::_header as part of the restoration
>>   // protocol for an object's header. In this usage, the mark bit is
>>   // only ever set (and cleared) on the ObjectMonitor::_header field.
>>   bool is_marked()   const {
>>     return (mask_bits(value(), lock_mask_in_place) == marked_value);
>>   }
> Yes - thank you. Potentially prevents future overlaps.
>>
>>
>>> 2. In objectMonitor.hpp
>>>   in is_busy you clarify the difference in use between _count (which 
>>> I think you may be changing
>>> to _contended) and _ref_count. Could you possibly also comment where 
>>> you declare them?
>>
>> I'll do the rename of _count -> _contentions in a subtask of 8153224
>> like the other cleanups of the monitor subsystem.
>>
>> Here's the comment in question:
>>
>> src/hotspot/share/runtime/objectMonitor.hpp:
>>   intptr_t is_busy() const {
>>     // TODO-FIXME: merge _count and _waiters.
>>     // TODO-FIXME: assert _owner == null implies _recursions = 0
>>     // TODO-FIXME: assert _WaitSet != null implies _count > 0
>>     // We do not include _ref_count in the is_busy() check because
>>     // _ref_count is for indicating that the ObjectMonitor* is in
>>     // use which is orthogonal to whether the ObjectMonitor itself
>>     // is in use for a locking operation.
>>     return 
>> _count|_waiters|intptr_t(_owner)|intptr_t(_cxq)|intptr_t(_EntryList);
>>   }
>>
>> I don't think this comment clarifies _count vs. _ref_count.
>> I added the last four lines of the comment and their purpose is to
>> describe why _ref_count isn't used by is_busy(). The TODO-FIXME lines
>> need to be revisited since (at least) the third one is wrong.
>>
>> Here's the existing comment for _ref_count:
>>
>>   volatile jint _ref_count;         // ref count for ObjectMonitor*
>>
>> Here's the existing comment for _count:
>>
>>   volatile jint  _count;            // reference count to prevent 
>> reclamation/deflation
>>                                     // at stop-the-world time. See 
>> ObjectSynchronizer::deflate_monitor().
>>                                     // _count is approximately 
>> |_WaitSet| + |_EntryList|
>>
>> And here's what I proposed to change it to in my reply to your design
>> review notes:
>>
>>   volatile jint  _contentions;      // Number of active contentions 
>> in enter(). It is used by is_busy()
>>                                     // along with other fields to 
>> determine if an ObjectMonitor can be
>>                                     // deflated. See 
>> ObjectSynchronizer::deflate_monitor().
>>
>> I think we're good here with the proposed change of comment (and the
>> rename) for the _contentions field along with existing comment for
>> _ref_count and the existing comment for is_busy(). I may delete the
>> third TODO-FIXME line as part of the next cleanup.
> Thank you for the rename and comment
>>
>>
>>> 3. clear_using_JT: would it make sense to have an assertion that 
>>>  _owner is either null or DEFLATER_MARKER?
>>
>> We could add something like:
>>
>>   assert(_owner == NULL ||
>>          (AsyncDeflateIdleMonitors && _owner == DEFLATER_MARKER),
>>          "Fatal logic error in ObjectMonitor owner!");
>>
>> and that will catch any races in async monitor deflation where the
>> _owner field is set to a monitor owner value (stack addr or thread*).
>> For monitor deflation at a safepoint, the non-NULL _owner field is
>> caught in clear() (which calls clear_using_JT()).
> That covers my concern.
>>
>>
>>> 4. In EnterI: if _owner == DEFLATER_MARKER there are guarantees that 
>>> 0 < _count
>>> with comments that caller ensured _count <= 0
>>> In ReenterI: guarantee 0 <= _count, with comment not _count < 0
>>> — Am I missing something subtle here or should they be the same 
>>> guarantees?
>>
>> Here's the code in question:
>>
>> src/hotspot/share/runtime/objectMonitor.cpp:
>>
>> void ObjectMonitor::EnterI(TRAPS) {
>> <snip>
>>   if (_owner == DEFLATER_MARKER) {
>>     guarantee(0 < _count, "_owner == DEFLATER_MARKER && _count <= 0 
>> should have been handled by the caller");
>>     // Deflater thread tried to lock this monitor, but it failed to 
>> make _count negative and gave up.
>>
>> void ObjectMonitor::ReenterI(Thread * Self, ObjectWaiter * SelfNode) {
>> <snip>
>>     if (_owner == DEFLATER_MARKER) {
>>       guarantee(0 <= _count, "Impossible: _owner == DEFLATER_MARKER 
>> && _count < 0, monitor must not be owned by deflater thread here");
>>
>>
>> Reading these two guarantee() calls always throws me off stride
>> because I would have written them like this:
>>
>>     guarantee(_count > 0, "_owner == DEFLATER_MARKER && _count <= 0 
>> should have been handled by the caller");
>>
>> and
>>
>>       guarantee(_count >= 0, "Impossible: _owner == DEFLATER_MARKER 
>> && _count < 0, monitor must not be owned by deflater thread here");
>>
>> When rewritten like the above, you have:
>>
>>     "_count > 0" ... _count <= 0
>>
>> and:
>>
>>     "_count >= 0" ... "_count < 0"
>>
>> which is easier for my brain to read... okay... enough sidebar...
>>
>> Short answer: No the guarantees should not be the same.
>>
>> Longer answer: EnterI() is called by enter() after enter() has
>> incremented the _count field to indicate the contended state of
>> things. So in EnterI(), "_count > 0" is the right check.
>> ReenterI() is called after wait() has returned (notified or
>> timedout), and the _count field is not used on reentry ops so
>> "_count >= 0" is the right check.
> Thank you for walking this through.
> Wait also can cause a call to enter() -> EnterI(), but in that case 
> _waiters was set while
> the monitor was still owned, so we should never get to the logic where 
> EnterI sees
> DEFLATER_MARKER.

Correct. So in the "wait() -> enter() -> EnterI()" chain we won't get
to that new guarantee() because of the _waiters value. I think we're
in agreement that the EnterI() and ReenterI() guarantees are different
and that's okay... :-)

>>
>> I'm going to tweak the ObjectMonitor::EnterI() code like this (yes,
>> there are two places in EnterI() that do this):
>>
>>     L501:   if (_owner == DEFLATER_MARKER) {
>>               // The deflation protocol finished the first part 
>> (setting _owner),
>>               // but it failed the second part (making _count 
>> negative) and bailed.
>>               // Because we're called from enter() we have at least 
>> one contention.
>>               guarantee(count > 0, "_owner == DEFLATER_MARKER && 
>> _count <= 0 should have been handled by the caller");
>>     L504:     // Try to acquire monitor.
>>     L505:     if (Atomic::cmpxchg(Self, &_owner, DEFLATER_MARKER) == 
>> DEFLATER_MARKER) {
>>
>>     L629:     if (_owner == DEFLATER_MARKER) {
>>                 // The deflation protocol finished the first part 
>> (setting _owner),
>>                 // but it failed the second part (making _count 
>> negative) and bailed.
>>                 // Because we're called from enter() we have at least 
>> one contention.
>>                 guarantee(count> 0 , "_owner == DEFLATER_MARKER && 
>> _count <= 0 should have been handled by the caller");
>>     L632:       if (Atomic::cmpxchg(Self, &_owner, DEFLATER_MARKER) 
>> == DEFLATER_MARKER) {
>>
>> And I'm going to tweak the ReenterI() code like this:
>>
>>     L759:     if (_owner == DEFLATER_MARKER) {
>>                 // The deflation protocol finished the first part 
>> (setting _owner),
>>                 // but it will observe _waiters != 0 and will bail 
>> out. Because we're
>>                 // called from wait() we may or may not have any 
>> contentions.
>>                 guarantee(count >= 0, "Impossible: _owner == 
>> DEFLATER_MARKER && _count < 0 should have been handled by the caller");
>>     L761:       if (Atomic::cmpxchg(Self, &_owner, DEFLATER_MARKER) 
>> == DEFLATER_MARKER) {
>>
>>
>> You didn't ask this, but it is okay that _count is only used to track
>> contentions in enter()/EnterI() and is not used to track contentions
>> in wait()/ReenterI(). For the wait()/ReenterI() code path, _waiters is
>> used by is_busy() to observe the busy state for an ObjectMonitor that
>> is being wait()'ed for. The _waiters field is decremented after a
>> waiter has returned from ReenterI() so the _owner field takes over
>> answering the is_busy() question…
> Yes - that was my confusion and I had not walked it through carefully 
> enough.
> And thank you - the guarantees are easier to read this way.

Thanks.

>>
>>
>>> 5. I could use a little help with allocation state transitions,
>>> e.g. in deflate_monitor_list_using_JT
>>>   you see is_new with object set so you mark it as old so next 
>>> deflation will check it
>>
>> Here's the code in question:
>>
>> src/hotspot/share/runtime/synchronizer.cpp:
>>
>> int ObjectSynchronizer::deflate_monitor_list_using_JT(ObjectMonitor** 
>> listHeadp,
>> ObjectMonitor** freeHeadp,
>> ObjectMonitor** freeTailp,
>> ObjectMonitor** savedMidInUsep) {
>> <snip>
>>     // Only try to deflate if there is an associated Java object and if
>>     // mid is old (is not newly allocated and is not newly freed).
>>     if (mid->object() != NULL && mid->is_old() &&
>>         deflate_monitor_using_JT(mid, freeHeadp, freeTailp)) {
>>       // Deflation succeeded so update the in-use list.
>> <snip>
>>     } else {
>>       // mid is considered in-use if it does not have an associated
>>       // Java object or mid is not old or deflation did not succeed.
>>       // A mid->is_new() node can be seen here when it is freshly 
>> returned
>>       // by omAlloc() (and skips the deflation code path).
>>       // A mid->is_old() node can be seen here when deflation failed.
>>       // A mid->is_free() node can be seen here when a fresh node from
>>       // omAlloc() is released by omRelease() due to losing the race
>>       // in inflate().
>>
>>       if (mid->object() != NULL && mid->is_new()) {
>>         // mid has an associated Java object and has now been seen
>>         // as newly allocated so mark it as "old".
>> mid->set_allocation_state(ObjectMonitor::Old);
>>       }
>>
>>>   - why do you set it to old here rather than in inflate once we set 
>>> values?
>>
>> Inflation is used in quite a few places. If we marked the
>> ObjectMonitor as "Old" in inflate(), then that would make the
>> ObjectMonitor available for deflation by deflate_monitor_using_JT()
>> earlier:
>>
>> src/hotspot/share/runtime/synchronizer.cpp:
>>> bool ObjectSynchronizer::deflate_monitor_using_JT(ObjectMonitor* mid,
>>> ObjectMonitor** freeHeadp,
>>> ObjectMonitor** freeTailp) {
>>>   assert(AsyncDeflateIdleMonitors, "sanity check");
>>>   assert(Thread::current()->is_Java_thread(), "precondition");
>>>   // A newly allocated ObjectMonitor should not be seen here so we
>>>   // avoid an endless inflate/deflate cycle.
>>>   assert(mid->is_old(), "precondition");
>>
>> So the idea behind only deflating ObjectMonitors that have reached
>> allocation state "Old" is to prevent "an endless inflate/deflate cycle".
>> Here's the relevant section from Carsten's JEP:
>>
>>> To avoid endless inflation / deflation cycles in the prototype, monitor
>>> deflation is only attempted the second time a monitor is seen by the
>>> thread marking monitors as deflatable: If the thread (the only thread
>>> marking monitors as deflatable; might be service thread or some GC
>>> related thread or even a dedicated thread) sees a monitor in state New,
>>> then the thread marks the monitor as Old and moves on. So there is
>>> little interaction between a thread inflating a lock to a monitor and
>>> the deflating thread, the inflating thread just has to make sure the
>>> monitor is marked New and this marker is published using appropriate
>>> barriers.
>>
>> There isn't an explicit example in the JEP of what Carsten was thinking
>> of with "an endless inflate/deflate cycle". I didn't try to think of
>> such an example for the OpenJDK wiki either. I simple wrote:
>>
>>> ObjectMonitor has a new allocation_state field that supports three
>>> states: 'Free', 'New', 'Old'. Async Monitor Deflation is only applied
>>> to ObjectMonitors that have reached the 'Old' state. When the Async
>>> Monitor Deflation code sees an ObjectMonitor in the 'New' state, it
>>> is changed to the 'Old' state, but is not deflated. This prevents a
>>> newly allocated ObjectMonitor from being immediately deflated which
>>> could cause an inflation<->deflation oscillation.
>>
>> So let's think about what might happen if an ObjectMonitor is marked
>> as "Old" in inflate(). Here's an example use of inflate() in the
>> "slow enter" code path:
>>
>> src/hotspot/share/runtime/synchronizer.cpp:
>> > void ObjectSynchronizer::slow_enter(Handle obj, BasicLock* lock, 
>> TRAPS) {
>> <snip>
>> base<    inflate(THREAD, obj(), 
>> inflate_cause_monitor_enter)->enter(THREAD);
>>
>> new>     ObjectMonitorHandle omh;
>> new>     inflate(&omh, THREAD, obj(), inflate_cause_monitor_enter);
>> new>     do_loop = !omh.om_ptr()->enter(THREAD);
>>
>> In the "base" code, we took the return from inflate() and used it to call
>> ObjectMonitor::enter(). If we never changed that bit of code and 
>> inflate()
>> marked the ObjectMonitor as "Old", then deflate_monitor_using_JT() could
>> async deflate the ObjectMonitor while we were trying to call enter() on
>> it... Boom! So we might think that holding off marking an ObjectMonitor
>> as "Old" can save us... and it can, but not in all cases... :-(
>>
>> It is entirely possible that our call to slow_enter() is made on an
>> ObjectMonitor that's already marked "Old". In that case, our thread
>> (T-enter) calls inflate() which returns the existing ObjectMonitor*
>> and we use it to call enter(). If the thread (T-deflate) calling
>> deflate_monitor_using_JT() does its magic before T-enter sets the
>> owner field or the count field... Boom!
>>
>> The previous paragraph is exactly what motivated the _ref_count field,
>> the ObjectMonitorHandle helper, and adding an ObjectMonitorHandle*
>> parameter to inflate(). inflate() calls 
>> ObjectMonitorHandle::save_om_ptr()
>> which increments the ObjectMonitor's ref_count and then checks for async
>> deflation protocol collisions. If there's a collision, then save_om_ptr()
>> returns false and the caller (inflate() in this case) has to retry. When
>> inflate() returns, the ObjectMonitor in the ObjectMonitorHandle cannot
>> be deflated and is safe until the ObjectMonitorHandle is destroyed.
>>
>> So by changing T-enter to use an ObjectMonitorHandle, T-deflate cannot
>> deflate the ObjectMonitor in the window after inflate() returns and
>> before T-enter sets the owner field or increments the count field. But
>> you know all that already!
>> So let's bring this back to having inflate() mark the ObjectMonitor as
>> "Old"... Since inflate() returns an ObjectMonitor with the ref_count > 0,
>> it doesn't matter if the ObjectMonitor is marked as "Old" in inflate().
>> T-deflate cannot deflate it due to ref_count > 0.
>>
>> Here's another crazy thought... inflate() is the only function that
>> calls omAlloc(), and omAlloc() is the only function that sets "New".
>> If we move the setting of "Old" from deflate_monitor_list_using_JT()
>> to inflate(), then the change from "New" -> "Old" never happens
>> outside of the inflate() call so why do we need the allocation state?
> That was my next question.

I'm leaving in allocation state for now, but I moved when it's set to
"Old". We'll see what the testing shakes out... I have some failures
right now, but I don't think they are related... but I'm not done
hunting yet...

>>
>> Small dose of reality: I've found having the allocation state to be
>> very helpful when debugging race related crashes. We could make the
>> allocation state be DEBUG_ONLY, but then what about race debugging of
>> product bits... sigh...
>>
>>
>>> 6. Could you get rid of the new goto’s?
>>
>> I believe there is only one left from Carsten's prototype:
>>
>> src/hotspot/share/runtime/synchronizer.cpp:
>>
>>> intptr_t ObjectSynchronizer::FastHashCode(Thread * Self, oop obj) {
>> <snip>
>>>   } else if (mark->has_monitor()) {
>>>     ObjectMonitorHandle omh;
>>>     if (!omh.save_om_ptr(obj, mark)) {
>>>       // Lost a race with async deflation so try again.
>>>       assert(AsyncDeflateIdleMonitors, "sanity check");
>>>       goto Retry;
>>>     }
>>
>> I can change FastHashCode() to use the same "while (do_loop)" as the
>> other code that needs to do retries...
>>
> Thank you.
>>
>>> 7. On the updated wiki for the hash race example:
>>> Racing Threads: “T-hash is about to inc the ref_count field”
>>> actually - T-hash just did - ref_count == 1 - so maybe change middle 
>>> values
>>
>> Actually, we're talking about the set up for the race and the
>> diagram shows "ref_count == 1" and should show "ref_count == 0".
>> So I have fixed that on the "Racing Threads" diagram.
>>
>> In the following "T-deflate Wins" and "T-hash Wins" diagrams,
>> "ref_count == 1" is shown in both initial race results ObjectMonitor
>> box. In "T-deflate Wins", it shows ref_count being restored to
>> 0 in the second ObjectMonitor box.
>>
>> Thanks for catching this error. I've fixed it on the wiki.
> Thanks.
>>
>>
>>>
>>> 8. There is an old comment in FastHashCode
>>> that
>>>  // WARNING:
>>>     //   The displaced header is strictly immutable.
>>>     // It can NOT be changed in ANY cases.
>>>
>>> I presume that only applies to the displaced header for a stack lock 
>>> - could you
>>> possibly update that while you are in the code?
>>
>> Here's the whole comment:
>>
>>>     // WARNING:
>>>     //   The displaced header is strictly immutable.
>>>     // It can NOT be changed in ANY cases. So we have
>>>     // to inflate the header into heavyweight monitor
>>>     // even the current thread owns the lock. The reason
>>>     // is the BasicLock (stack slot) will be asynchronously
>>>     // read by other threads during the inflate() function.
>>>     // Any change to stack may not propagate to other threads
>>>     // correctly.
>>
>> That comment applies the displaced header that's in the BasicLock
>> on the thread's stack and it definitely needs some cleaning up
>> independent of the Async Monitor Deflation project.
> Thank you.
>>
>>
>>> Also in FastHashCode
>>>       // The only update to the header in the monitor (outside GC)
>>>   823       // is install the hash code. If someone add new usage of
>>>   824       // displaced header, please update this code
>>> Can you update that comment as well? I know you’ve already updated 
>>> the code logic.
>>
>> I'll revisit that comment as well. I believe Carsten updated it in
>> his prototype, but when I backed out that change when I simplified
>> the hashcode stuff due to ObjectMonitorHandles/ref_count.
> Thank you. And if you are revisiting this to potentially call 
> install_displaced_markword then this
> would change yet again.
>>
>>
>>> So I walked the logic for the hashcode interactions - I didn’t find 
>>> any holes. Thank you for walking most of it in email/wiki.
>>> In particular, inflate does the save_om_ptr dance to inc_ref_count, 
>>> so this code above will
>>> be called while preventing async deflation.
>>
>> Right.
>>
>>
>>> 9. install_displaced_markword_in_object
>>> What happens if the cas_set_mark fails?
>>
>> Here's the code in question:
>>
>> src/hotspot/share/runtime/objectMonitor.cpp:
>>
>>> void ObjectMonitor::install_displaced_markword_in_object() {
>> <snip>
>>>   if (dmw->is_marked()) {
>>>     // The dmw copy is marked which means a hash was not set by a racing
>>>     // thread. Clear the mark from the copy in preparation for possible
>>>     // restoration from this thread.
>>>     assert(dmw->hash() == 0, "must be 0: hash=" INTPTR_FORMAT, 
>>> dmw->hash());
>>>     dmw = dmw->set_unmarked();
>>>   }
>>>   assert(dmw->is_neutral(), "must be a neutral markword");
>>>
>>>   oop const obj = (oop) object();
>>>   // Install displaced markword if object markword still points to this
>>>   // monitor. Both the mutator trying to enter() and the thread 
>>> deflating
>>>   // the monitor will reach this point, but only one can win.
>>>   // Note: If a mutator won the cmpxchg() race above and installed a 
>>> hash
>>>   // in _header, then the updated dmw contains that hash and we'll 
>>> install
>>>   // it in the object's markword here.
>>>   obj->cas_set_mark(dmw, markOopDesc::encode(this));
>>
>> We don't check the return from cas_set_mark() here intentionally.
>> If we have just T-enter and T-deflate racing through this code,
>> then after the "if (dmw->is_marked()) {" block, both threads
>> will have the same 'dmw' value. One thread will set it and the
>> other thread will fail to set it, but we don't care because both
>> threads wanted to set the same value... As a result of the
>> cas_set_mark() call in both threads, both threads will see the
>> same value in the object's header (if they happen to look).
>>
>> I talk about this in the "Either Wins the Second Race" sub-section
>> on the wiki.
> Yes for the current model of only these two callers, neither modifying 
> the dmw other than
> set/clear is_marked) bit. If we extend to FastHashCode - this gets 
> trickier.

We'll have to see if the CR1 version of the comments and code
makes more sense... :-)

>>
>>
>>> I get that today this handles the race with enter and 
>>> deflate_monitor_using_JT. If we remove
>>> the call from enter, is the expectation that we’ve blocked all 
>>> others who did not set is_marked themselves?
>>> If we remove the call from enter would it make sense to ensure that 
>>> the cas_set_mark succeeds here?
>>
>> If we remove the install_displaced_markword_in_object() call from 
>> enter(),
>> then I don't think we need install_displaced_markword_in_object() at
>> all and can restore the object's header with:
>>
>>     // Restore the header back to obj
>>     obj->release_set_mark(mid->header());
>>
>> just like ObjectSynchronizer::deflate_monitor(). The question is
>> whether we think install_displaced_markword_in_object() buys us
>> something other than "help" in restoring the object's header.
> Yes - that was the question - it adds complexity. Does it help threads 
> make progress.
> Thinking about this more and reading Carsten’s reply - I think it does 
> - especially
> since seeing DEFLATION_MARKER can make the checker loop, but in the 
> meantime someone
> else can acquire the monitor - so that might be a non-trivial spin time.

Yup. I'm convinced that keeping install_displaced_markword_in_object()
is a good thing now; I just have to shake out the test results...

>>
>>
>>> 10. Is there any benefit in a bit of stress testing with something 
>>> like a temporary flag that deflates in
>>> mAlloc each time it is called?
>>
>> Maybe? :-) Something like DeflateAsyncMonitorsALot? Can you eloborate
>> on your thinking a bit?
> Just if you thought it might help shake multi-thread timing testing. 
> If you think it is all shaken out
> already - no need.

I'll check it out... :-)

>>
>>
>>> Looking forward to the performance runs as well as the latency numbers.
>>
>> I posted the SPECjbb2015 numbers from this past weekend earlier today.
>> Rather disappointing on my T7600's... Neutral on my MacMini…
> Thank you. And Claus’ response was to be sure they are meaningful 
> before deep-diving - so
> reproduce multiple times kind of thing.

Yup. And the changes I'm making in CR1 will change the deflation
rate (for the better) so what I've measured will no longer be valid...

>>
>> When you say "latency numbers", what do you mean? Do you mean how long
>> ObjectMonitors that could be deflated are kept inflated? Or do you mean
>> something else?
> latency I was thinking about was the reduced time during safepoint 
> cleanup - which is one of the goals of this
> exercise.

Ahhh... that part is easy. It's the increase in number of safepoints
that I need to mull on, but that'll change with the earlier "Old"
setting so I have to remeasure anyway...

>>
>> I think I've responded to everything. Please let me know if I missed
>> something…
> You got it all.

Good. Thanks for the sanity check.

Dan

>
> many thanks,
> Karen
>>
>> Dan
>>
>>
>>
>>>
>>> thanks,
>>> Karen
>>>
>>>> On Apr 5, 2019, at 12:10 PM, Daniel D. Daugherty 
>>>> <daniel.daugherty at oracle.com <mailto:daniel.daugherty at oracle.com>> 
>>>> wrote:
>>>>
>>>> Filed:
>>>>
>>>>     JDK-8222034 Thread-SMR functions should be updated to remove 
>>>> work around
>>>> https://bugs.openjdk.java.net/browse/JDK-8222034
>>>>
>>>> Martin and Robbin, please check it out and make sure that I captured
>>>> things correctly...
>>>>
>>>> Dan
>>>>
>>>>
>>>>
>>>> On 4/5/19 12:01 PM, Daniel D. Daugherty wrote:
>>>>> On 4/5/19 8:37 AM, Doerr, Martin wrote:
>>>>>> Hi everybody,
>>>>>>
>>>>>>> I think was fixed with:
>>>>>>> 8202080: Introduce ordering semantics for Atomic::add/inc and 
>>>>>>> other RMW atomics
>>>>>>> You should get a leading sync and trailing one with the default 
>>>>>>> conservative
>>>>>>> model and thus get proper memory ordering.
>>>>>>> Martin, I'm I correct?
>>>>>> Exactly. Thanks for pointing this out. PPC uses the strongest 
>>>>>> possible ordering semantics with memory_order_conservative 
>>>>>> (default parameter).
>>>>>> I've seen that comment about PPC in "void 
>>>>>> ThreadsList::inc_nested_handle_cnt()". This function could get 
>>>>>> replaced.
>>>>>
>>>>> Okay so we need a new bug to update these two Thread-SMR functions:
>>>>>
>>>>> src/hotspot/share/runtime/threadSMR.cpp:
>>>>>
>>>>> void ThreadsList::dec_nested_handle_cnt() {
>>>>>   // The decrement needs to be MO_ACQ_REL. At the moment, the 
>>>>> Atomic::dec
>>>>>   // backend on PPC does not yet conform to these requirements. 
>>>>> Therefore
>>>>>   // the decrement is simulated with an Atomic::sub(1, &addr).
>>>>>   // Without this MO_ACQ_REL Atomic::dec simulation, the nested 
>>>>> SMR mechanism
>>>>>   // is not generally safe to use.
>>>>>   Atomic::sub(1, &_nested_handle_cnt);
>>>>> }
>>>>>
>>>>> void ThreadsList::inc_nested_handle_cnt() {
>>>>>   // The increment needs to be MO_SEQ_CST. At the moment, the 
>>>>> Atomic::inc
>>>>>   // backend on PPC does not yet conform to these requirements. 
>>>>> Therefore
>>>>>   // the increment is simulated with a load phi; cas phi + 1; loop.
>>>>>   // Without this MO_SEQ_CST Atomic::inc simulation, the nested 
>>>>> SMR mechanism
>>>>>   // is not generally safe to use.
>>>>>   intx sample = OrderAccess::load_acquire(&_nested_handle_cnt);
>>>>>   for (;;) {
>>>>>     if (Atomic::cmpxchg(sample + 1, &_nested_handle_cnt, sample) 
>>>>> == sample) {
>>>>>       return;
>>>>>     } else {
>>>>>       sample = OrderAccess::load_acquire(&_nested_handle_cnt);
>>>>>     }
>>>>>   }
>>>>> }
>>>>>
>>>>> I'll file a new bug, loop in Robbin, Erik O and Martin, and make
>>>>> sure we're all in agreement. Once we decide that Thread-SMR's
>>>>> functions look like, I'll adapt my Async Monitor Deflation
>>>>> functions...
>>>>>
>>>>> Dan
>>>>>
>>>>>
>>>>>>
>>>>>> Best regards,
>>>>>> Martin
>>>>>>
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Robbin Ehn <robbin.ehn at oracle.com 
>>>>>> <mailto:robbin.ehn at oracle.com>>
>>>>>> Sent: Freitag, 5. April 2019 14:07
>>>>>> To: daniel.daugherty at oracle.com 
>>>>>> <mailto:daniel.daugherty at oracle.com>; 
>>>>>> hotspot-runtime-dev at openjdk.java.net 
>>>>>> <mailto:hotspot-runtime-dev at openjdk.java.net>; Carsten Varming 
>>>>>> <varming at gmail.com <mailto:varming at gmail.com>>; Roman Kennke 
>>>>>> <rkennke at redhat.com <mailto:rkennke at redhat.com>>; Doerr, Martin 
>>>>>> <martin.doerr at sap.com <mailto:martin.doerr at sap.com>>
>>>>>> Subject: Re: RFR(L) 8153224 Monitor deflation prolong safepoints
>>>>>>
>>>>>> Hi Dan,
>>>>>>
>>>>>> (Martin there is question for you last in this email)
>>>>>>
>>>>>> After first pass I did not find any real issues.
>>>>>> Considering what you had to work with, it looks good!
>>>>>>
>>>>>> #1
>>>>>> There are some assert which are redundant (to me at least) like:
>>>>>> src/hotspot/share/runtime/objectMonitor.cpp
>>>>>> L445
>>>>>>     if (!dmw->is_marked() && dmw->hash() == 0) {
>>>>>>       // This dmw is neutral and has not yet started the restoration
>>>>>>       // protocol so we mark a copy of the dmw to begin the protocol.
>>>>>>       markOop marked_dmw = dmw->set_marked();
>>>>>> assert(marked_dmw->is_marked() && marked_dmw->hash() == 0,
>>>>>>              "sanity_check: is_marked=%d, hash=" INTPTR_FORMAT,
>>>>>> marked_dmw->is_marked(), marked_dmw->hash());
>>>>>>
>>>>>> That assert is basically a test that set_marked worked?
>>>>>>
>>>>>> L505
>>>>>>       if (Atomic::cmpxchg(Self, &_owner, DEFLATER_MARKER) == 
>>>>>> DEFLATER_MARKER) {
>>>>>>         assert(_succ != Self, "invariant");
>>>>>>         assert(_owner == Self, "invariant");
>>>>>>
>>>>>> Assert on _owner checks that our cmpxchg is not broken?
>>>>>>
>>>>>> I think it's easier to read the code if some on the most obvious 
>>>>>> asserts are
>>>>>> removed. Maybe comments instead.
>>>>>>
>>>>>> #2
>>>>>> Not your doing but I think we should remove TRAPS/Thread * Self 
>>>>>> and use
>>>>>> JavaThread* instead.
>>>>>> E.g. so we can change:
>>>>>> void ObjectMonitor::EnterI(TRAPS) {
>>>>>>     Thread * const Self = THREAD;
>>>>>> assert(Self->is_Java_thread(), "invariant");
>>>>>>     assert(((JavaThread *) Self)->thread_state() == 
>>>>>> _thread_blocked, "invariant");
>>>>>>
>>>>>> to:
>>>>>>
>>>>>> void ObjectMonitor::EnterI(JavaThread* Self) {
>>>>>>     assert(Self->thread_state() == _thread_blocked, "invariant");
>>>>>>
>>>>>> #3
>>>>>> src/hotspot/share/runtime/objectMonitor.inline.hpp
>>>>>>    164 inline void ObjectMonitor::inc_ref_count() {
>>>>>>    165   // The increment needs to be MO_SEQ_CST. At the moment, 
>>>>>> the Atomic::inc
>>>>>>    166   // backend on PPC does not yet conform to these 
>>>>>> requirements. Therefore
>>>>>>    167   // the increment is simulated with a load phi; cas phi + 
>>>>>> 1; loop.
>>>>>>    168   // Without this MO_SEQ_CST Atomic::inc simulation, 
>>>>>> AsyncDeflateIdleMonitors
>>>>>>    169   // is not safe.
>>>>>>
>>>>>> I think was fixed with:
>>>>>> 8202080: Introduce ordering semantics for Atomic::add/inc and 
>>>>>> other RMW atomics
>>>>>> You should get a leading sync and trailing one with the default 
>>>>>> conservative
>>>>>> model and thus get proper memory ordering.
>>>>>> Martin, I'm I correct?
>>>>>>
>>>>>> Thanks, Robbin
>>>>>>
>>>>>> On 3/24/19 2:57 PM, Daniel D. Daugherty wrote:
>>>>>>> Greetings,
>>>>>>>
>>>>>>> Welcome to the OpenJDK review thread for my port of Carsten's 
>>>>>>> work on:
>>>>>>>
>>>>>>>       JDK-8153224 Monitor deflation prolong safepoints
>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8153224
>>>>>>>
>>>>>>> Here's a link to the OpenJDK wiki that describes my port:
>>>>>>>
>>>>>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation
>>>>>>>
>>>>>>> Here's the webrev URL:
>>>>>>>
>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/3-for-jdk13/
>>>>>>>
>>>>>>> Here's a link to Carsten's original webrev:
>>>>>>>
>>>>>>> http://cr.openjdk.java.net/~cvarming/monitor_deflate_conc/0/
>>>>>>>
>>>>>>> Earlier versions of this patch have been through several rounds of
>>>>>>> preliminary review. Many thanks to Carsten, Coleen, Robbin, and
>>>>>>> Roman for their preliminary code review comments. A very special
>>>>>>> thanks to Robbin and Roman for building and testing the patch in
>>>>>>> their own environments (including specJBB2015).
>>>>>>>
>>>>>>> This version of the patch has been thru Mach5 tier[1-8] testing on
>>>>>>> Oracle's usual set of platforms. Earlier versions have been run
>>>>>>> through my stress kit on my Linux-X64 and Solaris-X64 servers
>>>>>>> (product, fastdebug, slowdebug).Earlier versions have run 
>>>>>>> Kitchensink
>>>>>>> for 12 hours on MacOSX, Linux-X64 and Solaris-X64 (product, 
>>>>>>> fastdebug
>>>>>>> and slowdebug). Earlier versions have run my monitor inflation 
>>>>>>> stress
>>>>>>> tests for 12 hours on MacOSX, Linux-X64 and Solaris-X64 (product,
>>>>>>> fastdebug and slowdebug).
>>>>>>>
>>>>>>> All of the testing done on earlier versions will be redone on the
>>>>>>> latest version of the patch.
>>>>>>>
>>>>>>> Thanks, in advance, for any questions, comments or suggestions.
>>>>>>>
>>>>>>> Dan
>>>>>>>
>>>>>>> P.S.
>>>>>>> One subtest in gc/g1/humongousObjects/TestHumongousClassLoader.java
>>>>>>> is currently failing in -Xcomp mode on Win* only. I've been trying
>>>>>>> to characterize/analyze this failure for more than a week now. At
>>>>>>> this point I'm convinced that Async Monitor Deflation is aggravating
>>>>>>> an existing bug. However, I plan to have a better handle on that
>>>>>>> failure before these bits are pushed to the jdk/jdk repo.
>>>>>
>>>>>
>>>>
>>>
>>
>