RFR(L) 8153224 Monitor deflation prolong safepoints

Fri Apr 12 13:35:09 UTC 2019

Sounds like a plan...

Dan

On 4/12/19 5:23 AM, David Holmes wrote:
> Hi Dan,
>
> Thanks for the response. I think I need some gory details here, so 
> I'll take it up with you separately.
>
> David
>
> On 11/04/2019 3:09 am, Daniel D. Daugherty wrote:
>> On 4/10/19 1:15 AM, David Holmes wrote:
>>> Hi Carsten, Dan,
>>>
>>> I'd like to pick up on one topic - a higher-level discussion about 
>>> the timing of the ObjectMonitor lifecycle as they currently are and 
>>> with these changes:
>>>
>>> Carsten wrote:
>>>> I think I was thinking about a cycle where a Java object exhibits
>>>> a monitor inflation, then deflation, then inflation, then deflation.
>>>> Each inflation will be with a new monitor. This behavior could
>>>> increase the number of monitors allocated, especially with my
>>>> original patch as I recycled monitors only after a safepoint. Now
>>>> that I think about it again, such a cycle is incredible unlikely as
>>>> it would require repeated contention on the> java object, yet the
>>>> monitors must not be busy when the deflator thread comes by. And this
>>>> scenario has to repeat itself. This all seems pretty unlikely.
>>>
>>> So logically every Object has associated with it an ObjectMonitor 
>>> but if we created the ObjectMonitor at the same time as the Object 
>>> and kept it alive while the Object was alive then we would double 
>>> our memory use (if not worse).
>>
>> Generally worse. In one of my recent debug sessions on MacOSX with 
>> product
>> bits, I had to figure out sizeof(ObjectMonitor) for memory dumping 
>> purposes
>> and it was 224 bytes.
>>
>>
>>> So we lazily create ObjectMonitors only when we need them: 
>>> contention, Object.wait() use, hashcode use.
>>
>> Clarification: hashcode with contention. hashcode by itself does not
>> require inflation.
>>
>>
>>> We could then leave the ObjectMonitors around as long as the Objects 
>>> are alive, but again this has implications for memory use.
>>>
>>> So we deflate idle ObjectMonitors to reclaim memory (though in 
>>> practice it is more complex and we maintain pools of them to speed 
>>> up allocation).
>>>
>>> If we aggressively deflate as soon as an ObjectMonitor is idle then 
>>> we risk getting into inflate->deflate->inflate cycles. The 
>>> likelihood may be low but if you hit this pathology in your code you 
>>> will probably be unhappy about the effects on performance.
>>>
>>> So instead, IIUC, we use some measure of "memory pressure" and only 
>>> try to deflate under certain conditions. But I'm unclear exactly 
>>> what those conditions are today, and whether they change with async 
>>> monitor deflation. Can you enlighten me please?
>>
>> Without trying to describe the existing trigger mechanisms for monitor
>> deflation (lots of details), Async Monitor Deflation uses the same
>> safepoint cleanup trigger points for _initiating_ monitor deflation.
>> However, unlike safepoint cleanup work which will finish the job during
>> the current safepoint, Async Monitor Deflation will start after the
>> safepoint that initiated the monitor deflation, but there is no 
>> guarantee
>> when the ServiceThread or the JavaThreads will finish their deflation 
>> work
>> (maybe not before the next safepoint).
>>
>> The v2.00/3-for-jdk13 webrev has the "must be seen twice to deflate"
>> algorithm in place. So for any given ObjectMonitor, the first time it
>> is seen by ObjectSynchronizer::deflate_monitor_list_using_JT(), it will
>> not be deflated even if eligible (allocation state == "New"). The second
>> time that it is seen by deflate_monitor_list_using_JT() (allocation 
>> state
>> == "Old"), it is eligible for deflation.
>>
>> In this round of code review, we are talking about setting the 
>> allocation
>> state to "Old" in inflate() which would make an ObjectMonitor eligible
>> for deflation in the next round of deflation.
>>
>>
>> One of my tasks is to update the existing comments about ObjectMonitor
>> life cycle (I'll use another base monitor subsystem subtask); I don't
>> think they were updated with the monitor list changes so there's a bit
>> of catch up work to do there. I also need to update them for Async
>> Monitor Deflation life cycle changes and that will be part of this
>> project.
>>
>> Dan
>>
>>>
>>> Thanks,
>>> David
>>>
>>> On 10/04/2019 12:25 pm, Carsten Varming wrote:
>>>> Hi Dan,
>>>>
>>>> On Mon, Apr 8, 2019 at 9:04 PM Daniel D. Daugherty <
>>>> daniel.daugherty at oracle.com> wrote:
>>>>
>>>>> On 4/5/19 4:59 PM, Karen Kinnear wrote:
>>>>>
>>>>> 4. In EnterI: if _owner == DEFLATER_MARKER there are guarantees 
>>>>> that 0 <
>>>>> _count
>>>>> with comments that caller ensured _count <= 0
>>>>> In ReenterI: guarantee 0 <= _count, with comment not _count < 0
>>>>> — Am I missing something subtle here or should they be the same 
>>>>> guarantees?
>>>>>
>>>>>
>>>>> Here's the code in question:
>>>>>
>>>>> src/hotspot/share/runtime/objectMonitor.cpp:
>>>>>
>>>>> void ObjectMonitor::EnterI(TRAPS) {
>>>>> <snip>
>>>>>    if (_owner == DEFLATER_MARKER) {
>>>>>      guarantee(0 < _count, "_owner == DEFLATER_MARKER && _count <= 
>>>>> 0 should
>>>>> have been handled by the caller");
>>>>>      // Deflater thread tried to lock this monitor, but it failed 
>>>>> to make
>>>>> _count negative and gave up.
>>>>>
>>>>> void ObjectMonitor::ReenterI(Thread * Self, ObjectWaiter * 
>>>>> SelfNode) {
>>>>> <snip>
>>>>>      if (_owner == DEFLATER_MARKER) {
>>>>>        guarantee(0 <= _count, "Impossible: _owner == 
>>>>> DEFLATER_MARKER &&
>>>>> _count < 0, monitor must not be owned by deflater thread here");
>>>>>
>>>>>
>>>>> Reading these two guarantee() calls always throws me off stride
>>>>> because I would have written them like this:
>>>>>
>>>>>      guarantee(_count > 0, "_owner == DEFLATER_MARKER && _count <= 
>>>>> 0 should
>>>>> have been handled by the caller");
>>>>>
>>>>> and
>>>>>
>>>>>        guarantee(_count >= 0, "Impossible: _owner == 
>>>>> DEFLATER_MARKER &&
>>>>> _count < 0, monitor must not be owned by deflater thread here");
>>>>>
>>>>> When rewritten like the above, you have:
>>>>>
>>>>>      "_count > 0" ... _count <= 0
>>>>>
>>>>> and:
>>>>>
>>>>>      "_count >= 0" ... "_count < 0"
>>>>>
>>>>> which is easier for my brain to read... okay... enough sidebar...
>>>>>
>>>>
>>>> He he. I have pretty much eliminated > and >= from my written 
>>>> vocabulary.
>>>> It makes life simpler. Trust me. :)
>>>>
>>>>
>>>>> Short answer: No the guarantees should not be the same.
>>>>>
>>>>> Longer answer: EnterI() is called by enter() after enter() has
>>>>> incremented the _count field to indicate the contended state of
>>>>> things. So in EnterI(), "_count > 0" is the right check.
>>>>> ReenterI() is called after wait() has returned (notified or
>>>>> timedout), and the _count field is not used on reentry ops so
>>>>> "_count >= 0" is the right check.
>>>>>
>>>>> I'm going to tweak the ObjectMonitor::EnterI() code like this (yes,
>>>>> there are two places in EnterI() that do this):
>>>>>
>>>>>      L501:   if (_owner == DEFLATER_MARKER) {
>>>>>                // The deflation protocol finished the first part 
>>>>> (setting
>>>>> _owner),
>>>>>                // but it failed the second part (making _count 
>>>>> negative)
>>>>> and bailed.
>>>>>                // Because we're called from enter() we have at 
>>>>> least one
>>>>> contention.
>>>>>                guarantee(count > 0, "_owner == DEFLATER_MARKER && 
>>>>> _count <=
>>>>> 0 should have been handled by the caller");
>>>>>      L504:     // Try to acquire monitor.
>>>>>      L505:     if (Atomic::cmpxchg(Self, &_owner, DEFLATER_MARKER) ==
>>>>> DEFLATER_MARKER) {
>>>>>
>>>>>      L629:     if (_owner == DEFLATER_MARKER) {
>>>>>                  // The deflation protocol finished the first part 
>>>>> (setting
>>>>> _owner),
>>>>>                  // but it failed the second part (making _count 
>>>>> negative)
>>>>> and bailed.
>>>>>                  // Because we're called from enter() we have at 
>>>>> least one
>>>>> contention.
>>>>>                  guarantee(count> 0 , "_owner == DEFLATER_MARKER 
>>>>> && _count
>>>>> <= 0 should have been handled by the caller");
>>>>>      L632:       if (Atomic::cmpxchg(Self, &_owner, 
>>>>> DEFLATER_MARKER) ==
>>>>> DEFLATER_MARKER) {
>>>>>
>>>>> And I'm going to tweak the ReenterI() code like this:
>>>>>
>>>>>      L759:     if (_owner == DEFLATER_MARKER) {
>>>>>                  // The deflation protocol finished the first part 
>>>>> (setting
>>>>> _owner),
>>>>>                  // but it will observe _waiters != 0 and will 
>>>>> bail out.
>>>>> Because we're
>>>>>                  // called from wait() we may or may not have any
>>>>> contentions.
>>>>>                  guarantee(count >= 0, "Impossible: _owner ==
>>>>> DEFLATER_MARKER && _count < 0 should have been handled by the 
>>>>> caller");
>>>>>      L761:       if (Atomic::cmpxchg(Self, &_owner, 
>>>>> DEFLATER_MARKER) ==
>>>>> DEFLATER_MARKER) {
>>>>>
>>>>>
>>>>> You didn't ask this, but it is okay that _count is only used to track
>>>>> contentions in enter()/EnterI() and is not used to track contentions
>>>>> in wait()/ReenterI(). For the wait()/ReenterI() code path, 
>>>>> _waiters is
>>>>> used by is_busy() to observe the busy state for an ObjectMonitor that
>>>>> is being wait()'ed for. The _waiters field is decremented after a
>>>>> waiter has returned from ReenterI() so the _owner field takes over
>>>>> answering the is_busy() question...
>>>>>
>>>>>
>>>>> 5. I could use a little help with allocation state transitions,
>>>>> e.g. in deflate_monitor_list_using_JT
>>>>>    you see is_new with object set so you mark it as old so next 
>>>>> deflation
>>>>> will check it
>>>>>
>>>>>
>>>>> Here's the code in question:
>>>>>
>>>>> src/hotspot/share/runtime/synchronizer.cpp:
>>>>>
>>>>> int ObjectSynchronizer::deflate_monitor_list_using_JT(ObjectMonitor**
>>>>> listHeadp,
>>>>> ObjectMonitor**
>>>>> freeHeadp,
>>>>> ObjectMonitor**
>>>>> freeTailp,
>>>>> ObjectMonitor**
>>>>> savedMidInUsep) {
>>>>> <snip>
>>>>>      // Only try to deflate if there is an associated Java object 
>>>>> and if
>>>>>      // mid is old (is not newly allocated and is not newly freed).
>>>>>      if (mid->object() != NULL && mid->is_old() &&
>>>>>          deflate_monitor_using_JT(mid, freeHeadp, freeTailp)) {
>>>>>        // Deflation succeeded so update the in-use list.
>>>>> <snip>
>>>>>      } else {
>>>>>        // mid is considered in-use if it does not have an associated
>>>>>        // Java object or mid is not old or deflation did not succeed.
>>>>>        // A mid->is_new() node can be seen here when it is freshly 
>>>>> returned
>>>>>        // by omAlloc() (and skips the deflation code path).
>>>>>        // A mid->is_old() node can be seen here when deflation 
>>>>> failed.
>>>>>        // A mid->is_free() node can be seen here when a fresh node 
>>>>> from
>>>>>        // omAlloc() is released by omRelease() due to losing the race
>>>>>        // in inflate().
>>>>>
>>>>>        if (mid->object() != NULL && mid->is_new()) {
>>>>>          // mid has an associated Java object and has now been seen
>>>>>          // as newly allocated so mark it as "old".
>>>>>          mid->set_allocation_state(ObjectMonitor::Old);
>>>>>        }
>>>>>
>>>>>    - why do you set it to old here rather than in inflate once we set
>>>>> values?
>>>>>
>>>>>
>>>>> Inflation is used in quite a few places. If we marked the
>>>>> ObjectMonitor as "Old" in inflate(), then that would make the
>>>>> ObjectMonitor available for deflation by deflate_monitor_using_JT()
>>>>> earlier:
>>>>>
>>>>> src/hotspot/share/runtime/synchronizer.cpp:
>>>>>
>>>>> bool ObjectSynchronizer::deflate_monitor_using_JT(ObjectMonitor* mid,
>>>>> ObjectMonitor**
>>>>> freeHeadp,
>>>>> ObjectMonitor**
>>>>> freeTailp) {
>>>>>    assert(AsyncDeflateIdleMonitors, "sanity check");
>>>>>    assert(Thread::current()->is_Java_thread(), "precondition");
>>>>>    // A newly allocated ObjectMonitor should not be seen here so we
>>>>>    // avoid an endless inflate/deflate cycle.
>>>>>    assert(mid->is_old(), "precondition");
>>>>>
>>>>>
>>>>> So the idea behind only deflating ObjectMonitors that have reached
>>>>> allocation state "Old" is to prevent "an endless inflate/deflate 
>>>>> cycle".
>>>>> Here's the relevant section from Carsten's JEP:
>>>>>
>>>>> To avoid endless inflation / deflation cycles in the prototype, 
>>>>> monitor
>>>>>
>>>>> deflation is only attempted the second time a monitor is seen by the
>>>>>
>>>>> thread marking monitors as deflatable: If the thread (the only thread
>>>>>
>>>>> marking monitors as deflatable; might be service thread or some GC
>>>>>
>>>>> related thread or even a dedicated thread) sees a monitor in state 
>>>>> New,
>>>>>
>>>>> then the thread marks the monitor as Old and moves on. So there is
>>>>>
>>>>> little interaction between a thread inflating a lock to a monitor and
>>>>>
>>>>> the deflating thread, the inflating thread just has to make sure the
>>>>>
>>>>> monitor is marked New and this marker is published using appropriate
>>>>>
>>>>> barriers.
>>>>>
>>>>>
>>>>> There isn't an explicit example in the JEP of what Carsten was 
>>>>> thinking
>>>>> of with "an endless inflate/deflate cycle". I didn't try to think of
>>>>> such an example for the OpenJDK wiki either. I simple wrote:
>>>>>
>>>>
>>>> I think I was thinking about a cycle where a Java object exhibits a 
>>>> monitor
>>>> inflation, then deflation, then inflation, then deflation. Each 
>>>> inflation
>>>> will be with a new monitor. This behavior could increase the number of
>>>> monitors allocated, especially with my original patch as I recycled
>>>> monitors only after a safepoint. Now that I think about it again, 
>>>> such a
>>>> cycle is incredible unlikely as it would require repeated 
>>>> contention on the
>>>> java object, yet the monitors must not be busy when the deflator 
>>>> thread
>>>> comes by. And this scenario has to repeat itself. This all seems 
>>>> pretty
>>>> unlikely.
>>>>
>>>> ObjectMonitor has a new allocation_state field that supports three
>>>>>
>>>>> states: 'Free', 'New', 'Old'. Async Monitor Deflation is only applied
>>>>>
>>>>> to ObjectMonitors that have reached the 'Old' state. When the Async
>>>>>
>>>>> Monitor Deflation code sees an ObjectMonitor in the 'New' state, it
>>>>>
>>>>> is changed to the 'Old' state, but is not deflated. This prevents a
>>>>>
>>>>> newly allocated ObjectMonitor from being immediately deflated which
>>>>>
>>>>> could cause an inflation<->deflation oscillation.
>>>>>
>>>>>
>>>>> So let's think about what might happen if an ObjectMonitor is marked
>>>>> as "Old" in inflate(). Here's an example use of inflate() in the
>>>>> "slow enter" code path:
>>>>>
>>>>> src/hotspot/share/runtime/synchronizer.cpp:
>>>>>> void ObjectSynchronizer::slow_enter(Handle obj, BasicLock* lock, 
>>>>>> TRAPS) {
>>>>> <snip>
>>>>> base<    inflate(THREAD, obj(),
>>>>> inflate_cause_monitor_enter)->enter(THREAD);
>>>>>
>>>>> new>     ObjectMonitorHandle omh;
>>>>> new>     inflate(&omh, THREAD, obj(), inflate_cause_monitor_enter);
>>>>> new>     do_loop = !omh.om_ptr()->enter(THREAD);
>>>>>
>>>>> In the "base" code, we took the return from inflate() and used it 
>>>>> to call
>>>>> ObjectMonitor::enter(). If we never changed that bit of code and 
>>>>> inflate()
>>>>> marked the ObjectMonitor as "Old", then deflate_monitor_using_JT() 
>>>>> could
>>>>> async deflate the ObjectMonitor while we were trying to call 
>>>>> enter() on
>>>>> it... Boom! So we might think that holding off marking an 
>>>>> ObjectMonitor
>>>>> as "Old" can save us... and it can, but not in all cases... :-(
>>>>>
>>>>> It is entirely possible that our call to slow_enter() is made on an
>>>>> ObjectMonitor that's already marked "Old". In that case, our thread
>>>>> (T-enter) calls inflate() which returns the existing ObjectMonitor*
>>>>> and we use it to call enter(). If the thread (T-deflate) calling
>>>>> deflate_monitor_using_JT() does its magic before T-enter sets the
>>>>> owner field or the count field... Boom!
>>>>>
>>>>> The previous paragraph is exactly what motivated the _ref_count 
>>>>> field,
>>>>> the ObjectMonitorHandle helper, and adding an ObjectMonitorHandle*
>>>>> parameter to inflate(). inflate() calls 
>>>>> ObjectMonitorHandle::save_om_ptr()
>>>>> which increments the ObjectMonitor's ref_count and then checks for 
>>>>> async
>>>>> deflation protocol collisions. If there's a collision, then 
>>>>> save_om_ptr()
>>>>> returns false and the caller (inflate() in this case) has to 
>>>>> retry. When
>>>>> inflate() returns, the ObjectMonitor in the ObjectMonitorHandle 
>>>>> cannot
>>>>> be deflated and is safe until the ObjectMonitorHandle is destroyed.
>>>>>
>>>>> So by changing T-enter to use an ObjectMonitorHandle, T-deflate 
>>>>> cannot
>>>>> deflate the ObjectMonitor in the window after inflate() returns and
>>>>> before T-enter sets the owner field or increments the count field. 
>>>>> But
>>>>> you know all that already!
>>>>>
>>>>> So let's bring this back to having inflate() mark the 
>>>>> ObjectMonitor as
>>>>> "Old"... Since inflate() returns an ObjectMonitor with the 
>>>>> ref_count > 0,
>>>>> it doesn't matter if the ObjectMonitor is marked as "Old" in 
>>>>> inflate().
>>>>> T-deflate cannot deflate it due to ref_count > 0.
>>>>>
>>>>> Here's another crazy thought... inflate() is the only function that
>>>>> calls omAlloc(), and omAlloc() is the only function that sets "New".
>>>>> If we move the setting of "Old" from deflate_monitor_list_using_JT()
>>>>> to inflate(), then the change from "New" -> "Old" never happens
>>>>> outside of the inflate() call so why do we need the allocation state?
>>>>>
>>>>> Small dose of reality: I've found having the allocation state to be
>>>>> very helpful when debugging race related crashes. We could make the
>>>>> allocation state be DEBUG_ONLY, but then what about race debugging of
>>>>> product bits... sigh...
>>>>>
>>>>>
>>>>> 6. Could you get rid of the new goto’s?
>>>>>
>>>>>
>>>>> I believe there is only one left from Carsten's prototype:
>>>>>
>>>>
>>>> You make it sound like I was throwing gotos around left and right. 
>>>> :) If
>>>> you count continue and break statements, then you might have been 
>>>> right.
>>>>
>>>> I'll break my response here, so we can return to regular structured
>>>> programming, ;-)
>>>> Carsten
>>>>
>>