RFR(L) 8153224 Monitor deflation prolong safepoints

Fri Apr 12 09:23:39 UTC 2019

Hi Dan,

Thanks for the response. I think I need some gory details here, so I'll 
take it up with you separately.

David

On 11/04/2019 3:09 am, Daniel D. Daugherty wrote:
> On 4/10/19 1:15 AM, David Holmes wrote:
>> Hi Carsten, Dan,
>>
>> I'd like to pick up on one topic - a higher-level discussion about the 
>> timing of the ObjectMonitor lifecycle as they currently are and with 
>> these changes:
>>
>> Carsten wrote:
>>> I think I was thinking about a cycle where a Java object exhibits
>>> a monitor inflation, then deflation, then inflation, then deflation.
>>> Each inflation will be with a new monitor. This behavior could
>>> increase the number of monitors allocated, especially with my
>>> original patch as I recycled monitors only after a safepoint. Now
>>> that I think about it again, such a cycle is incredible unlikely as
>>> it would require repeated contention on the> java object, yet the
>>> monitors must not be busy when the deflator thread comes by. And this
>>> scenario has to repeat itself. This all seems pretty unlikely.
>>
>> So logically every Object has associated with it an ObjectMonitor but 
>> if we created the ObjectMonitor at the same time as the Object and 
>> kept it alive while the Object was alive then we would double our 
>> memory use (if not worse).
> 
> Generally worse. In one of my recent debug sessions on MacOSX with product
> bits, I had to figure out sizeof(ObjectMonitor) for memory dumping purposes
> and it was 224 bytes.
> 
> 
>> So we lazily create ObjectMonitors only when we need them: contention, 
>> Object.wait() use, hashcode use.
> 
> Clarification: hashcode with contention. hashcode by itself does not
> require inflation.
> 
> 
>> We could then leave the ObjectMonitors around as long as the Objects 
>> are alive, but again this has implications for memory use.
>>
>> So we deflate idle ObjectMonitors to reclaim memory (though in 
>> practice it is more complex and we maintain pools of them to speed up 
>> allocation).
>>
>> If we aggressively deflate as soon as an ObjectMonitor is idle then we 
>> risk getting into inflate->deflate->inflate cycles. The likelihood may 
>> be low but if you hit this pathology in your code you will probably be 
>> unhappy about the effects on performance.
>>
>> So instead, IIUC, we use some measure of "memory pressure" and only 
>> try to deflate under certain conditions. But I'm unclear exactly what 
>> those conditions are today, and whether they change with async monitor 
>> deflation. Can you enlighten me please?
> 
> Without trying to describe the existing trigger mechanisms for monitor
> deflation (lots of details), Async Monitor Deflation uses the same
> safepoint cleanup trigger points for _initiating_ monitor deflation.
> However, unlike safepoint cleanup work which will finish the job during
> the current safepoint, Async Monitor Deflation will start after the
> safepoint that initiated the monitor deflation, but there is no guarantee
> when the ServiceThread or the JavaThreads will finish their deflation work
> (maybe not before the next safepoint).
> 
> The v2.00/3-for-jdk13 webrev has the "must be seen twice to deflate"
> algorithm in place. So for any given ObjectMonitor, the first time it
> is seen by ObjectSynchronizer::deflate_monitor_list_using_JT(), it will
> not be deflated even if eligible (allocation state == "New"). The second
> time that it is seen by deflate_monitor_list_using_JT() (allocation state
> == "Old"), it is eligible for deflation.
> 
> In this round of code review, we are talking about setting the allocation
> state to "Old" in inflate() which would make an ObjectMonitor eligible
> for deflation in the next round of deflation.
> 
> 
> One of my tasks is to update the existing comments about ObjectMonitor
> life cycle (I'll use another base monitor subsystem subtask); I don't
> think they were updated with the monitor list changes so there's a bit
> of catch up work to do there. I also need to update them for Async
> Monitor Deflation life cycle changes and that will be part of this
> project.
> 
> Dan
> 
>>
>> Thanks,
>> David
>>
>> On 10/04/2019 12:25 pm, Carsten Varming wrote:
>>> Hi Dan,
>>>
>>> On Mon, Apr 8, 2019 at 9:04 PM Daniel D. Daugherty <
>>> daniel.daugherty at oracle.com> wrote:
>>>
>>>> On 4/5/19 4:59 PM, Karen Kinnear wrote:
>>>>
>>>> 4. In EnterI: if _owner == DEFLATER_MARKER there are guarantees that 
>>>> 0 <
>>>> _count
>>>> with comments that caller ensured _count <= 0
>>>> In ReenterI: guarantee 0 <= _count, with comment not _count < 0
>>>> — Am I missing something subtle here or should they be the same 
>>>> guarantees?
>>>>
>>>>
>>>> Here's the code in question:
>>>>
>>>> src/hotspot/share/runtime/objectMonitor.cpp:
>>>>
>>>> void ObjectMonitor::EnterI(TRAPS) {
>>>> <snip>
>>>>    if (_owner == DEFLATER_MARKER) {
>>>>      guarantee(0 < _count, "_owner == DEFLATER_MARKER && _count <= 0 
>>>> should
>>>> have been handled by the caller");
>>>>      // Deflater thread tried to lock this monitor, but it failed to 
>>>> make
>>>> _count negative and gave up.
>>>>
>>>> void ObjectMonitor::ReenterI(Thread * Self, ObjectWaiter * SelfNode) {
>>>> <snip>
>>>>      if (_owner == DEFLATER_MARKER) {
>>>>        guarantee(0 <= _count, "Impossible: _owner == DEFLATER_MARKER &&
>>>> _count < 0, monitor must not be owned by deflater thread here");
>>>>
>>>>
>>>> Reading these two guarantee() calls always throws me off stride
>>>> because I would have written them like this:
>>>>
>>>>      guarantee(_count > 0, "_owner == DEFLATER_MARKER && _count <= 0 
>>>> should
>>>> have been handled by the caller");
>>>>
>>>> and
>>>>
>>>>        guarantee(_count >= 0, "Impossible: _owner == DEFLATER_MARKER &&
>>>> _count < 0, monitor must not be owned by deflater thread here");
>>>>
>>>> When rewritten like the above, you have:
>>>>
>>>>      "_count > 0" ... _count <= 0
>>>>
>>>> and:
>>>>
>>>>      "_count >= 0" ... "_count < 0"
>>>>
>>>> which is easier for my brain to read... okay... enough sidebar...
>>>>
>>>
>>> He he. I have pretty much eliminated > and >= from my written 
>>> vocabulary.
>>> It makes life simpler. Trust me. :)
>>>
>>>
>>>> Short answer: No the guarantees should not be the same.
>>>>
>>>> Longer answer: EnterI() is called by enter() after enter() has
>>>> incremented the _count field to indicate the contended state of
>>>> things. So in EnterI(), "_count > 0" is the right check.
>>>> ReenterI() is called after wait() has returned (notified or
>>>> timedout), and the _count field is not used on reentry ops so
>>>> "_count >= 0" is the right check.
>>>>
>>>> I'm going to tweak the ObjectMonitor::EnterI() code like this (yes,
>>>> there are two places in EnterI() that do this):
>>>>
>>>>      L501:   if (_owner == DEFLATER_MARKER) {
>>>>                // The deflation protocol finished the first part 
>>>> (setting
>>>> _owner),
>>>>                // but it failed the second part (making _count 
>>>> negative)
>>>> and bailed.
>>>>                // Because we're called from enter() we have at least 
>>>> one
>>>> contention.
>>>>                guarantee(count > 0, "_owner == DEFLATER_MARKER && 
>>>> _count <=
>>>> 0 should have been handled by the caller");
>>>>      L504:     // Try to acquire monitor.
>>>>      L505:     if (Atomic::cmpxchg(Self, &_owner, DEFLATER_MARKER) ==
>>>> DEFLATER_MARKER) {
>>>>
>>>>      L629:     if (_owner == DEFLATER_MARKER) {
>>>>                  // The deflation protocol finished the first part 
>>>> (setting
>>>> _owner),
>>>>                  // but it failed the second part (making _count 
>>>> negative)
>>>> and bailed.
>>>>                  // Because we're called from enter() we have at 
>>>> least one
>>>> contention.
>>>>                  guarantee(count> 0 , "_owner == DEFLATER_MARKER && 
>>>> _count
>>>> <= 0 should have been handled by the caller");
>>>>      L632:       if (Atomic::cmpxchg(Self, &_owner, DEFLATER_MARKER) ==
>>>> DEFLATER_MARKER) {
>>>>
>>>> And I'm going to tweak the ReenterI() code like this:
>>>>
>>>>      L759:     if (_owner == DEFLATER_MARKER) {
>>>>                  // The deflation protocol finished the first part 
>>>> (setting
>>>> _owner),
>>>>                  // but it will observe _waiters != 0 and will bail 
>>>> out.
>>>> Because we're
>>>>                  // called from wait() we may or may not have any
>>>> contentions.
>>>>                  guarantee(count >= 0, "Impossible: _owner ==
>>>> DEFLATER_MARKER && _count < 0 should have been handled by the caller");
>>>>      L761:       if (Atomic::cmpxchg(Self, &_owner, DEFLATER_MARKER) ==
>>>> DEFLATER_MARKER) {
>>>>
>>>>
>>>> You didn't ask this, but it is okay that _count is only used to track
>>>> contentions in enter()/EnterI() and is not used to track contentions
>>>> in wait()/ReenterI(). For the wait()/ReenterI() code path, _waiters is
>>>> used by is_busy() to observe the busy state for an ObjectMonitor that
>>>> is being wait()'ed for. The _waiters field is decremented after a
>>>> waiter has returned from ReenterI() so the _owner field takes over
>>>> answering the is_busy() question...
>>>>
>>>>
>>>> 5. I could use a little help with allocation state transitions,
>>>> e.g. in deflate_monitor_list_using_JT
>>>>    you see is_new with object set so you mark it as old so next 
>>>> deflation
>>>> will check it
>>>>
>>>>
>>>> Here's the code in question:
>>>>
>>>> src/hotspot/share/runtime/synchronizer.cpp:
>>>>
>>>> int ObjectSynchronizer::deflate_monitor_list_using_JT(ObjectMonitor**
>>>> listHeadp,
>>>> ObjectMonitor**
>>>> freeHeadp,
>>>> ObjectMonitor**
>>>> freeTailp,
>>>> ObjectMonitor**
>>>> savedMidInUsep) {
>>>> <snip>
>>>>      // Only try to deflate if there is an associated Java object 
>>>> and if
>>>>      // mid is old (is not newly allocated and is not newly freed).
>>>>      if (mid->object() != NULL && mid->is_old() &&
>>>>          deflate_monitor_using_JT(mid, freeHeadp, freeTailp)) {
>>>>        // Deflation succeeded so update the in-use list.
>>>> <snip>
>>>>      } else {
>>>>        // mid is considered in-use if it does not have an associated
>>>>        // Java object or mid is not old or deflation did not succeed.
>>>>        // A mid->is_new() node can be seen here when it is freshly 
>>>> returned
>>>>        // by omAlloc() (and skips the deflation code path).
>>>>        // A mid->is_old() node can be seen here when deflation failed.
>>>>        // A mid->is_free() node can be seen here when a fresh node from
>>>>        // omAlloc() is released by omRelease() due to losing the race
>>>>        // in inflate().
>>>>
>>>>        if (mid->object() != NULL && mid->is_new()) {
>>>>          // mid has an associated Java object and has now been seen
>>>>          // as newly allocated so mark it as "old".
>>>>          mid->set_allocation_state(ObjectMonitor::Old);
>>>>        }
>>>>
>>>>    - why do you set it to old here rather than in inflate once we set
>>>> values?
>>>>
>>>>
>>>> Inflation is used in quite a few places. If we marked the
>>>> ObjectMonitor as "Old" in inflate(), then that would make the
>>>> ObjectMonitor available for deflation by deflate_monitor_using_JT()
>>>> earlier:
>>>>
>>>> src/hotspot/share/runtime/synchronizer.cpp:
>>>>
>>>> bool ObjectSynchronizer::deflate_monitor_using_JT(ObjectMonitor* mid,
>>>> ObjectMonitor**
>>>> freeHeadp,
>>>> ObjectMonitor**
>>>> freeTailp) {
>>>>    assert(AsyncDeflateIdleMonitors, "sanity check");
>>>>    assert(Thread::current()->is_Java_thread(), "precondition");
>>>>    // A newly allocated ObjectMonitor should not be seen here so we
>>>>    // avoid an endless inflate/deflate cycle.
>>>>    assert(mid->is_old(), "precondition");
>>>>
>>>>
>>>> So the idea behind only deflating ObjectMonitors that have reached
>>>> allocation state "Old" is to prevent "an endless inflate/deflate 
>>>> cycle".
>>>> Here's the relevant section from Carsten's JEP:
>>>>
>>>> To avoid endless inflation / deflation cycles in the prototype, monitor
>>>>
>>>> deflation is only attempted the second time a monitor is seen by the
>>>>
>>>> thread marking monitors as deflatable: If the thread (the only thread
>>>>
>>>> marking monitors as deflatable; might be service thread or some GC
>>>>
>>>> related thread or even a dedicated thread) sees a monitor in state New,
>>>>
>>>> then the thread marks the monitor as Old and moves on. So there is
>>>>
>>>> little interaction between a thread inflating a lock to a monitor and
>>>>
>>>> the deflating thread, the inflating thread just has to make sure the
>>>>
>>>> monitor is marked New and this marker is published using appropriate
>>>>
>>>> barriers.
>>>>
>>>>
>>>> There isn't an explicit example in the JEP of what Carsten was thinking
>>>> of with "an endless inflate/deflate cycle". I didn't try to think of
>>>> such an example for the OpenJDK wiki either. I simple wrote:
>>>>
>>>
>>> I think I was thinking about a cycle where a Java object exhibits a 
>>> monitor
>>> inflation, then deflation, then inflation, then deflation. Each 
>>> inflation
>>> will be with a new monitor. This behavior could increase the number of
>>> monitors allocated, especially with my original patch as I recycled
>>> monitors only after a safepoint. Now that I think about it again, such a
>>> cycle is incredible unlikely as it would require repeated contention 
>>> on the
>>> java object, yet the monitors must not be busy when the deflator thread
>>> comes by. And this scenario has to repeat itself. This all seems pretty
>>> unlikely.
>>>
>>> ObjectMonitor has a new allocation_state field that supports three
>>>>
>>>> states: 'Free', 'New', 'Old'. Async Monitor Deflation is only applied
>>>>
>>>> to ObjectMonitors that have reached the 'Old' state. When the Async
>>>>
>>>> Monitor Deflation code sees an ObjectMonitor in the 'New' state, it
>>>>
>>>> is changed to the 'Old' state, but is not deflated. This prevents a
>>>>
>>>> newly allocated ObjectMonitor from being immediately deflated which
>>>>
>>>> could cause an inflation<->deflation oscillation.
>>>>
>>>>
>>>> So let's think about what might happen if an ObjectMonitor is marked
>>>> as "Old" in inflate(). Here's an example use of inflate() in the
>>>> "slow enter" code path:
>>>>
>>>> src/hotspot/share/runtime/synchronizer.cpp:
>>>>> void ObjectSynchronizer::slow_enter(Handle obj, BasicLock* lock, 
>>>>> TRAPS) {
>>>> <snip>
>>>> base<    inflate(THREAD, obj(),
>>>> inflate_cause_monitor_enter)->enter(THREAD);
>>>>
>>>> new>     ObjectMonitorHandle omh;
>>>> new>     inflate(&omh, THREAD, obj(), inflate_cause_monitor_enter);
>>>> new>     do_loop = !omh.om_ptr()->enter(THREAD);
>>>>
>>>> In the "base" code, we took the return from inflate() and used it to 
>>>> call
>>>> ObjectMonitor::enter(). If we never changed that bit of code and 
>>>> inflate()
>>>> marked the ObjectMonitor as "Old", then deflate_monitor_using_JT() 
>>>> could
>>>> async deflate the ObjectMonitor while we were trying to call enter() on
>>>> it... Boom! So we might think that holding off marking an ObjectMonitor
>>>> as "Old" can save us... and it can, but not in all cases... :-(
>>>>
>>>> It is entirely possible that our call to slow_enter() is made on an
>>>> ObjectMonitor that's already marked "Old". In that case, our thread
>>>> (T-enter) calls inflate() which returns the existing ObjectMonitor*
>>>> and we use it to call enter(). If the thread (T-deflate) calling
>>>> deflate_monitor_using_JT() does its magic before T-enter sets the
>>>> owner field or the count field... Boom!
>>>>
>>>> The previous paragraph is exactly what motivated the _ref_count field,
>>>> the ObjectMonitorHandle helper, and adding an ObjectMonitorHandle*
>>>> parameter to inflate(). inflate() calls 
>>>> ObjectMonitorHandle::save_om_ptr()
>>>> which increments the ObjectMonitor's ref_count and then checks for 
>>>> async
>>>> deflation protocol collisions. If there's a collision, then 
>>>> save_om_ptr()
>>>> returns false and the caller (inflate() in this case) has to retry. 
>>>> When
>>>> inflate() returns, the ObjectMonitor in the ObjectMonitorHandle cannot
>>>> be deflated and is safe until the ObjectMonitorHandle is destroyed.
>>>>
>>>> So by changing T-enter to use an ObjectMonitorHandle, T-deflate cannot
>>>> deflate the ObjectMonitor in the window after inflate() returns and
>>>> before T-enter sets the owner field or increments the count field. But
>>>> you know all that already!
>>>>
>>>> So let's bring this back to having inflate() mark the ObjectMonitor as
>>>> "Old"... Since inflate() returns an ObjectMonitor with the ref_count 
>>>> > 0,
>>>> it doesn't matter if the ObjectMonitor is marked as "Old" in inflate().
>>>> T-deflate cannot deflate it due to ref_count > 0.
>>>>
>>>> Here's another crazy thought... inflate() is the only function that
>>>> calls omAlloc(), and omAlloc() is the only function that sets "New".
>>>> If we move the setting of "Old" from deflate_monitor_list_using_JT()
>>>> to inflate(), then the change from "New" -> "Old" never happens
>>>> outside of the inflate() call so why do we need the allocation state?
>>>>
>>>> Small dose of reality: I've found having the allocation state to be
>>>> very helpful when debugging race related crashes. We could make the
>>>> allocation state be DEBUG_ONLY, but then what about race debugging of
>>>> product bits... sigh...
>>>>
>>>>
>>>> 6. Could you get rid of the new goto’s?
>>>>
>>>>
>>>> I believe there is only one left from Carsten's prototype:
>>>>
>>>
>>> You make it sound like I was throwing gotos around left and right. :) If
>>> you count continue and break statements, then you might have been right.
>>>
>>> I'll break my response here, so we can return to regular structured
>>> programming, ;-)
>>> Carsten
>>>
>