RFR(L) 8153224 Monitor deflation prolong safepoints
Daniel D. Daugherty
daniel.daugherty at oracle.com
Mon Apr 15 19:25:01 UTC 2019
On 4/15/19 3:19 PM, Karen Kinnear wrote:
> Dan,
>
> Thank you for delving into this
>
>> On Apr 9, 2019, at 11:53 AM, Daniel D. Daugherty
>> <daniel.daugherty at oracle.com <mailto:daniel.daugherty at oracle.com>> wrote:
>>
>> Hi Carsten,
>>
>> Thanks for responding to Karen's code review comments.
>>
>> Karen,
>>
>> I have a query for you down at the end of my reply...
>>
>> More below...
>>
>> On 4/5/19 11:01 PM, Carsten Varming wrote:
>>> Dear Karen,
>>>
>>> Please see inline answers.
>>>
>>> On Fri, Apr 5, 2019 at 4:59 PM Karen Kinnear
>>> <karen.kinnear at oracle.com <mailto:karen.kinnear at oracle.com>> wrote:
>>>
>>>
>>> 4. In EnterI: if _owner == DEFLATER_MARKER there are guarantees
>>> that 0 < _count
>>> with comments that caller ensured _count <= 0
>>> In ReenterI: guarantee 0 <= _count, with comment not _count < 0
>>> — Am I missing something subtle here or should they be the same
>>> guarantees?
>>>
>>>
>>> In ::enter _count is incremented when the thread is trying to
>>> acquire the monitor and decremented after the monitor has been
>>> acquired. The 0 < _count assertion is between those two point in the
>>> code. A thread acquiring a monitor and then calling wait will
>>> increment _count and then decrement _count as part of acquiring the
>>> monitor, thus _count can be 0 by the time the thread calls wait and
>>> when ReenterI is called.
>>
>> I had a similar answer and I'm planning to tweak the comments
>> and the guarantees a bit in the next round of code review (CR1);
>> please see my reply to Karen's CR for the proposed changes.
> Thanks again.
>>
>>
>>>
>>> 9. install_displaced_markword_in_object
>>> What happens if the cas_set_mark fails?
>>> I get that today this handles the race with enter and
>>> deflate_monitor_using_JT. If we remove
>>> the call from enter, is the expectation that we’ve blocked all
>>> others who did not set is_marked themselves?
>>> If we remove the call from enter would it make sense to ensure
>>> that the cas_set_mark succeeds here?
>>>
>>>
>>> I designed my original patch such that no thread would ever wait for
>>> the the deflating thread to finish deflating a monitor. If you
>>> remove install_displaced_markword_in_object from enter, then the
>>> entering thread can end up busy waiting by continuously reading the
>>> monitor pointer from the object mark word and then realizing that
>>> the monitor is being deflated and it should retry by going back to
>>> reading the object mark word. This bad behavior is completely
>>> avoided by calling install_displaced_markword_in_object.
>>
>> Here's the code in question:
>>
>> src/hotspot/share/runtime/objectMonitor.cpp:
>>
>>> bool ObjectMonitor::enter(TRAPS) {
>> <snip>
>>> // Prevent deflation. See ObjectSynchronizer::deflate_monitor()
>>> and is_busy().
>>> // Ensure the object-monitor relationship remains stable while
>>> there's contention.
>>> const jint count = Atomic::add(1, &_count);
>>> if (count <= 0 && _owner == DEFLATER_MARKER) {
>>> // Async deflation in progress. Help deflater thread install
>>> // the mark word (in case deflater thread is slow).
>>> install_displaced_markword_in_object();
>>> Self->_Stalled = 0;
>>> return false; // Caller should retry. Never mind about _count
>>> as this monitor has been deflated.
>>> }
>>
>> Our thread (T-enter) observes that the ObjectMonitor is being deflated
>> by T-deflate, calls install_displaced_markword_in_object() and returns
>> false to the caller which causes a retry.
>>
>> Restoring the header/dmw from the ObjectMonitor to the object's header
>> here isn't needed for correctness so it could be dropped (and would
>> simplify the code). Your counterpoint is if we drop the call, then
>> T-enter could do retry after retry if T-deflate is slow to get to its
>> install_displaced_markword_in_object() call.
>>
>> If T-enter calls install_displaced_markword_in_object(), then T-enter
>> will do a single retry because the object T-enter is trying to lock
>> will no longer have an ObjectMonitor. Okay I finally grok it...
>>
>> I think we need to clarify the comment a bit:
>>
>> > if (count <= 0 && _owner == DEFLATER_MARKER) {
>> > // Async deflation is in progress. Attempt to restore the
>> > // header/dmw to the object's header so that we only retry once
>> > // if the deflater thread happens to be slow.
>> > install_displaced_markword_in_object();
>>
>>
>>> In my original patch no thread would ever wait for a deflating
>>> thread to finish. This property got lost in FastHashCode as that
>>> function evolved since I wrote my patch, but I think this property
>>> is worth preserving where possible. It might even be worth looking
>>> at FastHashCode to see if we can re-establish this property.
>>
>> Async Monitor Deflation causes races with FastHashCode() when the target
>> object has an existing ObjectMonitor. Here's the base code:
>>
>>> 768 } else if (mark->has_monitor()) {
>>> 769 monitor = mark->monitor();
>>> 770 temp = monitor->header();
>>> 771 assert(temp->is_neutral(), "invariant");
>>> 772 hash = temp->hash();
>>> 773 if (hash) {
>>> 774 return hash;
>>> 775 }
>>> 776 // Skip to the following code to reduce code size
>>
>> The 'monitor' fetched on L769 is unstable due to Async Monitor
>> Deflation and can cause an incorrect hash value to be returned.
>> The solution is to protect the ObjectMonitor*:
>>
>>> 775 } else if (mark->has_monitor()) {
>>> 776 ObjectMonitorHandle omh;
>>> 777 if (!omh.save_om_ptr(obj, mark)) {
>>> 778 // Lost a race with async deflation so try again.
>>> 779 assert(AsyncDeflateIdleMonitors, "sanity check");
>>> 780 goto Retry;
>>> 781 }
>>> 782 monitor = omh.om_ptr();
>>> 783 temp = monitor->header();
>>> 784 assert(temp->is_neutral(), "invariant: header=" INTPTR_FORMAT, p2i((address)temp));
>>> 785 hash = temp->hash();
>>> 786 if (hash != 0) {
>>> 787 return hash;
>>> 788 }
>>> 789 // Skip to the following code to reduce code size
>>
>> where L776-L782 handle the protection duty and possible retry. So we
>> have to protect the ObjectMonitor*, but, like enter(), we could call
>> install_displaced_markword_in_object() when we retry which would limit
>> T-hash to a single retry.
>
>>
>> ObjectSynchronizer::inflate() has a similar collision and retry issue:
>>
>>> 1456 // CASE: inflated
>>> 1457 if (mark->has_monitor()) {
>>> 1458 if (!omh_p->save_om_ptr(object, mark)) {
>>> 1459 // Lost a race with async deflation so try again.
>>> 1460 assert(AsyncDeflateIdleMonitors, "sanity check");
>>> 1461 continue;
>>> 1462 }
>>
>> In this situation, inflate() discovers that the object already has an
>> ObjectMonitor; the object may not have had one when inflate() was
>> called, but it has one now. That particular race predates this project.
>>
>> In any case, inflate() wants to return a stable ObjectMonitor* in the
>> ObjectMonitorHandle, but if save_om_ptr() returns false, then inflate()
>> has to retry. The only reason for save_om_ptr() to return false is due
>> to a collision with Async Monitor Deflation. Like enter, we could call
>> install_displaced_markword_in_object() when we retry which would limit
>> inflate() to a single retry.
>>
>>
>> Okay, I've evolved from thinking we could simplify the code by dropping
>> install_displaced_markword_in_object() to thinking that I understand
>> what install_displaced_markword_in_object() brings to the party. And now
>> I'm proposing that we add 2 more install_displaced_markword_in_object()
>> calls to limit retries on two more code paths.
>>
>> Karen, are you convinced that install_displaced_markword_in_object() is
>> useful?
>
> Dan - between Carsten’s and your explanation - I get that
> 1) there is value in being able to make forward progress on the object
> itself sooner, once we know
> that DEFLATE_MARKER is trying to deflate it
> 2) since an enter() can CAS an _owner after DEFLATER_MARKER set but
> before _count -MAX_JINT,
> then anyone else waiting for deflation to finish could wait a while
> until they actually get a turn
> 3) If the other callers were to use install_dmw… then they could have
> shorter retry cycles (although
> with the enter, and now potentially other equivalent callers - more
> recontenders).
>
> So I see the benefit in others trying this approach as well - to free
> the object from the not-currently-used
> inflated monitor and retry.
Glad we're all on the same page.
> You may have already sent out a new webrev with that approach - I have
> not worked through in my
> head how that changes the details, since there are now more players,
> with different gates.
I have not sent out CR1 yet. I'm still testing. I had some new test failures
on Friday. It looks like adding more install_displaced_markword_in_object()
calls has revealed yet another race. I'm testing a fix now. Once I have
stable
bits again, I'll start the CR1 cycle...
Dan
>
> thanks for walking us through this,
> Karen
>> Dan
>>
>>
>>
>>>
>>> I hope this helps.
>>>
>>> Best,
>>> Carsten
>>>
>>>> On Apr 5, 2019, at 12:10 PM, Daniel D. Daugherty
>>>> <daniel.daugherty at oracle.com
>>>> <mailto:daniel.daugherty at oracle.com>> wrote:
>>>>
>>>> Filed:
>>>>
>>>> JDK-8222034 Thread-SMR functions should be updated to
>>>> remove work around
>>>> https://bugs.openjdk.java.net/browse/JDK-8222034
>>>>
>>>> Martin and Robbin, please check it out and make sure that I
>>>> captured
>>>> things correctly...
>>>>
>>>> Dan
>>>>
>>>>
>>>>
>>>> On 4/5/19 12:01 PM, Daniel D. Daugherty wrote:
>>>>> On 4/5/19 8:37 AM, Doerr, Martin wrote:
>>>>>> Hi everybody,
>>>>>>
>>>>>>> I think was fixed with:
>>>>>>> 8202080: Introduce ordering semantics for Atomic::add/inc
>>>>>>> and other RMW atomics
>>>>>>> You should get a leading sync and trailing one with the
>>>>>>> default conservative
>>>>>>> model and thus get proper memory ordering.
>>>>>>> Martin, I'm I correct?
>>>>>> Exactly. Thanks for pointing this out. PPC uses the strongest
>>>>>> possible ordering semantics with memory_order_conservative
>>>>>> (default parameter).
>>>>>> I've seen that comment about PPC in "void
>>>>>> ThreadsList::inc_nested_handle_cnt()". This function could
>>>>>> get replaced.
>>>>>
>>>>> Okay so we need a new bug to update these two Thread-SMR
>>>>> functions:
>>>>>
>>>>> src/hotspot/share/runtime/threadSMR.cpp:
>>>>>
>>>>> void ThreadsList::dec_nested_handle_cnt() {
>>>>> // The decrement needs to be MO_ACQ_REL. At the moment, the
>>>>> Atomic::dec
>>>>> // backend on PPC does not yet conform to these
>>>>> requirements. Therefore
>>>>> // the decrement is simulated with an Atomic::sub(1, &addr).
>>>>> // Without this MO_ACQ_REL Atomic::dec simulation, the
>>>>> nested SMR mechanism
>>>>> // is not generally safe to use.
>>>>> Atomic::sub(1, &_nested_handle_cnt);
>>>>> }
>>>>>
>>>>> void ThreadsList::inc_nested_handle_cnt() {
>>>>> // The increment needs to be MO_SEQ_CST. At the moment, the
>>>>> Atomic::inc
>>>>> // backend on PPC does not yet conform to these
>>>>> requirements. Therefore
>>>>> // the increment is simulated with a load phi; cas phi + 1;
>>>>> loop.
>>>>> // Without this MO_SEQ_CST Atomic::inc simulation, the
>>>>> nested SMR mechanism
>>>>> // is not generally safe to use.
>>>>> intx sample = OrderAccess::load_acquire(&_nested_handle_cnt);
>>>>> for (;;) {
>>>>> if (Atomic::cmpxchg(sample + 1, &_nested_handle_cnt,
>>>>> sample) == sample) {
>>>>> return;
>>>>> } else {
>>>>> sample = OrderAccess::load_acquire(&_nested_handle_cnt);
>>>>> }
>>>>> }
>>>>> }
>>>>>
>>>>> I'll file a new bug, loop in Robbin, Erik O and Martin, and make
>>>>> sure we're all in agreement. Once we decide that Thread-SMR's
>>>>> functions look like, I'll adapt my Async Monitor Deflation
>>>>> functions...
>>>>>
>>>>> Dan
>>>>>
>>>>>
>>>>>>
>>>>>> Best regards,
>>>>>> Martin
>>>>>>
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Robbin Ehn <robbin.ehn at oracle.com
>>>>>> <mailto:robbin.ehn at oracle.com>>
>>>>>> Sent: Freitag, 5. April 2019 14:07
>>>>>> To: daniel.daugherty at oracle.com
>>>>>> <mailto:daniel.daugherty at oracle.com>;
>>>>>> hotspot-runtime-dev at openjdk.java.net
>>>>>> <mailto:hotspot-runtime-dev at openjdk.java.net>; Carsten
>>>>>> Varming <varming at gmail.com <mailto:varming at gmail.com>>; Roman
>>>>>> Kennke <rkennke at redhat.com <mailto:rkennke at redhat.com>>;
>>>>>> Doerr, Martin <martin.doerr at sap.com
>>>>>> <mailto:martin.doerr at sap.com>>
>>>>>> Subject: Re: RFR(L) 8153224 Monitor deflation prolong safepoints
>>>>>>
>>>>>> Hi Dan,
>>>>>>
>>>>>> (Martin there is question for you last in this email)
>>>>>>
>>>>>> After first pass I did not find any real issues.
>>>>>> Considering what you had to work with, it looks good!
>>>>>>
>>>>>> #1
>>>>>> There are some assert which are redundant (to me at least) like:
>>>>>> src/hotspot/share/runtime/objectMonitor.cpp
>>>>>> L445
>>>>>> if (!dmw->is_marked() && dmw->hash() == 0) {
>>>>>> // This dmw is neutral and has not yet started the
>>>>>> restoration
>>>>>> // protocol so we mark a copy of the dmw to begin the
>>>>>> protocol.
>>>>>> markOop marked_dmw = dmw->set_marked();
>>>>>> assert(marked_dmw->is_marked() && marked_dmw->hash() == 0,
>>>>>> "sanity_check: is_marked=%d, hash=" INTPTR_FORMAT,
>>>>>> marked_dmw->is_marked(), marked_dmw->hash());
>>>>>>
>>>>>> That assert is basically a test that set_marked worked?
>>>>>>
>>>>>> L505
>>>>>> if (Atomic::cmpxchg(Self, &_owner, DEFLATER_MARKER) ==
>>>>>> DEFLATER_MARKER) {
>>>>>> assert(_succ != Self, "invariant");
>>>>>> assert(_owner == Self, "invariant");
>>>>>>
>>>>>> Assert on _owner checks that our cmpxchg is not broken?
>>>>>>
>>>>>> I think it's easier to read the code if some on the most
>>>>>> obvious asserts are
>>>>>> removed. Maybe comments instead.
>>>>>>
>>>>>> #2
>>>>>> Not your doing but I think we should remove TRAPS/Thread *
>>>>>> Self and use
>>>>>> JavaThread* instead.
>>>>>> E.g. so we can change:
>>>>>> void ObjectMonitor::EnterI(TRAPS) {
>>>>>> Thread * const Self = THREAD;
>>>>>> assert(Self->is_Java_thread(), "invariant");
>>>>>> assert(((JavaThread *) Self)->thread_state() ==
>>>>>> _thread_blocked, "invariant");
>>>>>>
>>>>>> to:
>>>>>>
>>>>>> void ObjectMonitor::EnterI(JavaThread* Self) {
>>>>>> assert(Self->thread_state() == _thread_blocked, "invariant");
>>>>>>
>>>>>> #3
>>>>>> src/hotspot/share/runtime/objectMonitor.inline.hpp
>>>>>> 164 inline void ObjectMonitor::inc_ref_count() {
>>>>>> 165 // The increment needs to be MO_SEQ_CST. At the
>>>>>> moment, the Atomic::inc
>>>>>> 166 // backend on PPC does not yet conform to these
>>>>>> requirements. Therefore
>>>>>> 167 // the increment is simulated with a load phi; cas
>>>>>> phi + 1; loop.
>>>>>> 168 // Without this MO_SEQ_CST Atomic::inc simulation,
>>>>>> AsyncDeflateIdleMonitors
>>>>>> 169 // is not safe.
>>>>>>
>>>>>> I think was fixed with:
>>>>>> 8202080: Introduce ordering semantics for Atomic::add/inc and
>>>>>> other RMW atomics
>>>>>> You should get a leading sync and trailing one with the
>>>>>> default conservative
>>>>>> model and thus get proper memory ordering.
>>>>>> Martin, I'm I correct?
>>>>>>
>>>>>> Thanks, Robbin
>>>>>>
>>>>>> On 3/24/19 2:57 PM, Daniel D. Daugherty wrote:
>>>>>>> Greetings,
>>>>>>>
>>>>>>> Welcome to the OpenJDK review thread for my port of
>>>>>>> Carsten's work on:
>>>>>>>
>>>>>>> JDK-8153224 Monitor deflation prolong safepoints
>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8153224
>>>>>>>
>>>>>>> Here's a link to the OpenJDK wiki that describes my port:
>>>>>>>
>>>>>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation
>>>>>>>
>>>>>>> Here's the webrev URL:
>>>>>>>
>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/3-for-jdk13/
>>>>>>>
>>>>>>> Here's a link to Carsten's original webrev:
>>>>>>>
>>>>>>> http://cr.openjdk.java.net/~cvarming/monitor_deflate_conc/0/
>>>>>>>
>>>>>>> Earlier versions of this patch have been through several
>>>>>>> rounds of
>>>>>>> preliminary review. Many thanks to Carsten, Coleen, Robbin, and
>>>>>>> Roman for their preliminary code review comments. A very special
>>>>>>> thanks to Robbin and Roman for building and testing the patch in
>>>>>>> their own environments (including specJBB2015).
>>>>>>>
>>>>>>> This version of the patch has been thru Mach5 tier[1-8]
>>>>>>> testing on
>>>>>>> Oracle's usual set of platforms. Earlier versions have been run
>>>>>>> through my stress kit on my Linux-X64 and Solaris-X64 servers
>>>>>>> (product, fastdebug, slowdebug).Earlier versions have run
>>>>>>> Kitchensink
>>>>>>> for 12 hours on MacOSX, Linux-X64 and Solaris-X64 (product,
>>>>>>> fastdebug
>>>>>>> and slowdebug). Earlier versions have run my monitor
>>>>>>> inflation stress
>>>>>>> tests for 12 hours on MacOSX, Linux-X64 and Solaris-X64
>>>>>>> (product,
>>>>>>> fastdebug and slowdebug).
>>>>>>>
>>>>>>> All of the testing done on earlier versions will be redone
>>>>>>> on the
>>>>>>> latest version of the patch.
>>>>>>>
>>>>>>> Thanks, in advance, for any questions, comments or suggestions.
>>>>>>>
>>>>>>> Dan
>>>>>>>
>>>>>>> P.S.
>>>>>>> One subtest in
>>>>>>> gc/g1/humongousObjects/TestHumongousClassLoader.java
>>>>>>> is currently failing in -Xcomp mode on Win* only. I've been
>>>>>>> trying
>>>>>>> to characterize/analyze this failure for more than a week
>>>>>>> now. At
>>>>>>> this point I'm convinced that Async Monitor Deflation is
>>>>>>> aggravating
>>>>>>> an existing bug. However, I plan to have a better handle on that
>>>>>>> failure before these bits are pushed to the jdk/jdk repo.
>>>>>
>>>>>
>>>>
>>>
>>
>
More information about the hotspot-runtime-dev
mailing list