RFR(L) 8153224 Monitor deflation prolong safepoints
Karen Kinnear
karen.kinnear at oracle.com
Mon Apr 15 16:20:28 UTC 2019
Carsten,
Thank you for the quick responses.
> On Apr 5, 2019, at 11:01 PM, Carsten Varming <varming at gmail.com> wrote:
>
> Dear Karen,
>
> Please see inline answers.
>
> On Fri, Apr 5, 2019 at 4:59 PM Karen Kinnear <karen.kinnear at oracle.com <mailto:karen.kinnear at oracle.com>> wrote:
>
> 4. In EnterI: if _owner == DEFLATER_MARKER there are guarantees that 0 < _count
> with comments that caller ensured _count <= 0
> In ReenterI: guarantee 0 <= _count, with comment not _count < 0
> — Am I missing something subtle here or should they be the same guarantees?
>
> In ::enter _count is incremented when the thread is trying to acquire the monitor and decremented after the monitor has been acquired. The 0 < _count assertion is between those two point in the code. A thread acquiring a monitor and then calling wait will increment _count and then decrement _count as part of acquiring the monitor, thus _count can be 0 by the time the thread calls wait and when ReenterI is called.
Yes, thank you. Wait/ReenterI uses the waiters count to prevent deflation.
>
> 9. install_displaced_markword_in_object
> What happens if the cas_set_mark fails?
> I get that today this handles the race with enter and deflate_monitor_using_JT. If we remove
> the call from enter, is the expectation that we’ve blocked all others who did not set is_marked themselves?
> If we remove the call from enter would it make sense to ensure that the cas_set_mark succeeds here?
>
> I designed my original patch such that no thread would ever wait for the the deflating thread to finish deflating a monitor. If you remove install_displaced_markword_in_object from enter, then the entering thread can end up busy waiting by continuously reading the monitor pointer from the object mark word and then realizing that the monitor is being deflated and it should retry by going back to reading the object mark word. This bad behavior is completely avoided by calling install_displaced_markword_in_object.
>
> In my original patch no thread would ever wait for a deflating thread to finish. This property got lost in FastHashCode as that function evolved since I wrote my patch, but I think this property is worth preserving where possible. It might even be worth looking at FastHashCode to see if we can re-establish this property.
Got it. So the model was that all threads that detected the deflating thread starting to operate on this monitor could
make progress rather than retry loops. And because enter can claim a monitor that has started deflating but not yet fully claimed it - this could be a non-trivial wait.
Makes sense - thank you for the explanation.
thanks,
Karen
>
> I hope this helps.
>
> Best,
> Carsten
>
>> On Apr 5, 2019, at 12:10 PM, Daniel D. Daugherty <daniel.daugherty at oracle.com <mailto:daniel.daugherty at oracle.com>> wrote:
>>
>> Filed:
>>
>> JDK-8222034 Thread-SMR functions should be updated to remove work around
>> https://bugs.openjdk.java.net/browse/JDK-8222034 <https://bugs.openjdk.java.net/browse/JDK-8222034>
>>
>> Martin and Robbin, please check it out and make sure that I captured
>> things correctly...
>>
>> Dan
>>
>>
>>
>> On 4/5/19 12:01 PM, Daniel D. Daugherty wrote:
>>> On 4/5/19 8:37 AM, Doerr, Martin wrote:
>>>> Hi everybody,
>>>>
>>>>> I think was fixed with:
>>>>> 8202080: Introduce ordering semantics for Atomic::add/inc and other RMW atomics
>>>>> You should get a leading sync and trailing one with the default conservative
>>>>> model and thus get proper memory ordering.
>>>>> Martin, I'm I correct?
>>>> Exactly. Thanks for pointing this out. PPC uses the strongest possible ordering semantics with memory_order_conservative (default parameter).
>>>> I've seen that comment about PPC in "void ThreadsList::inc_nested_handle_cnt()". This function could get replaced.
>>>
>>> Okay so we need a new bug to update these two Thread-SMR functions:
>>>
>>> src/hotspot/share/runtime/threadSMR.cpp:
>>>
>>> void ThreadsList::dec_nested_handle_cnt() {
>>> // The decrement needs to be MO_ACQ_REL. At the moment, the Atomic::dec
>>> // backend on PPC does not yet conform to these requirements. Therefore
>>> // the decrement is simulated with an Atomic::sub(1, &addr).
>>> // Without this MO_ACQ_REL Atomic::dec simulation, the nested SMR mechanism
>>> // is not generally safe to use.
>>> Atomic::sub(1, &_nested_handle_cnt);
>>> }
>>>
>>> void ThreadsList::inc_nested_handle_cnt() {
>>> // The increment needs to be MO_SEQ_CST. At the moment, the Atomic::inc
>>> // backend on PPC does not yet conform to these requirements. Therefore
>>> // the increment is simulated with a load phi; cas phi + 1; loop.
>>> // Without this MO_SEQ_CST Atomic::inc simulation, the nested SMR mechanism
>>> // is not generally safe to use.
>>> intx sample = OrderAccess::load_acquire(&_nested_handle_cnt);
>>> for (;;) {
>>> if (Atomic::cmpxchg(sample + 1, &_nested_handle_cnt, sample) == sample) {
>>> return;
>>> } else {
>>> sample = OrderAccess::load_acquire(&_nested_handle_cnt);
>>> }
>>> }
>>> }
>>>
>>> I'll file a new bug, loop in Robbin, Erik O and Martin, and make
>>> sure we're all in agreement. Once we decide that Thread-SMR's
>>> functions look like, I'll adapt my Async Monitor Deflation
>>> functions...
>>>
>>> Dan
>>>
>>>
>>>>
>>>> Best regards,
>>>> Martin
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: Robbin Ehn <robbin.ehn at oracle.com <mailto:robbin.ehn at oracle.com>>
>>>> Sent: Freitag, 5. April 2019 14:07
>>>> To: daniel.daugherty at oracle.com <mailto:daniel.daugherty at oracle.com>; hotspot-runtime-dev at openjdk.java.net <mailto:hotspot-runtime-dev at openjdk.java.net>; Carsten Varming <varming at gmail.com <mailto:varming at gmail.com>>; Roman Kennke <rkennke at redhat.com <mailto:rkennke at redhat.com>>; Doerr, Martin <martin.doerr at sap.com <mailto:martin.doerr at sap.com>>
>>>> Subject: Re: RFR(L) 8153224 Monitor deflation prolong safepoints
>>>>
>>>> Hi Dan,
>>>>
>>>> (Martin there is question for you last in this email)
>>>>
>>>> After first pass I did not find any real issues.
>>>> Considering what you had to work with, it looks good!
>>>>
>>>> #1
>>>> There are some assert which are redundant (to me at least) like:
>>>> src/hotspot/share/runtime/objectMonitor.cpp
>>>> L445
>>>> if (!dmw->is_marked() && dmw->hash() == 0) {
>>>> // This dmw is neutral and has not yet started the restoration
>>>> // protocol so we mark a copy of the dmw to begin the protocol.
>>>> markOop marked_dmw = dmw->set_marked();
>>>> assert(marked_dmw->is_marked() && marked_dmw->hash() == 0,
>>>> "sanity_check: is_marked=%d, hash=" INTPTR_FORMAT,
>>>> marked_dmw->is_marked(), marked_dmw->hash());
>>>>
>>>> That assert is basically a test that set_marked worked?
>>>>
>>>> L505
>>>> if (Atomic::cmpxchg(Self, &_owner, DEFLATER_MARKER) == DEFLATER_MARKER) {
>>>> assert(_succ != Self, "invariant");
>>>> assert(_owner == Self, "invariant");
>>>>
>>>> Assert on _owner checks that our cmpxchg is not broken?
>>>>
>>>> I think it's easier to read the code if some on the most obvious asserts are
>>>> removed. Maybe comments instead.
>>>>
>>>> #2
>>>> Not your doing but I think we should remove TRAPS/Thread * Self and use
>>>> JavaThread* instead.
>>>> E.g. so we can change:
>>>> void ObjectMonitor::EnterI(TRAPS) {
>>>> Thread * const Self = THREAD;
>>>> assert(Self->is_Java_thread(), "invariant");
>>>> assert(((JavaThread *) Self)->thread_state() == _thread_blocked, "invariant");
>>>>
>>>> to:
>>>>
>>>> void ObjectMonitor::EnterI(JavaThread* Self) {
>>>> assert(Self->thread_state() == _thread_blocked, "invariant");
>>>>
>>>> #3
>>>> src/hotspot/share/runtime/objectMonitor.inline.hpp
>>>> 164 inline void ObjectMonitor::inc_ref_count() {
>>>> 165 // The increment needs to be MO_SEQ_CST. At the moment, the Atomic::inc
>>>> 166 // backend on PPC does not yet conform to these requirements. Therefore
>>>> 167 // the increment is simulated with a load phi; cas phi + 1; loop.
>>>> 168 // Without this MO_SEQ_CST Atomic::inc simulation, AsyncDeflateIdleMonitors
>>>> 169 // is not safe.
>>>>
>>>> I think was fixed with:
>>>> 8202080: Introduce ordering semantics for Atomic::add/inc and other RMW atomics
>>>> You should get a leading sync and trailing one with the default conservative
>>>> model and thus get proper memory ordering.
>>>> Martin, I'm I correct?
>>>>
>>>> Thanks, Robbin
>>>>
>>>> On 3/24/19 2:57 PM, Daniel D. Daugherty wrote:
>>>>> Greetings,
>>>>>
>>>>> Welcome to the OpenJDK review thread for my port of Carsten's work on:
>>>>>
>>>>> JDK-8153224 Monitor deflation prolong safepoints
>>>>> https://bugs.openjdk.java.net/browse/JDK-8153224 <https://bugs.openjdk.java.net/browse/JDK-8153224>
>>>>>
>>>>> Here's a link to the OpenJDK wiki that describes my port:
>>>>>
>>>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation <https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation>
>>>>>
>>>>> Here's the webrev URL:
>>>>>
>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/3-for-jdk13/ <http://cr.openjdk.java.net/~dcubed/8153224-webrev/3-for-jdk13/>
>>>>>
>>>>> Here's a link to Carsten's original webrev:
>>>>>
>>>>> http://cr.openjdk.java.net/~cvarming/monitor_deflate_conc/0/ <http://cr.openjdk.java.net/~cvarming/monitor_deflate_conc/0/>
>>>>>
>>>>> Earlier versions of this patch have been through several rounds of
>>>>> preliminary review. Many thanks to Carsten, Coleen, Robbin, and
>>>>> Roman for their preliminary code review comments. A very special
>>>>> thanks to Robbin and Roman for building and testing the patch in
>>>>> their own environments (including specJBB2015).
>>>>>
>>>>> This version of the patch has been thru Mach5 tier[1-8] testing on
>>>>> Oracle's usual set of platforms. Earlier versions have been run
>>>>> through my stress kit on my Linux-X64 and Solaris-X64 servers
>>>>> (product, fastdebug, slowdebug).Earlier versions have run Kitchensink
>>>>> for 12 hours on MacOSX, Linux-X64 and Solaris-X64 (product, fastdebug
>>>>> and slowdebug). Earlier versions have run my monitor inflation stress
>>>>> tests for 12 hours on MacOSX, Linux-X64 and Solaris-X64 (product,
>>>>> fastdebug and slowdebug).
>>>>>
>>>>> All of the testing done on earlier versions will be redone on the
>>>>> latest version of the patch.
>>>>>
>>>>> Thanks, in advance, for any questions, comments or suggestions.
>>>>>
>>>>> Dan
>>>>>
>>>>> P.S.
>>>>> One subtest in gc/g1/humongousObjects/TestHumongousClassLoader.java
>>>>> is currently failing in -Xcomp mode on Win* only. I've been trying
>>>>> to characterize/analyze this failure for more than a week now. At
>>>>> this point I'm convinced that Async Monitor Deflation is aggravating
>>>>> an existing bug. However, I plan to have a better handle on that
>>>>> failure before these bits are pushed to the jdk/jdk repo.
>>>
>>>
>>
>
More information about the hotspot-runtime-dev
mailing list