RFR(L) 8153224 Monitor deflation prolong safepoints (CR7/v2.07/10-for-jdk14)

Tue Nov 5 05:31:32 UTC 2019

Hi Dan,

On 5/11/2019 11:31 am, Daniel D. Daugherty wrote:
> Hi David,
> 
> Thanks for continuing to provide feedback on the Async Monitor Deflation
> project! I appreciate your reviews very much...
> 
> Responses embedded below (as usual)...

Ditto. :)

> 
> On 11/4/19 1:28 AM, David Holmes wrote:
>> Hi Dan,
>>
>> A few follow ups to your responses, with trimming ...
>>
>> On 30/10/2019 6:20 am, Daniel D. Daugherty wrote:
>>> On 10/24/19 7:00 AM, David Holmes wrote:
>>>>  122 // Set _owner field to new_value; current value must match 
>>>> old_value.
>>>>  123 inline void ObjectMonitor::set_owner_from(void* new_value, 
>>>> void* old_value) {
>>>>  124   void* prev = Atomic::cmpxchg(new_value, &_owner, old_value);
>>>>  125   ADIM_guarantee(prev == old_value, "unexpected prev owner=" 
>>>> INTPTR_FORMAT
>>>>
>>>> The use of cmpxchg seems a little strange here if you are asserting 
>>>> that when this is called _owner must equal old_value. That means you 
>>>> don't expect any race and if there is no race with another thread 
>>>> writing to _owner then you don't need the cmpxchg. A normal:
>>>>
>>>> if (_owner == old_value) {
>>>>    Atomic::store(&_owner, new_value);
>>>>    log(...);
>>>> } else {
>>>>    guarantee(false, " unexpected old owner ...");
>>>> }
>>>
>>> The two parameter version of set_owner_from() is only called from three
>>> places and we'll cover two of them here:
>>>
>>> src/hotspot/share/runtime/objectMonitor.cpp:
>>>
>>> 1041     if (AsyncDeflateIdleMonitors) {
>>> 1042       set_owner_from(NULL, Self);
>>> 1043     } else {
>>> 1044       OrderAccess::release_store(&_owner, (void*)NULL); // drop 
>>> the lock
>>> 1045       OrderAccess::storeload();                        // See if 
>>> we need to wake a successor
>>> 1046     }
>>>
>>> and:
>>>
>>> 1221   if (AsyncDeflateIdleMonitors) {
>>> 1222     set_owner_from(NULL, Self);
>>> 1223   } else {
>>> 1224     OrderAccess::release_store(&_owner, (void*)NULL);
>>> 1225     OrderAccess::fence();                               // ST 
>>> _owner vs LD in unpark()
>>> 1226   }
>>>
>>> So I've replaced the existing {release_store(), storeload()} combo 
>>> for one
>>> call site and the existing {release_store(), fence()} combo for the 
>>> other
>>> call site with a cmpxchg(). I chose cmpxchg() for these reasons:
>>>
>>> 1) I wanted the same memory sync behavior at both call sites.
>>> 2) I wanted similar/same memory sync behavior as the original
>>>     code at those call sites.
>>
>> Why? The memory sync requirements for non-async deflation may be 
>> completely different to those required for async-delfation (given all 
>> the other bits if the protocol).
> 
> Good point!
> 
> For context, the first code block above (L1041-6) is in 
> ObjectMonitor::exit()
> and the second code block above (L1221-6) is in ObjectMonitor::ExitEpilog()
> which is called from two different places by ObjectMonitor::exit(). In both
> cases, we are setting the _owner field to NULL which will potentially make
> the ObjectMonitor async deflatible (depending on ref_count).
> 
> For async deflation, I want the full fence semantics after setting the
> _owner field to NULL in both locations:
> 
> src/hotspot/share/runtime/orderAccess.hpp:
> //                       Constraint     x86          sparc TSO          ppc
> // 
> ---------------------------------------------------------------------------
> // fence                 LoadStore  |   lock         membar #StoreLoad  
> sync
> //                       StoreStore |   addl 0,(sp)
> //                       LoadLoad   |
> //                       StoreLoad
> //
> // release               LoadStore | lwsync
> //                       StoreStore
> 
> I don't want any loads or stores floating into or out of the critical 
> region.
> 
> 
> *** Side bar here ****
> 
> I just noticed something with the original code:
> 
> 1044       OrderAccess::release_store(&_owner, (void*)NULL); // drop the 
> lock
> 1045       OrderAccess::storeload();                        // See if we 
> need to wake a successor
> 
> For constraints, this gives us:
>             {LoadStore | StoreStore}
>             {StoreLoad}
> at L1044-5. So the original code is just "missing" LoadLoad relative
> to a full fence(). I'm not sure why this kind of load is allowed to
> float into the critical region, but the code has been this way for a
> very long time.

You seem to overlooking the fact that your store appears between the 
various memory barriers e.g.

              {LoadStore | StoreStore}
              ST _owner, 0
              {StoreLoad}

which establishes the effects of those barriers with respect to that 
store. So loadload() would be superfluous as we've already ensured that 
no loads can float above the store, due to the storeload barrier.

A full fence is logically all 4 barriers in that it ensures all loads 
and all stores remain on their respective sides of the fence - nothing 
can cross it.

> And for this original code:
> 
> 1224     OrderAccess::release_store(&_owner, (void*)NULL);
> 1225     OrderAccess::fence();                               // ST 
> _owner vs LD in unpark()
> 
> For constraints, this gives us:
>           {LoadStore | StoreStore}
>           {LoadStore | StoreStore | LoadLoad | StoreLoad}
> at L1224-5. Again this code has been this way for a very long time.
> 
> It seems to me that L1224-5 could be written like this:
> 
> 1224     _owner = NULL;
> 1225     OrderAccess::fence();                               // ST 
> _owner vs LD in unpark()
> 
> with a plain store on L1224. Is that correct?

No, I don't believe so.

What we are also in danger of overlooking here is the presence of memory 
synchronization instructions related to the semantics of a synchronized 
code block in Java, and the presence of memory synchronization 
instructions needed for the correct implementation of the 
synchronization subsystem itself. Specifically given:

OrderAccess::release_store(&_owner, (void*)NULL);
OrderAccess::<some other sync op>

The release store ensures that the releasing of the monitor cannot be 
reordered with respect to any of the stores that occurred within the 
synchronized block at the Java-level. (And it _might_ also ensure some 
property of the sync implementation.) While "some other op" is typically 
needed only because of the way we implement the synchronization 
subsystem - as per the comments e.g.

storeload(); // See if we need to wake a successor

we don't want to load a potential successor before we set _owner to NULL 
else we might read the wrong value.

> *** End side bar ***
> 
> 
>>
>>> 3) I wanted the return value from cmpxchg() for my state machine
>>>     sanity check.
>>
>> I'm somewhat dubious about using cmpxchg just for the side-effect of 
>> getting the existing value.
> 
> But I'm not "using cmpxchg just for the side-effect of getting the 
> existing value".
> 
> That's the third thing on my list of three reasons. The most important
> thing is I want the full fence that cmpcxhg() gives me. Above I said:
> 
>  > 1) I wanted the same memory sync behavior at both call sites.
>  > 2) I wanted similar/same memory sync behavior as the original
>  >     code at those call sites.
> 
> Using cmpxchg() gives me the full fence I want and that's similar to
> this baseline code at this call site:

Yes but the memory sync effects of cmpxchg are secondary to it primary 
purpose: which is to provide an atomic compare-and-exchange in the face 
of concurrent updates to a variable.

> 
> 1044       OrderAccess::release_store(&_owner, (void*)NULL); // drop the 
> lock
> 1045       OrderAccess::storeload();                        // See if we 
> need to wake a successor
> 
> I'm getting the LoadLoad that the baseline site doesn't have.

And which it doesn't need.

> The cmpxchg() gives me the same memory constaints as this baseline code
> at this call site:
> 
> 1224     OrderAccess::release_store(&_owner, (void*)NULL);
> 1225     OrderAccess::fence();                               // ST 
> _owner vs LD in unpark()
> 
> Note: Actually, I don't have the extra {LoadStore | StoreStore} from
> the release_store() that I mentioned in the side bar above...
> 
> The last thing that I get is the existing value...
> 
> 
> Okay, so I thought it was a pretty cool use of cmpxchg(), but I'm
> obviously confusing code readers. So here's the v2.08 set_owner_from():
> 
>   124 // Set _owner field to new_value; current value must match old_value.
>   125 inline void ObjectMonitor::set_owner_from(void* new_value, void* 
> old_value) {
>   126   void* prev = Atomic::cmpxchg(new_value, &_owner, old_value);
>   127   ADIM_guarantee(prev == old_value, "unexpected prev owner=" 
> INTPTR_FORMAT
>   128                  ", expected=" INTPTR_FORMAT, p2i(prev), 
> p2i(old_value));
>   129   log_trace(monitorinflation, owner)("set_owner_from(): mid=" 
> INTPTR_FORMAT
>   130                                      ", prev=" INTPTR_FORMAT ", new="
>   131                                      INTPTR_FORMAT, p2i(this), 
> p2i(prev),
>   132                                      p2i(new_value));
>   133 }
> 
> I could change it like this:
> 
>   124 // Set _owner field to new_value; current value must match old_value.
>   125 inline void ObjectMonitor::set_owner_from(void* new_value, void* 
> old_value) {
>   126   void* prev = _owner;
>   127   ADIM_guarantee(prev == old_value, "unexpected prev owner=" 
> INTPTR_FORMAT
>   128                  ", expected=" INTPTR_FORMAT, p2i(prev), 
> p2i(old_value));
>   129   _owner = new_value;
>   130   OrderAccess::fence();
>   131   log_trace(monitorinflation, owner)("set_owner_from(): mid=" 
> INTPTR_FORMAT
>   132                                      ", prev=" INTPTR_FORMAT ", new="
>   133                                      INTPTR_FORMAT, p2i(this), 
> p2i(prev),
>   134                                      p2i(new_value));
>   135 }
> 
> It's two lines longer, but it should require less head scratching to
> figure out what I'm trying to do. Would this be acceptable?

As per previous discussion I think you still need a release_store of 
_owner (at least in the case where you are releasing the monitor).

That's it on this thread. I still have to look at version 2.08 in full.

Thanks,
David
-----

> 
> 
>>
>>> I don't think that using 'Atomic::store(&_owner, new_value)' is the
>>> right choice for these two call sites.
>>
>> If you don't actually need the cmpxchg to handle concurrent updates to 
>> the _owner field, then a plain store (not an Atomic::store - that was 
>> an error on my part) does not seem unreasonable; or if there are still 
>> memory sync issues here, perhaps a release_store.
> 
> So in the above proposed code I switched to a plain store followed by
> a fence().
> 
> 
>> If you use cmpxchg then anyone reading the code will assume there is a 
>> concurrent update that you are guarding against.
> 
> Yup. I concede the point that I'm obviously confusing the other
> code readers... sorry about that...
> 
> 
>>
>>> The last two parameter set_owner_from() is talked about in the
>>> next reply.
>>>
>>>
>>>> Similarly for the old_value1/old_valuie2 version.
>>>
>>> The three parameter version of set_owner_from() is only called from one
>>> place and the last two parameter version is called from the same place:
>>>
>>> src/hotspot/share/runtime/synchronizer.cpp:
>>>
>>> 1903       if (AsyncDeflateIdleMonitors) {
>>> 1904         m->set_owner_from(mark.locker(), NULL, DEFLATER_MARKER);
>>> 1905       } else {
>>> 1906         m->set_owner_from(mark.locker(), NULL);
>>> 1907       }
>>>
>>> The original code was:
>>>
>>> 1399       m->set_owner(mark.locker());
>>>
>>> The original set_owner() code was defined like this:
>>>
>>>    87 inline void ObjectMonitor::set_owner(void* owner) {
>>>    88   _owner = owner;
>>>    89 }
>>>
>>> So the original code didn't do any memory sync'ing at all and I've
>>> changed that to a cmpxchg() on both code paths. That appears to be
>>> overkill for that callsite...
>>
>> Again I'm not sure any memory sync requirements from the non-async 
>> case should necessarily transfer over to the async case. Even if you 
>> end up requiring similar memory sync the reasoning would be quite 
>> different I would expect.
> 
> In this case, both async deflation and safepoint based deflation are
> happy with the same memory sync because the newly allocated ObjectMonitor
> isn't published yet so it is not deflatible by either mechanism. Also the
> act of publishing the ObjectMonitor* will take care of the memory sync.
> 
> 
>>
>>>
>>> We're in ObjectSynchronizer::inflate(), in the "CASE: stack-locked"
>>> section of the code. We've gotten our ObjectMonitor from om_alloc()
>>> and are initializing a number of fields in the ObjectMonitor. The
>>> ObjectMonitor is not published until we do:
>>>
>>> 1916       object->release_set_mark(markWord::encode(m));
>>>
>>> So we don't need the memory sync'ing features of the cmpxchg() for
>>> either of the set_owner_from() calls and all that leaves is the
>>> state machine sanity check.
>>>
>>> I really like the state machine sanity check on the owner field but
>>> that's just because it came in handy when chasing the recent races.
>>> It would be easy to change the three parameter version of
>>> set_owner_from() to not do memory sync'ing, but still do the state
>>> machine sanity check.
>>>
>>> Update: Changing the three parameter version of set_owner_from()
>>> may impact the changes to owner_is_DEFLATER_MARKER() discussed
>>> above. Sigh...
>>> Update 2: Probably no impact because the three parameter version of
>>> set_owner_from() is only used before the ObjectMonitor is published
>>> and owner_is_DEFLATER_MARKER() is used after the ObjectMonitor has
>>> appeared on an in-use list.
>>>
>>> However, the two parameter version of set_owner_from() needs its
>>> memory sync'ing behavior for it's objectMonitor.cpp call sites so
>>> this call site would need something different.
>>>
>>> I'm not sure which solution I'm going to pick yet, but I definitely
>>> have to change something here since we don't need cmpxchg() at this
>>> call site. More thought is required.
>>
>> I will look to see where this ended up.
> 
> I'll wait to see if you can live with the v2.08 version. I hope so...
> 
> 
>>
>>>> src/hotspot/share/runtime/objectMonitor.cpp
>>>>
>>>>
>>>>  267   if (AsyncDeflateIdleMonitors &&
>>>>  268       try_set_owner_from(Self, DEFLATER_MARKER) == 
>>>> DEFLATER_MARKER) {
>>>
>>> For more context, we are in:
>>>
>>>   241 void ObjectMonitor::enter(TRAPS) {
>>>
>>>
>>>> I don't see why you need to call try_set_owner_from again here as 
>>>> "cur" will already be DEFLATER_MARKER from the previous try_set_owner.
>>>
>>> I assume the previous try_set_owner() call you mean is this one:
>>>
>>>   248   void* cur = try_set_owner_from(Self, NULL);
>>>
>>> This first try_set_owner() is for the most common case of no owner.
>>>
>>> The second try_set_owner() call is for a different condition than the 
>>> first:
>>>
>>>   268       try_set_owner_from(Self, DEFLATER_MARKER) == 
>>> DEFLATER_MARKER) {
>>>
>>> L248 is trying to change the _owner field from NULL -> 'Self'.
>>> L268 is trying to change the _owner field from DEFLATER_MARKER to 
>>> 'Self'.
>>>
>>> If the try_set_owner() call on L248 fails, 'cur' can be several possible
>>> values:
>>>
>>>    - the calling thread (recursive enter is handled on L254-7)
>>>    - other owning thread value (BasicLock* or Thread*)
>>>    - DEFLATER_MARKER
>>
>> I'll give a caution okay to that explanation (the deficiency being in 
>> my understanding, not your explaining :) ).
> 
> Thanks. I'll take it!
> 
> 
>>
>>>> Further, I don't see how installing self as the _owner here is valid 
>>>> and means you acquired the monitor, as the fact it was 
>>>> DEFLATER_MARKER means it is still being deflated by another thread 
>>>> doesn't it ???
>>>
>>> I guess the comment after L268 didn't work for you:
>>>
>>>   269     // The deflation protocol finished the first part (setting 
>>> owner),
>>>   270     // but it failed the second part (making ref_count 
>>> negative) and
>>>   271     // bailed. Or the ObjectMonitor was async deflated and reused.
>>>
>>> It means that the deflater thread was racing with this enter and
>>> managed to set the owner field to DEFLATER_MARKER as the first step
>>> in the deflation protocol. Our entering thread actually won the race
>>> when it managed to set the ref_count to a positive value as part of
>>> the ObjectMonitorHandle stuff done in the inflate() call that preceded
>>> the enter() call. However, the deflater thread hasn't realized that it
>>> lost the race yet and hasn't restored the owner field back to NULL.
>>
>> You're right the comment didn't work for me as it required me to be 
>> holding too much of the protocol in my head. Makes more sense now.
> 
> Good to hear!
> 
> 
>>
>> Thanks,
>> David
>> -----
> 
> Thanks again for the thorough reviews!
> 
> Dan
>