RFR(L) 8153224 Monitor deflation prolong safepoints (CR7/v2.07/10-for-jdk14)

Tue Nov 5 15:36:37 UTC 2019

Hi David,

On 11/5/19 12:31 AM, David Holmes wrote:
> Hi Dan,
>
> On 5/11/2019 11:31 am, Daniel D. Daugherty wrote:
>> Hi David,
>>
>> Thanks for continuing to provide feedback on the Async Monitor Deflation
>> project! I appreciate your reviews very much...
>>
>> Responses embedded below (as usual)...
>
> Ditto. :)

And again...

>
>>
>> On 11/4/19 1:28 AM, David Holmes wrote:
>>> Hi Dan,
>>>
>>> A few follow ups to your responses, with trimming ...
>>>
>>> On 30/10/2019 6:20 am, Daniel D. Daugherty wrote:
>>>> On 10/24/19 7:00 AM, David Holmes wrote:
>>>>>  122 // Set _owner field to new_value; current value must match 
>>>>> old_value.
>>>>>  123 inline void ObjectMonitor::set_owner_from(void* new_value, 
>>>>> void* old_value) {
>>>>>  124   void* prev = Atomic::cmpxchg(new_value, &_owner, old_value);
>>>>>  125   ADIM_guarantee(prev == old_value, "unexpected prev owner=" 
>>>>> INTPTR_FORMAT
>>>>>
>>>>> The use of cmpxchg seems a little strange here if you are 
>>>>> asserting that when this is called _owner must equal old_value. 
>>>>> That means you don't expect any race and if there is no race with 
>>>>> another thread writing to _owner then you don't need the cmpxchg. 
>>>>> A normal:
>>>>>
>>>>> if (_owner == old_value) {
>>>>>    Atomic::store(&_owner, new_value);
>>>>>    log(...);
>>>>> } else {
>>>>>    guarantee(false, " unexpected old owner ...");
>>>>> }
>>>>
>>>> The two parameter version of set_owner_from() is only called from 
>>>> three
>>>> places and we'll cover two of them here:
>>>>
>>>> src/hotspot/share/runtime/objectMonitor.cpp:
>>>>
>>>> 1041     if (AsyncDeflateIdleMonitors) {
>>>> 1042       set_owner_from(NULL, Self);
>>>> 1043     } else {
>>>> 1044       OrderAccess::release_store(&_owner, (void*)NULL); // 
>>>> drop the lock
>>>> 1045       OrderAccess::storeload(); // See if we need to wake a 
>>>> successor
>>>> 1046     }
>>>>
>>>> and:
>>>>
>>>> 1221   if (AsyncDeflateIdleMonitors) {
>>>> 1222     set_owner_from(NULL, Self);
>>>> 1223   } else {
>>>> 1224     OrderAccess::release_store(&_owner, (void*)NULL);
>>>> 1225     OrderAccess::fence(); // ST _owner vs LD in unpark()
>>>> 1226   }
>>>>
>>>> So I've replaced the existing {release_store(), storeload()} combo 
>>>> for one
>>>> call site and the existing {release_store(), fence()} combo for the 
>>>> other
>>>> call site with a cmpxchg(). I chose cmpxchg() for these reasons:
>>>>
>>>> 1) I wanted the same memory sync behavior at both call sites.
>>>> 2) I wanted similar/same memory sync behavior as the original
>>>>     code at those call sites.
>>>
>>> Why? The memory sync requirements for non-async deflation may be 
>>> completely different to those required for async-delfation (given 
>>> all the other bits if the protocol).
>>
>> Good point!
>>
>> For context, the first code block above (L1041-6) is in 
>> ObjectMonitor::exit()
>> and the second code block above (L1221-6) is in 
>> ObjectMonitor::ExitEpilog()
>> which is called from two different places by ObjectMonitor::exit(). 
>> In both
>> cases, we are setting the _owner field to NULL which will potentially 
>> make
>> the ObjectMonitor async deflatible (depending on ref_count).
>>
>> For async deflation, I want the full fence semantics after setting the
>> _owner field to NULL in both locations:
>>
>> src/hotspot/share/runtime/orderAccess.hpp:
>> //                       Constraint     x86          sparc 
>> TSO          ppc
>> // 
>> ---------------------------------------------------------------------------
>> // fence                 LoadStore  |   lock         membar 
>> #StoreLoad  sync
>> //                       StoreStore |   addl 0,(sp)
>> //                       LoadLoad   |
>> //                       StoreLoad
>> //
>> // release               LoadStore | lwsync
>> //                       StoreStore
>>
>> I don't want any loads or stores floating into or out of the critical 
>> region.
>>
>>
>> *** Side bar here ****
>>
>> I just noticed something with the original code:
>>
>> 1044       OrderAccess::release_store(&_owner, (void*)NULL); // drop 
>> the lock
>> 1045       OrderAccess::storeload();                        // See if 
>> we need to wake a successor
>>
>> For constraints, this gives us:
>>             {LoadStore | StoreStore}
>>             {StoreLoad}
>> at L1044-5. So the original code is just "missing" LoadLoad relative
>> to a full fence(). I'm not sure why this kind of load is allowed to
>> float into the critical region, but the code has been this way for a
>> very long time.
>
> You seem to overlooking the fact that your store appears between the 
> various memory barriers e.g.
>
>              {LoadStore | StoreStore}
>              ST _owner, 0
>              {StoreLoad}
>
> which establishes the effects of those barriers with respect to that 
> store. So loadload() would be superfluous as we've already ensured 
> that no loads can float above the store, due to the storeload barrier.

Ouch! And yes I did overlook the store. This is what I get for writing
a reply late in the day and not letting it sit for re-review the next
morning. My apologies for being in a hurry.

> A full fence is logically all 4 barriers in that it ensures all loads 
> and all stores remain on their respective sides of the fence - nothing 
> can cross it.
>
>> And for this original code:
>>
>> 1224     OrderAccess::release_store(&_owner, (void*)NULL);
>> 1225     OrderAccess::fence();                               // ST 
>> _owner vs LD in unpark()
>>
>> For constraints, this gives us:
>>           {LoadStore | StoreStore}
>>           {LoadStore | StoreStore | LoadLoad | StoreLoad}
>> at L1224-5. Again this code has been this way for a very long time.
>>
>> It seems to me that L1224-5 could be written like this:
>>
>> 1224     _owner = NULL;
>> 1225     OrderAccess::fence();                               // ST 
>> _owner vs LD in unpark()
>>
>> with a plain store on L1224. Is that correct?
>
> No, I don't believe so.

And neither do I in the morning light. :-)

> What we are also in danger of overlooking here is the presence of 
> memory synchronization instructions related to the semantics of a 
> synchronized code block in Java, and the presence of memory 
> synchronization instructions needed for the correct implementation of 
> the synchronization subsystem itself. Specifically given:
>
> OrderAccess::release_store(&_owner, (void*)NULL);
> OrderAccess::<some other sync op>
>
> The release store ensures that the releasing of the monitor cannot be 
> reordered with respect to any of the stores that occurred within the 
> synchronized block at the Java-level. (And it _might_ also ensure some 
> property of the sync implementation.) While "some other op" is 
> typically needed only because of the way we implement the 
> synchronization subsystem - as per the comments e.g.
>
> storeload(); // See if we need to wake a successor
>
> we don't want to load a potential successor before we set _owner to 
> NULL else we might read the wrong value.

Agreed.

>
>> *** End side bar ***
>>
>>
>>>
>>>> 3) I wanted the return value from cmpxchg() for my state machine
>>>>     sanity check.
>>>
>>> I'm somewhat dubious about using cmpxchg just for the side-effect of 
>>> getting the existing value.
>>
>> But I'm not "using cmpxchg just for the side-effect of getting the 
>> existing value".
>>
>> That's the third thing on my list of three reasons. The most important
>> thing is I want the full fence that cmpcxhg() gives me. Above I said:
>>
>>  > 1) I wanted the same memory sync behavior at both call sites.
>>  > 2) I wanted similar/same memory sync behavior as the original
>>  >     code at those call sites.
>>
>> Using cmpxchg() gives me the full fence I want and that's similar to
>> this baseline code at this call site:
>
> Yes but the memory sync effects of cmpxchg are secondary to it primary 
> purpose: which is to provide an atomic compare-and-exchange in the 
> face of concurrent updates to a variable.

Agreed.

>
>> 1044       OrderAccess::release_store(&_owner, (void*)NULL); // drop 
>> the lock
>> 1045       OrderAccess::storeload();                        // See if 
>> we need to wake a successor
>>
>> I'm getting the LoadLoad that the baseline site doesn't have.
>
> And which it doesn't need.

And my brain returns to the more fundamental question of why do we have
OrderAccess::storeload() at L1045 and OrderAccess::fence() at L1225?
Both sites are trying to separate the release_store(&_owner, NULL) from
a subsequent load. In the first case:

1044       OrderAccess::release_store(&_owner, (void*)NULL); // drop the 
lock
1045       OrderAccess::storeload();                        // See if we 
need to wake a successor
<snip>
1047     if ((intptr_t(_EntryList)|intptr_t(_cxq)) == 0 || _succ != NULL) {

In the second case:

1224     OrderAccess::release_store(&_owner, (void*)NULL);
1225     OrderAccess::fence();                               // ST 
_owner vs LD in unpark()
<snip>
1229   Trigger->unpark();

The code has been this way for a very long time, but why?

Of course, this question about the baseline code is still a sidebar.
An interesting sidebar, but...

For Async Monitor Deflation, I think we need fence() at both locations
for proper interaction with the deflater thread.

>
>> The cmpxchg() gives me the same memory constaints as this baseline code
>> at this call site:
>>
>> 1224     OrderAccess::release_store(&_owner, (void*)NULL);
>> 1225     OrderAccess::fence();                               // ST 
>> _owner vs LD in unpark()
>>
>> Note: Actually, I don't have the extra {LoadStore | StoreStore} from
>> the release_store() that I mentioned in the side bar above...
>>
>> The last thing that I get is the existing value...
>>
>>
>> Okay, so I thought it was a pretty cool use of cmpxchg(), but I'm
>> obviously confusing code readers. So here's the v2.08 set_owner_from():
>>
>>   124 // Set _owner field to new_value; current value must match 
>> old_value.
>>   125 inline void ObjectMonitor::set_owner_from(void* new_value, 
>> void* old_value) {
>>   126   void* prev = Atomic::cmpxchg(new_value, &_owner, old_value);
>>   127   ADIM_guarantee(prev == old_value, "unexpected prev owner=" 
>> INTPTR_FORMAT
>>   128                  ", expected=" INTPTR_FORMAT, p2i(prev), 
>> p2i(old_value));
>>   129   log_trace(monitorinflation, owner)("set_owner_from(): mid=" 
>> INTPTR_FORMAT
>>   130                                      ", prev=" INTPTR_FORMAT ", 
>> new="
>>   131                                      INTPTR_FORMAT, p2i(this), 
>> p2i(prev),
>>   132                                      p2i(new_value));
>>   133 }
>>
>> I could change it like this:
>>
>>   124 // Set _owner field to new_value; current value must match 
>> old_value.
>>   125 inline void ObjectMonitor::set_owner_from(void* new_value, 
>> void* old_value) {
>>   126   void* prev = _owner;
>>   127   ADIM_guarantee(prev == old_value, "unexpected prev owner=" 
>> INTPTR_FORMAT
>>   128                  ", expected=" INTPTR_FORMAT, p2i(prev), 
>> p2i(old_value));
>>   129   _owner = new_value;
>>   130   OrderAccess::fence();
>>   131   log_trace(monitorinflation, owner)("set_owner_from(): mid=" 
>> INTPTR_FORMAT
>>   132                                      ", prev=" INTPTR_FORMAT ", 
>> new="
>>   133                                      INTPTR_FORMAT, p2i(this), 
>> p2i(prev),
>>   134                                      p2i(new_value));
>>   135 }
>>
>> It's two lines longer, but it should require less head scratching to
>> figure out what I'm trying to do. Would this be acceptable?
>
> As per previous discussion I think you still need a release_store of 
> _owner (at least in the case where you are releasing the monitor).

I've made a mistake with this encapsulation. I made it look like a
general setter of a new value. In reality, both callers specify
new_value == NULL so we don't actually need the new_value parameter.

I think it needs to be something like this:

   124 // Clear _owner field; current value must match old_value.
   125 inline void ObjectMonitor::clear_owner_from(void* old_value) {
   126   void* prev = _owner;
   127   ADIM_guarantee(prev == old_value, "unexpected prev owner=" 
INTPTR_FORMAT
   128                  ", expected=" INTPTR_FORMAT, p2i(prev), 
p2i(old_value));
   129   OrderAccess::release_store(&_owner, (void*)NULL);
   130   OrderAccess::fence();
   131   log_trace(monitorinflation, owner)("clear_owner_from(): mid=" 
INTPTR_FORMAT
   132                                      ", prev=" INTPTR_FORMAT, 
p2i(this), p2i(prev));
   133 }

Thanks for sticking with this part of the review.

> That's it on this thread. I still have to look at version 2.08 in full.

I look forward to your feedback!

Dan

>
> Thanks,
> David
> -----
>
>>
>>
>>>
>>>> I don't think that using 'Atomic::store(&_owner, new_value)' is the
>>>> right choice for these two call sites.
>>>
>>> If you don't actually need the cmpxchg to handle concurrent updates 
>>> to the _owner field, then a plain store (not an Atomic::store - that 
>>> was an error on my part) does not seem unreasonable; or if there are 
>>> still memory sync issues here, perhaps a release_store.
>>
>> So in the above proposed code I switched to a plain store followed by
>> a fence().
>>
>>
>>> If you use cmpxchg then anyone reading the code will assume there is 
>>> a concurrent update that you are guarding against.
>>
>> Yup. I concede the point that I'm obviously confusing the other
>> code readers... sorry about that...
>>
>>
>>>
>>>> The last two parameter set_owner_from() is talked about in the
>>>> next reply.
>>>>
>>>>
>>>>> Similarly for the old_value1/old_valuie2 version.
>>>>
>>>> The three parameter version of set_owner_from() is only called from 
>>>> one
>>>> place and the last two parameter version is called from the same 
>>>> place:
>>>>
>>>> src/hotspot/share/runtime/synchronizer.cpp:
>>>>
>>>> 1903       if (AsyncDeflateIdleMonitors) {
>>>> 1904         m->set_owner_from(mark.locker(), NULL, DEFLATER_MARKER);
>>>> 1905       } else {
>>>> 1906         m->set_owner_from(mark.locker(), NULL);
>>>> 1907       }
>>>>
>>>> The original code was:
>>>>
>>>> 1399       m->set_owner(mark.locker());
>>>>
>>>> The original set_owner() code was defined like this:
>>>>
>>>>    87 inline void ObjectMonitor::set_owner(void* owner) {
>>>>    88   _owner = owner;
>>>>    89 }
>>>>
>>>> So the original code didn't do any memory sync'ing at all and I've
>>>> changed that to a cmpxchg() on both code paths. That appears to be
>>>> overkill for that callsite...
>>>
>>> Again I'm not sure any memory sync requirements from the non-async 
>>> case should necessarily transfer over to the async case. Even if you 
>>> end up requiring similar memory sync the reasoning would be quite 
>>> different I would expect.
>>
>> In this case, both async deflation and safepoint based deflation are
>> happy with the same memory sync because the newly allocated 
>> ObjectMonitor
>> isn't published yet so it is not deflatible by either mechanism. Also 
>> the
>> act of publishing the ObjectMonitor* will take care of the memory sync.
>>
>>
>>>
>>>>
>>>> We're in ObjectSynchronizer::inflate(), in the "CASE: stack-locked"
>>>> section of the code. We've gotten our ObjectMonitor from om_alloc()
>>>> and are initializing a number of fields in the ObjectMonitor. The
>>>> ObjectMonitor is not published until we do:
>>>>
>>>> 1916       object->release_set_mark(markWord::encode(m));
>>>>
>>>> So we don't need the memory sync'ing features of the cmpxchg() for
>>>> either of the set_owner_from() calls and all that leaves is the
>>>> state machine sanity check.
>>>>
>>>> I really like the state machine sanity check on the owner field but
>>>> that's just because it came in handy when chasing the recent races.
>>>> It would be easy to change the three parameter version of
>>>> set_owner_from() to not do memory sync'ing, but still do the state
>>>> machine sanity check.
>>>>
>>>> Update: Changing the three parameter version of set_owner_from()
>>>> may impact the changes to owner_is_DEFLATER_MARKER() discussed
>>>> above. Sigh...
>>>> Update 2: Probably no impact because the three parameter version of
>>>> set_owner_from() is only used before the ObjectMonitor is published
>>>> and owner_is_DEFLATER_MARKER() is used after the ObjectMonitor has
>>>> appeared on an in-use list.
>>>>
>>>> However, the two parameter version of set_owner_from() needs its
>>>> memory sync'ing behavior for it's objectMonitor.cpp call sites so
>>>> this call site would need something different.
>>>>
>>>> I'm not sure which solution I'm going to pick yet, but I definitely
>>>> have to change something here since we don't need cmpxchg() at this
>>>> call site. More thought is required.
>>>
>>> I will look to see where this ended up.
>>
>> I'll wait to see if you can live with the v2.08 version. I hope so...
>>
>>
>>>
>>>>> src/hotspot/share/runtime/objectMonitor.cpp
>>>>>
>>>>>
>>>>>  267   if (AsyncDeflateIdleMonitors &&
>>>>>  268       try_set_owner_from(Self, DEFLATER_MARKER) == 
>>>>> DEFLATER_MARKER) {
>>>>
>>>> For more context, we are in:
>>>>
>>>>   241 void ObjectMonitor::enter(TRAPS) {
>>>>
>>>>
>>>>> I don't see why you need to call try_set_owner_from again here as 
>>>>> "cur" will already be DEFLATER_MARKER from the previous 
>>>>> try_set_owner.
>>>>
>>>> I assume the previous try_set_owner() call you mean is this one:
>>>>
>>>>   248   void* cur = try_set_owner_from(Self, NULL);
>>>>
>>>> This first try_set_owner() is for the most common case of no owner.
>>>>
>>>> The second try_set_owner() call is for a different condition than 
>>>> the first:
>>>>
>>>>   268       try_set_owner_from(Self, DEFLATER_MARKER) == 
>>>> DEFLATER_MARKER) {
>>>>
>>>> L248 is trying to change the _owner field from NULL -> 'Self'.
>>>> L268 is trying to change the _owner field from DEFLATER_MARKER to 
>>>> 'Self'.
>>>>
>>>> If the try_set_owner() call on L248 fails, 'cur' can be several 
>>>> possible
>>>> values:
>>>>
>>>>    - the calling thread (recursive enter is handled on L254-7)
>>>>    - other owning thread value (BasicLock* or Thread*)
>>>>    - DEFLATER_MARKER
>>>
>>> I'll give a caution okay to that explanation (the deficiency being 
>>> in my understanding, not your explaining :) ).
>>
>> Thanks. I'll take it!
>>
>>
>>>
>>>>> Further, I don't see how installing self as the _owner here is 
>>>>> valid and means you acquired the monitor, as the fact it was 
>>>>> DEFLATER_MARKER means it is still being deflated by another thread 
>>>>> doesn't it ???
>>>>
>>>> I guess the comment after L268 didn't work for you:
>>>>
>>>>   269     // The deflation protocol finished the first part 
>>>> (setting owner),
>>>>   270     // but it failed the second part (making ref_count 
>>>> negative) and
>>>>   271     // bailed. Or the ObjectMonitor was async deflated and 
>>>> reused.
>>>>
>>>> It means that the deflater thread was racing with this enter and
>>>> managed to set the owner field to DEFLATER_MARKER as the first step
>>>> in the deflation protocol. Our entering thread actually won the race
>>>> when it managed to set the ref_count to a positive value as part of
>>>> the ObjectMonitorHandle stuff done in the inflate() call that preceded
>>>> the enter() call. However, the deflater thread hasn't realized that it
>>>> lost the race yet and hasn't restored the owner field back to NULL.
>>>
>>> You're right the comment didn't work for me as it required me to be 
>>> holding too much of the protocol in my head. Makes more sense now.
>>
>> Good to hear!
>>
>>
>>>
>>> Thanks,
>>> David
>>> -----
>>
>> Thanks again for the thorough reviews!
>>
>> Dan
>>