RFR(S): 8236035: refactor ObjectMonitor::set_owner() and _owner field setting

Wed Jan 29 00:00:35 UTC 2020

On 1/28/20 6:16 PM, David Holmes wrote:
> On 29/01/2020 3:11 am, Daniel D. Daugherty wrote:
>> Hi David and Kim,
>>
>> David, thanks for chiming in on this comment.
>>
>> More below...
>>
>>
>> On 1/27/20 10:06 PM, David Holmes wrote:
>>> Hi Kim,
>>>
>>> Picking up on one area of comments ...
>>>
>>> On 28/01/2020 7:44 am, Kim Barrett wrote:
>>>>> On Jan 27, 2020, at 3:06 PM, Daniel D. Daugherty 
>>>>> <daniel.daugherty at oracle.com> wrote:
>>>> ------------------------------------------------------------------------------ 
>>>>
>>>> src/hotspot/share/runtime/objectMonitor.inline.hpp
>>>>    88 // Clear _owner field; current value must match old_value.
>>>>    89 // If needs_fence is true, we issue a fence() after the 
>>>> release_store().
>>>>    90 // Otherwise, a storeload() is good enough. See the callers 
>>>> for more info.
>>>>    91 inline void 
>>>> ObjectMonitor::release_clear_owner_with_barrier(void* old_value,
>>>> 92 bool needs_fence) {
>>>>
>>>> Here's what I was going to say about the original version:
>>>>
>>>> I would prefer the description of needs_fence to be on the API
>>>> declaration in the header.  On the other hand, I think that argument
>>>> is unnecessary specialization that has no actual effect (other than to
>>>> perhaps make the code larger and slower on some platforms). storeload
>>>> is equivalent to fence on every platform in HotSpot. (Either
>>>> storeload is implemented as a call to fence, or they have the same
>>>> definitions in terms of some platform-specific thing.)  I think it
>>>> should just use release_store_fence (and perhaps adjust the function's
>>>> name).
>>>
>>> The code should use the operation that is semantically required for 
>>> correctness, irrespective of what the underlying implementation is 
>>> equivalent to.
>>>
>>>> The second version has reverted to having the callers specify the
>>>> barrier explicitly.  It looks to me like the distinction in the two
>>>> call sites is just different authors at different times.
>>>
>>> Actually this is all Dave Dice code from 2005.
>>
>> David, thanks for digging this up. I know that I dug it up once before
>> when you commented on this code in the 8153224 review thread so you
>> saved me the spelunking trip...
>
> Ha! Wish I'd saved myself the trip - I'd forgotten all the details 
> from previous discussions :(
>
>>
>>>
>>>>  For example,
>>>> the fence at objectMonitor.cpp:1097 is commented as being for store of
>>>> _owner vs load in unpark, so seems like it could be a storeload if one
>>>> wanted to go that way.
>>>
>>> Must admit I find that comment hard to understand.
>>>
>>> Let's examine the two code fragments. Here's current code for first 
>>> chunk:
>>>
>>>  919     Atomic::release_store(&_owner, (void*)NULL);   // drop the 
>>> lock
>>>  920     OrderAccess::storeload();                      // See if we 
>>> need to wake a successor
>>>  921     if ((intptr_t(_EntryList)|intptr_t(_cxq)) == 0 || _succ != 
>>> NULL) {
>>>  922       return;
>>>  923     }
>>>
>>> Back in 2005 we had some additional commentary that explained this:
>>>
>>>  // Observe the Dekker/Lamport duality:
>>> 2801       // A thread in ::exit() executes:
>>> 2802       //   ST Owner=null; MEMBAR; LD EntryList|cxq.
>>> 2803       // A thread in the contended ::enter() path executes the 
>>> complementary:
>>> 2804       //   ST EntryList|cxq = nonnull; MEMBAR; LD Owner.
>>>
>>> The MEMBAR referenced at L2802 is the storeload we see above in 
>>> current code. The enter code is covered by this comment (in part) in 
>>> current code:
>>>
>>>  527   // Note the Dekker/Lamport duality: ST cxq; MEMBAR; LD Owner.
>>>  528   // In this case the ST-MEMBAR is accomplished with CAS().
>>>
>>> So the storeload is all that is required here.
>>
>> When David and I discussed this in the 8153224 thread, we concluded that
>> the storeload was all that we needed and we chose not to change it to a
>> fence(). That discussion motivated the more fleshed out comment here
>> in the proposed change:
>>
>>   915     // Uses a storeload to separate release_store(owner) from the
>>   916     // successor check. The try_set_owner() below uses 
>> cmpxchg() so
>>   917     // we get the fence down there.
>>   918     release_clear_owner(Self);
>>   919     OrderAccess::storeload();
>>   920
>>   921     if ((intptr_t(_EntryList)|intptr_t(_cxq)) == 0 || _succ != 
>> NULL) {
>>   922       return;
>>   923     }
>
> After re-reading various versions of the code and comments from 
> 2005-2007 I can't help but feel we've lost some of the big picture 
> here when it comes to explaining the memory barriers:
> - the release_store of _owner is necessary to ensure anyone seeing 
> _owner as NULL is guaranteed to see the previous "meta-data" updates
> - the storeload() is needed as part of the Dekker-duality used when 
> accessing _owner and _succ
> - the fence() is needed for ... well that depends on the exact 
> location of the fence.

Ummm... it's not clear here whether you want me to change the comments
or not. I'm kind of hoping that you're happy with the latest version
(CR2) that I sent out for review this afternoon (my TZ)...

Dan

>
> Cheers,
> David
> -----
>
>>
>>> In ExitEpilog we currently have:
>>>
>>> 1094   // Drop the lock
>>> 1095   Atomic::release_store(&_owner, (void*)NULL);
>>> 1096   OrderAccess::fence();                               // ST 
>>> _owner vs LD in unpark()
>>> 1097
>>> 1098   DTRACE_MONITOR_PROBE(contended__exit, this, object(), Self);
>>> 1099   Trigger->unpark();
>>>
>>> Back in 2005 this was:
>>>
>>> 2670    _owner = NULL ;
>>> 2671    OrderAccess::fence() ;
>>> 2672
>>> <unrelated comments elided>
>>> 2682    if (SafepointSynchronize::do_call_back()) {
>>> 2683       TEVENT (unpark before SAFEPOINT) ;
>>> 2684    }
>>> 2685
>>> <unrelated comments elided>
>>> 2699    Trigger->unpark() ;
>>>
>>> No explanation for the fence() - though to me this is needed to 
>>> ensure visibility of the store to _owner, as well as ensuring 
>>> ordering with the unpark code. The comment was added (again by Dice) 
>>> in Feb 2007. So the question is:
>>> - is the fence() stronger than what we need, or is the comment 
>>> incomplete?
>>> I tend to favor the latter, given the fence has always been there.
>>>
>>> So I remain in favor of isolating the trailing memory-barrier from 
>>> the release_store of _owner.
>>
>> When David and I discussed this in the 8153224 thread, we concluded that
>> this location needed the full fence() so that motivated the only 
>> slightly
>> fleshed out comment here:
>>
>> 1095   // Uses a fence to separate release_store(owner) from the LD 
>> in unpark().
>> 1096   release_clear_owner(Self);
>> 1097   OrderAccess::fence();
>>
>>
>> In summary, I'm planning to leave these two pieces of code as they are
>> in this version of the fix.
>>
>> Kim, please confirm that you are okay with this or not.
>>
>> David, thanks again for saving me the spelunking work.
>>
>> Dan
>>
>>
>>>
>>> Cheers,
>>> David
>>> -----
>>>
>>>
>>>> So I still suggest just using release_store_fence, possibly with a
>>>> function name adjustment.
>>>>
>>>> ------------------------------------------------------------------------------ 
>>>>
>>>> src/hotspot/share/runtime/objectMonitor.inline.hpp
>>>>   102   void* prev = _owner;
>>>>   105   _owner = new_value;
>>>>   114   void* prev = _owner;
>>>>   119   _owner = self;
>>>>
>>>> Consider using Atomic::load and Atomic::store for these, making the
>>>> intent of a relaxed atomic operation explicit.  Though that might be
>>>> seen as inconsistent with various other places where _owner is being
>>>> directly read or written.
>>>>
>>>> ------------------------------------------------------------------------------ 
>>>>
>>>> src/hotspot/share/runtime/objectMonitor.inline.hpp
>>>>    90   void* prev = _owner;
>>>>
>>>> Consider making this DEBUG_ONLY, and changing the reference to prev in
>>>> the later log_trace use old_value instead of prev.  This would allow
>>>> the load of _owner to be eliminated in a release build; that isn't
>>>> permitted as written, because it's volatile.
>>>>
>>>> (The old_value seems to be (nearly?) always either a value we already
>>>> have for other reasons, or NULL, so the only additional cost we're
>>>> paying for it is register pressure to keep it around until the 
>>>> possible
>>>> tracing use.)
>>>>
>>>> Similarly in the other nearby functions.
>>>>
>>>> ------------------------------------------------------------------------------ 
>>>>
>>>> src/hotspot/share/runtime/objectMonitor.cpp
>>>> In ObjectMonitor::exit.
>>>>   864   if (THREAD != _owner) {
>>>>   865     void* cur = _owner;
>>>>
>>>> The value of _owner is being captured in a variable to avoid multiple
>>>> reads in the code below.  I don't see any reason not to exchange these
>>>> two lines and change the compare to use cur instead of _owner.
>>>> (_owner being volatile prevents the compiler from automatically
>>>> coalescing the loads.)
>>>>
>>>> Similarly in complete_exit, around lin 1124.
>>>> Similarly in check_owner, around line 1175.
>>>>
>>>> ------------------------------------------------------------------------------ 
>>>>
>>>>
>>>>
>>