RFR(S): 8236035: refactor ObjectMonitor::set_owner() and _owner field setting
David Holmes
david.holmes at oracle.com
Tue Jan 28 23:16:01 UTC 2020
On 29/01/2020 3:11 am, Daniel D. Daugherty wrote:
> Hi David and Kim,
>
> David, thanks for chiming in on this comment.
>
> More below...
>
>
> On 1/27/20 10:06 PM, David Holmes wrote:
>> Hi Kim,
>>
>> Picking up on one area of comments ...
>>
>> On 28/01/2020 7:44 am, Kim Barrett wrote:
>>>> On Jan 27, 2020, at 3:06 PM, Daniel D. Daugherty
>>>> <daniel.daugherty at oracle.com> wrote:
>>> ------------------------------------------------------------------------------
>>>
>>> src/hotspot/share/runtime/objectMonitor.inline.hpp
>>> 88 // Clear _owner field; current value must match old_value.
>>> 89 // If needs_fence is true, we issue a fence() after the
>>> release_store().
>>> 90 // Otherwise, a storeload() is good enough. See the callers for
>>> more info.
>>> 91 inline void
>>> ObjectMonitor::release_clear_owner_with_barrier(void* old_value,
>>> 92 bool needs_fence) {
>>>
>>> Here's what I was going to say about the original version:
>>>
>>> I would prefer the description of needs_fence to be on the API
>>> declaration in the header. On the other hand, I think that argument
>>> is unnecessary specialization that has no actual effect (other than to
>>> perhaps make the code larger and slower on some platforms). storeload
>>> is equivalent to fence on every platform in HotSpot. (Either
>>> storeload is implemented as a call to fence, or they have the same
>>> definitions in terms of some platform-specific thing.) I think it
>>> should just use release_store_fence (and perhaps adjust the function's
>>> name).
>>
>> The code should use the operation that is semantically required for
>> correctness, irrespective of what the underlying implementation is
>> equivalent to.
>>
>>> The second version has reverted to having the callers specify the
>>> barrier explicitly. It looks to me like the distinction in the two
>>> call sites is just different authors at different times.
>>
>> Actually this is all Dave Dice code from 2005.
>
> David, thanks for digging this up. I know that I dug it up once before
> when you commented on this code in the 8153224 review thread so you
> saved me the spelunking trip...
Ha! Wish I'd saved myself the trip - I'd forgotten all the details from
previous discussions :(
>
>>
>>> For example,
>>> the fence at objectMonitor.cpp:1097 is commented as being for store of
>>> _owner vs load in unpark, so seems like it could be a storeload if one
>>> wanted to go that way.
>>
>> Must admit I find that comment hard to understand.
>>
>> Let's examine the two code fragments. Here's current code for first
>> chunk:
>>
>> 919 Atomic::release_store(&_owner, (void*)NULL); // drop the lock
>> 920 OrderAccess::storeload(); // See if we
>> need to wake a successor
>> 921 if ((intptr_t(_EntryList)|intptr_t(_cxq)) == 0 || _succ !=
>> NULL) {
>> 922 return;
>> 923 }
>>
>> Back in 2005 we had some additional commentary that explained this:
>>
>> // Observe the Dekker/Lamport duality:
>> 2801 // A thread in ::exit() executes:
>> 2802 // ST Owner=null; MEMBAR; LD EntryList|cxq.
>> 2803 // A thread in the contended ::enter() path executes the
>> complementary:
>> 2804 // ST EntryList|cxq = nonnull; MEMBAR; LD Owner.
>>
>> The MEMBAR referenced at L2802 is the storeload we see above in
>> current code. The enter code is covered by this comment (in part) in
>> current code:
>>
>> 527 // Note the Dekker/Lamport duality: ST cxq; MEMBAR; LD Owner.
>> 528 // In this case the ST-MEMBAR is accomplished with CAS().
>>
>> So the storeload is all that is required here.
>
> When David and I discussed this in the 8153224 thread, we concluded that
> the storeload was all that we needed and we chose not to change it to a
> fence(). That discussion motivated the more fleshed out comment here
> in the proposed change:
>
> 915 // Uses a storeload to separate release_store(owner) from the
> 916 // successor check. The try_set_owner() below uses cmpxchg() so
> 917 // we get the fence down there.
> 918 release_clear_owner(Self);
> 919 OrderAccess::storeload();
> 920
> 921 if ((intptr_t(_EntryList)|intptr_t(_cxq)) == 0 || _succ !=
> NULL) {
> 922 return;
> 923 }
After re-reading various versions of the code and comments from
2005-2007 I can't help but feel we've lost some of the big picture here
when it comes to explaining the memory barriers:
- the release_store of _owner is necessary to ensure anyone seeing
_owner as NULL is guaranteed to see the previous "meta-data" updates
- the storeload() is needed as part of the Dekker-duality used when
accessing _owner and _succ
- the fence() is needed for ... well that depends on the exact location
of the fence.
Cheers,
David
-----
>
>> In ExitEpilog we currently have:
>>
>> 1094 // Drop the lock
>> 1095 Atomic::release_store(&_owner, (void*)NULL);
>> 1096 OrderAccess::fence(); // ST
>> _owner vs LD in unpark()
>> 1097
>> 1098 DTRACE_MONITOR_PROBE(contended__exit, this, object(), Self);
>> 1099 Trigger->unpark();
>>
>> Back in 2005 this was:
>>
>> 2670 _owner = NULL ;
>> 2671 OrderAccess::fence() ;
>> 2672
>> <unrelated comments elided>
>> 2682 if (SafepointSynchronize::do_call_back()) {
>> 2683 TEVENT (unpark before SAFEPOINT) ;
>> 2684 }
>> 2685
>> <unrelated comments elided>
>> 2699 Trigger->unpark() ;
>>
>> No explanation for the fence() - though to me this is needed to ensure
>> visibility of the store to _owner, as well as ensuring ordering with
>> the unpark code. The comment was added (again by Dice) in Feb 2007. So
>> the question is:
>> - is the fence() stronger than what we need, or is the comment
>> incomplete?
>> I tend to favor the latter, given the fence has always been there.
>>
>> So I remain in favor of isolating the trailing memory-barrier from the
>> release_store of _owner.
>
> When David and I discussed this in the 8153224 thread, we concluded that
> this location needed the full fence() so that motivated the only slightly
> fleshed out comment here:
>
> 1095 // Uses a fence to separate release_store(owner) from the LD in
> unpark().
> 1096 release_clear_owner(Self);
> 1097 OrderAccess::fence();
>
>
> In summary, I'm planning to leave these two pieces of code as they are
> in this version of the fix.
>
> Kim, please confirm that you are okay with this or not.
>
> David, thanks again for saving me the spelunking work.
>
> Dan
>
>
>>
>> Cheers,
>> David
>> -----
>>
>>
>>> So I still suggest just using release_store_fence, possibly with a
>>> function name adjustment.
>>>
>>> ------------------------------------------------------------------------------
>>>
>>> src/hotspot/share/runtime/objectMonitor.inline.hpp
>>> 102 void* prev = _owner;
>>> 105 _owner = new_value;
>>> 114 void* prev = _owner;
>>> 119 _owner = self;
>>>
>>> Consider using Atomic::load and Atomic::store for these, making the
>>> intent of a relaxed atomic operation explicit. Though that might be
>>> seen as inconsistent with various other places where _owner is being
>>> directly read or written.
>>>
>>> ------------------------------------------------------------------------------
>>>
>>> src/hotspot/share/runtime/objectMonitor.inline.hpp
>>> 90 void* prev = _owner;
>>>
>>> Consider making this DEBUG_ONLY, and changing the reference to prev in
>>> the later log_trace use old_value instead of prev. This would allow
>>> the load of _owner to be eliminated in a release build; that isn't
>>> permitted as written, because it's volatile.
>>>
>>> (The old_value seems to be (nearly?) always either a value we already
>>> have for other reasons, or NULL, so the only additional cost we're
>>> paying for it is register pressure to keep it around until the possible
>>> tracing use.)
>>>
>>> Similarly in the other nearby functions.
>>>
>>> ------------------------------------------------------------------------------
>>>
>>> src/hotspot/share/runtime/objectMonitor.cpp
>>> In ObjectMonitor::exit.
>>> 864 if (THREAD != _owner) {
>>> 865 void* cur = _owner;
>>>
>>> The value of _owner is being captured in a variable to avoid multiple
>>> reads in the code below. I don't see any reason not to exchange these
>>> two lines and change the compare to use cur instead of _owner.
>>> (_owner being volatile prevents the compiler from automatically
>>> coalescing the loads.)
>>>
>>> Similarly in complete_exit, around lin 1124.
>>> Similarly in check_owner, around line 1175.
>>>
>>> ------------------------------------------------------------------------------
>>>
>>>
>>>
>
More information about the hotspot-runtime-dev
mailing list