RFR(S): 8236035: refactor ObjectMonitor::set_owner() and _owner field setting

Tue Jan 21 15:30:29 UTC 2020

Ping! Still looking for a second reviewer on this thread. In particular,
we're looking for opinions on some of the issues that David H. raised
in his review (see below)...

Thanks, in advance, for any additional review(s) here...

Dan

On 12/19/19 6:02 PM, David Holmes wrote:
> Hi Dan,
>
> Couple of follow ups :)
>
> On 20/12/2019 3:15 am, Daniel D. Daugherty wrote:
>> Hi David,
>>
>> Thanks for the quick review! Sorry for my delay in responding. I've been
>> working on the Async Monitor Deflation wiki updates...
>>
>>
>> On 12/18/19 12:35 AM, David Holmes wrote:
>>> Hi Dan,
>>>
>>> On 18/12/2019 7:35 am, Daniel D. Daugherty wrote:
>>>> Greetings,
>>>>
>>>> I'm extracting another standalone fix from the Async Monitor Deflation
>>>> project (JDK-8153224) and sending it out for review (and testing)
>>>> separately.
>>>>
>>>>      JDK-8236035 refactor ObjectMonitor::set_owner() and _owner 
>>>> field setting
>>>>      https://bugs.openjdk.java.net/browse/JDK-8236035
>>>>
>>>> Here's the webrev URL:
>>>>
>>>> http://cr.openjdk.java.net/~dcubed/8236035-webrev/0-for-jdk15/
>>>>
>>>> Folks that have reviewed JDK-8153224 will recognize these changes as
>>>> a subset of the _owner field changes from the Async Monitor Deflation
>>>> project. It's a subset because the Async Monitor Deflation project
>>>> needs one additional simply_set_owner_from() variant.
>>>>
>>>> In this set of changes:
>>>>
>>>> - We replace the one use of set_owner() and the direct _owner field
>>>>    updates with four distinct functions:
>>>>
>>>>   235   // Clear _owner field; current value must match old_value.
>>>>   236   void      release_clear_owner_with_barrier(void* old_value, 
>>>> bool needs_fence);
>>>>   237   // Simply set _owner field to new_value; current value must 
>>>> match old_value.
>>>>   238   void      simply_set_owner_from(void* old_value, void* 
>>>> new_value);
>>>>   239   // Simply set _owner field to self; current value must 
>>>> match basic_lock_p.
>>>>   240   void      simply_set_owner_from_BasicLock(void* 
>>>> basic_lock_p, Thread* self);
>>>>   241   // Try to set _owner field to new_value if the current 
>>>> value matches
>>>>   242   // old_value. Otherwise, does not change the _owner field.
>>>>   243   void*     try_set_owner_from(void* old_value, void* 
>>>> new_value);
>>>>
>>>> - Each function has an assert() to verify the pre-condition and has 
>>>> new
>>>>    log_trace(monitorinflation, owner) calls for logging ownership 
>>>> changes.
>>>> - The gory details about necessary memory syncing or why memory 
>>>> syncing
>>>>    is not needed is now in one place rather than at each call site.
>>>>
>>>> These changes are being tested in a Mach5 Tier[1-3] run that's still
>>>> running (JDK14 ATR has priority). I'm also running these changes thru
>>>> a set of 10 SPECjbb2015 runs.
>>>>
>>>> Thanks, in advance, for comments, questions or suggestions.
>>>
>>> This is obviously functionally fine so the only issues are 
>>> stylistic/naming ones. Apologies if what I'm about to say 
>>> contradicts anything I've said previously. :)
>>>
>>> 1. I think the simply_ prefix is unnecessary.
>>
>> I'll remove the "simply_" prefix.
>>
>>
>>> 2. I think release_clear_owner_with_barrier is too complex/awkward. 
>>> It would be better IMO to just have release_clear_owner() and leave 
>>> the correct trailing barrier in the caller.
>>
>> I made that change in the 8153224 review based on this comment from you
>> (on 2019.11.06):
>>
>>> > I think it needs to be something like this:
>>> >
>>> >    124 // Clear _owner field; current value must match old_value.
>>> >    125 inline void ObjectMonitor::clear_owner_from(void* old_value) {
>>> >    126   void* prev = _owner;
>>> >    127   ADIM_guarantee(prev == old_value, "unexpected prev 
>>> owner=" INTPTR_FORMAT
>>> >    128                  ", expected=" INTPTR_FORMAT, p2i(prev), 
>>> p2i(old_value));
>>> >    129   OrderAccess::release_store(&_owner, (void*)NULL);
>>> >    130   OrderAccess::fence();
>>> >    131   log_trace(monitorinflation, owner)("clear_owner_from(): 
>>> mid=" INTPTR_FORMAT
>>> >    132                                      ", prev=" 
>>> INTPTR_FORMAT, p2i(this), p2i(prev));
>>> >    133 }
>>>
>>> When factored out like this you are forced to use the heaviest 
>>> memory barrier needed by a given callsite and then apply it to all - 
>>> ie using the fence() always when it might not be needed.
>>>
>>> I'm really not a fan of this kind of factoring out in complex 
>>> lock-free code as it hides the details of the memory sync 
>>> operations. You could make this more obvious in the naming e.g.
>>>
>>> inline void ObjectMonitor::release_clear_owner_and_fence(void* 
>>> expected_owner)
>>>
>>> or if the fence is not always needed you could parameterise the 
>>> final memory barrier as well. 
>>
>> But as you said in this set of review comments:
>>
>>  > Apologies if what I'm about to say contradicts anything I've said 
>> previously. :)
>>
>> For now I'm planning to go your original 2019.11.06 suggestion and
>> keep the parameterized memory barrier. Let's see what other reviewers
>> have to say.
>
> Now that I see the barrier type parameterised I don't like it :)
>
>>
>>> 3. try_ operations are typically boolean operations. In this code 
>>> one of the try_set_owner_from calls appears to need the returned 
>>> Thread*, but on closer inspection this seems not to be the case:
>>>
>>> 1684       ox = (Thread*)try_set_owner_from(NULL, Self);
>>> 1685       if (ox == NULL) {
>>> ... // implied else
>>> 1707       // The CAS failed ... we can take any of the following 
>>> actions:
>>> ...
>>> 1712       prv = ox;
>>> 1713       goto Abort;
>>>
>>> So we set prv from ox but then goto Abort which never references 
>>> prv, so AFAICS we don't need to set it, so all we need from 
>>> try_set_owner_from is true or false.
>>
>> In the above location I changed this original code:
>>
>>      old L1683:       ox = (Thread*)Atomic::cmpxchg(&_owner, 
>> (void*)NULL, Self);
>>
>> to:
>>
>>      new L1684:       ox = (Thread*)try_set_owner_from(NULL, Self);
>>
>> which perfectly illustrates that I was trying for try_set_owner_from()
>> to be a direct replacement for this kind of Atomic::cmpxchg() call. And
>> like cmpxchg(), try_set_owner_from() returns current value in the memory
>> location at the time of the compare. If try_set_owner_from() returns the
>> compare value (NULL in this case), then you know that the owner field 
>> was
>> changed from NULL -> Self. If try_set_owner_from() returns anything 
>> other
>> than the compare value, then you know that the owner field was not 
>> changed.
>>
>> You are quite right that 'ox' on that specific code path is not needed
>> beyond the NULL check because that code path will 'goto Abort' where
>> neither 'ox' or 'prv' are needed. The assignment 'prv = ox' is 
>> definitely
>> not needed (but that's a baseline bug).
>>
>> However, you missed the fact that the non-boolean return from
>> try_set_owner_from() is needed in ObjectMonitor::enter():
>
> Yes I missed that - sorry.
>
>>   L248:   void* cur = try_set_owner_from(NULL, Self);
>>   L249:   if (cur == NULL) {
>>   L250:     assert(_recursions == 0, "invariant");
>>   L251:     return;
>>   L252:   }
>>   L253:
>>   L254:   if (cur == Self) {
>>   L255:     // TODO-FIXME: check for integer overflow!  BUGID 6557169.
>>   L256:     _recursions++;
>>   L257:     return;
>>   L258:   }
>>   L259:
>>   L260:   if (Self->is_lock_owned((address)cur)) {
>>   L261:     assert(_recursions == 0, "internal state error");
>>   L262:     _recursions = 1;
>>   L263:     simply_set_owner_from_BasicLock(cur, Self);  // Convert 
>> from BasicLock*      to Thread*.
>>   L264:     return;
>>   L265:   }
>>
>> That return value is checked for different possible values on L249, 
>> L254,
>> L260 and passed as a param on L263. A boolean won't do the trick in this
>> code. That's the only location in the baseline where we need more than
>> a boolean would give us. However, in the Async Monitor Deflation project
>> there are more places that use the return from try_set_owner_from() in
>> more than one way so a boolean won't do the trick.
>>
>> What I was trying for with the name try_set_owner_from() was to capture
>> that we are _attempting_ to set the owner field unlike set_owner_from()
>> where we _always_ set the owner field.
>>
>> I suppose we could look at different names? Would cmp_set_owner_from()
>> or cmpxchg_set_owner_from() be better?
>
> The try_ is not terrible or wrong, it's just more common for it to be 
> a boolean function. No big deal. I can't think of a better name.
>
>>
>>> 4. In my own Java objectMonitor work I call 
>>> simply_set_owner_from_BasicLock "transition_from_basic_lock". It 
>>> makes it clearer (to me at least) exactly what purpose this serves 
>>> and avoids the need for comments to that effect.
>>
>> As mentioned above, I'm going to remove the "simply_" prefix so we have
>>
>>      set_owner_from_BasicLock  <=> transition_from_basic_lock
>>
>> I could go with either name. I did use "BasicLock" instead of 
>> "basic_lock"
>> because the type in the system is BasicLock so I wanted the function to
>> show up in a grep.
>>
>> Let's see what other reviewers think...
>>
>>
>>> One remaining concern with this refactoring and the associated 
>>> logging is the potential impact it may have on performance due to 
>>> inlining changes and the need to check if logging is enabled. '
>>
>> I've been doing Aurora SPECjbb2015 runs (tuned G1, 10 iterations) 
>> between
>> my baseline (jdk-14+26) and the various fixes. For this fix (8236035)
>> relative to my first baseline run:
>>
>>      criticalJOPS   +1.73%  (Non-significant)
>>      maxJOPS        +0.26%  (Non-significant)
>>
>> I reran the baseline a second time using the same bits and for this fix
>> (8236035) relative to the second run:
>>
>>      criticalJOPS   +8.98%  (Non-significant)
>>      maxJOPS       +10.00%  (Non-significant)
>>
>> When I compare the two baseline runs:
>>
>>      criticalJOPS   -6.65%  (Non-significant)
>>      maxJOPS        -8.86%  (Non-significant)
>>
>> So even with 10 iterations, these SPECjbb2015 runs are all over the
>> place. And Aurora is agreeing with that by flagging the results as
>> "Non-significant" because the 'p' values are too large.
>>
>> I'm doing another set of Aurora SPECjbb2015 runs and I'm bumping the
>> iterations from 10 to 15. In the recent past, Per told me I might
>> have to do 20 runs of SPECjbb2015 to shake out the variations.
>> We'll see what turns up with this latest set of runs...
>
> Yes those results seem somewhat all over the place. Maybe see if 
> Claes' startup benchmark runs are impacted at all.
>
>>
>>> The above comments carry no more or less weight than anyone else's 
>>> so I'll leave it to you and other reviewers as to whether anything 
>>> actually changes. :)
>>
>> Thanks! I appreciate the flexibility.
>
> Christmas spirit :D
>
> I'll be back to my intransigent self in the New Year ;-)
>
> Cheers,
> David
>
>> Dan
>>
>>
>>>
>>> Thanks,
>>> David
>>> -----
>>>
>>>> Dan
>>
>