RFR(S): 8236035: refactor ObjectMonitor::set_owner() and _owner field setting

Thu Dec 19 17:15:15 UTC 2019

Hi David,

Thanks for the quick review! Sorry for my delay in responding. I've been
working on the Async Monitor Deflation wiki updates...

On 12/18/19 12:35 AM, David Holmes wrote:
> Hi Dan,
>
> On 18/12/2019 7:35 am, Daniel D. Daugherty wrote:
>> Greetings,
>>
>> I'm extracting another standalone fix from the Async Monitor Deflation
>> project (JDK-8153224) and sending it out for review (and testing)
>> separately.
>>
>>      JDK-8236035 refactor ObjectMonitor::set_owner() and _owner field 
>> setting
>>      https://bugs.openjdk.java.net/browse/JDK-8236035
>>
>> Here's the webrev URL:
>>
>> http://cr.openjdk.java.net/~dcubed/8236035-webrev/0-for-jdk15/
>>
>> Folks that have reviewed JDK-8153224 will recognize these changes as
>> a subset of the _owner field changes from the Async Monitor Deflation
>> project. It's a subset because the Async Monitor Deflation project
>> needs one additional simply_set_owner_from() variant.
>>
>> In this set of changes:
>>
>> - We replace the one use of set_owner() and the direct _owner field
>>    updates with four distinct functions:
>>
>>   235   // Clear _owner field; current value must match old_value.
>>   236   void      release_clear_owner_with_barrier(void* old_value, 
>> bool needs_fence);
>>   237   // Simply set _owner field to new_value; current value must 
>> match old_value.
>>   238   void      simply_set_owner_from(void* old_value, void* 
>> new_value);
>>   239   // Simply set _owner field to self; current value must match 
>> basic_lock_p.
>>   240   void      simply_set_owner_from_BasicLock(void* basic_lock_p, 
>> Thread* self);
>>   241   // Try to set _owner field to new_value if the current value 
>> matches
>>   242   // old_value. Otherwise, does not change the _owner field.
>>   243   void*     try_set_owner_from(void* old_value, void* new_value);
>>
>> - Each function has an assert() to verify the pre-condition and has new
>>    log_trace(monitorinflation, owner) calls for logging ownership 
>> changes.
>> - The gory details about necessary memory syncing or why memory syncing
>>    is not needed is now in one place rather than at each call site.
>>
>> These changes are being tested in a Mach5 Tier[1-3] run that's still
>> running (JDK14 ATR has priority). I'm also running these changes thru
>> a set of 10 SPECjbb2015 runs.
>>
>> Thanks, in advance, for comments, questions or suggestions.
>
> This is obviously functionally fine so the only issues are 
> stylistic/naming ones. Apologies if what I'm about to say contradicts 
> anything I've said previously. :)
>
> 1. I think the simply_ prefix is unnecessary.

I'll remove the "simply_" prefix.

> 2. I think release_clear_owner_with_barrier is too complex/awkward. It 
> would be better IMO to just have release_clear_owner() and leave the 
> correct trailing barrier in the caller.

I made that change in the 8153224 review based on this comment from you
(on 2019.11.06):

> > I think it needs to be something like this:
> >
> >    124 // Clear _owner field; current value must match old_value.
> >    125 inline void ObjectMonitor::clear_owner_from(void* old_value) {
> >    126   void* prev = _owner;
> >    127   ADIM_guarantee(prev == old_value, "unexpected prev owner=" 
> INTPTR_FORMAT
> >    128                  ", expected=" INTPTR_FORMAT, p2i(prev), 
> p2i(old_value));
> >    129   OrderAccess::release_store(&_owner, (void*)NULL);
> >    130   OrderAccess::fence();
> >    131   log_trace(monitorinflation, owner)("clear_owner_from(): 
> mid=" INTPTR_FORMAT
> >    132                                      ", prev=" INTPTR_FORMAT, 
> p2i(this), p2i(prev));
> >    133 }
>
> When factored out like this you are forced to use the heaviest memory 
> barrier needed by a given callsite and then apply it to all - ie using 
> the fence() always when it might not be needed.
>
> I'm really not a fan of this kind of factoring out in complex 
> lock-free code as it hides the details of the memory sync operations. 
> You could make this more obvious in the naming e.g.
>
> inline void ObjectMonitor::release_clear_owner_and_fence(void* 
> expected_owner)
>
> or if the fence is not always needed you could parameterise the final 
> memory barrier as well. 

But as you said in this set of review comments:

 > Apologies if what I'm about to say contradicts anything I've said 
previously. :)

For now I'm planning to go your original 2019.11.06 suggestion and
keep the parameterized memory barrier. Let's see what other reviewers
have to say.

> 3. try_ operations are typically boolean operations. In this code one 
> of the try_set_owner_from calls appears to need the returned Thread*, 
> but on closer inspection this seems not to be the case:
>
> 1684       ox = (Thread*)try_set_owner_from(NULL, Self);
> 1685       if (ox == NULL) {
> ... // implied else
> 1707       // The CAS failed ... we can take any of the following 
> actions:
> ...
> 1712       prv = ox;
> 1713       goto Abort;
>
> So we set prv from ox but then goto Abort which never references prv, 
> so AFAICS we don't need to set it, so all we need from 
> try_set_owner_from is true or false.

In the above location I changed this original code:

     old L1683:       ox = (Thread*)Atomic::cmpxchg(&_owner, 
(void*)NULL, Self);

to:

     new L1684:       ox = (Thread*)try_set_owner_from(NULL, Self);

which perfectly illustrates that I was trying for try_set_owner_from()
to be a direct replacement for this kind of Atomic::cmpxchg() call. And
like cmpxchg(), try_set_owner_from() returns current value in the memory
location at the time of the compare. If try_set_owner_from() returns the
compare value (NULL in this case), then you know that the owner field was
changed from NULL -> Self. If try_set_owner_from() returns anything other
than the compare value, then you know that the owner field was not changed.

You are quite right that 'ox' on that specific code path is not needed
beyond the NULL check because that code path will 'goto Abort' where
neither 'ox' or 'prv' are needed. The assignment 'prv = ox' is definitely
not needed (but that's a baseline bug).

However, you missed the fact that the non-boolean return from
try_set_owner_from() is needed in ObjectMonitor::enter():

  L248:   void* cur = try_set_owner_from(NULL, Self);
  L249:   if (cur == NULL) {
  L250:     assert(_recursions == 0, "invariant");
  L251:     return;
  L252:   }
  L253:
  L254:   if (cur == Self) {
  L255:     // TODO-FIXME: check for integer overflow!  BUGID 6557169.
  L256:     _recursions++;
  L257:     return;
  L258:   }
  L259:
  L260:   if (Self->is_lock_owned((address)cur)) {
  L261:     assert(_recursions == 0, "internal state error");
  L262:     _recursions = 1;
  L263:     simply_set_owner_from_BasicLock(cur, Self);  // Convert from 
BasicLock*      to Thread*.
  L264:     return;
  L265:   }

That return value is checked for different possible values on L249, L254,
L260 and passed as a param on L263. A boolean won't do the trick in this
code. That's the only location in the baseline where we need more than
a boolean would give us. However, in the Async Monitor Deflation project
there are more places that use the return from try_set_owner_from() in
more than one way so a boolean won't do the trick.

What I was trying for with the name try_set_owner_from() was to capture
that we are _attempting_ to set the owner field unlike set_owner_from()
where we _always_ set the owner field.

I suppose we could look at different names? Would cmp_set_owner_from()
or cmpxchg_set_owner_from() be better?

> 4. In my own Java objectMonitor work I call 
> simply_set_owner_from_BasicLock "transition_from_basic_lock". It makes 
> it clearer (to me at least) exactly what purpose this serves and 
> avoids the need for comments to that effect.

As mentioned above, I'm going to remove the "simply_" prefix so we have

     set_owner_from_BasicLock  <=> transition_from_basic_lock

I could go with either name. I did use "BasicLock" instead of "basic_lock"
because the type in the system is BasicLock so I wanted the function to
show up in a grep.

Let's see what other reviewers think...

> One remaining concern with this refactoring and the associated logging 
> is the potential impact it may have on performance due to inlining 
> changes and the need to check if logging is enabled. '

I've been doing Aurora SPECjbb2015 runs (tuned G1, 10 iterations) between
my baseline (jdk-14+26) and the various fixes. For this fix (8236035)
relative to my first baseline run:

     criticalJOPS   +1.73%  (Non-significant)
     maxJOPS        +0.26%  (Non-significant)

I reran the baseline a second time using the same bits and for this fix
(8236035) relative to the second run:

     criticalJOPS   +8.98%  (Non-significant)
     maxJOPS       +10.00%  (Non-significant)

When I compare the two baseline runs:

     criticalJOPS   -6.65%  (Non-significant)
     maxJOPS        -8.86%  (Non-significant)

So even with 10 iterations, these SPECjbb2015 runs are all over the
place. And Aurora is agreeing with that by flagging the results as
"Non-significant" because the 'p' values are too large.

I'm doing another set of Aurora SPECjbb2015 runs and I'm bumping the
iterations from 10 to 15. In the recent past, Per told me I might
have to do 20 runs of SPECjbb2015 to shake out the variations.
We'll see what turns up with this latest set of runs...

> The above comments carry no more or less weight than anyone else's so 
> I'll leave it to you and other reviewers as to whether anything 
> actually changes. :)

Thanks! I appreciate the flexibility.

Dan

>
> Thanks,
> David
> -----
>
>> Dan