RFR(L) 8153224 Monitor deflation prolong safepoints (CR7/v2.07/10-for-jdk14)

Mon Nov 4 13:09:48 UTC 2019

Hi,

TL/DR: David is right; the commentary is weird and does not capture what 
the real constraints are.

As the comment implied before "8222034: Thread-SMR functions should be 
updated to remove work around", the PPC port used to have incorrect 
memory ordering, and the code guarded against that. inc/dec used to be 
memory_order_relaxed and add/sub used to be memory_order_acq_rel on PPC, 
despite the shared contract promising memory_order_conservative.

The implication for the nested counter in the Thread SMR project was 
that I wanted to use the inc/dec API but knew it was not gonna work as 
expected on PPC because we really needed *at least* memory_order_acq_rel 
when decrementing (and memory_order_conservative when incrementing, 
which was simulated in a CAS loop... yuck), but would find ourselves 
getting memory_order_relaxed. Rather than treating it as a bug in the 
PPC atomics implementation, and having the code be broken while we 
waited for a fix, I changed the use to sub when decrementing (which gave 
me the required memory_order_acq_rel ordering I needed), and the 
horrible CAS loop when incrementing, as a workaround, and alerted Martin 
Doerr that this would needed to be sorted out in the PPC code. Since 
then, the PPC code did indeed get cleaned up so that inc/dec stopped 
being relaxed-only and worked as advertised.

After that, the "8222034: Thread-SMR functions should be updated to 
remove work around" change removed the workaround that was no longer 
required from the code, and put back the desired inc/dec calls (which 
now used an overly conservative memory_order_conservative ordering, 
which is suboptimal, in particular for decrements, but importantly not 
incorrect). Since the nested case would almost never run and is possibly 
the coldest code path in the VM, I did not care to comment in that 
review thread about optimizing it by explicitly passing in a weaker 
ordering. However, I should have commented on the comment that was 
changed, which does indeed look a bit confused. David is right that the 
stuff about volatile has nothing to do with why this is correct. The 
correctness required memory_order_acq_rel for decrements, but the 
implementation provided more, which is fine.

The actual reason why I wanted memory_order_conservative for correctness 
when incrementing and memory_order_acq_rel when decrementing, was to 
prevent accesses inside of the critical section (in particular - reading 
Thread*s from the acquired ThreadsList), from floating outside of the 
reference increment and decrement that marks reading the list as safe to 
access without the underlying list blowing up. In practice, it might 
have been possible to relax it a bit by relying on side effects of other 
unrelated parts of the protocol to have spurious fencing... but I did 
not want to get the protocol tangled in that way because it would be 
difficult to reason about.

Hope this explanation clears up that confusion.

Thanks,
/Erik

On 11/2/19 2:15 PM, Daniel D. Daugherty wrote:
> Erik,
>
> David H. made a comment during this review cycle that should interest 
> you.
>
> The longer version of this comment came up in early reviews of the Async
> Monitor Deflation code because I copied the code and the longer comment
> from threadSMR.cpp. I updated the comment based on your input and review
> and changed the comment and code in threadSMR.cpp and in the Async 
> Monitor
> Deflation project code.
>
> The change in threadSMR.cpp was done with this changeset:
>
> $ hg log -v -r 54517
> changeset:   54517:c201ca660afd
> user:        dcubed
> date:        Thu Apr 11 14:14:30 2019 -0400
> files:       src/hotspot/share/runtime/threadSMR.cpp
> description:
> 8222034: Thread-SMR functions should be updated to remove work around
> Reviewed-by: mdoerr, eosterlund
>
> Here's one of the two diffs to job your memory:
>
>  void ThreadsList::dec_nested_handle_cnt() {
> -  // The decrement needs to be MO_ACQ_REL. At the moment, the 
> Atomic::dec
> -  // backend on PPC does not yet conform to these requirements. 
> Therefore
> -  // the decrement is simulated with an Atomic::sub(1, &addr).
> -  // Without this MO_ACQ_REL Atomic::dec simulation, the nested SMR 
> mechanism
> -  // is not generally safe to use.
> -  Atomic::sub(1, &_nested_handle_cnt);
> +  // The decrement only needs to be MO_ACQ_REL since the reference
> +  // counter is volatile (and the hazard ptr is already NULL).
> +  Atomic::dec(&_nested_handle_cnt);
>  }
>
> Below is David's comment about the code comment...
>
> Dan
>
>
> Trimming down to just that issue...
>
> On 10/29/19 4:20 PM, Daniel D. Daugherty wrote:
>> On 10/24/19 7:00 AM, David Holmes wrote:
> >
> > src/hotspot/share/runtime/objectMonitor.inline.hpp
>>
>>>  199 // The decrement only needs to be MO_ACQ_REL since the reference
>>>  200   // counter is volatile.
>>>  201   Atomic::dec(&_ref_count);
>>>
>>> volatile is irrelevant with regards to memory ordering as it is a 
>>> compiler annotation. And you haven't specified any memory order 
>>> value so the default is conservative ie. implied full fence. (I see 
>>> the same incorrect comment is in threadSMR.cpp!)
>>
>> I got that wording from threadSMR.cpp and Erik O. confirmed my use of 
>> that
>> wording previously. I'll chase it down with Erik and get back to you.
>>
>>
>>> 208   // The increment needs to be MO_SEQ_CST so that the reference
>>>  209   // counter update is seen as soon as possible in a race with the
>>>  210   // async deflation protocol.
>>>  211   Atomic::inc(&_ref_count);
>>>
>>> Ditto you haven't specified any ordering - and inc() and dec() will 
>>> have the same default.
>>
>> And again, I'll have to chase this down with Erik O. and get back to 
>> you.
>