RFR(L) 8153224 Monitor deflation prolong safepoints (CR7/v2.07/10-for-jdk14)
erik.osterlund at oracle.com
erik.osterlund at oracle.com
Mon Nov 4 13:09:48 UTC 2019
Hi,
TL/DR: David is right; the commentary is weird and does not capture what
the real constraints are.
As the comment implied before "8222034: Thread-SMR functions should be
updated to remove work around", the PPC port used to have incorrect
memory ordering, and the code guarded against that. inc/dec used to be
memory_order_relaxed and add/sub used to be memory_order_acq_rel on PPC,
despite the shared contract promising memory_order_conservative.
The implication for the nested counter in the Thread SMR project was
that I wanted to use the inc/dec API but knew it was not gonna work as
expected on PPC because we really needed *at least* memory_order_acq_rel
when decrementing (and memory_order_conservative when incrementing,
which was simulated in a CAS loop... yuck), but would find ourselves
getting memory_order_relaxed. Rather than treating it as a bug in the
PPC atomics implementation, and having the code be broken while we
waited for a fix, I changed the use to sub when decrementing (which gave
me the required memory_order_acq_rel ordering I needed), and the
horrible CAS loop when incrementing, as a workaround, and alerted Martin
Doerr that this would needed to be sorted out in the PPC code. Since
then, the PPC code did indeed get cleaned up so that inc/dec stopped
being relaxed-only and worked as advertised.
After that, the "8222034: Thread-SMR functions should be updated to
remove work around" change removed the workaround that was no longer
required from the code, and put back the desired inc/dec calls (which
now used an overly conservative memory_order_conservative ordering,
which is suboptimal, in particular for decrements, but importantly not
incorrect). Since the nested case would almost never run and is possibly
the coldest code path in the VM, I did not care to comment in that
review thread about optimizing it by explicitly passing in a weaker
ordering. However, I should have commented on the comment that was
changed, which does indeed look a bit confused. David is right that the
stuff about volatile has nothing to do with why this is correct. The
correctness required memory_order_acq_rel for decrements, but the
implementation provided more, which is fine.
The actual reason why I wanted memory_order_conservative for correctness
when incrementing and memory_order_acq_rel when decrementing, was to
prevent accesses inside of the critical section (in particular - reading
Thread*s from the acquired ThreadsList), from floating outside of the
reference increment and decrement that marks reading the list as safe to
access without the underlying list blowing up. In practice, it might
have been possible to relax it a bit by relying on side effects of other
unrelated parts of the protocol to have spurious fencing... but I did
not want to get the protocol tangled in that way because it would be
difficult to reason about.
Hope this explanation clears up that confusion.
Thanks,
/Erik
On 11/2/19 2:15 PM, Daniel D. Daugherty wrote:
> Erik,
>
> David H. made a comment during this review cycle that should interest
> you.
>
> The longer version of this comment came up in early reviews of the Async
> Monitor Deflation code because I copied the code and the longer comment
> from threadSMR.cpp. I updated the comment based on your input and review
> and changed the comment and code in threadSMR.cpp and in the Async
> Monitor
> Deflation project code.
>
> The change in threadSMR.cpp was done with this changeset:
>
> $ hg log -v -r 54517
> changeset: 54517:c201ca660afd
> user: dcubed
> date: Thu Apr 11 14:14:30 2019 -0400
> files: src/hotspot/share/runtime/threadSMR.cpp
> description:
> 8222034: Thread-SMR functions should be updated to remove work around
> Reviewed-by: mdoerr, eosterlund
>
> Here's one of the two diffs to job your memory:
>
> void ThreadsList::dec_nested_handle_cnt() {
> - // The decrement needs to be MO_ACQ_REL. At the moment, the
> Atomic::dec
> - // backend on PPC does not yet conform to these requirements.
> Therefore
> - // the decrement is simulated with an Atomic::sub(1, &addr).
> - // Without this MO_ACQ_REL Atomic::dec simulation, the nested SMR
> mechanism
> - // is not generally safe to use.
> - Atomic::sub(1, &_nested_handle_cnt);
> + // The decrement only needs to be MO_ACQ_REL since the reference
> + // counter is volatile (and the hazard ptr is already NULL).
> + Atomic::dec(&_nested_handle_cnt);
> }
>
> Below is David's comment about the code comment...
>
> Dan
>
>
> Trimming down to just that issue...
>
> On 10/29/19 4:20 PM, Daniel D. Daugherty wrote:
>> On 10/24/19 7:00 AM, David Holmes wrote:
> >
> > src/hotspot/share/runtime/objectMonitor.inline.hpp
>>
>>> 199 // The decrement only needs to be MO_ACQ_REL since the reference
>>> 200 // counter is volatile.
>>> 201 Atomic::dec(&_ref_count);
>>>
>>> volatile is irrelevant with regards to memory ordering as it is a
>>> compiler annotation. And you haven't specified any memory order
>>> value so the default is conservative ie. implied full fence. (I see
>>> the same incorrect comment is in threadSMR.cpp!)
>>
>> I got that wording from threadSMR.cpp and Erik O. confirmed my use of
>> that
>> wording previously. I'll chase it down with Erik and get back to you.
>>
>>
>>> 208 // The increment needs to be MO_SEQ_CST so that the reference
>>> 209 // counter update is seen as soon as possible in a race with the
>>> 210 // async deflation protocol.
>>> 211 Atomic::inc(&_ref_count);
>>>
>>> Ditto you haven't specified any ordering - and inc() and dec() will
>>> have the same default.
>>
>> And again, I'll have to chase this down with Erik O. and get back to
>> you.
>
More information about the hotspot-runtime-dev
mailing list