RFR(L) 8153224 Monitor deflation prolong safepoints (CR7/v2.07/10-for-jdk14)

Daniel D. Daugherty daniel.daugherty at oracle.com
Tue Oct 29 20:20:18 UTC 2019


Hi David,

Sorry for the delay in replying to this email. I spent most of last week
migrating my development environment from my dying MBP13 to a new MBP13.
I think I have everything recovered, but only time will tell...

More below...


On 10/24/19 7:00 AM, David Holmes wrote:
> Hi Dan,
>
> Final part of review - sorry for the delay.
>
> src/hotspot/share/runtime/objectMonitor.hpp
>
>  387   // For internal used by ObjectSynchronizer::monitors_iterate().
>
> s/used/use/

Will fix that.


> ---
>
> src/hotspot/share/runtime/objectMonitor.inline.hpp
>
>  65   return OrderAccess::load_acquire(&_owner) == DEFLATER_MARKER;
>
> I query the need for load_acquire here.

With all of the _owner field setting logic now encapsulated in the
set_owner_from(), set_owner_from_BasicLock() and try_set_owner_from()
functions, it is easy to see that all except for set_owner_from_BasicLock()
use cmpxchg() for the update. And with set_owner_from_BasicLock(), we
already know that _owner != DEFLATER_MARKER.

I agree and will change that line to:

     65   return _owner == DEFLATER_MARKER;

Update: Now I'm questioning making this change because of the possible
changes to set_owner_from() below... Sigh...


>
>   71   assert(ref_count() == 0, "must be 0: ref_count=%d", ref_count());
>   94     guarantee(ref_count() <= 0, "must be <= 0: ref_count=%d", 
> ref_count());
>
> You don't want to read ref_count() twice else you may fail the check 
> but report the expected value of 0.

Agreed. I'll fix those to use a local so we report what we tested. I did
that in another location that also reported the current value in the same
error message. That helped with debugging some of the races.


>  122 // Set _owner field to new_value; current value must match 
> old_value.
>  123 inline void ObjectMonitor::set_owner_from(void* new_value, void* 
> old_value) {
>  124   void* prev = Atomic::cmpxchg(new_value, &_owner, old_value);
>  125   ADIM_guarantee(prev == old_value, "unexpected prev owner=" 
> INTPTR_FORMAT
>
> The use of cmpxchg seems a little strange here if you are asserting 
> that when this is called _owner must equal old_value. That means you 
> don't expect any race and if there is no race with another thread 
> writing to _owner then you don't need the cmpxchg. A normal:
>
> if (_owner == old_value) {
>    Atomic::store(&_owner, new_value);
>    log(...);
> } else {
>    guarantee(false, " unexpected old owner ...");
> }

The two parameter version of set_owner_from() is only called from three
places and we'll cover two of them here:

src/hotspot/share/runtime/objectMonitor.cpp:

1041     if (AsyncDeflateIdleMonitors) {
1042       set_owner_from(NULL, Self);
1043     } else {
1044       OrderAccess::release_store(&_owner, (void*)NULL); // drop the 
lock
1045       OrderAccess::storeload();                        // See if we 
need to wake a successor
1046     }

and:

1221   if (AsyncDeflateIdleMonitors) {
1222     set_owner_from(NULL, Self);
1223   } else {
1224     OrderAccess::release_store(&_owner, (void*)NULL);
1225     OrderAccess::fence();                               // ST 
_owner vs LD in unpark()
1226   }

So I've replaced the existing {release_store(), storeload()} combo for one
call site and the existing {release_store(), fence()} combo for the other
call site with a cmpxchg(). I chose cmpxchg() for these reasons:

1) I wanted the same memory sync behavior at both call sites.
2) I wanted similar/same memory sync behavior as the original
    code at those call sites.
3) I wanted the return value from cmpxchg() for my state machine
    sanity check.

I don't think that using 'Atomic::store(&_owner, new_value)' is the
right choice for these two call sites.

The last two parameter set_owner_from() is talked about in the
next reply.


> Similarly for the old_value1/old_valuie2 version.

The three parameter version of set_owner_from() is only called from one
place and the last two parameter version is called from the same place:

src/hotspot/share/runtime/synchronizer.cpp:

1903       if (AsyncDeflateIdleMonitors) {
1904         m->set_owner_from(mark.locker(), NULL, DEFLATER_MARKER);
1905       } else {
1906         m->set_owner_from(mark.locker(), NULL);
1907       }

The original code was:

1399       m->set_owner(mark.locker());

The original set_owner() code was defined like this:

   87 inline void ObjectMonitor::set_owner(void* owner) {
   88   _owner = owner;
   89 }

So the original code didn't do any memory sync'ing at all and I've
changed that to a cmpxchg() on both code paths. That appears to be
overkill for that callsite...

We're in ObjectSynchronizer::inflate(), in the "CASE: stack-locked"
section of the code. We've gotten our ObjectMonitor from om_alloc()
and are initializing a number of fields in the ObjectMonitor. The
ObjectMonitor is not published until we do:

1916       object->release_set_mark(markWord::encode(m));

So we don't need the memory sync'ing features of the cmpxchg() for
either of the set_owner_from() calls and all that leaves is the
state machine sanity check.

I really like the state machine sanity check on the owner field but
that's just because it came in handy when chasing the recent races.
It would be easy to change the three parameter version of
set_owner_from() to not do memory sync'ing, but still do the state
machine sanity check.

Update: Changing the three parameter version of set_owner_from()
may impact the changes to owner_is_DEFLATER_MARKER() discussed
above. Sigh...
Update 2: Probably no impact because the three parameter version of
set_owner_from() is only used before the ObjectMonitor is published
and owner_is_DEFLATER_MARKER() is used after the ObjectMonitor has
appeared on an in-use list.

However, the two parameter version of set_owner_from() needs its
memory sync'ing behavior for it's objectMonitor.cpp call sites so
this call site would need something different.

I'm not sure which solution I'm going to pick yet, but I definitely
have to change something here since we don't need cmpxchg() at this
call site. More thought is required.


> This function does not aid clarity in my view, given that it replaces 
> a simple assignment:
>
> // Set _owner field to self; current value must match basic_lock_p.
> inline void ObjectMonitor::set_owner_from_BasicLock(Thread* self, 
> void* basic_lock_p) {
>   assert(self->is_lock_owned((address)basic_lock_p), "self=" 
> INTPTR_FORMAT
>          " must own basic_lock_p=" INTPTR_FORMAT, p2i(self), 
> p2i(basic_lock_p));
>   void* prev = _owner;
>   ADIM_guarantee(prev == basic_lock_p, "unexpected prev owner=" 
> INTPTR_FORMAT
>                  ", expected=" INTPTR_FORMAT, p2i(prev), 
> p2i(basic_lock_p));
>   // Non-null owner field to non-null owner field is safe without
>   // cmpxchg() as long as all readers can tolerate either flavor.
>   _owner = self;
>   log_trace(monitorinflation, owner)("mid=" INTPTR_FORMAT ", prev="
>                                      INTPTR_FORMAT ", new=" 
> INTPTR_FORMAT,
>                                      p2i(this), p2i(prev), p2i(self));
> }
>
> the places where this is used have already read _owner and passed it 
> in as basic_lock_p; and they have already checked 
> self->is_locked_owned. The only useful thing here is potentially the 
> log statement but even that is lacking as noone will be able to 
> readily tell that this was a basic_lock conversion. The log statement 
> here (and elsewhere) should give an indication of the nature of the 
> update to owner. But I'd rather see this method go and just add a 
> context specific log statement at each of the places where we morph 
> the basic_lock to the thread.

So encapsulate some of the sites that set the _owner field, but not all?
Once I decided to go down the encapsulation road, I figured I should
encapsulate them all for consistency. I guess not. I'm thinking that
Coleen may not be happy with this inconsistency in encapsulation.

So the other thing that will be lost is the explicit state machine sanity
check:

   void* prev = _owner;
   ADIM_guarantee(prev == basic_lock_p, "unexpected prev owner=" 
INTPTR_FORMAT
                  ", expected=" INTPTR_FORMAT, p2i(prev), 
p2i(basic_lock_p));

This check will detect if the _owner field unexpected changes from the 
BasicLock*
address to something else due to a race. The particular race I was 
interested in
was whether more one thread could be trying to morph the BasicLock* into the
JavaThread* at the same time. I have a vague memory of this being 
possible due
to biased locking/deoptimization work, but I could be wrong. I never saw 
this
guarantee() fire during any of my testing, stress or otherwise. And my 
memory
of one thread morphing another thread's BasicLock* into a JavaThread* is a
very old one; things could easily have changed...

I will look at undoing the encapsulation with 
set_owner_from_BasicLock(). I'll
also look at updating the log messages that I added in this round to be more
helpful.


>  199 // The decrement only needs to be MO_ACQ_REL since the reference
>  200   // counter is volatile.
>  201   Atomic::dec(&_ref_count);
>
> volatile is irrelevant with regards to memory ordering as it is a 
> compiler annotation. And you haven't specified any memory order value 
> so the default is conservative ie. implied full fence. (I see the same 
> incorrect comment is in threadSMR.cpp!)

I got that wording from threadSMR.cpp and Erik O. confirmed my use of that
wording previously. I'll chase it down with Erik and get back to you.


> 208   // The increment needs to be MO_SEQ_CST so that the reference
>  209   // counter update is seen as soon as possible in a race with the
>  210   // async deflation protocol.
>  211   Atomic::inc(&_ref_count);
>
> Ditto you haven't specified any ordering - and inc() and dec() will 
> have the same default.

And again, I'll have to chase this down with Erik O. and get back to you.


>
>  218   return OrderAccess::load_acquire(&_ref_count);
>
> Again I query the use of load_acquire here.

For more context:

  217 inline jint ObjectMonitor::ref_count() const {
  218   return OrderAccess::load_acquire(&_ref_count);
  219 }

I concur that it looks like this can change to:

   218   return _ref_count;

because all the updates to the _ref_count field are done via:
Atomic::inc(), Atomic::dec() or cmpxchg().

Looking at the history of the code, I did have a race where I came to
the conclusion that the load_acquire() was needed. I'll have to go back
to that race's analysis and double check my thinking.


> ---
>
> src/hotspot/share/runtime/objectMonitor.cpp
>
>
>  267   if (AsyncDeflateIdleMonitors &&
>  268       try_set_owner_from(Self, DEFLATER_MARKER) == 
> DEFLATER_MARKER) {

For more context, we are in:

  241 void ObjectMonitor::enter(TRAPS) {


> I don't see why you need to call try_set_owner_from again here as 
> "cur" will already be DEFLATER_MARKER from the previous try_set_owner.

I assume the previous try_set_owner() call you mean is this one:

  248   void* cur = try_set_owner_from(Self, NULL);

This first try_set_owner() is for the most common case of no owner.

The second try_set_owner() call is for a different condition than the first:

  268       try_set_owner_from(Self, DEFLATER_MARKER) == DEFLATER_MARKER) {

L248 is trying to change the _owner field from NULL -> 'Self'.
L268 is trying to change the _owner field from DEFLATER_MARKER to 'Self'.

If the try_set_owner() call on L248 fails, 'cur' can be several possible
values:

   - the calling thread (recursive enter is handled on L254-7)
   - other owning thread value (BasicLock* or Thread*)
   - DEFLATER_MARKER


> Further, I don't see how installing self as the _owner here is valid 
> and means you acquired the monitor, as the fact it was DEFLATER_MARKER 
> means it is still being deflated by another thread doesn't it ???

I guess the comment after L268 didn't work for you:

  269     // The deflation protocol finished the first part (setting owner),
  270     // but it failed the second part (making ref_count negative) and
  271     // bailed. Or the ObjectMonitor was async deflated and reused.

It means that the deflater thread was racing with this enter and
managed to set the owner field to DEFLATER_MARKER as the first step
in the deflation protocol. Our entering thread actually won the race
when it managed to set the ref_count to a positive value as part of
the ObjectMonitorHandle stuff done in the inflate() call that preceded
the enter() call. However, the deflater thread hasn't realized that it
lost the race yet and hasn't restored the owner field back to NULL.

This particular race is covered by this subsection in the wiki:

https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation#AsyncMonitorDeflation-T-enterWinsByA-B-A

If the entering thread succeeds in the try_set_owner_from() call on L268,
it will indeed be the legitimate owner of the ObjectMonitor. The deflating
thread will eventually realize that it lost the race and when it tries to
restore the owner field from DEFLATER_MARKER -> NULL, it will see that the
owner field is no longer DEFLATER_MARKER and just bail out.


>  303 assert(AsyncDeflateIdleMonitors || this->object() != NULL, 
> "invariant");
>
> I don't see how AsyncDeflateIdleMonitors permits a NULL object() at 
> this point given we have hit the contended path!

Good question! I think that assertion adjustment is from before we had
ref_counts and the ref_count now prevents us from getting as far as
clearing the _object field in the async deflation process. I'll restore
that back to the original baseline assert.


>  508 } else {
>  509     ss->print("owner=" INTPTR_FORMAT, NULL);
>
> wouldn't you want to report that it is deflating in this case?

More context here:

  506   } else if (_owner != DEFLATER_MARKER) {
  507     ss->print("owner=" INTPTR_FORMAT, p2i(_owner));
  508   } else {
  509     ss->print("owner=" INTPTR_FORMAT, NULL);
  510   }

so we intentionally print NULL instead of DEFLATER_MARKER here because
the DEFLATER_MARKER value doesn't make the is_busy() function return
true. Unfortunately, is_busy() is over in objectMonitor.hpp:

  252   intptr_t is_busy() const {
  253     // TODO-FIXME: assert _owner == null implies _recursions = 0
  254     // We do not include _ref_count in the is_busy() check because
  255     // _ref_count is for indicating that the ObjectMonitor* is in
  256     // use which is orthogonal to whether the ObjectMonitor itself
  257     // is in use for a locking operation.
  258     intptr_t ret_code = _contentions | _waiters | intptr_t(_cxq) | 
intptr_t(_EntryList);
  259     if (!AsyncDeflateIdleMonitors) {
  260       ret_code |= intptr_t(_owner);
  261     } else {
  262       if (_owner != DEFLATER_MARKER) {
  263         ret_code |= intptr_t(_owner);
  264       }
  265     }
  266     return ret_code;
  267   }

I'll add comment between L508 and L509 to explain why we are ignoring
DEFLATER_MARKER. Something like:

     // We report NULL instead of DEFLATER_MARKER here because is_busy()
     // ignores DEFLATER_MARKER values.


>  533 if (AsyncDeflateIdleMonitors &&
>  534       try_set_owner_from(Self, DEFLATER_MARKER) == 
> DEFLATER_MARKER) {
>
>  660     if (AsyncDeflateIdleMonitors &&
>  661         try_set_owner_from(Self, DEFLATER_MARKER) == 
> DEFLATER_MARKER) {
>
>  792     if (AsyncDeflateIdleMonitors &&
>  793         try_set_owner_from(Self, DEFLATER_MARKER) == 
> DEFLATER_MARKER) {
>
> Same query as above - line 267 - how does this imply a free monitor 
> ready for use ??

Same answer as above. And I have the same/similar comment explaining what
happened if we succeed in the try_set_owner_from() call for each of the
above sites.


> 1005       tty->print_cr("ERROR: ObjectMonitor::exit(): thread=" 
> INTPTR_FORMAT
> 1006                     " is exiting an ObjectMonitor it does not own.",
> 1007                     p2i(THREAD));
> 1008       tty->print_cr("The imbalance is possibly caused by JNI 
> locking.");
> 1009       print_debug_style_on(tty);
>
> Leftover debug output?

Yes, as I mentioned in the CR6-to-CR7-changes attachment to the code
review email for this round:

 > Temporary:
 > <snip>
 >   - ObjectMonitor::exit() has some logic that catches this error:
 >
 >         Non-balanced monitor enter/exit!
 >
 >     I have extra debugging code in there right now that will come
 >     out after a few more complete test cycles have given me more
 >     confidence that all the bug variants are squashed.

I haven't seen this issue pop up during the entire CR7 testing round.
Solaris-X64 is the last platform to have a sighting prior to CR7 and
that was after 7951 iterations of "MoCrazy 1024". Solaris-X64 is
currently > 15000 iterations and 11+ days... I'll leave it running
until I'm ready to start CR8 testing...

So some of the debug output will go away. Other parts will return to
being non-release bits assert() and possibly some log_error() output.


> If you actually want a print here it should be a warning or some other 
> logging operation. (Aside: this happens more often with JIT bugs than 
> actual JNI locking :) ).

Yup. I've actually never seen it with JNI locking during this project.

It appears that the one or two tests we had that explicitly exercised
this part of JNI locking are no longer run as part of Mach5 Tier[1-8].

At one point, Jerry and I discussed adding a diagnostic option to
enable this check in 'release' bits. I still have the note on my
whiteboard from Colorado:

     Jerry -> AssertOnMismatchedMonitor
           -> FailOnMismatchedMonitor

as the two choices to think about. It's probably going to be really
hard to erase that part of the whiteboard... That note has been there
for quite a while...


>
> 1039     // On SPARC that requires MEMBAR #loadstore|#storestore.
> 1040     // But of course in TSO #loadstore|#storestore is not required.
>
> Old comments no longer needed or relevant given the code uses 
> OrderAccess regardless of platform.

Always happy to delete or correct stale comments. Will delete.


> ---
>
> That's it!

Thanks for such a thorough review!! I greatly appreciate you taking
the time for a crawl thru review!!


I'm going to return to making changes for CR8. At this point, I've
(tried to) address for CR7:

- 3 OpenJDK review emails from David H.
- 2 private review emails from Robbin
- 1 private review email from Per

If I've missed a review email from someone I apologize and please
resend it; email may have been lost during my transition from my dying
MBP13 to my new MBP13.

Dan



>
> Thanks,
> David
> -----
>
>
> On 18/10/2019 7:50 am, Daniel D. Daugherty wrote:
>> Greetings,
>>
>> The Async Monitor Deflation project is reaching the end game. I have no
>> changes planned for the project at this time so all that is left is code
>> review and any changes that results from those reviews.
>>
>> Carsten and Roman! Time for you guys to chime in again on the code 
>> reviews.
>>
>> I have attached the list of fixes from CR6 to CR7 instead of putting it
>> in the main body of this email.
>>
>> Main bug URL:
>>
>>      JDK-8153224 Monitor deflation prolong safepoints
>>      https://bugs.openjdk.java.net/browse/JDK-8153224
>>
>> The project is currently baselined on jdk-14+19.
>>
>> Here's the full webrev URL for those folks that want to see all of the
>> current Async Monitor Deflation code in one go (v2.07 full):
>>
>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/10-for-jdk14.v2.07.full 
>>
>>
>> Some folks might want to see just what has changed since the last review
>> cycle so here's a webrev for that (v2.07 inc):
>>
>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/10-for-jdk14.v2.07.inc/ 
>>
>>
>> The OpenJDK wiki has been updated to match the CR7/v2.07/10-for-jdk14 
>> changes:
>>
>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation
>>
>> The jdk-14+18 based v2.07 version of the patch has been thru Mach5 
>> tier[1-8]
>> testing on Oracle's usual set of platforms. It has also been through 
>> my usual
>> set of stress testing on Linux-X64, macOSX and Solaris-X64 with the 
>> addition
>> of Robbin's "MoCrazy 1024" test running in parallel with the other 
>> tests in
>> my lab.
>>
>> The jdk-14+19 based v2.07 version of the patch has been thru Mach5 
>> tier[1-3]
>> test on Oracle's usual set of platforms. Mach5 tier[4-8] are in process.
>>
>> I did another round of SPECjbb2015 testing in Oracle's Aurora 
>> Performance lab
>> using using their tuned SPECjbb2015 Linux-X64 G1 configs:
>>
>>      - "base" is jdk-14+18
>>      - "v2.07" is the latest version and includes C2 
>> inc_om_ref_count() support
>>        on LP64 X64 and the new HandshakeAfterDeflateIdleMonitors option
>>      - "off" is with -XX:-AsyncDeflateIdleMonitors specified
>>      - "handshake" is with -XX:+HandshakeAfterDeflateIdleMonitors 
>> specified
>>
>>           hbIR           hbIR
>>      (max attempted)  (settled)  max-jOPS  critical-jOPS runtime
>>      ---------------  ---------  --------  ------------- -------
>>             34282.00   30635.90  28831.30       20969.20 3841.30 base
>>             34282.00   30973.00  29345.80       21025.20 3964.10 v2.07
>>             34282.00   31105.60  29174.30       21074.00 3931.30 
>> v2.07_handshake
>>             34282.00   30789.70  27151.60       19839.10 3850.20 
>> v2.07_off
>>
>>      - The Aurora Perf comparison tool reports:
>>
>>          Comparison              max-jOPS critical-jOPS
>>          ----------------------  -------------------- 
>> --------------------
>>          base vs 2.07            +1.78% (s, p=0.000)   +0.27% (ns, 
>> p=0.790)
>>          base vs 2.07_handshake  +1.19% (s, p=0.007)   +0.58% (ns, 
>> p=0.536)
>>          base vs 2.07_off        -5.83% (ns, p=0.394)  -5.39% (ns, 
>> p=0.347)
>>
>>          (s) - significant  (ns) - not-significant
>>
>>      - For historical comparison, the Aurora Perf comparision tool
>>          reported for v2.06 with a baseline of jdk-13+31:
>>
>>          Comparison              max-jOPS critical-jOPS
>>          ----------------------  -------------------- 
>> --------------------
>>          base vs 2.06            -0.32% (ns, p=0.345)  +0.71% (ns, 
>> p=0.646)
>>          base vs 2.06_off        +0.49% (ns, p=0.292)  -1.21% (ns, 
>> p=0.481)
>>
>>          (s) - significant  (ns) - not-significant
>>
>> Thanks, in advance, for any questions, comments or suggestions.
>>
>> Dan
>>
>>
>> On 8/28/19 5:02 PM, Daniel D. Daugherty wrote:
>>> Greetings,
>>>
>>> The Async Monitor Deflation project has rebased to JDK14 so it's time
>>> for our first code review in that new context!!
>>>
>>> I've been focused on changing the monitor list management code to be
>>> lock-free in order to make SPECjbb2015 happier. Of course with a change
>>> like that, it takes a while to chase down all the new and wonderful
>>> races. At this point, I have the code back to the same stability that
>>> I had with CR5/v2.05/8-for-jdk13.
>>>
>>> To lay the ground work for this round of review, I pushed the following
>>> two fixes to jdk/jdk earlier today:
>>>
>>>     JDK-8230184 rename, whitespace, indent and comments changes in 
>>> preparation
>>>                 for lock free Monitor lists
>>>     https://bugs.openjdk.java.net/browse/JDK-8230184
>>>
>>>     JDK-8230317 serviceability/sa/ClhsdbPrintStatics.java fails 
>>> after 8230184
>>>     https://bugs.openjdk.java.net/browse/JDK-8230317
>>>
>>> I have attached the list of fixes from CR5 to CR6 instead of putting
>>> in the main body of this email.
>>>
>>> Main bug URL:
>>>
>>>     JDK-8153224 Monitor deflation prolong safepoints
>>>     https://bugs.openjdk.java.net/browse/JDK-8153224
>>>
>>> The project is currently baselined on jdk-14+11 plus the fixes for
>>> JDK-8230184 and JDK-8230317.
>>>
>>> Here's the full webrev URL for those folks that want to see all of the
>>> current Async Monitor Deflation code in one go (v2.06 full):
>>>
>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/9-for-jdk14.v2.06.full/ 
>>>
>>>
>>>
>>> The primary focus of this review cycle is on the lock-free Monitor List
>>> management changes so here's a webrev for just that patch (v2.06c):
>>>
>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/9-for-jdk14.v2.06c.inc/ 
>>>
>>>
>>> The secondary focus of this review cycle is on the bug fixes that have
>>> been made since CR5/v2.05/8-for-jdk13 so here's a webrev for just that
>>> patch (v2.06b):
>>>
>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/9-for-jdk14.v2.06b.inc/ 
>>>
>>>
>>> The third and final bucket for this review cycle is the rename, 
>>> whitespace,
>>> indent and comments changes made in preparation for lock free 
>>> Monitor list
>>> management. Almost all of that was extracted into JDK-8230184 for the
>>> baseline so this bucket now has just a few comment changes relative to
>>> CR5/v2.05/8-for-jdk13. Here's a webrev for the remainder (v2.06a):
>>>
>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/9-for-jdk14.v2.06a.inc/ 
>>>
>>>
>>>
>>> Some folks might want to see just what has changed since the last 
>>> review
>>> cycle so here's a webrev for that (v2.06 inc):
>>>
>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/9-for-jdk14.v2.06.inc/ 
>>>
>>>
>>>
>>> Last, but not least, some folks might want to see the code before the
>>> addition of lock-free Monitor List management so here's a webrev for
>>> that (v2.00 -> v2.05):
>>>
>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/9-for-jdk14.v2.05.inc/ 
>>>
>>>
>>> The OpenJDK wiki will need minor updates to match the CR6 changes:
>>>
>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation
>>>
>>> but that should only be changes to describe per-thread list async 
>>> monitor
>>> deflation being done by the ServiceThread.
>>>
>>> (I did update the OpenJDK wiki for the CR5 changes back on 2019.08.14)
>>>
>>> This version of the patch has been thru Mach5 tier[1-8] testing on
>>> Oracle's usual set of platforms. It has also been through my usual set
>>> of stress testing on Linux-X64, macOSX and Solaris-X64.
>>>
>>> I did a bunch of SPECjbb2015 testing in Oracle's Aurora Performance lab
>>> using using their tuned SPECjbb2015 Linux-X64 G1 configs. This was 
>>> using
>>> this patch baselined on jdk-13+31 (for stability):
>>>
>>>           hbIR           hbIR
>>>      (max attempted)  (settled)  max-jOPS  critical-jOPS runtime
>>>      ---------------  ---------  --------  ------------- -------
>>>             34282.00   28837.20  27905.20       19817.40 3658.10 base
>>>             34965.70   29798.80  27814.90       19959.00 3514.60 v2.06d
>>>             34282.00   29100.70  28042.50       19577.00 3701.90 
>>> v2.06d_off
>>>             34282.00   29218.50  27562.80       19397.30 3657.60 
>>> v2.06d_ocache
>>>             34965.70   29838.30  26512.40       19170.60 3569.90 v2.05
>>>             34282.00   28926.10  27734.00       19835.10 3588.40 
>>> v2.05_off
>>>
>>> The "off" configs are with -XX:-AsyncDeflateIdleMonitors specified and
>>> the "ocache" config is with 128 byte cache line sizes instead of 64 
>>> byte
>>> cache lines sizes. "v2.06d" is the last set of changes that I made 
>>> before
>>> those changes were distributed into the "v2.06a", "v2.06b" and "v2.06c"
>>> buckets for this review recycle.
>>>
>>>
>>> Thanks, in advance, for any questions, comments or suggestions.
>>>
>>> Dan
>>>
>>>
>>> On 7/11/19 3:49 PM, Daniel D. Daugherty wrote:
>>>> Greetings,
>>>>
>>>> I've been focused on chasing down and fixing the rare test failures
>>>> that only pop up rarely. So this round is primarily fixes for races
>>>> with a few additional fixes that came from Karen's review of CR4.
>>>> Thanks Karen!
>>>>
>>>> I have attached the list of fixes from CR4 to CR5 instead of putting
>>>> in the main body of this email.
>>>>
>>>> Main bug URL:
>>>>
>>>>     JDK-8153224 Monitor deflation prolong safepoints
>>>>     https://bugs.openjdk.java.net/browse/JDK-8153224
>>>>
>>>> The project is currently baselined on jdk-13+29. This will likely be
>>>> the last JDK13 baseline for this project and I'll roll to the JDK14
>>>> (jdk/jdk) repo soon...
>>>>
>>>> Here's the full webrev URL:
>>>>
>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/8-for-jdk13.full/
>>>>
>>>> Here's the incremental webrev URL:
>>>>
>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/8-for-jdk13.inc/
>>>>
>>>> I have not yet checked the OpenJDK wiki to see if it needs any updates
>>>> to match the CR5 changes:
>>>>
>>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation
>>>>
>>>> (I did update the OpenJDK wiki for the CR4 changes back on 2019.06.26)
>>>>
>>>> This version of the patch has been thru Mach5 tier[1-3] testing on
>>>> Oracle's usual set of platforms. Mach5 tier[4-6] is running now and
>>>> Mach5 tier[78] will follow. I'll kick off the usual stress testing
>>>> on Linux-X64, macOSX and Solaris-X64 as those machines become 
>>>> available.
>>>> Since I haven't made any performance changes in this round, I'll only
>>>> be running SPECjbb2015 to gather the latest monitorinflation logs.
>>>>
>>>> Next up:
>>>>
>>>> - We're still seeing 4-5% lower performance with SPECjbb2015 on
>>>>   Linux-X64 and we've determined that some of that comes from
>>>>   contention on the gListLock. So I'm going to investigate removing
>>>>   the gListLock. Yes, another lock free set of changes is coming!
>>>> - Of course, going lock free often causes new races and new failures
>>>>   so that's a good reason for make those changes isolated in their
>>>>   own round (and not holding up CR5/v2.05/8-for-jdk13 anymore).
>>>> - I finally have a potential fix for the Win* failure with
>>>>     gc/g1/humongousObjects/TestHumongousClassLoader.java
>>>>   but I haven't run it through Mach5 yet so it'll be in the next 
>>>> round.
>>>> - Some RTM tests were recently re-enabled in Mach5 and I'm seeing some
>>>>   monitor related failures there. I suspect that I need to go take a
>>>>   look at the C2 RTM macro assembler code and look for things that 
>>>> might
>>>>   conflict if Async Monitor Deflation. If you're interested in that 
>>>> kind
>>>>   of issue, then see the macroAssembler_x86.cpp sanity check that I
>>>>   added in this round!
>>>>
>>>> Thanks, in advance, for any questions, comments or suggestions.
>>>>
>>>> Dan
>>>>
>>>>
>>>> On 5/26/19 8:30 PM, Daniel D. Daugherty wrote:
>>>>> Greetings,
>>>>>
>>>>> I have a fix for an issue that came up during performance testing.
>>>>> Many thanks to Robbin for diagnosing the issue in his SPECjbb2015
>>>>> experiments.
>>>>>
>>>>> Here's the list of changes from CR3 to CR4. The list is a bit
>>>>> verbose due to the complexity of the issue, but the changes
>>>>> themselves are not that big.
>>>>>
>>>>> Functional:
>>>>>   - Change SafepointSynchronize::is_cleanup_needed() from calling
>>>>>     ObjectSynchronizer::is_cleanup_needed() to calling
>>>>>     ObjectSynchronizer::is_safepoint_deflation_needed():
>>>>>     - is_safepoint_deflation_needed() returns the result of
>>>>>       monitors_used_above_threshold() for safepoint based
>>>>>       monitor deflation (!AsyncDeflateIdleMonitors).
>>>>>     - For AsyncDeflateIdleMonitors, it only returns true if
>>>>>       there is a special deflation request, e.g., System.gc()
>>>>>       - This solves a bug where there are a bunch of Cleanup
>>>>>         safepoints that simply request async deflation which
>>>>>         keeps the async JavaThreads from making progress on
>>>>>         their async deflation work.
>>>>>   - Add AsyncDeflationInterval diagnostic option. Description:
>>>>>       Async deflate idle monitors every so many milliseconds when
>>>>>       MonitorUsedDeflationThreshold is exceeded (0 is off).
>>>>>   - Replace ObjectSynchronizer::gOmShouldDeflateIdleMonitors() with
>>>>>     ObjectSynchronizer::is_async_deflation_needed():
>>>>>     - is_async_deflation_needed() returns true when
>>>>>       is_async_cleanup_requested() is true or when
>>>>>       monitors_used_above_threshold() is true (but no more often than
>>>>>       AsyncDeflationInterval).
>>>>>     - if AsyncDeflateIdleMonitors Service_lock->wait() now waits for
>>>>>       at most GuaranteedSafepointInterval millis:
>>>>>       - This allows is_async_deflation_needed() to be checked at
>>>>>         the same interval as GuaranteedSafepointInterval.
>>>>>         (default is 1000 millis/1 second)
>>>>>       - Once is_async_deflation_needed() has returned true, it
>>>>>         generally cannot return true for AsyncDeflationInterval.
>>>>>         This is to prevent async deflation from swamping the
>>>>>         ServiceThread.
>>>>>   - The ServiceThread still handles async deflation of the global
>>>>>     in-use list and now it also marks JavaThreads for async deflation
>>>>>     of their in-use lists.
>>>>>     - The ServiceThread will check for async deflation work every
>>>>>       GuaranteedSafepointInterval.
>>>>>     - A safepoint can still cause the ServiceThread to check for
>>>>>       async deflation work via is_async_deflation_requested.
>>>>>   - Refactor code from ObjectSynchronizer::is_cleanup_needed() into
>>>>>     monitors_used_above_threshold() and remove is_cleanup_needed().
>>>>>   - In addition to System.gc(), the VM_Exit VM op and the final
>>>>>     VMThread safepoint now set the is_special_deflation_requested
>>>>>     flag to reduce the in-use monitor population that is reported by
>>>>>     ObjectSynchronizer::log_in_use_monitor_details() at VM exit.
>>>>>
>>>>> Test update:
>>>>>   - test/hotspot/gtest/oops/test_markOop.cpp is updated to work with
>>>>>     AsyncDeflateIdleMonitors.
>>>>>
>>>>> Collateral:
>>>>>   - Add/clarify/update some logging messages.
>>>>>
>>>>> Cleanup:
>>>>>   - Updated comments based on Karen's code review.
>>>>>   - Change 'special cleanup' -> 'special deflation' and
>>>>>     'async cleanup' -> 'async deflation'.
>>>>>     - comment and function name changes
>>>>>   - Clarify MonitorUsedDeflationThreshold description;
>>>>>
>>>>>
>>>>> Main bug URL:
>>>>>
>>>>>     JDK-8153224 Monitor deflation prolong safepoints
>>>>>     https://bugs.openjdk.java.net/browse/JDK-8153224
>>>>>
>>>>> The project is currently baselined on jdk-13+22.
>>>>>
>>>>> Here's the full webrev URL:
>>>>>
>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/7-for-jdk13.full/
>>>>>
>>>>> Here's the incremental webrev URL:
>>>>>
>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/7-for-jdk13.inc/
>>>>>
>>>>> I have not updated the OpenJDK wiki to reflect the CR4 changes:
>>>>>
>>>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation
>>>>>
>>>>> The wiki doesn't say a whole lot about the async deflation invocation
>>>>> mechanism so I have to figure out how to add that content.
>>>>>
>>>>> This version of the patch has been thru Mach5 tier[1-8] testing on
>>>>> Oracle's usual set of platforms. My Solaris-X64 stress kit run is
>>>>> running now. Kitchensink8H on product, fastdebug, and slowdebug bits
>>>>> are running on Linux-X64, MacOSX and Solaris-X64. I still have to run
>>>>> my stress kit on Linux-X64. I still have to run the SPECjbb2015
>>>>> baseline and CR4 runs on Linux-X64, MacOSX and Solaris-X64.
>>>>>
>>>>> Thanks, in advance, for any questions, comments or suggestions.
>>>>>
>>>>> Dan
>>>>>
>>>>> On 5/6/19 11:52 AM, Daniel D. Daugherty wrote:
>>>>>> Greetings,
>>>>>>
>>>>>> I had some discussions with Karen about a race that was in the
>>>>>> ObjectMonitor::enter() code in CR2/v2.02/5-for-jdk13. This race was
>>>>>> theoretical and I had no test failures due to it. The fix is pretty
>>>>>> simple: remove the special case code for async deflation in the
>>>>>> ObjectMonitor::enter() function and rely solely on the ref_count
>>>>>> for ObjectMonitor::enter() protection.
>>>>>>
>>>>>> During those discussions Karen also floated the idea of using the
>>>>>> ref_count field instead of the contentions field for the Async
>>>>>> Monitor Deflation protocol. I decided to go ahead and code up that
>>>>>> change and I have run it through the usual stress and Mach5 testing
>>>>>> with no issues. It's also known as v2.03 (for those for with the
>>>>>> patches) and as webrev/6-for-jdk13 (for those with webrev URLs).
>>>>>> Sorry for all the names...
>>>>>>
>>>>>> Main bug URL:
>>>>>>
>>>>>>     JDK-8153224 Monitor deflation prolong safepoints
>>>>>>     https://bugs.openjdk.java.net/browse/JDK-8153224
>>>>>>
>>>>>> The project is currently baselined on jdk-13+18.
>>>>>>
>>>>>> Here's the full webrev URL:
>>>>>>
>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/6-for-jdk13.full/
>>>>>>
>>>>>> Here's the incremental webrev URL:
>>>>>>
>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/6-for-jdk13.inc/
>>>>>>
>>>>>> I have also updated the OpenJDK wiki to reflect the CR3 changes:
>>>>>>
>>>>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation 
>>>>>>
>>>>>>
>>>>>> This version of the patch has been thru Mach5 tier[1-8] testing on
>>>>>> Oracle's usual set of platforms. My Solaris-X64 stress kit run had
>>>>>> no issues. Kitchensink8H on product, fastdebug, and slowdebug bits
>>>>>> had no failures on Linux-X64; MacOSX fastdebug and slowdebug and
>>>>>> Solaris-X64 release had the usual "Too large time diff" complaints.
>>>>>> 12 hour Inflate2 runs on product, fastdebug and slowdebug bits on
>>>>>> Linux-X64, MacOSX and Solaris-X64 had no failures. My Linux-X64
>>>>>> stress kit is running right now.
>>>>>>
>>>>>> I've done the SPECjbb2015 baseline and CR3 runs. I need to gather
>>>>>> the results and analyze them.
>>>>>>
>>>>>> Thanks, in advance, for any questions, comments or suggestions.
>>>>>>
>>>>>> Dan
>>>>>>
>>>>>>
>>>>>> On 4/25/19 12:38 PM, Daniel D. Daugherty wrote:
>>>>>>> Greetings,
>>>>>>>
>>>>>>> I have a small but important bug fix for the Async Monitor 
>>>>>>> Deflation
>>>>>>> project ready to go. It's also known as v2.02 (for those for 
>>>>>>> with the
>>>>>>> patches) and as webrev/5-for-jdk13 (for those with webrev URLs). 
>>>>>>> Sorry
>>>>>>> for all the names...
>>>>>>>
>>>>>>> JDK-8222295 was pushed to jdk/jdk two days ago so that baseline 
>>>>>>> patch
>>>>>>> is out of our hair.
>>>>>>>
>>>>>>> Main bug URL:
>>>>>>>
>>>>>>>     JDK-8153224 Monitor deflation prolong safepoints
>>>>>>>     https://bugs.openjdk.java.net/browse/JDK-8153224
>>>>>>>
>>>>>>> The project is currently baselined on jdk-13+17.
>>>>>>>
>>>>>>> Here's the full webrev URL:
>>>>>>>
>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/5-for-jdk13.full/
>>>>>>>
>>>>>>> Here's the incremental webrev URL (JDK-8153224):
>>>>>>>
>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/5-for-jdk13.inc/
>>>>>>>
>>>>>>> I still have to update the OpenJDK wiki to reflect the CR2 changes:
>>>>>>>
>>>>>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation 
>>>>>>>
>>>>>>>
>>>>>>> This version of the patch has been thru Mach5 tier[1-6] testing on
>>>>>>> Oracle's usual set of platforms. Mach5 tier[7-8] is running now.
>>>>>>> My stress kit is running on Solaris-X64 now. Kitchensink8H is 
>>>>>>> running
>>>>>>> now on product, fastdebug, and slowdebug bits on Linux-X64, MacOSX
>>>>>>> and Solaris-X64. 12 hour Inflate2 runs are running now on product,
>>>>>>> fastdebug and slowdebug bits on Linux-X64, MacOSX and Solaris-X64.
>>>>>>> I'll start my my stress kit on Linux-X64 sometime on Sunday (after
>>>>>>> my jdk-13+18 stress run is done).
>>>>>>>
>>>>>>> I'll do SPECjbb2015 baseline and CR2 runs after all the stress
>>>>>>> testing is done.
>>>>>>>
>>>>>>> Thanks, in advance, for any questions, comments or suggestions.
>>>>>>>
>>>>>>> Dan
>>>>>>>
>>>>>>>
>>>>>>> On 4/19/19 11:58 AM, Daniel D. Daugherty wrote:
>>>>>>>> Greetings,
>>>>>>>>
>>>>>>>> I finally have CR1 for the Async Monitor Deflation project 
>>>>>>>> ready to
>>>>>>>> go. It's also known as v2.01 (for those for with the patches) 
>>>>>>>> and as
>>>>>>>> webrev/4-for-jdk13 (for those with webrev URLs). Sorry for all the
>>>>>>>> names...
>>>>>>>>
>>>>>>>> Main bug URL:
>>>>>>>>
>>>>>>>>     JDK-8153224 Monitor deflation prolong safepoints
>>>>>>>>     https://bugs.openjdk.java.net/browse/JDK-8153224
>>>>>>>>
>>>>>>>> Baseline bug fixes URL:
>>>>>>>>
>>>>>>>>     JDK-8222295 more baseline cleanups from Async Monitor 
>>>>>>>> Deflation project
>>>>>>>>     https://bugs.openjdk.java.net/browse/JDK-8222295
>>>>>>>>
>>>>>>>> The project is currently baselined on jdk-13+15.
>>>>>>>>
>>>>>>>> Here's the webrev for the latest baseline changes (JDK-8222295):
>>>>>>>>
>>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/4-for-jdk13.8222295 
>>>>>>>>
>>>>>>>>
>>>>>>>> Here's the full webrev URL (JDK-8153224 only):
>>>>>>>>
>>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/4-for-jdk13.full/ 
>>>>>>>>
>>>>>>>>
>>>>>>>> Here's the incremental webrev URL (JDK-8153224):
>>>>>>>>
>>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/4-for-jdk13.inc/
>>>>>>>>
>>>>>>>> So I'm looking for reviews for both JDK-8222295 and the latest 
>>>>>>>> version
>>>>>>>> of JDK-8153224...
>>>>>>>>
>>>>>>>> I still have to update the OpenJDK wiki to reflect the CR changes:
>>>>>>>>
>>>>>>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation 
>>>>>>>>
>>>>>>>>
>>>>>>>> This version of the patch has been thru Mach5 tier[1-3] testing on
>>>>>>>> Oracle's usual set of platforms. Mach5 tier[4-6] is running now 
>>>>>>>> and
>>>>>>>> Mach5 tier[78] will be run later today. My stress kit on 
>>>>>>>> Solaris-X64
>>>>>>>> is running now. Linux-X64 stress testing will start on Sunday. I'm
>>>>>>>> planning to do Kitchensink runs, SPECjbb2015 runs and my monitor
>>>>>>>> inflation stress tests on Linux-X64, MacOSX and Solaris-X64.
>>>>>>>>
>>>>>>>> Thanks, in advance, for any questions, comments or suggestions.
>>>>>>>>
>>>>>>>> Dan
>>>>>>>>
>>>>>>>>
>>>>>>>> On 3/24/19 9:57 AM, Daniel D. Daugherty wrote:
>>>>>>>>> Greetings,
>>>>>>>>>
>>>>>>>>> Welcome to the OpenJDK review thread for my port of Carsten's 
>>>>>>>>> work on:
>>>>>>>>>
>>>>>>>>>     JDK-8153224 Monitor deflation prolong safepoints
>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8153224
>>>>>>>>>
>>>>>>>>> Here's a link to the OpenJDK wiki that describes my port:
>>>>>>>>>
>>>>>>>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation 
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Here's the webrev URL:
>>>>>>>>>
>>>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/3-for-jdk13/
>>>>>>>>>
>>>>>>>>> Here's a link to Carsten's original webrev:
>>>>>>>>>
>>>>>>>>> http://cr.openjdk.java.net/~cvarming/monitor_deflate_conc/0/
>>>>>>>>>
>>>>>>>>> Earlier versions of this patch have been through several 
>>>>>>>>> rounds of
>>>>>>>>> preliminary review. Many thanks to Carsten, Coleen, Robbin, and
>>>>>>>>> Roman for their preliminary code review comments. A very special
>>>>>>>>> thanks to Robbin and Roman for building and testing the patch in
>>>>>>>>> their own environments (including specJBB2015).
>>>>>>>>>
>>>>>>>>> This version of the patch has been thru Mach5 tier[1-8] 
>>>>>>>>> testing on
>>>>>>>>> Oracle's usual set of platforms. Earlier versions have been run
>>>>>>>>> through my stress kit on my Linux-X64 and Solaris-X64 servers
>>>>>>>>> (product, fastdebug, slowdebug).Earlier versions have run 
>>>>>>>>> Kitchensink
>>>>>>>>> for 12 hours on MacOSX, Linux-X64 and Solaris-X64 (product, 
>>>>>>>>> fastdebug
>>>>>>>>> and slowdebug). Earlier versions have run my monitor inflation 
>>>>>>>>> stress
>>>>>>>>> tests for 12 hours on MacOSX, Linux-X64 and Solaris-X64 (product,
>>>>>>>>> fastdebug and slowdebug).
>>>>>>>>>
>>>>>>>>> All of the testing done on earlier versions will be redone on the
>>>>>>>>> latest version of the patch.
>>>>>>>>>
>>>>>>>>> Thanks, in advance, for any questions, comments or suggestions.
>>>>>>>>>
>>>>>>>>> Dan
>>>>>>>>>
>>>>>>>>> P.S.
>>>>>>>>> One subtest in 
>>>>>>>>> gc/g1/humongousObjects/TestHumongousClassLoader.java
>>>>>>>>> is currently failing in -Xcomp mode on Win* only. I've been 
>>>>>>>>> trying
>>>>>>>>> to characterize/analyze this failure for more than a week now. At
>>>>>>>>> this point I'm convinced that Async Monitor Deflation is 
>>>>>>>>> aggravating
>>>>>>>>> an existing bug. However, I plan to have a better handle on that
>>>>>>>>> failure before these bits are pushed to the jdk/jdk repo.
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>



More information about the hotspot-runtime-dev mailing list