RFR(L) 8153224 Monitor deflation prolong safepoints (CR7/v2.07/10-for-jdk14)

Fri Nov 1 16:08:48 UTC 2019

Hi David,

Thanks for closing the loop on the load_acquire() discussion.
More below...

On 10/31/19 12:00 AM, David Holmes wrote:
> Hi Dan,
>
> First I've seen your additional tracking emails, but for now just want 
> to respond to the load_acquire discussion below ...
>
> On 30/10/2019 6:18 am, Daniel D. Daugherty wrote:
>> Hi David,
>>
>> Sorry for the delay in replying to this email. I spent most of last week
>> migrating my development environment from my dying MBP13 to a new MBP13.
>> I think I have everything recovered, but only time will tell...
>>
>> More below...
>>
>> On 10/23/19 12:00 AM, David Holmes wrote:
>>> Hi Dan,
>>>
>>> Trimming to the open issues.
>>
>> I'll try to do the same (but was only able to trim one thing)...
>>
>>
>>>
>>> On 23/10/2019 6:46 am, Daniel D. Daugherty wrote:
>>>> Okay, time to return to my uses of release_store() and load_acquire().
>>>> Consider this bit of code:
>>>>
>>>> src/hotspot/share/runtime/synchronizer.cpp:
>>>>
>>>>      L351: // Take an ObjectMonitor from the start of the specified 
>>>> list. Also
>>>>      L352: // decrements the specified counter. Returns NULL if 
>>>> none are available.
>>>>      L353: static ObjectMonitor* 
>>>> take_from_start_of_common(ObjectMonitor* volatile * list_p,
>>>>      L354: int volatile * count_p) {
>>>>      L355:   ObjectMonitor* next = NULL;
>>>>      L356:   ObjectMonitor* take = NULL;
>>>>      L357:   // Mark the list head to guard against A-B-A race:
>>>>      L358:   if (!mark_list_head(list_p, &take, &next)) {
>>>>      L359:     return NULL;  // None are available.
>>>>      L360:   }
>>>>      L361:   // Switch marked list head to next (which unmarks the 
>>>> list head, but
>>>>      L362:   // leaves take marked):
>>>>      L363:   OrderAccess::release_store(list_p, next);
>>>>      L364:   Atomic::dec(count_p);
>>>>      L365:   // Unmark take, but leave the next value for any 
>>>> lagging list
>>>>      L366:   // walkers. It will get cleaned up when take is 
>>>> prepended to
>>>>      L367:   // the in-use list:
>>>>      L368:   set_next(take, next);
>>>>      L369:   return take;
>>>>      L370: }
>>>>
>>>>
>>>> On L358, we call mark_list_head() and if that call returns true, then
>>>> the calling thread is the "owner" of the list head. My idea is that
>>>> once my thread "owns" the list head, I can use release_store() as a
>>>> smaller hammer than cmpxchg() to sync out the new value ('next') as
>>>> the new list head.
>>>>
>>>> Should I be doing something other than release_store() here?
>>>>
>>>> My thinking is that release_store() pairs up with the load_acquire()
>>>> of any reader thread that's trying to simply walk the list. And that
>>>> release_store() also pairs up with any _other_ writer thread that's
>>>> trying to mark the list head using:
>>>>
>>>>      mark_list_head() -> mark_next() -> cmpxchg()
>>>>
>>>> Here's mark_list_head():
>>>>
>>>>      L194: // Mark the next field in the list head ObjectMonitor. 
>>>> If marking was
>>>>      L195: // successful, then the mid and the unmarked next field 
>>>> are returned
>>>>      L196: // via parameter and true is returned. Otherwise false 
>>>> is returned.
>>>>      L197: static bool mark_list_head(ObjectMonitor* volatile * 
>>>> list_p,
>>>>      L198:                            ObjectMonitor** mid_p, 
>>>> ObjectMonitor** next_p) {
>>>>      L199:   while (true) {
>>>>      L200:     ObjectMonitor* mid = OrderAccess::load_acquire(list_p);
>>>>      L201:     if (mid == NULL) {
>>>>      L202:       return false;  // The list is empty so nothing to 
>>>> mark.
>>>>      L203:     }
>>>>      L204:     if (mark_next(mid, next_p)) {
>>>>      L205:       if (OrderAccess::load_acquire(list_p) != mid) {
>>>>      L206:         // The list head changed so we have to retry.
>>>>      L207:         set_next(mid, *next_p);  // unmark mid
>>>>      L208:         continue;
>>>>      L209:       }
>>>>      L210:       // We marked next field to guard against races.
>>>>      L211:       *mid_p = mid;
>>>>      L212:       return true;
>>>>      L213:     }
>>>>      L214:   }
>>>>      L215: }
>>>>
>>>> which does a matching load_acquire() to get the current list head
>>>> and calls mark_next() to try and mark it. It then calls load_acquire()
>>>> again to verify that the list head hasn't changed while doing the
>>>> mark_next().
>>>>
>>>> So here's mark_next():
>>>>
>>>>   161 // Mark the next field in an ObjectMonitor. If marking was 
>>>> successful,
>>>>   162 // then the unmarked next field is returned via parameter and 
>>>> true is
>>>>   163 // returned. Otherwise false is returned.
>>>>   164 static bool mark_next(ObjectMonitor* om, ObjectMonitor** 
>>>> next_p) {
>>>>   165   // Get current next field without any marking value.
>>>>   166   ObjectMonitor* next = (ObjectMonitor*)
>>>>   167 ((intptr_t)OrderAccess::load_acquire(&om->_next_om) & ~0x1);
>>>>   168   if (Atomic::cmpxchg(mark_om_ptr(next), &om->_next_om, next) 
>>>> != next) {
>>>>   169     return false;  // Could not mark the next field or it was 
>>>> already marked.
>>>>   170   }
>>>>   171   *next_p = next;
>>>>   172   return true;
>>>>   173 }
>>>>
>>>> We use load_acquire() on L166-7 to make sure that we have the latest
>>>> value in the next_om field. That load_acquire() matches up with the
>>>> release_store() done by another thread to unmark the next field (see
>>>> the set_next() function). Or it matches with the cmpxchg() done by
>>>> another to mark the next field. Yes, we could do a regular load when
>>>> we know that the field is only changed by cmpxchg(), but it could
>>>> have been changed by release_store() and my thinking is that we need
>>>> a matching load_acquire().
>>>>
>>>> We have to use cmpxchg() here to try and set the mark on the next 
>>>> field.
>>>> If successful, then we return the value of the unmarked next field via
>>>> the 'next_p' param and we return true. However, just because we marked
>>>> the next field in the ObjectMonitor doesn't mean that we marked the 
>>>> list
>>>> head. That's what the second load_acquire() on L205 above verifies.
>>>>
>>>>
>>>> Okay, that was pretty gory on the details of my thinking! I think it's
>>>> pretty clear that I'm doing load_acquire()/release_store() pairs on
>>>> the list head pointers or the next_om fields to make sure that I match
>>>> loads and stores of _just_ the list head pointer or the next_om field
>>>> in question. I'm not trying affect other fields in the ObjectMonitor
>>>> or at least I don't think I have to do that.
>>>>
>>>> Maybe I've misunderstood the load_acquire()/release_store() stuff 
>>>> (again)
>>>> and this code is doing way too much! I would be happy to have my 
>>>> thinking
>>>> about this load_acquire()/release_store() stuff corrected and the code
>>>> simplified. Maybe in this sequence:
>>>>
>>>>      mark_list_head() -> mark_next() -> cmpxchg()
>>>>
>>>> we don't need all the load_acquire() calls because the sequence 
>>>> only ends
>>>> successfully with a cmpxchg() and that makes memory happy 
>>>> everywhere with
>>>> just simple loads. Dunno. I definitely need more feedback here!
>>>
>>> Okay that was a massive deep dive :) but it is exactly what I was 
>>> referring to when I said:
>>>
>>> "My main comment on all the list management code pertains to the
>>> difficulty in seeing which release_store is paired with which
>>> load_acquire, and why."
>>
>> Yup. And that comment is exactly why I wrote the "massive deep dive"
>> reply... so I _think_ I replied to the comment and thus proved the
>> point that this stuff is difficult all at the same time. :-)
>>
>>
>>> To understand the use of acquire/release/cmpxchg in detail you 
>>> really need to see all that code inlined/flattened so that you can 
>>> see how they actually arrange themselves.
>>
>> Agreed...
>
> To be clear I can't see the code that way so it is difficult to give 
> concrete opinions on appropriateness of load_acquire/release_store at 
> individual code locations.
>
>>
>>> I'm immediately suspicious of a load_acquire that is just before a 
>>> cmpxchg on the same field because the cmpxchg will subsume any 
>>> memory affects of the load-acquire.
>>
>> Okay, I'm trying to figure out if this comment is meant for L166, L167
>> and L168 of mark_next():
>>
>>    165   // Get current next field without any marking value.
>>    166   ObjectMonitor* next = (ObjectMonitor*)
>>    167 ((intptr_t)OrderAccess::load_acquire(&om->_next_om) & ~0x1);
>>    168   if (Atomic::cmpxchg(mark_om_ptr(next), &om->_next_om, next) 
>> != next) {
>>    169     return false;  // Could not mark the next field or it was 
>> already marked.
>>    170   }
>>    171   *next_p = next;
>>    172   return true;
>>
>> or if this is just a general comment or both... :-)
>
> Both.
>
>>
>>> But the devil is in the detail.
>>
>> Agreed and with this project there are so many details. :-(
>>
>>
>>> If a field can be set by release_store, and there are previous 
>>> stores related to that which must be visible, then a read of that 
>>> field can't be via a plain load (but either load_acquire or as part 
>>> of cmpxchg).
>>
>> As the comment on L166 says, the goal of L167 and L168 is to get the
>> current next field without any marking value. Since the next field 
>> can be
>> changed by either cmpxchg() or release_store() and we have no idea which
>> one made the most recent update, we have to use a load_acquire() to get
>> the latest value of the next field.
>>
>> As for the cmpxchg() on L168, the potential update operation has to be
>> done with cmpxchg() in order to safely and exclusively allow only one
>> thread to mark the ObjectMonitor's next field (or no threads if the next
>> field is already marked).
>>
>> So... I think I have good reasons for a load_acquire() to be immediately
>> followed by a cmpxchg(), in this case anyway.
>
> Actually no. :) The purpose of a load_acquire is to ensure that 
> subsequent loads of other variables will return values written before 
> the release_store that is paired with the load_acquire. In this code 
> there are no subsequent loads between the load_acquire and the 
> cmpxchg. The cmpxchg itself provides full bi-directional fence 
> semantics, so it deals with any loads after the cmpxchg. Hence the 
> acquire part of the load_acquire is simply not needed.

So instead of this:

    166   ObjectMonitor* next = (ObjectMonitor*)
    167 ((intptr_t)OrderAccess::load_acquire(&om->_next_om) & ~0x1);
    168   if (Atomic::cmpxchg(mark_om_ptr(next), &om->_next_om, next) != 
next) {

we can do the simpler:

    166 ObjectMonitor* next = (ObjectMonitor*)((intptr_t)om->_next_om) & 
~0x1);

and the simple load of the _next_om field will do the trick. Okay,
I'll include that change in my next round of v2.08 testing.

>
>>
>> Of course, I'm assuming that this is the load_acquire(), cmpxchg() pair
>> that you are suspicious of. It could be some other pair or it could be
>> the pairs in general.
>
> Anywhere there are no intervening loads, between the load_acquire, and 
> the cmpxchg, the acquire part is not needed.

I didn't find any other load_acquire(), cmpxchg() pairs when I looked
the other day.

>
>>
>>> But there are so many fields at play here that it is very difficult 
>>> to keep track of everything.
>>>
>>> You mention above:
>>>
>>> "My idea is that once my thread "owns" the list head, I can use 
>>> release_store() as a smaller hammer than cmpxchg() to sync out the 
>>> new value ('next') as the new list head."
>>>
>>> and I tend to agree that the second cmpxchg is certainly not needed 
>>> as there cannot be any contended update after you've marked the 
>>> "next" field.
>>
>> Not quite as strong of an agreement as I was hoping for, but I'll 
>> take it. :-)
>>
>> Is there a different memory update/sync operation I should be using 
>> for my
>> "smaller hammer"?  I don't think so, but I could be missing something...
>>
>> For example, I don't think I can switch from 
>> OrderAccess::release_store()
>> to Atomic::store() and from OrderAccess::load_acquire() to 
>> Atomic::load()
>> because I'll lose the {release, acquire} matching up component.
>
> Well you wouldn't need Atomic operations regardless.
>
> Whether you could use plain load/store depends, as we've already 
> agreed, on whether other accesses are dependent on the release/acquire 
> semantics. And I can't see whether they are not with the current 
> structure of the code, so I have no concrete suggestions here so lets 
> move on.

Moving on.

>
>>
>> I also need to clarify one thing. In my reply to your previous set of
>> comments, I wrote:
>>
>>> I think it's
>>> pretty clear that I'm doing load_acquire()/release_store() pairs on
>>> the list head pointers or the next_om fields to make sure that I match
>>> loads and stores of _just_ the list head pointer or the next_om field
>>> in question. I'm not trying affect other fields in the ObjectMonitor
>>> or at least I don't think I have to do that.
>>
>> Now that I've had time to think about it, I have to retract that last
>> sentence. When doing these lock free list manipulations, we often will
>> change some other field in the ObjectMonitor as part of that 
>> manipulation.
>>
>> For example, as part of deflation, we will clear the _object field when
>> we extract the ObjectMonitor from the containing in-use list and before
>> we prepend it to the global free list. A release_store() is used to 
>> update
>> the next field in the (cur_mid_in_use) ObjectMonitor that refers to the
>> deflated ObjectMonitor. That release_store() is important for sync'ing
>> out the clearing of the _object field because once the deflated
>> ObjectMonitor is unlinked from an in-use list, that field is no longer
>> updated by GC.
>
> Agreed.
>
>>
>>> In essence the marking is implementing a spin lock
>>
>> I like the "spin lock" analogy.
>>
>>
>>> so once you have the "lock" you can use normal load/stores to make 
>>> updates
>>
>> Yup. I get it, e.g., we mark an ObjectMonitor so that we can deflate it
>> and part of deflation is making some updates...
>>
>>
>>> - but that assumes all updates occur under the "lock", that 
>>> releasing the "lock" also has the right barriers and that any reads 
>>> sync correctly with the writes. If any of that doesn't hold then 
>>> release_store and load_acquire will be needed rather than plain 
>>> stores and loads.
>>
>> I _think_ that all updates that we make as part of list manipulation are
>> done after marking the ObjectMonitor's next field and before we unmark
>> that same next field. So I _think_ the "lock" is properly held...
>>
>> Of course, it's always possible that I missed something, but that's what
>> we have code reviewers for right? :-)
>
> I love your sense of humour ;-)

I try... it's helpful to have a sense of humour with ObjectMonitors... :-)

>
>>
>>> So again you'd need to look at all the flattened code paths to see 
>>> how this all hangs together.
>>
>> For my load_acquire() calls I have matched them up with the 
>> corresponding
>> release_store() call. And in some cases the load_acquire() call site 
>> matches
>> up with a call site that does release_store() on one branch and 
>> cmpxchg()
>> on the other. Yes, complicated and mind numbing... Sorry about that...
>>
>> I've also looked at the release_store() calls and matched them up with a
>> corresponding load_acquire().
>>
>> However, I've looked at this code so many different times that I 
>> might be
>> blind to some aspects of it so another pair of eyes looking for missing
>> matches is always welcome.
>
> Without seeing an inlined/flattened version of the code it's near 
> impossible (for me at least) to see the code paths in detail.
>
>>
>>>
>>>>> src/hotspot/share/runtime/serviceThread.cpp
>>>>>
>>>>> I have some concerns that the serviceThread must now wakeup 
>>>>> periodically instead of only on demand. It has a lot of tasks to 
>>>>> check now and it is not obvious how complex or trivial those 
>>>>> checks are. We don't currently gather any statistics in this area 
>>>>> so we have no idea how long the serviceThread typically waits 
>>>>> between requests. The fact the serviceThread is woken on demand 
>>>>> means that the desire to be "checked at the same interval" seems 
>>>>> exceedingly unlikely - the serviceThread would have to waiting for 
>>>>> extended periods of time for the timed-wait to have any real 
>>>>> affect on the frequency of the async monitor deflation.
>>>>
>>>> Periodically waking up the ServiceThread is done to match the 
>>>> safepoint
>>>> cleanup period of GuaranteedSafepointInterval when doing safepoint 
>>>> based
>>>> deflation. Without waking up periodically, we could go a long time 
>>>> before
>>>> doing any deflation via the ServiceThread and that would definitely be
>>>> an observable change in behavior relative to safepoint based 
>>>> deflation.
>>>> In some of my test runs, I had seen us go for 8-27 seconds without 
>>>> doing
>>>> any async monitor deflation. Of course, that meant that there was 
>>>> more to
>>>> cleanup when we did do it.
>>>
>>> How does that occur? If the ServiceThread is woken when there are 
>>> monitors to deflate then that implies that if it was not woken then 
>>> there were none to deflate. Or turning it around, if after waiting 
>>> for GuaranteedSafepointInterval ms the Service Thread finds there 
>>> are monitors to deflate why wasn't it directly notified of that?
>>
>> I think there's a bit of confusion here.
>>
>> We don't wake up the ServiceThread when there are ObjectMonitors to
>> deflate. We wake up the ServiceThread to see if it can deflate any
>> ObjectMonitors. ObjectMonitor idleness or suitability to be deflated is
>> determined by the ServiceThread. ObjectMonitor idleness is not a 
>> property
>> determined by another part of the system and the ServiceThread is woken
>> to handle it.
>
> Right my mistake.
>
> That's it for this email.

And that's it for me. Thanks again for closing the loop on this
load_acquire() stuff.

Dan

>
> Thanks,
> David
> -----
>
>>
>>>
>>>> I've been thinking about have my own AsyncDeflateIdleMonitorsThread 
>>>> and move
>>>> all the async deflation code from ServiceThread there. So many 
>>>> things are
>>>> being added to the ServiceThread that I do wonder whether it is 
>>>> overloaded,
>>>> but (as you said) there's no way to know that (yet).
>>>
>>> Some stuff just got moved out of the ServiceThread to a new 
>>> NotificationThread (which is a non-hidden Java thread), so the 
>>> ServiceThread has a little more capacity (potentially) to handle 
>>> async monitor deflation. But yes if the ServiceThread becomes a 
>>> bottleneck it will defeat the purpose of offloading work from the 
>>> VMThread at safepoints.
>>
>> The other "benefit" of having a dedicated AsyncDeflateIdleMonitorsThread
>> is that it would be easier for us to determine how much actual work is
>> done by the whole ObjectMonitor deflation procedure. We can say XX% of
>> the VMs execution time went to the AsyncDeflateIdleMonitorsThread.
>>
>> Also, the work moved out of the ServiceThread is not an "always on" 
>> thing
>> so the help for async deflation is limited at best.
>>
>>
>>>
>>>> BTW, I think GuaranteedSafepointInterval might be "broken" at this 
>>>> point
>>>> in that we don't guarantee a safepoint at that interval anymore. 
>>>> This has
>>>> kind of slipped through the cracks...
>>>
>>> We can look at that separately.
>>
>> Agreed. Not directly related to the Async Monitor Deflation project
>> except as the explanation for how I went so long without seeing an
>> async monitor deflation... Without that cleanup safepoint happening
>> every GuaranteedSafepointInterval (and without the wait() timeout 
>> value),
>> we don't have ObjectSynchronizer::do_safepoint_work() periodically
>> waking up the ServiceThread to do an async monitor deflation sweep...
>>
>>
>>
>>>
>>>>
>>>>> ---
>>>>>
>>>>> src/hotspot/share/runtime/thread.cpp
>>>>>
>>>>> No changes to this file.
>>>>>
>>>>> ---
>>>>>
>>>>> test/jdk/tools/jlink/multireleasejar/JLinkMultiReleaseJarTest.java
>>>>>
>>>>> No changes to this file.
>>>>
>>>> Correct. I'm unsure how to deal with this type of thing. This project
>>>> is managed as a series of patches. Earlier patches changed those 
>>>> files,
>>>> but a later patch undid those changes. So depending on where you are
>>>> in the patch queue, the logical 'files.list' contents are different.
>>>> I'm currently using a 'files.list' that reflects every file touched by
>>>> the current set of patches and that's why those two files are included
>>>> with no changes.
>>>>
>>>> Suggestions?
>>>
>>> When you think you have the final version ready for review then you 
>>> should flatten out the patches (I don't know the right terminology) 
>>> so that the final webrev is accurate.
>>
>> Good point. I've been so focused on different reviewers wanting to come
>> at this mass of code from different phases of the development process 
>> and
>> that led me to keep all the various version patches intact. I think 
>> we're
>> past the point where we are going to back up to, e.g., v2.05, and then
>> do a completely different lock free monitor list as a new v2.06.
>>
>> Okay. When we roll to v2.08, v2.07 and predecessors will get 
>> flattened so
>> that we have an easier review process.
>>
>>
>>>
>>>> Thanks for the partial review! I'm looking forward to resolving the
>>>> queries about and your next set of review comments.
>>>
>>> I will try to look at the ObjectMonitor changes themselves this 
>>> afternoon. So many non-trivial reviews/discussions at the moment, 
>>> plus my own work :)
>>
>> Thanks for spending so much time and energy on this code.
>>
>> Dan
>>
>>
>>>
>>> Thanks,
>>> David
>>> -----
>>>
>>>> Dan
>>>>
>>>>
>>>>>
>>>>> ---
>>>>>
>>>>>
>>>>>
>>>>>> Dan
>>>>>>
>>>>>>
>>>>>> On 10/17/19 5:50 PM, Daniel D. Daugherty wrote:
>>>>>>> Greetings,
>>>>>>>
>>>>>>> The Async Monitor Deflation project is reaching the end game. I 
>>>>>>> have no
>>>>>>> changes planned for the project at this time so all that is left 
>>>>>>> is code
>>>>>>> review and any changes that results from those reviews.
>>>>>>>
>>>>>>> Carsten and Roman! Time for you guys to chime in again on the 
>>>>>>> code reviews.
>>>>>>>
>>>>>>> I have attached the list of fixes from CR6 to CR7 instead of 
>>>>>>> putting it
>>>>>>> in the main body of this email.
>>>>>>>
>>>>>>> Main bug URL:
>>>>>>>
>>>>>>>     JDK-8153224 Monitor deflation prolong safepoints
>>>>>>>     https://bugs.openjdk.java.net/browse/JDK-8153224
>>>>>>>
>>>>>>> The project is currently baselined on jdk-14+19.
>>>>>>>
>>>>>>> Here's the full webrev URL for those folks that want to see all 
>>>>>>> of the
>>>>>>> current Async Monitor Deflation code in one go (v2.07 full):
>>>>>>>
>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/10-for-jdk14.v2.07.full 
>>>>>>>
>>>>>>>
>>>>>>> Some folks might want to see just what has changed since the 
>>>>>>> last review
>>>>>>> cycle so here's a webrev for that (v2.07 inc):
>>>>>>>
>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/10-for-jdk14.v2.07.inc/ 
>>>>>>>
>>>>>>>
>>>>>>> The OpenJDK wiki has been updated to match the 
>>>>>>> CR7/v2.07/10-for-jdk14 changes:
>>>>>>>
>>>>>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation 
>>>>>>>
>>>>>>>
>>>>>>> The jdk-14+18 based v2.07 version of the patch has been thru 
>>>>>>> Mach5 tier[1-8]
>>>>>>> testing on Oracle's usual set of platforms. It has also been 
>>>>>>> through my usual
>>>>>>> set of stress testing on Linux-X64, macOSX and Solaris-X64 with 
>>>>>>> the addition
>>>>>>> of Robbin's "MoCrazy 1024" test running in parallel with the 
>>>>>>> other tests in
>>>>>>> my lab.
>>>>>>>
>>>>>>> The jdk-14+19 based v2.07 version of the patch has been thru 
>>>>>>> Mach5 tier[1-3]
>>>>>>> test on Oracle's usual set of platforms. Mach5 tier[4-8] are in 
>>>>>>> process.
>>>>>>>
>>>>>>> I did another round of SPECjbb2015 testing in Oracle's Aurora 
>>>>>>> Performance lab
>>>>>>> using using their tuned SPECjbb2015 Linux-X64 G1 configs:
>>>>>>>
>>>>>>>     - "base" is jdk-14+18
>>>>>>>     - "v2.07" is the latest version and includes C2 
>>>>>>> inc_om_ref_count() support
>>>>>>>       on LP64 X64 and the new HandshakeAfterDeflateIdleMonitors 
>>>>>>> option
>>>>>>>     - "off" is with -XX:-AsyncDeflateIdleMonitors specified
>>>>>>>     - "handshake" is with -XX:+HandshakeAfterDeflateIdleMonitors 
>>>>>>> specified
>>>>>>>
>>>>>>>          hbIR           hbIR
>>>>>>>     (max attempted)  (settled)  max-jOPS critical-jOPS runtime
>>>>>>>     ---------------  ---------  -------- ------------- -------
>>>>>>>            34282.00   30635.90  28831.30 20969.20 3841.30 base
>>>>>>>            34282.00   30973.00  29345.80 21025.20 3964.10 v2.07
>>>>>>>            34282.00   31105.60  29174.30 21074.00 3931.30 
>>>>>>> v2.07_handshake
>>>>>>>            34282.00   30789.70  27151.60 19839.10 3850.20 v2.07_off
>>>>>>>
>>>>>>>     - The Aurora Perf comparison tool reports:
>>>>>>>
>>>>>>>         Comparison              max-jOPS critical-jOPS
>>>>>>>         ----------------------  -------------------- 
>>>>>>> --------------------
>>>>>>>         base vs 2.07            +1.78% (s, p=0.000) +0.27% (ns, 
>>>>>>> p=0.790)
>>>>>>>         base vs 2.07_handshake  +1.19% (s, p=0.007) +0.58% (ns, 
>>>>>>> p=0.536)
>>>>>>>         base vs 2.07_off        -5.83% (ns, p=0.394) -5.39% (ns, 
>>>>>>> p=0.347)
>>>>>>>
>>>>>>>         (s) - significant  (ns) - not-significant
>>>>>>>
>>>>>>>     - For historical comparison, the Aurora Perf comparision tool
>>>>>>>         reported for v2.06 with a baseline of jdk-13+31:
>>>>>>>
>>>>>>>         Comparison              max-jOPS critical-jOPS
>>>>>>>         ----------------------  -------------------- 
>>>>>>> --------------------
>>>>>>>         base vs 2.06            -0.32% (ns, p=0.345) +0.71% (ns, 
>>>>>>> p=0.646)
>>>>>>>         base vs 2.06_off        +0.49% (ns, p=0.292) -1.21% (ns, 
>>>>>>> p=0.481)
>>>>>>>
>>>>>>>         (s) - significant  (ns) - not-significant
>>>>>>>
>>>>>>> Thanks, in advance, for any questions, comments or suggestions.
>>>>>>>
>>>>>>> Dan
>>>>
>>>> <trimmed older review invites>
>>>>
>>