RFR: 8253064: monitor list simplifications and getting rid of TSM [v2]
David Holmes
david.holmes at oracle.com
Mon Nov 9 23:09:59 UTC 2020
Hi Dan,
I'm going to top-post just to keep this short :)
Thanks for all the details below on the performance aspects and use of
the heuristics - all good.
Regarding "ceiling" versus "watermark" ... a "high watermark" is a
ceiling so its really a matter of personal preference what terminology
you want to use.
Thanks again,
David
On 10/11/2020 6:16 am, Daniel D. Daugherty wrote:
> Hi David,
>
> I'm going to try replying to this review comment via email to see how that
> works out...
>
> On 11/9/20 1:06 AM, David Holmes wrote:
>> Hi Dan,
>>
>> On 9/11/2020 1:50 pm, Daniel D.Daugherty wrote:
>>> On Sun, 8 Nov 2020 21:43:00 GMT, David Holmes <dholmes at openjdk.org>
>>> wrote:
>>>
>>>>> How about this:
>>>>> static MonitorList _in_use_list;
>>>>> // The ratio of the current _in_use_list count to the ceiling is
>>>>> used
>>>>> // to determine if we are above MonitorUsedDeflationThreshold
>>>>> and need
>>>>> // to do an async monitor deflation cycle. The ceiling is
>>>>> increased by
>>>>> // AvgMonitorsPerThreadEstimate when a thread is added to the
>>>>> system
>>>>> // and is decreased by AvgMonitorsPerThreadEstimate when a
>>>>> thread is
>>>>> // removed from the system.
>>>>> // Note: If the _in_use_list max exceeds the ceiling, then
>>>>> // monitors_used_above_threshold() will use the in_use_list max
>>>>> instead
>>>>> // of the thread count derived ceiling because we have used more
>>>>> // ObjectMonitors than the estimated average.
>>>>> static jint _in_use_list_ceiling;
>>>>
>>>> Thanks for the comment. So instead of checking the threshhold on
>>>> each OM allocation we use this averaging technique to estimate the
>>>> number of monitors in use? Can you explain how this came about
>>>> rather than the simple/obvious check at allocation time. Thanks.
>>>
>>> I'm not sure I understand your question, but let me that a stab at it
>>> anyway...
>>>
>>> We used to compare the sum of the in-use counts from all the in-use
>>> lists
>>> with the total population of ObjectMonitors. If that ratio was higher
>>> than
>>> MonitorUsedDeflationThreshold, then we would do an async deflation
>>> cycle.
>>> Since we got rid of TSM, we no longer had a population of already
>>> allocated
>>> ObjectMonitors, we had a max value instead. However, when the VMs use
>>> of ObjectMonitors is first spinning up, the max value is typically
>>> very close
>>> to the in-use count so we would always be asking for an async-deflation
>>> during that spinning up phase.
>>>
>>> I created the idea of a ceiling value that is tied to thread count
>>> and the
>>> AvgMonitorsPerThreadEstimate to replace the population value that we
>>> used to have. By comparing the in-use count against the ceiling
>>> value, we
>>> no longer exceed the MonitorUsedDeflationThreshold when the VMs use
>>> of ObjectMonitors is first spinning up so we no longer do async
>>> deflations
>>> continuously during that phase. If the max value exceeds the ceiling
>>> value,
>>> then we're using a LOT of ObjectMonitors and, in that case, we compare
>>> the in-use count against the max to determine if we're exceeding the
>>> MonitorUsedDeflationThreshold.
>>>
>>> Does this help?
>>
>> It helps but I'm still wrestling with what
>> MonitorUsedDeflationThreshold actually means now.
>>
>> So the existing MonitorUsedDeflationThreshold is used as a measure of
>> the proportion of monitors actually in-use compared to the number of
>> monitors pre-allocated.
>
> Slight correction: not a measure of proportion, but a threshold on the
> proportion.
>
>
>> If an inflation request requires a new block to be allocated and we're
>> above MonitorUsedDeflationThreshold % then a request for async
>> deflation occurs (when we actually check).
>
> Not quite. We did have code in om_alloc() that could cause a deflation
> cycle to be invoked, but that code was not enabled by default and was
> removed. See the fix for:
>
> JDK-8230940 Obsolete MonitorBound
> https://bugs.openjdk.java.net/browse/JDK-8230940
>
> which was pushed in jdk-15-B22.
>
> In the current baseline, ObjectSynchronizer::is_async_deflation_needed()
> is used to ask if we need to perform an async deflation cycle. That
> function calls monitors_used_above_threshold() which is where the logic
> that uses the MonitorUsedDeflationThreshold value comes into play.
>
>
>> The new code, IIUC, says, lets assume we expect
>> AvgMonitorsPerThreadEstimate monitors-per-thread. If the number of
>> monitors in-use is > MonitorUsedDeflationThreshold % of
>> (AvgMonitorsPerThreadEstimate * number_of_threads), then we request
>> async deflation.
>
> You understand correctly.
>
>
>> So ... obviously we need some kind of watermark based system for
>> requesting deflation otherwise there will be far too many deflation
>> requests.
>
> Yes. That's what I saw before I added AvgMonitorsPerThreadEstimate and
> the ceiling. Although, I think I like the phrase "watermark" better!
>
> Should I rename this new ceiling concept to watermark??
>
>
>> And we also don't want to have check for exceeding the threshold on
>> every monitor allocation. So the deflation thread will wakeup
>> periodically and check if the threshold is exceeded.
>
> Exactly. GuaranteedSafepointInterval (default 1sec) controls how often
> the MonitorDeflationThread will wake up on its own to check for work.
> AsyncDeflationInterval (default 250ms) controls at most how often
> the MonitorDeflationThread will actually do the work.
>
> However, if an async deflation cycle is directly requested, then it
> will be honored regardless of the limits. At this point, only the
> WhiteBox support uses that feature. Although we do have an RFE from
> you to restore that feature to System.gc():
>
> JDK-8249638 Re-instate idle monitor deflation as part of System.gc()
> https://bugs.openjdk.java.net/browse/JDK-8249638
>
>
>> Okay ... so then it comes down to deciding whether
>> AvgMonitorsPerThreadEstimate is the best way to establish the
>> watermark and what the default value should be. This doesn't seem like
>> something that an application developer could reasonably try to
>> estimate so it is just going to be a tuning knob they adjust somewhat
>> arbitrarily.
>
> I actually don't expect anyone to care about AvgMonitorsPerThreadEstimate,
> but I've created it just-in-case.
>
>
>> I assume the 1024 default came from tuning something?
>
> The default 1024 value is the closest I can come to the baseline VM
> behavior where we allowed a per-thread free-list to allocate at most
> 1024 ObjectMonitors in one attempt. Since the per-thread free list is
> the fastest code path for an ObjectMonitor allocation in the baseline
> VM, I thought using that value made for a reasonable default for the
> AvgMonitorsPerThreadEstimate.
>
> However, we don't pre-allocate in the new code so a thread doesn't
> actually have (at most) 1024 ObjectMonitors waiting on the per-thread
> free list for fast allocation.
>
>
>> Have you looked at the affect on memory use these changes have (ie
>> peak RSS use)?
>
> Actually what I've been looking at is population values in the baseline
> VM and the max in the new code with Kitchensink8H runs. I don't have
> the absolutely latest values, but, IIRC, the new code has a much smaller
> max value than the baseline population value.
>
> Obviously this just tells me the raw numbers of ObjectMonitors and doesn't
> include any "hidden" overhead that would be captured by RSS values, but I
> think it gives me a reasonable feel for memory utilization.
>
>
>> Did your performance measurements look at using different values?
>
> No. All my performance runs used default values. I tend to focus on
> out-of-the-box settings unless I'm looking to justify changing some
> default value.
>
>
>> (I can imagine that with enough memory we can effectively disable
>> deflation and so potentially increase performance. OTOH maybe
>> deflation is so infrequent it is a non-issue.)
>>
>> I have to confess that I never really thought about the old set of
>> heuristics for this, but the fact we're changing the heuristics does
>> raise a concern about what impact applications may see.
>
> None of the performance testing that we've done so far has raised any
> concerns about the new heuristics.
>
> The new ObjectMonitor inflation mechanism is much, much faster than
> the old mechanism. During my Inflate2 stress testing, the baseline VM
> would peak at a population of about 12 million at 4.5 hours into an 8
> hour run with release bits. With the new code, Inflate2 would reach a
> max of 400+ million at about an hour with no signs that was the actual
> peak when I stopped the run.
>
>
>> BTW MonitorUsedDeflationThreshold should really be diagnostic not
>> experimental, as real applications may need to tune it (and people
>> often don't want to use experimental flags in production as a matter
>> of policy).
>
> MonitorUsedDeflationThreshold wasn't added with this project. It was
> added by Robbin using this bug ID:
>
> JDK-8181859 Monitor deflation is not checked in cleanup path
> https://bugs.openjdk.java.net/browse/JDK-8181859
>
> way back in jdk-10-B21... I don't know the reason that Robbin created
> the option as experimental rather than diagnostic, but I can investigate.
>
> Thanks again for the review!
>
>
> At this point, I don't see anything that I plan to change in response
> to this set of comments. I do have a query up above about renaming the
> ceiling concept to watermark. Please let me know what you think.
>
> Dan
>
>>
>> Thanks,
>> David
>> -----
>>
>>> -------------
>>>
>>> PR: https://git.openjdk.java.net/jdk/pull/642
>>>
>
More information about the hotspot-dev
mailing list