RFR: 8181859: Monitor deflation is not checked in cleanup path

Tue Jun 13 12:53:20 UTC 2017

Hi all, please review,

Today cleanup is only triggered by IC buffers that need to be finalized.
This cleanup check is done every GuaranteedSafepointInterval (default 1s).
If the cleanup check returns false there will be no safepoint, so the option name, GuaranteedSafepointInterval, is misleading.
This makes the time between potential safepoints much longer after the compiler stabilizes
and can have a big negative effect on the number of monitors, and so increases latency.
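
To illustrate the effect described above, here is a minimal toy model, not HotSpot code. It assumes the only source of safepoints is the periodic cleanup check (GC and other safepoints are ignored), and the function name and input list are hypothetical.

```python
def longest_safepoint_gap_ms(cleanup_needed, interval_ms=1000):
    """Longest time without a cleanup safepoint in a toy model.

    cleanup_needed[i] is the (hypothetical) result of the cleanup check at
    the end of interval i. A safepoint is taken only when it is True.
    interval_ms plays the role of GuaranteedSafepointInterval.
    """
    longest = 0
    since_last = 0
    for needed in cleanup_needed:
        since_last += interval_ms
        if needed:
            # The check fired: a safepoint happens and the gap ends here.
            longest = max(longest, since_last)
            since_last = 0
    # A trailing run with no safepoint counts as a gap as well.
    return max(longest, since_last)


# IC buffers need finalizing during warmup, then the compiler stabilizes.
warmup_then_stable = [True, True, True] + [False] * 10
print(longest_safepoint_gap_ms(warmup_then_stable))  # 10000
```

In this model the gap grows without bound once the check stops returning true, which is the situation where monitors can pile up.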

This patch adds a check to ObjectSynchronizer for whether there are potentially many monitors to deflate, and if so triggers a safepoint.
It also adds a new (in this patch experimental) option, MonitorUsedDeflationThreshold,
which is a threshold on the percentage of monitors used in the total population.
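
A minimal sketch of what such a check could look like, written in Python for brevity rather than as the patch's actual C++ code. The function and parameter names are hypothetical, and two details are assumptions: that "used" means population minus free monitors, and that the comparison is strictly greater than. Treating 0 as "off" follows the benchmark description in this mail.

```python
def monitors_used_above_threshold(population, free_count, threshold_percent):
    """Return True if the share of used monitors exceeds the threshold.

    population        -- total number of monitors (hypothetical input)
    free_count        -- monitors currently on a free list (assumption)
    threshold_percent -- stands in for MonitorUsedDeflationThreshold; 0 = off
    """
    if threshold_percent <= 0 or population <= 0:
        # Check disabled, or nothing allocated yet.
        return False
    used = population - free_count
    usage_percent = used * 100 // population  # integer percentage
    return usage_percent > threshold_percent


print(monitors_used_above_threshold(1000, 50, 90))   # True  (95% used)
print(monitors_used_above_threshold(1000, 500, 90))  # False (50% used)
```

A result of True would be one more reason, next to the IC buffer check, for the periodic cleanup check to request a safepoint.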

The monitor population is today controlled by MonitorBound, the arbitrary safepoints of the selected GC, and
the compiler IC buffer check every GuaranteedSafepointInterval.

After this patch the above is still true, but MonitorUsedDeflationThreshold, checked every GuaranteedSafepointInterval,
also greatly affects the monitor population, as can be seen below.

The default value for MonitorUsedDeflationThreshold gives you ~2-3x of monitors used under GuaranteedSafepointInterval.
This turns out to be a very reasonable value for most cases.

nosql benchmark, MonitorUsedDeflationThreshold 0 (off) vs 90 (vs 20)
Monitor population 132334 -> 63627 (28448)
Total time in safepoint 6.52109 -> 5.74264 (5.58456)
Number of safepoints increases by ~30% (~100%) on default GuaranteedSafepointInterval (1000ms)
Worst cleanup deflation 120 ms -> 35 ms (30 ms)
Throughput same
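
For readers who prefer relative numbers, the figures above can be turned into percentage reductions with a few lines of arithmetic. This is only a convenience sketch: the helper name is hypothetical and the inputs are copied from the nosql results in this mail.

```python
def percent_reduction(before, after):
    """Relative reduction from before to after, in percent."""
    return 100.0 * (before - after) / before


# (off, threshold 90, threshold 20), taken from the nosql numbers above.
results = {
    "monitor population": (132334, 63627, 28448),
    "total time in safepoint": (6.52109, 5.74264, 5.58456),
    "worst cleanup deflation (ms)": (120, 35, 30),
}

for name, (off, at_90, at_20) in results.items():
    print(f"{name}: -{percent_reduction(off, at_90):.1f}% at 90, "
          f"-{percent_reduction(off, at_20):.1f}% at 20")
```

With these inputs the monitor population drops by roughly 52% at threshold 90 and roughly 79% at threshold 20.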

SpecJBB2015 linux x64, critical jops +2-10%

In a special nosql benchmark with very low threshold:
Worst single threaded ObjectSynchronizer::oops_do goes down from ~15ms (avg: ~2.6ms) to ~4ms (avg: ~0.7ms)
Worst single threaded deflation cleanup goes down from ~40ms (avg: ~3.3ms) to ~10ms (avg: ~2ms)

On a very large machine, e.g. SPARC M7, the overhead of safepointing is very large, could be up to ~40ms.
The default value has a negative impact on such a big machine, specjbb2015 ~ -2%.

Here I suggest the default value should be 90, which seems to have no negative effects on an average Linux x64 server class machine.
A smaller machine, and thus one with lower safepoint overhead, should also gain from this default value.

I do not see any conflict with the proposed "JEP Draft: Concurrent Monitor Deflation" by Carsten and Roman.
The same check should in that case start the concurrent deflation.

Bug: https://bugs.openjdk.java.net/browse/JDK-8181859
Patch: http://cr.openjdk.java.net/~rehn/8181859/webrev/
JEP: http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2017-June/023654.html

Thanks!

/Robbin