RFR: 8319048: Monitor deflation unlink phase prolongs time to safepoint [v7]

Axel Boldt-Christmas aboldtch at openjdk.org
Fri Nov 17 12:53:35 UTC 2023


On Thu, 16 Nov 2023 20:07:51 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

>> See the symptoms, reproducer, and analysis in the bug.
>> 
>> There is a major problem in current unlinking code: we only check for safepoint every 1M monitors, which might take a while with large population of dead monitors, prolonging time to safepoint. Even if we spend 1ns per monitor, that's already +1ms in TTSP. In reality, we see double-digit-ms outliers in TTSP. (There is a secondary problem that comes with searching for new `prev` if monitor insertion happened while we were preparing the batch for unlinking; this might theoretically take the unbounded time.)
>> 
>> This PR fixes the issue by providing a smaller batch size for unlinking. The unlinking batch size basically defines two things: a) how often do we check for safepoint (`ObjectSynchronizer::chk_for_block_req` in the method below); and b) how much overhead we have on mutating the monitor lists. If we unlink monitors one by one, then in a worst case, we would do a CAS on `head` and the atomic store for `OM._next` for every monitor, both of which are expensive if done per monitor.
>> 
>> The experiments with the reproducer from the bug shows that the threshold of 500 works well: it mitigates TTSP outliers nearly completely, while still providing the large enough batch size to absorb list mutation overheads. See how bad outliers are in baseline, and how outliers get lower with lower batch, and almost completely disappear at 500. I believe the difference between baseline and `MUB=1M` is short-cut-ting the search for new `prev`.
>> 
>> ![plot-monitor-unlink-1](https://github.com/openjdk/jdk/assets/1858943/21b22029-6802-44a6-aea5-52b7fd124717)
>> 
>> Additional testing:
>>  - [x] Linux AArch64 server fastdebug, `tier1 tier2 tier3`
>>  - [x] Ad-hoc performance tests, see above
>
> Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 11 additional commits since the last revision:
> 
>  - Merge branch 'master' into JDK-8319048-monitor-deflation-unlink
>  - Merge branch 'master' into JDK-8319048-monitor-deflation-unlink
>  - Move unlink_batch init to proper place
>  - Add invariant checking
>  - Merge branch 'master' into JDK-8319048-monitor-deflation-unlink
>  - Fix test after recent test infra renames
>  - Merge branch 'master' into JDK-8319048-monitor-deflation-unlink
>  - Batch 500
>  - Pre-final touchups
>  - Option range and tests
>  - ... and 1 more: https://git.openjdk.org/jdk/compare/beb75f42...09863092

Just had some questions / comments.

src/hotspot/share/runtime/globals.hpp line 738:

> 736:   product(intx, MonitorUnlinkBatch, 500, DIAGNOSTIC,                        \
> 737:           "The maximum number of monitors to unlink in one batch.")         \
> 738:           constraint(MonitorUnlinkBatchConstraintFunc, AfterErgo)           \

Could this have a `range(1, max_jint)` as well? 
Should it mention MonitorDeflationMax? Or does it even need the constrained by MonitorDeflationMax. While it seems useless to have a value greater than MonitorDeflationMax, it does not seem invalid.

src/hotspot/share/runtime/synchronizer.cpp line 108:

> 106:         unlinked_list->append(next);
> 107:         next = next_next;
> 108:         if (unlink_batch++ >= (size_t)MonitorUnlinkBatch) {

Previously we `break` on the `MonitorDeflationMax` which is a implicit limit of the number of deflations. Not as `MonitorUnlinkBatch` could be `MonitorDeflationMax-1` then we would could walk more links than needed. 

Overall I am unsure why we do not pass in deflated_count to unlink and break when the `unlink_count == deflated_count`. Maybe I am missing something and there can be more deflated monitors than  the deflation_count we get. If so it seems like both this and the previous implementation would miss unlinking deflated monitors.

src/hotspot/share/runtime/synchronizer.cpp line 131:

> 129:             prev = n;
> 130:           }
> 131:           assert(prev != nullptr, "Should have found the batch head");

Will this ever be reached? Seems like we would crash when reading the `_next_om` field in n. Or could `m` ever be `nullptr` here.

-------------

PR Review: https://git.openjdk.org/jdk/pull/16412#pullrequestreview-1736868325
PR Review Comment: https://git.openjdk.org/jdk/pull/16412#discussion_r1397231202
PR Review Comment: https://git.openjdk.org/jdk/pull/16412#discussion_r1397245901
PR Review Comment: https://git.openjdk.org/jdk/pull/16412#discussion_r1397236332


More information about the hotspot-runtime-dev mailing list