RFR: 8319048: Monitor deflation unlink phase prolongs time to safepoint [v8]
Daniel D. Daugherty
dcubed at openjdk.org
Mon Nov 20 22:36:15 UTC 2023
On Mon, 20 Nov 2023 11:05:55 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:
>> See the symptoms, reproducer, and analysis in the bug.
>>
>> There is a major problem in current unlinking code: we only check for safepoint every 1M monitors, which might take a while with large population of dead monitors, prolonging time to safepoint. Even if we spend 1ns per monitor, that's already +1ms in TTSP. In reality, we see double-digit-ms outliers in TTSP. (There is a secondary problem that comes with searching for new `prev` if monitor insertion happened while we were preparing the batch for unlinking; this might theoretically take the unbounded time.)
>>
>> This PR fixes the issue by providing a smaller batch size for unlinking. The unlinking batch size basically defines two things: a) how often do we check for safepoint (`ObjectSynchronizer::chk_for_block_req` in the method below); and b) how much overhead we have on mutating the monitor lists. If we unlink monitors one by one, then in a worst case, we would do a CAS on `head` and the atomic store for `OM._next` for every monitor, both of which are expensive if done per monitor.
>>
>> The experiments with the reproducer from the bug shows that the threshold of 500 works well: it mitigates TTSP outliers nearly completely, while still providing the large enough batch size to absorb list mutation overheads. See how bad outliers are in baseline, and how outliers get lower with lower batch, and almost completely disappear at 500. I believe the difference between baseline and `MUB=1M` is short-cut-ting the search for new `prev`.
>>
>> 
>>
>> Additional testing:
>> - [x] Linux AArch64 server fastdebug, `tier1 tier2 tier3`
>> - [x] Ad-hoc performance tests, see above
>
> Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 15 additional commits since the last revision:
>
> - No need for vm.flagless
> - Use precise deflated_count
> - Rework option handling
> - Merge branch 'master' into JDK-8319048-monitor-deflation-unlink
> - Merge branch 'master' into JDK-8319048-monitor-deflation-unlink
> - Merge branch 'master' into JDK-8319048-monitor-deflation-unlink
> - Move unlink_batch init to proper place
> - Add invariant checking
> - Merge branch 'master' into JDK-8319048-monitor-deflation-unlink
> - Fix test after recent test infra renames
> - ... and 5 more: https://git.openjdk.org/jdk/compare/dcf109ba...5c41e46e
Thumbs up in principle.
This will need merging with JDK-8320304 and then will need to be
re-reviewed.
It would also be good if this could be run thru Mach5 testing (at least
Tier[1-3] by someone at Oracle on your behalf.
Based on @dholmes-ora 's comments, it sounds like he is no longer
worried about chasing down potential perf anomalies that were seen
in the Oracle perf farm and we'll just deal with any fall out later. Is
that a correct interpretation?
src/hotspot/share/runtime/synchronizer.cpp line 122:
> 120: // walks searching for new prev during unlink under heavy list inserts.
> 121: break;
> 122: }
Is there a reason to make this check and bail out while executing the `do ... while`
and gathering the first batch of deflated monitors? Why can't you finish gathering
your first batch and let the head change logic below just take care of it?
src/hotspot/share/runtime/synchronizer.cpp line 127:
> 125: // Unlink the found batch.
> 126: if (prev == nullptr) {
> 127: // The batch is not preceded by another monitor. There is a chance the batch starts at head.
Perhaps:
// The current batch is the first batch so there is a chance that it starts at head.
src/hotspot/share/runtime/synchronizer.cpp line 135:
> 133: prev = n;
> 134: }
> 135: assert(prev != nullptr, "Should have found the batch head");
Perhaps:
"Should have found the prev for the current batch"
src/hotspot/share/runtime/synchronizer.cpp line 140:
> 138: } else {
> 139: // The batch is preceded by another monitor. This guarantees the current batch does not
> 140: // start at head. Remove the entire batch without updating the head.
Perhaps
// The current batch is preceded by another batch. This guarantees the current batch
// does not start at head. Unlink the entire current batch without updating the head.
-------------
Marked as reviewed by dcubed (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/16412#pullrequestreview-1740795041
PR Review Comment: https://git.openjdk.org/jdk/pull/16412#discussion_r1399815690
PR Review Comment: https://git.openjdk.org/jdk/pull/16412#discussion_r1399821393
PR Review Comment: https://git.openjdk.org/jdk/pull/16412#discussion_r1399822748
PR Review Comment: https://git.openjdk.org/jdk/pull/16412#discussion_r1399824367
More information about the hotspot-runtime-dev
mailing list