RFR: 8319048: Monitor deflation unlink phase prolongs time to safepoint [v3]
Y. Srinivas Ramakrishna
ysr at openjdk.org
Wed Nov 8 09:05:03 UTC 2023
On Fri, 3 Nov 2023 13:08:23 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:
>> See the symptoms, reproducer, and analysis in the bug.
>>
>> This PR fixes the issue by providing a smaller batch size for unlinking. The unlinking batch size basically defines two things: a) how often we check for a safepoint (`ObjectSynchronizer::chk_for_block_req` in the method below); and b) how much overhead we incur mutating the monitor lists. If we unlink monitors one by one, then in the worst case we would do a CAS on `head` for every monitor, which is bound to be expensive.
>>
>> There is a major problem in the current code: we only check for a safepoint every 1M monitors, which might take a while. Even if we spend 1ns per monitor, that is already +1ms of TTSP. There is a secondary problem that comes with searching for a new `prev` if a monitor insertion happened while we were preparing the batch for unlinking.
>>
>> The experiments with the reproducer from the bug show that a threshold of 500 works well: it mitigates TTSP latencies nearly completely, while still providing a large enough batch size to absorb list mutation overheads. See how bad the outliers are in the baseline, how they get lower with a lower batch size, and how they almost completely disappear at 500. I believe the difference between the baseline and `MUB=1M` is short-cutting the search for a new `prev`.
>>
>> Additional testing:
>> - [x] Linux AArch64 server fastdebug, `tier1 tier2 tier3`
>> - [x] Ad-hoc performance tests, see above
>
> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision:
>
> Fix test after recent test infra renames
A small change might avoid the contended activity at the head of the list, where prepends happen. If the re-walk of the list for the first batch upon contention can be avoided, then the optimal batch size might change somewhat for the benchmark you used to tune its setting.
Otherwise this looks good.
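
For context, the batched unlink described in the quoted text has roughly this shape. This is a minimal standalone sketch with toy types and names (`Monitor`, `g_head`, and `check_for_safepoint` standing in for `ObjectSynchronizer::chk_for_block_req`), not the actual synchronizer.cpp code:

```cpp
#include <atomic>
#include <cstddef>

struct Monitor {
  std::atomic<Monitor*> next{nullptr};
  bool is_dead = false;                     // stand-in for "deflated, ready to unlink"
};

std::atomic<Monitor*> g_head{nullptr};      // other threads only prepend here
std::atomic<bool> g_safepoint_pending{false};

// Stand-in for ObjectSynchronizer::chk_for_block_req(); a real implementation
// would block here while a safepoint is in progress.
static void check_for_safepoint() {
  if (g_safepoint_pending.load(std::memory_order_acquire)) {
    // block until the safepoint operation completes
  }
}

// Unlink runs of dead monitors in bounded batches, polling for a safepoint
// between batches: a smaller batch bounds the time between polls (and thus
// TTSP), a larger batch amortizes the list mutations.
static size_t unlink_dead(size_t batch_size) {
  size_t unlinked = 0;
  Monitor* prev = nullptr;
  Monitor* cur  = g_head.load(std::memory_order_acquire);
  while (cur != nullptr) {
    // Gather up to batch_size consecutive dead monitors starting at cur.
    Monitor* batch_end = cur;
    size_t n = 0;
    while (batch_end != nullptr && batch_end->is_dead && n < batch_size) {
      batch_end = batch_end->next.load(std::memory_order_acquire);
      n++;
    }
    if (n > 0) {
      if (prev == nullptr) {
        // The front of the list races with concurrent prepends: try to CAS
        // the head past the batch; on failure, re-find the predecessor of
        // cur from the new head -- the re-walk mentioned below.
        Monitor* expected = cur;
        if (!g_head.compare_exchange_strong(expected, batch_end)) {
          Monitor* p = g_head.load(std::memory_order_acquire);
          while (p->next.load(std::memory_order_acquire) != cur) {
            p = p->next.load(std::memory_order_acquire);
          }
          p->next.store(batch_end, std::memory_order_release);
          prev = p;
        }
      } else {
        prev->next.store(batch_end, std::memory_order_release);
      }
      unlinked += n;
      cur = batch_end;
    } else {
      prev = cur;
      cur  = cur->next.load(std::memory_order_acquire);
    }
    check_for_safepoint();                  // bounded work between polls
  }
  return unlinked;
}
```

The first batch is the awkward one: its predecessor is the list head, which concurrent prepends keep moving, hence the CAS and the re-walk on failure. That is the part my inline comment below is about.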
src/hotspot/share/runtime/synchronizer.cpp line 152:
> 150: break;
> 151: }
> 152: if (prev == nullptr && Atomic::load(&_head) != m) {
If the only concurrent activity we are protecting against while processing the first batch is list inserts, why not leave the head intact and commence the batch at the second element in the list (i.e. at head->next_om())? That saves some of this effort to bound the walk for prev, since prev (i.e. the head) won't change while we walk the first (or any) batch. I realize one might then be left with one element in the list. Maybe that's not a big deal, especially if lots of inserts are happening (so that element won't remain at the head for long). When inserts never happen, maybe it isn't a big deal either? (Not sure.)
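
Roughly what I have in mind, as a sketch reusing the toy `Monitor`/`g_head`/`check_for_safepoint` definitions from the snippet earlier in this message (again illustrative only, not the actual HotSpot code; `next_om()` corresponds to the toy `next` field):

```cpp
// The node that is the head when the walk starts is never unlinked, so it is
// a stable 'prev' for the first batch: concurrent prepends only change
// g_head, never the next field of a node already on the list.
static size_t unlink_dead_keep_first(size_t batch_size) {
  size_t unlinked = 0;
  Monitor* anchor = g_head.load(std::memory_order_acquire);
  if (anchor == nullptr) {
    return 0;
  }
  Monitor* prev = anchor;                   // stays on the list for the whole walk
  Monitor* cur  = anchor->next.load(std::memory_order_acquire);
  while (cur != nullptr) {
    Monitor* batch_end = cur;
    size_t n = 0;
    while (batch_end != nullptr && batch_end->is_dead && n < batch_size) {
      batch_end = batch_end->next.load(std::memory_order_acquire);
      n++;
    }
    if (n > 0) {
      // No CAS on g_head and no re-walk: only this thread modifies the next
      // fields of nodes that are already on the list.
      prev->next.store(batch_end, std::memory_order_release);
      unlinked += n;
      cur = batch_end;
    } else {
      prev = cur;
      cur  = cur->next.load(std::memory_order_acquire);
    }
    check_for_safepoint();
  }
  // Cost: the node that was the head when we started is skipped even if it
  // is dead; it would have to be picked up by a later deflation cycle.
  return unlinked;
}
```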
-------------
PR Review: https://git.openjdk.org/jdk/pull/16412#pullrequestreview-1719705806
PR Review Comment: https://git.openjdk.org/jdk/pull/16412#discussion_r1386215324