RFR: Tsan oop map table hash fix [v2]
Man Cao
manc at openjdk.org
Mon Aug 19 18:57:01 UTC 2024
On Sat, 17 Aug 2024 01:51:22 GMT, Jiangli Zhou <jiangli at openjdk.org> wrote:
>> Additional testing on JDK 21 with tsan found some crashes due to using `oop` `identity_hash` for computing the TsanOopMap entry hash. The crashes manifest as following two types:
>>
>> 1) `guarantee(moves_this_cycle) failed: "Impossible to reconcile GC"`
>>
>> 2) `ThreadSanitizer: CHECK failed: tsan_sync.cpp:261 "((*dst_meta)) == ((0))"`
>>
>> With `-Xlog:tsan=trace` output, I found the same `oop` is added to the map twice in some cases. The second time when an `oop` is added occurs during `InterpreterMacroAssembler::lock_object` path. Apparently a new identity hash is computed for the `oop` with a previous computed identity hash. That causes the `oop` being added again. Later when notifying tsan if the `oop` is moved, it triggers the above crashes depending on the memory layout.
>>
>> This fix computes `primitive_hash` using the oop address instead of `identity_hash` for entry hash. During GC, `TsanOopMapTable::collect_moved_objects_and_notify_freed` unlinks an entry from the map if the enclosing `oop` is moved. The moved entries are collected into an array and are inserted back to the map after iterating the map (within a worker thread). That's needed since the entry hash would be different after an `oop` is moved. The operations are protected by the `TsanOopMap_lock`.
>>
>> Since `primitive_hash` is now used, I also changed `TsanOopMapTableKey::get_hash` and `TsanOopMapTableKey::equals` to use `entry.object_no_keepalive()` instead of `entry._obj`. If `entry._obj` is outdated, it could result incorrect hash value for the current entry. So the updated oop address should be used instead. Running tsan jtreg tests on slowdebug binary run into new failures without the change.
>>
>> The change is updated to only call `TsanOopMap::notify_tsan_for_freed_and_moved_objects` from the last worker thread that completes the parallel WeakProcessor::Task work.
>>
>> Testing
>>
>> Tsan jtreg tests on linux-x64:
>>
>> $ bash configure --with-boot-jdk=/usr/local/google/home/jianglizhou/openjdk/jdk-21.0.1 --with-debug-level=release --disable-warnings-as-errors --with-jtreg=/usr/local/google/home/jianglizhou/github/jtreg/build/images/jtreg --with-stdc++lib=static --disable-precompiled-headers --enable-unlimited-crypto --with-native-debug-symbols=internal --with-default-make-target=jdk-image --disable-warnings-as-errors --with-toolchain-type=clang
>>
>>
>> ==============================
>> Test summary
>> ==============================
>> TEST ...
>
> Jiangli Zhou has updated the pull request incrementally with two additional commits since the last revision:
>
> - - Release memory for the malloced new TsanOopMapTableKey after adding back to the TsanOopMap. Thanks manc@ for
> noticing the memory leak.
> - Add TsanOopMapTableKey::TsanOopMapTableKey(const TsanOopMapTableKey& src, oop obj).
> - Change to call TsanOopMap::notify_tsan_for_freed_and_moved_objects from the last worker thread that completes
> the parallel WeakProcessor::Task work:
> - Add WeakProcessor::Task::_nworkers_completed to track the completed workers. This is only used if
> INCLUDE_TSAN is true.
> - Increment WeakProcessor::Task::_nworkers_completed in WeakProcessor::Task::work when each parallel worker
> finishes work.
>
> With the change, TsanOopMap::notify_tsan_for_freed_and_moved_objects is only called from one thread. This is
> to address the issue described in https://github.com/openjdk/tsan/pull/23#discussion_r1717793720.
Thanks. This looks pretty good, one last optimization feedback.
src/hotspot/share/gc/shared/weakProcessor.inline.hpp line 100:
> 98:
> 99: TSAN_RUNTIME_ONLY(
> 100: MutexLocker mu(TsanOopMap_lock, Mutex::_no_safepoint_check_flag);
I think we can completely get rid of this lock during GC, by switching to Atomic on `_nworkers_completed`:
TSAN_RUNTIME_ONLY(
uint completed = Atomic::add(&_nworkers_completed, 1);
assert(_nworkers >= completed, "must be");
if (_nworkers == completed) {
TsanOopMap::notify_tsan_for_freed_and_moved_objects();
}
)
We need to declared `_nworkers_completed` as `volatile uint` for using Atomic.
src/hotspot/share/tsan/tsanOopMapTable.cpp line 73:
> 71:
> 72: bool TsanOopMapTable::add_entry(TsanOopMapTableKey *entry, size_t size) {
> 73: assert(TsanOopMap_lock->is_locked(), "sanity check");
If we get rid of the locking in `WeakProcessor::Task::work()`, this assert could be `SafepointSynchronize::is_at_safepoint()` instead.
-------------
PR Review: https://git.openjdk.org/tsan/pull/23#pullrequestreview-2246218431
PR Review Comment: https://git.openjdk.org/tsan/pull/23#discussion_r1722201699
PR Review Comment: https://git.openjdk.org/tsan/pull/23#discussion_r1722211993
More information about the tsan-dev
mailing list