RFR: Tsan oop map table hash fix [v2]
Jiangli Zhou
jiangli at openjdk.org
Sat Aug 17 01:57:06 UTC 2024
On Thu, 15 Aug 2024 23:03:09 GMT, Jiangli Zhou <jiangli at openjdk.org> wrote:
>> src/hotspot/share/tsan/tsanOopMapTable.cpp line 76:
>>
>>> 74: bool added;
>>> 75: size_t* v = _table.put_if_absent(*entry, size, &added);
>>> 76: assert(added, "must be");
>>
>>> Since primitive_hash is now used, I also changed TsanOopMapTableKey::get_hash and TsanOopMapTableKey::equals to use entry.object_no_keepalive() instead of entry._obj. If entry._obj is outdated, it could result incorrect hash value for the current entry. So the updated oop address should be used instead.
>>
>> I think the concurrent processing of `TsanOopMapTable` and another worker thread processing `OopStorageSet` to update TSAN's `WeakHandles` is a scary situation, when we start using `entry.object_no_keepalive()` or `WeakHandle::peek()`. `peek()` may return an updated or not-yet-updated value, which will likely cause hidden bugs.
>>
>> I came up with the buggy scenario below:
>> Suppose object A was at 0x1200, object B was at 0x2300. During GC, A becomes dead, and GC moves B to A's location, so B is now at 0x1200.
>> Worker thread T1 processed part of the `OopStorageSet` and updated B's weak handle (wh_B). The weak handle for A (wh_A) is not yet processed.
>> T1 starts to process `TsanOopMapTable`, and it will try adding the new B to the table, with `wh_B.peek()` returing 0x1200. It will fail to add this entry, because the new B looks identical to the entry for A, with `wh_A.peek()` still returning 0x1200.
>> Then object B will be lost from the `TsanOopMapTable`.
>>
>> I think a better approach is ensure the `TsanOopMapTable` is only processed once after all of the `OopStorageSet` is updated.
>> One option is the alternative approach of inserting `TsanOopMap::notify_tsan_for_freed_and_moved_objects()` to individual GCs as mentioned in https://github.com/openjdk/tsan/pull/19.
>> Another possibility is to develop a mechanism to check whether `OopStorageSet` has been fully processed, before doing actual work to update `TsanOopMapTable`.
>
> Thanks for the analysis. The scenario describes a potential rare case where an `oop` is moved to an address location containing a freed `oop` and tsan is notified about the moved `oop` before the freed `oop`. I agree that's a potential bug.
>
> Doing `TsanOopMap::notify_tsan_for_freed_and_moved_objects()` to individual GCs has higher risks of not covering all different cases if GC changes. I like you suggestion of implementing "a mechanism to check whether OopStorageSet has been fully processed". Implementing within `OopStorageSet` could be complicated. I think it might be doable in `WeakProcessor::Task::work` to check if all tasks are done. I'm trying that approach.
See updated PR for changing to only call TsanOopMap::notify_tsan_for_freed_and_moved_objects() in the last worker that completes WeakProcessor::Task work.
-------------
PR Review Comment: https://git.openjdk.org/tsan/pull/23#discussion_r1720533267
More information about the tsan-dev
mailing list