RFR: Tsan oop map table hash fix [v2]

Sat Aug 17 01:57:06 UTC 2024

On Thu, 15 Aug 2024 23:03:09 GMT, Jiangli Zhou <jiangli at openjdk.org> wrote:

>> src/hotspot/share/tsan/tsanOopMapTable.cpp line 76:
>> 
>>> 74:   bool added;
>>> 75:   size_t* v = _table.put_if_absent(*entry, size, &added);
>>> 76:   assert(added, "must be");
>> 
>>> Since primitive_hash is now used, I also changed TsanOopMapTableKey::get_hash and TsanOopMapTableKey::equals to use entry.object_no_keepalive() instead of entry._obj. If entry._obj is outdated, it could result incorrect hash value for the current entry. So the updated oop address should be used instead. 
>> 
>> I think the concurrent processing of `TsanOopMapTable` and another worker thread processing `OopStorageSet` to update TSAN's `WeakHandles` is a scary situation, when we start using `entry.object_no_keepalive()` or `WeakHandle::peek()`. `peek()` may return an updated or not-yet-updated value, which will likely cause hidden bugs.
>> 
>> I came up with the buggy scenario below:
>> Suppose object A was at 0x1200, object B was at 0x2300. During GC, A becomes dead, and GC moves B to A's location, so B is now at 0x1200.
>> Worker thread T1 processed part of the `OopStorageSet` and updated B's weak handle (wh_B).  The weak handle for A (wh_A) is not yet processed.
>> T1 starts to process `TsanOopMapTable`, and it will try adding the new B to the table, with `wh_B.peek()` returing 0x1200. It will fail to add this entry, because the new B looks identical to the entry for A, with `wh_A.peek()` still returning 0x1200.
>> Then object B will be lost from the `TsanOopMapTable`.
>> 
>> I think a better approach is ensure the `TsanOopMapTable` is only processed once after all of the `OopStorageSet` is updated.
>> One option is the alternative approach of inserting  `TsanOopMap::notify_tsan_for_freed_and_moved_objects()` to individual GCs as mentioned in https://github.com/openjdk/tsan/pull/19.
>> Another possibility is to develop a mechanism to check whether `OopStorageSet` has been fully processed, before doing actual work to update `TsanOopMapTable`.
>
> Thanks for the analysis. The scenario describes a potential rare case where an `oop` is moved to an address location containing a freed `oop` and tsan is notified about the moved `oop` before the freed `oop`. I agree that's a potential bug.
> 
> Doing `TsanOopMap::notify_tsan_for_freed_and_moved_objects()` to individual GCs has higher risks of not covering all different cases if GC changes. I like you suggestion of implementing "a mechanism to check whether OopStorageSet has been fully processed". Implementing within `OopStorageSet` could be complicated. I think it might be doable in `WeakProcessor::Task::work` to check if all tasks are done. I'm trying that approach.

See updated PR for changing to only call TsanOopMap::notify_tsan_for_freed_and_moved_objects() in the last worker that completes WeakProcessor::Task work.

-------------

PR Review Comment: https://git.openjdk.org/tsan/pull/23#discussion_r1720533267