RFR: Reimplement TsanOopMap support using WeakHandle and ResizeableResourceHashtable

Man Cao manc at openjdk.org
Tue Jul 23 01:56:57 UTC 2024


On Tue, 16 Jul 2024 00:37:09 GMT, Jiangli Zhou <jiangli at openjdk.org> wrote:

> The new implementation for TsanOopMap support in JDK 21 is partially modeled after [JvmtiTagMap](https://github.com/openjdk/jdk21u-dev/blob/cdbd94f8fa97847c02e1f0f4c6cd75f457a62e25/src/hotspot/share/prims/jvmtiTagMap.hpp#L37) and [JvmtiTagMapTable](https://github.com/openjdk/jdk21u-dev/blob/cdbd94f8fa97847c02e1f0f4c6cd75f457a62e25/src/hotspot/share/prims/jvmtiTagMapTable.hpp#L43):
> 
> - Use [WeakHandle](https://github.com/openjdk/jdk21u-dev/blob/cdbd94f8fa97847c02e1f0f4c6cd75f457a62e25/src/hotspot/share/oops/weakHandle.hpp#L42) for handling oops in the table entries;
> - Use ResizeableResourceHashtable to for the underlying hash table. Table growing/resizing and removal of freed object entries are mostly handled within ResizeableResourceHashtable, except some small parts are done within TsanOopMapTable. That makes the new implementation simpler.
> 
> The TsanOopMapTable contains entries consisting of TsanOopMapTableKey:oop_size (key:value) pairs. The TsanOopMapTableKey contains:
> 
>  - A Weakhandle that holds a pointer to the oop;
>  - The original (or updated) address of the object in Java heap;
> 
> For TsanOopMapTable support, I added a new weak OopStorage, which is created during tsan_init(). All WeakHandles used for tracking Tsan interested Java objects are created from that OopStorage. GC processes these WeakHandles in concurrent WeakProcessor worker threads.
> 
> To notify Tsan about object free/move in time, I added a call in WeakProcessor::Task::work to process the TsanOopMap:
> 
> - If an object is freed by GC, call __tsan_java_free to notify Tsan. Remove the entry from the table.
> - If an object is moved by GC, update the raw address in the entry using the new oop address obtained from the WeakHandle and add the “move” to a GrowableArray.
> - After all entries in TsanOopMap are processed, sort the “moves” in the GrowableArray and notify Tsan by calling __tsan_java_move for each object in the sorted array. This part is the existing implementation from JDK 11 for Tsan support.
> 
> Note all above operations occur during GC, before any of the mutators sees a moved or freed Java object.
> 
> As an alternative, I initially experimented with doing the above operations concurrently (to mutators) in ServiceThread after GC finished processing the WeakHandles. That could occur after mutators sees moved objects and resulted incorrect data being recorded in Tsan.
> 
> Suppressed following Tsan errors in JDK:
> 1)  In `java.util.concurrent.locks`, for
> 
> WARNING: ThreadSanitizer: data race (pid=787116)...

Thanks a lot for this heavy-lifting work!
I have not fully finished the review, mainly not yet for details in TsanOopMapTableKey. But here are some feedback.

In description: 
> For TsanOopMapTable support, I added a new weak OopStorage, which is created during tsan_init(). All WeakHandles used for tracking Tsan interested Java objects are created from that OopStorage. GC processes these WeakHandles in concurrent WeakProcessor worker threads.

Last sentence seems incorrect. These WeakHandles are currently processed by a subset of parallel GC worker threads. We probably should change it to be processed by the VM thread sequentially. See comment in weakProcessor.inline.hpp.

src/hotspot/share/gc/shared/weakProcessor.inline.hpp line 100:

> 98: 
> 99:   TSAN_RUNTIME_ONLY(
> 100:     TsanOopMap::notify_tsan_for_freed_and_moved_objects();

The `WeakProcessor::Task::work()` function is called by a subset of GC worker thread. This function will be called multiple times in parallel, but each thread runs the same code and processes the same data, so it is redundant work. It should probably be called only once, by the VM thread.

We can invoke `notify_tsan_for_freed_and_moved_objects()` from the VM thread by adding it to the two versions of `WeakProcessor::weak_oops_do` below:
- After line 142 `task.report_num_dead()`, in this file (weakProcessor.inline.hpp).
- In weakProcessor.cpp, between inside `WeakProcessor::weak_oops_do(BoolObjectClosure* is_alive, OopClosure* keep_alive)`.

We have another internal patch that adds a callback during a GC pause, at those two exact places. It is very similar to what TSAN needs to do.

src/hotspot/share/gc/shared/weakProcessor.inline.hpp line 158:

> 156:   WeakProcessorTimes times(nworkers);
> 157:   weak_oops_do(workers, is_alive, keep_alive, &times);
> 158: 

Nit: unnecessary newline.

src/hotspot/share/tsan/tsanOopMap.cpp line 232:

> 230:         if (can_move) {
> 231:           // Notify TSan, update occupied region.
> 232:           log_trace(tsan)("__tsan_java_move for [" PTR_FORMAT ", " PTR_FORMAT  "] -> [" PTR_FORMAT ", " PTR_FORMAT "]\n",

How about using `log_trace` only for this line, the line in `notify_tsan_for_freed_and_moved_objects` and the line above `__tsan_java_alloc`, and using `log_debug` for all other logging lines?
I suppose these 3 lines print a lot more messages than other logging lines.

src/hotspot/share/tsan/tsanOopMap.cpp line 240:

> 238:       assert(to > mem_begin_ && to <= mem_end_, "end address outside range");
> 239:       BitMap::idx_t idx_to = to_idx(to);
> 240:       return bitmap_.find_first_set_bit(to_idx(from), idx_to) == idx_to;

I have not looked into it deeply, but could you summarize the difference between `find_first_set_bit` and JDK 11's `get_next_one_offset`? I thought `find_first_set_bit` does the same thing.

src/hotspot/share/tsan/tsanOopMap.cpp line 263:

> 261: // OopStorage so the number matches with the 'weak_count' in oopStorageSet.hpp.
> 262: void TsanOopMap::initialize_map() {
> 263:   // No need to do register_num_dead_callback for concurrent work as we do

Perhaps we can document this more explicitly, perhaps in tsanOopMap.hpp, that currently TsanOopMap/OopStorage must be processed during a stop-the-world GC pause. This is because we need to update TSAN's metadata for all moved or freed objects during GC, before the mutator reads or writes to these objects. Thus, unlikely other weak OopStorage, TsanOopMap does not support concurrent processing, and does not need to call `OopStorage::register_num_dead_callback()`.

src/hotspot/share/tsan/tsanOopMap.cpp line 268:

> 266:   // WeakProcessor.
> 267:   _weak_oop_storage = OopStorageSet::create_weak("Tsan weak OopStorage", mtInternal);
> 268:   assert(_weak_oop_storage != NULL, "sanity");

Let's use `nullptr` instead of `NULL` in these touched lines. (I'm also OK if we replace all `NULL` with `nullptr` in these touched TSAN files).

src/hotspot/share/tsan/tsanOopMap.cpp line 286:

> 284: // Called during GC by WeakProcessor.
> 285: void TsanOopMap::notify_tsan_for_freed_and_moved_objects() {
> 286:   if (_oop_map == nullptr) {

Could this be an `assert(_oop_map != nullptr)`?
Also, do we want to keep the old `assert(SafepointSynchronize::is_at_safepoint())`?

src/hotspot/share/tsan/tsanOopMap.cpp line 302:

> 300: 
> 301:   {
> 302:     MutexLocker mu(TsanOopMap_lock, Mutex::_no_safepoint_check_flag);

Do we really need this lock if it is called during a GC pause by only the VM thread? Currently it is called by multiple GC threads, so the lock might appear necessary.

Also see the comment in the old `TsanOopMap::weak_oops_do`. We should probably keep that comment.

src/hotspot/share/tsan/tsanOopMap.cpp line 316:

> 314:     max_high = MAX2(source_high, target_high);
> 315: 
> 316:     moves.sort((2 * n_downward_moves > moves.length()) ?

Sorting is only needed before calling `handle_overlapping_moves` right? Then it should be in the else branch.

src/hotspot/share/tsan/tsanOopMap.cpp line 450:

> 448:     }
> 449: 
> 450:     // Source and target ranges overlap, the moves need to be ordered to prevent

Can we retain this comment paragraph?

src/hotspot/share/tsan/tsanOopMap.hpp line 46:

> 44: }  // namespace TsanOopMapImpl
> 45: 
> 46: #include "tsan/tsanOopMapTable.hpp"

How about moving `PendingMove` declaration to tsanOopMapTable.hpp, so this `#include` statement could be immediately after `#define SHARE_TSAN_TSANOOPMAP_HPP`?

-------------

Changes requested by manc (Reviewer).

PR Review: https://git.openjdk.org/tsan/pull/19#pullrequestreview-2192687934
PR Review Comment: https://git.openjdk.org/tsan/pull/19#discussion_r1687251963
PR Review Comment: https://git.openjdk.org/tsan/pull/19#discussion_r1687231901
PR Review Comment: https://git.openjdk.org/tsan/pull/19#discussion_r1687289253
PR Review Comment: https://git.openjdk.org/tsan/pull/19#discussion_r1687286079
PR Review Comment: https://git.openjdk.org/tsan/pull/19#discussion_r1687294251
PR Review Comment: https://git.openjdk.org/tsan/pull/19#discussion_r1687278527
PR Review Comment: https://git.openjdk.org/tsan/pull/19#discussion_r1687305054
PR Review Comment: https://git.openjdk.org/tsan/pull/19#discussion_r1687304054
PR Review Comment: https://git.openjdk.org/tsan/pull/19#discussion_r1687307604
PR Review Comment: https://git.openjdk.org/tsan/pull/19#discussion_r1687310668
PR Review Comment: https://git.openjdk.org/tsan/pull/19#discussion_r1687274453


More information about the tsan-dev mailing list