RFR: 8277486: NMT: Cleanup ThreadStackTracker
Thomas Stuefe
stuefe at openjdk.java.net
Wed Nov 24 08:54:14 UTC 2021
On Mon, 22 Nov 2021 15:36:41 GMT, Zhengyu Gu <zgu at openjdk.org> wrote:
> There are several issues with `ThreadStackTracker`.
>
> 1) Following assertion:
> `assert(MemTracker::tracking_level() >= NMT_summary, "Must be");`
> in `ThreadStackTracker::record/release_thread_stack()` is unreliable. The full fence after downgrading tracking level is *not* sufficient to avoid the racy.
>
> 2) NMT tracking level is downgraded without `ThreadCritical` lock held. But, it does require `ThreadCritical` lock to be held when it actually downgrade internal tracking data structure, so checking internal state is reliable to determine current tracking state.
> Add assertion to ensure correct tracking state
>
> 3) `_thread_counter` is updated with `ThreadCritical` lock, but is read without the lock. Change to atomic update to ensure reader will not read stale value.
>
> 4) NMT events are relative rare. So far, I have yet seen (1) assertion failure but add new test may help to spot such problem.
Changes requested by stuefe (Reviewer).
src/hotspot/share/services/memTracker.hpp line 270:
> 268: if (tracking_level() < NMT_summary) return;
> 269: if (addr != NULL) {
> 270: ThreadCritical tc;
Would this not require, to prevent the assert in (new|delete)_thread_stack, to repeat the tracking level check inside the ThreadCritical scope? Since the tracking level could have changed between lines 268 and 270?
src/hotspot/share/services/threadStackTracker.cpp line 103:
> 101: assert(MemTracker::tracking_level() < NMT_summary, "Must be");
> 102: return false;
> 103: }
I looked at `MemTracker::transition_to`, the downgrade path:
} else if (current_level > level) {
// Downgrade tracking level, we want to lower the tracking level first
_tracking_level = level;
// Make _tracking_level visible immediately.
OrderAccess::fence();
VirtualMemoryTracker::transition(current_level, level);
MallocTracker::transition(current_level, level);
ThreadStackTracker::transition(current_level, level);
Is the comment about "immediately" correct? I would have assumed OrderAccess::fence() has not have anything to do with timing, just with the order the writes are perceptible from outside?
So, concurrent threads may observe tracking level == minimal but the pointers to the infrastructure are still there?
I see that you do not guard `MemTracker::transition_to` with ThreadCritical. You do guard teardown inside `ThreadStackTracker::transition` and `VirtualMemoryTracker::transition` but it looks like teardown in `MallocTracker::transition` is unguarded by ThreadCritical at least.
Just a proposal, would guarding upstairs in `MemTracker::transition_to` not be clearer and safer, especially if it includes changing the tracking level?
Alternatively, do we even have to delete all those structures on shutdown? They are not that costly. We shut down either manually - nobody cares then - or if we encounter native OOM. In the latter case, those detail structures are arguably important since they contain data that may help analyze the OOM. The malloc site table, for instance.
src/hotspot/share/services/threadStackTracker.hpp line 82:
> 80: static void delete_thread_stack(void* base, size_t size);
> 81:
> 82: static bool track_as_vm() { return AIX_ONLY(false) NOT_AIX(true); }
Unrelated to your patch: I wonder whether this could be controlled as a diagnostic switch, to be able to test both implementations on normal platforms.
src/hotspot/share/services/virtualMemoryTracker.cpp line 673:
> 671: return true;
> 672: } else {
> 673: assert(MemTracker::tracking_level() < NMT_summary, "Must be");
Is this safe? We don't guard downgrading the tracking level with ThreadCritical.
-------------
PR: https://git.openjdk.java.net/jdk/pull/6504
More information about the hotspot-runtime-dev
mailing list