RFR: 8277486: NMT: Cleanup ThreadStackTracker

Zhengyu Gu zgu at openjdk.java.net
Mon Nov 29 16:08:06 UTC 2021


On Mon, 22 Nov 2021 15:36:41 GMT, Zhengyu Gu <zgu at openjdk.org> wrote:

> There are several issues with `ThreadStackTracker`.
> 
> 1) Following assertion:
>   `assert(MemTracker::tracking_level() >= NMT_summary, "Must be");`
> in `ThreadStackTracker::record/release_thread_stack()` is unreliable. The full fence after downgrading the tracking level is *not* sufficient to avoid the race.
> 
> 2) The NMT tracking level is downgraded without the `ThreadCritical` lock held. However, the lock must be held when NMT actually downgrades its internal tracking data structures, so checking that internal state is a reliable way to determine the current tracking state.
> Add an assertion to ensure the tracking state is correct.
> 
> 3) `_thread_counter` is updated with the `ThreadCritical` lock held, but is read without the lock. Change it to an atomic update to ensure a reader will not read a stale value.
> 
> 4) NMT events are relatively rare. So far, I have yet to see the assertion failure in (1), but adding a new test may help to spot such problems.
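
Point 3 above can be illustrated with a minimal standalone sketch. It uses `std::atomic` from standard C++ rather than HotSpot's own `Atomic` API, and the class name `ThreadStackCounterSketch` and its methods are hypothetical stand-ins, not the code in the PR. It shows only the idea: once the counter is updated atomically, a reader that does not hold the lock still sees a well-defined value.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the counter part of ThreadStackTracker.
// This is not HotSpot code; it only illustrates an atomically updated counter.
class ThreadStackCounterSketch {
  std::atomic<std::size_t> _thread_counter{0};

public:
  // Called when a thread stack is recorded (writers may also hold a lock).
  void record_thread_stack() {
    _thread_counter.fetch_add(1, std::memory_order_relaxed);
  }

  // Called when a thread stack is released.
  void release_thread_stack() {
    std::size_t prev = _thread_counter.fetch_sub(1, std::memory_order_relaxed);
    assert(prev > 0 && "counter underflow");
    (void)prev;  // avoid an unused-variable warning when asserts are disabled
  }

  // Safe to call without any lock: the load is atomic.
  std::size_t thread_count() const {
    return _thread_counter.load(std::memory_order_relaxed);
  }
};
```

Relaxed ordering is used here on the assumption that the counter is only reported and does not publish other data; a stricter ordering would be needed if readers relied on it for synchronization.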

> Hi Zhengyu,
> 
> Could we not simply remove the shutdown logic from NMT altogether? I think it's not needed and makes things more complex than necessary.
> 
> We use shutdown in two cases:
> 
> 1. When the user commands us to, with jcmd. As I argued before, I think this has no real use in practice. Why would you do that? In fact, I have used NMT a lot, for many years, and have never used this command.
> 2. When NMT shuts down as a reaction to either a MallocSiteTable allocation failure or an MST overflow. Both are very rare, almost impossibly so (see below). In both cases I argue that shutdown does not help, makes things complicated, and actually hinders analyzing the OOM afterward:
>    2.1) Failing to malloc a node for the MST: If that happens the C-heap is exhausted. Shutting down NMT will not affect the outcome. NMT is very frugal. So we could just ignore the failing allocation and go on with our life: either C-heap recovers (highly unlikely) or not. If it recovers, no harm done, NMT continues working. If it does not recover, the VM will get a native OOM in the immediate future. But in the OOM case, NMT report is very useful. I don't want NMT to shut down and delete the MST. I want to be able to see all my malloc sites.
>    2.2) MST overflow: for that to happen the bucket length of _one individual bucket chain_ has to reach its limit. After https://bugs.openjdk.java.net/browse/JDK-8275320, that limit is 64k nodes for both 32-bit and 64-bit platforms. This plainly cannot happen. But even if it were to happen, we would have a problem, since MST lookup would become extremely slow because of the O(n) traversal on every malloc/free.
> 
> Note how rare Point (2) is: It depends on MST fill size, which depends on NMT stack depth. With the default stack depth of 4, atm we never even reach ~1000 individual call sites. I once did an experiment with stack depth=16 and reached about 10000 call sites. For (2.1), the chance of one of the few MST node mallocs to run into OOM is astronomically low. Especially since if MST is that full, we will see tons of user mallocs, and one of those will certainly hit OOM first. And (2.2) cannot happen at all: even with an MST size of 1, we could accommodate 65k entries, a lot more than even the 10k I reached with stack depth = 16. But of course, MST size is not 1. In fact, I think we should even assert against MST overflow, not handle it gracefully.
> 
> So, let us remove shutdown. It would simplify NMT quite a bit, remove quite a bit of code, testing too. And the tracking level would be constant throughout VM life, making concurrency much simpler.
> 
> Cheers, Thomas
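
The bucket-chain argument in (2.2) can be made concrete with a small sketch. The class `BucketChainSketch` below is hypothetical and is not the MallocSiteTable implementation; the default limit of `64 * 1024` is an assumed reading of the "64k nodes" figure quoted above. It shows why a long chain is slow (every lookup walks the chain linearly) and where an overflow would be detected.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical single bucket chain with a length limit; not HotSpot code.
class BucketChainSketch {
  struct Node {
    std::uint64_t key;
    Node* next;
  };

  Node* _head = nullptr;
  std::size_t _length = 0;
  std::size_t _max_length;

public:
  // Assumption: default limit mirrors the "64k nodes" mentioned in the thread.
  explicit BucketChainSketch(std::size_t max_length = 64 * 1024)
      : _max_length(max_length) {}

  // Free nodes iteratively so a long chain cannot overflow the call stack.
  ~BucketChainSketch() {
    while (_head != nullptr) {
      Node* n = _head;
      _head = n->next;
      delete n;
    }
  }

  BucketChainSketch(const BucketChainSketch&) = delete;
  BucketChainSketch& operator=(const BucketChainSketch&) = delete;

  // Walks the chain (O(n)); appends the key if it is absent.
  // Returns false only when the key is absent and the chain is full.
  bool lookup_or_add(std::uint64_t key) {
    Node** link = &_head;
    while (*link != nullptr) {
      if ((*link)->key == key) {
        return true;  // already present
      }
      link = &(*link)->next;
    }
    if (_length >= _max_length) {
      return false;  // overflow: the case the thread argues cannot occur in practice
    }
    *link = new Node{key, nullptr};
    ++_length;
    return true;
  }

  std::size_t length() const { return _length; }
};
```

In this sketch the overflow branch returns `false`; under the proposal in the thread, that branch would instead be an assertion, since reaching it would indicate a bug rather than a condition to handle gracefully.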

Hi Thomas,

I am with you; I don't see that it is very useful. I believe it is a remnant of the old implementation, in which NMT could consume an excessive amount of memory and needed to shut itself down to protect the health of the JVM.

How do we proceed to remove the command? Do we need a CSR?

Thanks,

-Zhengyu

-------------

PR: https://git.openjdk.java.net/jdk/pull/6504


More information about the hotspot-runtime-dev mailing list