RFR: 8304824: NMT should not use ThreadCritical [v9]

Johan Sjölen jsjolen at openjdk.org
Thu Oct 31 11:13:36 UTC 2024


On Thu, 31 Oct 2024 10:53:12 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

> I had to analyze this again, to understand why we need this locking, since my mind slips.
> 
> I updated my findings in https://bugs.openjdk.org/browse/JDK-8325890 . Please see details there.
> 
> But I don't think the current locking makes much sense, tbh (even before this patch). Or I don't get it.
> 
> We have three counters in play here: A) the `malloc[mtChunk]` malloc counter for the mtChunk tag B) the global malloc counter C) the `arena[xxx]` arena counter for the mtXXX tag the arena is tagged with
> 
> Upon arena growth, we incremement all four. In two steps - first (A) and (B) under os::malloc, then (C) in the arena code.
> 
> Upon arena death, we may delete the chunk, in which case we adjust all three counters as well.
> 
> But since we don't want (B) to include arena sizes, we have several places where we lazily adjust (B) and (A) from the sum of all arena counters (see JBS issue). These adjustments must be synchronized with arena allocations. But we don't do this. The lock does not include (C). For that, the lock would have to be at a different place, inside arena code. We only lock around (A) and (B).
> 
> I am not sure what the point of this whole thing is. I am also not convinced that it always works correctly. This whole "updating counters lazily" business makes the code brittle.
> 
> Also, the current locking does not guarantee correctness anyway. There are several places in the coding where we calculate e.g. the sum of all mallocs, but malloc counters are modified (rightly so) out of lock protection.
> 
> To make matters even more complex, we have not two locks here but three:
> 
>     * ThreadCritical
> 
>     * the new `NmtVirtualMemory_lock`
> 
>     * we also have the `NMTQuery_lock`, which guards NMT reports from jcmd exclusively
> 
> 
> Not sure what the point of the latter even is. It certainly does not prevent us from getting reports while underlying data structures are being changed, unless I am overlooking something.
> 
> It would be very nice if someone other than me could analyze this.

If this is so brittle and complex, then I wonder what you even get out of us doing the `ThreadCritical` trick. In other words, if we just removed it, would anyone notice and be able to discern what's occurred? Open question, I might do that change and run the tests on it.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20852#issuecomment-2449601108


More information about the serviceability-dev mailing list