RFR: 8298992: runtime/NMT/SummarySanityCheck.java failed with "Total commi…tted (MMMMMM) did not match the summarized committed (NNNNNN)

Wed Aug 23 15:29:22 UTC 2023

On Wed, 16 Aug 2023 12:18:35 GMT, Afshin Zafari <azafari at openjdk.org> wrote:

> During exhaustive tests, it is observed that during taking snapshot of NMT metrics it is possible that new allocations happen concurrently, although a `ThreadCritical` is used during copying current metrics to the snapshot.
> A loop is surrounding the copying and checks whether the copied and original are the same.

The comment in `void copy_to(MallocMemorySnapshot* s)` says:

    // Need to make sure that mtChunks don't get deallocated while the
    // copy is going on, because their size is adjusted using this
    // buffer in make_adjustment().

but I don't see `ThreadCritical` used anywhere else, except `copy_to()` itself, so what does that lock synchronizes with exactly? Does it do anything at all right now?

I see atomic operations used to protect the individual memory counters, but I don't see anywhere where we protect the total size + size of the chunks. If anyone of them changes, then copying:

s->_all_mallocs = _all_mallocs;
loop (index) { s->_malloc[index] = _malloc[index]; }

might at the end not to agree on:

`s._all_mallocs.size() == s._malloc[nmt_type1] + s._malloc[nmt_type2] + ...`

Afshin proposed a simple fix, which even though not ideal, will get job done.

An ideal fix would make sure that any operation involving changing `_all_mallocs` and `_malloc[nmt_type]` are both synchronized, so that we can use that same lock (whatever it is) in `copy_to()`

Did I get this right?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/15306#issuecomment-1690169040