RFR: 8298992: runtime/NMT/SummarySanityCheck.java failed with "Total commi…tted (MMMMMM) did not match the summarized committed (NNNNNN)

Gerard Ziemski gziemski at openjdk.org
Tue Aug 22 15:20:17 UTC 2023


On Tue, 22 Aug 2023 12:52:13 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

>> During exhaustive tests, it is observed that during taking snapshot of NMT metrics it is possible that new allocations happen concurrently, although a `ThreadCritical` is used during copying current metrics to the snapshot.
>> A loop is surrounding the copying and checks whether the copied and original are the same.
>
> src/hotspot/share/services/mallocTracker.hpp line 205:
> 
>> 203:       }
>> 204:     } while(s->_all_mallocs.size() != total_size && ++loop_counter < loop_limit);
>> 205:     assert(s->_all_mallocs.size() == total_size, "Total != sum of parts");
> 
> Preexisting, but could you please move this to the cpp file? It should not live in the header.
> 
> Also I'm not sure the assert is really useful. Won't that be just a source for very rare intermittent errors? What would be the action we take if that assert fires, increase the loop count?

We copy `MemoryCounter` by value, not reference:

`s->_all_mallocs = _all_mallocs;`

where `_all_mallocs` is `MemoryCounter`, not  `*MemoryCounter`, so that assert can never trigger, right?

I mean, even if more memory gets allocated between when we exit the loop and assert, `_all_mallocs.size()` might change, but `s->_all_mallocs.size()` is frozen in time, isn't it?

So I agree that the assert is not helpful here, but it's because it will never trigger.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/15306#discussion_r1301789089


More information about the hotspot-runtime-dev mailing list