RFR: 8298992: runtime/NMT/SummarySanityCheck.java failed with "Total committed (MMMMMM) did not match the summarized committed (NNNNNN)"
Afshin Zafari
azafari at openjdk.org
Wed Aug 23 09:10:11 UTC 2023
On Tue, 22 Aug 2023 19:36:50 GMT, Gerard Ziemski <gziemski at openjdk.org> wrote:
>> During exhaustive tests, it was observed that new allocations can happen concurrently while a snapshot of the NMT metrics is being taken, even though a `ThreadCritical` is held while the current metrics are copied to the snapshot.
>> A loop now surrounds the copy and checks whether the copied and original values are the same.
>
> src/hotspot/share/services/mallocTracker.hpp line 205:
>
>> 203: }
>> 204: } while(s->_all_mallocs.size() != total_size && ++loop_counter < loop_limit);
>> 205: assert(s->_all_mallocs.size() == total_size, "Total != sum of parts");
>
> Do we agree then that the assert on line 205 is not needed?
The issue here was that, while the malloc measures are being copied in the loop, new allocations can happen that change the items being copied. This results in a mismatch between Total and the sum of the items.
The `ThreadCritical` in the code was supposed to block other threads' allocations during the copy. It did not work as expected, because `ThreadCritical` is used only in a few _deallocation_ paths in the code.
Therefore the while-loop is written here to make sure that the copied malloc items are consistent, i.e. $Total = \sum_i item_i$.
After Gerard's comment, the while-loop is bounded by a maximum number of iterations (`loop_limit = 100`) rather than being an infinite loop.
So if the items are still not consistent after `loop_limit` iterations, it is better to raise the error here than to let the mismatch propagate up to the reports.
It is expected that replacing `ThreadCritical` with a mutex for NMT will resolve the issue, so that the while-loop is no longer needed.
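For readers who want to see the shape of the bounded retry described above, here is a minimal, self-contained sketch. It is not the HotSpot code: `LiveCounters`, `Snapshot`, `copy_consistent` and `kCategories` are hypothetical names invented for illustration, and plain `std::atomic` counters stand in for the NMT malloc data. It only shows the idea of re-copying until $Total = \sum_i item_i$ or until `loop_limit` attempts have been made.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the NMT malloc counters; not the HotSpot types.
constexpr std::size_t kCategories = 4;

struct LiveCounters {
  std::array<std::atomic<std::size_t>, kCategories> per_category{};
  std::atomic<std::size_t> total{0};

  // A writer updates the category and the total in two separate steps,
  // so a concurrent reader can observe one update without the other.
  void record_malloc(std::size_t category, std::size_t bytes) {
    per_category[category].fetch_add(bytes, std::memory_order_relaxed);
    total.fetch_add(bytes, std::memory_order_relaxed);
  }
};

struct Snapshot {
  std::array<std::size_t, kCategories> per_category{};
  std::size_t total = 0;
  int attempts = 0;  // number of copy passes that were needed
};

// Copies the counters, retrying at most loop_limit times until the copied
// total equals the sum of the copied per-category values.
// Returns true if the final copy is consistent.
bool copy_consistent(const LiveCounters& live, Snapshot& out, int loop_limit = 100) {
  assert(loop_limit > 0);
  out.attempts = 0;
  std::size_t sum = 0;
  do {
    out.total = live.total.load(std::memory_order_relaxed);
    sum = 0;
    for (std::size_t i = 0; i < kCategories; ++i) {
      out.per_category[i] = live.per_category[i].load(std::memory_order_relaxed);
      sum += out.per_category[i];
    }
    ++out.attempts;
  } while (sum != out.total && out.attempts < loop_limit);
  return sum == out.total;
}
```

A caller would check the return value (or assert on it) right after the copy, so that an inconsistency is reported at the snapshot rather than in a later report.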
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/15306#discussion_r1302665375