RFR: 8369622: GlobalChunkPoolMutex is recursively locked during error handling [v3]

Thu Oct 23 11:29:11 UTC 2025

On Tue, 21 Oct 2025 11:48:58 GMT, Afshin Zafari <azafari at openjdk.org> wrote:

>> Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Small things.
>
> src/hotspot/share/memory/arena.hpp line 46:
> 
>> 44:   bool _locked;
>> 45: public:
>> 46:   ChunkPoolLocker(LockStrategy ls = LockStrategy::Lock);
> 
> If the `LockStrategy` defaults to `Lock`, then every instance of this locker used in `ChunkPool`'s cleaning functions (`return_to_pool`, `take_from_pool`, `prune` and `deallocate_chunk`) will try to take the lock unconditionally. So if any of these is called while NMT is reporting (and has already acquired the lock), we deadlock again.

This isn't the problem that we've seen, though.  These functions shouldn't be called explicitly during error reporting the way the NMT code is.  The NMT code reports the error while already holding the lock, and so ends up trying to take the lock a second time.
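
To make the distinction concrete, here is a minimal standalone sketch, not the HotSpot code. It assumes `LockStrategy` has a second enumerator, here called `Try`, that attempts the lock without blocking. `PoolMutex`, `g_pool_mutex`, `locked()` and `nested_try_acquires()` are hypothetical stand-ins; only the `ChunkPoolLocker(LockStrategy ls = LockStrategy::Lock)` signature and the `_locked` field come from the patch under review.

```cpp
#include <atomic>
#include <cassert>

// Assumption: Try is a non-blocking alternative to the default Lock.
enum class LockStrategy { Lock, Try };

// Hypothetical stand-in for GlobalChunkPoolMutex: a tiny non-recursive
// spin lock whose try_lock() is well defined even if the caller owns it.
class PoolMutex {
  std::atomic<bool> _held{false};
public:
  bool try_lock() {
    bool expected = false;
    return _held.compare_exchange_strong(expected, true,
                                         std::memory_order_acquire);
  }
  void lock()   { while (!try_lock()) { /* spin until released */ } }
  void unlock() { _held.store(false, std::memory_order_release); }
  bool held() const { return _held.load(std::memory_order_acquire); }
};

static PoolMutex g_pool_mutex;

class ChunkPoolLocker {
  bool _locked;
public:
  explicit ChunkPoolLocker(LockStrategy ls = LockStrategy::Lock)
      : _locked(false) {
    if (ls == LockStrategy::Lock) {
      g_pool_mutex.lock();                 // blocks; would self-deadlock if re-entered
      _locked = true;
    } else {
      _locked = g_pool_mutex.try_lock();   // never blocks
    }
  }
  ~ChunkPoolLocker() {
    if (_locked) g_pool_mutex.unlock();    // release only what we acquired
  }
  ChunkPoolLocker(const ChunkPoolLocker&) = delete;
  ChunkPoolLocker& operator=(const ChunkPoolLocker&) = delete;
  bool locked() const { return _locked; }
};

// Models error reporting that starts while the lock is already held:
// the outer locker is the normal path, the inner one is the reporting path.
bool nested_try_acquires() {
  ChunkPoolLocker outer;                     // default strategy: Lock
  assert(outer.locked());
  ChunkPoolLocker inner(LockStrategy::Try);  // reporting path must not block
  return inner.locked();
}
```

In this sketch the inner locker fails to acquire the lock and simply proceeds without it, while the default `Lock` strategy stays unchanged for the pool functions, which are not expected to run during error reporting.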

> src/hotspot/share/nmt/nmtUsage.cpp line 67:
> 
>> 65:     }
>> 66:     ChunkPoolLocker cpl(ls);
>> 67:     ms = MallocMemorySummary::as_snapshot();
> 
> Preexisting: 
> `MMS::as_snapshot()` just returns the pointer to the snapshot structure and does not update or access anything in it. The lifetime of the `ChunkPoolLocker cpl` should be the whole body of the function.

I don't think this should change with this PR.  It could be that the lock is needed to gather the chunk pool information, but the NMT reporting and subsequent adjustments should be local to NMT and should not lock the chunk pool.  I'll leave this to another CR to investigate further.
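
For whoever picks up that follow-up, the scoping question can be illustrated with a small standalone sketch. Everything here is hypothetical (`Snapshot`, `PoolLocker`, `pool_lock_held`, `narrow_scope`, `whole_body_scope`); it only models the observation that a locker scoped around a pointer fetch does not cover the later reads through that pointer, and says nothing about which scope is right for NMT.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>

struct Snapshot { std::size_t malloc_bytes; std::size_t arena_bytes; };

static std::mutex pool_mutex;          // stand-in for the chunk pool lock
static bool pool_lock_held = false;    // demo bookkeeping, single-threaded only
static Snapshot shared_snapshot{100, 50};

// RAII locker that also records whether the lock is currently held.
struct PoolLocker {
  PoolLocker() {
    pool_mutex.lock();
    assert(!pool_lock_held);           // the lock is not recursive
    pool_lock_held = true;
  }
  ~PoolLocker() {
    pool_lock_held = false;
    pool_mutex.unlock();
  }
  PoolLocker(const PoolLocker&) = delete;
  PoolLocker& operator=(const PoolLocker&) = delete;
};

// Like the accessor discussed above: returns a pointer, touches no fields.
Snapshot* as_snapshot() { return &shared_snapshot; }

struct Result { std::size_t total; bool read_under_lock; };

// Locker covers only the pointer fetch; the field reads happen afterwards.
Result narrow_scope() {
  Snapshot* ms;
  {
    PoolLocker cpl;
    ms = as_snapshot();
  }
  return Result{ms->malloc_bytes + ms->arena_bytes, pool_lock_held};
}

// Locker covers the whole body, including the field reads.
Result whole_body_scope() {
  PoolLocker cpl;
  Snapshot* ms = as_snapshot();
  return Result{ms->malloc_bytes + ms->arena_bytes, pool_lock_held};
}
```

Both functions compute the same total here; the difference is only whether the reads happen under the lock, which is the trade-off between protecting the reads and keeping the reporting work from holding the chunk pool lock.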

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27869#discussion_r2454770767
PR Review Comment: https://git.openjdk.org/jdk/pull/27869#discussion_r2454782880