RFR: 8304824: NMT should not use ThreadCritical [v3]

Robert Toyonaga duke at openjdk.org
Wed Oct 30 13:42:19 UTC 2024


On Wed, 30 Oct 2024 12:32:17 GMT, Johan Sjölen <jsjolen at openjdk.org> wrote:

>>> Hi @tstuefe,
>>> 
>>> > Ah thank you. I think I added that printing ages ago. I should not have used tty. The immediate solution to get you going may be to just change this from tty to fileStream fs(stderr);
>>> 
>>> Thanks! I adopted your suggestion and replaced tty. The lock conflict is gone now.
>>> 
>>> > Ensuring that we feed NMT the tracking information in the order they happened. This is only relevant for mmap accounting, since only there do we need to track address ranges. For malloc accounting, we just increase/decrease counters, and there, the order of operations does not matter, just that we do it atomically.
>>> > ...
>>> > A pragmatic solution would be to move the locking down the stack. Currently, we lock in the generic layer around { pd_release_memory() + MemTracker::record }. But we only need to lock around the OS allocation function (mmap/munmap/VirtualAlloc etc). So we could move the locks downward to the OS specific layer where the actual allocations happen in the OS code, e.g. just lock around { munmap() + MemTracker::record }.
>>> 
>>> Ok I see. So the locking should indeed not happen inside of `MemTracker`. Yes, maybe moving the locking down the stack into platform specific code would be a good task for the future. However, if the intention is to feed NMT the tracking info in the order it happened, why are the operations on memory in `os::commit_memory` and `os::reserve_memory` not protected by the lock? In those, we delay and lock inside `MemTracker` instead.
>> 
>> That looks like an error that should be fixed.
>
> Hi @tstuefe, @roberttoyonaga 
> 
> We saw some test failures in non-generational ZGC due to this change. That GC is removed now, so there's no worry there. I did however look at all of our remaining usages of `ThreadCritical` and saw that there are thrre left that are related to NMT:
> 
> 
> src/hotspot/share/nmt/nmtUsage.cpp
> 56     ThreadCritical tc;
> 
> src/hotspot/share/nmt/mallocTracker.cpp
> 65     // Use ThreadCritical to make sure that mtChunks don't get deallocated while the
> 68     ThreadCritical tc;
> 
> 
> src/hotspot/share/memory/arena.cpp
> 178      ThreadCritical tc;  // Free chunks under TC lock so that NMT adjustment is stable.
> 
> 
> I am not that familiar with this code. Is leaving these as they are intentional, or have we introduced a potential bug?
> 
> Thank you.

Hi @jdksjolen, I originally replaced those instances, but then [put them back in](https://github.com/openjdk/jdk/pull/20852/commits/88e075d1023bbfa240c5606c9c64ea720a3e83d8) because I saw that they are related to chunks allocation/deallocation code where it seemed `ThreadCritical` was used for other reasons.  

I'm not certain, but looking at it again, it seems that the `ThreadCritical` uses in `ChunkPool::deallocate_chunk` and `ChunkPool::prune()` are only needed for NMT and are independent of the other `ThreadCritical` uses in that code.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20852#issuecomment-2447196115


More information about the serviceability-dev mailing list