RFR: 8304824: NMT should not use ThreadCritical [v3]
Thomas Stuefe
stuefe at openjdk.org
Sat Sep 14 07:01:04 UTC 2024
On Sat, 14 Sep 2024 02:35:37 GMT, Robert Toyonaga <duke at openjdk.org> wrote:
>>> > After switching to a Hotspot Mutex, it looks like the `windows-x64 / test (hs/tier1 common) GHA` is failing because the test `release_bad_ranges` in test_os.cpp is expecting an assertion and an error message to be printed. However, when this printing happens, `tty_lock` gets acquired out of rank order with the already held `NMT_lock`, causing the lock rank assertion fail. One solution would be to lower the rank of `tty_lock`. I'm not sure that's the best solution because that might cause rank conflicts with other locks (and it makes sense to give locks the highest possible rank to minimize future conflicts).
>>>
>>> What code exactly locks tty_lock?
>>
>> This is weird. We print the error in this case via print_error_for_unit_test(), which takes care only to use raw fprintf(stderr) for this very reason... therefore I am interested in understanding the lock semantics, to know if deadlock detection found a real problem or if this is just a weird error setup.
>
> Hi @tstuefe, it looks like calling [`os::print_memory_mappings`](https://github.com/openjdk/jdk/blob/master/src/hotspot/os/windows/os_windows.cpp#L3785) from the windows implementation of `os::pd_release_memory` is causing the rank conflict between `tty_lock` and `NMT_lock`. This is getting called from `os::release_memory` [**after** we've already acquired the lock for NMT](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/runtime/os.cpp#L2185-L2189).
>
> gtest `release_bad_ranges` --> `os::release_memory` and acquire `NMT_lock` --> `os::pd_release_memory` --> `os::print_memory_mappings` and acquire `tty_lock`
>
> Is there a reason we cast a wider net and lock from outside of NMT code in `os::release_memory`, `os::uncommit_memory`, and `os::unmap_memory`? By contrast, in other functions such as `os::commit_memory` and `os::reserve_memory` we lock inside of `MemTracker` instead.
> For example, `pd_release_memory` is protected by the NMT lock, but `pd_reserve_memory` is not. Is the lock meant to synchronize the actual memory operations as well as NMT recording in some cases? If not, then maybe it makes sense to delay and move the lock acquisition inside `MemTracker` for consistency and to reduce surface area for lock conflicts.
Hi @roberttoyonaga
(Pinging @afshin-zafari and @jdksjolen)
First off, it is good that you ran into this, since it highlights a possible real deadlock. Both locks are quite coarse. I cannot think, off the top of my head, of a scenario where we hold the tty lock and need NMT locking, but I would not bet money on it. E.g. I bet we call malloc under tty_lock (if only as a side effect of calling functions that return ResourceArea-allocated memory). Malloc does not need NMT locking; but it's brittle, and small changes can cause deadlocks (e.g. folks were thinking about moving ResourceArea memory to mmap, which needs NMT locks for tracking).
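For illustration, the rank discipline that tripped the assertion in the gtest can be sketched as a toy model outside HotSpot. This is a hypothetical simplification (`RankedMutex`, `ranked_lock` and the ranks are invented names, not HotSpot's actual `Mutex` API): a thread may only acquire a lock of strictly higher rank than any lock it already holds, which makes a deadlock cycle between two threads impossible.

```cpp
#include <cassert>
#include <mutex>

// Toy model of rank-ordered locking, the discipline HotSpot's Mutex
// rank assertion enforces. Illustrative only, not HotSpot code.
struct RankedMutex {
  std::mutex m;
  int rank = 0;
};

thread_local int current_rank = 0;  // rank of the highest lock this thread holds

void ranked_lock(RankedMutex& rm) {
  // Deadlock-free discipline: only acquire locks in ascending rank order.
  // Acquiring tty_lock (lower rank) while holding NMT_lock would fail here.
  assert(rm.rank > current_rank && "lock acquired out of rank order");
  rm.m.lock();
  current_rank = rm.rank;
}

void ranked_unlock(RankedMutex& rm) {
  rm.m.unlock();
  current_rank = 0;  // simplification: assumes locks are released innermost-first
}
```

With ranks assigned so that `tty_lock` outranks `NMT_lock`, the call chain above (acquire NMT lock, then try to print) trips exactly this kind of assertion.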
> Hi @tstuefe, it looks like calling [`os::print_memory_mappings`](https://github.com/openjdk/jdk/blob/master/src/hotspot/os/windows/os_windows.cpp#L3785) from the windows implementation of `os::pd_release_memory` is causing the rank conflict between `tty_lock` and `NMT_lock`. This is getting called from `os::release_memory` [**after** we've already acquired the lock for NMT](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/runtime/os.cpp#L2185-L2189).
>
> gtest `release_bad_ranges` --> `os::release_memory` and acquire `NMT_lock` --> `os::pd_release_memory` --> `os::print_memory_mappings` and acquire `tty_lock`
>
Ah thank you. I think I added that printing ages ago. I should not have used tty. The immediate solution to get you going may be to just change this from tty to `fileStream fs(stderr);`
> Is there a reason we cast a wider net and lock from outside of NMT code in `os::release_memory`, `os::uncommit_memory`, and `os::unmap_memory`? By contrast, in other functions such as `os::commit_memory` and `os::reserve_memory` we lock inside of `MemTracker` instead. For example, `pd_release_memory` is protected by the NMT lock, but `pd_reserve_memory` is not. Is the lock meant to synchronize the actual memory operations as well as NMT recording in some cases? If not, then maybe it makes sense to delay and move the lock acquisition inside `MemTracker` for consistency and to reduce surface area for lock conflicts.
The problem is that both tty_lock and the NMT lock are quite coarse. I have always maintained that tty_lock is too coarse - we have many places where a ttyLocker sits at the start of a printing function, and the function then goes on to print lots of information that it *retrieves while holding the lock*. The better way is to assemble the text outside of lock protection, then only do the actual printing under the lock. We do this in UL: we have a LogMessage class, which we use for multi-line output that must not tear.
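The pattern described above (assemble outside the lock, print under it) might be sketched like this, using a plain `std::mutex` and `std::string` as stand-ins for tty_lock and LogMessage - a simplified illustration, not the actual UL implementation:

```cpp
#include <cstdio>
#include <mutex>
#include <string>

std::mutex print_lock;  // stand-in for tty_lock

// Gather all data *without* holding print_lock. In HotSpot, this is the
// phase where code might otherwise allocate or take further locks while
// tty_lock is held - the root of the rank conflict.
std::string assemble_report() {
  std::string s;
  s += "Memory report\n";
  s += "  region A: reserved\n";
  s += "  region B: committed\n";
  return s;
}

void print_report() {
  std::string msg = assemble_report();        // assembled outside the lock
  std::lock_guard<std::mutex> g(print_lock);  // lock held only for the write
  std::fputs(msg.c_str(), stdout);            // single write: output cannot tear
}
```

Since the lock now guards only a single write, no other lock can be acquired underneath it, and the output still cannot interleave with other threads' output.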
The NMT lock is coarse too, but that is by design. It has two purposes:
1) Guarding internal global data structures, e.g. the MallocSiteTable or the list of virtual memory regions
2) Ensuring that we feed NMT the tracking information *in the order the operations happened*. This is only relevant for mmap accounting, since only there do we need to track address ranges. For malloc accounting, we just increase/decrease counters; there, the order of operations does not matter, only that each update is atomic.
(2) is the crux here. If it were only for (1), we could keep the synchronization blocks really small, e.g. just guard modification of the MallocSiteTable.
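The malloc side of the distinction might be sketched as follows - a minimal illustration assuming plain atomic counters (names are invented, not NMT's actual data structures): updates commute, so no lock and no ordering guarantee is needed.

```cpp
#include <atomic>
#include <cstddef>

// Illustrative stand-in for NMT's malloc accounting: a plain atomic
// counter. Unlike mmap accounting, which tracks address ranges and
// therefore must see operations in order (purpose (2) above), counter
// updates commute - any interleaving yields the same final value.
std::atomic<size_t> malloc_bytes{0};

void record_malloc(size_t n) { malloc_bytes.fetch_add(n, std::memory_order_relaxed); }
void record_free(size_t n)   { malloc_bytes.fetch_sub(n, std::memory_order_relaxed); }
```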
A pragmatic solution would be to move the locking down the stack. Currently, we lock in the generic layer around { pd_release_memory() + MemTracker::record }. But we only need to lock around the *OS allocation function* (mmap/munmap/VirtualAlloc etc). So we could move the locks downward to the OS specific layer where the actual allocations happen in the OS code, e.g. just lock around { munmap() + MemTracker::record }.
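The two locking shapes could be contrasted roughly as below. All names here (`nmt_lock`, `do_munmap`, `record_release`, `print_diagnostics`) are invented stand-ins for the real OS and MemTracker calls, so this is a sketch of the shape of the change, not actual HotSpot code:

```cpp
#include <cstddef>
#include <mutex>

std::mutex nmt_lock;                 // stand-in for the NMT lock
size_t released_bytes = 0;           // stand-in for NMT's region bookkeeping

void do_munmap(char*, size_t) {}     // stand-in for munmap()/VirtualFree()

void record_release(char*, size_t bytes) {
  released_bytes += bytes;           // must run under nmt_lock, in op order
}

void print_diagnostics() {
  // stand-in for os::print_memory_mappings(); in the current code this
  // runs with the NMT lock already held, which is where tty_lock collides
}

// Current shape: the generic layer locks around the whole pd_ call, so
// anything the platform code prints happens under the NMT lock.
void release_memory_current(char* addr, size_t bytes) {
  std::lock_guard<std::mutex> g(nmt_lock);
  do_munmap(addr, bytes);
  print_diagnostics();               // diagnostics inside the lock: rank conflict
  record_release(addr, bytes);
}

// Proposed shape: lock only around { OS call + NMT record } in the
// platform layer. The release/record pairing stays ordered, but
// diagnostics run lock-free.
void release_memory_proposed(char* addr, size_t bytes) {
  {
    std::lock_guard<std::mutex> g(nmt_lock);
    do_munmap(addr, bytes);
    record_release(addr, bytes);
  }
  print_diagnostics();               // diagnostics outside the lock
}
```

The key invariant both shapes preserve is that the OS operation and its NMT record happen atomically with respect to other memory operations; the proposed shape just shrinks the critical section to exactly that pair.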
-------------
PR Comment: https://git.openjdk.org/jdk/pull/20852#issuecomment-2350882050
More information about the serviceability-dev
mailing list