RFR: 8341491: Reserve and commit memory operations should be protected by NMT lock
Thomas Stuefe
stuefe at openjdk.org
Mon Mar 24 17:03:21 UTC 2025
On Mon, 24 Mar 2025 16:16:34 GMT, Stefan Karlsson <stefank at openjdk.org> wrote:
>> ### Summary:
>> This PR makes memory operations atomic with NMT accounting.
>>
>> ### The problem:
>> In memory related functions like `os::commit_memory` and `os::reserve_memory` the OS memory operations are currently done before acquiring the the NMT mutex. And the the virtual memory accounting is done later in `MemTracker`, after the lock has been acquired. Doing the memory operations outside of the lock scope can lead to races.
>>
>> 1.1 Thread_1 releases range_A.
>> 1.2 Thread_1 tells NMT "range_A has been released".
>>
>> 2.1 Thread_2 reserves (the now free) range_A.
>> 2.2 Thread_2 tells NMT "range_A is reserved".
>>
>> Since the sequence (1.1) (1.2) is not atomic, if Thread_2 begins operating after (1.1), we can have (1.1) (2.1) (2.2) (1.2). The OS sees two valid subsequent calls (release range_A, followed by map range_A). But NMT sees "reserve range_A", "release range_A" and is now out of sync with the OS.
>>
>> ### Solution:
>> Where memory operations such as reserve, commit, or release virtual memory happen, I've expanded the scope of `NmtVirtualMemoryLocker` to protect both the NMT accounting and the memory operation itself.
>>
>> ### Other notes:
>> I also simplified this pattern found in many places:
>>
>> if (MemTracker::enabled()) {
>> MemTracker::NmtVirtualMemoryLocker nvml;
>> result = pd_some_operation(addr, bytes);
>> if (result != nullptr) {
>> MemTracker::record_some_operation(addr, bytes);
>> }
>> } else {
>> result = pd_unmap_memory(addr, bytes);
>> }
>> ```
>> To:
>>
>> MemTracker::NmtVirtualMemoryLocker nvml;
>> result = pd_unmap_memory(addr, bytes);
>> MemTracker::record_some_operation(addr, bytes);
>> ```
>> This is possible because `NmtVirtualMemoryLocker` now checks `MemTracker::enabled()`. `MemTracker::record_some_operation` already checks `MemTracker::enabled()` and checks against nullptr. This refactoring previously wasn't possible because `ThreadCritical` was used before https://github.com/openjdk/jdk/pull/22745 introduced `NmtVirtualMemoryLocker`.
>>
>> I considered moving the locking and NMT accounting down into platform specific code: Ex. lock around { munmap() + MemTracker::record }. The hope was that this would help reduce the size of the critical section. However, I found that the OS-specific "pd_" functions are already short and to-the-point, so doing this wasn't reducing the lock scope very much. Instead it just makes the code more messy by having to maintain the locking and NMT accounting in each platform specific i...
>
> I don't see why we need to extend the lock to be held over the reserve/commit/alloc part.
>
> If we only extend the lock on the release side, then it looks like we get the required exclusion:
>
> lock
> 1.1 Thread_1 releases range_A.
> 1.2 Thread_1 tells NMT "range_A has been released".
> unlock
>
> 2.1 Thread_2 reserves (the now free) range_A.
> lock
> 2.2 Thread_2 tells NMT "range_A is reserved".
> unlock
>
> We can get ordering (1.1) (2.1) (1.2) (2.2) but we can't get (1.1) (2.1) (2.2) (1.2).
>
> And isn't this locking scheme exactly what the current code is using? Have you seen an issue that this proposed PR intends to solve? If there is such a problem I wonder if there's just a missing lock extension in one of the "release" operations.
@stefank
> And isn't this locking scheme exactly what the current code is using? Have you seen an issue that this proposed PR intends to solve? If there is such a problem I wonder if there's just a missing lock extension in one of the "release" operations.
What about the case where one thread reserves a range and another thread releases it?
1 Thread A reserves range
2 Thread B releases range
3 Thread B tells NMT "range released"
4 Thread A tells NMT "range reserved"
This would either result in an assert in NMT at step 3 when releasing a range NMT does not know. Or in an incorrectly booked range in step 4 without asserts. Am I making a thinking error somewhere?
-------------
PR Comment: https://git.openjdk.org/jdk/pull/24084#issuecomment-2748815516
More information about the hotspot-dev
mailing list