RFR: 8327152: NMT: use BitMap for committed memory regions in summary mode

Wed Mar 6 13:10:45 UTC 2024

On Tue, 5 Mar 2024 12:45:18 GMT, Afshin Zafari <azafari at openjdk.org> wrote:

>> But we are currently discussing the rewrite of VMT using a VMATree. Why not wait for that? The proposed VMATree solution is better in both query speed and - unless you deal with insane levels of fragmentation - memory footprint. It is also less complex and more robust than either current code or this RFE.
>> 
>> 1 GB range, fully uncommitted, 4K pages, needs a 32K bitmap. 
>> 
>> - Marking it as uncommitted means we need to zero-init 32K memory, 4096 word writes. A solution based on tree, similar to what we have now, only needs a few to a few dozen word writes.
>> - Counting means population counts on 32K memory (4096 word loads). 
>> - Since we access at bit level, accesses need to be synchronized. This would cement the current need for a global lock for VMA registration - we would have navigated us into the "needs always a lock" corner and never get away from it.
>
>> But we are currently discussing the rewrite of VMT using a VMATree. Why not wait for that? The proposed VMATree solution is better in both query speed and - unless you deal with insane levels of fragmentation - memory footprint. It is also less complex and more robust than either current code or this RFE.
> 
> This RFE is made for some improvement _besides_ the VMATree, and not replacing/ignoring it. For sure VMATree will improve the speed and together with this we hope/expect to gain more improvement. And as a reminder, this is only for Summary mode since the stack traces are not stored or used. 
> 
>> * Marking it as uncommitted means we need to zero-init 32K memory, 4096 word writes. A solution based on tree, similar to what we have now, only needs a few to a few dozen word writes.
> 
> We only do it at initialization, and not in every commit/uncommit.
> 
> 
>> * Counting means population counts on 32K memory (4096 word loads).
> 
> At commit/uncommit, we only count the 1 bits within a _sub-range_ of the whole bitmap. Only at `committed_size()` we read this number of words.
> 
> 
>> * Since we access at bit level, accesses need to be synchronized. This would cement the current need for a global lock for VMA registration - we would have navigated us into the "needs always a lock" corner and never get away from it.
> 
> Current code, and maybe future VMA, are already within a critical section (using `ThreadCritical`) at the `MemTracker::record_xxxx` calls. No new lock/sync is needed. 
> 
> VMATree can replace the `SortedLinkList` only for `ReservedMemoryRegions`. We can still use the bitmap for handling `CommittedMemoryRegions` and avoid caring about the overlapping sub-regions.

Hi @afshin-zafari,

maybe part of the confusion is that with our current plans (see https://gist.github.com/tstuefe/d9682b7f11b3375da27faa100f45e621), VMATree would handle both reserved and committed regions. If implemented as described, we won't need a separate solution for tracking commit state.

Further remarks inline.

> 
> > * Marking it as uncommitted means we need to zero-init 32K memory, 4096 word writes. A solution based on tree, similar to what we have now, only needs a few to a few dozen word writes.
> 
> We only do it at initialization, and not in every commit/uncommit.

Even an initialization-only cost would be bad, but how exactly would this work? Consider:
- reserve(a, b)
- commit(a, b)
- uncommit(a, b)
- commit(a, b)

In order to track commit state, you'd need to change the bitmap state for each of these operations, across the whole range [a, b).
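
To make that sequence concrete, here is a minimal sketch of a one-bit-per-page bitmap. The class `CommitBitmap` and its methods are hypothetical names for illustration only; this is not HotSpot's `BitMap` and not the code in the PR. It only shows that every commit or uncommit has to write each word covering the range, and that counting has to read every word.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical illustration, not HotSpot code: one bit per page,
// bit set == page committed.
class CommitBitmap {
 public:
  explicit CommitBitmap(std::size_t pages)
      : _pages(pages), _words((pages + 63) / 64, 0) {}

  // Mark pages [first, last) as committed or uncommitted.
  // Returns the number of 64-bit words that had to be written.
  std::size_t mark(std::size_t first, std::size_t last, bool committed) {
    if (last > _pages) last = _pages;  // clamp to the tracked range
    if (first >= last) return 0;       // empty range, nothing to write
    std::size_t written = 0;
    for (std::size_t w = first / 64; w * 64 < last; ++w) {
      const std::size_t base = w * 64;
      const std::size_t lo = (base < first) ? first - base : 0;       // first bit in this word
      const std::size_t hi = (last - base < 64) ? last - base : 64;   // one past the last bit
      const std::uint64_t mask =
          (hi - lo == 64) ? ~std::uint64_t{0}
                          : ((std::uint64_t{1} << (hi - lo)) - 1) << lo;
      _words[w] = committed ? (_words[w] | mask) : (_words[w] & ~mask);
      ++written;
    }
    return written;
  }

  // Count committed pages; this reads every word of the bitmap.
  std::size_t committed_pages() const {
    std::size_t n = 0;
    for (std::uint64_t w : _words) {
      for (; w != 0; w &= w - 1) ++n;  // clear the lowest set bit each round
    }
    return n;
  }

 private:
  std::size_t _pages;
  std::vector<std::uint64_t> _words;
};
```

With a 1 GB range and 4K pages (262144 pages), each of commit(a, b), uncommit(a, b) and the second commit(a, b) over the full range would report 4096 word writes in this sketch.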

> 
> > * Counting means population counts on 32K memory (4096 word loads).
> 
> At commit/uncommit, we only count the 1 bits within a _sub-range_ of the whole bitmap. Only at `committed_size()` we read this number of words.

If you run with -Xmx1g -Xms1g, you now commit 1GB. On the OS level, this is pretty much a no-op, since no physical memory is allocated yet apart from page table entries. But you now need to mark 1GB worth of bitmap (32K).
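
For reference, the arithmetic behind the 32K figure, as a small sketch. The helper names are made up for illustration, and it assumes one bit per page and 64-bit words:

```cpp
#include <cstddef>

constexpr std::size_t K = 1024;
constexpr std::size_t G = K * K * K;

// Bytes of bitmap needed for a range, at one bit per page (rounded up).
constexpr std::size_t bitmap_bytes(std::size_t range_bytes, std::size_t page_bytes) {
  return (range_bytes / page_bytes + 7) / 8;
}

// Same bitmap expressed in 64-bit words (rounded up).
constexpr std::size_t bitmap_words(std::size_t range_bytes, std::size_t page_bytes) {
  return (range_bytes / page_bytes + 63) / 64;
}

// 1 GB / 4K pages = 262144 pages = 262144 bits = 32K bytes = 4096 words.
static_assert(bitmap_bytes(1 * G, 4 * K) == 32 * K, "1 GB at 4K pages needs a 32K bitmap");
static_assert(bitmap_words(1 * G, 4 * K) == 4096, "32K bitmap is 4096 64-bit words");
```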

> 
> > * Since we access at bit level, accesses need to be synchronized. This would cement the current need for a global lock for VMA registration - we would have navigated us into the "needs always a lock" corner and never get away from it.
> 
> Current code, and maybe future VMA, are already within a critical section (using `ThreadCritical`) at the `MemTracker::record_xxxx` calls. No new lock/sync is needed.

Yes, I am aware of that. That is why I wrote: "_This would cement the current need for a global lock_". The bitmap would add a second reason for locking to the first, which is modification of a global list. That means more obstacles should we ever want to make this whole thing lock-free.
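
A rough sketch of why bit-level access is a synchronization problem in its own right: flipping one bit is a read-modify-write of a whole shared word. The helper below is hypothetical, not NMT code, and uses `std::atomic` rather than HotSpot's own atomics purely for illustration.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: set bit `bit` in a shared word array without a lock.
// Returns true if this call changed the bit from 0 to 1.
inline bool set_bit_atomic(std::atomic<std::uint64_t>* words, std::size_t bit) {
  const std::uint64_t mask = std::uint64_t{1} << (bit % 64);
  // fetch_or is an atomic read-modify-write of the whole word. A plain
  // load / OR / store on a non-atomic word could lose a concurrent update
  // to a neighbouring bit in the same word.
  const std::uint64_t old = words[bit / 64].fetch_or(mask, std::memory_order_relaxed);
  return (old & mask) == 0;
}
```

Even with per-word atomics like this, a range that spans several words is not updated atomically as a whole, so a concurrent reader counting bits could observe a half-updated range.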

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18090#issuecomment-1980839641