RFR: 8327152: NMT: use BitMap for committed memory regions in summary mode
Thomas Stuefe
stuefe at openjdk.org
Wed Mar 6 13:10:45 UTC 2024
On Tue, 5 Mar 2024 12:45:18 GMT, Afshin Zafari <azafari at openjdk.org> wrote:
>> But we are currently discussing the rewrite of VMT using a VMATree. Why not wait for that? The proposed VMATree solution is better in both query speed and - unless you deal with insane levels of fragmentation - memory footprint. It is also less complex and more robust than either current code or this RFE.
>>
>> 1 GB range, fully uncommitted, 4K pages, needs a 32K bitmap.
>>
>> - Marking it as uncommitted means we need to zero-init 32K memory, 4096 word writes. A solution based on tree, similar to what we have now, only needs a few to a few dozen word writes.
>> - Counting means population counts on 32K memory (4096 word loads).
>> - Since we access at bit level, accesses need to be synchronized. This would cement the current need for a global lock for VMA registration - we would have navigated us into the "needs always a lock" corner and never get away from it.
>
>> But we are currently discussing the rewrite of VMT using a VMATree. Why not wait for that? The proposed VMATree solution is better in both query speed and - unless you deal with insane levels of fragmentation - memory footprint. It is also less complex and more robust than either current code or this RFE.
>
> This RFE is made for some improvement _besides_ the VMATree, and not replacing/ignoring it. For sure VMATree will improve the speed and together with this we hope/expect to gain more improvement. And as a reminder, this is only for Summary mode since the stack traces are not stored or used.
>
>> * Marking it as uncommitted means we need to zero-init 32K memory, 4096 word writes. A solution based on tree, similar to what we have now, only needs a few to a few dozen word writes.
>
> We only do it at initialization, and not in every commit/uncommit.
>
>
>> * Counting means population counts on 32K memory (4096 word loads).
>
> At commit/uncommit, we only count the 1 bits within a _sub-range_ of the whole bitmap. Only at `committed_size()` we read this number of words.
>
>
>> * Since we access at bit level, accesses need to be synchronized. This would cement the current need for a global lock for VMA registration - we would have navigated us into the "needs always a lock" corner and never get away from it.
>
> Current code, and maybe future VMA, are already within a critical section (using `ThreadCritical`) at the `MemTracker::record_xxxx` calls. No new lock/sync is needed.
>
> VMATree can replace the `SortedLinkList` only for `ReservedMemoryRegions`. We can still use the bitmap for handling `CommittedMemoryRegions` and avoid caring about the overlapping sub-regions.
Hi @afshin-zafari,
Maybe part of the confusion is that with our current plans (see https://gist.github.com/tstuefe/d9682b7f11b3375da27faa100f45e621), VMATree would handle both reserved and committed regions. If implemented as described, we won't need a separate solution for tracking commit state.
Further remarks inline.
>
> > * Marking it as uncommitted means we need to zero-init 32K memory, 4096 word writes. A solution based on tree, similar to what we have now, only needs a few to a few dozen word writes.
>
> We only do it at initialization, and not in every commit/uncommit.
Even initialization-only would be bad, but how exactly would this work?
- reserve(a, b)
- commit(a, b)
- uncommit(a, b)
- commit(a, b)
In order to track commit state, you'd need to change the bitmap state for each operation, across the whole range [a, b).
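To make the cost concrete, here is a minimal sketch of such a bitmap, assuming 4K pages and 64-bit words. `CommitBitmap`, `mark` and `writes_for_sequence` are hypothetical names for illustration only; this is not the code in the PR and not existing NMT code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t kPageSize = 4096;   // assumed 4K pages
constexpr std::size_t kBitsPerWord = 64;  // assumed 64-bit bitmap words

// Hypothetical commit bitmap for one reserved range: one bit per page.
struct CommitBitmap {
  std::vector<std::uint64_t> words;
  std::size_t word_writes = 0;  // number of bitmap words stored to so far

  explicit CommitBitmap(std::size_t range_bytes)
      : words((range_bytes / kPageSize + kBitsPerWord - 1) / kBitsPerWord, 0) {}

  // Set (commit) or clear (uncommit) the bits for pages [first, last).
  void mark(std::size_t first, std::size_t last, bool committed) {
    assert(first <= last && last <= words.size() * kBitsPerWord);
    for (std::size_t page = first; page < last;) {
      const std::size_t w = page / kBitsPerWord;
      const std::size_t lo = page % kBitsPerWord;
      std::size_t n = kBitsPerWord - lo;          // bits left in this word
      if (n > last - page) n = last - page;       // clamp to end of range
      const std::uint64_t mask =
          (n == kBitsPerWord) ? ~std::uint64_t(0)
                              : ((std::uint64_t(1) << n) - 1) << lo;
      words[w] = committed ? (words[w] | mask) : (words[w] & ~mask);
      ++word_writes;
      page += n;
    }
  }
};

// Replays commit(a, b), uncommit(a, b), commit(a, b) over the whole range
// and returns the number of bitmap words written.
std::size_t writes_for_sequence(std::size_t range_bytes) {
  CommitBitmap bm(range_bytes);
  const std::size_t pages = range_bytes / kPageSize;
  bm.mark(0, pages, true);
  bm.mark(0, pages, false);
  bm.mark(0, pages, true);
  return bm.word_writes;
}
```

For a 1GB range, each of the three operations touches all 4096 words of the bitmap, so the sequence costs 12288 word writes in this sketch.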
>
> > * Counting means population counts on 32K memory (4096 word loads).
>
> At commit/uncommit, we only count the 1 bits within a _sub-range_ of the whole bitmap. Only at `committed_size()` we read this number of words.
If you run with -Xmx1g -Xms1g, you now commit 1GB. On the OS level, this is pretty much a no-op, since no physical memory is allocated yet apart from page table entries. But you now need to mark 1GB worth of bitmap (32K).
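A rough sketch of the comparison for this case, assuming 4K pages and 64-bit words. `bitmap_words` and `BoundaryMap` are hypothetical names; `BoundaryMap` is a simple boundary-based representation built on `std::map` to illustrate the order of magnitude, and it is neither the proposed VMATree nor existing NMT code.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <map>

// Number of 64-bit words in a one-bit-per-page bitmap covering range_bytes.
constexpr std::size_t bitmap_words(std::size_t range_bytes, std::size_t page_size) {
  return (range_bytes / page_size + 63) / 64;
}

// Hypothetical boundary representation: each key is an address at which the
// commit state changes; the value is the state from that key up to the next key.
struct BoundaryMap {
  std::map<std::size_t, bool> points;
  std::size_t node_writes = 0;  // nodes inserted or erased so far

  // Commit state at addr; addresses below the first key count as uncommitted.
  bool state_at(std::size_t addr) const {
    auto it = points.upper_bound(addr);
    return it == points.begin() ? false : std::prev(it)->second;
  }

  // Set the state of [from, to) to `committed`.
  void set(std::size_t from, std::size_t to, bool committed) {
    assert(from < to);
    const bool after = state_at(to);       // state to restore at `to`
    auto lo = points.lower_bound(from);
    auto hi = points.upper_bound(to);
    node_writes += static_cast<std::size_t>(std::distance(lo, hi));
    points.erase(lo, hi);                  // drop boundaries inside [from, to]
    const bool before = state_at(from);    // state just below `from`
    if (before != committed) { points[from] = committed; ++node_writes; }
    if (after != committed)  { points[to] = after; ++node_writes; }
  }
};
```

With a 1GB range, `bitmap_words(1 << 30, 4096)` is 4096, while a single `set(0, 1 << 30, true)` on an empty `BoundaryMap` writes two nodes.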
>
> > * Since we access at bit level, accesses need to be synchronized. This would cement the current need for a global lock for VMA registration - we would have navigated us into the "needs always a lock" corner and never get away from it.
>
> Current code, and maybe future VMA, are already within a critical section (using `ThreadCritical`) at the `MemTracker::record_xxxx` calls. No new lock/sync is needed.
Yes, I am aware of that. That is why I wrote: "_This would cement the current need for a global lock_". It adds a second reason for locking to the first, which is the modification of a global list, and it adds more obstacles in case we ever want to make this whole thing lock-free.
-------------
PR Comment: https://git.openjdk.org/jdk/pull/18090#issuecomment-1980839641