RFR: 8304442: Defer VirtualMemoryTracker work until reporting

Sun Mar 19 13:51:18 UTC 2023

On Sat, 18 Mar 2023 14:50:48 GMT, Johan Sjölen <jsjolen at openjdk.org> wrote:

> Hi,
> 
> The virtual memory tracker of NMT used to do a lot of heavy linked-list calculations whenever memory was reserved, committed, uncommitted or split up. However, the results of these calculations are only used when creating a native memory report. Let's not slow down the JVM unnecessarily; we can do this work at report time instead.
> 
> To achieve this, I've replaced the public API with a work-queue solution. We append each work item to a `GrowableArray` and introduce the `commit_events` method to do the actual work, which we call internally when needed.
> 
> I measured the gains in efficiency through the use of Valgrind's Cachegrind tool. I ran a `linux-x64` build with the following source code:
> 
> 
> public class Test {
>     public static void main(String[] args) {
>     }
> }
> 
> 
> These are the total cycles executed by `os::commit_memory` and `os::reserve_memory`, as estimated by Valgrind over the entire run of the program. Each test was run only once.
> 
> 
> java -XX:NativeMemoryTracking=detail Test.java
> 
> os::commit_memory
> old         | new         | old / new
> 935238      | 578979      | 1.6
> os::reserve_memory
> old         | new         | old / new
> 53628       | 21825       | 2.4
> 
> java -XX:NativeMemoryTracking=summary Test.java
> 
> os::commit_memory
> old     | new   | old/new
> 1033701 | 59974 | 17.2
> 
> os::reserve_memory
> old   | new  | old/new
> 10067 | 2016 | 5
> 
> 
> 
> Summary mode shows the largest performance gains, as no `NativeCallStack` is captured in that mode.
> 
> There should also be some memory savings, as a `MemoryEvent` is smaller (64 bytes) than a `ReservedRegion` (96 bytes). That is, until `commit_events()` runs.

Hi Thomas, thanks for the in-depth review!

Alright, if it's a hard requirement that *reporting* must be fast and not allocate memory, then this particular solution is a deal breaker.

Regarding your ideas:

1. `commit_memory` is most often called by metaspace, but `pd_create_stackguard_pages` is a close second.
2. I actually think that we can get most of the gains from this PR by just allocating the linked list in an arena instead (`FixedItemArray` :)?). A large amount of the runtime is spent on capturing these `NativeCallStack`s during node allocation in detail mode, and calling malloc is potentially (I'd measure it) costly in summary mode.
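
To make idea 2 a bit more concrete, here is a rough standalone sketch of what I mean by allocating list nodes from a pool instead of calling malloc per node. This is plain C++ and not HotSpot code: `RegionNode`, `NodePool` and the chunk size of 256 are names and values I made up for illustration, and a real version would use the existing arena facilities in the JVM.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical list node; a stand-in for the tracker's real region type.
struct RegionNode {
  std::size_t base;
  std::size_t size;
  RegionNode* next;
};

// Hypothetical pool: nodes are carved out of large chunks and recycled
// through a free list, so the hot path rarely touches the system allocator.
class NodePool {
  static constexpr std::size_t kNodesPerChunk = 256;  // assumed chunk size
  std::vector<std::unique_ptr<RegionNode[]>> _chunks;
  std::size_t _used_in_last = kNodesPerChunk;  // forces the first chunk allocation
  RegionNode* _free_list = nullptr;

public:
  RegionNode* allocate(std::size_t base, std::size_t size) {
    RegionNode* n;
    if (_free_list != nullptr) {
      // Reuse a previously released node.
      n = _free_list;
      _free_list = n->next;
    } else {
      if (_used_in_last == kNodesPerChunk) {
        // Current chunk is full (or none exists yet): grab a new one.
        _chunks.push_back(std::make_unique<RegionNode[]>(kNodesPerChunk));
        _used_in_last = 0;
      }
      n = &_chunks.back()[_used_in_last++];
    }
    n->base = base;
    n->size = size;
    n->next = nullptr;
    return n;
  }

  // Return a node to the pool; its memory stays owned by the chunk.
  void release(RegionNode* n) {
    n->next = _free_list;
    _free_list = n;
  }

  std::size_t chunk_count() const { return _chunks.size(); }
};
```

With something like this, one system allocation is amortized over 256 nodes, and a released node is handed out again by the next `allocate` call. Whether that recovers most of the gains would need to be measured.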

The remainder of this mail responds to your other points.

>The usefulness question: Either mapping management in NMT is hot, or it isn't. If it isn't, there is no point in optimizing it.

If we have 100 things in the JVM that are up to 17x slower than they need to be, and their computations are spread over the entire lifetime of the process, then how would you figure out that any of them is hot? It's hard to discern what the total performance cost of these things is.

>If it is hot, e.g., because you call os::commit a million times (?), a queue may not work as well as you think. You now accumulate an ever-growing footprint for the queue. So you need to dump the queue at some point. If you do, you lose the advantage of deferring. If you don't dump, you now have a memory leak essentially, and reporting will take a lot longer.

It's not any worse than the memory leak that would exist from keeping the linked list alive. But yes, it could be useful to dump the queue when it grows too large.
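
As a rough illustration of dumping the queue once it grows too large, here is a minimal standalone sketch. It is not the code in the PR: `DeferredTracker`, the `max_pending` threshold, the two event kinds and the `std::map` used as the region store are all assumptions made for this example, and only `MemoryEvent` and `commit_events` borrow their names from the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical event record; the PR's MemoryEvent carries more information.
enum class EventKind { Reserve, Release };

struct MemoryEvent {
  EventKind kind;
  std::size_t base;
  std::size_t size;
};

class DeferredTracker {
  std::vector<MemoryEvent> _queue;              // pending events; the hot path only appends
  std::map<std::size_t, std::size_t> _regions;  // base -> size, updated lazily
  std::size_t _max_pending;                     // assumed threshold that bounds the queue
  std::size_t _flushes = 0;

  void record(const MemoryEvent& e) {
    _queue.push_back(e);
    if (_queue.size() >= _max_pending) {
      commit_events();  // dump the queue before it grows without bound
    }
  }

public:
  explicit DeferredTracker(std::size_t max_pending) : _max_pending(max_pending) {}

  void reserve(std::size_t base, std::size_t size) {
    record(MemoryEvent{EventKind::Reserve, base, size});
  }
  void release(std::size_t base) {
    record(MemoryEvent{EventKind::Release, base, 0});
  }

  // Replay all pending events, in order, into the region store.
  void commit_events() {
    if (_queue.empty()) return;
    for (const MemoryEvent& e : _queue) {
      if (e.kind == EventKind::Reserve) {
        _regions[e.base] = e.size;
      } else {
        _regions.erase(e.base);
      }
    }
    _queue.clear();
    ++_flushes;
  }

  // Reporting entry point: pending work is applied first.
  std::size_t total_reserved() {
    commit_events();
    std::size_t total = 0;
    for (const auto& r : _regions) total += r.second;
    return total;
  }

  std::size_t pending() const { return _queue.size(); }
  std::size_t flush_count() const { return _flushes; }
};
```

The threshold trades footprint against deferral: a small `max_pending` keeps the queue short but moves work back onto the hot path, which is the tension Thomas points out above.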

-------------

PR: https://git.openjdk.org/jdk/pull/13088