RFR: 8304442: Defer VirtualMemoryTracker work until reporting

Thomas Stuefe stuefe at openjdk.org
Tue Mar 21 16:14:47 UTC 2023


On Sat, 18 Mar 2023 14:50:48 GMT, Johan Sjölen <jsjolen at openjdk.org> wrote:

> Hi,
> 
> The virtual memory tracker of NMT used to do a lot of heavy linked-list calculations whenever memory was reserved, committed, uncommitted or split up. However, the results of these calculations are only used when creating a native memory report. Let's not slow down the JVM unnecessarily; we can do this work at report time instead.
> 
> To achieve this I've replaced the public API with a work-queue solution: each work item is appended to a `GrowableArray`, and a new `commit_events` method, which we call internally when needed, does the actual work.
> 
> I measured the gains in efficiency through the use of Valgrind's Cachegrind tool. I ran a `linux-x64` build with the following source code:
> 
> 
> public class Test {
>     public static void main(String[] args) {
>     }
> }
> 
> 
> These are the total cycles executed by `os::commit` and `os::reserve` as estimated by Valgrind over the entire run of the program. The tests were only run once.
> 
> 
> java -XX:NativeMemoryTracking=detail Test.java
> 
> os::commit_memory
> old     | new     | old / new
> 935238  | 578979  | 1.6
>
> os::reserve_memory
> old     | new     | old / new
> 53628   | 21825   | 2.5
>
> java -XX:NativeMemoryTracking=summary Test.java
>
> os::commit_memory
> old     | new     | old / new
> 1033701 | 59974   | 17.2
>
> os::reserve_memory
> old     | new     | old / new
> 10067   | 2016    | 5.0
>
> In summary mode the gains are largest, because no `NativeCallStack` is captured in that mode, so the bookkeeping this change defers makes up a much larger share of the cost.
> 
> There should also be some memory savings, since a `MemoryEvent` (64 bytes) is smaller than a `ReservedRegion` (96 bytes), at least until `commit_events()` runs.
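The deferral described in the PR can be modelled roughly as follows. This is a minimal sketch, not the code from the PR: only `MemoryEvent` and `commit_events` are names taken from the description, while `EventKind`, `DeferredTracker`, `record` and `pending_count` are hypothetical, `std::vector` stands in for `GrowableArray`, and the replay only updates byte totals instead of inserting, splitting and merging region nodes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical event kinds; the real NMT event layout is not shown here.
enum class EventKind { Reserve, Commit, Uncommit, Release };

// Hypothetical stand-in for the queued MemoryEvent.
struct MemoryEvent {
  EventKind kind;
  std::uintptr_t addr;
  std::size_t size;
};

class DeferredTracker {
 public:
  // Hot path (reserve/commit/uncommit/release): just append the event.
  void record(EventKind kind, std::uintptr_t addr, std::size_t size) {
    pending_.push_back(MemoryEvent{kind, addr, size});
  }

  // Reporting path: replay queued events in arrival order, then clear them.
  void commit_events() {
    for (const MemoryEvent& e : pending_) {
      apply(e);
    }
    pending_.clear();
  }

  // Queries flush the queue first so they never see stale totals.
  std::size_t reserved_bytes() {
    commit_events();
    return reserved_;
  }

  std::size_t committed_bytes() {
    commit_events();
    return committed_;
  }

  std::size_t pending_count() const { return pending_.size(); }

 private:
  // Simplification: only byte totals are kept. A real tracker would
  // insert, split and merge region nodes here.
  void apply(const MemoryEvent& e) {
    switch (e.kind) {
      case EventKind::Reserve:
        reserved_ += e.size;
        break;
      case EventKind::Release:
        assert(reserved_ >= e.size);
        reserved_ -= e.size;
        break;
      case EventKind::Commit:
        committed_ += e.size;
        break;
      case EventKind::Uncommit:
        assert(committed_ >= e.size);
        committed_ -= e.size;
        break;
    }
  }

  std::vector<MemoryEvent> pending_;  // stands in for GrowableArray
  std::size_t reserved_ = 0;
  std::size_t committed_ = 0;
};
```

The hot-path cost becomes a single append, and all region work moves into whichever query triggers `commit_events()`. In a real tracker that replay would also allocate region nodes at report time, which is exactly what the review objects to.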

> Hi Thomas, thanks for the in-depth review!
> 
> Alright, if it's a hard requirement that _reporting_ must be fast and not allocate memory, then this particular solution is a deal breaker.
> 
> Regarding your ideas:
> 
>     1. `commit_memory` is often called by metaspace, but `pd_create_stackguard_pages` is a close 2nd.
> 
>     2. I actually think we can get most of the gains from this PR just by allocating the linked list in an arena instead (`FixedItemArray` :)?). A large share of the runtime is spent creating these `NativeCallStack`s during node allocation in detail mode, and malloc'ing is potentially costly in summary mode (I'd measure it).
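The arena idea can be illustrated with a fixed-size slot pool for linked-list nodes. This is a hypothetical sketch: `FixedItemPool` is neither HotSpot's `FixedItemArray` (whose API is not shown in this thread) nor its `Arena`. It only demonstrates the general technique of carving nodes out of large chunks and recycling freed slots through an intrusive free list, so that most node allocations are a pointer pop rather than a `malloc`.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <utility>
#include <vector>

// Hypothetical fixed-item pool; NOT HotSpot's FixedItemArray or Arena API.
// Nodes are carved out of chunks of ChunkItems slots; freed slots go onto an
// intrusive free list and are handed out again before a new chunk is made.
template <typename T, std::size_t ChunkItems = 64>
class FixedItemPool {
 public:
  FixedItemPool() = default;
  FixedItemPool(const FixedItemPool&) = delete;
  FixedItemPool& operator=(const FixedItemPool&) = delete;

  template <typename... Args>
  T* allocate(Args&&... args) {
    if (free_list_ == nullptr) {
      grow();  // one chunk allocation instead of one malloc per node
    }
    Slot* slot = free_list_;
    free_list_ = slot->next;
    ++live_;
    return ::new (static_cast<void*>(slot->storage)) T(std::forward<Args>(args)...);
  }

  void deallocate(T* p) {
    assert(p != nullptr && live_ > 0);
    p->~T();
    Slot* slot = reinterpret_cast<Slot*>(p);  // storage sits at offset 0 of Slot
    slot->next = free_list_;
    free_list_ = slot;
    --live_;
  }

  std::size_t live() const { return live_; }
  std::size_t chunks() const { return chunks_.size(); }

 private:
  union Slot {
    Slot* next;                                   // valid while the slot is free
    alignas(T) unsigned char storage[sizeof(T)];  // holds a T while in use
  };

  void grow() {
    chunks_.push_back(std::make_unique<Slot[]>(ChunkItems));
    Slot* chunk = chunks_.back().get();
    for (std::size_t i = 0; i < ChunkItems; ++i) {
      chunk[i].next = free_list_;
      free_list_ = &chunk[i];
    }
  }

  std::vector<std::unique_ptr<Slot[]>> chunks_;
  Slot* free_list_ = nullptr;
  std::size_t live_ = 0;
};
```

The pool never gives chunks back and does not run destructors for nodes still live when it is destroyed. That is acceptable for trivially destructible region nodes but would need care for anything else.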

Makes sense. I need to get that PR done, but I'm busy with Lilliput atm.

I also dimly remember looking at NativeCallStack and finding that it was copied around too much. Once it's created it's immutable, so one could just pass references or pointers around instead, or keep them with ref counting or similar. Not sure if this still applies to the current NMT, though.
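Sharing an immutable call stack instead of copying it could look roughly like this. `CallStack`, `SharedStack`, `Region` and `make_stack` are hypothetical stand-ins for `NativeCallStack` and the NMT region types, the depth of 4 is arbitrary, and `std::shared_ptr` is used only to illustrate the ref-counting option mentioned above.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>

// Hypothetical stand-in for NativeCallStack: a fixed number of PCs.
// There are no mutators, so a constructed stack never changes.
class CallStack {
 public:
  static constexpr std::size_t kDepth = 4;  // arbitrary depth for the sketch

  explicit CallStack(const std::array<const void*, kDepth>& frames)
      : frames_(frames) {}

  const void* frame(std::size_t i) const {
    assert(i < kDepth);
    return frames_[i];
  }

 private:
  std::array<const void*, kDepth> frames_;
};

// Immutable, shared ownership: copying the handle bumps a reference count.
using SharedStack = std::shared_ptr<const CallStack>;

// Hypothetical region record. Copying it copies one pointer (plus a
// reference-count increment) instead of the whole frame array.
struct Region {
  std::uintptr_t base;
  std::size_t size;
  SharedStack stack;
};

// Build a stack once; every region created from the same call site can
// then point at the same object.
inline SharedStack make_stack(const std::array<const void*, CallStack::kDepth>& frames) {
  return std::make_shared<CallStack>(frames);
}
```

`std::shared_ptr` uses atomic reference counts, which carry their own cost, and HotSpot code generally avoids the C++ standard library. An actual change would therefore more plausibly hand out plain pointers into a long-lived, deduplicated stack store; the sketch only shows why immutability makes sharing safe.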

-------------

PR Comment: https://git.openjdk.org/jdk/pull/13088#issuecomment-1478139400


More information about the hotspot-runtime-dev mailing list