RFR: JDK-8291878: NMT: Malloc limits

Thomas Stuefe stuefe at openjdk.org
Tue Aug 16 14:02:53 UTC 2022


(Second try: I withdrew the first patch, https://github.com/openjdk/jdk/pull/9778, to make some other NMT improvements first.)

This PR introduces malloc limits, similar to what `MallocMaxTestWords` was intended to do. `MallocMaxTestWords` is broken, but this switch works because it is based on NMT. If this patch goes in, I would like to remove `MallocMaxTestWords` or, if people really care about it, redirect it to the new switch.

### Why this is useful:

We recently analyzed [JDK-8291919](https://bugs.openjdk.org/browse/JDK-8291919), a jdk11u-specific regression that caused compiler arena OOMs. We had such problems a lot in the past, when our CPU ports were young. They are rarer nowadays, but they still happen.

A switch to limit compiler-related mallocs would have been nice: something that stops the VM with a fatal error in the compiler allocation path once the compiler arena size reaches a certain point. I first tried `MallocMaxTestWords`, but it turned out to be broken because it does not de-account memory when it is freed.
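To illustrate why a limit that never de-accounts freed memory is unusable, here is a minimal sketch with two hypothetical trackers. `CountingOnly`, `Deaccounting` and the helper `survives_churn` are illustrative names, not HotSpot code. A workload that repeatedly allocates and frees the same small buffer never has more than one buffer live, yet the counting-only tracker eventually reports a limit breach.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical tracker that only ever counts up (the behavior described for
// MallocMaxTestWords): frees are ignored, so the total never shrinks.
struct CountingOnly {
  std::size_t total = 0;
  std::size_t limit;
  explicit CountingOnly(std::size_t l) : limit(l) {}
  bool on_malloc(std::size_t n) { total += n; return total <= limit; }  // false = limit hit
  void on_free(std::size_t) {}                                          // no de-accounting
};

// Hypothetical tracker that counts up on malloc and down on free, so it
// measures memory that is actually live.
struct Deaccounting {
  std::size_t live = 0;
  std::size_t limit;
  explicit Deaccounting(std::size_t l) : limit(l) {}
  bool on_malloc(std::size_t n) { live += n; return live <= limit; }
  void on_free(std::size_t n) { live -= n; }
};

// Allocates and frees `size` bytes `rounds` times; returns true if the
// tracker never reported that the limit was exceeded.
template <typename Tracker>
bool survives_churn(Tracker& t, std::size_t size, int rounds) {
  for (int i = 0; i < rounds; i++) {
    if (!t.on_malloc(size)) return false;
    t.on_free(size);
  }
  return true;
}
```

With a 1000-byte limit and 20 rounds of 100-byte malloc/free pairs, the de-accounting tracker never sees more than 100 live bytes, while the counting-only tracker trips on the eleventh allocation.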

We finally managed to get a replay file by reproducing the bug locally and limiting the virtual process size with ulimit, but it was cumbersome. A simple switch along the lines of `MallocMaxTestWords` would have been much better.

In addition to customer scenarios like these, such a switch could be used to add sanity checks to compiler jtreg tests. Maybe we could have caught [JDK-8291919](https://bugs.openjdk.org/browse/JDK-8291919) before shipment.

### How it works:

This patch introduces a new diagnostic switch, `-XX:MallocLimit`, which can be used in two ways:

1. Impose a global limit on the total amount of memory HotSpot is allowed to malloc:

   `-XX:MallocLimit=<size>`

2. Impose a limit on one or more selected NMT categories:

   `-XX:MallocLimit=<category>:<size>[,<category>:<size>...]`


If the switch is set and the VM mallocs more than the limit in total (1) or for a given category (2), it stops with a fatal error. That way we can, for example, cap compiler arenas at a certain maximum in situations where the compiler runs amok, and get a compiler replay file. Here is an example with an artificial compiler bug introduced:


```
thomas@starfish$ ./images/jdk/bin/java -XX:NativeMemoryTracking=summary -XX:+UnlockDiagnosticVMOptions -XX:MallocLimit=compiler:1g -jar /shared/projects/spring-petclinic/target/spring-petclinic-2.5.0-SNAPSHOT.jar
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (mallocTracker.cpp:146), pid=519822, tid=519836
#  guarantee(false) failed: MallocLimit: category "Compiler" reached limit (size: 1073765608, limit: 1073741824)
#
...
# An error report file with more information is saved as:
# /shared/projects/openjdk/jdk-jdk/output-release/hs_err_pid519822.log
#
# Compiler replay data is saved as:
# /shared/projects/openjdk/jdk-jdk/output-release/replay_pid519822.log
#
```
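The two value forms of `-XX:MallocLimit` could be parsed roughly as sketched below. This is a hypothetical illustration, not the actual HotSpot argument parser: `MallocLimitSpec`, `parse_size` and `parse_malloc_limit` are made-up names, category names are stored verbatim without being validated against the real NMT category list, only `k`/`m`/`g` suffixes are assumed, and numeric overflow is not checked.

```cpp
#include <cassert>
#include <cctype>
#include <cstdint>
#include <map>
#include <optional>
#include <string>

// Hypothetical result of parsing -XX:MallocLimit (not HotSpot's data structure).
struct MallocLimitSpec {
  std::uint64_t global_limit = 0;                        // 0 = no global limit
  std::map<std::string, std::uint64_t> category_limits;  // category name -> bytes
};

// Parses "<digits>[k|m|g]" (case-insensitive suffix) into bytes.
// Sketch only: no overflow checks.
static std::optional<std::uint64_t> parse_size(const std::string& s) {
  if (s.empty() || !std::isdigit(static_cast<unsigned char>(s[0]))) return std::nullopt;
  std::size_t pos = 0;
  std::uint64_t value = 0;
  while (pos < s.size() && std::isdigit(static_cast<unsigned char>(s[pos]))) {
    value = value * 10 + static_cast<std::uint64_t>(s[pos] - '0');
    pos++;
  }
  if (pos == s.size()) return value;             // plain byte count
  if (pos + 1 != s.size()) return std::nullopt;  // at most one suffix character
  switch (std::tolower(static_cast<unsigned char>(s[pos]))) {
    case 'k': return value << 10;
    case 'm': return value << 20;
    case 'g': return value << 30;
    default:  return std::nullopt;
  }
}

static std::optional<MallocLimitSpec> parse_malloc_limit(const std::string& arg) {
  MallocLimitSpec spec;
  if (arg.find(':') == std::string::npos) {  // form 1: a single global size
    auto size = parse_size(arg);
    if (!size) return std::nullopt;
    spec.global_limit = *size;
    return spec;
  }
  std::size_t start = 0;                     // form 2: comma-separated category:size pairs
  while (start <= arg.size()) {
    std::size_t end = arg.find(',', start);
    if (end == std::string::npos) end = arg.size();
    std::string item = arg.substr(start, end - start);
    std::size_t colon = item.find(':');
    if (colon == std::string::npos || colon == 0) return std::nullopt;  // missing category
    auto size = parse_size(item.substr(colon + 1));
    if (!size) return std::nullopt;
    spec.category_limits[item.substr(0, colon)] = *size;
    start = end + 1;
  }
  return spec;
}
```

With this sketch, `compiler:1g` maps the `compiler` category to 1073741824 bytes, the same limit value that appears in the error message above.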


### Costs:

Memory cost:

- We don't pay anything if NMT is disabled.

- With NMT enabled, we need 138 bytes in total for the limit counters.

Performance cost:

- We don't pay anything if NMT is disabled.

- We don't pay anything if NMT is enabled but limits are disabled, so this feature only costs something when it is used.

I ran a deliberately unrealistic microbenchmark in which the VM performs billions of mallocs in multiple threads in tight loops. It is unrealistic because the VM normally does other things besides malloc.

If both NMT and limits are enabled, the performance cost depends on whether it's a category-specific limit or a global limit.

For category-specific malloc limits (e.g. `-XX:MallocLimit=compiler:1g`), the overhead is practically nil. That makes sense, since all we do is compare a runtime-constant limit with a value that is probably already in a register at the time of the comparison.
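A category-specific check could look like the following minimal sketch. It assumes a hypothetical layout: one atomic counter and one constant limit per category, plus a flag set once at startup if any limit is configured. `CategoryCounters`, `record_malloc` and the category enum are illustrative names, not HotSpot's, and the sketch returns `false` where the VM would instead stop with a fatal error.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Illustrative category set; real NMT has many more categories.
enum Category { kCompiler, kInternal, kOther, kNumCategories };

struct CategoryCounters {
  std::atomic<std::size_t> used[kNumCategories] = {};  // live malloced bytes per category
  std::size_t limit[kNumCategories] = {};              // set once at startup; 0 = no limit
  bool have_limits = false;                            // true if any limit is configured

  // Accounts an allocation; returns false if it pushed the category over its limit.
  bool record_malloc(Category c, std::size_t size) {
    // fetch_add returns the old value, so the new total is computed locally and
    // the comparison below needs no further access to the shared counter.
    std::size_t now = used[c].fetch_add(size, std::memory_order_relaxed) + size;
    if (!have_limits) return true;  // limits disabled: nothing more to do
    return limit[c] == 0 || now <= limit[c];
  }

  void record_free(Category c, std::size_t size) {
    used[c].fetch_sub(size, std::memory_order_relaxed);
  }
};
```

The only extra work on the limited path is one branch and one comparison against a value that is already at hand, which matches the "practically nil" overhead described above.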

For a global limit (e.g. `-XX:MallocLimit=1g`), there is a performance overhead of about 30% in this benchmark. I improved performance with [JDK-8292072](https://bugs.openjdk.org/browse/JDK-8292072) compared to my first patch (https://github.com/openjdk/jdk/pull/9778#issuecomment-1207808307), but it is still somewhat costly. The reason is that the global allocation size has to be calculated by adding up multiple atomic counters.
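For contrast, here is a sketch of a global check under a similar hypothetical layout. Because only per-category counters exist, the total has to be summed on every checked allocation, which costs one load of a shared atomic per category instead of a single comparison. `GlobalLimitCheck` and the category count of 8 are assumptions for illustration only.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

constexpr int kNumCategories = 8;  // illustrative count, not NMT's real number

struct GlobalLimitCheck {
  std::atomic<std::size_t> used[kNumCategories] = {};  // per-category malloced bytes
  std::size_t global_limit = 0;                        // 0 = no global limit

  // Sums all category counters: one shared-memory load per category.
  // The result is only approximate while other threads are updating counters.
  std::size_t total() const {
    std::size_t sum = 0;
    for (const auto& counter : used) {
      sum += counter.load(std::memory_order_relaxed);
    }
    return sum;
  }

  // Accounts an allocation; returns false if the global total exceeds the limit.
  bool record_malloc(int category, std::size_t size) {
    used[category].fetch_add(size, std::memory_order_relaxed);
    return global_limit == 0 || total() <= global_limit;
  }
};
```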

However, I can easily bring the global-limit cost down to match the category-specific cost by reworking how NMT maintains Arena counters. I have a patch that does that and also simplifies the NMT code a lot. My only problem is that NMT changes are hard to get in, because reviewers are hard to find.

Finally, note that once this patch is in, we can remove `MallocMaxTestWords`, which will give a performance boost since it is one fewer atomic counter we need to update on each malloc.

-------------

Commit messages:
 - Fix Test
 - MallocLimit

Changes: https://git.openjdk.org/jdk/pull/9891/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=9891&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8291878
  Stats: 487 lines in 9 files changed: 480 ins; 3 del; 4 mod
  Patch: https://git.openjdk.org/jdk/pull/9891.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/9891/head:pull/9891

PR: https://git.openjdk.org/jdk/pull/9891


More information about the hotspot-runtime-dev mailing list