RFR: JDK-8291878: NMT: Diagnostic malloc limits
Thomas Stuefe
stuefe at openjdk.org
Fri Aug 5 17:35:41 UTC 2022
This PR introduces malloc limits, similar to what `MallocMaxTestWords` was intended to provide. `MallocMaxTestWords` is broken, but the new switch works because it is based on NMT. Once this is in, I'd like to remove `MallocMaxTestWords` or, if people really care, redirect it to the new switch.
----
Why this is useful:
We recently analyzed [JDK-8291919](https://bugs.openjdk.org/browse/JDK-8291919), a jdk11u-specific regression that caused a compiler arena to explode. We used to see such problems a lot in the past, when our CPU ports were young. They are rarer nowadays but still happen.
A switch to limit compiler-related mallocs would have been nice: something that causes the VM to stop with a fatal error in the compiler allocation path once the compiler arena size reaches a certain point. I first tried `MallocMaxTestWords`, but that turned out to be broken since it does not de-account memory when allocations are freed.
We finally managed to get a retry file by reproducing the bug locally and ulimit-ing the virtual process size, but it was cumbersome. A simple switch like `MallocMaxTestWords` would have been much better.
In addition to customer scenarios like these, such a switch could be used to add sanity checks to compiler jtreg tests. Maybe we could have caught [JDK-8291919](https://bugs.openjdk.org/browse/JDK-8291919) before shipment.
-----
How it works:
The patch introduces a new diagnostic switch `-XX:MallocLimit`. That switch can be used in two ways:
1. Impose a total global limit on the size HotSpot is allowed to malloc:
-XX:MallocLimit=<size>
2. Impose a limit on a selected NMT category, or on multiple NMT categories:
-XX:MallocLimit=<category>:<size>[,<category>:<size>...]
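To make the two forms concrete, here is a minimal standalone sketch of how such an option value could be split into a global limit or per-category limits. It is not the parser from the patch: `MallocLimits`, `parse_size` and `parse_malloc_limit` are hypothetical names, the `k`/`m`/`g` suffix handling is an assumption, and category names are not validated against the real NMT categories.

```cpp
#include <cctype>
#include <cstdint>
#include <cstdlib>
#include <map>
#include <optional>
#include <sstream>
#include <string>

// Hypothetical result type: either one global limit or per-category limits.
struct MallocLimits {
    std::optional<uint64_t> total;                // form 1: <size>
    std::map<std::string, uint64_t> by_category;  // form 2: <category>:<size>,...
};

// Parse "<digits>[k|m|g]" into bytes. Returns nullopt on malformed input.
// Assumption: binary suffixes, case-insensitive; overflow is not checked.
std::optional<uint64_t> parse_size(const std::string& s) {
    if (s.empty() || !std::isdigit(static_cast<unsigned char>(s[0]))) {
        return std::nullopt;
    }
    char* end = nullptr;
    uint64_t value = std::strtoull(s.c_str(), &end, 10);
    std::string suffix(end);
    if (suffix.empty()) return value;
    if (suffix.size() != 1) return std::nullopt;
    switch (std::tolower(static_cast<unsigned char>(suffix[0]))) {
        case 'k': return value << 10;
        case 'm': return value << 20;
        case 'g': return value << 30;
        default:  return std::nullopt;
    }
}

// Parse the text after "-XX:MallocLimit=".
std::optional<MallocLimits> parse_malloc_limit(const std::string& arg) {
    MallocLimits limits;
    if (arg.find(':') == std::string::npos) {
        // Form 1: a single global limit.
        auto size = parse_size(arg);
        if (!size) return std::nullopt;
        limits.total = *size;
        return limits;
    }
    // Form 2: comma-separated <category>:<size> pairs.
    std::stringstream ss(arg);
    std::string item;
    while (std::getline(ss, item, ',')) {
        auto colon = item.find(':');
        if (colon == std::string::npos || colon == 0) return std::nullopt;
        auto size = parse_size(item.substr(colon + 1));
        if (!size) return std::nullopt;
        limits.by_category[item.substr(0, colon)] = *size;
    }
    return limits;
}
```

With this sketch, `parse_malloc_limit("1g")` yields a global limit of 1073741824 bytes, and `parse_malloc_limit("compiler:1g")` yields a single per-category entry with the same value.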
If the switch is set and the VM mallocs more than the limit, either in total (1) or for the given category (2), it will now stop with a fatal error. That way we can, for example, limit compiler arenas to a certain maximum in situations where the compiler runs amok, and get a compiler replay file. See here, with an artificial compiler bug introduced:
thomas@starfish$ ./images/jdk/bin/java -XX:NativeMemoryTracking=summary -XX:+UnlockDiagnosticVMOptions -XX:MallocLimit=compiler:1g -jar /shared/projects/spring-petclinic/target/spring-petclinic-2.5.0-SNAPSHOT.jar
#
# A fatal error has been detected by the Java Runtime Environment:
#
# Internal Error (mallocTracker.cpp:146), pid=519822, tid=519836
# guarantee(false) failed: MallocLimit: category "Compiler" reached limit (size: 1073765608, limit: 1073741824)
#
...
# An error report file with more information is saved as:
# /shared/projects/openjdk/jdk-jdk/output-release/hs_err_pid519822.log
#
# Compiler replay data is saved as:
# /shared/projects/openjdk/jdk-jdk/output-release/replay_pid519822.log
#
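The check behind that error can be pictured with a small standalone model. This is a sketch under assumptions, not the code in `mallocTracker.cpp`: the class `MallocAccounting` and its methods are hypothetical, the comparison is assumed to be strictly "greater than", and an exception stands in for the VM's fatal error. The point it illustrates is that the counter is decremented on free, which is what `MallocMaxTestWords` was missing.

```cpp
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical model of per-category malloc accounting with a limit check.
class MallocAccounting {
public:
    void set_category_limit(const std::string& category, uint64_t limit) {
        limits_[category] = limit;
    }

    // Account an allocation and fail if the category exceeds its limit.
    void on_malloc(const std::string& category, uint64_t size) {
        uint64_t& used = used_[category];
        used += size;
        auto it = limits_.find(category);
        if (it != limits_.end() && used > it->second) {
            // Stand-in for the VM's fatal error; wording mirrors the log above.
            throw std::runtime_error(
                "MallocLimit: category \"" + category +
                "\" reached limit (size: " + std::to_string(used) +
                ", limit: " + std::to_string(it->second) + ")");
        }
    }

    // De-account on free, so only live memory counts against the limit.
    void on_free(const std::string& category, uint64_t size) {
        used_[category] -= size;
    }

    uint64_t used(const std::string& category) const {
        auto it = used_.find(category);
        return it == used_.end() ? 0 : it->second;
    }

private:
    std::map<std::string, uint64_t> used_;    // live bytes per category
    std::map<std::string, uint64_t> limits_;  // configured limits per category
};
```

A global limit would work the same way with a single counter across all categories. Because freed memory is subtracted, a workload that repeatedly allocates and releases compiler arenas stays under the limit as long as its live footprint does.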
-----
The patch:
- adds the option and its handling to NMT
- adds regression tests.
-------------
Commit messages:
- MallocLimit
Changes: https://git.openjdk.org/jdk/pull/9778/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=9778&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8291878
Stats: 487 lines in 9 files changed: 480 ins; 3 del; 4 mod
Patch: https://git.openjdk.org/jdk/pull/9778.diff
Fetch: git fetch https://git.openjdk.org/jdk pull/9778/head:pull/9778
PR: https://git.openjdk.org/jdk/pull/9778