RFR: JDK-8291878: NMT: Diagnostic malloc limits

Thomas Stuefe stuefe at openjdk.org
Fri Aug 5 17:35:41 UTC 2022


This PR introduces malloc limits, similar to what `MallocMaxTestWords` was intended to provide. `MallocMaxTestWords` is broken, but the new switch works because it is based on NMT. Once this is in, I'd like to remove `MallocMaxTestWords` or, if people really care, redirect it to the new switch.

----

Why this is useful:

We recently analyzed [JDK-8291919](https://bugs.openjdk.org/browse/JDK-8291919), a jdk11u-specific regression that caused a compiler arena to explode. We used to see such problems a lot in the past, when our CPU ports were young. They are rarer nowadays but still happen.

A switch to limit compiler-related mallocs would have been nice: something that makes the VM stop with a fatal error in the compiler allocation path once the compiler arena size reaches a certain point. I first tried `MallocMaxTestWords`, but that turned out to be broken since it does not de-account memory allocations when they are freed.

We finally managed to get a replay file by reproducing the bug locally and ulimit-ing the virtual process size, but it was cumbersome. A simple switch like `MallocMaxTestWords` would have been much better.

In addition to customer scenarios like these, such a switch could be used to add sanity checks to compiler jtreg tests. Maybe we could have caught [JDK-8291919](https://bugs.openjdk.org/browse/JDK-8291919) before shipment.

-----

How it works:

The patch introduces a new diagnostic switch, `-XX:MallocLimit`. The switch can be used in two ways:

1. Impose a global limit on the total size HotSpot is allowed to malloc:

-XX:MallocLimit=<size>

2. Impose a limit on a selected NMT category, or on multiple NMT categories:

-XX:MallocLimit=<category>:<size>[,<category>:<size>...]
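
To make the two forms concrete, here is a minimal sketch of how such an option value could be parsed. It is not the parser from the patch: `MallocLimitSpec`, `parse_size` and `parse_malloc_limit` are hypothetical names, the case-insensitive `k`/`m`/`g` suffix handling is an assumption modelled on the `1g` used in the example run, and category names are stored verbatim without checking them against the real NMT categories.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <map>
#include <optional>
#include <string>

// Hypothetical result type: either one global limit or per-category limits.
struct MallocLimitSpec {
    std::optional<uint64_t> global_limit;
    std::map<std::string, uint64_t> category_limits;
};

// Parse "<digits>[k|m|g]" into bytes. Returns nullopt on malformed input.
// Overflow is not checked in this sketch.
std::optional<uint64_t> parse_size(const std::string& s) {
    std::size_t i = 0;
    uint64_t value = 0;
    while (i < s.size() && std::isdigit(static_cast<unsigned char>(s[i]))) {
        value = value * 10 + static_cast<uint64_t>(s[i] - '0');
        ++i;
    }
    if (i == 0) return std::nullopt;             // no digits at all
    if (i == s.size()) return value;             // plain byte count
    if (i + 1 != s.size()) return std::nullopt;  // more than one trailing char
    switch (std::tolower(static_cast<unsigned char>(s[i]))) {
        case 'k': return value << 10;
        case 'm': return value << 20;
        case 'g': return value << 30;
        default:  return std::nullopt;
    }
}

// Parse either "<size>" or "<category>:<size>[,<category>:<size>...]".
std::optional<MallocLimitSpec> parse_malloc_limit(const std::string& arg) {
    MallocLimitSpec spec;
    if (arg.find(':') == std::string::npos) {
        // Form 1: a single global limit.
        auto size = parse_size(arg);
        if (!size) return std::nullopt;
        spec.global_limit = size;
        return spec;
    }
    // Form 2: comma-separated <category>:<size> pairs.
    std::size_t start = 0;
    while (start <= arg.size()) {
        std::size_t end = arg.find(',', start);
        if (end == std::string::npos) end = arg.size();
        const std::string item = arg.substr(start, end - start);
        const std::size_t colon = item.find(':');
        if (colon == std::string::npos || colon == 0) return std::nullopt;
        auto size = parse_size(item.substr(colon + 1));
        if (!size) return std::nullopt;
        spec.category_limits[item.substr(0, colon)] = *size;
        start = end + 1;
    }
    return spec;
}
```

With this sketch, `parse_malloc_limit("1g")` yields a global limit of 1073741824 bytes, and `parse_malloc_limit("compiler:1g")` yields a single entry for `compiler` with the same value.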


If the switch is set and the VM mallocs more than the limit, either in total (1) or for the given category (2), it stops with a fatal error. That way we can, for example, cap compiler arenas at a certain maximum in situations where the compiler runs amok, and get a compiler replay file. See here, with an artificial compiler bug introduced:


thomas@starfish$ ./images/jdk/bin/java -XX:NativeMemoryTracking=summary -XX:+UnlockDiagnosticVMOptions -XX:MallocLimit=compiler:1g -jar /shared/projects/spring-petclinic/target/spring-petclinic-2.5.0-SNAPSHOT.jar
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (mallocTracker.cpp:146), pid=519822, tid=519836
#  guarantee(false) failed: MallocLimit: category "Compiler" reached limit (size: 1073765608, limit: 1073741824) 
#
...
# An error report file with more information is saved as:
# /shared/projects/openjdk/jdk-jdk/output-release/hs_err_pid519822.log
#
# Compiler replay data is saved as:
# /shared/projects/openjdk/jdk-jdk/output-release/replay_pid519822.log
#
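
The behaviour shown in this run can be illustrated with a small accounting sketch. It is only an illustration under stated assumptions, not the code in `mallocTracker.cpp`: `CategoryAccount` and `MallocLimitReached` are hypothetical names, the `>=` comparison is a guess based on the "reached limit" wording, and the sketch throws an exception where the VM stops with a fatal error. The point it demonstrates is that frees are de-accounted, so only the live size counts against the limit.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-in for the VM's fatal error.
struct MallocLimitReached : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Tracks the live malloc size of one category against an optional limit.
class CategoryAccount {
public:
    // A limit of 0 means "no limit" in this sketch.
    CategoryAccount(std::string name, uint64_t limit)
        : name_(std::move(name)), limit_(limit) {}

    // Account for a new allocation, then check the limit.
    void record_malloc(std::size_t size) {
        current_ += size;
        if (limit_ != 0 && current_ >= limit_) {
            throw MallocLimitReached(
                "MallocLimit: category \"" + name_ + "\" reached limit (size: " +
                std::to_string(current_) + ", limit: " +
                std::to_string(limit_) + ")");
        }
    }

    // De-account a freed allocation so it no longer counts against the limit.
    void record_free(std::size_t size) { current_ -= size; }

    uint64_t current() const { return current_; }

private:
    std::string name_;
    uint64_t limit_;
    uint64_t current_ = 0;
};
```

With a limit of 100 bytes, allocating 60, freeing 60 and allocating 60 again stays under the limit because the free was de-accounted; a further 60-byte allocation brings the live size to 120 and trips the check.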

-----

The patch:
- adds the option and its handling to NMT
- adds regression tests.

-------------

Commit messages:
 - MallocLimit

Changes: https://git.openjdk.org/jdk/pull/9778/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=9778&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8291878
  Stats: 487 lines in 9 files changed: 480 ins; 3 del; 4 mod
  Patch: https://git.openjdk.org/jdk/pull/9778.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/9778/head:pull/9778

PR: https://git.openjdk.org/jdk/pull/9778

