RFR: JDK-8321266: Add diagnostic RSS threshold
Thomas Stuefe
stuefe at openjdk.org
Mon Dec 4 14:10:01 UTC 2023
We have `MallocLimit`, a way to trigger errors when reaching a given malloc load threshold. This PR proposes
a complementary switch, `RssLimit`, that does the same based on the Resident Set Size of the process.
---
Motivation:
The main usage for this option is to analyze situations that would lead to an OOM kill of the process. OOM kills can happen at various layers: the process may be either killed by the kernel OOM killer, or the whole container may get scrapped if it uses too much memory. In either case, one has little or no information to go on; often, one does not even know it was the OOM killer, or if the JVM was really responsible. In these situations, getting a voluntary abort *before* the process is killed can give us valuable information we would not get otherwise.
Another use of this feature is testing: specifying an envelope of "reasonable" RSS lets a test check the expected footprint of the JVM. It is also useful as a global, test-wide setting to catch obvious footprint degradations early.
Letting the JVM handle this limit has many advantages:
- Since the limit is artificial, error reporting is not affected. Other mechanisms (e.g. ulimit) are likely to prevent effective error reporting. I usually get torn hs-err files when a limit restriction hits, since error reporting (regrettably) needs dynamic memory and space on the stack to do its work.
- Re-using the normal error reporting mechanism is powerful since:
  - hs-err files already contain lots of information: machine memory status, NMT summary, heap information etc.
  - Using `OnError`, that mechanism is expandable: we can run many further diagnostics like Metaspace or Compiler memory reports, detailed NMT reports, system memory maps, and even heap dumps.
  - Using `ErrorFileToStd(out|err)` will redirect the hs-err file and let us see what's happening in cloud situations where file systems are often ephemeral.
----
Usage:
The limit is given either as an absolute number or as a percentage of the total memory of the machine or the container, e.g.
`-XX:RssLimit=2G` or `-XX:RssLimit=80%`.
If given as a percentage, the JVM will also react to container limit updates.
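To make the two forms concrete, here is a minimal sketch of how such a limit string could be resolved to a byte value. It is not the code from the patch (which lives in HotSpot's C++ sources). The class `RssLimitSketch`, the method `resolveLimit`, the accepted suffixes, and the 1..100 percentage range are all assumptions made for illustration.

```java
/** Hypothetical helper, not part of the patch: resolves "2G" or "80%" to bytes. */
final class RssLimitSketch {

    /**
     * @param spec        limit as written on the command line, e.g. "2G" or "80%"
     * @param totalMemory total memory of the machine or container, in bytes
     *                    (only consulted for the percentage form)
     * @return the effective limit in bytes
     */
    static long resolveLimit(String spec, long totalMemory) {
        String s = spec.trim();
        if (s.isEmpty()) {
            throw new IllegalArgumentException("empty limit");
        }
        if (s.endsWith("%")) {
            int percent = Integer.parseInt(s.substring(0, s.length() - 1));
            // Assumption: only 1..100 is meaningful for a relative limit.
            if (percent <= 0 || percent > 100) {
                throw new IllegalArgumentException("percentage out of range: " + spec);
            }
            // Relative form: the result changes whenever totalMemory changes,
            // e.g. after a container limit update.
            return Math.multiplyExact(totalMemory, (long) percent) / 100;
        }
        // Absolute form. Assumption: K/M/G suffixes (case-insensitive),
        // as commonly used by JVM size options; no suffix means bytes.
        char last = Character.toUpperCase(s.charAt(s.length() - 1));
        long multiplier;
        switch (last) {
            case 'K': multiplier = 1L << 10; break;
            case 'M': multiplier = 1L << 20; break;
            case 'G': multiplier = 1L << 30; break;
            default:  multiplier = 1L; break;
        }
        String digits = (multiplier == 1L) ? s : s.substring(0, s.length() - 1);
        return Math.multiplyExact(Long.parseLong(digits), multiplier);
    }

    public static void main(String[] args) {
        long containerLimit = 4L << 30; // pretend the container limit is 4 GiB
        System.out.println(resolveLimit("2G", containerLimit));
        System.out.println(resolveLimit("80%", containerLimit));
        // After a container limit update to 8 GiB, the relative limit follows.
        System.out.println(resolveLimit("80%", 8L << 30));
    }
}
```

The point of the sketch is the asymmetry: an absolute limit is fixed at startup, while a relative limit has to be recomputed whenever the underlying total changes.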
Example: we run the JVM inside a container as the sole payload process. We limit its RSS to 80% of the container limit, and in case we run into the limit, fire a heap dump:
`java -XX:+UnlockDiagnosticVMOptions -XX:RssLimit=80% '-XX:OnError=jcmd %p GC.heap_dump my-dump' -Xlog:os+rss`
----
Patch:
Implemented for Linux, macOS and Windows. I left out AIX since there we have a long-standing problem: RSS is not easily obtained because the normal memory usage numbers don't include System V shared memory, which is the lion's share of the memory the JVM uses.
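For readers unfamiliar with how RSS can be observed at all: on Linux, one common source is the `VmRSS` field of `/proc/self/status`, which the kernel reports in kB. The sketch below reads it from Java. It only illustrates the concept and makes no claim about how the patch obtains RSS on any platform; the class `LinuxRssSketch` and its methods are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.OptionalLong;

/** Hypothetical, Linux-only illustration of reading the process RSS. */
final class LinuxRssSketch {

    /**
     * Extracts VmRSS from the lines of /proc/<pid>/status and converts the
     * kB value to bytes. Returns empty if the field is missing.
     */
    static OptionalLong parseVmRss(List<String> statusLines) {
        for (String line : statusLines) {
            if (line.startsWith("VmRSS:")) {
                // Typical form: "VmRSS:\t  123456 kB"
                String[] parts = line.substring("VmRSS:".length()).trim().split("\\s+");
                return OptionalLong.of(Long.parseLong(parts[0]) * 1024L);
            }
        }
        return OptionalLong.empty();
    }

    /** Reads the RSS of the current process; throws on systems without /proc. */
    static OptionalLong currentRss() throws IOException {
        return parseVmRss(Files.readAllLines(Path.of("/proc/self/status")));
    }

    public static void main(String[] args) throws IOException {
        System.out.println("RSS in bytes: " + currentRss());
    }
}
```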
-------------
Commit messages:
- wip
- wip
- wip
- RssLimit
Changes: https://git.openjdk.org/jdk/pull/16938/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16938&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8321266
Stats: 395 lines in 13 files changed: 394 ins; 0 del; 1 mod
Patch: https://git.openjdk.org/jdk/pull/16938.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/16938/head:pull/16938
PR: https://git.openjdk.org/jdk/pull/16938