RFR: JDK-8321266: Add diagnostic RSS threshold
David Holmes
dholmes at openjdk.org
Tue Dec 5 05:52:43 UTC 2023
On Sun, 3 Dec 2023 12:51:24 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:
> We have `MallocLimit`, a way to trigger errors when reaching a given malloc load threshold. This PR proposes
> a complementary switch, `RssLimit`, that does the same based on the Resident Set Size of the process.
>
> ---
>
> Motivation:
>
> The main usage for this option is to analyze OOM kills. OOM kills can happen at various layers: the process may be either killed by the kernel OOM killer, or the whole container may get scrapped if it uses too much memory.
>
> One rarely has any information on the nature of the OOM, or if there even was one, and if yes, if the JVM was the culprit or just an innocent bystander. In these situations, getting a voluntary abort *before* the process gets killed from outside can give us valuable information.
>
> Another use of this feature can be testing: specifying an envelope of "reasonable" RSS for testing to check the expected footprint of the JVM. Also useful for a global test-wide setting to catch obvious footprint degradations early.
>
> Letting the JVM handle this limit has several advantages:
>
> - since the limit is artificial, error reporting is not affected. Other mechanisms (e.g. ulimit) are likely to prevent effective error reporting. I usually get torn hs-err files when a limit restriction hits since error reporting needs dynamic memory (regrettably) and space on the stack to do its work.
>
> - Re-using the normal error reporting mechanism is powerful since:
>   - hs-err files contain lots of information already: machine memory status, NMT summary, heap information etc.
>   - Using `OnError`, that mechanism is expandable: we can run many further diagnostics like Metaspace or Compiler memory reports, detailed NMT reports, System memory maps, and even heap dumps.
>   - Using `ErrorLogToStd(out|err)` will redirect the hs-err file and let us see what's happening in cloud situations where file systems are often ephemeral.
>
> ----
>
> Usage:
>
> Limit is given either as an absolute number or as a relative percentage of the total memory of the machine or the container, e.g.
> `-XX:RssLimit=2G` or `-XX:RssLimit=80%`.
>
> If given as a percentage, the JVM will also react to container limit updates.
>
> Example: we run the JVM inside a container as the sole payload process. Limit its RSS to 80% of the container limit, and in case we run into the limit, fire a heap dump:
>
> `java -XX:+UnlockDiagnosticVMOptions -XX:RssLimit=80% '-XX:OnError=jcmd %p GC.heap_dump my-dump' -Xlog:os+rss`
>
> ----
>
> Patch:
>
> Implemented for Linux, MacOS and Windows. Left out AIX since there we have a long-...
Hi Thomas,
I've taken a first pass through this and it seems okay in principle. A number of initial comments/suggestions below.
Thanks.
src/hotspot/os/aix/os_aix.cpp line 1299:
> 1297: }
> 1298:
> 1299: // Unimplemented
Is this temporary or does AIX not support a way to get RSS?
src/hotspot/os/bsd/os_bsd.cpp line 1473:
> 1471: result = info.resident_size;
> 1472: }
> 1473: #endif // __APPLE__
Hmmm so no general BSD support either ...
src/hotspot/share/runtime/globals.hpp line 1372:
> 1370: "memory size (e.g. \"2G\") or as a percentage of " \
> 1371: "the total available memory on this machine or in this " \
> 1372: "container (e.g. \"-XX:RssLimit=80%%\"). A value of 0 (default) " \
It would be more usual to take this as a fraction of available memory e.g. 0.8.
That simplifies the parsing and validation logic.
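To illustrate the fraction suggestion, here is a minimal standalone sketch. It is not HotSpot code: `parse_fraction` and `limit_from_fraction` are hypothetical names, and it uses `std::strtod` instead of HotSpot's own argument parsing. It assumes a valid fraction lies in (0, 1]; a value of 0 meaning "disabled" would need separate handling.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <cstdlib>

// Hypothetical helper: parse a fraction in (0, 1], e.g. "0.8".
// Returns true and stores the value in *out on success.
static bool parse_fraction(const char* text, double* out) {
  assert(text != nullptr && out != nullptr);
  char* end = nullptr;
  errno = 0;
  const double v = std::strtod(text, &end);
  if (end == text || *end != '\0' || errno == ERANGE) {
    return false;  // empty input, trailing characters, or out of range
  }
  if (!(v > 0.0 && v <= 1.0)) {
    return false;  // outside (0, 1]; this form also rejects NaN
  }
  *out = v;
  return true;
}

// Hypothetical helper: derive an absolute limit in bytes from total memory.
static std::size_t limit_from_fraction(std::size_t total_bytes, double fraction) {
  return static_cast<std::size_t>(static_cast<double>(total_bytes) * fraction);
}
```

With a single numeric form there is no `%` suffix to detect, so validation reduces to one range check.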
src/hotspot/share/runtime/globals.hpp line 1378:
> 1376: "If RssLimit is set, interval, in ms, at which the JVM will " \
> 1377: "check the process resident set size." \
> 1378: range(10, UINT_MAX)) \
Can we actually handle enrolling a periodic task with a UINT_MAX interval?
src/hotspot/share/runtime/os.hpp line 774:
> 772:
> 773: // Returns the process working set size (rss); 0 if unsupported.
> 774: static size_t get_rss();
Nit: as it is an acronym `get_RSS` would be better IMO - just for this accessor; no need to rename everything to RSS.
src/hotspot/share/runtime/threads.cpp line 775:
> 773: if (RssLimit != nullptr) {
> 774: RssWatcher::initialize(RssLimit);
> 775: }
So I think if we are on AIX or regular BSD then we should at least give a warning that the flag will be ignored, and actually ignore it.
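A rough standalone sketch of that behavior, under stated assumptions: `should_start_rss_watcher` is a hypothetical helper, `limit_flag` stands in for the `RssLimit` option value, and `probed_rss` stands in for the result of `os::get_rss()`, which per the patch returns 0 when unsupported. The warning text is illustrative only.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical decision helper (not HotSpot code).
// Returns true if the watcher should be started. If the user set the flag
// but the platform cannot report RSS, fills *warning_out and returns false
// so that the flag is effectively ignored.
static bool should_start_rss_watcher(const char* limit_flag,
                                     std::size_t probed_rss,
                                     std::string* warning_out) {
  assert(warning_out != nullptr);
  warning_out->clear();
  if (limit_flag == nullptr) {
    return false;  // flag not set: nothing to do, nothing to warn about
  }
  if (probed_rss == 0) {
    *warning_out = "RssLimit is not supported on this platform; ignoring it.";
    return false;
  }
  return true;
}
```

The caller would print the warning, if any, and skip `RssWatcher::initialize` when the function returns false.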
src/hotspot/share/services/rsswatch.cpp line 63:
> 61:
> 62: void update_limit() {
> 63: const size_t limit_100 = os::physical_memory();
Can this change dynamically?
src/hotspot/share/services/rsswatch.cpp line 113:
> 111: } else {
> 112: if (!parse_integer(s, (char**)&s, &limit) || limit == 0) {
> 113: vm_exit_during_initialization("Failed to parse RssLimit", "Not a valid limit size");
You specified that zero turns the feature off, but here a limit of 0 is rejected as a parse failure.
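One way to keep the two cases apart, as a standalone sketch only: `LimitParse` and `parse_limit_bytes` are hypothetical names, the sketch handles plain byte counts without size suffixes, and it uses `std::strtoull` rather than HotSpot's `parse_integer`.

```cpp
#include <cassert>
#include <cctype>
#include <cerrno>
#include <cstdlib>

// Hypothetical three-way result: zero is a valid "off" setting, not an error.
enum class LimitParse { Disabled, Enabled, Invalid };

// Hypothetical helper: parse a plain decimal byte count.
static LimitParse parse_limit_bytes(const char* text, unsigned long long* out) {
  assert(out != nullptr);
  // Require a leading digit so that signs and leading whitespace, which
  // strtoull would otherwise accept, are rejected.
  if (text == nullptr || !std::isdigit(static_cast<unsigned char>(*text))) {
    return LimitParse::Invalid;
  }
  char* end = nullptr;
  errno = 0;
  const unsigned long long v = std::strtoull(text, &end, 10);
  if (*end != '\0' || errno == ERANGE) {
    return LimitParse::Invalid;  // trailing characters or overflow
  }
  if (v == 0) {
    return LimitParse::Disabled;  // documented default: feature is off
  }
  *out = v;
  return LimitParse::Enabled;
}
```

Only `Invalid` would then lead to `vm_exit_during_initialization`; `Disabled` would simply skip the watcher.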
src/hotspot/share/services/rsswatch.hpp line 2:
> 1: /*
> 2: * Copyright (c) 1999, 2023, Oracle and/or its affiliates. All rights reserved.
Copyright should not include 1999.
src/hotspot/share/services/rsswatch.hpp line 41:
> 39: };
> 40:
> 41: #endif // OS_LINUX_RSSWATCH_HPP
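The comment is wrong: the guard name `OS_LINUX_RSSWATCH_HPP` does not match this file's location under `share/services`.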
-------------
Changes requested by dholmes (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/16938#pullrequestreview-1763989601
PR Review Comment: https://git.openjdk.org/jdk/pull/16938#discussion_r1414907775
PR Review Comment: https://git.openjdk.org/jdk/pull/16938#discussion_r1414908253
PR Review Comment: https://git.openjdk.org/jdk/pull/16938#discussion_r1414904350
PR Review Comment: https://git.openjdk.org/jdk/pull/16938#discussion_r1414909274
PR Review Comment: https://git.openjdk.org/jdk/pull/16938#discussion_r1414896587
PR Review Comment: https://git.openjdk.org/jdk/pull/16938#discussion_r1414910546
PR Review Comment: https://git.openjdk.org/jdk/pull/16938#discussion_r1414901304
PR Review Comment: https://git.openjdk.org/jdk/pull/16938#discussion_r1414905726
PR Review Comment: https://git.openjdk.org/jdk/pull/16938#discussion_r1414906442
PR Review Comment: https://git.openjdk.org/jdk/pull/16938#discussion_r1414906583