RFR: JDK-8321266: Add diagnostic RSS threshold [v3]
Thomas Stuefe
stuefe at openjdk.org
Mon Jan 22 07:24:28 UTC 2024
On Wed, 6 Dec 2023 08:13:55 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:
>> We have `MallocLimit`, a way to trigger errors when reaching a given malloc load threshold. This PR proposes
>> a complementary switch, `RssLimit`, that does the same based on the Resident Set Size of the process.
>>
>> ---
>>
>> Motivation:
>>
>> The main use of this option is analyzing OOM kills. OOM kills can happen at various layers: the process may be killed by the kernel OOM killer, or the whole container may be killed if it uses too much memory.
>>
>> One rarely has any information on the nature of the OOM: whether there even was one and, if so, whether the JVM was the culprit or just an innocent bystander. In these situations, a voluntary abort *before* the process gets killed from outside can give us valuable information.
>>
>> Another use of this feature is testing: specifying an envelope of "reasonable" RSS to check the expected footprint of the JVM. It is also useful as a global, test-wide setting to catch obvious footprint regressions early.
>>
>> Letting the JVM handle this limit has several advantages:
>>
>> - Since the limit is artificial, error reporting is not affected. Other mechanisms (e.g. ulimit) are likely to prevent effective error reporting: I usually get truncated hs-err files when such a limit hits, since error reporting needs dynamic memory (regrettably) and stack space to do its work.
>>
>> - Reusing the normal error reporting mechanism is powerful since:
>>   - hs-err files already contain lots of information: machine memory status, NMT summary, heap information, etc.
>>   - Using `OnError`, the mechanism is extensible: we can run further diagnostics like Metaspace or compiler memory reports, detailed NMT reports, system memory maps, and even heap dumps.
>>   - Using `ErrorFileToStd(out|err)` redirects the hs-err file and lets us see what's happening in cloud environments, where file systems are often ephemeral.
>>
>> ----
>>
>> Usage:
>>
>> The limit is given either as an absolute size or as a percentage of the total memory of the machine or the container, e.g.
>> `-XX:RssLimit=2G` or `-XX:RssLimit=80%`.
>>
>> If the limit is given as a percentage, the JVM will also react to container limit updates.
>>
>> Example: we run the JVM inside a container as the sole payload process. Limit its RSS to 80% of the container limit and, if we hit that limit, trigger a heap dump:
>>
>> `java -XX:+UnlockDiagnosticVMOptions -XX:RssLimit=80% '-XX:OnError=jcmd %p GC.heap_dump my-dump' -Xlog:os+rss`
>>
>> ----
>>
>> Patch:
>>
>> Im...
>
> Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision:
>
> Add specific percentage switch
not yet
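
To make the quoted usage semantics concrete (an absolute size such as `2G`, or a percentage such as `80%` that follows container limit updates), here is a minimal C++ sketch. It is not the code from the patch, which the quote elides; every name in it (`RssLimitSpec`, `parse_rss_limit()`, `effective_limit_bytes()`, `read_rss_bytes()`, `rss_limit_reached()`) is hypothetical. It assumes K/M/G suffixes are 1024-based, takes the machine or container memory size as a parameter instead of querying it, reads RSS from the Linux-only `VmRSS` line of `/proc/self/status`, and covers only limit resolution and the comparison, not the error-reporting reaction.

```cpp
// Illustrative sketch only, NOT the code from the patch. All names below
// (RssLimitSpec, parse_rss_limit, effective_limit_bytes, read_rss_bytes,
// rss_limit_reached) are hypothetical. Requires C++17 for std::optional.
#include <cassert>
#include <cctype>
#include <cstdint>
#include <fstream>
#include <optional>
#include <sstream>
#include <string>

// A parsed limit: either an absolute byte count or a percentage.
struct RssLimitSpec {
  bool is_percent;  // true: value is a percentage (1..100) of total memory
  uint64_t value;   // bytes if !is_percent, otherwise the percentage
};

// Parses "1048576", "64K", "512M", "2G" (assumed 1024-based) or "80%".
// Returns std::nullopt for anything else.
std::optional<RssLimitSpec> parse_rss_limit(const std::string& s) {
  if (s.empty() || !std::isdigit(static_cast<unsigned char>(s[0]))) {
    return std::nullopt;
  }
  std::size_t pos = 0;
  uint64_t n = 0;
  while (pos < s.size() && std::isdigit(static_cast<unsigned char>(s[pos]))) {
    if (n > (UINT64_MAX - 9) / 10) return std::nullopt;  // would overflow
    n = n * 10 + static_cast<uint64_t>(s[pos] - '0');
    ++pos;
  }
  const std::string suffix = s.substr(pos);
  if (suffix == "%") {
    if (n == 0 || n > 100) return std::nullopt;
    return RssLimitSpec{true, n};
  }
  uint64_t scale = 1;
  if (suffix == "K" || suffix == "k") {
    scale = 1024ULL;
  } else if (suffix == "M" || suffix == "m") {
    scale = 1024ULL * 1024;
  } else if (suffix == "G" || suffix == "g") {
    scale = 1024ULL * 1024 * 1024;
  } else if (!suffix.empty()) {
    return std::nullopt;  // unknown suffix
  }
  if (n > UINT64_MAX / scale) return std::nullopt;
  return RssLimitSpec{false, n * scale};
}

// Resolves the spec to bytes. A percentage is applied to the total memory
// passed in (machine or container size), so calling this on every check
// picks up container limit changes without re-parsing the option.
uint64_t effective_limit_bytes(const RssLimitSpec& spec,
                               uint64_t total_memory_bytes) {
  if (!spec.is_percent) return spec.value;
  assert(spec.value >= 1 && spec.value <= 100);
  // floor(total * pct / 100) without overflowing the intermediate product.
  return total_memory_bytes / 100 * spec.value +
         total_memory_bytes % 100 * spec.value / 100;
}

// Linux only: current RSS from the VmRSS line of /proc/self/status, which
// the kernel reports in kB (1024-byte units). std::nullopt if unavailable.
std::optional<uint64_t> read_rss_bytes() {
  std::ifstream in("/proc/self/status");
  std::string line;
  while (std::getline(in, line)) {
    if (line.rfind("VmRSS:", 0) == 0) {
      std::istringstream fields(line.substr(6));
      uint64_t kb = 0;
      if (fields >> kb) return kb * 1024;
      return std::nullopt;
    }
  }
  return std::nullopt;
}

// True once RSS has reached the effective limit.
bool rss_limit_reached(uint64_t rss_bytes, const RssLimitSpec& spec,
                       uint64_t total_memory_bytes) {
  return rss_bytes >= effective_limit_bytes(spec, total_memory_bytes);
}
```

A periodic checker built on this sketch would call `read_rss_bytes()` and `rss_limit_reached()` on every tick, passing the current machine or container memory size; because the percentage is resolved at check time rather than at startup, a changed container limit takes effect on the next check.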
-------------
PR Comment: https://git.openjdk.org/jdk/pull/16938#issuecomment-1903396425
More information about the hotspot-dev mailing list