RFR: 8321266: Add diagnostic RSS threshold [v3]

Kevin Walls kevinw at openjdk.org
Tue May 13 09:23:17 UTC 2025


On Wed, 6 Dec 2023 08:13:55 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

>> We have `MallocLimit`, a way to trigger errors when reaching a given malloc load threshold. This PR proposes
>> a complementary switch, `RSSLimit`, that does the same based on the Resident Set Size of the process.
>> 
>> ---
>> 
>> Motivation:
>> 
>> The main usage for this option is to analyze OOM kills. OOM kills can happen at various layers: either the kernel OOM killer kills the process, or the whole container gets scrapped for using too much memory.
>> 
>> One rarely has any information on the nature of the OOM, whether there even was one, and if so, whether the JVM was the culprit or just an innocent bystander. In these situations, getting a voluntary abort *before* the process is killed from outside can give us valuable information.
>> 
>> Another use of this feature is testing: specifying an envelope of "reasonable" RSS to check the expected footprint of the JVM. It is also useful as a global, test-wide setting to catch obvious footprint degradations early.
>> 
>> Letting the JVM handle this limit has many advantages:
>> 
>> - Since the limit is artificial, error reporting is not affected. Other mechanisms (e.g. ulimit) are likely to prevent effective error reporting: I usually get torn hs-err files when such a limit is hit, since error reporting (regrettably) needs dynamic memory and stack space to do its work.
>> 
>> - Re-using the normal error reporting mechanism is powerful since:
>>   - hs-err files contain lots of information already: machine memory status, NMT summary, heap information etc.
>>   - Using `OnError`, that mechanism is extensible: we can run many more diagnostics like Metaspace or Compiler memory reports, detailed NMT reports, System memory maps, and even heap dumps.
>>   - Using `ErrorFileToStd(out|err)` will redirect the hs-err file and let us see what's happening in cloud situations, where file systems are often ephemeral.
>> 
>> ----
>> 
>> Usage: 
>> 
>> The limit is given either as an absolute number or as a percentage of the total memory of the machine or the container, e.g.
>> `-XX:RssLimit=2G` or `-XX:RssLimit=80%`. 
>> 
>> If given as a percentage, the JVM will also react to container limit updates.
>> 
>> Example: we run the JVM inside a container as the sole payload process. Limit its RSS to 80% of the container limit, and if we hit the limit, fire a heap dump:
>> 
>> `java -XX:+UnlockDiagnosticVMOptions -XX:RssLimit=80% '-XX:OnError=jcmd %p GC.heap_dump my-dump' -Xlog:os+rss`
>> 
>> ----
>> 
>> Patch:
>> 
>> Im...
>
> Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Add specific percentage switch

Hi Thomas -

Looks good & useful.

The Percent question looks resolved with the RssLimitPercent option. Glad not to have to put the "%" sign on the command line; even though it means there are two options, there is no ambiguity.
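The sketch below shows one way the two options could resolve to a single byte threshold. It assumes the following:

- RssLimit is an absolute size.
- RssLimitPercent is an integer percentage of machine or container memory.
- 0 means "unset".

The function `effective_rss_limit` is hypothetical and is not code from the patch. How the patch handles both options being set is not shown here, so the sketch simply prefers the absolute value. Recomputing the threshold on every check, instead of once at startup, is what lets a percentage follow container limit updates.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not HotSpot code.
// rss_limit_bytes    stands in for -XX:RssLimit (0 = unset).
// rss_limit_percent  stands in for -XX:RssLimitPercent (0 = unset),
//                    assumed here to be an integer in [0, 100].
// total_memory_bytes is the container limit if one is active, else
//                    physical memory; callers re-read it on every check.
// Returns the threshold in bytes, or 0 if no limit is configured.
uint64_t effective_rss_limit(uint64_t rss_limit_bytes,
                             uint64_t rss_limit_percent,
                             uint64_t total_memory_bytes) {
  assert(rss_limit_percent <= 100);
  if (rss_limit_bytes > 0) {
    return rss_limit_bytes;  // assumption: an absolute limit takes precedence
  }
  if (rss_limit_percent > 0) {
    // Integer math avoids floating-point rounding; total * 100 stays far
    // below 2^64 for any realistic memory size.
    return total_memory_bytes * rss_limit_percent / 100;
  }
  return 0;
}
```

With a 50 percent limit, a container limit lowered from 4 GiB to 2 GiB moves the threshold from 2 GiB to 1 GiB on the next check.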

The example usage in the description here could use an update.  


The Linux OOM killer is mentioned. The risk is setting too high an RssLimit or too long an RssLimitCheckInterval, meaning the JVM gets killed by the OOM killer before this feature triggers. Any advice on how to choose these figures? 8-)  That probably depends on general memory pressure, so it may not be easy to give a rule. I see there is some discussion above; I'm just fishing for good advice.
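Here is a rough way to reason about the interval, as a sketch under simplifying assumptions. RSS is only sampled every RssLimitCheckInterval, taken here to be in milliseconds (an assumption). Between two samples, RSS can grow by up to the peak growth rate times the interval without being noticed. For the JVM to abort first, the limit has to leave at least that much headroom below the point where the OOM killer fires.

Both helpers are hypothetical. Treating the kill point as one fixed number (say, the container limit) ignores page cache, kernel memory and other processes. The result is therefore a lower bound on the headroom needed, not a guarantee.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not part of the patch.
// Between two samples taken interval_ms apart, RSS can grow by up to
// peak_growth_bytes_per_sec * interval_ms / 1000 without being noticed,
// so the limit must sit at least that far below the kill threshold.
uint64_t max_safe_rss_limit(uint64_t kill_threshold_bytes,
                            uint64_t peak_growth_bytes_per_sec,
                            uint64_t interval_ms) {
  const uint64_t undetected_growth =
      peak_growth_bytes_per_sec * interval_ms / 1000;
  if (undetected_growth >= kill_threshold_bytes) {
    return 0;  // interval too long: no limit can reliably win the race
  }
  return kill_threshold_bytes - undetected_growth;
}

// Hypothetical converse: the longest interval (in ms) that still leaves
// enough headroom between the given limit and the kill threshold.
uint64_t max_safe_interval_ms(uint64_t kill_threshold_bytes,
                              uint64_t rss_limit_bytes,
                              uint64_t peak_growth_bytes_per_sec) {
  assert(peak_growth_bytes_per_sec > 0);
  if (rss_limit_bytes >= kill_threshold_bytes) {
    return 0;  // no headroom at all
  }
  return (kill_threshold_bytes - rss_limit_bytes) * 1000 /
         peak_growth_bytes_per_sec;
}
```

For example, take a 4 GiB container limit and a peak growth of 500 MiB/s. With a 1000 ms interval, `max_safe_rss_limit` gives 3596 MiB. Halving the interval to 500 ms raises that to 3846 MiB.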



Is this really a Diagnostic flag, or something people would want to run in production?

I see MallocLimit was diagnostic, but that seems like a diagnostic for non-production use: it puts a hard limit on what an app can allocate over time, so the app is being run for diagnostic purposes and is not expected to stay up for a long time.

But this RSS resource limiting looks like a production feature?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/16938#issuecomment-2875693503


More information about the hotspot-dev mailing list