RFR: JDK-8321266: Add diagnostic RSS threshold [v2]

Wed Dec 6 07:12:35 UTC 2023

On Tue, 5 Dec 2023 10:36:29 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

>> Hi Thomas,
>> 
>> I've taken a first pass through this and it seems okay in principle. A number of initial comments/suggestions below.
>> 
>> Thanks.
> 
> Thanks a lot, David!
> 
> Makes me happy to see this finds acceptance at least in principle.
> 
> I changed:
> - get_rss to get_RSS
> - removed the "0 means off" text, since I assume passing 0 would be likely a user error. Instead, I also added an error check for percentage = 0.0.
> - added a warning if the OS does not support this feature

> Hi @tstuefe this looks like a useful feature and seems to provide a way to deal with the OOM killer in containers. If the user has set the container memory limit to 256MB, then the RssLimit can be set to around 200MB. This would let the JVM catch the OOM before it is handled by the kernel. But I have one concern. The effectiveness of this solution really depends on how frequently the check is done. If there is a sudden memory spike, it should, ideally, last longer than `RssLimitCheckInterval` for RssWatcher to take action. Flipping it the other way, we can say RssWatcher can catch memory spikes that last longer than `RssLimitCheckInterval`. Even then, it can catch the spike only as long as it is less than the container limit. This raises the question of determining the effective value of `RssLimit` and `RssLimitCheckInterval`. For instance, compilations can induce a memory spike which may last for a few hundred milliseconds at most, which is much less than the default value of 5 secs for `RssLimitCheckInterval`. What are your thoughts on this?

Your concern is valid; there is no bullet-proof way to do this.
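
To make the concern concrete, here is a minimal sketch in Python (not HotSpot code) of a fixed-interval poller and a short-lived spike. The function name and all numbers are made up for illustration; the only assumption is that polls happen at t = 0, interval, 2 * interval, and so on.

```python
def poll_detects_spike(spike_start_ms, spike_len_ms, interval_ms, horizon_ms):
    """Return True if a poll at t = 0, interval, 2*interval, ... falls inside the spike."""
    spike_end_ms = spike_start_ms + spike_len_ms
    for t in range(0, horizon_ms + 1, interval_ms):
        if spike_start_ms <= t < spike_end_ms:
            return True
    return False


# A hypothetical 300 ms spike starting at t = 1200 ms, watched for 20 s.
for interval in (5000, 1000, 250):
    print(interval, poll_detects_spike(1200, 300, interval, 20000))
```

With these numbers, both the 5 second and the 1 second poller miss the spike; only an interval shorter than the spike itself is guaranteed to sample it.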

I originally chose a long default interval since I feared that reading procfs would be too expensive. However, after some testing I see that so much caution is not necessary; I will lower the default interval to 1 second.
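
For readers unfamiliar with the procfs read in question, a rough Python sketch follows. It assumes the RSS is taken from the `VmRSS` line of `/proc/self/status`; the PR may well read a different procfs file, and the function names here are hypothetical.

```python
import re


def parse_vm_rss_kb(status_text):
    """Extract VmRSS in kB from the text of /proc/<pid>/status, or None if absent."""
    m = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(m.group(1)) if m else None


def read_self_rss_kb(path="/proc/self/status"):
    """Read the current process RSS in kB; returns None where procfs is unavailable."""
    try:
        with open(path) as f:
            return parse_vm_rss_kb(f.read())
    except OSError:
        return None


print(read_self_rss_kb())
```

Each sample is one small file read plus a line scan, which is consistent with polling once per second being affordable, though the actual cost depends on the system.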

Furthermore, I plan to make the interval adaptive: if we detect a large RSS spike or are within n% of the limit, I plan to lower the interval temporarily. Since that requires more testing and tuning, I will do this in a separate RFE.
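
One possible shape for such an adaptive interval, sketched in Python purely for illustration. Nothing here is decided: the function name, the 10% and 20% thresholds, and the 100 ms fast interval are all hypothetical placeholders for values that would come out of the testing and tuning mentioned above.

```python
def next_interval_ms(rss, prev_rss, limit,
                     base_ms=1000, fast_ms=100,
                     near_limit_pct=10.0, spike_pct=20.0):
    """Pick the next polling interval from the last two RSS samples.

    All defaults are hypothetical. rss, prev_rss and limit share one unit.
    """
    # Within near_limit_pct percent of the limit (or already above it)?
    near_limit = rss >= limit * (1.0 - near_limit_pct / 100.0)
    # Grew by at least spike_pct percent since the previous sample?
    spiked = prev_rss > 0 and (rss - prev_rss) >= prev_rss * spike_pct / 100.0
    return fast_ms if (near_limit or spiked) else base_ms


print(next_interval_ms(rss=100, prev_rss=100, limit=200))  # steady, far from limit
print(next_interval_ms(rss=185, prev_rss=180, limit=200))  # close to the limit
print(next_interval_ms(rss=130, prev_rss=100, limit=200))  # sudden growth
```

The interval drops back to the base value as soon as neither condition holds, which is what makes the speed-up temporary.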

For the compiler (and for hotspot-induced mallocs generally) we already have -XX:MallocLimit, which is independent of polling and works in real time. In addition, for the compiler we have compilation memory limits via a compile command.

But in the end, there remains an unknown that is unsolvable. Any spike can be shorter than any interval-based check we do. This PR is an abridged form of something I did for SAP: https://stuefe.de/posts/vitals/sapmachine-high-memory-reports/ - there, I use a system of three "danger zones" that each trigger different actions. I would love to get such a solution upstream at some point.

I wish the kernel would give us SIGDANGER like on AIX. That would be the real solution.

When I originally implemented the SAP solution, I had also looked at container-intrinsic solutions, but did not find any that were reliable. Note that OOM-kills can also come from some framework just scrapping the whole container; so it may not even be the kernel that kills us, the whole VM may go away.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/16938#issuecomment-1842210562