RFR: JDK-8293313: NMT: Rework MallocLimit [v4]

Wed Jan 25 17:38:06 UTC 2023

On Wed, 25 Jan 2023 06:22:47 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

>> src/hotspot/share/runtime/os.cpp line 707:
>> 
>>> 705: 
>>> 706:     // Observe MallocLimit
>>> 707:     if (size > old_size && MemTracker::check_exceeds_limit(size - old_size, memflags)) {
>> 
>> Can we make it a bit easier to parse by rewriting it as:
>> 
>> `    if ((size > old_size) && MemTracker::check_exceeds_limit(size - old_size, memflags)) {`
>> 
>> To me personally that is easier to read.
>> 
>> I see a bigger issue here, however. In the worst case scenario, where the os is unable to expand the memory chunk in place, it will need to allocate `size` bytes in a new region, so the total **potential** resources from the os point of view is `size+old_size` bytes for that particular allocation. Shouldn't we assume this worst case and test for that? I.e.
>> 
>> `    if ((size > old_size) && MemTracker::check_exceeds_limit(size, memflags)) {` 
>> 
>> We would need a comment here explaining this choice if we make this change.
>> 
>> I'd rather get some false positives that miss a single true positive...
>
>> Can we make it a bit easier to parse by rewriting it as:
>> 
>> ` if ((size > old_size) && MemTracker::check_exceeds_limit(size - old_size, memflags)) {`
>> 
>> To me personally that is easier to read.
>> 
> 
> Sure
> 
>> I see a bigger issue here, however. In the worst case scenario, where the os is unable to expand the memory chunk in place, it will need to allocate `size` bytes in a new region, so the total **potential** resources from the os point of view is `size+old_size` bytes for that particular allocation. Shouldn't we assume this worst case and test for that? I.e.
>> 
>> ` if ((size > old_size) && MemTracker::check_exceeds_limit(size, memflags)) {`
>> 
>> We would need a comment here explaining this choice if we make this change.
>> 
>> I'd rather get some false positives that miss a single true positive...
> 
> Nah, no need to overthink this. 
> 
> If you try to predict how much memory your malloc causes your process to allocate, it gets very fuzzy very quickly:
> 
> Below are at least two allocating layers (libc and the kernel's vm manager). Both of them balance memory and CPU footprint and allocation speed. No libc or kernel is the same, too. Each layer below you may return cached or partly cached memory and apply an often sizable overhead to what you are allocating. So our mallocs and frees have only a very indirect effect on RSS. And we don't even see all mallocs here, just the ones from libjvm.
> 
> Therefore it is best to limit the meaning of MallocLimit to "limit to how much hotspot is allowed to allocate". That is also the most useful. For limiting the memory of a process, different os-side tools exist (cgroup limits, ulimit, etc).

The entire point of this effort is to catch elusive OOM errors. You even moved the check for the memory requirements before we actually request it, from the current way where we check the limit only after we acquire new memory.

Isn't assuming the worst case scenario, where realloc(), might allocate new memory first, in the same vein as that and we are just being cautious, to catch those hard to reproduce bugs?

This cases might not occur very often, granted, but why not catch them if we can?

Aren't there any platforms where we run Java where realloc() is implemented in terms of malloc() and memcpy()?

-------------

PR: https://git.openjdk.org/jdk/pull/11371