RFR: JDK-8293313: NMT: Rework MallocLimit [v4]

Wed Jan 25 06:29:10 UTC 2023

On Tue, 24 Jan 2023 20:53:54 GMT, Gerard Ziemski <gziemski at openjdk.org> wrote:

>> Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Revert strchrnul
>
> src/hotspot/share/runtime/globals.hpp line 1344:
> 
>> 1342:           "Number of exits until ZombieALot kicks in")                      \
>> 1343:                                                                             \
>> 1344:   product(ccstr, MallocLimit, nullptr, DIAGNOSTIC,                          \
> 
> Pre-existing, but I wish this flag was named `NMTMallocLimit`, not `MallocLimit`.

I prefer MallocLimit, short and snappy. And the fact that NMT is supervising the limit is an implementation detail. It used to be different with MallocMaxTestWords and may be different in the future.

> Can we make it a bit easier to parse by rewriting it as:
> 
> ` if ((size > old_size) && MemTracker::check_exceeds_limit(size - old_size, memflags)) {`
> 
> To me personally that is easier to read.
> 

Sure

> I see a bigger issue here, however. In the worst case scenario, where the os is unable to expand the memory chunk in place, it will need to allocate `size` bytes in a new region, so the total **potential** resources from the os point of view is `size+old_size` bytes for that particular allocation. Shouldn't we assume this worst case and test for that? I.e.
> 
> ` if ((size > old_size) && MemTracker::check_exceeds_limit(size, memflags)) {`
> 
> We would need a comment here explaining this choice if we make this change.
> 
> I'd rather get some false positives than miss a single true positive...

Nah, no need to overthink this. 

If you try to predict how much memory your malloc causes your process to allocate, it gets very fuzzy very quickly:

Below us are at least two allocating layers (libc and the kernel's VM manager). Both of them balance memory footprint, CPU footprint, and allocation speed, and no two libcs or kernels behave the same. Each layer below you may return cached or partly cached memory and apply an often sizable overhead to what you are allocating. So our mallocs and frees have only a very indirect effect on RSS. And we don't even see all mallocs here, just the ones from libjvm.

Therefore it is best to limit the meaning of MallocLimit to "limit to how much hotspot is allowed to allocate". That is also the most useful definition. For limiting the memory of a process as a whole, OS-side tools exist (cgroup limits, ulimit, etc).
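
To make the difference between the two proposed checks concrete, here is a minimal sketch. `MallocAccount` and both helper functions are hypothetical names made up for illustration; this is not the NMT code, only the arithmetic behind the two conditions quoted above.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for malloc accounting; not the HotSpot/NMT API.
struct MallocAccount {
  size_t limit;      // MallocLimit-style cap in bytes (0 means "no limit")
  size_t allocated;  // bytes currently allocated by us

  // True if allocating `delta` more bytes would push us past the limit.
  bool would_exceed(size_t delta) const {
    return limit > 0 && allocated + delta > limit;
  }
};

// Delta-based check: a realloc only counts for the bytes it grows by.
inline bool realloc_exceeds_limit(const MallocAccount& acc,
                                  size_t old_size, size_t size) {
  return (size > old_size) && acc.would_exceed(size - old_size);
}

// Worst-case variant from the review: charge the full new size.
inline bool realloc_exceeds_limit_worst_case(const MallocAccount& acc,
                                             size_t old_size, size_t size) {
  return (size > old_size) && acc.would_exceed(size);
}

// Usage: limit 100 bytes, 60 already allocated, a block grows from 30 to 50.
inline void example() {
  MallocAccount acc{100, 60};
  assert(!realloc_exceeds_limit(acc, 30, 50));            // 60 + 20 = 80, under the limit
  assert(realloc_exceeds_limit_worst_case(acc, 30, 50));  // 60 + 50 = 110, reported as a breach
}
```

With the delta-based check the limit keeps meaning "what we have allocated"; the worst-case variant fires even though our own accounting never goes above 80 bytes.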

> Why did we bother to wrap `VMError::is_error_reported()` into `suppress_limit_handling()`?
> 

Because during error handling, code may malloc() too (bad practice, but it can happen). If it does, I don't want circular assertions to fire; I want a clean, complete hs-err file.
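
Roughly, the idea looks like the sketch below. Everything in it is a simplified stand-in: `g_error_reported` plays the role of `VMError::is_error_reported()`, and the counter and the signature of `check_exceeds_limit` are invented for the example and do not match the patch.

```cpp
#include <atomic>
#include <cstddef>

// Hypothetical stand-ins; the real code asks VMError::is_error_reported().
static std::atomic<bool> g_error_reported{false};
static int g_limit_reports = 0;  // how often the limit handler fired

static bool suppress_limit_handling() {
  return g_error_reported.load();
}

// Returns true if the limit was breached and handling was triggered.
static bool check_exceeds_limit(size_t allocated, size_t request, size_t limit) {
  if (suppress_limit_handling()) {
    return false;  // error reporting is running: let its mallocs through
  }
  if (limit == 0 || allocated + request <= limit) {
    return false;  // no limit set, or still within the limit
  }
  ++g_limit_reports;
  g_error_reported.store(true);  // stand-in for entering fatal error reporting
  return true;
}
```

The first breach triggers handling exactly once; any malloc issued afterwards by the error reporter itself passes the check, so the report can complete instead of tripping the limit again.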

> Are you anticipating more exclusions here in the future?

None I can think of.

-------------

PR: https://git.openjdk.org/jdk/pull/11371