RFR: 8352569: NMT: mmap limits [v2]
Thomas Stuefe
stuefe at openjdk.org
Tue Apr 8 07:33:17 UTC 2025
On Tue, 1 Apr 2025 19:23:56 GMT, Rui Li <duke at openjdk.org> wrote:
>> ### Notes
>>
>> With [JDK-8291878](https://bugs.openjdk.org/browse/JDK-8291878), we have a way to limit the native memory size created by malloc.
>>
>> It'll be nice to have a counterpart for mmap. E.g., [JDK-8350860](https://bugs.openjdk.org/browse/JDK-8350860) will have a good use of mmap limit jvm arg.
>>
>>
>> ### Usages
>> A new jvm arg `-XX:MmapLimit` is added. Usages:
>> - Impose a global limit to mem allocated by mmap() call: `-XX:MmapLimit=<size>`. e.g.: `-XX:MmapLimit=500m`
>> - Or, impose an nmt category to mem allocated by mmap call: `-XX:MmapLimit=<category>:<size>[,category=size]`. e.g.: `-XX:MmapLimit=gc:100m`. Notice that, not every category uses mmap. E.g.: compiler category. In this case, it would behave the same as the mem limit has not been exceeded.
>> - About failure mode: by default, when the limit is exceeded, the app exits in fatal mode. If we want to mimic os oom, we can do it by appending failure mode like `-XX:MmapLimit=<size>:oom` or `-XX:MmapLimit=<category:size>:oom`. e.g.: `-XX:MmapLimit=500m:oom` or `-XX:MmapLimit=gc:100m:oom`. (`-XX:MmapLimit=500m:fatal` is equivalent to `-XX:MmapLimit=500m` since `fatal` is the default)
>>
>> Sample error messages for fatal (`fatal` mode is default and can be omitted in the jvm arg):
>>
>>
>> x64 (8352569) % /workplace/ruiamzn/github/jdk/build/linux-x86_64-server-fastdebug/jdk/bin/java -XX:NativeMemoryTracking=summary -XX:MmapLimit=gc:10k -version
>> #
>> # A fatal error has been detected by the Java Runtime Environment:
>> #
>> # Internal Error (/workplace/ruiamzn/github/jdk/src/hotspot/share/nmt/nMemoryLimitPrinter.cpp:77), pid=18248, tid=18249
>> # fatal error: MmapLimit: reached category "mtGC" limit (triggering allocation size: 836K, allocated so far: 836K, limit: 10240B)
>> #
>> # JRE version: (25.0) (fastdebug build )
>> # Java VM: OpenJDK 64-Bit Server VM (fastdebug 25-internal-adhoc.ruiamzn.jdk, mixed mode, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
>> # Problematic frame:
>> # V [libjvm.so+0x153a4a2] NMemoryLimitPrinter::category_limit_reached(MemTag, unsigned long, unsigned long, nMemlimit const*, NMemType)+0x182
>> #
>> # Core dump will be written. Default location: /workplace/ruiamzn/github/jdk/core.18248
>> #
>> # An error report file with more information is saved as:
>> # /workplace/ruiamzn/github/jdk/hs_err_pid18248.log
>>
>>
>> Sample error messages for oom:
>>
>> x64 (8352569) % /workplace/ruiamzn/github/jdk/build/linux-x86_64-server-fastdebug/jdk/bin/java -XX:Na...
>
> Rui Li has updated the pull request incrementally with one additional commit since the last revision:
>
> Add headers
I appreciate the effort, but I don't think this is the approach we should take. Pity you did not ping us beforehand. There are several reasons why this is less helpful than you may think, and more complex as well.
There is also an alternative approach in the works (see bottom).
Difficulty:
Tracking mmap space is more difficult than tracking malloc memory. Simply counting sizes on commit/uncommit won't do, since memory regions can overlap, and sometimes they do (e.g. Metaspace). You can commit or uncommit ranges that are already partially committed or uncommitted.
With NMT we have an alternative version of the MemTracker planned that will use a binary tree - similar to an interval tree - that can give more precision here; but I am not convinced this is a good way to limit RSS.
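To illustrate the overlap problem, here is a minimal Python sketch. It is not HotSpot code: `NaiveCounter` and `RangeTracker` are hypothetical names, and the sorted list of disjoint ranges only stands in for a tree-based structure. It shows how a plain add/subtract counter drifts once commits and uncommits overlap.

```python
class NaiveCounter:
    """Adds on commit and subtracts on uncommit, ignoring overlap."""

    def __init__(self):
        self.committed = 0

    def commit(self, start, size):
        self.committed += size

    def uncommit(self, start, size):
        self.committed -= size


class RangeTracker:
    """Keeps a sorted list of disjoint committed [start, end) ranges."""

    def __init__(self):
        self.ranges = []

    def commit(self, start, size):
        end = start + size
        kept = []
        for s, e in self.ranges:
            if e < start or s > end:
                kept.append((s, e))  # no contact with the new range
            else:
                # Overlapping or adjacent: merge into the new range.
                start, end = min(start, s), max(end, e)
        kept.append((start, end))
        self.ranges = sorted(kept)

    def uncommit(self, start, size):
        end = start + size
        kept = []
        for s, e in self.ranges:
            if e <= start or s >= end:
                kept.append((s, e))  # untouched by the uncommit
                continue
            if s < start:
                kept.append((s, start))  # part left of the hole survives
            if e > end:
                kept.append((end, e))  # part right of the hole survives
        self.ranges = kept

    @property
    def committed(self):
        return sum(e - s for s, e in self.ranges)


naive, exact = NaiveCounter(), RangeTracker()
for tracker in (naive, exact):
    tracker.commit(0, 100)
    tracker.commit(50, 100)   # overlaps the first commit by 50
    tracker.uncommit(25, 50)  # punches a hole into [25, 75)
```

After these three calls the naive counter reports 150 while the range-based tracker reports 100, which is the size that is actually still committed.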
Accuracy:
Assuming that the purpose of this mmap limit is to limit RSS usage by the JVM, tracking committed space to limit RSS is not really useful outside of simple examples. NMT tracks committed memory, not live memory. That is a really big blemish of NMT mmap tracking. See https://bugs.openjdk.org/browse/JDK-8249666 (that bug keeps slipping down the prio list, but it is not terribly complex to do). NMT committed memory tracking is mostly an okayish approximation of RSS since the JVM tends to commit what it plans to use shortly. But it can be really wildly off, e.g. when specifying -Xmx equal to -Xms.
In addition to that, NMT mmap tracking only accounts for a small subset of the mmaps in the JVM. It will not work for RSS increases caused by other sources (e.g. untracked mmaps from JDK/third-party libs like Netty/system libs), thread stacks, third-party C-heap usage etc.
----
Alternative: In 2023/24 I worked on a prototype for RSS limiting that works by simply observing RSS; see https://github.com/openjdk/jdk/pull/16938. A small low-cost thread polls RSS at regular intervals, similar to how we trim the C-heap. The advantage of this is that it is independent of all the shortcomings NMT has, and it reacts to the real, live memory increase, not whatever NMT is counting as committed.
Unfortunately I did not have time to continue work on this. I may find the time this year, maybe before the JDK 25 deadline. Let's see. The PR was mostly done at that point.
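For readers unfamiliar with the polling idea, a rough Python sketch follows. It is not the code from the PR above; the function names, the callback, and the interval are assumptions made for illustration. It is also Linux-only, since it reads the `VmRSS` line from `/proc/self/status`.

```python
import threading


def parse_vm_rss_kib(status_text):
    """Return the VmRSS value (in kB as reported by the kernel), or None."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])
    return None


def read_self_rss_kib():
    """Read the current process RSS from /proc/self/status (Linux only)."""
    with open("/proc/self/status") as f:
        return parse_vm_rss_kib(f.read())


def poll_rss(limit_kib, on_exceeded, stop_event,
             interval_s=1.0, read_rss=read_self_rss_kib):
    """Poll RSS until stop_event is set or the limit is exceeded."""
    while not stop_event.is_set():
        rss = read_rss()
        if rss is not None and rss > limit_kib:
            on_exceeded(rss)  # hypothetical hook: log, abort, raise OOM, ...
            return
        stop_event.wait(interval_s)  # sleeps, but wakes early on stop


def start_rss_watcher(limit_kib, on_exceeded, interval_s=1.0):
    """Start the poller as a daemon thread; set the returned event to stop it."""
    stop_event = threading.Event()
    thread = threading.Thread(
        target=poll_rss,
        args=(limit_kib, on_exceeded, stop_event, interval_s),
        daemon=True,
    )
    thread.start()
    return stop_event
```

The reader function is injectable so the loop can be exercised without touching `/proc`. Because the check observes the process as a whole, it covers memory that NMT never sees, at the cost of reacting only after RSS has already grown.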
-------------
PR Comment: https://git.openjdk.org/jdk/pull/24343#issuecomment-2785505673