RFR: 8341491: Reserve and commit memory operations should be protected by NMT lock

Thomas Stuefe stuefe at openjdk.org
Thu Mar 27 08:17:22 UTC 2025


On Wed, 26 Mar 2025 16:43:21 GMT, Stefan Karlsson <stefank at openjdk.org> wrote:

> > > > I think release/uncommit failures should be handled by the callers. Currently, uncommit failure is handled in most places by the caller, release failure seems mostly not. Since, at least for uncommit, we could sometimes fail for valid reasons, I think we shouldn't fail fatally in the os:: functions.
> > > 
> > > 
> > > I would like to drill a bit deeper into this. Do you have any concrete examples of an uncommit failure that should not be handled as a fatal error?
> > 
> > 
> > I second @stefank here.
> > Uncommit can fail, ironically, with an ENOMEM: if the uncommit punches a hole into a committed region, this causes a new VMA on the kernel side. This may fail if we run against the limit for VMAs. Forgot what it was, some sysconf setting. All of this is Linux-specific, though.
> 
> This happens when we hit the /proc/sys/vm/max_map_count limit, and this immediately crashes the JVM.

Yes, but maybe it shouldn't (see below).

> 
> > I don't think this should be unconditionally a fatal error. Since the allocator (whatever it is) can decide to re-commit the region later, and thus "self-heal" itself.
> 
> Is this referring to failures when we hit the max_map_count limit? I'm not convinced that you can recover from that without immediately hitting the same issue somewhere else in the code.

Well, you could scrape around for a while and maybe not trigger it. E.g. in Metaspace, I uncommit granules, but that is optional. I could just ignore uncommit errors there. In the heap, we could do the same thing. 

After a while, the memory may get reused and thus recommitted, thereby solving the problem.
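
To illustrate the "uncommit is optional" idea, here is a minimal sketch. It is not Metaspace or heap code: `GranuleSpace`, `UncommitFn` and all member names are hypothetical, and the OS uncommit call is injected so that a failure such as ENOMEM can be simulated. On failure the granule simply stays marked as committed, so later reuse needs no OS call at all.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical uncommit primitive: returns 0 on success, else an errno value.
// A real one would wrap whatever OS call the platform uses to uncommit.
using UncommitFn = std::function<int(std::size_t granule_index)>;

// Hypothetical bookkeeping for a granule-based allocator.
class GranuleSpace {
 public:
  GranuleSpace(std::size_t granules, UncommitFn uncommit)
      : _committed(granules, true), _uncommit(std::move(uncommit)) {}

  // Best-effort uncommit. On failure the granule stays committed and the
  // failure is only counted, not treated as fatal.
  // Returns true only if the granule is uncommitted afterwards.
  bool try_uncommit(std::size_t i) {
    assert(i < _committed.size());
    if (!_committed[i]) return true;   // nothing to do
    const int err = _uncommit(i);
    if (err != 0) {
      _failed_uncommits++;             // e.g. ENOMEM from the VMA limit
      return false;
    }
    _committed[i] = false;
    return true;
  }

  // When a granule is reused: only uncommitted granules need a commit call.
  bool needs_commit(std::size_t i) const {
    assert(i < _committed.size());
    return !_committed[i];
  }

  std::size_t failed_uncommits() const { return _failed_uncommits; }

 private:
  std::vector<bool> _committed;
  UncommitFn _uncommit;
  std::size_t _failed_uncommits = 0;
};
```

The cost of this policy is that memory which failed to uncommit stays resident until it is reused; whether that is acceptable depends on the allocator.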

I admit this problem is a bit theoretical, and it may be acceptable to (continue to) crash at that point, since other allocations - libc, heap, etc. - will face the same limit. Running against this limit seems rare in my experience; we mostly saw it with ZGC in the past.

> 
> Or maybe you are thinking about some of the other reasons for the uncommit to fail?

Honestly, I don't know why else uncommit would fail.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24084#issuecomment-2757093258

