RFR: 8341491: Reserve and commit memory operations should be protected by NMT lock
Robert Toyonaga
duke at openjdk.org
Wed Mar 26 19:00:17 UTC 2025
On Wed, 26 Mar 2025 16:05:00 GMT, Stefan Karlsson <stefank at openjdk.org> wrote:
>>> What state is the memory in when such a failure happens? Do we even know if the memory is still committed if an uncommit fails?
> >
>> If release/uncommit fails, then it would be hard to know what state the target memory is in. If the arguments are invalid (bad base address), the target region may not even be allocated. Or, in the case of uncommit, if the base address is not aligned, maybe the target committed region does indeed exist but the uncommit still fails. So it would be hard to determine how to readjust the NMT accounting afterward.
>
> Agreed. And this would be a pre-existing problem already. If a release/uncommit fails, then we have the similar issues for that as well.
Hi @stefank, are you referring to the difficulty of determining the original allocation as the pre-existing problem? I think that only becomes an issue if we decide to swap the order of the NMT bookkeeping and the memory release/uncommit (assuming we don't just fail fatally). Since we don't need to readjust the accounting currently, if there's a failure we can just leave everything as it is.
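To illustrate the ordering point, here is a toy sketch, not HotSpot code. It assumes the OS-level call runs first and the accounting is updated only on success, with both steps under one lock. `ToyTracker`, `toy_pd_uncommit`, and `toy_uncommit` are hypothetical names invented for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>

// Hypothetical stand-in for the tracker state; not an NMT data structure.
struct ToyTracker {
  std::mutex lock;            // plays the role of the NMT lock
  std::size_t committed = 0;  // bytes the tracker believes are committed
};

// Hypothetical stand-in for the platform uncommit: fails for a zero or
// misaligned size, mimicking a call that is rejected for bad arguments.
inline bool toy_pd_uncommit(std::size_t bytes, std::size_t page_size) {
  return bytes != 0 && bytes % page_size == 0;
}

// OS call first, accounting only on success, both under the same lock.
inline bool toy_uncommit(ToyTracker& t, std::size_t bytes, std::size_t page_size) {
  std::lock_guard<std::mutex> guard(t.lock);
  if (!toy_pd_uncommit(bytes, page_size)) {
    return false;  // accounting was never touched, so nothing to roll back
  }
  assert(bytes <= t.committed);
  t.committed -= bytes;
  return true;
}
```

With the order swapped (accounting first, then the OS call), the failure branch would have to restore the previous accounting state, which is where knowing the original allocation would start to matter.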
>>> I don't understand why we don't treat that as a fatal error OR make sure that all call-sites handles that error, which they don't do today.
> >
>> I think release/uncommit failures should be handled by the callers. Currently, uncommit failure is handled in most places by the caller, release failure seems mostly not. Since, at least for uncommit, we could sometimes fail for valid reasons, I think we shouldn't fail fatally in the os:: functions.
>
> I would like to drill a bit deeper into this. Do you have any concrete examples of an uncommit failure that should not be handled as a fatal error?
[`VirtualSpace::shrink_by`](https://github.com/openjdk/jdk/blob/jdk-25%2B15/src/hotspot/share/memory/virtualspace.cpp#L373) allows uncommit to fail without crashing. I'm not certain of the intention behind that, but it seems to be because shrinking is an optimization, so it is not always critical that it be done immediately. [[1](https://github.com/openjdk/jdk/blob/jdk-25%2B15/src/hotspot/share/gc/serial/tenuredGeneration.cpp#L258)]
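A minimal sketch of that caller-side pattern follows, assuming the caller treats shrinking as best effort. This is a toy model, not the `VirtualSpace` implementation. `ToySpace`, `toy_os_uncommit`, and `toy_shrink_by` are hypothetical names.

```cpp
#include <cstddef>

// Hypothetical toy space; not the VirtualSpace class.
struct ToySpace {
  std::size_t committed_bytes;  // size the space currently considers committed
  bool uncommit_ok;             // simulates whether the OS-level uncommit succeeds
};

// Hypothetical stand-in for the OS-level uncommit call.
inline bool toy_os_uncommit(const ToySpace& s, std::size_t /*bytes*/) {
  return s.uncommit_ok;
}

// Best-effort shrink: returns true only if the space actually shrank.
inline bool toy_shrink_by(ToySpace& s, std::size_t bytes) {
  if (bytes > s.committed_bytes) {
    return false;  // request is larger than what is committed
  }
  if (!toy_os_uncommit(s, bytes)) {
    return false;  // tolerate the failure: boundaries stay as they were
  }
  s.committed_bytes -= bytes;
  return true;
}
```

A caller written this way can simply try again at a later point, which is only reasonable if a failed uncommit leaves the memory in a state the caller can still reason about.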
-------------
PR Comment: https://git.openjdk.org/jdk/pull/24084#issuecomment-2755468073