RFR: 8341491: Reserve and commit memory operations should be protected by NMT lock
Stefan Karlsson
stefank at openjdk.org
Wed Mar 26 20:21:17 UTC 2025
On Wed, 26 Mar 2025 18:57:51 GMT, Robert Toyonaga <duke at openjdk.org> wrote:
> > > > What state is the memory in when such a failure happens? Do we even know if the memory is still committed if an uncommit fails?
> > >
> > >
> > > If release/uncommit fails, then it would be hard to know what state the target memory is in. If the arguments are invalid (bad base address), the target region may not even be allocated. Or, in the case of uncommit, if the base address is not aligned, maybe the target committed region does indeed exist but the uncommit still fails. So it would be hard to determine how to readjust the NMT accounting afterward.
> >
> >
> > Agreed. And this would be a pre-existing problem already. If a release/uncommit fails, then we have the similar issues for that as well.
>
> Hi @stefank, Are you referring to the difficulty in determining the original allocation as being the pre-existing problem? I think that only becomes an issue if we decide to swap the order of NMT booking and the memory release/uncommit (assuming we don't just fail fatally). Since we don't need to readjust currently, if there's a failure we can just leave everything as it is.
My thinking is that if there is a failure, you don't know what state the OS left the memory in. You don't know whether the memory was in fact unmapped as requested, left intact, or left somewhere in between. So, if you don't do the matching NMT bookkeeping, there will be a mismatch between the state of the memory and what has been recorded in NMT.
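To make the mismatch concrete, here is a minimal toy sketch. None of it is HotSpot code: `ToyOs`, `ToyTracker`, and `uncommit_and_track` are hypothetical names, and the "OS" is simulated so that a failed uncommit can either keep or drop the pages. It only illustrates that, when bookkeeping is skipped on failure, the tracker and the real state can diverge without the caller being able to tell.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the OS. A failed uncommit may or may not have
// dropped the pages; the caller only ever sees the boolean result.
struct ToyOs {
  std::size_t committed = 0;           // ground truth, not visible to the caller
  bool fail_next = false;              // make the next uncommit report failure
  bool drop_pages_on_failure = false;  // did the failed call unmap anyway?

  void commit(std::size_t bytes) { committed += bytes; }

  bool uncommit(std::size_t bytes) {
    assert(bytes <= committed);
    if (fail_next) {
      fail_next = false;
      if (drop_pages_on_failure) committed -= bytes;
      return false;
    }
    committed -= bytes;
    return true;
  }
};

// Hypothetical stand-in for the NMT accounting.
struct ToyTracker {
  std::size_t committed = 0;
};

// Policy under discussion: only update the bookkeeping when the OS call
// reports success, and leave it untouched on failure.
bool uncommit_and_track(ToyOs& os, ToyTracker& nmt, std::size_t bytes) {
  if (!os.uncommit(bytes)) {
    return false;  // bookkeeping unchanged; the real state is unknown here
  }
  nmt.committed -= bytes;
  return true;
}
```

If the simulated failure drops the pages anyway, `nmt.committed` stays at the old value while `os.committed` has gone down, which is the kind of mismatch I'm worried about.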
>
> > > > I don't understand why we don't treat that as a fatal error OR make sure that all call-sites handles that error, which they don't do today.
> > >
> > >
> > > I think release/uncommit failures should be handled by the callers. Currently, uncommit failure is handled in most places by the caller, release failure seems mostly not. Since, at least for uncommit, we could sometimes fail for valid reasons, I think we shouldn't fail fatally in the os:: functions.
> >
> >
> > I would like to drill a bit deeper into this. Do you have any concrete examples of an uncommit failure that should not be handled as a fatal error?
>
> [`VirtualSpace::shrink_by`](https://github.com/openjdk/jdk/blob/jdk-25%2B15/src/hotspot/share/memory/virtualspace.cpp#L373) allows uncommit to fail without crashing. I'm not certain of the intention behind that. But it seems like it's because shrinking is an optimization and not always critical that it be done immediately. [[1](https://github.com/openjdk/jdk/blob/jdk-25%2B15/src/hotspot/share/gc/serial/tenuredGeneration.cpp#L258)]
The above example shows code that assumes it is OK for an uncommit to fail and for execution to continue. I'm trying to figure out if that assumption is true. So, what I meant was that I was looking for a concrete example of a failure mode of uncommit that would be acceptable (safe) to continue executing from. That is, a valid failure that doesn't mess up the memory in an unpredictable/unknowable way.
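For reference, the tolerant-caller pattern I'm questioning looks roughly like the sketch below. This is a hypothetical toy (`ToySpace` and its injected `uncommit` callback are made up for illustration), only loosely modeled on the behavior described above, and not the real `VirtualSpace` code.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// Hypothetical toy space with an injected "OS" uncommit call that
// returns false on failure.
struct ToySpace {
  std::size_t committed_bytes;
  std::function<bool(std::size_t)> uncommit;

  // Tolerant policy: on failure, keep the old committed size and carry on.
  // This is only sound if a failed uncommit is guaranteed to leave the
  // pages committed, which is exactly the assumption in question.
  bool shrink_by(std::size_t bytes) {
    assert(bytes <= committed_bytes);
    if (!uncommit(bytes)) {
      return false;  // treated as a skipped optimization, not an error
    }
    committed_bytes -= bytes;
    return true;
  }
};
```

The caller keeps running with `committed_bytes` unchanged after a failure, so the code is relying on the failed call having had no effect on the memory.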
-------------
PR Comment: https://git.openjdk.org/jdk/pull/24084#issuecomment-2755656985