RFR(M): 8152995: Solaris os::available_memory() doesn't do what we think it does

Fri Apr 8 12:36:53 UTC 2016

Hi Daniel,

Thanks for having a look at this.

On 2016-04-06 18:31, Daniel D. Daugherty wrote:
> Erik,
>
> Thanks for adding Runtime to this discussion. The topic is definitely
> of interest to Runtime folks...
>
> More below...
>
>
> On 2016-04-06 16:09, Erik Österlund wrote:
>> Hi,
>>
>> Bug: https://bugs.openjdk.java.net/browse/JDK-8152995
>> CR: http://cr.openjdk.java.net/~eosterlund/8152995/webrev.00/
>>
>> On Solaris, the os::available_memory() function is currently
>> calculated with sysconf(_SC_AVPHYS_PAGES).
>
> The Solaris man page for sysconf(_SC_AVPHYS_PAGES):
>
> _SC_AVPHYS_PAGES  Number of physical memory pages not
>                   currently in use by system

Yes. After drilling down into the details: it returns the amount of
physical memory not currently used by the virtual memory system. But
when you mmap without NORESERVE, no physical memory is actually paged in
until the memory is touched. Therefore this metric corresponds to how
much memory is available *for the kernel* to satisfy page faults, rather
than how much memory is available *for applications* to satisfy
allocations. These two interpretations are orthogonal: neither quantity
tells you anything about the other. It is safe to say that what is being
asked for is the memory available to satisfy allocations, not how much
memory the kernel has at its disposal for internal use.
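For reference, the value under discussion can be reproduced outside the JVM. This is a minimal sketch, assuming the current implementation simply scales sysconf(_SC_AVPHYS_PAGES) by the page size; the helper names avphys_bytes() and phys_bytes() are hypothetical. _SC_AVPHYS_PAGES and _SC_PHYS_PAGES are supported by Solaris and glibc but are extensions, not required by POSIX.

```c
#include <stdint.h>
#include <unistd.h>

/* Hypothetical helper: bytes of physical memory not currently in use,
   i.e. sysconf(_SC_AVPHYS_PAGES) scaled by the page size.
   Returns 0 if either sysconf query fails. */
uint64_t avphys_bytes(void) {
    long pages = sysconf(_SC_AVPHYS_PAGES);
    long page_size = sysconf(_SC_PAGESIZE);
    if (pages < 0 || page_size <= 0) return 0;
    return (uint64_t)pages * (uint64_t)page_size;
}

/* Hypothetical helper: total physical memory, for comparison. */
uint64_t phys_bytes(void) {
    long pages = sysconf(_SC_PHYS_PAGES);
    long page_size = sysconf(_SC_PAGESIZE);
    if (pages < 0 || page_size <= 0) return 0;
    return (uint64_t)pages * (uint64_t)page_size;
}
```

Neither number says anything about memory that has been mmapped but not yet touched, which is exactly the gap described above.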

>
>> Unfortunately this does not match the intended semantics. The intended
>> semantics is to return how much memory can be allocated by mmap into
>> physical memory. But, what _SC_AVPHYS_PAGES does is to return how many
>> physical pages are available to be used by virtual memory as backing
>> storage on-demand once it is touched, without any connection
>> whatsoever to virtual memory.
>
> This part has me curious:
>
> > The intended semantics is to return how much memory can be allocated
> > by mmap into physical memory.
>
> since I don't understand where you found the "intended semantics".

Unfortunately I can't tell you this in the open list.

> Only one of the platforms has any comments about available_memory:
>
> src/os/bsd/vm/os_bsd.cpp:
>
> // available here means free
> julong os::Bsd::available_memory() {
>
> the rest just don't say...
>
> Personally, I've always interpreted available_memory() to mean
> available physical pages, as in pages that are not in use now.
> This matches the definition of _SC_AVPHYS_PAGES above...

So your version of available memory depends on how much memory is "in
use". In use by whom? As you can see, this also leaves room for
interpretation: use by the kernel to satisfy page faults, or use by
applications to satisfy allocations. You just picked the other one.

And yes, interpretations have indeed been personal and not very well 
defined, which is obvious from the code. Let me summarize the current 
situation:

Windows: Returns the amount of physical memory (RAM) that is available 
and can be used to satisfy allocations.

Linux: Returns the amount of physical memory on the freelist that can be
used to satisfy allocations. But there is in fact a lot more memory
available. Memory on the freelist can be seen as memory the OS is
wasting: it tries to use physical memory for things like file caches to
boost performance, but that memory is in fact available as soon as
anyone starts allocating. So the function returns less memory than is
actually available, contrary to its name. This is also misleading and
arguably wrong.

BSD: Also seems to return the amount of free, rather than available,
memory for satisfying allocations, contrary to the name of the function.

Solaris: Returns the amount of physical memory not yet paged in,
available to satisfy page faults, regardless of the amount of memory
available to satisfy allocations.

In summary:
* All OS implementations seem to consistently disregard swap space and
not consider it available memory.
* The meaning is mixed between memory available to satisfy application
allocations, memory available for the kernel to satisfy page faults, and
immediately free (rather than available) memory to satisfy allocations.

I think that, due to the lack of a clear definition, people have used
whatever interpretation they preferred.
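To make the Linux point concrete: /proc/meminfo reports both MemFree (the freelist) and, on kernels that provide it (3.14 and later), MemAvailable, the kernel's own estimate that also counts reclaimable page cache. The sketch below only illustrates the free-versus-available distinction; it is not what HotSpot does, and meminfo_field_kb() and read_meminfo_kb() are hypothetical helpers.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns the value (in kB) of the field named `key`
   in the full text of /proc/meminfo, or -1 if the field is absent. */
long meminfo_field_kb(const char *text, const char *key) {
    size_t klen = strlen(key);
    const char *line = text;
    while (line != NULL && *line != '\0') {
        /* Match "Key:" exactly, so "Mem" does not match "MemFree:". */
        if (strncmp(line, key, klen) == 0 && line[klen] == ':') {
            long kb;
            if (sscanf(line + klen + 1, "%ld", &kb) == 1) return kb;
            return -1;
        }
        line = strchr(line, '\n');
        if (line != NULL) line++;   /* advance to the next line */
    }
    return -1;
}

/* Hypothetical helper: read /proc/meminfo and look up one field. */
long read_meminfo_kb(const char *key) {
    char buf[8192];
    FILE *f = fopen("/proc/meminfo", "r");
    if (f == NULL) return -1;
    size_t n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';
    return meminfo_field_kb(buf, key);
}
```

Comparing read_meminfo_kb("MemFree") with read_meminfo_kb("MemAvailable") on a machine with a warm file cache typically shows the gap between "free" and "available" described above.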

>
>> Even if we mmap to commit heap memory without NORESERVE, the
>> _SC_AVPHYS_PAGES metric does not change its value - at least not until
>> somebody actually touches the mmaped memory and it starts becoming
>> backed by actual physical memory. So the JVM can in theory commit the
>> whole physical memory, and _SC_AVPHYS_PAGES will still reply that all
>> that memory is still available given that it has not been touched yet.
>
> Yes, I believe that is exactly how things work and I think
> that available_memory() is returning that information
> correctly.

Solaris is AFAIK the only OS to make this interpretation, and it is
arguably not useful to users, as it does not correspond to memory that
applications can use to satisfy allocations. But due to the lack of a
definition, I can't say you are wrong. I can only give my opinion and
point out that Solaris is the only OS with this interpretation, and that
people have been very confused by it.
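The behaviour quoted above (committed but untouched memory does not show up in _SC_AVPHYS_PAGES) can be observed with a small experiment. This is a sketch, assuming a POSIX system where /dev/zero can be mapped privately; sample_avphys_around_mmap() is a hypothetical name, and the readings are noisy because other processes allocate and free memory concurrently.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: samples sysconf(_SC_AVPHYS_PAGES) before mapping,
   after mapping but before touching, and after touching every page of a
   private /dev/zero mapping of `bytes` bytes.
   Returns 0 on success, -1 on failure. */
int sample_avphys_around_mmap(size_t bytes, long out[3]) {
    int fd = open("/dev/zero", O_RDWR);
    if (fd < 0) return -1;

    out[0] = sysconf(_SC_AVPHYS_PAGES);          /* before mmap */
    void *p = mmap(NULL, bytes, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    close(fd);                                   /* mapping stays valid */
    if (p == MAP_FAILED) return -1;

    out[1] = sysconf(_SC_AVPHYS_PAGES);          /* mapped, untouched */
    memset(p, 1, bytes);                         /* fault every page in */
    out[2] = sysconf(_SC_AVPHYS_PAGES);          /* mapped and touched */

    munmap(p, bytes);
    return 0;
}
```

If the description above holds, out[1] should stay close to out[0], and only out[2] should drop, by roughly bytes divided by the page size.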

>
>> It is likely that this is related to random swap-related test
>> failures, where too many JVMs are created based on this metric.
>
> Please explain further. What do you mean by "too many JVMs are
> created based on this metric"?

Well, I guess the problem is that people run many JVMs at the same time,
and then some JVM crashes complaining that it is out of swap space and
hence can't continue, while available_memory() says there are multiple
gigabytes of available memory. This is very confusing for users. The
reason is, as I said, that the memory has not been paged in yet, so
there appears to be lots of memory available, but none of it can be used
to satisfy allocations, leading to failures.

>
>> Even
>> if it is not, the os::available_memory() call is still broken in its
>> current state and should be fixed regardless.
>
> I'm not yet convinced that available_memory() is broken
> and needs to be fixed. I don't see available_memory() being
> used in a lot of places and those uses that I do see are
> mostly just reports of the value...
>
> So what am I missing about how os::available_memory() is
> being used?

The reports are very problematic though. It's better to not report 
anything than to report misleading, incorrect numbers. If we report a 
number, then that number should be correct and useful.

I also think that whether available_memory() is broken or not has 
nothing to do with how often it is being used. It reports misleading 
numbers, and that leads to unnecessary confusion. People are very 
confused about Solaris swap issues. A contributing reason is that 
reported values are not what people think they are. Therefore it needs a 
fix.

Thanks,
/Erik

> Dan
>
>
>
>> My proposed fix uses kstat to get the available memory that can be
>> mmapped (which actually relates to virtual memory). It then uses
>> swapctl() to find out the amount of free swap, subtracting that from
>> the kstat value, to make sure we do not count swap memory as being
>> available for grabbing, to mimic the current behaviour of other
>> platforms. The code iterates over the potentially many swap resources
>> and adds up the free swap memory.
>>
>> kstat gives us all memory that can be made available, including memory
>> already used by the OS for things like file caches, and swap memory.
>> When this value is 0, mmap will fail. That's why I calculate the
>> amount of swap and remove that, assuming it is okay to use memory that
>> isn't immediately available but can be made available, as long as it
>> does not involve paging to the swap memory.
>>
>> Testing:
>> * JPRT
>> * Wrote my own test program, which can be found in the comments of the
>> bug, to report on memory values, so I could verify what is going on,
>> and used vmstat to confirm that paging to swap starts happening exactly
>> when the new os::available_memory() reaches 0.
>>
>> I need a sponsor to push this if anyone is interested.
>>
>> Thanks,
>> /Erik
>
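The arithmetic of the proposed fix, separated from the Solaris-specific kstat and swapctl() calls, can be sketched as follows. This is an illustration under stated assumptions, not the webrev's code: swap_device_t and available_minus_free_swap() are hypothetical, the kstat value is taken as an input already converted to bytes, and each swap device's free space is assumed to be reported in pages and summed over all devices.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one swap device's free space, in pages.
   On Solaris this would be filled in from swapctl(); not modeled here. */
typedef struct {
    uint64_t free_pages;
} swap_device_t;

/* Memory that can be made available to mmap (kstat value, in bytes)
   minus the free swap summed over all devices, clamped at zero so the
   unsigned subtraction cannot wrap around. */
uint64_t available_minus_free_swap(uint64_t kstat_available_bytes,
                                   const swap_device_t *devs, size_t ndevs,
                                   uint64_t page_size) {
    uint64_t free_swap_bytes = 0;
    for (size_t i = 0; i < ndevs; i++) {
        free_swap_bytes += devs[i].free_pages * page_size;
    }
    if (free_swap_bytes >= kstat_available_bytes) {
        return 0;
    }
    return kstat_available_bytes - free_swap_bytes;
}
```

The clamp matters because julong is unsigned: if free swap ever exceeded the kstat value, a plain subtraction would wrap around to an enormous number instead of reporting 0.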