MemorySegment JVM memory leak

Maurizio Cimadamore maurizio.cimadamore at oracle.com
Wed Apr 22 13:36:07 UTC 2020


On 22/04/2020 13:46, Uwe Schindler wrote:
> Hi,
>
>>> Just some comments from the opposite side:
>>>
>>>>> I am also doing some testing in C using mmap and sometimes see the
>>>>> same issue as in Java, where memory consumption is very high. I am
>>>>> still investigating and have not yet reached a conclusion.
>>>>>
>>>> I did exactly the same to narrow down the issue, and I too was having
>>>> very high memory consumption with big mappings.
>>>>
>>>> This is my main loop in C:
>>>>
>>>> char * region = mmap(.......);
>>>>
>>>> for (long l = 0; l < SIZE; l += 4096) {
>>>>     memcpy(buf, &region[l], 4096);
>>>>     madvise(region, l, MADV_DONTNEED); // <--------
>>>> }
>>> This exact behavior is wanted for memory-mapped files in most cases; the
>>> resident memory should be cleaned up later, and the OS kernel does a good
>>> job with it. E.g., if Lucene/Solr/Elasticsearch used MADV_DONTNEED, their
>>> whole I/O would go crazy. The reason Lucene/Solr/Elasticsearch rely on this
>>> is the type of I/O they do:
>>> https://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
>>> Those servers rely on the way the Linux kernel handles this by default! So
>>> please don't add anything like this into MappedByteBuffer or the segment
>>> API! It's not a memory leak, it's just normal behaviour!
>>
>> I'm not proposing to add this so that it's called automatically!
>>
>> What I did was add an extra method (similar to load()) which does the
>> madvise. If you don't want to use it, you don't have to!
> OK, that's fine - actually, after I sent my mail I saw your comment about the new API. This is also something which might be useful directly on MappedByteBuffer, not only on the segment API, so consider adding it there, too. It should not be too complicated. We would really appreciate it, especially as we can't move to the new segment API at the moment because of the thread-confinement issues.
Ok - glad that we're on the same page. Have you looked into the new 
'unsafe' native segment creation? If you already have a mapped memory 
address you can basically create a custom memory segment with:

* custom address
* custom size
* _optional_ thread owner (meaning if there's no owner, you are unconfined)
* custom cleanup action (you'll need to do something to unmap the address here, perhaps in native code)


See here:

https://github.com/openjdk/panama-foreign/blob/foreign-memaccess/src/jdk.incubator.foreign/share/classes/jdk/incubator/foreign/MemorySegment.java#L510


The only caveat is that, if you want to use this, you need to pass a 
command line flag, which might or might not be ok in your case.
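
To make that concrete, here's a rough sketch of how the pieces could fit 
together. Treat the names as approximate: I'm assuming the restricted 
factory has the shape MemorySegment.ofNativeRestricted(address, size, 
owner, cleanup, attachment) as in the branch linked above (it might have 
been renamed since), and mmapFile/munmapRegion are just placeholders for 
whatever native code produces and releases your mapping.

import jdk.incubator.foreign.MemoryAddress;
import jdk.incubator.foreign.MemorySegment;

public class UnsafeSegmentSketch {

    // Placeholder JNI bindings around mmap/munmap (not part of the foreign API).
    static native long mmapFile(String path, long length);
    static native void munmapRegion(long address, long length);

    static MemorySegment wrapMapping(String path, long length) {
        long addr = mmapFile(path, length);
        return MemorySegment.ofNativeRestricted(
                MemoryAddress.ofLong(addr),       // custom address
                length,                           // custom size
                null,                             // no owner thread -> unconfined
                () -> munmapRegion(addr, length), // cleanup action, runs on close()
                null);                            // optional attachment
    }
}

The command line flag I mentioned is, in current incubator builds, a 
system property along the lines of -Dforeign.restricted=permit (check 
the branch you build against for the exact name).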


>
>>> I agree, it's a problem for anonymous mappings if they can't be cleaned up,
>>> but those should be unmapped after usage. For mmapped files, the memory
>>> consumption is not higher; it's just better use of the file system cache.
>>> If the kernel lacks enough space for other stuff that has no disk backend
>>> (anonymous mappings), it will free the occupied resources or add a disk
>>> backend by using the swap file.
>>
>> This is not what I observed on my machines (and I suspect Ahmed is also
>> seeing the same). If you just do a loop iterating over a 100G mapped
>> file, you will eventually run out of RAM and the system will start
>> swapping like crazy, to the point of no longer being responsive. I don't
>> think this is acceptable behavior, at least in this specific case.
> The problem is that you are looping from the beginning to the end of the region. I am not fully familiar with the Linux kernel code in recent versions, but it tries to be intelligent about memory mapping. If you read the whole file like this, you are somewhat misusing the mmap API. Reading the file with sequential I/O is much better. MMAP is ideal for random access to files where you don't need everything at once.
>
> If you touch every block one by one, it's the same as MappedByteBuffer#load(). Stuff that was recently loaded is preferred to be kept in physical memory, so stuff that has not been accessed for longer has to go to swap. How this happens depends on the vm.swappiness sysctl kernel setting (which is 60 by default, a bad setting e.g. for some workloads on servers; see below). With 60, swapping out is preferred over just freeing recently claimed buffers. Especially with the sequential-read antipattern, I would not be surprised if the Linux kernel has an optimization that assumes this stuff is needed more often (as sequential reads are mostly a sign of database scans, where file system caching is hardly required).
Right, I've been bitten by swappiness many times. I also thought that 
was probably the culprit here.
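
For reference, the setting can be inspected by reading 
/proc/sys/vm/swappiness on Linux; a trivial, illustrative-only snippet:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Illustrative only: print the current vm.swappiness value on Linux,
// the same number that `sysctl vm.swappiness` reports (default 60).
public class SwappinessCheck {
    public static void main(String[] args) throws IOException {
        String value = Files.readString(Path.of("/proc/sys/vm/swappiness")).trim();
        System.out.println("vm.swappiness = " + value);
    }
}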
>
>
> If you remember my talk at the committers meeting in Brussels: Elasticsearch servers sometimes memory-map up to a terabyte of memory on machines with 64 or 128 GiB of physical RAM and still work fine. All of this is mmapped in MappedByteBuffers of 1 GiB each (due to the 32-bit limitation). The difference to your case is: we use random access, and not everything is needed at the same time. I/O pressure is much lower than in your synthetic test, where the system has no time to clean up, as it wants to swap in pages as fast as possible. With random access instead of sequential access, the system also has time to free other resources and to decide to free resources that have not been used for a longer time.

True, I suppose that access idiom accounts for the biggest difference.
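
For reference, the chunked mapping scheme you describe looks roughly like 
the sketch below; the class name and the 1 GiB chunk size are just 
illustrative (a single MappedByteBuffer can't exceed Integer.MAX_VALUE 
bytes, hence the chunking):

import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

public class ChunkedMapping {

    // Map a large file as a series of 1 GiB read-only MappedByteBuffers;
    // the mappings stay valid after the channel is closed.
    static List<MappedByteBuffer> mapInChunks(Path file) throws IOException {
        final long CHUNK = 1L << 30; // 1 GiB per mapping
        List<MappedByteBuffer> chunks = new ArrayList<>();
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = channel.size();
            for (long offset = 0; offset < size; offset += CHUNK) {
                long length = Math.min(CHUNK, size - offset);
                chunks.add(channel.map(FileChannel.MapMode.READ_ONLY, offset, length));
            }
        }
        return chunks;
    }
}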

Thanks
Maurizio



