Suggested fix for JDK-4724038 (Add unmap method to MappedByteBuffer)
Peter Levart
peter.levart at gmail.com
Wed Sep 9 14:45:46 UTC 2015
On 09/09/2015 04:21 PM, Peter Levart wrote:
> Hi Uwe,
>
> As I thought, the problem for some seems to be the non-prompt unmapping
> of mapped address space held by otherwise unreachable mapped byte
> buffers. The mapped address space doesn't live in the Java heap and
> doesn't represent heap memory pressure, so GC doesn't kick in
> automatically when one would like it to. One could help by manually
> triggering GC with System.gc() in such situations. The problem is how
> to detect such situations. Direct byte buffers
> (ByteBuffer.allocateDirect) maintain a count of bytes currently
> allocated and don't allow allocation of native memory beyond a certain
> configured limit (-XX:MaxDirectMemorySize=<size>). Before throwing
> OutOfMemoryError, the ByteBuffer.allocateDirect() request tries its
> best to free direct memory allocated by otherwise unreachable direct
> ByteBuffers (using System.gc() to trigger GC and helping to process
> references).
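
For illustration, a minimal sketch of that reserve-then-GC-then-retry
pattern (this is not the actual java.nio.Bits code; the limit, names
and sleep are made up for this sketch):

    import java.util.concurrent.atomic.AtomicLong;

    class DirectMemoryLimiter {
        // Stand-in for the -XX:MaxDirectMemorySize=<size> limit.
        private static final long MAX = 64L * 1024 * 1024;
        private static final AtomicLong RESERVED = new AtomicLong();

        static void reserve(long bytes) {
            if (tryReserve(bytes)) return;
            // Hope to make unreachable direct buffers collectable ...
            System.gc();
            // ... and give reference processing a moment to free them.
            try { Thread.sleep(100); }
            catch (InterruptedException e) { Thread.currentThread().interrupt(); }
            if (!tryReserve(bytes)) throw new OutOfMemoryError("Direct buffer memory");
        }

        private static boolean tryReserve(long bytes) {
            for (;;) {
                long cur = RESERVED.get();
                if (cur + bytes > MAX) return false;
                if (RESERVED.compareAndSet(cur, cur + bytes)) return true;
            }
        }

        static void unreserve(long bytes) { RESERVED.addAndGet(-bytes); }
    }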
>
> Would a similar approach - a configured limit for FileChannel.map()ped
> address space - be of any help to Lucene applications? Is it possible
> to estimate the maximum amount of address space a particular Lucene
> application may need at any one time, so that mapping beyond such a
> limit could be considered an application error?
Perhaps the number of bytes mapped is not always the right quantity to
track. Maybe Lucene needs to track the number of mapped regions, or
something else entirely? I think it would be best to leave it to the
application to decide and implement the tracking, and also to trigger GC
when it approaches the limit. All that is currently missing from the
MappedByteBuffer API for that purpose is a notification to the
application after a buffer has been unmapped.
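
To make the idea concrete, here is a minimal sketch of such
application-side tracking. The onUnmapped() callback is hypothetical -
it is exactly the notification that is missing from the API today:

    import java.util.concurrent.atomic.AtomicInteger;

    class MappingTracker {
        private final int softLimit;              // application-chosen limit
        private final AtomicInteger mappedRegions = new AtomicInteger();

        MappingTracker(int softLimit) { this.softLimit = softLimit; }

        // Called by the application after each successful FileChannel.map().
        void onMapped() {
            if (mappedRegions.incrementAndGet() >= softLimit) {
                // Encourage collection of unreachable MappedByteBuffers.
                System.gc();
            }
        }

        // Hypothetical: would be invoked by the missing unmap notification.
        void onUnmapped() { mappedRegions.decrementAndGet(); }
    }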
Regards, Peter
>
> Regards, Peter
>
> On 09/09/2015 12:51 PM, Uwe Schindler wrote:
>> Hi,
>>
>> Dawid Weiss and I are both involved in the Apache Lucene project and
>> we know the problems with MappedByteBuffer and unmapping. Dawid
>> already responded with a source-code link to our implementation (which
>> needs to use the hacky cleaner() approach; also take a look at the
>> extensive documentation in this class):
>> https://github.com/apache/lucene-solr/blob/trunk/lucene/core/src/java/org/apache/lucene/store/MMapDirectory.java
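
For readers unfamiliar with it, this is roughly what the hacky
cleaner() approach looks like on pre-Jigsaw JDKs (a sketch only;
Lucene's real code adds permission checks and fallbacks):

    import java.lang.reflect.Method;
    import java.nio.ByteBuffer;

    final class Unmapper {
        static void unmap(ByteBuffer buffer) throws Exception {
            if (!buffer.isDirect()) return;
            // DirectByteBuffer.cleaner() is public, but its declaring class
            // is not, hence the setAccessible() call.
            Method cleanerMethod = buffer.getClass().getMethod("cleaner");
            cleanerMethod.setAccessible(true);
            Object cleaner = cleanerMethod.invoke(buffer); // a sun.misc.Cleaner
            if (cleaner != null) {
                // Runs the unmapping immediately instead of waiting for GC.
                cleaner.getClass().getMethod("clean").invoke(cleaner);
            }
        }
    }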
>>
>> So we would be very happy to get this issue resolved! The cleaner()
>> hack is enabled by default in Lucene if the JVM supports it (so we
>> won't break if Jigsaw prevents this, but our *large* users would
>> complain heavily).
>>
>>>> This is fundamentally about *integrity* of the runtime. It follows
>>>> that there are security implications, but it's still fundamentally
>>>> an integrity issue, and guarding an unsafe operation with a Security
>>>> Manager is unfortunately an insufficient solution.
>>> Right, and just to add that there have been many attempts over the
>>> years to find solutions to this issue. I think the closest was
>>> atomically remapping, but that wasn't feasible on all platforms and
>>> also didn't free up the address space in a timely manner.
>> So we should really find a solution here. I was talking with several
>> people at various conferences (Rory O'Donnell, Mark Reinhold) and we
>> had some ideas about how to solve this. My idea is explained below
>> (I am not a JVM internals or Hotspot guy, so excuse some obviously
>> "wrong" assumptions):
>>
>> Actually there are 2 issues, not just one. The first issue is, as
>> mentioned before: you cannot unmap via the API. Many apps, including
>> Apache Lucene, need this for a reason which stems from "another" bug,
>> which is my issue #2 (see below).
>>
>> First, unmapping is very important for Lucene at the moment, because
>> we operate on Lucene indexes purely using mmap (see [1]), and they can
>> easily be several hundreds of gigabytes. On highly dynamic systems,
>> Lucene often maps new files (also very large ones) and relies on the
>> fact that older, deleted files are unmapped in time (this does not
>> need to be ASAP, just "in time"). So we have those 2 "bugs", which
>> force us to unmap:
>>
>> (1) disk space issues / delete after last close (POSIX) vs. no delete
>> at all (Windows)
>>
>> - disk space: we have seen customers running out of disk space with
>> Lucene, because unmapping wasn't done in time, and under POSIX
>> delete-on-last-close semantics the disk space of an already-deleted
>> file cannot be reclaimed while a mapping still holds it open. The
>> problem you see on Windows - that you cannot delete the file - is
>> therefore worse on Linux, because it is hidden from the user: you
>> cannot free the disk space of the deleted file! Lucene creates and
>> deletes files all the time while indexing realtime data (e.g. think
>> of GitHub's very dynamic code search index, which is backed by
>> Lucene/Elasticsearch).
>> - virtual memory: if you map huge files (several hundreds of
>> gigabytes) and they are not unmapped in time, you may run out of
>> virtual address space. This especially affects Windows, because it
>> does not use the full 46 bits (or thereabouts) of the address. So
>> effectively you can only map something like 4 terabytes on Windows.
>> If you have fragmentation of the address space this gets worse (in
>> Lucene, we map in chunks of 1 GiB because of the signed 32-bit
>> integer limit of ByteBuffer, so fragmentation is not our biggest
>> issue).
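
For illustration, a sketch of the chunked mapping just mentioned -
splitting a huge file into 1 GiB pieces because a single ByteBuffer
cannot exceed Integer.MAX_VALUE bytes (simplified; this is not
Lucene's actual MMapDirectory code):

    import java.io.IOException;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.file.Path;
    import java.nio.file.StandardOpenOption;

    class ChunkedMapper {
        static final long CHUNK = 1L << 30; // 1 GiB per chunk

        static MappedByteBuffer[] map(Path file) throws IOException {
            try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
                long size = ch.size();
                int n = (int) ((size + CHUNK - 1) / CHUNK);
                MappedByteBuffer[] chunks = new MappedByteBuffer[n];
                for (int i = 0; i < n; i++) {
                    long pos = i * CHUNK;
                    chunks[i] = ch.map(FileChannel.MapMode.READ_ONLY,
                                       pos, Math.min(CHUNK, size - pos));
                }
                return chunks;
            }
        }
    }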
>>
>> (2) It takes a veeeeeeeeeeeeeeeery long time until the unmapping
>> actually occurs!
>>
>> This is the real bug! If the garbage collector cleaned up the buffers
>> ASAP, we would not need to unmap from user code. In Lucene we just
>> delay the file delete on Windows, so we are not really affected by
>> the inability to delete files (but it would be nice if that could be
>> fixed).
>>
>> If you look at the usage pattern of those huge mapped files, you will
>> see why in most cases they are *never ever* unmapped automatically:
>> Lucene maps very large files and uses them for a long time, so the
>> MappedByteBuffer object gets migrated to the older generations on the
>> heap, where garbage collection, of course, happens only very rarely.
>> That would not be the most problematic part, but there is a second
>> issue: the MappedByteBuffer object is just a very small object (in
>> heap size terms: just an object header and a few pointers), so the
>> garbage collector does not see it as heavy! It's just a tiny object
>> instance of maybe 30 bytes. Why should the garbage collector clean it
>> up? And in fact it almost never does! The garbage collector cannot
>> see that our 30-byte object instance "sits" on something like 300
>> gigabytes of virtual memory and disk space!
>>
>> One proposal to fix this would be to add something like an internal
>> OpenJDK Java annotation or similar, with which you can "mark" heavy
>> objects so that the garbage collector would free them preferentially
>> (similar to sun.misc.Contended).
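
Purely hypothetical - no such annotation exists in OpenJDK - but the
proposal might look something like this, by analogy with
sun.misc.Contended:

    // Hypothetical marker: tells the GC that instances of the annotated
    // class hold large native resources and should be collected eagerly.
    @interface HeavyNativeResource {
        long estimatedBytes() default 0L; // optional hint for GC heuristics
    }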
>>
>> For the Apache Lucene team,
>> Uwe
>>
>> [1]
>> http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
>>
>> -----
>> Uwe Schindler
>> uschindler at apache.org
>> ASF Member, Apache Lucene PMC / Committer
>> Bremen, Germany
>> http://lucene.apache.org/
>>
>>
>