Memory Mapped file segment (file is empty)

Thu Jun 25 16:03:39 UTC 2020

On 6/25/20 8:53 AM, Maurizio Cimadamore wrote:
>
>> And from what I understand about memory mapped files -- not related 
>> to your API -- but in general it makes sense to map the whole 
>> "database"-file and not just a small page-portion, right? As it will 
>> fetch the page on-demand into the non heap memory location.
>
> In the Java 15 API you can map a smaller portion of a file (by passing 
> an offset and a size); this was possible with the ByteBuffer API, but 
> that option was partially omitted in the segment API.
>
> I think when it comes to mapped files, mileage can vary - I believe 
> some of my colleagues (and people in this very mailing list) are much 
> more knowledgeable than I am when it comes to fine tuning the size of 
> mapped memory regions :-)
>

FWIW, I'll share some personal experience and info I've found since I've 
been working with this too.

According to Wikipedia[1], pages (typically 4096 bytes) are indeed only 
loaded into memory once they are accessed:

A possible benefit of memory-mapped files is a "lazy loading", thus 
using small amounts of RAM even for a very large file. Trying to load 
the entire contents of a file that is significantly larger than the 
amount of memory available can cause severe thrashing as the operating 
system reads from disk into memory and simultaneously writes pages from 
memory back to disk. Memory-mapping may not only bypass the page file 
completely, but also allow smaller page-sized sections to be loaded as 
data is being edited, similarly to demand paging used for programs.

(random platform differences aside)

Going by the code that was linked, I feel like there is a 
misunderstanding of the relationship between MemorySegment.mapFromPath 
and the file being mapped. The "size" argument has little to do with the 
size of the file; it is the number of bytes that become accessible 
through the Foreign Memory Access API (FMA). FMA will grow the file for 
you if it is smaller than the size you ask to map.

In other words, if you pass 128 bytes instead of the file's size, the 
underlying file will be 128 bytes long afterwards, assuming it wasn't 
already larger than 128 bytes.
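
To make that concrete without needing the incubator module, here is a 
minimal sketch of the same effect using the plain FileChannel.map API 
instead of MemorySegment.mapFromPath. The class and method names are 
made up for illustration. Note that the FileChannel javadoc leaves 
mapping past the end of the file unspecified; growing the file is what 
OpenJDK does in practice for a READ_WRITE mapping, so treat it as an 
assumption on other runtimes.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, not part of any JDK API.
class MapGrowsFile {
    /** Maps the first {@code size} bytes read-write and returns the file size afterwards. */
    static long mapAndReportSize(Path path, long size) throws IOException {
        try (FileChannel ch = FileChannel.open(path,
                StandardOpenOption.CREATE,
                StandardOpenOption.READ,
                StandardOpenOption.WRITE)) {
            // Assumption: on OpenJDK a READ_WRITE mapping that reaches past
            // the end of the file extends the file to offset + size.
            ch.map(FileChannel.MapMode.READ_WRITE, 0, size);
            return ch.size();
        }
    }

    public static void main(String[] args) throws IOException {
        Path path = Files.createTempFile("map-grow", ".bin"); // starts empty
        path.toFile().deleteOnExit();
        System.out.println("before:       " + Files.size(path));
        System.out.println("after map 128: " + mapAndReportSize(path, 128));
        // Mapping fewer bytes than the file holds does not shrink it.
        System.out.println("after map 64:  " + mapAndReportSize(path, 64));
    }
}
```

The second call is there to show the "assuming it's not already larger" 
part: a smaller mapping leaves the file size alone.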

You can then expand the file further by calling the factory method again 
with a larger size on the same file. All newly added bytes will be 
zeroed. If you need to know where to start reading from, you may want to 
try reserving the first 8 bytes as a long offset, or keeping an entirely 
separate file that holds any important offsets you may need.
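
A rough sketch of the "first 8 bytes as a long offset" idea, again with 
the plain FileChannel.map / MappedByteBuffer API standing in for the 
segment API. HeaderedLog, map and append are hypothetical names, and the 
file growing on remap is the same OpenJDK behaviour assumed above.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper: the first 8 bytes of the file store the next write offset.
class HeaderedLog {
    static final int HEADER_BYTES = Long.BYTES;

    /** Maps the first {@code size} bytes read-write (growing the file on OpenJDK). */
    static MappedByteBuffer map(Path path, long size) throws IOException {
        try (FileChannel ch = FileChannel.open(path,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // The mapping stays valid after the channel is closed.
            return ch.map(FileChannel.MapMode.READ_WRITE, 0, size);
        }
    }

    /** Writes data at the offset stored in the header, then advances the header. */
    static void append(MappedByteBuffer buf, byte[] data) {
        // A freshly created file has a zeroed header, so start right after it.
        int offset = (int) Math.max(buf.getLong(0), HEADER_BYTES);
        for (int i = 0; i < data.length; i++) {
            buf.put(offset + i, data[i]); // throws if the mapping is too small
        }
        buf.putLong(0, offset + data.length);
    }

    public static void main(String[] args) throws IOException {
        Path path = Files.createTempFile("headered", ".bin");
        path.toFile().deleteOnExit();

        MappedByteBuffer small = map(path, 128);
        append(small, "hello".getBytes(StandardCharsets.US_ASCII));
        small.force(); // flush to the file before remapping, just to be safe

        // Remap with a larger size: old data is still there, new bytes are zero.
        MappedByteBuffer large = map(path, 196);
        System.out.println("next offset: " + large.getLong(0));
        System.out.println("last byte:   " + large.get(195));
        System.out.println("file size:   " + Files.size(path));
    }
}
```

The header survives the remap because both mappings are backed by the 
same file, so the larger mapping can pick up where the smaller one left 
off.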

(side note: it would be nice if an "expand(long bytes)" method was added 
to MappedMemorySegment)

The good news is that an expanded MappedMemorySegment has no effect on 
any other, smaller MappedMemorySegment instances or their slices. Since 
they are all backed by the same file, any existing slices continue to 
work as expected. *You may want to force() and unload() just to be safe*. 
Here is some basic code to show this:

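    import java.nio.channels.FileChannel;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import jdk.incubator.foreign.MappedMemorySegment;
    import jdk.incubator.foreign.MemorySegment;

    Path path = Path.of("./test");

    // Start from an empty file every time.
    Files.deleteIfExists(path);
    Files.createFile(path);

    // Maps the first 128 bytes; the empty file grows to 128 bytes.
    MappedMemorySegment segment = MemorySegment.mapFromPath(
            path, 0, 128, FileChannel.MapMode.READ_WRITE);

    // Maps the first 196 bytes of the same file; the file grows to 196.
    MappedMemorySegment segment2 = MemorySegment.mapFromPath(
            path, 0, 196, FileChannel.MapMode.READ_WRITE);

    // The two mappings are independent; this only closes the larger one.
    segment2.close();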

The bad news is that you somehow need to manage all this without leaking 
on-heap memory or creating a lot of garbage. If you close the older, 
smaller MappedMemorySegment then all of its slices will also close but, 
again, the larger MappedMemorySegment has access to all the same data. 
Managing old vs. new is a bit of a headache and can blow up if you 
aren't careful. Avoiding slicing and using VarHandles instead would 
probably make things easier for yourself, and VarHandles should be 
faster too.

If the data repeats in a fixed, reliable layout such as:

1. 80-byte string

2. struct with int(4-bytes), 4-byte padding, and an 8-byte long

3. int(4-bytes)

(total 100 bytes)

You could, I think, use MemoryHandles.withOffset to create a VarHandle 
that lets you iterate through these entries. You could further use 
VarHandles to reduce the amount of slicing needed for each entry.

This may require use of Unsafe though, I think? Not sure.
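
For what it's worth, here is a sketch of walking that 100-byte layout 
with VarHandles and no slicing. It does not use MemoryHandles.withOffset; 
it swaps in MethodHandles.byteBufferViewVarHandle from the standard 
library, which takes a ByteBuffer and a byte index, so no Unsafe is 
needed for this variant. The field names (id, stamp, count) and the 
Records class are invented for the example. Since the stride is 100 
bytes, the long is not 8-byte aligned in every entry, so only plain 
get/set access is used here.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Hypothetical record layout: 80-byte string, int, 4 bytes padding, long, int.
class Records {
    static final int NAME_BYTES = 80;
    static final int ID_OFFSET = 80;     // int, 4 bytes
    static final int STAMP_OFFSET = 88;  // long, 8 bytes (after 4 bytes of padding)
    static final int COUNT_OFFSET = 96;  // int, 4 bytes
    static final int RECORD_BYTES = 100;

    // Coordinates of these handles are (ByteBuffer, int byteIndex).
    static final VarHandle INT =
            MethodHandles.byteBufferViewVarHandle(int[].class, ByteOrder.BIG_ENDIAN);
    static final VarHandle LONG =
            MethodHandles.byteBufferViewVarHandle(long[].class, ByteOrder.BIG_ENDIAN);

    static void write(ByteBuffer buf, int record, String name, int id, long stamp, int count) {
        int base = record * RECORD_BYTES;
        byte[] raw = name.getBytes(StandardCharsets.US_ASCII);
        for (int i = 0; i < NAME_BYTES; i++) {
            // Zero-pad (or truncate) the name to exactly 80 bytes.
            buf.put(base + i, i < raw.length ? raw[i] : (byte) 0);
        }
        INT.set(buf, base + ID_OFFSET, id);
        LONG.set(buf, base + STAMP_OFFSET, stamp);
        INT.set(buf, base + COUNT_OFFSET, count);
    }

    static String name(ByteBuffer buf, int record) {
        int base = record * RECORD_BYTES;
        byte[] raw = new byte[NAME_BYTES];
        int len = 0;
        for (int i = 0; i < NAME_BYTES; i++) {
            raw[i] = buf.get(base + i);
            if (raw[i] != 0) len = i + 1; // drop trailing zero padding
        }
        return new String(raw, 0, len, StandardCharsets.US_ASCII);
    }

    static int id(ByteBuffer buf, int record) {
        return (int) INT.get(buf, record * RECORD_BYTES + ID_OFFSET);
    }

    static long stamp(ByteBuffer buf, int record) {
        return (long) LONG.get(buf, record * RECORD_BYTES + STAMP_OFFSET);
    }

    static int count(ByteBuffer buf, int record) {
        return (int) INT.get(buf, record * RECORD_BYTES + COUNT_OFFSET);
    }

    public static void main(String[] args) {
        // A heap buffer keeps the example small; a MappedByteBuffer works the same way.
        ByteBuffer buf = ByteBuffer.allocate(2 * RECORD_BYTES);
        write(buf, 0, "alpha", 1, 10L, 100);
        write(buf, 1, "beta", 2, 20L, 200);
        for (int r = 0; r < buf.capacity() / RECORD_BYTES; r++) {
            System.out.println(name(buf, r) + " " + id(buf, r) + " "
                    + stamp(buf, r) + " " + count(buf, r));
        }
    }
}
```

The same offsets (0, 80, 88, 96, stride 100) should carry over to the 
segment API; only the way the handle is obtained differs.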

I'm not a JDK developer or an expert by any means, but I hope some of 
this helps at least a little.

[1] https://en.wikipedia.org/wiki/Memory-mapped_file#Benefits