Best practices with reading/writing to a Memory Mapped File

Maurizio Cimadamore maurizio.cimadamore at oracle.com
Mon Jun 29 15:29:15 UTC 2020


Hi Johannes,
glad that you managed to make everything work.

While I'm not an expert in mmap fine-tuning, one thing that comes to 
mind is that memory mapped files are mapped into main memory one page at 
a time, so if your pattern of access is really random/sparse, maybe 
there's not a lot to be gained by using a mapped file in your use case.
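
To make the page-granularity point concrete, here is a rough sketch that counts how many distinct pages a set of accesses would touch. The class name, the method name, and the 4096-byte page size are assumptions for illustration only (the real page size depends on the OS and hardware), and this is not code from sirix.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper: estimates how many pages a set of record accesses touches. */
final class PageTouchEstimate {
    // Assumption: 4 KiB pages, which is common but not universal.
    static final int ASSUMED_PAGE_SIZE = 4096;

    /** Counts the distinct pages covered by records of recordSize bytes at the given offsets. */
    static int pagesTouched(long[] offsets, int recordSize) {
        Set<Long> pages = new HashSet<>();
        for (long offset : offsets) {
            long firstPage = offset / ASSUMED_PAGE_SIZE;
            long lastPage = (offset + recordSize - 1) / ASSUMED_PAGE_SIZE;
            // A record may straddle a page boundary, so add every page it overlaps.
            for (long page = firstPage; page <= lastPage; page++) {
                pages.add(page);
            }
        }
        return pages.size();
    }
}
```

Under that assumption, 64 contiguous 64-byte records fit in a single page, while 64 such records spaced 1,000,000 bytes apart land on 64 different pages, so each sparse access may pull in a whole page for a handful of useful bytes.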

Also, looking at the code, it seems like you are creating a mapped 
segment for each page write, which seems odd - typically you'd want a 
mapped segment to contain all the memory you need to access, and then 
leave the loading/unloading of pages to the OS, which generally knows 
better. It seems to me that your application is instead selecting which 
PageReference to write, then creating a mapped segment for that page, and 
then persisting the changes via the mapped segment; I think doing this 
probably nullifies all the advantages of keeping the contents of the 
file in memory. In fact, with your approach, since the mapped segment is 
not stashed anywhere, I don't think the file will even be kept in memory 
(you map and then discard soon after, page after page).

I'd expect some state to remain cached from one write to the next (e.g. 
the mapped segment should, ideally, be stashed in some field, and only 
discarded if, for some reason, the original bounds are no longer valid - 
e.g. because the file is truncated, or expanded). But, assuming your 
file size remains stable, your code should keep accessing memory using 
_the same_ mapped segment, and the OS will load/unload pages for you as 
it sees fit (using heuristics to keep frequently used pages loaded, and 
discard the ones that have been used less frequently - all taking into 
account how much memory your system has).
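
A minimal sketch of the "stash the mapping in a field" idea follows. Note the swap: it uses the long-standing java.nio MappedByteBuffer rather than the Foreign Memory API's mapped segment, because the incubating API's signatures have been changing between releases. The class CachedMapping and all of its members are hypothetical names, not part of sirix or the JDK. A MappedByteBuffer is also limited to 2 GiB per mapping, so treat this as an illustration of the caching pattern only.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical helper: keeps one mapping alive and remaps only when it is outgrown. */
final class CachedMapping implements AutoCloseable {
    private final FileChannel channel;
    private MappedByteBuffer mapped; // stashed mapping, reused by every read and write
    private int remapCount;          // kept only to make the behaviour observable

    CachedMapping(Path file, int initialCapacity) throws IOException {
        channel = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.READ, StandardOpenOption.WRITE);
        remap(Math.max(initialCapacity, 1));
    }

    private void remap(long capacity) throws IOException {
        // Mapping READ_WRITE past the current end of the file extends the file.
        mapped = channel.map(FileChannel.MapMode.READ_WRITE, 0, capacity);
        remapCount++;
    }

    void write(int offset, byte[] data) throws IOException {
        long required = (long) offset + data.length;
        if (required > mapped.capacity()) {
            // The original bounds are no longer valid: grow geometrically so that
            // a run of appends does not trigger a remap on every write.
            remap(Math.max(required, 2L * mapped.capacity()));
        }
        ByteBuffer view = mapped.duplicate(); // independent position, same memory
        view.position(offset);
        view.put(data);
    }

    byte[] read(int offset, int length) {
        byte[] out = new byte[length];
        ByteBuffer view = mapped.duplicate();
        view.position(offset);
        view.get(out);
        return out;
    }

    int remapCount() {
        return remapCount;
    }

    @Override
    public void close() throws IOException {
        mapped.force(); // flush dirty pages to the file before closing
        channel.close();
    }
}
```

With this shape, writes that stay within the current bounds never create a new mapping, and the OS decides which pages stay resident. One consequence of growing the mapping ahead of the data is that the file becomes larger than the bytes actually written, so an append-only store would need to track its logical end separately.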

Maurizio

On 27/06/2020 11:50, Johannes Lichtenberger wrote:
> Hi,
>
> I've fixed my Memory Mapped file implementation using your Foreign Memory
> API.
>
> https://github.com/sirixdb/sirix/tree/master/bundles/sirix-core/src/main/java/org/sirix/io/memorymapped
>
> Running my tests (mostly simple integration tests, which test if the stuff
> I'm storing can be retrieved again or the results of queries are what I
> expect), I can't see a clear performance difference between the
> RandomAccessFile implementation
>
> https://github.com/sirixdb/sirix/tree/master/bundles/sirix-core/src/main/java/org/sirix/io/file
>
> and the new memorymapped implementation.
>
> So far, I have to create a new mapping every time I'm appending to the
> memory mapped segment of the underlying file I guess (otherwise the bounds
> checks will obviously fail):
>
> https://github.com/sirixdb/sirix/blob/627fa5a57a302b04d7165aad75a780d74e14c2e9/bundles/sirix-core/src/main/java/org/sirix/io/memorymapped/MemoryMappedFileWriter.java#L141
>
> I'm only ever appending data when writing or reading randomly based on
> offsets.
>
> I haven't done any microbenchmarks as of now and did not check bigger files
> ranging from 1 GB to much more, nor did I use a profiler to check what's
> going on. However, maybe creating the mapping so often is costly and
> maybe you can simply spot a performance issue. Or it's IntelliJ and my
> rather small files for testing as of now.
>
> Will next check if importing a 3.8 GB JSON file is faster or iterating
> through the whole imported file with around 400_000_000 nodes :-)
>
> If anyone wants to check it, it's simply a matter of changing
>
> private static final StorageType STORAGE = StorageType.FILE;
>
> to
>
> private static final StorageType STORAGE = StorageType.MEMORY_MAPPED;
>
> in the class: org.sirix.access.ResourceConfiguration
>
> Thanks for all the suggestions and hints so far
> Johannes


More information about the panama-dev mailing list