Allocation scheme using mmap

Sun Oct 27 13:33:33 UTC 2024

Hello,

I'm trying to implement something similar to the BufferManager described in
Section 2.1 of [1] for UmbraDB.

So I wanted to create the first buffer size class with fixed-size pages as
follows:

// Allocate the memory
MemorySegment reservedMemory = bufferManager.allocateMemory(totalMemorySize);

// Partition into page-sized chunks
List<MemorySegment> pages = new ArrayList<>();

for (long offset = 0; offset + pageSize <= totalMemorySize; offset += pageSize) {
  MemorySegment pageSegment = reservedMemory.asSlice(offset, pageSize);
  pages.add(pageSegment);
}
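
A self-contained version of the partitioning step might look like the sketch below. It assumes Arena-allocated memory as a stand-in for the mmap-backed region, and the class and method names (PagePartitionSketch, partition) are made up for illustration. The loop bound uses <= so that the last full page is included when the region size is an exact multiple of the page size.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.util.ArrayList;
import java.util.List;

final class PagePartitionSketch {

  // Splits one large segment into consecutive page-sized slices.
  // A trailing remainder smaller than pageSize is left unused.
  static List<MemorySegment> partition(MemorySegment region, long pageSize) {
    List<MemorySegment> pages = new ArrayList<>();
    for (long offset = 0; offset + pageSize <= region.byteSize(); offset += pageSize) {
      pages.add(region.asSlice(offset, pageSize));
    }
    return pages;
  }

  public static void main(String[] args) {
    try (Arena arena = Arena.ofConfined()) {
      // Assumption: Arena memory stands in for the mmap-backed region.
      MemorySegment region = arena.allocate(16 * 4096L);
      List<MemorySegment> pages = partition(region, 4096L);
      System.out.println(pages.size() + " pages of " + pages.get(0).byteSize() + " bytes");
    }
  }
}
```

The slices are only views onto the parent segment, so they share its lifetime and become invalid when the arena is closed.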

I use `mmap` to create the virtual address mapping for the big chunk
allocation, like this:

public MemorySegment allocateMemory(long size) throws Throwable {
  // Call mmap to reserve virtual memory
  MemorySegment addr = (MemorySegment) mmap.invoke(
      MemorySegment.NULL,           // Let the OS choose the starting address
      size,                         // Size of the memory to reserve
      PROT_READ | PROT_WRITE,       // Read and write permissions
      MAP_PRIVATE | MAP_ANONYMOUS,  // Private, anonymous mapping
      -1,                           // No file descriptor
      0                             // No offset
  );
  // mmap signals failure with MAP_FAILED, i.e. (void *) -1, not NULL
  if (addr.address() == -1L) {
    throw new OutOfMemoryError("Failed to allocate memory via mmap");
  }
  return addr;
}

The first thing I noticed is that I need addr.reinterpret(size) here, but now
I wonder how Arena::allocate is actually implemented on Linux (does it call
malloc?). I think the native memory shouldn't actually be committed in this
case until something is written to the MemorySegment slices, right? Of
course, compared to a C or C++ version, we already have a lot of
MemorySegment instances allocated on the Java heap.
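
The mmap handle itself is not shown above. A minimal sketch of how it could be bound with the FFM API (Java 22+), including the reinterpret call, might look like this. The constant values are assumptions taken from the Linux headers (x86-64 and AArch64), and the names MmapSketch, reserve and release are made up for illustration.

```java
import java.lang.foreign.FunctionDescriptor;
import java.lang.foreign.Linker;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.lang.invoke.MethodHandle;

final class MmapSketch {
  // Assumption: values as defined on Linux; other platforms differ.
  static final int PROT_READ = 0x1;
  static final int PROT_WRITE = 0x2;
  static final int MAP_PRIVATE = 0x02;
  static final int MAP_ANONYMOUS = 0x20;

  private static final Linker LINKER = Linker.nativeLinker();

  // void *mmap(void *addr, size_t length, int prot, int flags, int fd, off_t offset)
  private static final MethodHandle MMAP = LINKER.downcallHandle(
      LINKER.defaultLookup().find("mmap").orElseThrow(),
      FunctionDescriptor.of(ValueLayout.ADDRESS, ValueLayout.ADDRESS, ValueLayout.JAVA_LONG,
          ValueLayout.JAVA_INT, ValueLayout.JAVA_INT, ValueLayout.JAVA_INT,
          ValueLayout.JAVA_LONG));

  // int munmap(void *addr, size_t length)
  private static final MethodHandle MUNMAP = LINKER.downcallHandle(
      LINKER.defaultLookup().find("munmap").orElseThrow(),
      FunctionDescriptor.of(ValueLayout.JAVA_INT, ValueLayout.ADDRESS, ValueLayout.JAVA_LONG));

  // Reserves an anonymous, private, read/write mapping of the given size.
  static MemorySegment reserve(long size) throws Throwable {
    MemorySegment addr = (MemorySegment) MMAP.invokeExact(MemorySegment.NULL, size,
        PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0L);
    if (addr.address() == -1L) { // MAP_FAILED is (void *) -1
      throw new OutOfMemoryError("Failed to allocate memory via mmap");
    }
    // The returned segment has length zero; give it the mapped size.
    return addr.reinterpret(size);
  }

  // Unmaps a segment previously returned by reserve.
  static void release(MemorySegment segment) throws Throwable {
    int rc = (int) MUNMAP.invokeExact(segment, segment.byteSize());
    if (rc != 0) {
      throw new IllegalStateException("munmap failed");
    }
  }

  public static void main(String[] args) throws Throwable {
    MemorySegment region = reserve(64L * 1024 * 1024);
    region.set(ValueLayout.JAVA_BYTE, 0, (byte) 42); // first touch of page 0
    System.out.println(region.get(ValueLayout.JAVA_BYTE, 0));
    release(region);
  }
}
```

As far as I understand, Linux normally backs such an anonymous private mapping lazily, so physical pages should only be assigned on first touch, which is what the write in main triggers for the first page.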

Next, I'm not sure if I missed something, but it seems ridiculously hard to
get a file descriptor (the actual int) for some syscalls. I guess that's
because it's platform-specific, but unless I'm missing something I'd have to
use reflection, right? For instance, if you have a FileChannel.

My idea of using something like this is based on reducing allocations on the
Java heap; I described the problem a couple of months ago. Up until now I
never recycled or reused pages when a read from disk was issued for a page
that was not cached. I've always created new instances of these potentially
big objects after a disk read, so in addition I'd implement something to
cache and reuse Java pages (which more or less wrap the MemorySegments):

In addition to the actual native memory, I want to buffer page instances
which sit on top of the MemorySegments and which can be reclaimed by filling
the underlying MemorySegments with 0 again, plus some other cleanup. So
essentially I'd never unmap or use madvise(MADV_DONTNEED) as long as the
process is running.

Hope that makes sense, and hopefully the ConcurrentHashMap lookup to retrieve
a page of a certain size, plus taking the first entry from a Deque of free
pages, doesn't add too much CPU and synchronization overhead, but the
allocation rate was 2.7 GB for a single txn.
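
Roughly, the free-page lookup I have in mind could be sketched as follows. This is only an illustration under assumptions: PagePoolSketch and its methods are made-up names, the pool holds bare MemorySegments instead of the real page wrapper objects, and Arena memory stands in for the mmap-backed region.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedDeque;

final class PagePoolSketch {
  // One free list per size class, keyed by page size in bytes.
  private final ConcurrentHashMap<Long, ConcurrentLinkedDeque<MemorySegment>> freePages =
      new ConcurrentHashMap<>();

  // Zeroes the page and puts it at the head of the free list of its size class.
  void release(MemorySegment page) {
    page.fill((byte) 0);
    freePages.computeIfAbsent(page.byteSize(), k -> new ConcurrentLinkedDeque<>()).push(page);
  }

  // Takes the first free page of the given size, or returns null if none is free.
  MemorySegment acquire(long pageSize) {
    ConcurrentLinkedDeque<MemorySegment> deque = freePages.get(pageSize);
    return deque == null ? null : deque.pollFirst();
  }

  public static void main(String[] args) {
    try (Arena arena = Arena.ofShared()) {
      PagePoolSketch pool = new PagePoolSketch();
      MemorySegment region = arena.allocate(4 * 4096L);
      for (long offset = 0; offset + 4096L <= region.byteSize(); offset += 4096L) {
        pool.release(region.asSlice(offset, 4096L));
      }
      MemorySegment page = pool.acquire(4096L);
      page.set(ValueLayout.JAVA_BYTE, 0, (byte) 1); // simulate a disk read into the page
      pool.release(page);                           // page is zeroed before reuse
      System.out.println(pool.acquire(4096L).get(ValueLayout.JAVA_BYTE, 0));
    }
  }
}
```

Zeroing on release rather than on acquire keeps the acquire path short, at the cost of clearing pages that may never be reused.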

Kind regards
Johannes

[1] https://db.in.tum.de/~freitag/papers/p29-neumann-cidr20.pdf