"Malloc/Free" Callbacks for Dynamic Off-heap MemorySegments
Maurizio Cimadamore
maurizio.cimadamore at oracle.com
Fri Apr 16 10:55:08 UTC 2021
Just so I'm clear - are you referring to the ability to hook in a custom
allocator?
Or to the possibility of "growing" an existing memory segment (i.e.
reallocating it)?
The answer to the former is "yes!" (and we do so via the
SegmentAllocator interface in the new API, which is accepted anywhere a
segment needs to be allocated).
The answer to the latter is "no!" - because having variable-sized
segments would completely kill the performance of memory segment bounds
checks.
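
A minimal Java sketch of how the two answers fit together: each block keeps
a fixed size, and "growing" is expressed as allocate, copy, swap through a
pluggable allocator. This is an illustration only, not the FMA API. The
names BlockAllocator and GrowableStore are hypothetical, and
java.nio.ByteBuffer stands in for MemorySegment so the example runs on any
recent JDK.

```java
import java.nio.ByteBuffer;

// Hypothetical allocator hook (an assumption for this sketch); in FMA the
// corresponding role is played by the SegmentAllocator interface.
interface BlockAllocator {
    ByteBuffer allocate(int byteSize);
}

// Hypothetical growable store built from fixed-size blocks. A block never
// changes size; when it is full, a larger block is requested and the
// contents are copied over.
final class GrowableStore {
    private final BlockAllocator allocator;
    private ByteBuffer block; // current block; its capacity is fixed

    GrowableStore(BlockAllocator allocator, int initialBytes) {
        if (initialBytes <= 0) {
            throw new IllegalArgumentException("initialBytes must be positive");
        }
        this.allocator = allocator;
        this.block = allocator.allocate(initialBytes);
    }

    void append(byte value) {
        if (!block.hasRemaining()) {
            grow();
        }
        block.put(value);
    }

    private void grow() {
        ByteBuffer bigger = allocator.allocate(block.capacity() * 2);
        block.flip();      // switch the full block to read mode
        bigger.put(block); // copy the existing contents
        block = bigger;    // the old block is dropped here
    }

    byte get(int index) {
        if (index < 0 || index >= size()) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return block.get(index);
    }

    int size() { return block.position(); }

    int capacity() { return block.capacity(); }
}

class GrowableStoreDemo {
    public static void main(String[] args) {
        int[] allocations = {0};
        // Custom allocator: counts requests and hands out off-heap buffers.
        BlockAllocator counting = n -> {
            allocations[0]++;
            return ByteBuffer.allocateDirect(n);
        };
        GrowableStore store = new GrowableStore(counting, 4);
        for (int i = 0; i < 10; i++) {
            store.append((byte) i);
        }
        System.out.println(store.size() + " bytes stored, block capacity "
                + store.capacity() + ", allocations " + allocations[0]);
    }
}
```

Because the allocator is supplied by the caller, the owner of the memory
decides where each replacement block comes from, while every individual
block keeps bounds that never change.
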
Maurizio
On 16/04/2021 07:20, leerho wrote:
> *Summary*
> The content and internal structure of an off-heap MemorySegment can be
> dynamic. Special aggregators, for example, Sketches
> <https://datasketches.apache.org>, can start with very small memory
> requirements, just a few bytes, and then over time as more and more data is
> presented to them, can grow larger, commonly to the kilobyte range, but
> also into the many megabytes range.
>
> In the context of large systems that must process and analyze massive data
> there can be millions to billions (not an exaggeration) of these sketches
> in memory. Fortunately, as the size of the overall data grows larger and
> is also fragmented into hundreds to thousands of dimensions required for
> deep analysis of the data, there is a natural law that predicts that the
> overall distribution of sizes of these fragments roughly follows a
> power-law distribution. In other words, there will be relatively few
> fragments that are very large in size, and millions of fragments that
> remain very tiny, having captured only a few data points.
>
> Now put yourself in the shoes of the system that is managing the allocation
> of memory to all of these fragments, which we will declare to be
> MemorySegments (likely slices of much larger segments), millions of them.
> The first challenge is that you don't know *a priori* which segments will
> grow and which segments will remain small. If you allocate all the
> segments the same amount of memory of some average size, the space not used
> by the millions of small ones will be wasted, and the segments that need
> more than the average size will run out of space and fail. If you allocate
> all the segments the maximum predicted size required, the amount of total
> wasted memory will be orders-of-magnitude larger.
>
> Next, put yourself in the shoes of the code of the Sketch aggregator. What
> has been really useful, in this context, is a simple callback mechanism
> whereby the sketch code that is managing the dynamic operation of these
> sketches can signal the data system, which is responsible for overall
> memory allocation and management, and request more space for its segment; a
> "malloc" so to speak. This is not a malloc to the JVM or OS, but a memory
> request to the data system, which would also have the right to refuse that
> request or perhaps allocate the memory on the Java heap. Because some of
> these increases in size can be temporary, the sketch code also needs to
> signal the data system to "free" a segment that is no longer needed. With
> this kind of callback mechanism the overall utilization of memory is vastly
> improved.
>
> There are already examples of this kind of capability. The PostgreSQL
> database (written in C) provides "palloc(...)" and "pfree(...)" interfaces
> for dynamic,
> user contributed aggregators. PostgreSQL intercepts these requests so it
> can track and manage overall memory usage.
>
> The datasketches-memory (a.k.a. Memory)
> <https://datasketches.apache.org/api/memory/snapshot/apidocs/index.html>
> component (a 2017, JDK8 primitive version of FMA) of the Apache DataSketches
> <https://datasketches.apache.org> project has a simple MemoryRequestServer
> <https://datasketches.apache.org/api/memory/snapshot/apidocs/org/apache/datasketches/memory/MemoryRequestServer.html>
> interface that provides a similar capability. Finally, the WritableMemory
> <https://datasketches.apache.org/api/memory/snapshot/apidocs/org/apache/datasketches/memory/WritableMemory.html>
> class, which roughly corresponds to the MemorySegment, has the method
> *MemoryRequestServer getMemoryRequestServer()*.
>
> What is conceptually quite different about this memory model, compared with
> the normal Java memory model, is that the code managing the sketch does not
> "own" the memory it is operating with. The constructor of the sketch is
> provided with a MemorySegment, which it must initialize and use for all of
> its dynamic internal data. And, as with the MemorySegment, the underlying
> memory could be on-heap, off-heap, a memory-mapped file, or a wrapped
> ByteBuffer.
>
> Not all applications that use sketches would require this callback
> mechanism, but for the very large data management systems that
> predominantly use off-heap memory, this callback mechanism is very
> valuable.
>
> This is a request to add a similar capability to FMA.
>
> Thank you,
>
> Lee.
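
The "malloc"/"free" callback described in the quoted message can be
sketched in Java roughly as follows. This is a hedged illustration, not the
datasketches-memory MemoryRequestServer interface and not an FMA API: the
names MemoryRequests and BudgetedHost are hypothetical, java.nio.ByteBuffer
stands in for MemorySegment, and the fixed off-heap budget is an assumed
policy chosen only to show refusal and heap fallback.

```java
import java.nio.ByteBuffer;
import java.util.Optional;

// Hypothetical callback interface (an assumption for this sketch): the
// sketch code asks the host data system for memory and hands it back later.
interface MemoryRequests {
    // Returns a block of the requested size, or empty if the host refuses.
    Optional<ByteBuffer> request(int byteSize);

    // Tells the host that a previously granted block is no longer needed.
    void release(ByteBuffer block);
}

// Hypothetical host that tracks an off-heap budget. When the budget is
// exhausted it either falls back to the Java heap or refuses the request.
final class BudgetedHost implements MemoryRequests {
    private final long offHeapBudget;
    private final boolean heapFallback;
    private long offHeapInUse;

    BudgetedHost(long offHeapBudget, boolean heapFallback) {
        this.offHeapBudget = offHeapBudget;
        this.heapFallback = heapFallback;
    }

    @Override
    public Optional<ByteBuffer> request(int byteSize) {
        if (byteSize < 0) {
            throw new IllegalArgumentException("byteSize must be non-negative");
        }
        if (offHeapInUse + byteSize <= offHeapBudget) {
            offHeapInUse += byteSize;
            return Optional.of(ByteBuffer.allocateDirect(byteSize));
        }
        return heapFallback
                ? Optional.of(ByteBuffer.allocate(byteSize))
                : Optional.empty();
    }

    @Override
    public void release(ByteBuffer block) {
        // Only the accounting is updated here; a direct ByteBuffer's memory
        // is reclaimed by the JVM once the buffer becomes unreachable.
        if (block.isDirect()) {
            offHeapInUse -= block.capacity();
        }
    }

    long offHeapInUse() { return offHeapInUse; }
}

class BudgetedHostDemo {
    public static void main(String[] args) {
        BudgetedHost host = new BudgetedHost(64, true);
        ByteBuffer first = host.request(48).orElseThrow(IllegalStateException::new);
        ByteBuffer second = host.request(48).orElseThrow(IllegalStateException::new);
        System.out.println("first off-heap: " + first.isDirect());
        System.out.println("second off-heap: " + second.isDirect());
        host.release(first);
        System.out.println("off-heap bytes in use: " + host.offHeapInUse());
    }
}
```

The sketch code never allocates on its own; it only sees whatever block the
host chooses to grant, which is what lets the host account for, cap, or
redirect memory use across millions of sketches.
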