FFM API allocation shootout

Tue Aug 15 15:46:30 UTC 2023

After thinking some more, I think we're starting to see how the various 
pieces might fit together.

The main problem, and overhead, associated with NMT (Native Memory 
Tracking), in the way it's done today, is that it has to track memory 
usage across plain 
malloc/free calls. Now, it's easy to bump a counter when you do a malloc 
(as malloc takes the size as a parameter). But what about the 
corresponding free? That just takes a pointer.

For this reason, NMT "decorates" all pointers returned by os::malloc to 
include a header which contains the size of the allocated region of 
memory. This size is then consulted when doing a "free", so that we can 
decrement the corresponding counter(s).
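
To make the header trick concrete, here is a minimal Java sketch. It is a 
toy model, not HotSpot's actual NMT code: the class name 
HeaderTrackingAllocator is hypothetical, a ByteBuffer stands in for native 
memory, int offsets stand in for pointers, and the header is assumed to 
hold nothing but the size.

```java
import java.nio.ByteBuffer;

/** Toy model of size-header tracking. Hypothetical; not HotSpot's NMT code. */
final class HeaderTrackingAllocator {
    // Assumption: the header holds only the allocation size.
    private static final int HEADER_BYTES = Long.BYTES;

    private final ByteBuffer heap;  // stands in for native memory
    private int top = 0;            // bump pointer; this toy never reuses memory
    private long trackedBytes = 0;  // the counter that tracking has to maintain

    HeaderTrackingAllocator(int capacity) {
        heap = ByteBuffer.allocate(capacity);
    }

    /** Like malloc: the size is a parameter, so bumping the counter is easy. */
    int malloc(int size) {
        if (size < 0 || (long) top + HEADER_BYTES + size > heap.capacity()) {
            throw new OutOfMemoryError("toy heap exhausted");
        }
        int base = top;
        heap.putLong(base, size);     // write the size into the header
        top += HEADER_BYTES + size;
        trackedBytes += size;
        return base + HEADER_BYTES;   // the caller only sees the payload address
    }

    /** Like free: only a pointer is passed, so the size comes from the header. */
    void free(int pointer) {
        long size = heap.getLong(pointer - HEADER_BYTES);
        trackedBytes -= size;
    }

    long trackedBytes() {
        return trackedBytes;
    }
}
```

Every allocation pays for the extra header bytes, plus a header write on 
malloc and a header read on free.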

Now, you can see how this process is redundant, for both MemorySegment 
and ByteBuffer, as these APIs always know the size of the memory that has 
to be deallocated. So, longer term, we think that Unsafe should provide 
_two_ different ways to allocate and deallocate - one with known size, 
and one with unknown size (the current one). The latter will keep 
decorating pointers, as now. Hopefully this should allow us to implement 
NMT in a way that (a) covers both byte buffers and memory segments and 
(b) doesn't break the bank.
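
As a rough illustration of the two flavours, here is a hedged Java sketch. 
All names (DualModeAllocator, allocateSized, freeSized) are hypothetical 
and are not a proposed Unsafe API; a ByteBuffer again stands in for native 
memory and int offsets stand in for pointers.

```java
import java.nio.ByteBuffer;

/** Toy model of sized vs. unsized allocation. Hypothetical names throughout. */
final class DualModeAllocator {
    private static final int HEADER_BYTES = Long.BYTES; // assumed header layout

    private final ByteBuffer heap;  // stands in for native memory
    private int top = 0;            // bump pointer; this toy never reuses memory
    private long trackedBytes = 0;

    DualModeAllocator(int capacity) {
        heap = ByteBuffer.allocate(capacity);
    }

    private int bump(long bytes) {
        if (bytes < 0 || top + bytes > heap.capacity()) {
            throw new OutOfMemoryError("toy heap exhausted");
        }
        int base = top;
        top += (int) bytes;
        return base;
    }

    // Unknown-size flavour: keeps decorating pointers with a size header.
    int allocate(int size) {
        if (size < 0) throw new IllegalArgumentException("negative size");
        int base = bump((long) HEADER_BYTES + size);
        heap.putLong(base, size);
        trackedBytes += size;
        return base + HEADER_BYTES;
    }

    void free(int pointer) {
        trackedBytes -= heap.getLong(pointer - HEADER_BYTES);
    }

    // Known-size flavour: the caller supplies the size again on free,
    // so no header is written or read.
    int allocateSized(int size) {
        int base = bump(size);
        trackedBytes += size;
        return base;
    }

    void freeSized(int pointer, int size) {
        trackedBytes -= size;
    }

    long trackedBytes() { return trackedBytes; }

    int bytesConsumed() { return top; }
}
```

A client such as a memory segment, which already records its own byte 
size, could use the sized pair and skip the header entirely, while callers 
that only hold a raw address would stay on the unsized pair.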

In the short term, I think we'd still like to see some of the overhead 
associated with the Unsafe call to go away. We're considering the 
following options:

1) Make Unsafe::allocateMemory/freeMemory JVM intrinsics
2) Make Unsafe::allocateMemory/freeMemory simple native entries, not JVM 
entries
3) Speed up JNI transitions when the target is a VM entry, to avoid the 
double thread-state transition (e.g. first Java -> Native, then 
Native -> VM).

Of course (2) is the simplest, but we need to make sure that it doesn't 
interact badly with NMT in any way. (1) is a bit more convoluted, albeit 
a very well-understood trick. Finally, (3) is more ambitious, and might 
pay more dividends (as it will cover all JDK calls to internal VM 
entries, not just Unsafe::allocateMemory/freeMemory).

Cheers
Maurizio

On 09/08/2023 11:30, Maurizio Cimadamore wrote:
> Secondly, how much do we care about NMT ? In principle it's a nice 
> monitoring tool to have at our disposal, but it's far from being cheap 
> or free. I tried enabling it in some of my benchmarks, in its cheapest 
> "summary" mode, and that alone adds another 50ns to each 
> Unsafe::allocateMemory call. Which seems to be quite steep. I'm not 
> sure that the average FFM API client that wants to allocate a segment 
> really wants to sign up for all of that. Especially given that JNI 
> doesn't do _any_ of that (e.g. the JNI function GetStringUTFChars ends 
> up calling plain malloc).
>
> So, I think there are some more decisions to be made in this space - do 
> we want to keep using Unsafe to allocate and free memory, or do we 
> want to target malloc/free more directly (and give up NMT) ? I think 
> this is a bit like the discussion we had for Bits::reserveMemory. 
> While the idea of NMT is appealing (tracking all usages of native 
> memory inside the JVM in one place), I'm a little skeptical that the 
> design goals of NMT align with the way in which FFM API wants to use 
> the allocation functionalities (again, an argument in favor of this 
> thesis is that JNI itself does __not__ use NMT) --- NMT's goal was 
> really to track "internal memory usage for a HotSpot JVM" [1]. Does FFM fall 
> into this bucket?