FFM API allocation shootout

Wed Aug 9 18:30:52 UTC 2023

On 08/08/2023 10:32, Maurizio Cimadamore wrote:
> Just did another benchmark where I accessed the allocated memory 
> before returning - there's no change, malloc still seems quite a bit 
> faster than Unsafe::allocateMemory. We're still investigating as to 
> why that might be the case. 

We finally managed to isolate what the issue is (thanks to Erik 
Osterlund for pointing us in the right direction!). We all know that 
there's a state transition when going from Java to Native. Now, when you 
call one of the native methods in Unsafe, you have to additionally go 
through another state transition, from Native to VM. This second 
transition is the reason behind the performance gap between calling 
"malloc" directly using the Linker vs. calling "malloc" using Unsafe. 
That is, in the former case we only do one transition, in the latter we 
do two (and the second transition seems particularly expensive at that).
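
For readers who want to see the single-transition path concretely, here is a minimal sketch of calling "malloc"/"free" through the Linker. It is not the benchmark code discussed in this thread. The class name LinkerMalloc and its helper methods are made up for illustration, size_t is assumed to map to a Java long (true on common 64-bit platforms), and the API shape is that of JDK 22 (JDK 21 needs --enable-preview).

```java
import java.lang.foreign.FunctionDescriptor;
import java.lang.foreign.Linker;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.lang.invoke.MethodHandle;

// Hypothetical helper: calls the C library's malloc/free via downcall handles.
final class LinkerMalloc {
    private static final Linker LINKER = Linker.nativeLinker();

    // Assumption: size_t is 64 bits wide, so it is modelled as JAVA_LONG.
    private static final MethodHandle MALLOC = LINKER.downcallHandle(
            LINKER.defaultLookup().find("malloc").orElseThrow(),
            FunctionDescriptor.of(ValueLayout.ADDRESS, ValueLayout.JAVA_LONG));

    private static final MethodHandle FREE = LINKER.downcallHandle(
            LINKER.defaultLookup().find("free").orElseThrow(),
            FunctionDescriptor.ofVoid(ValueLayout.ADDRESS));

    static MemorySegment allocate(long byteSize) {
        try {
            // malloc returns a zero-length segment wrapping the raw pointer.
            MemorySegment raw = (MemorySegment) MALLOC.invokeExact(byteSize);
            if (raw.address() == 0) {
                throw new OutOfMemoryError("malloc returned NULL");
            }
            // reinterpret is a restricted method: it trusts us on the size.
            return raw.reinterpret(byteSize);
        } catch (RuntimeException | Error e) {
            throw e;
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    static void free(MemorySegment segment) {
        try {
            FREE.invokeExact(segment);
        } catch (RuntimeException | Error e) {
            throw e;
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    // Writes an int into freshly malloc'ed memory, reads it back, frees it.
    static int roundTrip(int value) {
        MemorySegment segment = allocate(ValueLayout.JAVA_INT.byteSize());
        try {
            segment.set(ValueLayout.JAVA_INT, 0, value);
            return segment.get(ValueLayout.JAVA_INT, 0);
        } finally {
            free(segment);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(42));
    }
}
```

Note that memory obtained this way is not tied to any arena, so nothing frees it automatically and nothing guards against use-after-free.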

Now, what can we do to improve this? Here are some random thoughts.

First, do Unsafe::allocateMemory and Unsafe::freeMemory even need to be 
executed in VM state? The answer here is not 100% clear (but we'll try 
to get some more clarity). Unsafe::allocateMemory uses the 
NativeMemoryTracking (NMT) mechanism, so in principle it might need to 
run in some more privileged mode. That said, NMT is disabled by default, 
so it would be theoretically possible to detect whether NMT is enabled 
or not, and, if disabled just call "malloc"/"free" using the linker, and 
avoid Unsafe entirely.
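
To make the "detect whether NMT is enabled" idea a bit more tangible, here is a rough sketch of one way it could be approximated from plain Java code. This is only an illustration, not how the JDK does or would do it: it scans the JVM input arguments for the -XX:NativeMemoryTracking flag, and the class NmtCheck and its methods are hypothetical. It also assumes that the last occurrence of the flag wins and that any value other than "off" means tracking is on.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical helper: guesses whether NMT was requested on the command line.
final class NmtCheck {
    private static final String FLAG = "-XX:NativeMemoryTracking=";

    // Pure function over an argument list, so it can be checked in isolation.
    static boolean nmtRequested(List<String> jvmArgs) {
        boolean enabled = false;
        for (String arg : jvmArgs) {
            if (arg.startsWith(FLAG)) {
                // Assumption: a later occurrence overrides an earlier one.
                enabled = !arg.substring(FLAG.length()).equals("off");
            }
        }
        return enabled;
    }

    // Applies the check to the arguments the running JVM was started with.
    static boolean nmtRequestedForThisJvm() {
        return nmtRequested(ManagementFactory.getRuntimeMXBean().getInputArguments());
    }

    // Picks an allocation path; the returned strings are labels only.
    static String chooseAllocationPath(boolean nmtEnabled) {
        return nmtEnabled ? "Unsafe::allocateMemory" : "malloc via Linker";
    }

    public static void main(String[] args) {
        System.out.println(chooseAllocationPath(nmtRequestedForThisJvm()));
    }
}
```

A check inside the JDK would presumably query the VM directly rather than parse flags, so this only conveys the shape of the dispatch.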

Secondly, how much do we care about NMT? In principle it's a nice 
monitoring tool to have at our disposal, but it's far from being cheap 
or free. I tried enabling it in some of my benchmarks, in its cheapest 
"summary" mode, and that alone adds another 50ns to each 
Unsafe::allocateMemory call, which seems quite steep. I'm not sure 
that the average FFM API client that wants to allocate a segment really 
wants to sign up for all of that. Especially given that JNI doesn't do 
_any_ of that (e.g. the JNI function GetStringUTFChars ends up calling 
plain malloc).

So, I think there are some more decisions to be made in this space - do we 
want to keep using Unsafe to allocate and free memory, or do we want to 
target malloc/free more directly (and give up NMT)? I think this is a 
bit like the discussion we had for Bits::reserveMemory. While the idea 
of NMT is appealing (tracking all usages of native memory inside the JVM 
in one place), I'm a little skeptical that the design goals of NMT align 
with the way in which FFM API wants to use the allocation 
functionalities (again, an argument in favor of this thesis is that JNI 
itself does _not_ use NMT) --- NMT's goal was really to track "internal 
memory usage for a HotSpot JVM" [1]. Does FFM fall into this bucket?

The good news is that at least on paper, FFM API's default allocation 
could be _really_ fast, and the reasons as to why allocation is currently 
slower, compared to basic malloc/free, are more accidental than anything 
else. That said, I think we need to think a bit about what kind of 
"default" behavior we want to provide for FFM API users "out of the 
box". While the message has always been "if you need faster allocation, 
write your own arena" (and LWJGL confirms that), I'm also a little 
uncomfortable leaving the "default" at a competitive disadvantage 
compared with "malloc" (and, by extension, JNI), as that could put some 
developers off, and hinder adoption.
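
As a reminder of what "write your own arena" can look like, here is a small sketch modelled on the slicing arena example in the Arena javadoc. It allocates one backing block up front and then hands out slices of it, so individual allocations never reach malloc or Unsafe. The class name, the 1024-byte size used in the demo and the helper method are illustrative only, and the code assumes the JDK 22 API (JDK 21 needs --enable-preview).

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.SegmentAllocator;

// Custom arena: one up-front allocation, then bump-pointer slicing.
final class SlicingArena implements Arena {
    private final Arena arena = Arena.ofConfined();
    private final SegmentAllocator slicingAllocator;

    SlicingArena(long size) {
        // The backing block still comes from the default allocation path, once.
        slicingAllocator = SegmentAllocator.slicingAllocator(arena.allocate(size));
    }

    @Override
    public MemorySegment allocate(long byteSize, long byteAlignment) {
        return slicingAllocator.allocate(byteSize, byteAlignment);
    }

    @Override
    public MemorySegment.Scope scope() {
        return arena.scope();
    }

    @Override
    public void close() {
        // Frees the backing block and invalidates every slice at once.
        arena.close();
    }

    // Illustrative helper: distance in bytes between two consecutive slices.
    static long gapBetweenTwoAllocations() {
        try (SlicingArena arena = new SlicingArena(1024)) {
            MemorySegment first = arena.allocate(16, 8);
            MemorySegment second = arena.allocate(16, 8);
            return second.address() - first.address();
        }
    }

    public static void main(String[] args) {
        System.out.println(gapBetweenTwoAllocations());
    }
}
```

A real arena of this kind would also need a policy for what happens when the backing block runs out.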

Maurizio

[1] - https://docs.oracle.com/en/java/javase/20/vm/native-memory-tracking.html#GUID-710CAEA1-7C6D-4D80-AB0C-B0958E329407