FFM API allocation shootout
Maurizio Cimadamore
maurizio.cimadamore at oracle.com
Wed Aug 9 18:30:52 UTC 2023
On 08/08/2023 10:32, Maurizio Cimadamore wrote:
> Just did another benchmark where I accessed the allocated memory
> before returning - there's no change, malloc still seems quite a bit
> faster than Unsafe::allocateMemory. We're still investigating as to
> why that might be the case.
We finally managed to isolate what the issue is (thanks to Erik
Osterlund for pointing us in the right direction!). We all know that
there's a state transition when going from Java to Native. Now, when you
call one of the native methods in Unsafe, you have to additionally go
through another state transition, from Native to VM. This second
transition is the reason behind the performance gap between calling
"malloc" directly using the Linker vs. calling "malloc" using Unsafe.
That is, in the former case we only do one transition, in the latter we
do two (and the second transition seems particularly expensive at that).
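For comparison, here is roughly what the single-transition path looks like: malloc and free called as plain Linker downcalls, with no extra hop into VM state. This is a minimal sketch under stated assumptions: JDK 22 or later (where java.lang.foreign is final), a 64-bit platform where size_t maps to a Java long, and a C library reachable through Linker::defaultLookup. The class name LinkerMalloc is hypothetical. Linker::downcallHandle and MemorySegment::reinterpret are restricted methods, so the JVM may print a warning unless started with --enable-native-access=ALL-UNNAMED.

```java
import java.lang.foreign.FunctionDescriptor;
import java.lang.foreign.Linker;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.SymbolLookup;
import java.lang.foreign.ValueLayout;
import java.lang.invoke.MethodHandle;

// Hypothetical helper: calls libc malloc/free directly through the FFM Linker.
final class LinkerMalloc {
    private static final Linker LINKER = Linker.nativeLinker();
    private static final SymbolLookup LIBC = LINKER.defaultLookup();

    // void *malloc(size_t size); assumes size_t is 64 bits wide
    private static final MethodHandle MALLOC = LINKER.downcallHandle(
            LIBC.find("malloc").orElseThrow(),
            FunctionDescriptor.of(ValueLayout.ADDRESS, ValueLayout.JAVA_LONG));

    // void free(void *ptr);
    private static final MethodHandle FREE = LINKER.downcallHandle(
            LIBC.find("free").orElseThrow(),
            FunctionDescriptor.ofVoid(ValueLayout.ADDRESS));

    private LinkerMalloc() {}

    /** Allocates {@code size} bytes with malloc; the memory is NOT zeroed. */
    static MemorySegment malloc(long size) {
        MemorySegment raw;
        try {
            raw = (MemorySegment) MALLOC.invokeExact(size);
        } catch (Throwable t) {
            throw new AssertionError("malloc downcall failed", t);
        }
        if (raw.address() == 0L) {
            throw new OutOfMemoryError("malloc returned NULL");
        }
        // Downcalls return zero-length segments; give this one its real size.
        return raw.reinterpret(size);
    }

    /** Releases memory previously obtained from {@link #malloc(long)}. */
    static void free(MemorySegment segment) {
        try {
            FREE.invokeExact(segment);
        } catch (Throwable t) {
            throw new AssertionError("free downcall failed", t);
        }
    }
}
```

Each call here pays for the Java-to-Native transition only; the Unsafe-based path adds the Native-to-VM transition on top of it.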
Now, what can we do to improve this? Here are some random thoughts.
First, do Unsafe::allocateMemory and Unsafe::freeMemory even need to be
executed in VM state? The answer here is not 100% clear (but we'll try
to get some more clarity). Unsafe::allocateMemory uses the
NativeMemoryTracking (NMT) mechanism, so in principle it might need to
run in some more privileged mode. That said, NMT is disabled by default,
so it would be theoretically possible to detect whether NMT is enabled
or not, and, if it is disabled, just call "malloc"/"free" using the linker, and
avoid Unsafe entirely.
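One way to do that detection from Java is sketched below, under assumptions: it reads the NativeMemoryTracking flag through the HotSpot-specific com.sun.management.HotSpotDiagnosticMXBean, so it only applies to HotSpot. The class name NmtCheck and the useLinkerMalloc policy are hypothetical illustrations of the idea above, not an existing JDK mechanism.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper: decides whether the Unsafe (NMT-tracked) path is needed.
final class NmtCheck {
    private NmtCheck() {}

    /** Value of -XX:NativeMemoryTracking, e.g. "off", "summary" or "detail". */
    static String nmtMode() {
        HotSpotDiagnosticMXBean hotspot =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        return hotspot.getVMOption("NativeMemoryTracking").getValue();
    }

    static boolean nmtEnabled() {
        return !"off".equals(nmtMode());
    }

    /**
     * Hypothetical policy from the discussion: when NMT is off, nothing is
     * lost by calling malloc/free through the Linker and skipping Unsafe.
     */
    static boolean useLinkerMalloc() {
        return !nmtEnabled();
    }
}
```

Since NativeMemoryTracking is chosen at JVM startup, a real implementation could compute this once and cache it in a static final field.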
Secondly, how much do we care about NMT? In principle it's a nice
monitoring tool to have at our disposal, but it's far from being cheap
or free. I tried enabling it in some of my benchmarks, in its cheapest
"summary" mode, and that alone adds another 50ns to each
Unsafe::allocateMemory call, which seems quite steep. I'm not sure
that the average FFM API client that wants to allocate a segment really
wants to sign up for all of that, especially given that JNI doesn't do
_any_ of that (e.g. the JNI function GetStringUTFChars ends up calling
plain malloc).
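A crude way to reproduce this kind of comparison is sketched below, under assumptions: it times allocate-and-free of small segments through a confined arena, assuming (as described in this thread) that default arena allocation is backed by Unsafe::allocateMemory, and it is meant to be run twice, once as-is and once with -XX:NativeMemoryTracking=summary. The class name AllocCostProbe is hypothetical, and a System.nanoTime loop is only a rough sanity check; a JMH benchmark is the proper tool for per-call figures like the 50ns above. Each iteration also pays for creating and closing the arena itself.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

// Hypothetical probe: rough cost of one arena allocate + free, not a real benchmark.
final class AllocCostProbe {
    // Written on every iteration so the JIT cannot drop the allocation as dead code.
    static volatile byte sink;

    /** Average nanoseconds for one confined-arena allocate + close of {@code size} bytes. */
    static double nsPerAllocFree(int iterations, long size) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            try (Arena arena = Arena.ofConfined()) {
                MemorySegment seg = arena.allocate(size);
                seg.set(ValueLayout.JAVA_BYTE, 0, (byte) i);
                sink = seg.get(ValueLayout.JAVA_BYTE, 0);
            }
        }
        return (double) (System.nanoTime() - start) / iterations;
    }

    public static void main(String[] args) {
        nsPerAllocFree(1_000_000, 64); // warm-up so the loop gets JIT-compiled
        System.out.printf("%.1f ns per allocate+free%n", nsPerAllocFree(1_000_000, 64));
    }
}
```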
So, I think there are some more decisions to be made in this space: do we
want to keep using Unsafe to allocate and free memory, or do we want to
target malloc/free more directly (and give up NMT)? I think this is a
bit like the discussion we had for Bits::reserveMemory. While the idea
of NMT is appealing (tracking all usages of native memory inside the JVM
in one place), I'm a little skeptical that the design goals of NMT align
with the way in which the FFM API wants to use the allocation
functionality (again, an argument in favor of this thesis is that JNI
itself does __not__ use NMT). NMT's goal was really to track "internal
memory usage for a HotSpot JVM" [1]. Does FFM fall into this bucket?
The good news is that, at least on paper, the FFM API's default allocation
could be _really_ fast, and the reasons why allocation is currently
slower than basic malloc/free are more accidental than anything
else. That said, I think we need to think a bit about what kind of
"default" behavior we want to provide for FFM API users "out of the
box". While the message has always been "if you need faster allocation,
write your own arena" (and LWJGL confirms that), I'm also a little
uncomfortable leaving the "default" at a competitive disadvantage
compared with "malloc" (and, by extension, JNI), as that could put some
developers off and hinder adoption.
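For reference, "write your own arena" can be as small as the following sketch, closely modelled on the custom-arena example in the java.lang.foreign.Arena documentation: one upfront allocation from a confined arena, then cheap bump-pointer slices handed out by SegmentAllocator::slicingAllocator, all freed together on close. The class name SlicingArena is illustrative; it assumes JDK 22 or later, and a request that does not fit in the remaining space fails with an IndexOutOfBoundsException rather than falling back to malloc.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.SegmentAllocator;

// Illustrative custom arena: one big allocation up front, bump-pointer slices afterwards.
final class SlicingArena implements Arena {
    private final Arena arena = Arena.ofConfined();
    private final SegmentAllocator slicingAllocator;

    SlicingArena(long size) {
        // The only call that reaches the default arena allocation (and, ultimately, malloc).
        slicingAllocator = SegmentAllocator.slicingAllocator(arena.allocate(size));
    }

    @Override
    public MemorySegment allocate(long byteSize, long byteAlignment) {
        // No native call here: just carve the next aligned slice.
        return slicingAllocator.allocate(byteSize, byteAlignment);
    }

    @Override
    public MemorySegment.Scope scope() {
        return arena.scope();
    }

    @Override
    public void close() {
        // Frees the backing block; all slices become inaccessible at once.
        arena.close();
    }
}
```

Allocation then costs a few arithmetic operations instead of a native call, at the price of a fixed capacity and bulk-only deallocation.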
Maurizio
[1] https://docs.oracle.com/en/java/javase/20/vm/native-memory-tracking.html#GUID-710CAEA1-7C6D-4D80-AB0C-B0958E329407