FFM API allocation shootout
Maurizio Cimadamore
maurizio.cimadamore at oracle.com
Tue Aug 8 17:32:01 UTC 2023
> I don't have a good technical explanation, just a gut feeling that, if
> we're going to include the cost of freeing in the benchmark, we should
> also include the cost of doing something, anything, with the allocated
> memory block. Especially when comparing with a baseline that always
> touches every byte. Anyway, I wrote this ~5 years ago and don't remember
> exactly what prompted me to do it this way.
Thanks for the info.
Just did another benchmark where I accessed the allocated memory before
returning - there's no change, malloc still seems quite a bit faster
than Unsafe::allocateMemory. We're still investigating as to why that
might be the case.
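
For reference, a minimal sketch of the "allocate, touch, free" shape on the Unsafe side only. The class name AllocTouchSketch, the helper allocTouchFree, and the 64-byte stride are illustrative assumptions and not the actual benchmark code; a real measurement should use JMH rather than the naive timing loop in main.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

public class AllocTouchSketch {
    private static final Unsafe UNSAFE = loadUnsafe();

    // sun.misc.Unsafe has no public constructor; read the singleton reflectively.
    private static Unsafe loadUnsafe() {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            return (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Allocate 'size' bytes, write one byte every 64 bytes (assumed stride),
    // read the first byte back, then free the block.
    static long allocTouchFree(long size) {
        long addr = UNSAFE.allocateMemory(size);
        try {
            for (long off = 0; off < size; off += 64) {
                UNSAFE.putByte(addr + off, (byte) 1);
            }
            return UNSAFE.getByte(addr);
        } finally {
            UNSAFE.freeMemory(addr);
        }
    }

    public static void main(String[] args) {
        int iterations = 1_000_000;
        long sink = 0;
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink += allocTouchFree(256);
        }
        long elapsed = System.nanoTime() - start;
        // Print the sink so the loop cannot be optimized away entirely.
        System.out.println("ns/op ~ " + (elapsed / iterations) + " (sink=" + sink + ")");
    }
}
```

Returning the byte that was read keeps the touch from being dead code; the malloc side would have the same shape with the allocation and free going through linker downcalls instead.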
> Of course, if the assumption is that clients of the FFM API will always
> see initialized memory, then eliminating unnecessary internal zeroing
> will be beneficial.
>
> However, when comparing the relative costs of what makes up a complete
> allocation, if a particular component is "artificially" expensive it
> makes other components look cheaper than they actually are. Especially
> for small allocations. Afaict, all FFM allocations will go through at
> least one Unsafe::setMemory or Unsafe::copyMemory. If either of them is
> (much) more expensive than it could be, the benchmark authors should take
> that into account. It is why I thought it was worth mentioning.
>
> Btw, Unsafe::copyMemory also suffered from bad performance back in JDK 8,
> but it's much faster since JDK 10. A custom loop is still beneficial for
> very small aligned copies (up to 64 bytes), but we use copyMemory for
> everything else, never fall back to native memcpy.
This doesn't 100% correspond with what we saw: we have
seen cases where, with big arrays, a linker call to "memcpy" can be 2x
faster than Unsafe. Profiling revealed that libc is using some
AVX-optimized version, whereas Unsafe, at least on all the machines I
tested, uses a plain copy using longs, which is slower for very big arrays.
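
A minimal sketch of the two copy paths being compared, assuming JDK 22 or later (where java.lang.foreign is final) and a 64-bit platform where size_t can be modelled as JAVA_LONG. The class name MemcpySketch and its helpers are hypothetical, and MemorySegment.copy is used here only as a stand-in for the Java-side bulk copy; this is not the code that was profiled, and it does no timing.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.FunctionDescriptor;
import java.lang.foreign.Linker;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.lang.invoke.MethodHandle;

public class MemcpySketch {
    private static final Linker LINKER = Linker.nativeLinker();

    // void *memcpy(void *dest, const void *src, size_t n)
    // Assumption: size_t is 64 bits wide, so JAVA_LONG is used for 'n'.
    private static final MethodHandle MEMCPY = LINKER.downcallHandle(
            LINKER.defaultLookup().find("memcpy").orElseThrow(),
            FunctionDescriptor.of(ValueLayout.ADDRESS,
                    ValueLayout.ADDRESS, ValueLayout.ADDRESS, ValueLayout.JAVA_LONG));

    // Copy through libc's memcpy via a linker downcall.
    static void nativeCopy(MemorySegment dst, MemorySegment src, long bytes) {
        try {
            MemorySegment ignored = (MemorySegment) MEMCPY.invokeExact(dst, src, bytes);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    // Copy through the JDK's own bulk-copy path.
    static void javaCopy(MemorySegment dst, MemorySegment src, long bytes) {
        MemorySegment.copy(src, 0, dst, 0, bytes);
    }

    // Run both paths on the same source and check that the results match it.
    static boolean roundTrip(long bytes) {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment src = arena.allocate(bytes);
            MemorySegment viaLibc = arena.allocate(bytes);
            MemorySegment viaJava = arena.allocate(bytes);
            src.fill((byte) 7);
            nativeCopy(viaLibc, src, bytes);
            javaCopy(viaJava, src, bytes);
            return viaLibc.mismatch(src) == -1 && viaJava.mismatch(src) == -1;
        }
    }

    public static void main(String[] args) {
        System.out.println("copies match: " + roundTrip(1L << 20));
    }
}
```

To compare throughput, nativeCopy and javaCopy would each go into their own JMH benchmark method over large segments, since the difference described above only shows up for very big copies.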
Summing up, I agree that, moving forward, we have to pay more attention
to all these little costs, and make sure that they remain small (or
comparable with what libc provides).
Maurizio