Performance findings and questions about Arena allocation and JExtract
David
david.vlijmincx at gmail.com
Mon Jan 20 15:29:08 UTC 2025
Hi,
Thanks for the amazing work on Project Panama. I've been using it to
develop a Java library that provides bindings to Linux's io_uring, and the
API has been a joy to work with.
During development, I ran into some memory-allocation performance
characteristics that I wanted to share and get your thoughts on. I found that
arena.allocate() became a performance bottleneck compared with calling malloc
followed by fill(0), or calling calloc, and wrapping the returned pointer in a
MemorySegment. Here are the benchmark results using JDK 24 build 31
(2025/1/9):
[Zeroed memory allocation comparison]
Benchmark                                            Size (bytes)   Mode  Cnt      Score      Error   Units
Benchmarks.memory.MemoryAllocationBench.arenaAlloc            512  thrpt    5  15147.191 ±  943.526  ops/ms
Benchmarks.memory.MemoryAllocationBench.arenaAlloc           1024  thrpt    5  10551.065 ± 1005.304  ops/ms
Benchmarks.memory.MemoryAllocationBench.arenaAlloc           4096  thrpt    5   3248.519 ±    3.210  ops/ms
Benchmarks.memory.MemoryAllocationBench.callocAlloc           512  thrpt    5  13718.615 ±  994.042  ops/ms
Benchmarks.memory.MemoryAllocationBench.callocAlloc          1024  thrpt    5  10104.415 ±  128.425  ops/ms
Benchmarks.memory.MemoryAllocationBench.callocAlloc          4096  thrpt    5   4883.802 ±  212.922  ops/ms
Benchmarks.memory.MemoryAllocationBench.mallocAlloc           512  thrpt    5  20054.526 ± 1844.846  ops/ms
Benchmarks.memory.MemoryAllocationBench.mallocAlloc          1024  thrpt    5  12370.954 ± 1859.726  ops/ms
Benchmarks.memory.MemoryAllocationBench.mallocAlloc          4096  thrpt    5   3332.564 ±  142.788  ops/ms
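For readers comparing the three zeroed-allocation strategies above, here is a rough sketch of what each one does. It is an illustration, not the benchmark source: the class ZeroedAllocations and its method names are hypothetical, and it assumes Linux/glibc, where malloc, calloc and free are reachable through Linker.nativeLinker().defaultLookup().

```java
import java.lang.foreign.Arena;
import java.lang.foreign.FunctionDescriptor;
import java.lang.foreign.Linker;
import java.lang.foreign.MemorySegment;
import java.lang.invoke.MethodHandle;

import static java.lang.foreign.ValueLayout.ADDRESS;
import static java.lang.foreign.ValueLayout.JAVA_LONG;

// Hypothetical helper contrasting three ways to obtain a zeroed native segment.
final class ZeroedAllocations {
    private static final Linker LINKER = Linker.nativeLinker();
    private static final MethodHandle MALLOC = LINKER.downcallHandle(
            LINKER.defaultLookup().find("malloc").orElseThrow(),
            FunctionDescriptor.of(ADDRESS, JAVA_LONG));
    private static final MethodHandle CALLOC = LINKER.downcallHandle(
            LINKER.defaultLookup().find("calloc").orElseThrow(),
            FunctionDescriptor.of(ADDRESS, JAVA_LONG, JAVA_LONG));
    private static final MethodHandle FREE = LINKER.downcallHandle(
            LINKER.defaultLookup().find("free").orElseThrow(),
            FunctionDescriptor.ofVoid(ADDRESS));

    private ZeroedAllocations() {
    }

    // 1. Arena: allocate() returns zeroed memory that is freed when the arena closes.
    static MemorySegment viaArena(Arena arena, long size) {
        return arena.allocate(size);
    }

    // 2. malloc leaves memory uninitialized, so zero it explicitly with fill(0).
    //    The caller owns the memory and must pass it to free().
    static MemorySegment viaMallocFill(long size) {
        MemorySegment raw;
        try {
            raw = (MemorySegment) MALLOC.invokeExact(size);
        } catch (Throwable t) {
            throw new IllegalStateException("malloc downcall failed", t);
        }
        // The downcall returns a zero-length segment; reinterpret gives it the real size.
        MemorySegment seg = checked(raw).reinterpret(size);
        seg.fill((byte) 0);
        return seg;
    }

    // 3. calloc returns memory that the C library has already zeroed.
    //    The caller owns the memory and must pass it to free().
    static MemorySegment viaCalloc(long size) {
        MemorySegment raw;
        try {
            raw = (MemorySegment) CALLOC.invokeExact(1L, size);
        } catch (Throwable t) {
            throw new IllegalStateException("calloc downcall failed", t);
        }
        return checked(raw).reinterpret(size);
    }

    static void free(MemorySegment segment) {
        try {
            FREE.invokeExact(segment);
        } catch (Throwable t) {
            throw new IllegalStateException("free downcall failed", t);
        }
    }

    // malloc and calloc both signal failure by returning NULL.
    private static MemorySegment checked(MemorySegment raw) {
        if (raw.address() == 0) {
            throw new OutOfMemoryError("native allocation returned NULL");
        }
        return raw;
    }
}
```

Only the Arena variant ties the memory's lifetime to a scope; the other two leave it to the caller to call free().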
Arena allocation is competitive, but it never beats malloc + fill(0), and at
4096 bytes it falls well behind calloc (about 3,250 vs. 4,880 ops/ms), which
made outperforming the FileChannel API difficult. For my use case, where
filling a MemorySegment with zeros isn't required, the performance gap widens
even further:
[malloc performance, without zeroing]
Benchmark                                            Size (bytes)   Mode  Cnt      Score      Error   Units
Benchmarks.memory.MemoryAllocationBench.mallocAlloc           512  thrpt    5  27901.228 ± 1300.511  ops/ms
Benchmarks.memory.MemoryAllocationBench.mallocAlloc          1024  thrpt    5  19637.654 ± 1548.356  ops/ms
Benchmarks.memory.MemoryAllocationBench.mallocAlloc          4096  thrpt    5   5780.638 ±  332.062  ops/ms
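When skipping the zeroing is the main win, one way to keep part of Arena's lifetime management is to malloc the memory directly and attach it to an arena with MemorySegment.reinterpret(long, Arena, Consumer), registering free() as the cleanup action. This is a minimal sketch, assuming Linux/glibc; the class MallocInArena is hypothetical, the memory it returns is uninitialized, and its alignment is whatever malloc guarantees (typically 16 bytes on 64-bit glibc).

```java
import java.lang.foreign.Arena;
import java.lang.foreign.FunctionDescriptor;
import java.lang.foreign.Linker;
import java.lang.foreign.MemorySegment;
import java.lang.invoke.MethodHandle;

import static java.lang.foreign.ValueLayout.ADDRESS;
import static java.lang.foreign.ValueLayout.JAVA_LONG;

// Hypothetical helper: malloc'd (uninitialized) memory whose lifetime is
// managed by an Arena instead of manual free() calls.
final class MallocInArena {
    private static final Linker LINKER = Linker.nativeLinker();
    private static final MethodHandle MALLOC = LINKER.downcallHandle(
            LINKER.defaultLookup().find("malloc").orElseThrow(),
            FunctionDescriptor.of(ADDRESS, JAVA_LONG));
    private static final MethodHandle FREE = LINKER.downcallHandle(
            LINKER.defaultLookup().find("free").orElseThrow(),
            FunctionDescriptor.ofVoid(ADDRESS));

    private MallocInArena() {
    }

    // Returns size bytes of uninitialized native memory; free() runs when the arena is closed.
    static MemorySegment allocate(Arena arena, long size) {
        MemorySegment raw;
        try {
            raw = (MemorySegment) MALLOC.invokeExact(size);
        } catch (Throwable t) {
            throw new IllegalStateException("malloc downcall failed", t);
        }
        if (raw.address() == 0) {
            throw new OutOfMemoryError("malloc returned NULL");
        }
        // The downcall yields a zero-length segment: give it the real size,
        // bind it to the arena's scope, and register free() as the cleanup action.
        return raw.reinterpret(size, arena, MallocInArena::free);
    }

    private static void free(MemorySegment segment) {
        try {
            FREE.invokeExact(segment);
        } catch (Throwable t) {
            throw new IllegalStateException("free downcall failed", t);
        }
    }
}
```

This adds arena bookkeeping (the arena itself plus one cleanup registration per segment) on top of the raw malloc call, so it would need to be measured rather than assumed to match the plain malloc numbers.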
I stopped using Arena in performance-critical paths, though this meant
giving up some of its convenient memory management features. I have two
questions I'd love to learn more about:
1. Are there any performance improvements planned for MemorySegment
allocation using Arena?
2. Would implementing a custom Arena be the recommended approach for cases
where allocation performance is important?
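To make the second question concrete, one shape a custom arena can take, along the lines of the custom-arena example in the java.lang.foreign.Arena Javadoc, is to reserve one large block up front and serve each allocation as a slice of it via SegmentAllocator.slicingAllocator. This is an illustrative sketch only, not a claim that it is the recommended approach: SlicingArena is a hypothetical name, the caller must pick the capacity, and allocating past it fails.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.SegmentAllocator;

// Hypothetical custom arena: one up-front block, allocations are slices of it.
final class SlicingArena implements Arena {
    private final Arena backing = Arena.ofConfined();
    private final SegmentAllocator slicer;

    SlicingArena(long capacity) {
        // The backing block is allocated (and zeroed) once, here.
        slicer = SegmentAllocator.slicingAllocator(backing.allocate(capacity));
    }

    @Override
    public MemorySegment allocate(long byteSize, long byteAlignment) {
        // Bump-pointer style: no native call per allocation. Throws once the
        // capacity is exhausted, since slicingAllocator has no fallback.
        return slicer.allocate(byteSize, byteAlignment);
    }

    @Override
    public MemorySegment.Scope scope() {
        return backing.scope();
    }

    @Override
    public void close() {
        // Releases the whole block; every slice becomes inaccessible.
        backing.close();
    }
}
```

The trade-off is that the zeroing cost is paid once for the whole block and memory is only released on close(), so this suits a per-request or per-batch arena where many short-lived buffers share one lifetime and one thread.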
Additionally, regarding JExtract: Is there a way to configure the default
access modifier for generated code? Currently, all generated classes and
methods are public, which requires manual modification to make them
package-private when trying to control the API surface.
Thank you for your time and feedback.
Kind regards,
David
For reference, here's part of the benchmark code:
@BenchmarkMode(Mode.Throughput)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@Threads(1)
@Benchmark
public void callocAlloc(Blackhole blackhole, ExecutionPlanMemoryVariables vars) throws Throwable {
    // calloc(1, blockSize) returns a zero-length segment; reinterpret gives it the real size
    MemorySegment segment = ((MemorySegment) calloc.invokeExact(1L, (long) vars.blockSize))
            .reinterpret(vars.blockSize);
    segment.fill((byte) 3);
    blackhole.consume(segment);
    // release the native memory so each invocation does not leak
    free.invokeExact(segment);
}
@BenchmarkMode(Mode.Throughput)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@Threads(1)
@Benchmark
public void arenaAllocWithArenaCreation(Blackhole blackhole, ExecutionPlanMemoryVariables vars) {
    // a new confined arena per invocation; closing it frees the segment
    try (Arena arena = Arena.ofConfined()) {
        MemorySegment allocate = arena.allocate(vars.blockSize);
        allocate.fill((byte) 3);
        blackhole.consume(allocate);
    }
}
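import java.lang.foreign.FunctionDescriptor;
import java.lang.foreign.Linker;
import java.lang.invoke.MethodHandle;

import static java.lang.foreign.ValueLayout.ADDRESS;
import static java.lang.foreign.ValueLayout.JAVA_LONG;

// Downcall handles for the libc allocation functions used in the benchmarks.
public class LibCWrapper {
    public static final MethodHandle free;   // void free(void *ptr)
    public static final MethodHandle malloc; // void *malloc(size_t size)
    public static final MethodHandle calloc; // void *calloc(size_t nmemb, size_t size)

    static {
        Linker linker = Linker.nativeLinker();
        free = linker.downcallHandle(
                linker.defaultLookup().find("free").orElseThrow(),
                FunctionDescriptor.ofVoid(ADDRESS)
        );
        malloc = linker.downcallHandle(
                linker.defaultLookup().find("malloc").orElseThrow(),
                FunctionDescriptor.of(ADDRESS, JAVA_LONG)
        );
        calloc = linker.downcallHandle(
                linker.defaultLookup().find("calloc").orElseThrow(),
                FunctionDescriptor.of(ADDRESS, JAVA_LONG, JAVA_LONG)
        );
    }

    private LibCWrapper() {
    }
}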
More information about the panama-dev mailing list