Performance findings and questions about Arena allocation and JExtract

David david.vlijmincx at gmail.com
Mon Jan 20 15:29:08 UTC 2025


Hi,

Thanks for the amazing work on Project Panama. I've been using it to
develop a Java library that provides bindings to Linux's io_uring, and the
API has been a joy to work with.

During development, I encountered some performance characteristics around
memory allocation that I wanted to share and get your thoughts on. While
working on the library, I discovered that arena.allocate() became a
performance bottleneck compared to calling malloc + fill(0) or calloc and
creating a MemorySegment from the returned address. Here are the
benchmark results using JDK 24 build 31 (2025/1/9):

[Zeroed memory allocation comparison]
Benchmark                                            (allocation size)   Mode  Cnt      Score      Error  Units
Benchmarks.memory.MemoryAllocationBench.arenaAlloc                 512  thrpt    5  15147.191 ±  943.526  ops/ms
Benchmarks.memory.MemoryAllocationBench.arenaAlloc                1024  thrpt    5  10551.065 ± 1005.304  ops/ms
Benchmarks.memory.MemoryAllocationBench.arenaAlloc                4096  thrpt    5   3248.519 ±    3.210  ops/ms

Benchmarks.memory.MemoryAllocationBench.callocAlloc                512  thrpt    5  13718.615 ±  994.042  ops/ms
Benchmarks.memory.MemoryAllocationBench.callocAlloc               1024  thrpt    5  10104.415 ±  128.425  ops/ms
Benchmarks.memory.MemoryAllocationBench.callocAlloc               4096  thrpt    5   4883.802 ±  212.922  ops/ms

Benchmarks.memory.MemoryAllocationBench.mallocAlloc                512  thrpt    5  20054.526 ± 1844.846  ops/ms
Benchmarks.memory.MemoryAllocationBench.mallocAlloc               1024  thrpt    5  12370.954 ± 1859.726  ops/ms
Benchmarks.memory.MemoryAllocationBench.mallocAlloc               4096  thrpt    5   3332.564 ±  142.788  ops/ms
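
For readers who want to see how the three zeroed-allocation strategies differ in code, here is a minimal sketch. It is not the benchmark itself: the class name ZeroedAllocationSketch and its helper methods are hypothetical, NULL checks on the malloc/calloc results are left out for brevity, and it assumes a JDK where the FFM API is final (22 or later) and a platform whose default lookup exposes malloc, calloc and free.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.FunctionDescriptor;
import java.lang.foreign.Linker;
import java.lang.foreign.MemorySegment;
import java.lang.invoke.MethodHandle;

import static java.lang.foreign.ValueLayout.ADDRESS;
import static java.lang.foreign.ValueLayout.JAVA_LONG;

// Hypothetical helper class, for illustration only.
final class ZeroedAllocationSketch {
    private static final Linker LINKER = Linker.nativeLinker();
    private static final MethodHandle MALLOC =
            handle("malloc", FunctionDescriptor.of(ADDRESS, JAVA_LONG));
    private static final MethodHandle CALLOC =
            handle("calloc", FunctionDescriptor.of(ADDRESS, JAVA_LONG, JAVA_LONG));
    private static final MethodHandle FREE =
            handle("free", FunctionDescriptor.ofVoid(ADDRESS));

    private static MethodHandle handle(String name, FunctionDescriptor descriptor) {
        return LINKER.downcallHandle(LINKER.defaultLookup().find(name).orElseThrow(), descriptor);
    }

    // 1. Arena: the segment is zero-initialized and freed when the arena closes.
    static MemorySegment arenaAlloc(Arena arena, long size) {
        return arena.allocate(size);
    }

    // 2. calloc: libc zeroes the block; the caller must release it with free().
    static MemorySegment callocAlloc(long size) {
        try {
            MemorySegment raw = (MemorySegment) CALLOC.invokeExact(1L, size);
            return raw.reinterpret(size); // the downcall returns a zero-length segment
        } catch (Throwable t) {
            throw new IllegalStateException("calloc downcall failed", t);
        }
    }

    // 3. malloc + fill(0): zeroing happens on the Java side.
    static MemorySegment mallocZeroed(long size) {
        try {
            MemorySegment raw = (MemorySegment) MALLOC.invokeExact(size);
            return raw.reinterpret(size).fill((byte) 0);
        } catch (Throwable t) {
            throw new IllegalStateException("malloc downcall failed", t);
        }
    }

    // Releases a block obtained from callocAlloc or mallocZeroed.
    static void free(MemorySegment segment) {
        try {
            FREE.invokeExact(segment);
        } catch (Throwable t) {
            throw new IllegalStateException("free downcall failed", t);
        }
    }
}
```

Only the arena variant ties the memory to a lifetime; the other two leave the free() call to the caller.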

Arena is fast, but it is never the fastest of the three options, which
made it difficult to outperform the FileChannel API. For my use case,
where filling a MemorySegment with zeros isn't required, the performance
gap widens even further:

[malloc performance]
Benchmark                                            (allocation size)   Mode  Cnt      Score      Error  Units
Benchmarks.memory.MemoryAllocationBench.mallocAlloc                512  thrpt    5  27901.228 ± 1300.511  ops/ms
Benchmarks.memory.MemoryAllocationBench.mallocAlloc               1024  thrpt    5  19637.654 ± 1548.356  ops/ms
Benchmarks.memory.MemoryAllocationBench.mallocAlloc               4096  thrpt    5   5780.638 ±  332.062  ops/ms
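
One possible middle ground between raw malloc and Arena is to allocate with malloc (no zeroing) and attach the block to an arena's lifetime with MemorySegment.reinterpret(size, arena, cleanup). The sketch below illustrates the idea only; it has not been benchmarked, and the class name MallocArenaSketch and the method mallocInto are hypothetical. It assumes JDK 22 or later and a default lookup that exposes malloc and free.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.FunctionDescriptor;
import java.lang.foreign.Linker;
import java.lang.foreign.MemorySegment;
import java.lang.invoke.MethodHandle;

import static java.lang.foreign.ValueLayout.ADDRESS;
import static java.lang.foreign.ValueLayout.JAVA_LONG;

// Hypothetical helper class, for illustration only.
final class MallocArenaSketch {
    private static final Linker LINKER = Linker.nativeLinker();
    private static final MethodHandle MALLOC = LINKER.downcallHandle(
            LINKER.defaultLookup().find("malloc").orElseThrow(),
            FunctionDescriptor.of(ADDRESS, JAVA_LONG));
    private static final MethodHandle FREE = LINKER.downcallHandle(
            LINKER.defaultLookup().find("free").orElseThrow(),
            FunctionDescriptor.ofVoid(ADDRESS));

    // Allocates size bytes of uninitialized memory that is freed when the arena closes.
    static MemorySegment mallocInto(Arena arena, long size) {
        MemorySegment raw;
        try {
            raw = (MemorySegment) MALLOC.invokeExact(size);
        } catch (Throwable t) {
            throw new IllegalStateException("malloc downcall failed", t);
        }
        if (raw.address() == 0L) {
            throw new OutOfMemoryError("malloc returned NULL");
        }
        // The cleanup action runs when the arena is closed and receives a
        // segment with the same address, which is handed back to free().
        return raw.reinterpret(size, arena, MallocArenaSketch::free);
    }

    private static void free(MemorySegment segment) {
        try {
            FREE.invokeExact(segment);
        } catch (Throwable t) {
            throw new IllegalStateException("free downcall failed", t);
        }
    }
}
```

The contents of the returned segment are whatever malloc hands back, so this only fits cases where every byte is written before it is read. Registering a cleanup action per segment has its own cost, so whether this beats arena.allocate() would need to be measured.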

I stopped using Arena in performance-critical paths, though this meant
giving up some of the convenient memory management features. I have two
questions that I'd love to learn more about:

1. Are there any performance improvements planned for MemorySegment
allocation using Arena?
2. Would implementing a custom Arena be the recommended approach for cases
where allocation performance is important?
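
To make the second question concrete, here is one shape a custom Arena could take, modelled on the slicing-arena pattern described in the Arena API documentation: a single up-front allocation is carved into slices, so the per-allocation cost is reduced to bumping an offset. This is a sketch under assumptions, not a recommendation; the class name SlicingArena and the fixed capacity are illustrative, and it has not been benchmarked against the numbers above.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.SegmentAllocator;

// Illustrative custom arena: one backing block, handed out slice by slice.
final class SlicingArena implements Arena {
    private final Arena arena = Arena.ofConfined();
    private final SegmentAllocator slicingAllocator;

    // capacity is the total number of bytes available to all allocations.
    SlicingArena(long capacity) {
        // The backing block is zeroed once here; slices are not zeroed again.
        slicingAllocator = SegmentAllocator.slicingAllocator(arena.allocate(capacity));
    }

    @Override
    public MemorySegment allocate(long byteSize, long byteAlignment) {
        // Throws IndexOutOfBoundsException once the backing block is exhausted.
        return slicingAllocator.allocate(byteSize, byteAlignment);
    }

    @Override
    public MemorySegment.Scope scope() {
        return arena.scope();
    }

    @Override
    public void close() {
        // Invalidates the backing block and every slice taken from it.
        arena.close();
    }
}
```

Usage looks the same as with a built-in arena, for example try (Arena arena = new SlicingArena(1 << 20)) { ... }. The trade-off is that memory is only released when the whole arena is closed, and the capacity has to be chosen up front.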

Additionally, regarding JExtract: Is there a way to configure the default
access modifier for generated code? Currently, all generated classes and
methods are public, which requires manual modification to make them
package-private when trying to control the API surface.

Thank you for your time and feedback.

Kind regards,
David

For reference, here's part of the benchmark code:

@BenchmarkMode(Mode.Throughput)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@Threads(1)
@Benchmark
public void callocAlloc(Blackhole blackhole, ExecutionPlanMemoryVariables vars) throws Throwable {
    // The downcall returns a zero-length segment; reinterpret gives it its real size.
    MemorySegment segment = ((MemorySegment) calloc.invokeExact(1L, (long) vars.blockSize))
            .reinterpret(vars.blockSize);

    segment.fill((byte) 3);
    blackhole.consume(segment);

    // Memory from calloc is not managed by an Arena, so release it explicitly.
    free.invokeExact(segment);
}

@BenchmarkMode(Mode.Throughput)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@Threads(1)
@Benchmark
public void arenaAllocWithArenaCreation(Blackhole blackhole, ExecutionPlanMemoryVariables vars) {
    // The confined arena frees the segment when the try block exits.
    try (Arena arena = Arena.ofConfined()) {
        MemorySegment allocate = arena.allocate(vars.blockSize);
        allocate.fill((byte) 3);
        blackhole.consume(allocate);
    }
}

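import java.lang.foreign.FunctionDescriptor;
import java.lang.foreign.Linker;
import java.lang.invoke.MethodHandle;

import static java.lang.foreign.ValueLayout.ADDRESS;
import static java.lang.foreign.ValueLayout.JAVA_LONG;

public class LibCWrapper {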

    public static final MethodHandle free;
    public static final MethodHandle malloc;
    public static final MethodHandle calloc;

    static {
        Linker linker = Linker.nativeLinker();

        free = linker.downcallHandle(
                linker.defaultLookup().find("free").orElseThrow(),
                FunctionDescriptor.ofVoid(ADDRESS)
        );

        malloc = linker.downcallHandle(
                linker.defaultLookup().find("malloc").orElseThrow(),
                FunctionDescriptor.of(ADDRESS, JAVA_LONG)
        );

        calloc = linker.downcallHandle(
                linker.defaultLookup().find("calloc").orElseThrow(),
                FunctionDescriptor.of(ADDRESS, JAVA_LONG, JAVA_LONG)
        );

    }

    private LibCWrapper() {
    }

}

