RFR: 8345687: Improve the implementation of SegmentFactories::allocateSegment
Quan Anh Mai
qamai at openjdk.org
Fri Dec 6 16:50:06 UTC 2024
On Fri, 6 Dec 2024 16:30:47 GMT, Quan Anh Mai <qamai at openjdk.org> wrote:
> Hi,
>
> This patch improves the performance of a typical `Arena::allocate` in several ways:
>
> - Delay the creation of the `NativeMemorySegmentImpl`. This avoids merging the instance with the one obtained from the call in the uncommon path, increasing the chance that the object is scalar replaced.
> - Split the allocation of over-aligned memory into a slow-path method.
> - Align the memory to 8 bytes, allowing faster zeroing.
> - Use a dedicated method to zero the just-allocated native memory, reducing code size and making it more straightforward.
> - Make `VM.pageAlignDirectMemory` a `Boolean` instead of a `boolean` so that a `false` value can be constant folded.
>
> Please take a look and leave your reviews, thanks a lot.
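As a rough illustration of the alignment points in the description above, here is a minimal sketch. It assumes that "align the memory to 8 bytes" means rounding the allocation size up to a multiple of 8 so that zeroing can proceed in whole 8-byte words, and that a request counts as over-aligned when its alignment exceeds that same value. The class `AlignSketch`, the constant `MIN_ALIGN`, and both method names are hypothetical and are not taken from the patch.

```java
public final class AlignSketch {
    // Hypothetical constant: the description only says "8 bytes".
    static final long MIN_ALIGN = 8;

    /** Rounds size up to the next multiple of alignment (must be a power of two). */
    static long alignUp(long size, long alignment) {
        return (size + alignment - 1) & -alignment;
    }

    /** Assumed rule: alignments above MIN_ALIGN are routed to a separate slow path. */
    static boolean needsSlowPath(long byteAlignment) {
        return byteAlignment > MIN_ALIGN;
    }

    public static void main(String[] args) {
        // The sizes used by the benchmark in this thread.
        for (long size : new long[] {5, 20, 100, 500, 2000, 8000}) {
            System.out.println(size + " -> " + alignUp(size, MIN_ALIGN));
        }
    }
}
```

With these assumptions a 5-byte request is padded to 8 bytes and a 20-byte request to 24, while sizes that are already multiples of 8 are unchanged.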
The results with the modified `AllocTest`:
                                               Before           After
Benchmark                 (size)  Mode  Cnt    Score   Error    Score   Error  Units
AllocTest.alloc_confined       5  avgt   30   24.188 ± 0.305   17.221 ± 1.299  ns/op
AllocTest.alloc_confined      20  avgt   30   24.690 ± 0.168   19.571 ± 3.108  ns/op
AllocTest.alloc_confined     100  avgt   30   26.714 ± 0.061   17.819 ± 0.095  ns/op
AllocTest.alloc_confined     500  avgt   30   38.907 ± 0.113   19.716 ± 0.060  ns/op
AllocTest.alloc_confined    2000  avgt   30   60.056 ± 3.087   43.373 ± 0.564  ns/op
AllocTest.alloc_confined    8000  avgt   30  141.535 ± 1.546   75.110 ± 3.482  ns/op
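
To read the before/after scores as ratios, a small sketch like the following may help. It only divides the "Before" score by the "After" score for each size, using the numbers from the table above, and ignores the reported error margins. The class and method names are hypothetical.

```java
public final class SpeedupSketch {
    /** Ratio of the old average time to the new one; values above 1 mean faster. */
    static double speedup(double beforeNs, double afterNs) {
        return beforeNs / afterNs;
    }

    public static void main(String[] args) {
        // Scores copied from the alloc_confined before/after table, in ns/op.
        long[] sizes = {5, 20, 100, 500, 2000, 8000};
        double[] before = {24.188, 24.690, 26.714, 38.907, 60.056, 141.535};
        double[] after = {17.221, 19.571, 17.819, 19.716, 43.373, 75.110};

        for (int i = 0; i < sizes.length; i++) {
            System.out.printf("%5d bytes: %.2fx%n", sizes[i], speedup(before[i], after[i]));
        }
    }
}
```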
The overall `AllocTest` results:
Benchmark                     (size)  Mode  Cnt    Score   Error  Units
AllocTest.alloc_calloc_arena       5  avgt   30   19.604 ± 0.075  ns/op
AllocTest.alloc_calloc_arena      20  avgt   30   19.750 ± 0.105  ns/op
AllocTest.alloc_calloc_arena     100  avgt   30   20.335 ± 0.103  ns/op
AllocTest.alloc_calloc_arena     500  avgt   30   36.676 ± 0.403  ns/op
AllocTest.alloc_calloc_arena    2000  avgt   30   47.928 ± 2.754  ns/op
AllocTest.alloc_calloc_arena    8000  avgt   30   83.762 ± 1.829  ns/op
AllocTest.alloc_confined           5  avgt   30   17.221 ± 1.299  ns/op
AllocTest.alloc_confined          20  avgt   30   19.571 ± 3.108  ns/op
AllocTest.alloc_confined         100  avgt   30   17.819 ± 0.095  ns/op
AllocTest.alloc_confined         500  avgt   30   19.716 ± 0.060  ns/op
AllocTest.alloc_confined        2000  avgt   30   43.373 ± 0.564  ns/op
AllocTest.alloc_confined        8000  avgt   30   75.110 ± 3.482  ns/op
AllocTest.alloc_unsafe_arena       5  avgt   30   18.810 ± 0.074  ns/op
AllocTest.alloc_unsafe_arena      20  avgt   30   18.858 ± 0.068  ns/op
AllocTest.alloc_unsafe_arena     100  avgt   30   21.820 ± 0.077  ns/op
AllocTest.alloc_unsafe_arena     500  avgt   30   32.685 ± 0.062  ns/op
AllocTest.alloc_unsafe_arena    2000  avgt   30   61.172 ± 1.464  ns/op
AllocTest.alloc_unsafe_arena    8000  avgt   30  133.842 ± 0.337  ns/op
-------------
PR Comment: https://git.openjdk.org/jdk/pull/22610#issuecomment-2523693086