RFR: 8345687: Improve the implementation of SegmentFactories::allocateSegment

Quan Anh Mai qamai at openjdk.org
Fri Dec 6 16:50:06 UTC 2024


On Fri, 6 Dec 2024 16:30:47 GMT, Quan Anh Mai <qamai at openjdk.org> wrote:

> Hi,
> 
> This patch improves the performance of a typical `Arena::allocate` in several ways:
> 
> - Delay the creation of the `NativeMemorySegmentImpl`. This avoids merging the instance with the one obtained from the call in the uncommon path, increasing the chance that the object is scalar replaced.
> - Split the allocation of over-aligned memory to a slow-path method.
> - Align the memory to 8 bytes, allowing faster zeroing.
> - Use a dedicated method to zero the just-allocated native memory, reducing code size and making the code more straightforward.
> - Make `VM.pageAlignDirectMemory` a `Boolean` instead of a `boolean` so that a `false` value can be constant folded.
> 
> Please take a look and leave your reviews, thanks a lot.
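
As a rough illustration of the alignment points in the list above, the arithmetic behind "align to 8 bytes" and "over-aligned slow path" can be sketched as follows. This is only a sketch, not the code in `SegmentFactories`: the class name, the method names, and the `ASSUMED_MALLOC_ALIGN` value of 16 are all hypothetical and chosen for illustration.

```java
// Hypothetical illustration only; none of these names come from the JDK sources.
class AllocationSketch {
    // Assumed alignment guaranteed by the native allocator.
    // This is an assumption; the real guarantee is platform-dependent.
    static final long ASSUMED_MALLOC_ALIGN = 16;

    // Round a byte size up to the next multiple of 8, so the block
    // can be zeroed in whole 8-byte words.
    static long roundUpTo8(long byteSize) {
        return (byteSize + 7) & ~7L;
    }

    // The fast path applies when the allocator's own guarantee
    // already satisfies the requested alignment.
    static boolean needsSlowPath(long byteAlignment) {
        return byteAlignment > ASSUMED_MALLOC_ALIGN;
    }

    // Slow path: number of bytes to request so that an aligned start
    // address is guaranteed to fit inside the raw block.
    static long overAlignedRequestSize(long byteSize, long byteAlignment) {
        return roundUpTo8(byteSize) + byteAlignment - ASSUMED_MALLOC_ALIGN;
    }

    // Slow path: first address >= rawAddress that is a multiple of
    // byteAlignment, which must be a power of two.
    static long alignAddress(long rawAddress, long byteAlignment) {
        return (rawAddress + byteAlignment - 1) & -byteAlignment;
    }

    public static void main(String[] args) {
        System.out.println(roundUpTo8(5));
        System.out.println(needsSlowPath(64));
        System.out.println(overAlignedRequestSize(100, 64));
        System.out.println(Long.toHexString(alignAddress(0x1010L, 64)));
    }
}
```

Keeping the over-aligned case in its own method means the common request, whose alignment is already within the allocator's guarantee, only pays for the rounding and never touches the padding arithmetic.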

The results with the modified `AllocTest`:

                                                      Before            After
    Benchmark                 (size)  Mode  Cnt    Score   Error    Score   Error  Units
    AllocTest.alloc_confined       5  avgt   30   24.188 ± 0.305   17.221 ± 1.299  ns/op
    AllocTest.alloc_confined      20  avgt   30   24.690 ± 0.168   19.571 ± 3.108  ns/op
    AllocTest.alloc_confined     100  avgt   30   26.714 ± 0.061   17.819 ± 0.095  ns/op
    AllocTest.alloc_confined     500  avgt   30   38.907 ± 0.113   19.716 ± 0.060  ns/op
    AllocTest.alloc_confined    2000  avgt   30   60.056 ± 3.087   43.373 ± 0.564  ns/op
    AllocTest.alloc_confined    8000  avgt   30  141.535 ± 1.546   75.110 ± 3.482  ns/op

The overall `AllocTest` results:

    Benchmark                     (size)  Mode  Cnt    Score   Error  Units
    AllocTest.alloc_calloc_arena       5  avgt   30   19.604 ± 0.075  ns/op
    AllocTest.alloc_calloc_arena      20  avgt   30   19.750 ± 0.105  ns/op
    AllocTest.alloc_calloc_arena     100  avgt   30   20.335 ± 0.103  ns/op
    AllocTest.alloc_calloc_arena     500  avgt   30   36.676 ± 0.403  ns/op
    AllocTest.alloc_calloc_arena    2000  avgt   30   47.928 ± 2.754  ns/op
    AllocTest.alloc_calloc_arena    8000  avgt   30   83.762 ± 1.829  ns/op
    AllocTest.alloc_confined           5  avgt   30   17.221 ± 1.299  ns/op
    AllocTest.alloc_confined          20  avgt   30   19.571 ± 3.108  ns/op
    AllocTest.alloc_confined         100  avgt   30   17.819 ± 0.095  ns/op
    AllocTest.alloc_confined         500  avgt   30   19.716 ± 0.060  ns/op
    AllocTest.alloc_confined        2000  avgt   30   43.373 ± 0.564  ns/op
    AllocTest.alloc_confined        8000  avgt   30   75.110 ± 3.482  ns/op
    AllocTest.alloc_unsafe_arena       5  avgt   30   18.810 ± 0.074  ns/op
    AllocTest.alloc_unsafe_arena      20  avgt   30   18.858 ± 0.068  ns/op
    AllocTest.alloc_unsafe_arena     100  avgt   30   21.820 ± 0.077  ns/op
    AllocTest.alloc_unsafe_arena     500  avgt   30   32.685 ± 0.062  ns/op
    AllocTest.alloc_unsafe_arena    2000  avgt   30   61.172 ± 1.464  ns/op
    AllocTest.alloc_unsafe_arena    8000  avgt   30  133.842 ± 0.337  ns/op

-------------

PR Comment: https://git.openjdk.org/jdk/pull/22610#issuecomment-2523693086

