RFR: 8357531: The `SegmentBulkOperations::fill` method can be improved using overlaps
Per Minborg
pminborg at openjdk.org
Thu May 22 07:46:03 UTC 2025
On Thu, 22 May 2025 07:34:08 GMT, Per Minborg <pminborg at openjdk.org> wrote:
> This PR builds on a concept John Rose told me about some time ago. Instead of combining memory operations of various sizes to handle the tail, a single large, skewed (overlapping) memory operation can be used to fill the remaining bytes.
>
> This has the effect of simplifying and shortening the code while improving performance. The number of branches to evaluate is reduced.
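
To illustrate the overlap idea described above, here is a minimal sketch. It is not the code from the PR: the class name `OverlapFillSketch` and the method `fill` are hypothetical, and it uses a plain `java.nio.ByteBuffer` instead of the internal `SegmentBulkOperations` machinery. The point is only that the tail is finished with one store anchored at the end of the region, which may overlap bytes already written, instead of a cascade of 4-, 2- and 1-byte stores.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration of "fill with an overlapping tail store".
final class OverlapFillSketch {

    // Fills every byte in [0, buf.capacity()) with the given value.
    static void fill(ByteBuffer buf, byte value) {
        int len = buf.capacity();
        // Replicate the byte into all eight lanes of a long.
        long v8 = (value & 0xFFL) * 0x0101010101010101L;

        if (len >= 8) {
            // Bulk part: whole 8-byte words from the start.
            for (int i = 0; i + 8 <= len; i += 8) {
                buf.putLong(i, v8);
            }
            // Tail: one skewed 8-byte store ending exactly at len.
            // It may rewrite up to 8 already-filled bytes, which is harmless.
            buf.putLong(len - 8, v8);
        } else if (len >= 4) {
            // 4..7 bytes: two 4-byte stores that overlap in the middle.
            buf.putInt(0, (int) v8);
            buf.putInt(len - 4, (int) v8);
        } else if (len >= 2) {
            // 2..3 bytes: two 2-byte stores that overlap in the middle.
            buf.putShort(0, (short) v8);
            buf.putShort(len - 2, (short) v8);
        } else if (len == 1) {
            buf.put(0, value);
        }
    }
}
```

Because every byte written has the same value, the overlapping stores are idempotent and byte order does not matter, so the small sizes need at most two stores each.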
Performance on an M1 Mac (macOS Sequoia 15.4.1), lower is better:
Base:
Benchmark                              (ELEM_SIZE)  Mode  Cnt        Score        Error  Units
SegmentBulkFill.nativeSegmentFillJava            2  avgt   30        1.618 ±      0.060  ns/op
SegmentBulkFill.nativeSegmentFillJava            3  avgt   30        1.602 ±      0.042  ns/op
SegmentBulkFill.nativeSegmentFillJava            4  avgt   30        1.775 ±      0.070  ns/op
SegmentBulkFill.nativeSegmentFillJava            5  avgt   30        1.759 ±      0.051  ns/op
SegmentBulkFill.nativeSegmentFillJava            6  avgt   30        1.771 ±      0.051  ns/op
SegmentBulkFill.nativeSegmentFillJava            7  avgt   30        1.785 ±      0.049  ns/op
SegmentBulkFill.nativeSegmentFillJava            8  avgt   30        2.383 ±      0.061  ns/op
SegmentBulkFill.nativeSegmentFillJava           64  avgt   30        4.010 ±      0.255  ns/op
SegmentBulkFill.nativeSegmentFillJava          512  avgt   30        6.622 ±      0.246  ns/op
SegmentBulkFill.nativeSegmentFillJava         4096  avgt   30       44.431 ±      0.832  ns/op
SegmentBulkFill.nativeSegmentFillJava        32768  avgt   30      331.429 ±      3.073  ns/op
SegmentBulkFill.nativeSegmentFillJava       262144  avgt   30     4174.795 ±     76.096  ns/op
SegmentBulkFill.nativeSegmentFillJava      2097152  avgt   30    33084.699 ±     53.530  ns/op
SegmentBulkFill.nativeSegmentFillJava     16777216  avgt   30   298953.004 ±  11241.262  ns/op
SegmentBulkFill.nativeSegmentFillJava    134217728  avgt   30  2857973.939 ± 128453.291  ns/op
Patch:
Benchmark                              (ELEM_SIZE)  Mode  Cnt        Score        Error  Units
SegmentBulkFill.nativeSegmentFillJava            2  avgt   30        1.317 ±      0.022  ns/op
SegmentBulkFill.nativeSegmentFillJava            3  avgt   30        1.313 ±      0.006  ns/op
SegmentBulkFill.nativeSegmentFillJava            4  avgt   30        1.319 ±      0.018  ns/op
SegmentBulkFill.nativeSegmentFillJava            5  avgt   30        1.317 ±      0.019  ns/op
SegmentBulkFill.nativeSegmentFillJava            6  avgt   30        1.316 ±      0.016  ns/op
SegmentBulkFill.nativeSegmentFillJava            7  avgt   30        1.320 ±      0.019  ns/op
SegmentBulkFill.nativeSegmentFillJava            8  avgt   30        2.239 ±      0.047  ns/op
SegmentBulkFill.nativeSegmentFillJava           64  avgt   30        3.487 ±      0.074  ns/op
SegmentBulkFill.nativeSegmentFillJava          512  avgt   30        6.659 ±      0.102  ns/op
SegmentBulkFill.nativeSegmentFillJava         4096  avgt   30       44.461 ±      0.666  ns/op
SegmentBulkFill.nativeSegmentFillJava        32768  avgt   30      331.159 ±      5.928  ns/op
SegmentBulkFill.nativeSegmentFillJava       262144  avgt   30     4171.649 ±     60.867  ns/op
SegmentBulkFill.nativeSegmentFillJava      2097152  avgt   30    34718.817 ±    697.494  ns/op
SegmentBulkFill.nativeSegmentFillJava     16777216  avgt   30   305446.597 ±  11087.702  ns/op
SegmentBulkFill.nativeSegmentFillJava    134217728  avgt   30  2905051.303 ± 114905.125  ns/op

-------------
PR Comment: https://git.openjdk.org/jdk/pull/25383#issuecomment-2900213674