RFR: 8357531: The `SegmentBulkOperations::fill` method can be improved using overlaps

Per Minborg pminborg at openjdk.org
Thu May 22 07:46:03 UTC 2025


On Thu, 22 May 2025 07:34:08 GMT, Per Minborg <pminborg at openjdk.org> wrote:

> This PR builds on a concept John Rose told me about some time ago. Instead of finishing the tail with a combination of memory operations of various sizes, a single large, skewed memory operation that overlaps already-written bytes can clean up the remaining tail bytes.
> 
> This simplifies and shortens the code while improving performance, because fewer branches need to be evaluated.
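The overlapping-tail idea can be sketched in plain Java. This is a minimal sketch, not the code in the PR: the class `OverlappingFill` and its `fill` method are hypothetical, and it operates on a `byte[]` through `MethodHandles.byteArrayViewVarHandle` rather than on a `MemorySegment` as `SegmentBulkOperations` does. It relies on plain (non-atomic) `set` through a byte-array view being allowed at unaligned offsets. Because every store writes the same replicated byte pattern, rewriting bytes that an earlier store already filled is harmless, so a single wide store that ends exactly at the end of the region can stand in for a sequence of smaller stores.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;
import java.util.Arrays;
import java.util.Objects;

// Hypothetical sketch of an overlapping-tail fill; not the JDK implementation.
public final class OverlappingFill {

    // Byte-array views; plain get/set through these may be unaligned.
    private static final VarHandle LONG =
            MethodHandles.byteArrayViewVarHandle(long[].class, ByteOrder.nativeOrder());
    private static final VarHandle INT =
            MethodHandles.byteArrayViewVarHandle(int[].class, ByteOrder.nativeOrder());
    private static final VarHandle SHORT =
            MethodHandles.byteArrayViewVarHandle(short[].class, ByteOrder.nativeOrder());

    /** Fills dst[from, from + len) with value, finishing the tail with one overlapping store. */
    public static void fill(byte[] dst, int from, int len, byte value) {
        Objects.checkFromIndexSize(from, len, dst.length);
        long v8 = (value & 0xFFL) * 0x0101_0101_0101_0101L; // value replicated into all 8 bytes
        if (len >= Long.BYTES) {
            int end = from + len;
            int i = from;
            // Main loop: whole 8-byte stores.
            for (; i <= end - Long.BYTES; i += Long.BYTES) {
                LONG.set(dst, i, v8);
            }
            // Tail: one skewed 8-byte store ending exactly at `end`. It may
            // rewrite bytes the loop already filled, which is harmless for a fill.
            if (i < end) {
                LONG.set(dst, end - Long.BYTES, v8);
            }
        } else if (len >= Integer.BYTES) {
            // 4..7 bytes: two possibly overlapping 4-byte stores cover the range.
            INT.set(dst, from, (int) v8);
            INT.set(dst, from + len - Integer.BYTES, (int) v8);
        } else if (len >= Short.BYTES) {
            // 2..3 bytes: two possibly overlapping 2-byte stores.
            SHORT.set(dst, from, (short) v8);
            SHORT.set(dst, from + len - Short.BYTES, (short) v8);
        } else if (len == 1) {
            dst[from] = value;
        }
    }

    public static void main(String[] args) {
        // Compare against Arrays.fill for many lengths and start offsets.
        for (int len = 0; len <= 40; len++) {
            for (int from = 0; from < 3; from++) {
                byte[] a = new byte[from + len + 3];
                byte[] b = new byte[from + len + 3];
                fill(a, from, len, (byte) 0x5A);
                Arrays.fill(b, from, from + len, (byte) 0x5A);
                if (!Arrays.equals(a, b)) {
                    throw new AssertionError("len=" + len + " from=" + from);
                }
            }
        }
        System.out.println("ok");
    }
}
```

In this sketch, any length from 2 to 7 bytes is handled by exactly two stores, and any length of 8 or more needs at most one extra store after the main loop, regardless of the remainder. The `main` method checks the result against `Arrays.fill`.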

Performance on an M1 Mac (macOS Sequoia 15.4.1):

Base:


Benchmark                                   (ELEM_SIZE)  Mode  Cnt        Score        Error  Units
SegmentBulkFill.nativeSegmentFillJava                 2  avgt   30        1.618 ±      0.060  ns/op
SegmentBulkFill.nativeSegmentFillJava                 3  avgt   30        1.602 ±      0.042  ns/op
SegmentBulkFill.nativeSegmentFillJava                 4  avgt   30        1.775 ±      0.070  ns/op
SegmentBulkFill.nativeSegmentFillJava                 5  avgt   30        1.759 ±      0.051  ns/op
SegmentBulkFill.nativeSegmentFillJava                 6  avgt   30        1.771 ±      0.051  ns/op
SegmentBulkFill.nativeSegmentFillJava                 7  avgt   30        1.785 ±      0.049  ns/op
SegmentBulkFill.nativeSegmentFillJava                 8  avgt   30        2.383 ±      0.061  ns/op
SegmentBulkFill.nativeSegmentFillJava                64  avgt   30        4.010 ±      0.255  ns/op
SegmentBulkFill.nativeSegmentFillJava               512  avgt   30        6.622 ±      0.246  ns/op
SegmentBulkFill.nativeSegmentFillJava              4096  avgt   30       44.431 ±      0.832  ns/op
SegmentBulkFill.nativeSegmentFillJava             32768  avgt   30      331.429 ±      3.073  ns/op
SegmentBulkFill.nativeSegmentFillJava            262144  avgt   30     4174.795 ±     76.096  ns/op
SegmentBulkFill.nativeSegmentFillJava           2097152  avgt   30    33084.699 ±     53.530  ns/op
SegmentBulkFill.nativeSegmentFillJava          16777216  avgt   30   298953.004 ±  11241.262  ns/op
SegmentBulkFill.nativeSegmentFillJava         134217728  avgt   30  2857973.939 ± 128453.291  ns/op


Patch:

Benchmark                              (ELEM_SIZE)  Mode  Cnt        Score        Error  Units
SegmentBulkFill.nativeSegmentFillJava            2  avgt   30        1.317 ±      0.022  ns/op
SegmentBulkFill.nativeSegmentFillJava            3  avgt   30        1.313 ±      0.006  ns/op
SegmentBulkFill.nativeSegmentFillJava            4  avgt   30        1.319 ±      0.018  ns/op
SegmentBulkFill.nativeSegmentFillJava            5  avgt   30        1.317 ±      0.019  ns/op
SegmentBulkFill.nativeSegmentFillJava            6  avgt   30        1.316 ±      0.016  ns/op
SegmentBulkFill.nativeSegmentFillJava            7  avgt   30        1.320 ±      0.019  ns/op
SegmentBulkFill.nativeSegmentFillJava            8  avgt   30        2.239 ±      0.047  ns/op
SegmentBulkFill.nativeSegmentFillJava           64  avgt   30        3.487 ±      0.074  ns/op
SegmentBulkFill.nativeSegmentFillJava          512  avgt   30        6.659 ±      0.102  ns/op
SegmentBulkFill.nativeSegmentFillJava         4096  avgt   30       44.461 ±      0.666  ns/op
SegmentBulkFill.nativeSegmentFillJava        32768  avgt   30      331.159 ±      5.928  ns/op
SegmentBulkFill.nativeSegmentFillJava       262144  avgt   30     4171.649 ±     60.867  ns/op
SegmentBulkFill.nativeSegmentFillJava      2097152  avgt   30    34718.817 ±    697.494  ns/op
SegmentBulkFill.nativeSegmentFillJava     16777216  avgt   30   305446.597 ±  11087.702  ns/op
SegmentBulkFill.nativeSegmentFillJava    134217728  avgt   30  2905051.303 ± 114905.125  ns/op


![image](https://github.com/user-attachments/assets/df4888ab-67d9-49fe-982b-8018d949cee3)

-------------

PR Comment: https://git.openjdk.org/jdk/pull/25383#issuecomment-2900213674


More information about the core-libs-dev mailing list