RFR: 8357460: RISC-V: Optimize array fill stub for small size [v2]

Feilong Jiang fjiang at openjdk.org
Fri May 23 15:38:36 UTC 2025


On Fri, 23 May 2025 06:19:37 GMT, Feilong Jiang <fjiang at openjdk.org> wrote:

>> Please consider.
>> As discussed in https://github.com/openjdk/jdk/pull/23890#discussion_r2094920943, we can further optimize the array fill stub by unrolling the stores when the size is less than 8.
>> 
>> This PR also removes the **aligned tail part** in consideration of code size and test coverage. Testing reveals no significant regressions.
>> 
>> 
>> Before:
>> Benchmark                 (size)  Mode  Cnt   Score   Error  Units
>> ArrayFill.fillByteArray        7  avgt   12  27.215 ± 0.073  ns/op
>> ArrayFill.fillByteArray       15  avgt   12  32.687 ± 0.904  ns/op
>> ArrayFill.fillIntArray         7  avgt   12  28.629 ± 0.006  ns/op
>> ArrayFill.fillIntArray        15  avgt   12  29.351 ± 0.009  ns/op
>> ArrayFill.fillShortArray       7  avgt   12  30.776 ± 0.006  ns/op
>> ArrayFill.fillShortArray      15  avgt   12  31.724 ± 0.447  ns/op
>> ArrayFill.zeroByteArray        7  avgt   12  27.199 ± 0.006  ns/op
>> ArrayFill.zeroByteArray       15  avgt   12  32.685 ± 0.900  ns/op
>> ArrayFill.zeroIntArray         7  avgt   12  28.630 ± 0.007  ns/op
>> ArrayFill.zeroIntArray        15  avgt   12  29.352 ± 0.011  ns/op
>> ArrayFill.zeroShortArray       7  avgt   12  30.776 ± 0.006  ns/op
>> ArrayFill.zeroShortArray      15  avgt   12  31.497 ± 0.012  ns/op
>> 
>> After:
>> Benchmark                 (size)  Mode  Cnt   Score   Error  Units
>> ArrayFill.fillByteArray        7  avgt   12  20.137 ± 0.042  ns/op
>> ArrayFill.fillByteArray       15  avgt   12  32.928 ± 0.004  ns/op
>> ArrayFill.fillIntArray         7  avgt   12  28.630 ± 0.004  ns/op
>> ArrayFill.fillIntArray        15  avgt   12  29.344 ± 0.005  ns/op
>> ArrayFill.fillShortArray       7  avgt   12  31.494 ± 0.004  ns/op
>> ArrayFill.fillShortArray      15  avgt   12  31.492 ± 0.008  ns/op
>> ArrayFill.zeroByteArray        7  avgt   12  19.980 ± 0.164  ns/op
>> ArrayFill.zeroByteArray       15  avgt   12  32.927 ± 0.004  ns/op
>> ArrayFill.zeroIntArray         7  avgt   12  28.629 ± 0.005  ns/op
>> ArrayFill.zeroIntArray        15  avgt   12  29.346 ± 0.006  ns/op
>> ArrayFill.zeroShortArray       7  avgt   12  32.193 ± 0.027  ns/op
>> ArrayFill.zeroShortArray      15  avgt   12  31.495 ± 0.010  ns/op
>> 
>> 
>> Testing:
>> - [x] tier1
>
> Feilong Jiang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision:
> 
>  - Merge branch 'openjdk:master' into riscv-optimize-generate-fill
>  - Merge branch 'master' of https://github.com/openjdk/jdk into riscv-optimize-generate-fill
>  - optimize array fill stub for small size
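
To illustrate what "unrolling the stores" means for a small fill, here is a minimal sketch in Java. It is only a conceptual illustration: the real stub is RISC-V assembly emitted by HotSpot, and the class and method names below (`SmallFillSketch`, `fillSmall`) are hypothetical and do not appear in the PR. The sketch assumes a byte array and a count below 8, and replaces the loop with straight-line stores selected by the count.

```java
final class SmallFillSketch {
    // Hypothetical helper, not part of the PR: fills a[0..count) with value
    // for count in 0..7 using straight-line stores instead of a loop.
    static void fillSmall(byte[] a, int count, byte value) {
        switch (count) {
            case 7: a[6] = value; // fall through
            case 6: a[5] = value; // fall through
            case 5: a[4] = value; // fall through
            case 4: a[3] = value; // fall through
            case 3: a[2] = value; // fall through
            case 2: a[1] = value; // fall through
            case 1: a[0] = value; // fall through
            case 0: break;
            default:
                throw new IllegalArgumentException("count must be in 0..7: " + count);
        }
    }

    public static void main(String[] args) {
        byte[] a = new byte[8];
        fillSmall(a, 7, (byte) 1);
        // The first seven elements are set; the eighth is left untouched.
        System.out.println(java.util.Arrays.toString(a));
    }
}
```

The point of the sketch is that no loop counter or backward branch is needed once the count is known to be small; it makes no claim about which store widths the real stub uses.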

Here are the `ArrayFill` JMH results for lengths 1 through 7 (size 15 is included for reference):
Before:

Benchmark                 (size)  Mode  Cnt   Score   Error  Units
ArrayFill.fillByteArray        1  avgt   12  20.052 ± 0.014  ns/op
ArrayFill.fillByteArray        2  avgt   12  19.977 ± 0.049  ns/op
ArrayFill.fillByteArray        3  avgt   12  21.474 ± 0.005  ns/op
ArrayFill.fillByteArray        4  avgt   12  22.904 ± 0.005  ns/op
ArrayFill.fillByteArray        5  avgt   12  24.336 ± 0.005  ns/op
ArrayFill.fillByteArray        6  avgt   12  25.764 ± 0.001  ns/op
ArrayFill.fillByteArray        7  avgt   12  27.199 ± 0.005  ns/op
ArrayFill.fillByteArray       15  avgt   12  32.210 ± 0.005  ns/op
ArrayFill.fillIntArray         1  avgt   12  21.191 ± 1.095  ns/op
ArrayFill.fillIntArray         2  avgt   12  27.913 ± 0.004  ns/op
ArrayFill.fillIntArray         3  avgt   12  28.628 ± 0.002  ns/op
ArrayFill.fillIntArray         4  avgt   12  29.346 ± 0.005  ns/op
ArrayFill.fillIntArray         5  avgt   12  29.348 ± 0.004  ns/op
ArrayFill.fillIntArray         6  avgt   12  28.629 ± 0.005  ns/op
ArrayFill.fillIntArray         7  avgt   12  28.636 ± 0.013  ns/op
ArrayFill.fillIntArray        15  avgt   12  29.345 ± 0.007  ns/op
ArrayFill.fillShortArray       1  avgt   12  19.474 ± 0.065  ns/op
ArrayFill.fillShortArray       2  avgt   12  19.338 ± 0.058  ns/op
ArrayFill.fillShortArray       3  avgt   12  20.143 ± 0.192  ns/op
ArrayFill.fillShortArray       4  avgt   12  30.776 ± 0.004  ns/op
ArrayFill.fillShortArray       5  avgt   12  30.778 ± 0.004  ns/op
ArrayFill.fillShortArray       6  avgt   12  30.776 ± 0.006  ns/op
ArrayFill.fillShortArray       7  avgt   12  30.779 ± 0.004  ns/op
ArrayFill.fillShortArray      15  avgt   12  31.495 ± 0.005  ns/op
ArrayFill.zeroByteArray        1  avgt   12  19.690 ± 0.288  ns/op
ArrayFill.zeroByteArray        2  avgt   12  19.884 ± 0.093  ns/op
ArrayFill.zeroByteArray        3  avgt   12  21.475 ± 0.005  ns/op
ArrayFill.zeroByteArray        4  avgt   12  22.905 ± 0.005  ns/op
ArrayFill.zeroByteArray        5  avgt   12  24.337 ± 0.005  ns/op
ArrayFill.zeroByteArray        6  avgt   12  25.772 ± 0.011  ns/op
ArrayFill.zeroByteArray        7  avgt   12  27.199 ± 0.004  ns/op
ArrayFill.zeroByteArray       15  avgt   12  32.209 ± 0.005  ns/op
ArrayFill.zeroIntArray         1  avgt   12  19.609 ± 0.414  ns/op
ArrayFill.zeroIntArray         2  avgt   12  27.919 ± 0.006  ns/op
ArrayFill.zeroIntArray         3  avgt   12  28.631 ± 0.005  ns/op
ArrayFill.zeroIntArray         4  avgt   12  29.353 ± 0.014  ns/op
ArrayFill.zeroIntArray         5  avgt   12  29.345 ± 0.005  ns/op
ArrayFill.zeroIntArray         6  avgt   12  28.632 ± 0.005  ns/op
ArrayFill.zeroIntArray         7  avgt   12  28.630 ± 0.004  ns/op
ArrayFill.zeroIntArray        15  avgt   12  29.362 ± 0.030  ns/op
ArrayFill.zeroShortArray       1  avgt   12  20.099 ± 0.102  ns/op
ArrayFill.zeroShortArray       2  avgt   12  19.563 ± 0.452  ns/op
ArrayFill.zeroShortArray       3  avgt   12  20.198 ± 0.443  ns/op
ArrayFill.zeroShortArray       4  avgt   12  30.776 ± 0.004  ns/op
ArrayFill.zeroShortArray       5  avgt   12  30.775 ± 0.004  ns/op
ArrayFill.zeroShortArray       6  avgt   12  30.777 ± 0.006  ns/op
ArrayFill.zeroShortArray       7  avgt   12  30.776 ± 0.005  ns/op
ArrayFill.zeroShortArray      15  avgt   12  31.492 ± 0.005  ns/op


After:

Benchmark                 (size)  Mode  Cnt   Score   Error  Units
ArrayFill.fillByteArray        1  avgt   12  19.442 ± 0.031  ns/op
ArrayFill.fillByteArray        2  avgt   12  19.324 ± 0.001  ns/op
ArrayFill.fillByteArray        3  avgt   12  19.326 ± 0.003  ns/op
ArrayFill.fillByteArray        4  avgt   12  19.324 ± 0.002  ns/op
ArrayFill.fillByteArray        5  avgt   12  19.566 ± 0.452  ns/op
ArrayFill.fillByteArray        6  avgt   12  19.327 ± 0.004  ns/op
ArrayFill.fillByteArray        7  avgt   12  20.146 ± 0.039  ns/op
ArrayFill.fillByteArray       15  avgt   12  32.924 ± 0.005  ns/op
ArrayFill.fillIntArray         1  avgt   12  20.040 ± 0.003  ns/op
ArrayFill.fillIntArray         2  avgt   12  28.151 ± 0.449  ns/op
ArrayFill.fillIntArray         3  avgt   12  28.634 ± 0.003  ns/op
ArrayFill.fillIntArray         4  avgt   12  29.348 ± 0.005  ns/op
ArrayFill.fillIntArray         5  avgt   12  29.338 ± 0.010  ns/op
ArrayFill.fillIntArray         6  avgt   12  28.631 ± 0.007  ns/op
ArrayFill.fillIntArray         7  avgt   12  28.629 ± 0.005  ns/op
ArrayFill.fillIntArray        15  avgt   12  29.347 ± 0.006  ns/op
ArrayFill.fillShortArray       1  avgt   12  20.675 ± 0.058  ns/op
ArrayFill.fillShortArray       2  avgt   12  20.624 ± 0.942  ns/op
ArrayFill.fillShortArray       3  avgt   12  19.852 ± 0.337  ns/op
ArrayFill.fillShortArray       4  avgt   12  30.777 ± 0.005  ns/op
ArrayFill.fillShortArray       5  avgt   12  30.538 ± 0.453  ns/op
ArrayFill.fillShortArray       6  avgt   12  30.776 ± 0.005  ns/op
ArrayFill.fillShortArray       7  avgt   12  31.493 ± 0.004  ns/op
ArrayFill.fillShortArray      15  avgt   12  31.494 ± 0.005  ns/op
ArrayFill.zeroByteArray        1  avgt   12  19.423 ± 0.018  ns/op
ArrayFill.zeroByteArray        2  avgt   12  19.327 ± 0.003  ns/op
ArrayFill.zeroByteArray        3  avgt   12  19.327 ± 0.003  ns/op
ArrayFill.zeroByteArray        4  avgt   12  19.327 ± 0.003  ns/op
ArrayFill.zeroByteArray        5  avgt   12  19.802 ± 0.452  ns/op
ArrayFill.zeroByteArray        6  avgt   12  19.326 ± 0.003  ns/op
ArrayFill.zeroByteArray        7  avgt   12  19.891 ± 0.139  ns/op
ArrayFill.zeroByteArray       15  avgt   12  33.170 ± 0.464  ns/op
ArrayFill.zeroIntArray         1  avgt   12  19.983 ± 0.112  ns/op
ArrayFill.zeroIntArray         2  avgt   12  27.914 ± 0.004  ns/op
ArrayFill.zeroIntArray         3  avgt   12  28.629 ± 0.004  ns/op
ArrayFill.zeroIntArray         4  avgt   12  29.346 ± 0.004  ns/op
ArrayFill.zeroIntArray         5  avgt   12  29.346 ± 0.005  ns/op
ArrayFill.zeroIntArray         6  avgt   12  28.629 ± 0.003  ns/op
ArrayFill.zeroIntArray         7  avgt   12  28.627 ± 0.003  ns/op
ArrayFill.zeroIntArray        15  avgt   12  29.354 ± 0.018  ns/op
ArrayFill.zeroShortArray       1  avgt   12  19.818 ± 0.339  ns/op
ArrayFill.zeroShortArray       2  avgt   12  19.325 ± 0.003  ns/op
ArrayFill.zeroShortArray       3  avgt   12  19.325 ± 0.003  ns/op
ArrayFill.zeroShortArray       4  avgt   12  30.777 ± 0.005  ns/op
ArrayFill.zeroShortArray       5  avgt   12  30.777 ± 0.005  ns/op
ArrayFill.zeroShortArray       6  avgt   12  30.776 ± 0.006  ns/op
ArrayFill.zeroShortArray       7  avgt   12  31.732 ± 0.905  ns/op
ArrayFill.zeroShortArray      15  avgt   12  31.492 ± 0.003  ns/op
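
To make the difference easier to read, the relative change can be computed directly from the scores. The sketch below does this for `ArrayFill.fillByteArray`, sizes 1 through 7, using the numbers copied from the two tables above. The class and method names (`FillSpeedup`, `reductionPercent`) are hypothetical, and the error columns are ignored.

```java
final class FillSpeedup {
    // ArrayFill.fillByteArray scores in ns/op for sizes 1..7,
    // copied from the "Before" and "After" tables above.
    static final double[] BEFORE = {20.052, 19.977, 21.474, 22.904, 24.336, 25.764, 27.199};
    static final double[] AFTER  = {19.442, 19.324, 19.326, 19.324, 19.566, 19.327, 20.146};

    // Relative reduction in average time, as a percentage of the "before" score.
    static double reductionPercent(double before, double after) {
        return (before - after) / before * 100.0;
    }

    public static void main(String[] args) {
        for (int i = 0; i < BEFORE.length; i++) {
            System.out.printf("size %d: %.3f -> %.3f ns/op (%.1f%% lower)%n",
                    i + 1, BEFORE[i], AFTER[i], reductionPercent(BEFORE[i], AFTER[i]));
        }
    }
}
```

For size 7 this works out to roughly a 26% reduction (27.199 to 20.146 ns/op). The short and int rows in the tables change only slightly by comparison.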

-------------

PR Comment: https://git.openjdk.org/jdk/pull/25350#issuecomment-2904829785

