RFR: 8357460: RISC-V: Optimize array fill stub for small size [v2]
Feilong Jiang
fjiang at openjdk.org
Fri May 23 15:38:36 UTC 2025
On Fri, 23 May 2025 06:19:37 GMT, Feilong Jiang <fjiang at openjdk.org> wrote:
>> Please consider.
>> As discussed in https://github.com/openjdk/jdk/pull/23890#discussion_r2094920943, we can further optimize the array fill stub by unrolling the stores when the size is less than 8.
>>
>> This PR also removes the **aligned tail part**, in consideration of code size and testing coverage. The benchmark results below show no significant regressions.
>>
>>
>> Before:
>> Benchmark (size) Mode Cnt Score Error Units
>> ArrayFill.fillByteArray 7 avgt 12 27.215 ± 0.073 ns/op
>> ArrayFill.fillByteArray 15 avgt 12 32.687 ± 0.904 ns/op
>> ArrayFill.fillIntArray 7 avgt 12 28.629 ± 0.006 ns/op
>> ArrayFill.fillIntArray 15 avgt 12 29.351 ± 0.009 ns/op
>> ArrayFill.fillShortArray 7 avgt 12 30.776 ± 0.006 ns/op
>> ArrayFill.fillShortArray 15 avgt 12 31.724 ± 0.447 ns/op
>> ArrayFill.zeroByteArray 7 avgt 12 27.199 ± 0.006 ns/op
>> ArrayFill.zeroByteArray 15 avgt 12 32.685 ± 0.900 ns/op
>> ArrayFill.zeroIntArray 7 avgt 12 28.630 ± 0.007 ns/op
>> ArrayFill.zeroIntArray 15 avgt 12 29.352 ± 0.011 ns/op
>> ArrayFill.zeroShortArray 7 avgt 12 30.776 ± 0.006 ns/op
>> ArrayFill.zeroShortArray 15 avgt 12 31.497 ± 0.012 ns/op
>>
>> After:
>> Benchmark (size) Mode Cnt Score Error Units
>> ArrayFill.fillByteArray 7 avgt 12 20.137 ± 0.042 ns/op
>> ArrayFill.fillByteArray 15 avgt 12 32.928 ± 0.004 ns/op
>> ArrayFill.fillIntArray 7 avgt 12 28.630 ± 0.004 ns/op
>> ArrayFill.fillIntArray 15 avgt 12 29.344 ± 0.005 ns/op
>> ArrayFill.fillShortArray 7 avgt 12 31.494 ± 0.004 ns/op
>> ArrayFill.fillShortArray 15 avgt 12 31.492 ± 0.008 ns/op
>> ArrayFill.zeroByteArray 7 avgt 12 19.980 ± 0.164 ns/op
>> ArrayFill.zeroByteArray 15 avgt 12 32.927 ± 0.004 ns/op
>> ArrayFill.zeroIntArray 7 avgt 12 28.629 ± 0.005 ns/op
>> ArrayFill.zeroIntArray 15 avgt 12 29.346 ± 0.006 ns/op
>> ArrayFill.zeroShortArray 7 avgt 12 32.193 ± 0.027 ns/op
>> ArrayFill.zeroShortArray 15 avgt 12 31.495 ± 0.010 ns/op
>>
>>
>> Testing:
>> - [x] tier1
>
> Feilong Jiang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision:
>
> - Merge branch 'openjdk:master' into riscv-optimize-generate-fill
> - Merge branch 'master' of https://github.com/openjdk/jdk into riscv-optimize-generate-fill
> - optimize array fill stub for small size
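For readers who want a concrete picture of the "unrolling" mentioned in the description above, here is a minimal Java sketch of the idea for the byte case. It is only an illustration: the real change is in the RISC-V stub generator and emits assembly, and the class name `SmallFillSketch` and method `fillSmall` are hypothetical names that do not appear in the PR.

```java
import java.util.Arrays;

public class SmallFillSketch {
    // Hypothetical helper (not from the PR): fill the first `count` bytes
    // of `dst` with `value`, assuming 0 <= count < 8.
    // Instead of a loop, each possible count falls through a chain of
    // single stores, so no loop counter or back-branch is needed.
    static void fillSmall(byte[] dst, byte value, int count) {
        if (count < 0 || count >= 8 || count > dst.length) {
            throw new IllegalArgumentException("count out of range: " + count);
        }
        switch (count) {
            case 7: dst[6] = value; // fall through
            case 6: dst[5] = value; // fall through
            case 5: dst[4] = value; // fall through
            case 4: dst[3] = value; // fall through
            case 3: dst[2] = value; // fall through
            case 2: dst[1] = value; // fall through
            case 1: dst[0] = value; // fall through
            default: break;         // count == 0: nothing to store
        }
    }

    public static void main(String[] args) {
        byte[] a = new byte[7];
        fillSmall(a, (byte) 42, 5);
        System.out.println(Arrays.toString(a)); // prints [42, 42, 42, 42, 42, 0, 0]
    }
}
```

The sketch only shows why a short fill can avoid loop overhead; it makes no claim about the instruction sequence the stub actually generates.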
Here are the `ArrayFill` JMH results for lengths 1 through 7 (with 15 included for comparison):
Before:
Benchmark (size) Mode Cnt Score Error Units
ArrayFill.fillByteArray 1 avgt 12 20.052 ± 0.014 ns/op
ArrayFill.fillByteArray 2 avgt 12 19.977 ± 0.049 ns/op
ArrayFill.fillByteArray 3 avgt 12 21.474 ± 0.005 ns/op
ArrayFill.fillByteArray 4 avgt 12 22.904 ± 0.005 ns/op
ArrayFill.fillByteArray 5 avgt 12 24.336 ± 0.005 ns/op
ArrayFill.fillByteArray 6 avgt 12 25.764 ± 0.001 ns/op
ArrayFill.fillByteArray 7 avgt 12 27.199 ± 0.005 ns/op
ArrayFill.fillByteArray 15 avgt 12 32.210 ± 0.005 ns/op
ArrayFill.fillIntArray 1 avgt 12 21.191 ± 1.095 ns/op
ArrayFill.fillIntArray 2 avgt 12 27.913 ± 0.004 ns/op
ArrayFill.fillIntArray 3 avgt 12 28.628 ± 0.002 ns/op
ArrayFill.fillIntArray 4 avgt 12 29.346 ± 0.005 ns/op
ArrayFill.fillIntArray 5 avgt 12 29.348 ± 0.004 ns/op
ArrayFill.fillIntArray 6 avgt 12 28.629 ± 0.005 ns/op
ArrayFill.fillIntArray 7 avgt 12 28.636 ± 0.013 ns/op
ArrayFill.fillIntArray 15 avgt 12 29.345 ± 0.007 ns/op
ArrayFill.fillShortArray 1 avgt 12 19.474 ± 0.065 ns/op
ArrayFill.fillShortArray 2 avgt 12 19.338 ± 0.058 ns/op
ArrayFill.fillShortArray 3 avgt 12 20.143 ± 0.192 ns/op
ArrayFill.fillShortArray 4 avgt 12 30.776 ± 0.004 ns/op
ArrayFill.fillShortArray 5 avgt 12 30.778 ± 0.004 ns/op
ArrayFill.fillShortArray 6 avgt 12 30.776 ± 0.006 ns/op
ArrayFill.fillShortArray 7 avgt 12 30.779 ± 0.004 ns/op
ArrayFill.fillShortArray 15 avgt 12 31.495 ± 0.005 ns/op
ArrayFill.zeroByteArray 1 avgt 12 19.690 ± 0.288 ns/op
ArrayFill.zeroByteArray 2 avgt 12 19.884 ± 0.093 ns/op
ArrayFill.zeroByteArray 3 avgt 12 21.475 ± 0.005 ns/op
ArrayFill.zeroByteArray 4 avgt 12 22.905 ± 0.005 ns/op
ArrayFill.zeroByteArray 5 avgt 12 24.337 ± 0.005 ns/op
ArrayFill.zeroByteArray 6 avgt 12 25.772 ± 0.011 ns/op
ArrayFill.zeroByteArray 7 avgt 12 27.199 ± 0.004 ns/op
ArrayFill.zeroByteArray 15 avgt 12 32.209 ± 0.005 ns/op
ArrayFill.zeroIntArray 1 avgt 12 19.609 ± 0.414 ns/op
ArrayFill.zeroIntArray 2 avgt 12 27.919 ± 0.006 ns/op
ArrayFill.zeroIntArray 3 avgt 12 28.631 ± 0.005 ns/op
ArrayFill.zeroIntArray 4 avgt 12 29.353 ± 0.014 ns/op
ArrayFill.zeroIntArray 5 avgt 12 29.345 ± 0.005 ns/op
ArrayFill.zeroIntArray 6 avgt 12 28.632 ± 0.005 ns/op
ArrayFill.zeroIntArray 7 avgt 12 28.630 ± 0.004 ns/op
ArrayFill.zeroIntArray 15 avgt 12 29.362 ± 0.030 ns/op
ArrayFill.zeroShortArray 1 avgt 12 20.099 ± 0.102 ns/op
ArrayFill.zeroShortArray 2 avgt 12 19.563 ± 0.452 ns/op
ArrayFill.zeroShortArray 3 avgt 12 20.198 ± 0.443 ns/op
ArrayFill.zeroShortArray 4 avgt 12 30.776 ± 0.004 ns/op
ArrayFill.zeroShortArray 5 avgt 12 30.775 ± 0.004 ns/op
ArrayFill.zeroShortArray 6 avgt 12 30.777 ± 0.006 ns/op
ArrayFill.zeroShortArray 7 avgt 12 30.776 ± 0.005 ns/op
ArrayFill.zeroShortArray 15 avgt 12 31.492 ± 0.005 ns/op
After:
Benchmark (size) Mode Cnt Score Error Units
ArrayFill.fillByteArray 1 avgt 12 19.442 ± 0.031 ns/op
ArrayFill.fillByteArray 2 avgt 12 19.324 ± 0.001 ns/op
ArrayFill.fillByteArray 3 avgt 12 19.326 ± 0.003 ns/op
ArrayFill.fillByteArray 4 avgt 12 19.324 ± 0.002 ns/op
ArrayFill.fillByteArray 5 avgt 12 19.566 ± 0.452 ns/op
ArrayFill.fillByteArray 6 avgt 12 19.327 ± 0.004 ns/op
ArrayFill.fillByteArray 7 avgt 12 20.146 ± 0.039 ns/op
ArrayFill.fillByteArray 15 avgt 12 32.924 ± 0.005 ns/op
ArrayFill.fillIntArray 1 avgt 12 20.040 ± 0.003 ns/op
ArrayFill.fillIntArray 2 avgt 12 28.151 ± 0.449 ns/op
ArrayFill.fillIntArray 3 avgt 12 28.634 ± 0.003 ns/op
ArrayFill.fillIntArray 4 avgt 12 29.348 ± 0.005 ns/op
ArrayFill.fillIntArray 5 avgt 12 29.338 ± 0.010 ns/op
ArrayFill.fillIntArray 6 avgt 12 28.631 ± 0.007 ns/op
ArrayFill.fillIntArray 7 avgt 12 28.629 ± 0.005 ns/op
ArrayFill.fillIntArray 15 avgt 12 29.347 ± 0.006 ns/op
ArrayFill.fillShortArray 1 avgt 12 20.675 ± 0.058 ns/op
ArrayFill.fillShortArray 2 avgt 12 20.624 ± 0.942 ns/op
ArrayFill.fillShortArray 3 avgt 12 19.852 ± 0.337 ns/op
ArrayFill.fillShortArray 4 avgt 12 30.777 ± 0.005 ns/op
ArrayFill.fillShortArray 5 avgt 12 30.538 ± 0.453 ns/op
ArrayFill.fillShortArray 6 avgt 12 30.776 ± 0.005 ns/op
ArrayFill.fillShortArray 7 avgt 12 31.493 ± 0.004 ns/op
ArrayFill.fillShortArray 15 avgt 12 31.494 ± 0.005 ns/op
ArrayFill.zeroByteArray 1 avgt 12 19.423 ± 0.018 ns/op
ArrayFill.zeroByteArray 2 avgt 12 19.327 ± 0.003 ns/op
ArrayFill.zeroByteArray 3 avgt 12 19.327 ± 0.003 ns/op
ArrayFill.zeroByteArray 4 avgt 12 19.327 ± 0.003 ns/op
ArrayFill.zeroByteArray 5 avgt 12 19.802 ± 0.452 ns/op
ArrayFill.zeroByteArray 6 avgt 12 19.326 ± 0.003 ns/op
ArrayFill.zeroByteArray 7 avgt 12 19.891 ± 0.139 ns/op
ArrayFill.zeroByteArray 15 avgt 12 33.170 ± 0.464 ns/op
ArrayFill.zeroIntArray 1 avgt 12 19.983 ± 0.112 ns/op
ArrayFill.zeroIntArray 2 avgt 12 27.914 ± 0.004 ns/op
ArrayFill.zeroIntArray 3 avgt 12 28.629 ± 0.004 ns/op
ArrayFill.zeroIntArray 4 avgt 12 29.346 ± 0.004 ns/op
ArrayFill.zeroIntArray 5 avgt 12 29.346 ± 0.005 ns/op
ArrayFill.zeroIntArray 6 avgt 12 28.629 ± 0.003 ns/op
ArrayFill.zeroIntArray 7 avgt 12 28.627 ± 0.003 ns/op
ArrayFill.zeroIntArray 15 avgt 12 29.354 ± 0.018 ns/op
ArrayFill.zeroShortArray 1 avgt 12 19.818 ± 0.339 ns/op
ArrayFill.zeroShortArray 2 avgt 12 19.325 ± 0.003 ns/op
ArrayFill.zeroShortArray 3 avgt 12 19.325 ± 0.003 ns/op
ArrayFill.zeroShortArray 4 avgt 12 30.777 ± 0.005 ns/op
ArrayFill.zeroShortArray 5 avgt 12 30.777 ± 0.005 ns/op
ArrayFill.zeroShortArray 6 avgt 12 30.776 ± 0.006 ns/op
ArrayFill.zeroShortArray 7 avgt 12 31.732 ± 0.905 ns/op
ArrayFill.zeroShortArray 15 avgt 12 31.492 ± 0.003 ns/op
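To make the before/after comparison easier to read, the relative change per row can be computed as (after - before) / before. The small Java sketch below does this for two rows copied from the tables above; the class name `FillDelta` and the method `percentChange` are illustrative names only, and the sketch works on the reported averages without taking the error column into account.

```java
public class FillDelta {
    // Relative change in percent; negative means the "after" score
    // (ns/op, lower is better) is faster than the "before" score.
    static double percentChange(double before, double after) {
        return (after - before) / before * 100.0;
    }

    public static void main(String[] args) {
        // Scores taken from the tables above (ns/op).
        double byte7Before = 27.199, byte7After = 20.146;   // fillByteArray, size 7
        double short7Before = 30.779, short7After = 31.493; // fillShortArray, size 7

        System.out.printf("fillByteArray  7: %+.2f%%%n",
                percentChange(byte7Before, byte7After));
        System.out.printf("fillShortArray 7: %+.2f%%%n",
                percentChange(short7Before, short7After));
    }
}
```

For these two rows the byte case comes out roughly 26% faster, while the short case is a little over 2% slower.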
-------------
PR Comment: https://git.openjdk.org/jdk/pull/25350#issuecomment-2904829785