RFR: 8360520: RISC-V: C1: Fix primitive array clone intrinsic regression after JDK-8333154
Feilong Jiang
fjiang at openjdk.org
Wed Jun 25 12:44:37 UTC 2025
Hi, please consider.
[JDK-8333154](https://bugs.openjdk.org/browse/JDK-8333154) implemented the C1 clone intrinsic for RISC-V, which reuses the arraycopy code for primitive arrays.
The new instruction flag `OmitChecksFlag` (introduced by [JDK-8302850](https://bugs.openjdk.org/browse/JDK-8302850)) is used to avoid instantiation of array copy stubs for primitive array clones.
If `OmitChecksFlag` is set, all flags (including the `unaligned` flag) are cleared before the `LIR_OpArrayCopy` node is generated.
This can lead to the wrong arraycopy function being selected when the `unaligned` flag for arraycopy is set.
We observed a performance regression on the P550 SBC in the corresponding JMH tests when COH (compact object headers) is enabled.
This PR adds a check for the unaligned case on RISC-V so that the arraycopy function is selected correctly.
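To make the failure mode concrete, here is a minimal standalone sketch of the flag handling described above. It is not the HotSpot code: the flag bits, the function names and the stub selection are hypothetical, and "keep only the unaligned bit" is an assumed shape for the fix rather than the literal patch.

```cpp
#include <cassert>

// Hypothetical flag bits for illustration only; the real constants belong to
// HotSpot's LIR_OpArrayCopy and are not reproduced here.
enum CopyFlags : unsigned {
  kSrcNullCheck = 1u << 0,
  kDstNullCheck = 1u << 1,
  kLengthCheck  = 1u << 2,
  kUnaligned    = 1u << 3,
};

// Hypothetical stand-in for the choice between arraycopy functions.
enum class CopyStub { Aligned, Unaligned };

// Behaviour described above: omitting checks wipes every flag,
// including the alignment hint.
constexpr unsigned flags_clearing_all(unsigned flags, bool omit_checks) {
  return omit_checks ? 0u : flags;
}

// Assumed shape of a fix: drop the check flags but keep the alignment hint.
constexpr unsigned flags_keeping_unaligned(unsigned flags, bool omit_checks) {
  return omit_checks ? (flags & kUnaligned) : flags;
}

// Selection keyed only on the alignment hint.
constexpr CopyStub select_copy_stub(unsigned flags) {
  return (flags & kUnaligned) ? CopyStub::Unaligned : CopyStub::Aligned;
}

// Self-check: the same input picks a different stub under the two policies.
inline void demo() {
  const unsigned flags = kSrcNullCheck | kDstNullCheck | kUnaligned;
  assert(select_copy_stub(flags_clearing_all(flags, true)) == CopyStub::Aligned);
  assert(select_copy_stub(flags_keeping_unaligned(flags, true)) == CopyStub::Unaligned);
  assert(flags_keeping_unaligned(flags, false) == flags);
}
```

Calling `demo()` exercises both policies: with all flags cleared, the sketch selects the aligned stub even though the input asked for the unaligned one.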
JMH data on P550 SBC for reference (w/o and w/ the patch):
Before:
Without COH:
Benchmark                 (size)  Mode  Cnt       Score      Error  Units
ArrayClone.byteArraycopy       0  avgt   15      50.854 ±    0.379  ns/op
ArrayClone.byteArraycopy      10  avgt   15      74.294 ±    0.449  ns/op
ArrayClone.byteArraycopy     100  avgt   15      81.847 ±    0.082  ns/op
ArrayClone.byteArraycopy    1000  avgt   15     480.106 ±    0.369  ns/op
ArrayClone.byteClone           0  avgt   15      90.146 ±    0.299  ns/op
ArrayClone.byteClone          10  avgt   15     130.525 ±    0.384  ns/op
ArrayClone.byteClone         100  avgt   15     251.942 ±    0.122  ns/op
ArrayClone.byteClone        1000  avgt   15     407.580 ±    0.318  ns/op
ArrayClone.intArraycopy        0  avgt   15      49.984 ±    0.436  ns/op
ArrayClone.intArraycopy       10  avgt   15      76.302 ±    1.388  ns/op
ArrayClone.intArraycopy      100  avgt   15     267.487 ±    0.329  ns/op
ArrayClone.intArraycopy     1000  avgt   15    1157.444 ±    1.588  ns/op
ArrayClone.intClone            0  avgt   15      90.130 ±    0.257  ns/op
ArrayClone.intClone           10  avgt   15     183.619 ±    0.588  ns/op
ArrayClone.intClone          100  avgt   15     296.491 ±    0.246  ns/op
ArrayClone.intClone         1000  avgt   15     828.695 ±    1.501  ns/op
-------------------------------------------------------------------------
With COH:
Benchmark                 (size)  Mode  Cnt       Score      Error  Units
ArrayClone.byteArraycopy       0  avgt   15      50.667 ±    0.622  ns/op
ArrayClone.byteArraycopy      10  avgt   15      76.917 ±    0.914  ns/op
ArrayClone.byteArraycopy     100  avgt   15      82.928 ±    0.056  ns/op
ArrayClone.byteArraycopy    1000  avgt   15     485.806 ±    0.653  ns/op
ArrayClone.byteClone           0  avgt   15      90.417 ±    1.059  ns/op
ArrayClone.byteClone          10  avgt   15    1634.691 ±    9.870  ns/op
ArrayClone.byteClone         100  avgt   15   18637.149 ±   30.985  ns/op
ArrayClone.byteClone        1000  avgt   15  193437.253 ±  435.771  ns/op
ArrayClone.intArraycopy        0  avgt   15      50.475 ±    0.545  ns/op
ArrayClone.intArraycopy       10  avgt   15      77.515 ±    0.958  ns/op
ArrayClone.intArraycopy      100  avgt   15     264.586 ±    0.237  ns/op
ArrayClone.intArraycopy     1000  avgt   15    1160.459 ±    1.394  ns/op
ArrayClone.intClone            0  avgt   15      90.776 ±    0.309  ns/op
ArrayClone.intClone           10  avgt   15    7794.589 ±   13.752  ns/op
ArrayClone.intClone          100  avgt   15   77303.097 ±  154.991  ns/op
ArrayClone.intClone         1000  avgt   15  773291.729 ± 1505.788  ns/op
After:
Without COH:
Benchmark                 (size)  Mode  Cnt       Score      Error  Units
ArrayClone.byteArraycopy       0  avgt   15      49.421 ±    0.588  ns/op
ArrayClone.byteArraycopy      10  avgt   15      71.687 ±    0.828  ns/op
ArrayClone.byteArraycopy     100  avgt   15      82.570 ±    0.068  ns/op
ArrayClone.byteArraycopy    1000  avgt   15     478.411 ±    0.505  ns/op
ArrayClone.byteClone           0  avgt   15      90.660 ±    0.314  ns/op
ArrayClone.byteClone          10  avgt   15     131.243 ±    0.407  ns/op
ArrayClone.byteClone         100  avgt   15     251.823 ±    0.192  ns/op
ArrayClone.byteClone        1000  avgt   15     404.857 ±    1.985  ns/op
ArrayClone.intArraycopy        0  avgt   15      49.672 ±    0.466  ns/op
ArrayClone.intArraycopy       10  avgt   15      78.996 ±    1.522  ns/op
ArrayClone.intArraycopy      100  avgt   15     263.690 ±    0.175  ns/op
ArrayClone.intArraycopy     1000  avgt   15    1155.155 ±    2.549  ns/op
ArrayClone.intClone            0  avgt   15      90.495 ±    0.296  ns/op
ArrayClone.intClone           10  avgt   15     184.500 ±    0.554  ns/op
ArrayClone.intClone          100  avgt   15     294.608 ±    0.139  ns/op
ArrayClone.intClone         1000  avgt   15     817.005 ±    0.551  ns/op
-------------------------------------------------------------------------
With COH:
Benchmark                 (size)  Mode  Cnt       Score      Error  Units
ArrayClone.byteArraycopy       0  avgt   15      51.322 ±    0.519  ns/op
ArrayClone.byteArraycopy      10  avgt   15      76.479 ±    0.679  ns/op
ArrayClone.byteArraycopy     100  avgt   15      82.936 ±    0.060  ns/op
ArrayClone.byteArraycopy    1000  avgt   15     487.030 ±    0.464  ns/op
ArrayClone.byteClone           0  avgt   15      89.688 ±    0.276  ns/op
ArrayClone.byteClone          10  avgt   15     109.446 ±    0.379  ns/op
ArrayClone.byteClone         100  avgt   15     221.747 ±    0.176  ns/op
ArrayClone.byteClone        1000  avgt   15     430.846 ±    0.370  ns/op
ArrayClone.intArraycopy        0  avgt   15      50.534 ±    0.524  ns/op
ArrayClone.intArraycopy       10  avgt   15      78.986 ±    1.341  ns/op
ArrayClone.intArraycopy      100  avgt   15     263.473 ±    0.168  ns/op
ArrayClone.intArraycopy     1000  avgt   15    1155.394 ±    1.396  ns/op
ArrayClone.intClone            0  avgt   15      89.698 ±    0.217  ns/op
ArrayClone.intClone           10  avgt   15     185.278 ±    0.673  ns/op
ArrayClone.intClone          100  avgt   15     375.374 ±    0.200  ns/op
ArrayClone.intClone         1000  avgt   15     872.398 ±    1.780  ns/op
-------------
Commit messages:
- riscv: fix c1 primitive array clone intrinsic regression
Changes: https://git.openjdk.org/jdk/pull/25976/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25976&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8360520
Stats: 3 lines in 3 files changed: 0 ins; 0 del; 3 mod
Patch: https://git.openjdk.org/jdk/pull/25976.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/25976/head:pull/25976
PR: https://git.openjdk.org/jdk/pull/25976