RFR: 8360520: RISC-V: C1: Fix primitive array clone intrinsic regression after JDK-8333154

Feilong Jiang fjiang at openjdk.org
Wed Jun 25 12:44:37 UTC 2025


Hi, please consider.
[JDK-8333154](https://bugs.openjdk.org/browse/JDK-8333154) implemented the C1 clone intrinsic for RISC-V, reusing the arraycopy code for primitive arrays.
The instruction flag `OmitChecksFlag` (introduced by [JDK-8302850](https://bugs.openjdk.org/browse/JDK-8302850)) is used to avoid instantiating array copy stubs for primitive array clones.
If `OmitChecksFlag` is set, all flags (including the `unaligned` flag) are cleared before the `LIR_OpArrayCopy` node is generated.
As a result, the wrong arraycopy function may be selected in cases where the `unaligned` flag should have been set.
With compact object headers (COH) enabled, we observed a performance regression in the corresponding JMH tests on a P550 SBC: for example, `ArrayClone.byteClone` at size 1000 takes about 193437 ns/op with COH versus about 408 ns/op without.

This PR adds checks for the unaligned case on RISC-V so that the arraycopy function is selected correctly.
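The flag interaction described above can be illustrated with a toy model. This is plain Java, not HotSpot code: the bit values, the `effectiveFlags` and `selectStub` helpers, and the `keepUnaligned` switch are hypothetical stand-ins, and the way the actual patch keeps the alignment information is an assumption here. Only the behavior stated in this mail is modeled: clearing every flag also clears `unaligned`, which changes which copy routine is picked.

```java
public final class ArrayCopyFlagModel {
    // Hypothetical bit values; the real LIR_OpArrayCopy flag values differ.
    public static final int SRC_NULL_CHECK = 1 << 0;
    public static final int DST_NULL_CHECK = 1 << 1;
    public static final int UNALIGNED      = 1 << 2;

    /** Which kind of copy routine the model would pick. */
    public enum Stub { ALIGNED_COPY, UNALIGNED_COPY }

    /**
     * Models the flag handling for an array copy node.
     * omitChecks:    stands in for OmitChecksFlag being set (primitive array clone).
     * keepUnaligned: assumed shape of the fix, i.e. the alignment bit survives.
     */
    public static int effectiveFlags(int computed, boolean omitChecks, boolean keepUnaligned) {
        if (!omitChecks) {
            return computed;                 // regular arraycopy: flags untouched
        }
        // Clone path: drop the check flags, optionally keeping the alignment bit.
        return keepUnaligned ? (computed & UNALIGNED) : 0;
    }

    /** Picks the routine purely from the alignment bit. */
    public static Stub selectStub(int flags) {
        return (flags & UNALIGNED) != 0 ? Stub.UNALIGNED_COPY : Stub.ALIGNED_COPY;
    }

    public static void main(String[] args) {
        int computed = SRC_NULL_CHECK | DST_NULL_CHECK | UNALIGNED;
        System.out.println("all flags cleared:   " + selectStub(effectiveFlags(computed, true, false)));
        System.out.println("unaligned preserved: " + selectStub(effectiveFlags(computed, true, true)));
    }
}
```

In this model the first call reports the aligned routine even though the data was marked unaligned, which is the kind of mis-selection the mail describes.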


JMH data on P550 SBC for reference (w/o and w/ the patch):

Before:

Without COH:

Benchmark                 (size)  Mode  Cnt     Score   Error  Units
ArrayClone.byteArraycopy       0  avgt   15    50.854 ± 0.379  ns/op
ArrayClone.byteArraycopy      10  avgt   15    74.294 ± 0.449  ns/op
ArrayClone.byteArraycopy     100  avgt   15    81.847 ± 0.082  ns/op
ArrayClone.byteArraycopy    1000  avgt   15   480.106 ± 0.369  ns/op
ArrayClone.byteClone           0  avgt   15    90.146 ± 0.299  ns/op
ArrayClone.byteClone          10  avgt   15   130.525 ± 0.384  ns/op
ArrayClone.byteClone         100  avgt   15   251.942 ± 0.122  ns/op
ArrayClone.byteClone        1000  avgt   15   407.580 ± 0.318  ns/op
ArrayClone.intArraycopy        0  avgt   15    49.984 ± 0.436  ns/op
ArrayClone.intArraycopy       10  avgt   15    76.302 ± 1.388  ns/op
ArrayClone.intArraycopy      100  avgt   15   267.487 ± 0.329  ns/op
ArrayClone.intArraycopy     1000  avgt   15  1157.444 ± 1.588  ns/op
ArrayClone.intClone            0  avgt   15    90.130 ± 0.257  ns/op
ArrayClone.intClone           10  avgt   15   183.619 ± 0.588  ns/op
ArrayClone.intClone          100  avgt   15   296.491 ± 0.246  ns/op
ArrayClone.intClone         1000  avgt   15   828.695 ± 1.501  ns/op

-------------------------------------------------------------------------
With COH:

Benchmark                 (size)  Mode  Cnt       Score      Error  Units
ArrayClone.byteArraycopy       0  avgt   15      50.667 ±    0.622  ns/op
ArrayClone.byteArraycopy      10  avgt   15      76.917 ±    0.914  ns/op
ArrayClone.byteArraycopy     100  avgt   15      82.928 ±    0.056  ns/op
ArrayClone.byteArraycopy    1000  avgt   15     485.806 ±    0.653  ns/op
ArrayClone.byteClone           0  avgt   15      90.417 ±    1.059  ns/op
ArrayClone.byteClone          10  avgt   15    1634.691 ±    9.870  ns/op
ArrayClone.byteClone         100  avgt   15   18637.149 ±   30.985  ns/op
ArrayClone.byteClone        1000  avgt   15  193437.253 ±  435.771  ns/op
ArrayClone.intArraycopy        0  avgt   15      50.475 ±    0.545  ns/op
ArrayClone.intArraycopy       10  avgt   15      77.515 ±    0.958  ns/op
ArrayClone.intArraycopy      100  avgt   15     264.586 ±    0.237  ns/op
ArrayClone.intArraycopy     1000  avgt   15    1160.459 ±    1.394  ns/op
ArrayClone.intClone            0  avgt   15      90.776 ±    0.309  ns/op
ArrayClone.intClone           10  avgt   15    7794.589 ±   13.752  ns/op
ArrayClone.intClone          100  avgt   15   77303.097 ±  154.991  ns/op
ArrayClone.intClone         1000  avgt   15  773291.729 ± 1505.788  ns/op


After:

Without COH:

Benchmark                 (size)  Mode  Cnt     Score   Error  Units
ArrayClone.byteArraycopy       0  avgt   15    49.421 ± 0.588  ns/op
ArrayClone.byteArraycopy      10  avgt   15    71.687 ± 0.828  ns/op
ArrayClone.byteArraycopy     100  avgt   15    82.570 ± 0.068  ns/op
ArrayClone.byteArraycopy    1000  avgt   15   478.411 ± 0.505  ns/op
ArrayClone.byteClone           0  avgt   15    90.660 ± 0.314  ns/op
ArrayClone.byteClone          10  avgt   15   131.243 ± 0.407  ns/op
ArrayClone.byteClone         100  avgt   15   251.823 ± 0.192  ns/op
ArrayClone.byteClone        1000  avgt   15   404.857 ± 1.985  ns/op
ArrayClone.intArraycopy        0  avgt   15    49.672 ± 0.466  ns/op
ArrayClone.intArraycopy       10  avgt   15    78.996 ± 1.522  ns/op
ArrayClone.intArraycopy      100  avgt   15   263.690 ± 0.175  ns/op
ArrayClone.intArraycopy     1000  avgt   15  1155.155 ± 2.549  ns/op
ArrayClone.intClone            0  avgt   15    90.495 ± 0.296  ns/op
ArrayClone.intClone           10  avgt   15   184.500 ± 0.554  ns/op
ArrayClone.intClone          100  avgt   15   294.608 ± 0.139  ns/op
ArrayClone.intClone         1000  avgt   15   817.005 ± 0.551  ns/op

-------------------------------------------------------------------------
With COH:

Benchmark                 (size)  Mode  Cnt     Score   Error  Units
ArrayClone.byteArraycopy       0  avgt   15    51.322 ± 0.519  ns/op
ArrayClone.byteArraycopy      10  avgt   15    76.479 ± 0.679  ns/op
ArrayClone.byteArraycopy     100  avgt   15    82.936 ± 0.060  ns/op
ArrayClone.byteArraycopy    1000  avgt   15   487.030 ± 0.464  ns/op
ArrayClone.byteClone           0  avgt   15    89.688 ± 0.276  ns/op
ArrayClone.byteClone          10  avgt   15   109.446 ± 0.379  ns/op
ArrayClone.byteClone         100  avgt   15   221.747 ± 0.176  ns/op
ArrayClone.byteClone        1000  avgt   15   430.846 ± 0.370  ns/op
ArrayClone.intArraycopy        0  avgt   15    50.534 ± 0.524  ns/op
ArrayClone.intArraycopy       10  avgt   15    78.986 ± 1.341  ns/op
ArrayClone.intArraycopy      100  avgt   15   263.473 ± 0.168  ns/op
ArrayClone.intArraycopy     1000  avgt   15  1155.394 ± 1.396  ns/op
ArrayClone.intClone            0  avgt   15    89.698 ± 0.217  ns/op
ArrayClone.intClone           10  avgt   15   185.278 ± 0.673  ns/op
ArrayClone.intClone          100  avgt   15   375.374 ± 0.200  ns/op
ArrayClone.intClone         1000  avgt   15   872.398 ± 1.780  ns/op
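
To put the COH clone numbers in proportion, the ratio between the "Before" and "After" scores can be computed directly from the tables. The sketch below is only arithmetic on scores copied from the tables above; the class and method names are illustrative.

```java
public final class CloneRegressionRatio {
    /** Returns how many times slower the "before" score is than the "after" score. */
    public static double slowdown(double beforeNsPerOp, double afterNsPerOp) {
        if (afterNsPerOp <= 0.0) {
            throw new IllegalArgumentException("afterNsPerOp must be positive");
        }
        return beforeNsPerOp / afterNsPerOp;
    }

    public static void main(String[] args) {
        // Scores (ns/op) for size 1000 with COH, taken from the Before and After tables.
        System.out.printf("byteClone 1000: %.0fx%n", slowdown(193437.253, 430.846));
        System.out.printf("intClone  1000: %.0fx%n", slowdown(773291.729, 872.398));
    }
}
```

For size 1000 with COH this works out to roughly 449x for `byteClone` and roughly 886x for `intClone`, while the arraycopy benchmarks are essentially unchanged.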

-------------

Commit messages:
 - riscv: fix c1 primitive array clone intrinsic regression

Changes: https://git.openjdk.org/jdk/pull/25976/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25976&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8360520
  Stats: 3 lines in 3 files changed: 0 ins; 0 del; 3 mod
  Patch: https://git.openjdk.org/jdk/pull/25976.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/25976/head:pull/25976

PR: https://git.openjdk.org/jdk/pull/25976

