RFR: 8360520: RISC-V: C1: Fix primitive array clone intrinsic regression after JDK-8333154 [v4]

Dean Long dlong at openjdk.org
Thu Jul 10 22:45:42 UTC 2025


On Wed, 9 Jul 2025 10:07:31 GMT, Feilong Jiang <fjiang at openjdk.org> wrote:

>> Hi, please consider.
>> [JDK-8333154](https://bugs.openjdk.org/browse/JDK-8333154) Implemented C1 clone intrinsic that reuses arraycopy code for primitive arrays for RISC-V.
>> The new instruction flag `OmitChecksFlag` (introduced by [JDK-8302850](https://bugs.openjdk.org/browse/JDK-8302850)) is used to avoid instantiation of array copy stubs for primitive array clones.
>> If `OmitChecksFlag` is set, all flags (including the `unaligned` flag) will be cleared before generating the `LIR_OpArrayCopy` node.
>> This may lead to incorrect selection of the arraycopy function when `-XX:+UseCompactObjectHeaders` is enabled, causing the `unaligned` flag to be set for arraycopy.
>> We observed performance regression on P550 SBC through the corresponding JMH tests when COH is enabled.
>> 
>> This pr keeps the `unaligned` flag on RISC-V to ensure the arraycopy function is selected correctly. 
>> The other platforms are not affected as the flag is always `0` when `OmitChecksFlag` is true. 
>> 
>> Test on linux-riscv64:
>> - [x] Tier1-3
>> 
>> JMH data on P550 SBC for reference (w/o and w/ the patch):
>> 
>> Before:
>> 
>> Without COH:
>> 
>> Benchmark                 (size)  Mode  Cnt     Score   Error  Units
>> ArrayClone.byteArraycopy       0  avgt   15    50.854 ± 0.379  ns/op
>> ArrayClone.byteArraycopy      10  avgt   15    74.294 ± 0.449  ns/op
>> ArrayClone.byteArraycopy     100  avgt   15    81.847 ± 0.082  ns/op
>> ArrayClone.byteArraycopy    1000  avgt   15   480.106 ± 0.369  ns/op
>> ArrayClone.byteClone           0  avgt   15    90.146 ± 0.299  ns/op
>> ArrayClone.byteClone          10  avgt   15   130.525 ± 0.384  ns/op
>> ArrayClone.byteClone         100  avgt   15   251.942 ± 0.122  ns/op
>> ArrayClone.byteClone        1000  avgt   15   407.580 ± 0.318  ns/op
>> ArrayClone.intArraycopy        0  avgt   15    49.984 ± 0.436  ns/op
>> ArrayClone.intArraycopy       10  avgt   15    76.302 ± 1.388  ns/op
>> ArrayClone.intArraycopy      100  avgt   15   267.487 ± 0.329  ns/op
>> ArrayClone.intArraycopy     1000  avgt   15  1157.444 ± 1.588  ns/op
>> ArrayClone.intClone            0  avgt   15    90.130 ± 0.257  ns/op
>> ArrayClone.intClone           10  avgt   15   183.619 ± 0.588  ns/op
>> ArrayClone.intClone          100  avgt   15   296.491 ± 0.246  ns/op
>> ArrayClone.intClone         1000  avgt   15   828.695 ± 1.501  ns/op
>> 
>> -------------------------------------------------------------------------
>> With COH:
>> 
>> Benchmark                 (size)  Mode  Cnt       Score      Error  Un...
>
> Feilong Jiang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision:
> 
>  - Merge branch 'master' of https://github.com/openjdk/jdk into riscv-fix-c1-primitive-clone
>  - Merge branch 'master' of https://github.com/openjdk/jdk into riscv-fix-c1-primitive-clone
>  - Revert RISCV Macro modification
>  - Merge branch 'master' of https://github.com/openjdk/jdk into riscv-fix-c1-primitive-clone
>  - check unaligned flag at LIR_OpArrayCopy to avoid using AvoidUnalignedAccesses
>  - riscv: fix c1 primitive array clone intrinsic regression

src/hotspot/cpu/riscv/c1_LIRGenerator_riscv.cpp line 775:

> 773:   arraycopy_helper(x, &flags, &expected_type);
> 774:   if (x->check_flag(Instruction::OmitChecksFlag)) {
> 775:     flags = (flags & LIR_OpArrayCopy::unaligned);

Should be LIR_OpArrayCopy::unaligned|LIR_OpArrayCopy::overlapping? See below.

src/hotspot/share/c1/c1_LIR.cpp line 353:

> 351:   , _expected_type(expected_type)
> 352:   , _flags(flags) {
> 353: #if defined(X86) || defined(AARCH64) || defined(S390) || defined(RISCV64) || defined(PPC64)

Do we still need this #if?  It would be nice if we can eventually remove it, but I guess arm32 support is missing.

src/hotspot/share/c1/c1_LIR.cpp line 354:

> 352:   , _flags(flags) {
> 353: #if defined(X86) || defined(AARCH64) || defined(S390) || defined(RISCV64) || defined(PPC64)
> 354:   if (expected_type != nullptr && ((flags & ~LIR_OpArrayCopy::unaligned) == 0)) {

I was concerned that this is platform-specific, but I checked and all platforms can handle unaligned or overlapping w/o using the stub.  So maybe this should be using LIR_OpArrayCopy::unaligned|LIR_OpArrayCopy::overlapping?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25976#discussion_r2198916477
PR Review Comment: https://git.openjdk.org/jdk/pull/25976#discussion_r2198915263
PR Review Comment: https://git.openjdk.org/jdk/pull/25976#discussion_r2198911343


More information about the hotspot-compiler-dev mailing list