RFR: 8360520: RISC-V: C1: Fix primitive array clone intrinsic regression after JDK-8333154 [v5]

Galder Zamarreño galder at openjdk.org
Mon Jul 21 08:30:41 UTC 2025


On Fri, 11 Jul 2025 11:50:33 GMT, Feilong Jiang <fjiang at openjdk.org> wrote:

>> Hi, please consider.
>> [JDK-8333154](https://bugs.openjdk.org/browse/JDK-8333154) Implemented C1 clone intrinsic that reuses arraycopy code for primitive arrays for RISC-V.
>> The new instruction flag `OmitChecksFlag` (introduced by [JDK-8302850](https://bugs.openjdk.org/browse/JDK-8302850)) is used to avoid instantiation of array copy stubs for primitive array clones.
>> If `OmitChecksFlag` is set, all flags (including the `unaligned` flag) will be cleared before generating the `LIR_OpArrayCopy` node.
>> This may lead to incorrect selection of the arraycopy function when `-XX:+UseCompactObjectHeaders` is enabled, causing the `unaligned` flag to be set for arraycopy.
>> We observed performance regression on P550 SBC through the corresponding JMH tests when COH is enabled.
>> 
>> This pr keeps the `unaligned` flag on RISC-V to ensure the arraycopy function is selected correctly. 
>> The other platforms are not affected as the flag is always `0` when `OmitChecksFlag` is true. 
>> 
>> Test on linux-riscv64:
>> - [x] Tier1-3
>> 
>> JMH data on P550 SBC for reference (w/o and w/ the patch):
>> 
>> Before:
>> 
>> Without COH:
>> 
>> Benchmark                 (size)  Mode  Cnt     Score   Error  Units
>> ArrayClone.byteArraycopy       0  avgt   15    50.854 ± 0.379  ns/op
>> ArrayClone.byteArraycopy      10  avgt   15    74.294 ± 0.449  ns/op
>> ArrayClone.byteArraycopy     100  avgt   15    81.847 ± 0.082  ns/op
>> ArrayClone.byteArraycopy    1000  avgt   15   480.106 ± 0.369  ns/op
>> ArrayClone.byteClone           0  avgt   15    90.146 ± 0.299  ns/op
>> ArrayClone.byteClone          10  avgt   15   130.525 ± 0.384  ns/op
>> ArrayClone.byteClone         100  avgt   15   251.942 ± 0.122  ns/op
>> ArrayClone.byteClone        1000  avgt   15   407.580 ± 0.318  ns/op
>> ArrayClone.intArraycopy        0  avgt   15    49.984 ± 0.436  ns/op
>> ArrayClone.intArraycopy       10  avgt   15    76.302 ± 1.388  ns/op
>> ArrayClone.intArraycopy      100  avgt   15   267.487 ± 0.329  ns/op
>> ArrayClone.intArraycopy     1000  avgt   15  1157.444 ± 1.588  ns/op
>> ArrayClone.intClone            0  avgt   15    90.130 ± 0.257  ns/op
>> ArrayClone.intClone           10  avgt   15   183.619 ± 0.588  ns/op
>> ArrayClone.intClone          100  avgt   15   296.491 ± 0.246  ns/op
>> ArrayClone.intClone         1000  avgt   15   828.695 ± 1.501  ns/op
>> 
>> -------------------------------------------------------------------------
>> With COH:
>> 
>> Benchmark                 (size)  Mode  Cnt       Score      Error  Un...
>
> Feilong Jiang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision:
> 
>  - Merge branch 'master' of https://github.com/openjdk/jdk into riscv-fix-c1-primitive-clone
>  - also keep overlapping flag
>  - Merge branch 'master' of https://github.com/openjdk/jdk into riscv-fix-c1-primitive-clone
>  - Merge branch 'master' of https://github.com/openjdk/jdk into riscv-fix-c1-primitive-clone
>  - Revert RISCV Macro modification
>  - Merge branch 'master' of https://github.com/openjdk/jdk into riscv-fix-c1-primitive-clone
>  - check unaligned flag at LIR_OpArrayCopy to avoid using AvoidUnalignedAccesses
>  - riscv: fix c1 primitive array clone intrinsic regression

Changes requested by galder (Author).

Also, some comments on why these flags are needed when doing array copies would be good for future reference.

src/hotspot/cpu/riscv/c1_LIRGenerator_riscv.cpp line 775:

> 773:   arraycopy_helper(x, &flags, &expected_type);
> 774:   if (x->check_flag(Instruction::OmitChecksFlag)) {
> 775:     flags = (flags & (LIR_OpArrayCopy::unaligned | LIR_OpArrayCopy::overlapping));

The changes in the two files need to be in synch, so I wonder if `LIR_OpArrayCopy::unaligned | LIR_OpArrayCopy::overlapping` could be abstracted away within a function in `LIR_OpArrayCopy`.

So something like this (apologies for any syntactic/semantic errors):


flags = (flags & LIR_OpArrayGopy::get_array_copy_flags());


Then on the other method something like:


((flags & ~(LIR_OpArrayGopy::get_array_copy_flags())) == 0)


Function name is just an example, feel free to suggest some other if you think it fits better.

Thoughts?

-------------

PR Review: https://git.openjdk.org/jdk/pull/25976#pullrequestreview-3037040733
PR Comment: https://git.openjdk.org/jdk/pull/25976#issuecomment-3095714032
PR Review Comment: https://git.openjdk.org/jdk/pull/25976#discussion_r2218510943


More information about the hotspot-compiler-dev mailing list