RFR: 8296602: RISC-V: improve performance of copy_memory stub [v2]

Fei Yang fyang at openjdk.org
Thu Nov 17 07:34:31 UTC 2022


On Wed, 9 Nov 2022 11:34:43 GMT, Vladimir Kempik <vkempik at openjdk.org> wrote:

>> Please review this change to improve the performance of copy_memory stub on risc-v
>> 
>> This change has three parts
>> 1) use copy32 if possible to do 4 ld and 4 st per loop cycle
>> 2) don't produce precopy code if is_aligned is true, it's not executed.
>> 3) in the end of loop8 and loop32, remove data dependency between two addi opcodes, to allow them to be scheduled simultaneously
>> 
>> testing: org.openjdk.bench.vm.compiler.ArrayCopyObject, hotspot_compiler_arraycopy, hotspot:tier1, hotspot:tier2 - all ok
>> hotspot:tier2 is on the way.
>> 
>> and for the benchmark results, using 
>> org.openjdk.bench.vm.compiler.ArrayCopyObject.conjoint_micro
>> 
>> thead rvb-ice c910
>> thead
>> 
>> Before ( copy8 only )
>> Benchmark            	   (size) Mode Cnt  Score  Error  Units
>> ArrayCopyObject.conjoint_micro    31 thrpt 25 6653.095 ± 251.565 ops/ms
>> ArrayCopyObject.conjoint_micro    63 thrpt 25 4933.970 ± 77.559 ops/ms
>> ArrayCopyObject.conjoint_micro   127 thrpt 25 3627.454 ± 34.589 ops/ms
>> ArrayCopyObject.conjoint_micro  2047 thrpt 25 368.249 ± 0.453  ops/ms
>> ArrayCopyObject.conjoint_micro  4095 thrpt 25 187.776 ± 0.306  ops/ms
>> ArrayCopyObject.conjoint_micro  8191 thrpt 25  94.477 ± 0.340   ops/ms
>> 
>> after ( with copy32 )
>> ArrayCopyObject.conjoint_micro    31 thrpt 25 7620.546 ± 69.756 ops/ms
>> ArrayCopyObject.conjoint_micro    63 thrpt 25 6677.978 ± 33.112 ops/ms
>> ArrayCopyObject.conjoint_micro   127 thrpt 25 5206.973 ± 22.612 ops/ms
>> ArrayCopyObject.conjoint_micro  2047 thrpt 25 653.655 ± 31.494 ops/ms
>> ArrayCopyObject.conjoint_micro  4095 thrpt 25 352.905 ± 7.390  ops/ms
>> ArrayCopyObject.conjoint_micro  8191 thrpt 25 165.127 ± 0.832  ops/ms 
>> 
>> after ( copy32 with dead code elimination and independent addis )
>> ArrayCopyObject.conjoint_micro      31  thrpt   25  7576.346 ?  94.487  ops/ms
>> ArrayCopyObject.conjoint_micro      63  thrpt   25  6475.730 ? 252.590  ops/ms
>> ArrayCopyObject.conjoint_micro     127  thrpt   25  5221.764 ?  20.415  ops/ms
>> ArrayCopyObject.conjoint_micro    2047  thrpt   25   691.847 ?   1.102  ops/ms
>> ArrayCopyObject.conjoint_micro    4095  thrpt   25   360.269 ?   1.091  ops/ms
>> ArrayCopyObject.conjoint_micro    8191  thrpt   25   179.733 ?   3.012  ops/ms
>> 
>> on hifive unmatched:
>> 
>> before:
>> Benchmark                       (size)   Mode  Cnt     Score     Error   Units
>> ArrayCopyObject.conjoint_micro      31  thrpt   25  5391.575 ± 152.984  ops/ms
>> ArrayCopyObject.conjoint_micro      63  thrpt   25  3700.946 ±  43.175  ops/ms
>> ArrayCopyObject.conjoint_micro     127  thrpt   25  2316.160 ±  24.734  ops/ms
>> ArrayCopyObject.conjoint_micro    2047  thrpt   25   188.616 ±   0.151  ops/ms
>> ArrayCopyObject.conjoint_micro    4095  thrpt   25    95.323 ±   0.053  ops/ms
>> ArrayCopyObject.conjoint_micro    8191  thrpt   25    46.935 ±   0.041  ops/ms
>> 
>> after:
>> Benchmark                       (size)   Mode  Cnt     Score     Error   Units
>> ArrayCopyObject.conjoint_micro      31  thrpt   25  6136.169 ± 330.409  ops/ms
>> ArrayCopyObject.conjoint_micro      63  thrpt   25  4924.020 ±  78.529  ops/ms
>> ArrayCopyObject.conjoint_micro     127  thrpt   25  3732.561 ±  89.606  ops/ms
>> ArrayCopyObject.conjoint_micro    2047  thrpt   25   431.103 ±   0.505  ops/ms
>> ArrayCopyObject.conjoint_micro    4095  thrpt   25   221.543 ±   0.363  ops/ms
>> ArrayCopyObject.conjoint_micro    8191  thrpt   25   100.586 ±   0.197  ops/ms
>
> Vladimir Kempik has updated the pull request incrementally with one additional commit since the last revision:
> 
>   remove excessive comments

Nice numbers! Overall looks good to me. Several minor nits.

src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 902:

> 900:    * if ((dst % 8) == (src % 8)) {
> 901:    *   aligned;
> 902:    *   goto copy_big;

You might want to update code comment at line 884 too.

src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 1042:

> 1040:     __ bind(copy32);
> 1041:     if (is_backwards) {
> 1042:       __ addi(src, src, -wordSize*4);

Can you leave a space before and after the operator here and other places?

src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 1059:

> 1057:       __ addi(dst, dst, wordSize*4);
> 1058:     }
> 1059:     __ addi(tmp4, cnt, -(32+wordSize*4));

Can we use 'tmp' instead of 'tmp4' here? Then it will be consistent in register usage with other places where we check 'cnt'.

src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 1065:

> 1063:     __ beqz(cnt, done); // if that's all - done
> 1064: 
> 1065:     __ addi(tmp4, cnt, -8); // if not - copy the reminder

Similar here.

src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 1079:

> 1077:       __ addi(dst, dst, wordSize);
> 1078:     }
> 1079:     __ addi(tmp4, cnt, -(8+wordSize));

Similar here.

-------------

PR: https://git.openjdk.org/jdk/pull/11058


More information about the hotspot-compiler-dev mailing list