RFR: 8318723: RISC-V: C2 UDivL

Wed Oct 25 09:16:38 UTC 2023

On Wed, 25 Oct 2023 07:34:25 GMT, Ludovic Henry <luhenry at openjdk.org> wrote:

>> src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 2436:
>> 
>>> 2434:     } else {
>>> 2435:       Label Lltz, Ldone;
>>> 2436:       bltz(rs2, Lltz);
>> 
>> I am not quite sure what this `bltz` branch is for. Is this a minor performance tuning here? And how would this make a difference then, if that's true? I didn't see much difference from the LongDivMod.testDivideUnsigned `negative` jmh test result.
>
> +1. It's also the only test case where there is a regression on the JMH numbers, or at least not a clear improvement (before: 6385.280, after: 6433.223)
> 
> On your JMH numbers, how many iterations have you run for each benchmark? I don't see the standard deviation which would be useful to better understand noise.

For the algorithm details, check `j.l.Long::divideUnsigned` in the JDK library source; it describes this algorithm, and I also pointed to it in this patch.
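For readers without the source at hand, here is a minimal Java sketch of the negative-divisor shortcut as I understand it from the library code. It is not the code in this patch, and the class and method names are hypothetical. A divisor with the sign bit set is at least 2^63 when read as unsigned, so the quotient can only be 0 or 1, and it can be computed with a subtract, a not, an and, and a shift instead of a divide.

```java
public final class UnsignedDivSketch {
    // Hypothetical helper for illustration only; not the intrinsic from this patch.
    public static long divideUnsignedSketch(long dividend, long divisor) {
        if (divisor < 0) {
            // Divisor >= 2^63 as unsigned, so the quotient is 0 or 1.
            // The sign bit of (dividend & ~(dividend - divisor)) is set exactly
            // when dividend >= divisor in the unsigned sense.
            return (dividend & ~(dividend - divisor)) >>> (Long.SIZE - 1);
        }
        // Non-negative divisor: defer to the library for the general case.
        return Long.divideUnsigned(dividend, divisor);
    }

    public static void main(String[] args) {
        long[][] cases = {
            {-1L, -2L}, {-2L, -1L}, {5L, -1L},
            {Long.MIN_VALUE, Long.MIN_VALUE}, {-1L, Long.MIN_VALUE}
        };
        for (long[] c : cases) {
            // Print the sketch result next to the library result for comparison.
            System.out.println(Long.toUnsignedString(c[0]) + " / "
                + Long.toUnsignedString(c[1]) + " = "
                + divideUnsignedSketch(c[0], c[1]) + " (library: "
                + Long.divideUnsigned(c[0], c[1]) + ")");
        }
    }
}
```

The `divisor < 0` test plays the role that the `bltz` branch plays in the assembler code: it routes negative divisors away from the divide instruction.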

It's not related to the difference between the negative and positive test cases; it's related to the cost of the divxx instructions. Compared to lines 2440 ~ 2443 in src/hotspot/cpu/riscv/macroAssembler_riscv.cpp, the cost of divu for a negative value is still very high.

int_def ALU_COST             (  100,  1 * DEFAULT_COST);
int_def BRANCH_COST          (  200,  2 * DEFAULT_COST);
int_def IDIVDI_COST          ( 6600, 66 * DEFAULT_COST);
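As a rough back-of-the-envelope check, the cost definitions above can be plugged into a small Java sketch. This assumes the shortcut path is the `bltz` branch plus four ALU instructions (one per line in 2440 ~ 2443), which is my reading rather than something stated by the cost model. The values are the nominal cost-model units, not measured cycles, and the class name is hypothetical.

```java
public final class DivCostSketch {
    // Values copied from the int_def lines above; DEFAULT_COST is 100 there.
    public static final int DEFAULT_COST = 100;
    public static final int ALU_COST = 1 * DEFAULT_COST;
    public static final int BRANCH_COST = 2 * DEFAULT_COST;
    public static final int IDIVDI_COST = 66 * DEFAULT_COST;

    // Assumed shape of the shortcut path: one branch plus N ALU instructions.
    public static int shortcutPathCost(int aluInstructions) {
        return BRANCH_COST + aluInstructions * ALU_COST;
    }

    public static void main(String[] args) {
        // Assumption: four ALU instructions on the negative-divisor path.
        System.out.println("shortcut path (nominal): " + shortcutPathCost(4));
        System.out.println("divide alone (nominal):  " + IDIVDI_COST);
    }
}
```

Under these assumptions the shortcut path is nominally 600 units against 6600 for the divide alone, which is the gap the branch is meant to exploit.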

I have also re-run the benchmark with more warmup iterations (5) and measurement iterations (10); please check the data in the PR description.
I have also attached the diff between the v1 and v2 intrinsics. v2 is this patch. v1 is a diff based on v2: it just uses the RISC-V divxx instructions directly, without the optimization for negative values brought by the algorithm (i.e. without the bltz and the other related code).

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16346#discussion_r1371415242