RFR: 8318723: RISC-V: C2 UDivL
Ludovic Henry
luhenry at openjdk.org
Wed Oct 25 13:17:35 UTC 2023
On Wed, 25 Oct 2023 09:13:59 GMT, Hamlin Li <mli at openjdk.org> wrote:
>> For the algorithm details, check `j.l.Long::divideUnsigned` in the JDK library source; it describes this algorithm, and I also point to it in this patch.
>>
>> It's not related to the difference between negative and positive test cases; it's related to the cost of the divxx instructions. Compared to lines 2440 ~ 2443 in src/hotspot/cpu/riscv/macroAssembler_riscv.cpp, the divu cost for a negative value is still very high.
>>
>>
>> int_def ALU_COST ( 100, 1 * DEFAULT_COST);
>> int_def BRANCH_COST ( 200, 2 * DEFAULT_COST);
>> int_def IDIVDI_COST ( 6600, 66 * DEFAULT_COST);
>>
>>
>> I have also re-run the benchmark with more warmup (5) and iterations (10); please check the data in the PR description.
>> I have also attached the diff between the v1 and v2 intrinsics. v2 is this patch. v1 is a diff on top of v2 that just uses the RISC-V divxx instruction directly, without the optimization for negative values brought by the algorithm (i.e. without the bltz and the related code).
>
> I don't know why the previous JMH data has no `error` part; maybe the error was too low to show.
IIUC, with the branch, the results are `6376.674 ± 16.869 ns/op`, and without the branch, they are `29518.033 ± 49.056 ns/op`, correct? If so, the branch makes more sense, at least on the board you've tested.
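
For reference, here is a minimal Java sketch of the shortcut being discussed, as I understand it from `j.l.Long::divideUnsigned`: when the divisor is negative as a signed value, its unsigned value is at least 2^63, so the unsigned quotient can only be 0 or 1 and no divide instruction is needed. The class and method names (`UDivSketch`, `udiv`) are hypothetical and exist only for illustration; this is not the code in the patch, and the slow path simply delegates to the library method rather than modelling `divu`.

```java
final class UDivSketch {
    // Hypothetical helper for illustration only; not the intrinsic from the patch.
    static long udiv(long dividend, long divisor) {
        if (divisor < 0) {
            // Fast path (the role played by the bltz branch): the divisor's top
            // bit is set, so as an unsigned value it is >= 2^63 and the quotient
            // is 1 exactly when dividend >= divisor (unsigned), otherwise 0.
            // The sign bit of (dividend & ~(dividend - divisor)) encodes that.
            return (dividend & ~(dividend - divisor)) >>> 63;
        }
        // Slow path: stand-in for the hardware divide; delegates to the library.
        return Long.divideUnsigned(dividend, divisor);
    }

    public static void main(String[] args) {
        long[] samples = {0L, 1L, 42L, -1L, -2L, Long.MIN_VALUE, Long.MAX_VALUE};
        for (long a : samples) {
            for (long d : samples) {
                if (d == 0) {
                    continue; // division by zero is out of scope for this sketch
                }
                if (udiv(a, d) != Long.divideUnsigned(a, d)) {
                    throw new AssertionError("mismatch for " + a + " / " + d);
                }
            }
        }
        System.out.println("all sample pairs agree with Long.divideUnsigned");
    }
}
```

The fast path costs a compare-and-branch plus a handful of ALU operations, which is the trade-off the cost table quoted above is meant to capture against `IDIVDI_COST`.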
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/16346#discussion_r1371743833