RFR: 8350095: RISC-V: Refactor string_compare [v3]

Mon Feb 24 03:34:57 UTC 2025

On Fri, 21 Feb 2025 14:48:12 GMT, Hamlin Li <mli at openjdk.org> wrote:

>> Hi,
>> Can you help to review this patch?
>> 
>> Currently, `string_compare` code is a bit complicated, main reasons include:
>> 1. it has 2 piece of code respectively for LU and UL case, this is not necessary, basically LU and UL behaviour quite similar.
>> 2. it mixed LL/UU and LU/UL case together, better to separate them, as they are quite different from each other.
>> 
>> This is not good for code reading and maintaining.
>> 
>> 
>> So, this patch does following refactoring:
>> 1. merge LU and UL code into one, i.e. remove UL code.
>> 2. seperate the code into 2 methods:  LL/UU and LU/UL.
>> 3. some other misc improvement.
>> 
>> I could do the following refactoring in another following pr, as in this pr I'm just moving code and removing code, it's easier to do it and review it. In particular the first one, as it needs to rewrite the existing code for UL/LU case.
>> 1. move alignment code of `generate_compare_long_string_different_encoding` upwards to `string_compare_long_LU`.
>> 2. make `SHORT_STRING` case simpler.
>> 
>> 
>> 
>> Thanks
>
> Hamlin Li has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 11 additional commits since the last revision:
> 
>  - fix UL and test
>  - Merge branch 'master' into refactor-string-compare
>  - minor
>  - fix temp registers; move code
>  - blank lines
>  - simplify
>  - clean
>  - merge UL and LU
>  - move to functions
>  - move alignment code of LL&UU down from common code path
>  - ... and 1 more: https://git.openjdk.org/jdk/compare/70ff57cd...4f5ae272

Thanks for the update. A couple of comments remain.

src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1605:

> 1603:       string_compare_long_same_encoding(result, str1, str2,
> 1604:                                 cnt1, cnt2, tmp1, tmp2, tmp3,
> 1605:                                 isLL, isLL ? base_offset1 : base_offset2, minCharsInWord,

`base_offset` is only used by the two new assembler subroutines, so it's more reasonable to calculate there. And `minCharsInWord` is simple and I think this param could also be saved. This way the argument list will be shorter.

src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1611:

> 1609:                              isLU ? str1 : str2,
> 1610:                              isLU ? str2 : str1,
> 1611:                              isUL,

Both `isLU` and `isUL` are used to prepare the params. Personally, I prefer to keep it simple and only use `isLU` here. Using `isLU` will also be kind of consistent with the order of the second and third params of this subroutine (renamed param `isUL` to `IsLU`):

void C2_MacroAssembler::string_compare_long_different_encoding(Register result, Register strL, Register strU,
                                               bool isLU, Register cnt1, Register cnt2,

src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1618:

> 1616:   }
> 1617: 
> 1618:   bind(STUB);

Consider moving this STUB generation code to the new subroutines at the same time.

-------------

PR Review: https://git.openjdk.org/jdk/pull/23633#pullrequestreview-2635951599
PR Review Comment: https://git.openjdk.org/jdk/pull/23633#discussion_r1967002792
PR Review Comment: https://git.openjdk.org/jdk/pull/23633#discussion_r1967004147
PR Review Comment: https://git.openjdk.org/jdk/pull/23633#discussion_r1967006761