RFR: 8350095: RISC-V: Refactor string_compare [v2]

Wed Feb 19 02:09:58 UTC 2025

On Tue, 18 Feb 2025 11:15:48 GMT, Hamlin Li <mli at openjdk.org> wrote:

>> Hi,
>> Can you help to review this patch?
>> 
>> Currently, `string_compare` code is a bit complicated, main reasons include:
>> 1. it has 2 piece of code respectively for LU and UL case, this is not necessary, basically LU and UL behaviour quite similar.
>> 2. it mixed LL/UU and LU/UL case together, better to separate them, as they are quite different from each other.
>> 
>> This is not good for code reading and maintaining.
>> 
>> 
>> So, this patch does following refactoring:
>> 1. merge LU and UL code into one, i.e. remove UL code.
>> 2. seperate the code into 2 methods:  LL/UU and LU/UL.
>> 3. some other misc improvement.
>> 
>> I could do the following refactoring in another following pr, as in this pr I'm just moving code and removing code, it's easier to do it and review it. In particular the first one, as it needs to rewrite the existing code for UL/LU case.
>> 1. move alignment code of `generate_compare_long_string_different_encoding` upwards to `string_compare_long_LU`.
>> 2. make `SHORT_STRING` case simpler.
>> 
>> 
>> 
>> Thanks
>
> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision:
> 
>   fix temp registers; move code

src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1386:

> 1384: 
> 1385: // Compare longwords
> 1386: void C2_MacroAssembler::string_compare_long_LL_UU(Register result, Register str1, Register str2,

Do you mind renaming this to `C2_MacroAssembler::string_compare_long_same_encoding`?

src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1452:

> 1450:   beq(tmp1, tmp2, *DONE);
> 1451: 
> 1452: 

A single empty line will do I think.

src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1474:

> 1472: 
> 1473: // Compare longwords
> 1474: void C2_MacroAssembler::string_compare_long_LU(Register result, Register strL, Register strU,

And rename this to `C2_MacroAssembler::string_compare_long_different_encoding`. We can pass one extra param (say `const bool isLU`) to distinguish the two different cases. Also I think we need to pass the `str1` and `str2` from the callsite directly as the final difference calculation needs to repect the order. The current approach doesn't seem correct: it can only distinguish L and U from the two strings, but it doesn't know the order of the two strings at all.

Java program that hopefully helps demo the effect of the order of the two strings:

String author = "author";
String book = "book";
String duplicateBook = "book";

assertThat(author.compareTo(book))
  .isEqualTo(-1);
assertThat(book.compareTo(author))
  .isEqualTo(1);
assertThat(duplicateBook.compareTo(book))
  .isEqualTo(0);

src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1522:

> 1520:   beq(tmpL, tmpU, *DONE);
> 1521: 
> 1522: 

Similar here. Let's keep a single empty line.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23633#discussion_r1960822148
PR Review Comment: https://git.openjdk.org/jdk/pull/23633#discussion_r1960821131
PR Review Comment: https://git.openjdk.org/jdk/pull/23633#discussion_r1960826250
PR Review Comment: https://git.openjdk.org/jdk/pull/23633#discussion_r1960821485