RFR: 8350095: RISC-V: Refactor string_compare [v7]

Fri Feb 28 01:27:05 UTC 2025

On Wed, 26 Feb 2025 12:00:47 GMT, Hamlin Li <mli at openjdk.org> wrote:

>> Hi,
>> Can you help to review this patch?
>> 
>> Currently, `string_compare` code is a bit complicated, main reasons include:
>> 1. it has 2 piece of code respectively for LU and UL case, this is not necessary, basically LU and UL behaviour quite similar.
>> 2. it mixed LL/UU and LU/UL case together, better to separate them, as they are quite different from each other.
>> 
>> This is not good for code reading and maintaining.
>> 
>> 
>> So, this patch does following refactoring:
>> 1. merge LU and UL code into one, i.e. remove UL code.
>> 2. seperate the code into 2 methods:  LL/UU and LU/UL.
>> 3. some other misc improvement.
>> 
>> I could do the following refactoring in another following pr, as in this pr I'm just moving code and removing code, it's easier to do it and review it. In particular the first one, as it needs to rewrite the existing code for UL/LU case.
>> 1. move alignment code of `generate_compare_long_string_different_encoding` upwards to `string_compare_long_LU`.
>> 2. make `SHORT_STRING` case simpler.
>> 
>> 
>> 
>> Thanks
>
> Hamlin Li has updated the pull request incrementally with two additional commits since the last revision:
> 
>  - check short string
>  - rename

src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1393:

> 1391: 
> 1392:   const int base_offset = isLL ? arrayOopDesc::base_offset_in_bytes(T_BYTE)
> 1393:                                : arrayOopDesc::base_offset_in_bytes(T_CHAR);

Sorry, But I don't think we need to distinuish `T_BYTE` and `T_CHAR` here. The reason is that the character storage used for both Latin1 and UTF16 strings is always a byte array [1]. So we should always use `T_BYTE` for both cases.

[1] https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/String.java#L160

src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1487:

> 1485:   Label TAIL, NEXT_WORD, DIFFERENCE;
> 1486: 
> 1487:   const int base_offset = arrayOopDesc::base_offset_in_bytes(T_CHAR);

Similar here. Use `T_BYTE` instead of `T_CHAR`.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23633#discussion_r1974591975
PR Review Comment: https://git.openjdk.org/jdk/pull/23633#discussion_r1974592263