RFR: 8291550: RISC-V: jdk uses misaligned memory access when AvoidUnalignedAccess enabled [v14]

Feilong Jiang fjiang at openjdk.org
Fri May 12 09:06:52 UTC 2023


On Thu, 11 May 2023 16:41:19 GMT, Vladimir Kempik <vkempik at openjdk.org> wrote:

>> Please review this attempt to remove misaligned loads and stores in the RISC-V-specific part of the JDK.
>> 
>> The patch has two main parts:
>>  - opcode loads/stores now use put_native_uX/get_native_uX
>>  - some code in the template interpreter was changed to prevent misaligned loads
>>  
>> perf stat numbers for trp_lam (misaligned loads) and trp_sam (misaligned stores) before the patch:
>> 
>>  169598      trp_lam                                          
>>   13562      trp_sam  
>> 
>> 
>> After the patch both numbers are zero.
>> I see the template interpreter running ~40% faster on hifive unmatched (1 repetition of renaissance philosophers in -Xint mode), and the same performance before and after the patch on thead rvb-ice (which supports misaligned loads/stores in hw).
>> 
>> tier testing on hw is in progress
>
> Vladimir Kempik has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Create load_long_misaligned and start using it

Some comments for new changes.

src/hotspot/cpu/riscv/interp_masm_riscv.cpp line 188:

> 186:     lhu(reg, Address(xbcp, bcp_offset));
> 187:   }
> 188:   revb_h(reg, reg);

Similar to `sipush`, `revb_h` is not needed on the misaligned path: the two byte loads already assemble the value in big-endian order. And since we only load a short here, `revb_h_h_u` looks sufficient for the aligned path.
Suggestion:

  if (AvoidUnalignedAccesses && (bcp_offset % 2)) {
    lbu(t1, Address(xbcp, bcp_offset));
    lbu(reg, Address(xbcp, bcp_offset + 1));
    slli(t1, t1, 8);
    add(reg, reg, t1);
  } else {
    lhu(reg, Address(xbcp, bcp_offset));
    revb_h_h_u(reg, reg);
  }
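
To illustrate why the byte-wise path needs no byte reversal, here is a small portable C++ model of the two paths. It is only a sketch: the function names are made up for illustration and are not part of HotSpot, and the `lhu` load is modelled as an explicit little-endian halfword load.

```cpp
#include <cstdint>

// Hypothetical model of the misaligned path (lbu, lbu, slli, add).
// The bytecode stream is big-endian, so the byte at the lower address
// is the high half of the result; no byte swap is needed afterwards.
inline uint16_t load_u2_bytewise(const uint8_t* bcp, int offset) {
  uint32_t hi = bcp[offset];       // lbu t1, offset(xbcp)
  uint32_t lo = bcp[offset + 1];   // lbu reg, offset+1(xbcp)
  return static_cast<uint16_t>((hi << 8) + lo);  // slli + add
}

// Hypothetical model of the aligned path (lhu followed by a 16-bit swap).
// The load is written out as little-endian to mirror RISC-V, whatever
// the host byte order is.
inline uint16_t load_u2_halfword_swap(const uint8_t* bcp, int offset) {
  uint16_t raw = static_cast<uint16_t>(bcp[offset] | (bcp[offset + 1] << 8));
  // Swap the two bytes of the halfword and keep the result zero-extended.
  return static_cast<uint16_t>((raw << 8) | (raw >> 8));
}
```

Both functions return the same value for the same bytes, e.g. 0x1234 for the byte sequence 0x12, 0x34, which is the property the suggestion relies on.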

src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 1693:

> 1691: }
> 1692: 
> 1693: void MacroAssembler::load_int_misaligned(Register dst, Address src, Register tmp, bool is_signed) {

`load_long_misaligned` provides a `granularity` argument; maybe we could add one to `load_int_misaligned` too? With a granularity of 2, we can load an int with just two `lh`s.
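
A rough C++ model of what I have in mind, assuming a little-endian native int and a `granularity` that names the largest alignment the address is known to have. The helper names and the signature are hypothetical and do not match the actual MacroAssembler code.

```cpp
#include <cstdint>

// Hypothetical model of a little-endian halfword load from a
// 2-byte-aligned address.
inline uint32_t model_lhu(const uint8_t* p) {
  return static_cast<uint32_t>(p[0]) | (static_cast<uint32_t>(p[1]) << 8);
}

// Hypothetical model of an int load with a granularity hint.
// granularity == 2: two halfword loads; otherwise: four byte loads.
inline uint32_t load_u4_with_granularity(const uint8_t* p, int granularity) {
  if (granularity == 2) {
    uint32_t lo = model_lhu(p);       // low half at the lower address
    uint32_t hi = model_lhu(p + 2);   // high half at the higher address
    return lo | (hi << 16);
  }
  // Byte-wise fallback: build the value from the highest byte downwards.
  uint32_t v = 0;
  for (int i = 3; i >= 0; --i) {
    v = (v << 8) | p[i];
  }
  return v;
}
```

The halfword variant does two loads instead of four and drops the extra shifts, which is the saving I would expect from the real code as well.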

-------------

PR Review: https://git.openjdk.org/jdk/pull/13645#pullrequestreview-1424077484
PR Review Comment: https://git.openjdk.org/jdk/pull/13645#discussion_r1192106814
PR Review Comment: https://git.openjdk.org/jdk/pull/13645#discussion_r1192107439

