RFR: 8332689: RISC-V: Use load instead of trampolines [v11]

Hamlin Li mli at openjdk.org
Fri Jun 14 15:13:30 UTC 2024


On Thu, 13 Jun 2024 17:26:35 GMT, Robbin Ehn <rehn at openjdk.org> wrote:

>> Hi all, please consider!
>> 
>> Today we do JAL to **dest** if **dest** is in reach (+/- 1 MB).
>> Using a very small application or running very short time we have fast patchable calls.
>> But any normal application running longer will increase the code size and code chrun/fragmentation.
>> So whatever or not you get hot fast calls rely on luck.
>> 
>> To be patchable and get code cache reach we also emit a stub trampoline which we can point the JAL to.
>> This would be the common case for a patchable call.
>> 
>> Code stream:
>> JAL <trampo>
>> Stubs:
>> AUIPC
>> LD
>> JALR
>> <DEST>
>> 
>> 
>> On some CPUs L1D and L1I can't contain the same cache line, which means the tramopline stub can bounce from L1I->L1D->L1I, which is expensive.
>> Even if you don't have that problem having a call to a jump is not the fastest way.
>> Loading the address avoids the pitsfalls of cmodx.
>> 
>> This patch suggest to solve the problems with trampolines, we take small penalty in the naive case of JAL to **dest**,
>> and instead do by default:
>> 
>> Code stream:
>> AUIPC
>> LD
>> JALR
>> Stubs:
>> <DEST>
>> 
>> An experimental option for turning trampolines back on exists.
>> 
>> It should be possible to enhanced this with the WIP [Zjid](https://github.com/riscv/riscv-j-extension) by changing the JALR to JAL and nop out the auipc+ld (as the current proposal of Zjid forces the I-fetcher to fetch instruction in order (meaning we will avoid a lot issues which arm has)) when in reach and vice-versa.
>> 
>> Numbers from VF2 (I have done them a few times, they are always overall in favor of this patch):
>> 
>> fop                                        (msec)    2239       |  2128       =  0.950424
>> h2                                         (msec)    18660      |  16594      =  0.889282
>> jython                                     (msec)    22022      |  21925      =  0.995595
>> luindex                                    (msec)    2866       |  2842       =  0.991626
>> lusearch                                   (msec)    4108       |  4311       =  1.04942
>> lusearch-fix                               (msec)    4406       |  4116       =  0.934181
>> pmd                                        (msec)    5976       |  5897       =  0.98678
>> jython                                     (msec)    22022      |  21925      =  0.995595
>> Avg:                                       0.974112                              
>> fop(xcomp)                                 (msec)    2721       |  2714       =  0.997427
>> h2(xcomp) ...
>
> Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 16 commits:
> 
>  - Merge branch 'master' into 8332689
>  - Merge branch 'master' into 8332689
>  - Merge branch 'master' into 8332689
>  - Merge branch 'master' into 8332689
>  - Remove tmp file
>  - Prepare for dynamic NativeCall size
>  - Only allow one calling convetion, i.e. fixed sized
>  - Merge branch 'master' into 8332689
>  - Review comments
>  - Move shart/far code to cpp
>  - ... and 6 more: https://git.openjdk.org/jdk/compare/5d2a19de...bb7249b8

src/hotspot/cpu/riscv/nativeInst_riscv.cpp line 109:

> 107: 
> 108: //-----------------------------------------------------------------------------
> 109: // NativeShortCall

Both Far and Short call here are named `patchable far calls` in the comment in macroAssembler_riscv.hpp.
So, it will be helpful to unify the naming.

src/hotspot/cpu/riscv/nativeInst_riscv.cpp line 169:

> 167:   address addr = addr_at(0);
> 168:   if (NativeShortCall::is_at(addr)) {
> 169:     NativeShortCall* call = NativeShortCall::at(addr);

Are these lines necessary? As this is an instance method (rather than static), so `NativeShortCall::is_at(addr)` must already be true?

src/hotspot/cpu/riscv/nativeInst_riscv.cpp line 198:

> 196:   Assembler::patch(pInsn, 30, 21, (offset >> 1) & 0x3ff);
> 197:   Assembler::patch(pInsn, 20, 20, (offset >> 11) & 0x1);
> 198:   Assembler::patch(pInsn, 19, 12, (offset >> 12) & 0xff);

should we reuse `MacroAssembler::pd_patch_instruction_size`?

src/hotspot/cpu/riscv/nativeInst_riscv.cpp line 248:

> 246: }
> 247: 
> 248: bool NativeShortCall::reloc_set_destination(address dest) {

`reloc_set_destination` and `set_destination_mt_safe` are almost same, maybe `set_destination_mt_safe` could call `reloc_set_destination`?

src/hotspot/cpu/riscv/nativeInst_riscv.cpp line 257:

> 255:     assert(!NativeShortCallTrampolineStub::is_at(dest), "chained trampolines");
> 256:     NativeShortCallTrampolineStub::at(trampoline_stub_addr)->set_destination(dest);
> 257:   }

Maybe move these lines into `else` block below? as `Assembler::reachable_from_branch_at(call_addr, dest)` condition check does not depends on these `trampoline_stub_addr` related check & set.

src/hotspot/cpu/riscv/nativeInst_riscv.cpp line 382:

> 380: }
> 381: 
> 382: address NativeFarCall::reloc_destination(address orig_address) {

argument `orig_address` is not used

src/hotspot/cpu/riscv/nativeInst_riscv.cpp line 387:

> 385:   CodeBlob *code = CodeCache::find_blob(call_addr);
> 386:   assert(code != nullptr, "Could not find the containing code blob");
> 387:   address stub_addr = trampoline_stub_Relocation::get_trampoline_for(call_addr, (nmethod*)code);

should there be an assert like `assert(code->is_nmethod())`?

src/hotspot/cpu/riscv/nativeInst_riscv.cpp line 392:

> 390:     stub_addr = MacroAssembler::target_addr_for_insn(call_addr);
> 391:   }
> 392:   return stub_addr;

Naming here is confusing, as the returned value is not stub addr, but target addr of a jump.
Suggestion:

  if (stub_addr != nullptr) {
    return MacroAssembler::target_addr_for_insn(call_addr);
  }
  return nullptr;

src/hotspot/cpu/riscv/nativeInst_riscv.cpp line 410:

> 408: }
> 409: 
> 410: bool NativeFarCall::set_destination_mt_safe(address dest, bool assert_lock) {

Seems no caller will pass `assert_lock  == false`

src/hotspot/cpu/riscv/nativeInst_riscv.cpp line 410:

> 408: }
> 409: 
> 410: bool NativeFarCall::set_destination_mt_safe(address dest, bool assert_lock) {

For NativeShortCall, reloc_set_destination and set_destination_mt_safe are almost same, but for NativeFarCall they're different, is this expected?

src/hotspot/cpu/riscv/nativeInst_riscv.cpp line 423:

> 421: 
> 422:   if (stub_addr != nullptr) {
> 423:     set_stub_address_destination_at(stub_addr, dest);

Is `ICache::invalidate_range` needed here?

src/hotspot/cpu/riscv/nativeInst_riscv.cpp line 430:

> 428: }
> 429: 
> 430: bool NativeFarCall::reloc_set_destination(address dest) {

argument `dest` is not used.

src/hotspot/cpu/riscv/nativeInst_riscv.cpp line 436:

> 434:   CodeBlob *code = CodeCache::find_blob(call_addr);
> 435:   assert(code != nullptr, "Could not find the containing code blob");
> 436:   address stub_addr = trampoline_stub_Relocation::get_trampoline_for(call_addr, (nmethod*)code);

should there be an assert like `assert(code->is_nmethod())`?

src/hotspot/cpu/riscv/nativeInst_riscv.cpp line 438:

> 436:   address stub_addr = trampoline_stub_Relocation::get_trampoline_for(call_addr, (nmethod*)code);
> 437: 
> 438:   if (stub_addr != nullptr) {

Could `stub_addr == nullptr`? If positive, then it should return false when it's nullptr, if negative, then should the `if` be converted to an `assert`?

src/hotspot/cpu/riscv/nativeInst_riscv.cpp line 439:

> 437: 
> 438:   if (stub_addr != nullptr) {
> 439:     MacroAssembler::pd_patch_instruction_size(call_addr, stub_addr);

I could be wrong. `stub_addr` should be `dest`?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19453#discussion_r1638848549
PR Review Comment: https://git.openjdk.org/jdk/pull/19453#discussion_r1639489946
PR Review Comment: https://git.openjdk.org/jdk/pull/19453#discussion_r1639598854
PR Review Comment: https://git.openjdk.org/jdk/pull/19453#discussion_r1639653121
PR Review Comment: https://git.openjdk.org/jdk/pull/19453#discussion_r1639544121
PR Review Comment: https://git.openjdk.org/jdk/pull/19453#discussion_r1638764917
PR Review Comment: https://git.openjdk.org/jdk/pull/19453#discussion_r1638834002
PR Review Comment: https://git.openjdk.org/jdk/pull/19453#discussion_r1638840966
PR Review Comment: https://git.openjdk.org/jdk/pull/19453#discussion_r1639630967
PR Review Comment: https://git.openjdk.org/jdk/pull/19453#discussion_r1639654496
PR Review Comment: https://git.openjdk.org/jdk/pull/19453#discussion_r1639648835
PR Review Comment: https://git.openjdk.org/jdk/pull/19453#discussion_r1638767104
PR Review Comment: https://git.openjdk.org/jdk/pull/19453#discussion_r1638834176
PR Review Comment: https://git.openjdk.org/jdk/pull/19453#discussion_r1638873919
PR Review Comment: https://git.openjdk.org/jdk/pull/19453#discussion_r1639597667


More information about the hotspot-dev mailing list