RFR: 8332689: RISC-V: Use load instead of trampolines [v5]

Robbin Ehn rehn at openjdk.org
Wed Jun 5 08:08:17 UTC 2024


> Hi all, please consider!
> 
> Today we do JAL to **dest** if **dest** is in reach (+/- 1 MB).
> With a very small application, or one that runs only briefly, we get fast patchable calls.
> But any normal application that runs longer will grow the code size and increase code churn/fragmentation.
> So whether or not your hot calls are fast relies on luck.
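For readers unfamiliar with the reach limit: JAL carries a signed 21-bit, 2-byte-aligned offset, which is where the +/- 1 MB comes from. A minimal C++ sketch of that check, where `in_jal_reach` is a hypothetical helper for illustration and not a HotSpot function:

```cpp
#include <cstdint>

// JAL encodes a signed 21-bit offset in multiples of 2 bytes,
// so the reachable range is [-2^20, 2^20 - 2] bytes from the JAL itself.
constexpr int64_t kJalMinOffset = -(int64_t{1} << 20);
constexpr int64_t kJalMaxOffset = (int64_t{1} << 20) - 2;

// Hypothetical helper: true if a JAL located at `pc` can branch
// directly to `dest` without going through a trampoline.
constexpr bool in_jal_reach(uint64_t pc, uint64_t dest) {
  const int64_t offset = static_cast<int64_t>(dest - pc);
  return (offset & 1) == 0 &&
         offset >= kJalMinOffset &&
         offset <= kJalMaxOffset;
}
```

As the code cache grows, fewer caller/callee pairs pass this check, which is why the fast path depends on where the code happens to be placed.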
> 
> To be patchable and get code cache reach we also emit a stub trampoline which we can point the JAL to.
> This would be the common case for a patchable call.
> 
> Code stream:
> JAL <trampo>
> Stubs:
> AUIPC
> LD
> JALR
> <DEST>
> 
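The AUIPC+LD pair in the stub forms a PC-relative address from a 20-bit upper part and a signed 12-bit lower part. A sketch of how such an offset is usually split (`split_pcrel` and `PcRelParts` are illustrative names, not taken from the patch, and the offset is assumed to be within AUIPC's roughly +/- 2 GB range):

```cpp
#include <cstdint>

struct PcRelParts {
  int32_t hi20;  // goes into AUIPC's 20-bit immediate
  int32_t lo12;  // goes into LD's signed 12-bit immediate
};

// Split a PC-relative byte offset so that hi20 * 4096 + lo12 == offset.
// Adding 0x800 before shifting compensates for LD sign-extending lo12.
// Relies on >> being an arithmetic shift for negative values, which is
// guaranteed from C++20 and holds on mainstream compilers before that.
constexpr PcRelParts split_pcrel(int64_t offset) {
  const int64_t hi20 = (offset + 0x800) >> 12;
  const int64_t lo12 = offset - hi20 * 4096;
  return {static_cast<int32_t>(hi20), static_cast<int32_t>(lo12)};
}
```

For example, an offset of 0x1FFC splits into hi20 = 2 and lo12 = -4, because the low part must fit the signed range [-2048, 2047].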
> 
> On some CPUs L1D and L1I can't contain the same cache line, which means the trampoline stub can bounce L1I->L1D->L1I, which is expensive.
> Even if you don't have that problem, a call to a jump is not the fastest way to make a call.
> Loading the address avoids the pitfalls of cmodx (concurrent modification and execution of instructions).
> 
> This patch proposes to solve the problems with trampolines by taking a small penalty in the naive case of JAL to **dest**,
> and instead doing the following by default:
> 
> Code stream:
> AUIPC
> LD
> JALR
> Stubs:
> <DEST>
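Since the destination now lives in a data slot, re-pointing a call becomes a plain 8-byte store rather than an instruction rewrite. A rough model of that idea in portable C++, assuming a naturally aligned slot; the type and function names are hypothetical, and the memory orderings are only illustrative, not what the patch uses:

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical model of one call site's stub data: an 8-byte aligned
// slot that the AUIPC+LD pair reads before JALR jumps to its value.
struct CallTargetSlot {
  alignas(8) std::atomic<uint64_t> dest;
};

// Re-target the call by storing a new address. No instruction bytes
// change, so there is no instruction stream modification to synchronize.
inline void patch_call(CallTargetSlot& slot, uint64_t new_dest) {
  slot.dest.store(new_dest, std::memory_order_release);
}

// Models what the LD in the code stream effectively does.
inline uint64_t load_call_target(const CallTargetSlot& slot) {
  return slot.dest.load(std::memory_order_acquire);
}
```

A concurrent caller sees either the old or the new destination, never a mix of the two, which is the property that patching instructions in place makes hard to get.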
> 
> An experimental option for turning trampolines back on exists.
> 
> It should be possible to enhance this with the WIP [Zjid](https://github.com/riscv/riscv-j-extension) by changing the JALR to JAL and nop-ing out the AUIPC+LD when in reach, and vice versa. The current Zjid proposal forces the I-fetcher to fetch instructions in order, which means we will avoid a lot of the issues Arm has.
> 
> Numbers from VF2 (I have run them a few times; overall they are always in favor of this patch):
> 
> fop                                        (msec)    2239       |  2128       =  0.950424
> h2                                         (msec)    18660      |  16594      =  0.889282
> jython                                     (msec)    22022      |  21925      =  0.995595
> luindex                                    (msec)    2866       |  2842       =  0.991626
> lusearch                                   (msec)    4108       |  4311       =  1.04942
> lusearch-fix                               (msec)    4406       |  4116       =  0.934181
> pmd                                        (msec)    5976       |  5897       =  0.98678
> jython                                     (msec)    22022      |  21925      =  0.995595
> Avg:                                       0.974112                              
> fop(xcomp)                                 (msec)    2721       |  2714       =  0.997427
> h2(xcomp)                                  (msec)    37719      |  38004      =  1.00756
> jython(xcomp)        ...
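The last column appears to be the right-hand time divided by the left-hand time, so values below 1 mean the right-hand run was faster. A small sketch of that arithmetic, assuming the left column is the baseline and the right column is with this patch (the column roles are not labeled in the output, so that is an assumption):

```cpp
#include <vector>

// One benchmark row; which column is baseline and which is patched
// is an assumption, since the output does not label them.
struct Run {
  double baseline_ms;
  double patched_ms;
};

// Ratio as printed in the last column: patched time over baseline time.
inline double ratio(const Run& r) { return r.patched_ms / r.baseline_ms; }

// Arithmetic mean of the per-benchmark ratios (0.0 for an empty list).
inline double mean_ratio(const std::vector<Run>& runs) {
  if (runs.empty()) return 0.0;
  double sum = 0.0;
  for (const Run& r : runs) sum += ratio(r);
  return sum / static_cast<double>(runs.size());
}
```

For the fop row, `ratio` on 2239 and 2128 gives roughly 0.9504, matching the table.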

Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision:

  Review comments

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/19453/files
  - new: https://git.openjdk.org/jdk/pull/19453/files/c4c02f2e..193a9343

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=19453&range=04
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19453&range=03-04

  Stats: 8 lines in 1 file changed: 4 ins; 0 del; 4 mod
  Patch: https://git.openjdk.org/jdk/pull/19453.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19453/head:pull/19453

PR: https://git.openjdk.org/jdk/pull/19453

