RFR: 8332689: RISC-V: Use load instead of trampolines [v2]
Robbin Ehn
rehn at openjdk.org
Mon Jun 3 12:55:17 UTC 2024
> Hi all, please consider!
>
> Today we do JAL to **dest** if **dest** is in reach (+/- 1 MB).
> With a very small application, or one that runs only briefly, we get fast patchable calls.
> But any normal application that runs longer will grow the code size and increase code churn/fragmentation.
> So whether or not your hot calls are fast relies on luck.
>
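To make the "in reach" condition above concrete, here is a minimal sketch of a reach check. It relies only on the JAL encoding (a signed 21-bit, 2-byte-aligned offset); the function name `in_jal_reach` is hypothetical and is not taken from the patch or from HotSpot.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not HotSpot code.
// JAL encodes a signed 21-bit byte offset whose lowest bit is implicitly zero,
// so the reachable range relative to the JAL itself is [-2^20, 2^20 - 2].
constexpr bool in_jal_reach(std::uint64_t call_pc, std::uint64_t dest) {
    // Modular subtraction, then reinterpret as a signed distance.
    const std::int64_t offset = static_cast<std::int64_t>(dest - call_pc);
    constexpr std::int64_t kMin = -(std::int64_t{1} << 20);
    constexpr std::int64_t kMax = (std::int64_t{1} << 20) - 2;
    return offset >= kMin && offset <= kMax && (offset % 2) == 0;
}
```

Once the code cache grows beyond this window, a direct JAL to **dest** is no longer encodable, which is when an indirection is needed.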
> To be patchable and get code cache reach we also emit a stub trampoline which we can point the JAL to.
> This would be the common case for a patchable call.
>
> Code stream:
> JAL <trampo>
> Stubs:
> AUIPC
> LD
> JALR
> <DEST>
>
>
> On some CPUs L1D and L1I can't contain the same cache line, which means the trampoline stub can bounce L1I->L1D->L1I, which is expensive.
> Even if you don't have that problem, a call to a jump is not the fastest way.
> Loading the address avoids the pitfalls of cmodx.
>
> This patch suggests solving the problems with trampolines: we take a small penalty in the naive case of JAL to **dest**,
> and instead do by default:
>
> Code stream:
> AUIPC
> LD
> JALR
> Stubs:
> <DEST>
>
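As a sketch of how an AUIPC+LD pair can address the `<DEST>` slot pc-relatively, the offset is conventionally split into an upper 20-bit part for AUIPC and a sign-extended 12-bit part for LD. The names `PcRelParts`, `split_pcrel`, and `slot_address` are hypothetical, and the offset is assumed to be within the AUIPC+LD range; this is not the patch's implementation.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical types and helpers, for illustration only.
struct PcRelParts {
    std::int32_t hi20;  // AUIPC immediate; contributes hi20 * 4096 to the address
    std::int32_t lo12;  // LD immediate; sign-extended, in [-2048, 2047]
};

// Split a pc-relative byte offset into AUIPC and LD immediates.
// Adding 0x800 before shifting compensates for the sign extension of lo12.
// Relies on arithmetic right shift of negative values (guaranteed since
// C++20, and the behaviour of GCC and Clang in earlier modes).
inline PcRelParts split_pcrel(std::int32_t offset) {
    const std::int64_t off = offset;
    const std::int64_t hi = (off + 0x800) >> 12;
    const std::int64_t lo = off - hi * 4096;
    return { static_cast<std::int32_t>(hi), static_cast<std::int32_t>(lo) };
}

// Address the LD would read from, given the address of the AUIPC.
inline std::uint64_t slot_address(std::uint64_t auipc_pc, PcRelParts p) {
    return auipc_pc + static_cast<std::int64_t>(p.hi20) * 4096 + p.lo12;
}
```

Because the destination is held in a data slot, retargeting the call only requires storing a new address to that slot; the instructions in the code stream are left alone.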
> An experimental option for turning trampolines back on exists.
>
> It should be possible to enhance this with the WIP [Zjid](https://github.com/riscv/riscv-j-extension) by changing the JALR to JAL and nopping out the auipc+ld when in reach, and vice versa. The current Zjid proposal forces the I-fetcher to fetch instructions in order, meaning we will avoid a lot of the issues Arm has.
>
> Numbers from VF2 (I have run them a few times; overall they always favor this patch):
>
> fop (msec) 2239 | 2128 = 0.950424
> h2 (msec) 18660 | 16594 = 0.889282
> jython (msec) 22022 | 21925 = 0.995595
> luindex (msec) 2866 | 2842 = 0.991626
> lusearch (msec) 4108 | 4311 = 1.04942
> lusearch-fix (msec) 4406 | 4116 = 0.934181
> pmd (msec) 5976 | 5897 = 0.98678
> jython (msec) 22022 | 21925 = 0.995595
> Avg: 0.974112
> fop(xcomp) (msec) 2721 | 2714 = 0.997427
> h2(xcomp) (msec) 37719 | 38004 = 1.00756
> jython(xcomp) ...
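
The ratio column above can be reproduced with a few lines of arithmetic. This sketch assumes the left number is the baseline and the right number is the patched build (the message does not label the columns), so a ratio below 1 means the patch is faster. The names `Row`, `ratio`, `mean_ratio`, and `listed_rows` are hypothetical; the rows are copied exactly as listed, up to the "Avg" line.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper types for checking the table's arithmetic.
struct Row {
    const char* name;
    double before_ms;  // assumed: baseline
    double after_ms;   // assumed: with the patch
};

// Ratio as printed in the table: right column divided by left column.
inline double ratio(const Row& r) { return r.after_ms / r.before_ms; }

// Arithmetic mean of the per-benchmark ratios.
inline double mean_ratio(const std::vector<Row>& rows) {
    double sum = 0.0;
    for (const Row& r : rows) sum += ratio(r);
    return rows.empty() ? 0.0 : sum / static_cast<double>(rows.size());
}

// Rows exactly as they appear in the message above the "Avg" line.
inline std::vector<Row> listed_rows() {
    return {
        {"fop", 2239, 2128},
        {"h2", 18660, 16594},
        {"jython", 22022, 21925},
        {"luindex", 2866, 2842},
        {"lusearch", 4108, 4311},
        {"lusearch-fix", 4406, 4116},
        {"pmd", 5976, 5897},
        {"jython", 22022, 21925},
    };
}
```

Averaging the eight listed rows this way gives approximately 0.974112, which agrees with the "Avg" line.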
Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision:
- Merge branch 'master' into 8332689
- Remove accidental files
- Remove accidental files
- Baseline
-------------
Changes:
- all: https://git.openjdk.org/jdk/pull/19453/files
- new: https://git.openjdk.org/jdk/pull/19453/files/41e576cc..3c5db819
Webrevs:
- full: https://webrevs.openjdk.org/?repo=jdk&pr=19453&range=01
- incr: https://webrevs.openjdk.org/?repo=jdk&pr=19453&range=00-01
Stats: 9920 lines in 326 files changed: 5604 ins; 3137 del; 1179 mod
Patch: https://git.openjdk.org/jdk/pull/19453.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/19453/head:pull/19453
PR: https://git.openjdk.org/jdk/pull/19453