RFR: 8332689: RISC-V: Use load instead of trampolines [v23]

Fei Yang fyang at openjdk.org
Wed Jul 10 20:12:07 UTC 2024


On Thu, 4 Jul 2024 14:48:36 GMT, Robbin Ehn <rehn at openjdk.org> wrote:

>> Hi all, please consider!
>> 
>> Today we do JAL to **dest** if **dest** is in reach (+/- 1 MB).
>> Using a very small application or running very short time we have fast patchable calls.
>> But any normal application running longer will increase the code size and code chrun/fragmentation.
>> So whatever or not you get hot fast calls rely on luck.
>> 
>> To be patchable and get code cache reach we also emit a stub trampoline which we can point the JAL to.
>> This would be the common case for a patchable call.
>> 
>> Code stream:
>> JAL <trampo>
>> Stubs:
>> AUIPC
>> LD
>> JALR
>> <DEST>
>> 
>> 
>> On some CPUs L1D and L1I can't contain the same cache line, which means the tramopline stub can bounce from L1I->L1D->L1I, which is expensive.
>> Even if you don't have that problem having a call to a jump is not the fastest way.
>> Loading the address avoids the pitsfalls of cmodx.
>> 
>> This patch suggest to solve the problems with trampolines, we take small penalty in the naive case of JAL to **dest**,
>> and instead do by default:
>> 
>> Code stream:
>> AUIPC
>> LD
>> JALR
>> Stubs:
>> <DEST>
>> 
>> An experimental option for turning trampolines back on exists.
>> 
>> It should be possible to enhanced this with the WIP [Zjid](https://github.com/riscv/riscv-j-extension) by changing the JALR to JAL and nop out the auipc+ld (as the current proposal of Zjid forces the I-fetcher to fetch instruction in order (meaning we will avoid a lot issues which arm has)) when in reach and vice-versa.
>> 
>> Numbers from VF2 (I have done them a few times, they are always overall in favor of this patch):
>> 
>> fop                                        (msec)    2239       |  2128       =  0.950424
>> h2                                         (msec)    18660      |  16594      =  0.889282
>> jython                                     (msec)    22022      |  21925      =  0.995595
>> luindex                                    (msec)    2866       |  2842       =  0.991626
>> lusearch                                   (msec)    4108       |  4311       =  1.04942
>> lusearch-fix                               (msec)    4406       |  4116       =  0.934181
>> pmd                                        (msec)    5976       |  5897       =  0.98678
>> jython                                     (msec)    22022      |  21925      =  0.995595
>> Avg:                                       0.974112                              
>> fop(xcomp)                                 (msec)    2721       |  2714       =  0.997427
>> h2(xcomp) ...
>
> Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision:
> 
>   _ld to ld

Also performed tier1-3 and hotspot:tier4 on my unmatched boards. Result looks fine.
Just witnessed several unnecessary uses of namespace `Assembler`. Guess you might want to clean it up? Still good otherwise.


diff --git a/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp b/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp
index b39ac79be6b..e349eab3177 100644
--- a/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp
+++ b/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp
@@ -983,9 +983,9 @@ void MacroAssembler::load_link_jump(const address source, Register temp) {
   assert_cond(source != nullptr);
   int64_t distance = source - pc();
   assert(is_simm32(distance), "Must be");
-  Assembler::auipc(temp, (int32_t)distance + 0x800);
-  Assembler::ld(temp, temp, ((int32_t)distance << 20) >> 20);
-  Assembler::jalr(x1, temp, 0);
+  auipc(temp, (int32_t)distance + 0x800);
+  ld(temp, Address(temp, ((int32_t)distance << 20) >> 20));
+  jalr(temp);
 }

 void MacroAssembler::jump_link(const address dest, Register temp) {
@@ -994,7 +994,7 @@ void MacroAssembler::jump_link(const address dest, Register temp) {
   int64_t distance = dest - pc();
   assert(is_simm21(distance), "Must be");
   assert((distance % 2) == 0, "Must be");
-  Assembler::jal(x1, distance);
+  jal(x1, distance);
 }

 void MacroAssembler::j(const address dest, Register temp) {

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19453#issuecomment-2220677274


More information about the hotspot-dev mailing list