RFR: 8326306: RISC-V: Re-structure MASM calls and jumps [v12]

Fei Yang fyang at openjdk.org
Tue May 14 02:41:03 UTC 2024


On Mon, 13 May 2024 10:20:30 GMT, Robbin Ehn <rehn at openjdk.org> wrote:

>> Hi, please consider.
>> 
>> We have code that directly use the asm for call/jumps instead masm.
>> Our masm have a bit odd naming, and we don't use 'proper' pseudoinstructions/mnemonics.
>> Suggested by [riscv-asm-manual](https://github.com/riscv-non-isa/riscv-asm-manual/tree/master)
>> 
>> j offset	jal x0, offset	Jump
>> jal offset	jal x1, offset	Jump and link
>> jr rs	        jalr x0, rs, 0	Jump register
>> jalr rs	        jalr x1, rs, 0	Jump and link register
>> ret	        jalr x0, x1, 0	Return from subroutine
>> call offset	auipc x1, offset[31:12]; jalr x1, x1, offset[11:0]	Call far-away subroutine	
>> tail offset	auipc x6, offset[31:12]; jalr x0, x6, offset[11:0]	Tail call far-away subroutine
>> 
>> But these can only be implemented like this if you have small enough application.
>> The fallback of these is to use GOT (your C compiler should place a copy of GOT every 2G so it's always reachable).
>> We don't have GOT, instead we materialize, so there is still differences between these and ours.
>> 
>> This patch:
>> - Tries to follow these suggested mappings as good we can.
>> - Make sure all jumps/calls go through MASM. (so we get control and can easily change for sites using a certain calling convention)
>> - To avoid confusion between MASM public/private methods and ASM methods and the mnemonics there are some renaming.
>>   E.g. the mnemonics jal means call offset, as we can't use that so there is no 'jal'.
>> - I enabled c.j, but right now we never generate it.
>> - As always the macro does no good and are legacy from when code base did not use templates. (also the x-macros screws up my IDE (vim+rtags))
>> 
>> I started down this path due to I have followup patch on top of this which removes trampoline in favor for load-n-jump.
>> (WIP: https://github.com/robehn/jdk/compare/jal-fixes...robehn:jdk:load-n-link?expand=1)
>> While looking into our calls it was a bit confusing, this helps. 
>> 
>> Done a couple of t1-3 slightly different version of this patch, and as part of the followup, no issues found. (VF2, qemu, LP4)
>> Re-running tests, had some last minute changes.
>> 
>> Thanks, Robbin
>
> Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Use la() instead movptr where ok.

So now `MacroAssembler::rt_call` is effectively this after this PR change:

void MacroAssembler::rt_call(address dest, Register tmp) {
  RuntimeAddress target(dest);
  relocate(target.rspec(), [&] {
    int32_t offset;
    la(tmp, target.target(), offset);
    jalr(tmp, offset);
  });
}


Here are some of my thoughts on the next step/cleanup for reference:

1. For `MacroAssembler::far_call` and `MacroAssembler::far_jump`, I would suggest we use direct `auipc`
instead of `la` for them as the destination is expected to be in code cache. This will help distinguish
these two functions from `MacroAssembler::rt_call`.

2. Since there are only a few uses of `MacroAssembler::call`, we might want to remove this function
replacing its callsites with `MacroAssembler::rt_call`. This would help avoid possible confusion
between `call` and `rt_call`. I don't think the relocation added by `rt_call` would make a difference.

3. As we discussed in this PR, we also want to fix Shenandoah related callsites to `call_VM_leaf`. And let `call_VM_leaf` to use `rt_call` so that we emit `auipc` when possible, which should be better in performance.

4. We might want to turn some other places in the code where we do `mv + jalr` into a `rt_call`.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18942#issuecomment-2109162337


More information about the hotspot-dev mailing list