RFR: 8326306: RISC-V: Re-structure MASM calls and jumps [v12]
Fei Yang
fyang at openjdk.org
Tue May 14 02:41:03 UTC 2024
On Mon, 13 May 2024 10:20:30 GMT, Robbin Ehn <rehn at openjdk.org> wrote:
>> Hi, please consider.
>>
>> We have code that directly use the asm for call/jumps instead masm.
>> Our masm have a bit odd naming, and we don't use 'proper' pseudoinstructions/mnemonics.
>> Suggested by [riscv-asm-manual](https://github.com/riscv-non-isa/riscv-asm-manual/tree/master)
>>
>> j offset jal x0, offset Jump
>> jal offset jal x1, offset Jump and link
>> jr rs jalr x0, rs, 0 Jump register
>> jalr rs jalr x1, rs, 0 Jump and link register
>> ret jalr x0, x1, 0 Return from subroutine
>> call offset auipc x1, offset[31:12]; jalr x1, x1, offset[11:0] Call far-away subroutine
>> tail offset auipc x6, offset[31:12]; jalr x0, x6, offset[11:0] Tail call far-away subroutine
>>
>> But these can only be implemented like this if you have small enough application.
>> The fallback of these is to use GOT (your C compiler should place a copy of GOT every 2G so it's always reachable).
>> We don't have GOT, instead we materialize, so there is still differences between these and ours.
>>
>> This patch:
>> - Tries to follow these suggested mappings as good we can.
>> - Make sure all jumps/calls go through MASM. (so we get control and can easily change for sites using a certain calling convention)
>> - To avoid confusion between MASM public/private methods and ASM methods and the mnemonics there are some renaming.
>> E.g. the mnemonics jal means call offset, as we can't use that so there is no 'jal'.
>> - I enabled c.j, but right now we never generate it.
>> - As always the macro does no good and are legacy from when code base did not use templates. (also the x-macros screws up my IDE (vim+rtags))
>>
>> I started down this path due to I have followup patch on top of this which removes trampoline in favor for load-n-jump.
>> (WIP: https://github.com/robehn/jdk/compare/jal-fixes...robehn:jdk:load-n-link?expand=1)
>> While looking into our calls it was a bit confusing, this helps.
>>
>> Done a couple of t1-3 slightly different version of this patch, and as part of the followup, no issues found. (VF2, qemu, LP4)
>> Re-running tests, had some last minute changes.
>>
>> Thanks, Robbin
>
> Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision:
>
> Use la() instead movptr where ok.
So now `MacroAssembler::rt_call` is effectively this after this PR change:
void MacroAssembler::rt_call(address dest, Register tmp) {
RuntimeAddress target(dest);
relocate(target.rspec(), [&] {
int32_t offset;
la(tmp, target.target(), offset);
jalr(tmp, offset);
});
}
Here are some of my thoughts on the next step/cleanup for reference:
1. For `MacroAssembler::far_call` and `MacroAssembler::far_jump`, I would suggest we use direct `auipc`
instead of `la` for them as the destination is expected to be in code cache. This will help distinguish
these two functions from `MacroAssembler::rt_call`.
2. Since there are only a few uses of `MacroAssembler::call`, we might want to remove this function
replacing its callsites with `MacroAssembler::rt_call`. This would help avoid possible confusion
between `call` and `rt_call`. I don't think the relocation added by `rt_call` would make a difference.
3. As we discussed in this PR, we also want to fix Shenandoah related callsites to `call_VM_leaf`. And let `call_VM_leaf` to use `rt_call` so that we emit `auipc` when possible, which should be better in performance.
4. We might want to turn some other places in the code where we do `mv + jalr` into a `rt_call`.
-------------
PR Comment: https://git.openjdk.org/jdk/pull/18942#issuecomment-2109162337
More information about the hotspot-dev
mailing list