RFR: 8326306: RISC-V: Re-structure MASM calls and jumps
Robbin Ehn
rehn at openjdk.org
Fri Apr 26 07:25:33 UTC 2024
On Fri, 26 Apr 2024 06:38:32 GMT, Fei Yang <fyang at openjdk.org> wrote:
>> Hi, please consider.
>>
>> We have code that directly uses the asm for calls/jumps instead of masm.
>> Our masm has somewhat odd naming, and we don't use 'proper' pseudoinstructions/mnemonics.
>> These are the ones suggested by the [riscv-asm-manual](https://github.com/riscv-non-isa/riscv-asm-manual/tree/master):
>>
>> Pseudoinstruction   Base instruction(s)                                      Meaning
>> j offset            jal x0, offset                                           Jump
>> jal offset          jal x1, offset                                           Jump and link
>> jr rs               jalr x0, rs, 0                                           Jump register
>> jalr rs             jalr x1, rs, 0                                           Jump and link register
>> ret                 jalr x0, x1, 0                                           Return from subroutine
>> call offset         auipc x1, offset[31:12]; jalr x1, x1, offset[11:0]       Call far-away subroutine
>> tail offset         auipc x6, offset[31:12]; jalr x0, x6, offset[11:0]       Tail call far-away subroutine
>>
>> But these can only be implemented like this if you have a small enough application.
>> The fallback for these is to use the GOT (your C compiler should place a copy of the GOT every 2G so it's always reachable).
>> We don't have a GOT; instead we materialize the address, so there are still differences between these and ours.
>>
>> This patch:
>> - Tries to follow these suggested mappings as well as we can.
>> - Makes sure all jumps/calls go through MASM. (so we get control and can easily change sites that use a certain calling convention)
>> - To avoid confusion between MASM public/private methods, ASM methods, and the mnemonics, there is some renaming.
>> E.g. the mnemonic jal means call offset; as we can't use it that way, there is no 'jal'.
>> - I enabled c.j, but right now we never generate it.
>> - As always, the macros do no good and are a legacy from when the code base did not use templates. (also the x-macros screw up my IDE (vim+rtags))
>>
>> I started down this path because I have a follow-up patch on top of this which removes trampolines in favor of load-n-jump.
>> (WIP: https://github.com/robehn/jdk/compare/jal-fixes...robehn:jdk:load-n-link?expand=1)
>> Our calls were a bit confusing when I looked into them; this helps.
>>
>> I ran t1-3 on a couple of slightly different versions of this patch, and as part of the follow-up; no issues found. (VF2, qemu, LP4)
>> I'm re-running tests, as I had some last-minute changes.
>>
>> Thanks, Robbin
>
> src/hotspot/cpu/riscv/gc/shenandoah/shenandoahBarrierSetAssembler_riscv.cpp line 303:
>
>> 301: target = CAST_FROM_FN_PTR(address, ShenandoahRuntime::load_reference_barrier_weak);
>> 302: }
>> 303: __ rt_call(target);
>
> Question: does it make sense to replace `call` with `rt_call` when we are invoking VM code (C++ code)? Here is the difference I see between the two: `rt_call` emits `auipc` or `movptr` depending on whether the destination can be found in the code cache, while `call` depends on `is_32bit_offset_from_codecache`. So `call` can still emit the short `auipc` sequence when the target is close enough, even if the target is not in the code cache, as in this case. But `rt_call` will always emit the long `movptr` sequence here, which I think is bad for performance.
A couple of points: all calls to the VM runtime should use "call_VM_leaf".
E.g.
` __ call_VM_leaf(Continuation::freeze_entry(), 2);`
AFAICT it is only Shenandoah which calls the VM in this 'wrong' way.
call_VM_leaf always uses mv -> li.
- It would be much better to change call_VM_leaf to use auipc if possible. (and fix Shenandoah to use call_VM_leaf)
- We can probably remove rt_call, and just have call.
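
To make the "use auipc if possible" idea concrete, here is a rough standalone sketch of the range check and the hi/lo split behind an auipc + jalr pair. It is not HotSpot code: `fits_auipc_jalr`, `split_offset`, `choose_call_sequence` and the `HiLo`/`CallSeq` types are hypothetical names made up for illustration, and the "long" case simply stands in for materializing the full address (movptr/li) before the jalr.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper types, for illustration only (not HotSpot's API).
struct HiLo {
  int32_t hi20;  // immediate for auipc (added to pc as hi20 << 12)
  int32_t lo12;  // sign-extended 12-bit immediate for jalr
};

enum class CallSeq { AuipcJalr, MaterializeThenJalr };

// auipc + jalr can reach pc + hi20 * 4096 + lo12, with hi20 a signed 20-bit
// value and lo12 in [-2048, 2047]. After rounding by 0x800, the offset must
// fit in a signed 32-bit value.
bool fits_auipc_jalr(int64_t offset) {
  int64_t rounded = offset + 0x800;
  return rounded >= INT32_MIN && rounded <= INT32_MAX;
}

// Split an in-range offset so that hi20 * 4096 + lo12 == offset.
HiLo split_offset(int64_t offset) {
  assert(fits_auipc_jalr(offset));
  // Sign-extend the low 12 bits; jalr will add them as a signed value.
  int64_t lo = ((offset & 0xfff) ^ 0x800) - 0x800;
  // offset - lo is a multiple of 4096, so this division is exact.
  int64_t hi = (offset - lo) / 4096;
  return HiLo{static_cast<int32_t>(hi), static_cast<int32_t>(lo)};
}

// Pick the short pc-relative pair when the target is reachable from pc,
// otherwise fall back to materializing the full address.
CallSeq choose_call_sequence(uint64_t pc, uint64_t target) {
  int64_t offset = static_cast<int64_t>(target - pc);
  return fits_auipc_jalr(offset) ? CallSeq::AuipcJalr
                                 : CallSeq::MaterializeThenJalr;
}
```

The point of the sketch is only that the choice can be made on reachability from the call site rather than on whether the target lives in the code cache; where the real check would sit (call, rt_call or call_VM_leaf) is a separate question.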
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/18942#discussion_r1580584271