RFR: 8285487: Do not generate trampolines for runtime calls if they are not needed [v3]

Fri Apr 29 22:40:32 UTC 2022

On Fri, 29 Apr 2022 21:45:24 GMT, Evgeny Astigeevich <duke at openjdk.java.net> wrote:

>> Hi Vladimir,
>> Thank you for the detailed comment. 
>> 
>>> The optimization is that for other calls (inside CodeCache) you need only 2 types of call: far and near. And  MacroAssembler::far_call() handles it correctly.
>> 
>> Yes, I've checked the uses of `trampoline_call`. There are no calls outside CodeCache. They must be replaced with `far_call`. I did this and got a crash of the debug build. Debug builds reduce the branch range from 128M to 2M to be able to test trampolines. Debug builds with the 128M branch range work. With the 2M branch range the debug build fails at:
>> 
>> ScopeDesc* CompiledMethod::scope_desc_at(address pc) {
>>    PcDesc* pd = pc_desc_at(pc);
>>    guarantee(pd != NULL, "scope must be present");
>>    return new ScopeDesc(this, pd);
>> }
>> 
>> with the stack:
>> 
>> V  [libjvm.so+0xa8f138]  CompiledMethod::scope_desc_at(unsigned char*)+0x40
>> V  [libjvm.so+0x1567290]  compiledVFrame::compiledVFrame(frame const*, RegisterMap const*, JavaThread*, CompiledMethod*)+0xc8
>> V  [libjvm.so+0x155f1f0]  vframe::new_vframe(frame const*, RegisterMap const*, JavaThread*)+0xd8
>> V  [libjvm.so+0xaeb574]  Deoptimization::uncommon_trap_inner(JavaThread*, int)+0x1b4
>> V  [libjvm.so+0xaeca44]  Deoptimization::uncommon_trap(JavaThread*, int, int)+0x24
>> 
>> 
>>> I would like to know why trampoline_call() could be used for relocInfo::virtual_call_type and relocInfo::opt_virtual_call_type (based on assert). They should be inside CodeCache and you don't need trampoline for it. Right? 
>> 
>> If CodeCache is bigger than 128M, we need a trampoline for them.
>> We generate:
>> 
>> bl trampoline
>> trampoline:
>> 
>> Initially, we have a call of `resolve_virtual_call_stub`. If the stub finds a compiled method, it patches:
>> - `bl` if the found method is near.
>> - `trampoline` if the found method is far.
>> For the 240M CodeCache, we patch `trampoline` when a C2 compiled method calls a C1 compiled method or vice versa. When a C2 compiled method calls a C2 compiled method or a C1 compiled method calls a C1 compiled method, we have a normal direct call.
>> If we used a `far_call` there, we would always have an indirect call because a call site would be:
>> 
>> adrp
>> add
>> blr
>> 
>> So the use of `trampoline_call` here is an optimization.
>> 
>>> I don't understand how the call to the trampoline is handled (lines 603-607): they do not reference the stub value returned by emit_trampoline_stub(). Why not use far_call() to call the trampoline code?
>> 
>> And they should not. At this point the code is still in a `CodeBuffer` and calls are not linked yet. Trampoline code is put into the stub code section of the `CodeBuffer`. We insert a self-calling instruction. When we move the generated code into `CodeCache`, we link the calls by patching them based on `relocInfo` records. I think during linking we try to bypass trampolines if possible.
>
> Yes, we try to bypass trampolines when moving code into `CodeCache`:
> 
> void Relocation::pd_set_call_destination(address x) {
>   assert(is_call(), "should be a call here");
>   if (NativeCall::is_call_at(addr())) {
>     address trampoline = nativeCall_at(addr())->get_trampoline();
>     if (trampoline) {
>       nativeCall_at(addr())->set_destination_mt_safe(x, /* assert_lock */false);
>       return;
>     }
>   }
>   MacroAssembler::pd_patch_instruction(addr(), x);
>   assert(pd_call_destination(addr()) == x, "fail in reloc");
> }
> 
> void NativeCall::set_destination_mt_safe(address dest, bool assert_lock) {
> ....
>   ResourceMark rm;
>   int code_size = NativeInstruction::instruction_size;
>   address addr_call = addr_at(0);
>   bool reachable = Assembler::reachable_from_branch_at(addr_call, dest);
> ...
>   // Patch the constant in the call's trampoline stub.
>   address trampoline_stub_addr = get_trampoline();
> ...
>   // Patch the call.
>   if (reachable) {
>     set_destination(dest);
>   } else {
>     assert (trampoline_stub_addr != NULL, "we need a trampoline");
>     set_destination(trampoline_stub_addr);
>   }
> ...
> }

So my assumption that `trampoline_call()` could be used for calls outside CodeCache is wrong (and the method's comment is correct). I looked at a few places where a runtime call is generated, and `trampoline_call()` is indeed used only for calls within CodeCache (at least in the places I looked at):
https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/aarch64.ad#L3858

My points 1. and 2. are void. 

Thank you for answering my questions 3. and 4.

That also explains your `need_trampoline` code change.
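
For reference, the reachability decision in the quoted `set_destination_mt_safe` can be sketched roughly as below. This is only an illustration, not HotSpot code: the names `reachable_from_branch`, `patch_call`, `CallTarget`, and the two range constants are hypothetical, and addresses are modelled as plain signed integers. It assumes a symmetric branch range of 128M (2M in debug builds), as described above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical constants for illustration; the thread states that debug
// builds shrink the branch range from 128M to 2M to exercise trampolines.
constexpr std::int64_t M = 1024 * 1024;
constexpr std::int64_t kReleaseBranchRange = 128 * M;
constexpr std::int64_t kDebugBranchRange = 2 * M;

// Where the patched `bl` ends up pointing.
enum class CallTarget { Direct, ViaTrampoline };

// True if `dest` lies within +/- `range` bytes of the branch at `call_site`.
// Addresses are modelled as signed 64-bit integers (an assumption of this sketch).
inline bool reachable_from_branch(std::int64_t call_site, std::int64_t dest,
                                  std::int64_t range) {
  const std::int64_t offset = dest - call_site;
  return offset >= -range && offset < range;
}

// Mirrors the shape of the quoted logic: branch directly when the destination
// is reachable, otherwise branch to the trampoline stub.
inline CallTarget patch_call(std::int64_t call_site, std::int64_t dest,
                             std::int64_t range) {
  return reachable_from_branch(call_site, dest, range)
             ? CallTarget::Direct
             : CallTarget::ViaTrampoline;
}
```

With these assumed ranges, a destination 64M away is patched as a direct call under the 128M range but goes through the trampoline under the 2M debug range.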

You have answered all my questions. Please merge the latest changes and make any updates needed so I can review the final version.

-------------

PR: https://git.openjdk.java.net/jdk/pull/8403