[foreign-abi] RFR: 8248331: Intrinsify downcall handles in C2

Vladimir Ivanov vladimir.x.ivanov at oracle.com
Wed Jul 1 13:05:07 UTC 2020


>> src/hotspot/cpu/x86/sharedRuntime_x86_64.cpp:
>>
>> + void NativeInvokerGenerator::generate() {
>>
>> Though there's need_transition bit (both on JDK and JVM sides) and it 
>> affects code generation (e.g., in CallNode::guaranteed_safepoint()), 
>> NativeInvokerGenerator::generate() does perform thread state 
>> transition unconditionally. Is it intentional? As it is now, 
>> need_transition=false looks broken.
> 
> Yes, this is intentional.
> 
> NativeInvokerGenerator::generate() adds the thread state transition 
> unconditionally, but it is only used when need_transition is actually 
> set. See the code that creates the call_addr in graphKit:
> 
>    address call_addr = nep->entry_point();
>    if (nep->need_transition()) {
>      call_addr = SharedRuntime::make_native_invoker(call_addr,
> nep->shadow_space(),
> arg_regs, ret_regs);
>      C->add_native_stub(call_addr);
>    }
> 
> In the need_transition=false case, we don't call 
> NativeInvokerGenerator::generate() at all and instead use the target 
> address directly.

Ah, looks good then. Missed that logic in GraphKit::make_native_call().

>> src/hotspot/cpu/x86/macroAssembler_x86.cpp:
>>
>> + void MacroAssembler::spill_register(VMReg reg) {
>> +   assert(reg->is_reg(), "must be a register");
>> +   if (reg->is_Register()) {
>> +     push(reg->as_Register());
>> +   } else if (reg->is_XMMRegister()) {
>> +     subptr(rsp, 16); // 16 bytes
>> +     movdqu(Address(rsp, 0), reg->as_XMMRegister());
>> +   } else {
>> +     ShouldNotReachHere();
>> +   }
>> + }
>>
>> spill_register/fill_register assume that XMMRegister is 16 bytes in 
>> size. Though it may be the case for existing system ABIs, it's not 
>> necessarily the case for custom calling conventions (e.g., vector). 
>> I'd either conservatively operate on the largest vector register size 
>> supported or explicitly record the size of the value and issue moves 
>> of the right size.
> 
> Ok, I've added handling of larger register sizes based on the value of 
> UseAVX. Though, there's probably a lot more surgery needed to fully 
> support other carriers. I've also moved the spill_register/fill_register 
> functions to the NativeInvokerGenerator class as helper functions to 
> make it clear that they should only be used for this special case.

Sounds good. Irrespective of whether corresponding carriers are 
supported, I think it's still the right thing to do since 
NativeInvokerGenerator operates on VMReg which don't have any size 
information associated with them.

Best regards,
Vladimir Ivanov

>> src/hotspot/share/opto/callGenerator.cpp:
>>
>> +       } else {
>> +         // can this happen?
>> +         print_inlining_failure(C, callee, jvms->depth() - 1, 
>> jvms->bci(),
>> +                                "NativeEntryPoint not constant");
>> +       }
>>
>> It does happen in practice when bytecode for a lambda form is compiled 
>> stand-alone. In such case LambdaForm is not a compile-time constant, 
>> so NEP can't be a constant as well.
> 
> Ok, thanks. I'll remove the comment.
> 
> Jorn
> 
>>
>> Best regards,
>> Vladimir Ivanov
>>
>> On 29.06.2020 14:59, Jorn Vernee wrote:
>>> Hi,
>>>
>>> This patch adds intrinsification of down call handles.
>>>
>>> This is done through a new method handle intrinsic called 
>>> linkToNative. We create a NativeMethodHandle that calls this
>>> intrinsic, which then replaces the leaf method handle in 
>>> ProgrammableInvoker::getBoundMethodHandle, basically replacing
>>> the call to invokeMoves. Before C2 kicks in, this intrinsic will call 
>>> a fallback method handle that we pass to it. The
>>> handle that is passed is a handle that points to 
>>> ProgrammableInvoker::invokeMoves, thus simulating the current
>>> behaviour.
>>>
>>> However, when a call to linkToNative is inlined, C2 will instead 
>>> generate either a direct call to the target function,
>>> or a call to a wrapper stub (which is generated on demand) that also 
>>> does the thread state transitions needed for long
>>> running native functions. Information about ABI, and which registers 
>>> to use are captured in a so-called 'appendix
>>> argument' of the type NativeEntryPoint, which is passed as the last 
>>> argument. This captures all the information needed
>>> to generate the call in C2 (note that previously in the patch shared 
>>> in the discussion thread this information was
>>> split over several classes, but I've condensed the info into just 
>>> NativeEntryPoint in order to reduce the amount of
>>> code needed in the vm to be able to access the information).
>>>
>>> With this, the overhead for downcalls is on par or slightly lower 
>>> than with JNI for calls that need to do thread state
>>> transitions, and it is even lower when the thread state transitions 
>>> are omitted (see that *_trivial runs).
>>>
>>> Benchmark                                            Mode Cnt 
>>> Score    Error  Units
>>> CallOverhead.jni_blank                               avgt 30    8.461 
>>> □  0.892  ns/op
>>> CallOverhead.jni_identity                            avgt   30 12.585 
>>> □  0.066  ns/op
>>> CallOverhead.panama_blank                            avgt 30    8.562 
>>> □  0.029  ns/op
>>> CallOverhead.panama_blank_trivial                    avgt 30    1.372 
>>> □  0.008  ns/op
>>> CallOverhead.panama_identity                         avgt   30 11.813 
>>> □  0.073  ns/op
>>> CallOverhead.panama_identity_trivial                 avgt 30    6.042 
>>> □  0.024  ns/op
>>> Finished running test 'micro:CallOverhead'
>>>
>>> Thanks,
>>> Jorn
>>>
>>> -------------
>>>
>>> Commit messages:
>>>   - Rebase fixes
>>>   - Remove RegionPtr
>>>   - re-write RegionPtr without using std::equal
>>>   - Add missing debug.hpp includes for files using assert
>>>   - Patch cleanup mostly updating copyright headers
>>>   - Reduce VM interface
>>>   - Implement down call intrinsics
>>>
>>> Changes: https://git.openjdk.java.net/panama-foreign/pull/219/files
>>>   Webrev: https://webrevs.openjdk.java.net/panama-foreign/219/webrev.00
>>>    Issue: https://bugs.openjdk.java.net/browse/JDK-8248331
>>>    Stats: 1848 lines in 73 files changed: 1741 ins; 28 del; 79 mod
>>>    Patch: https://git.openjdk.java.net/panama-foreign/pull/219.diff
>>>    Fetch: git fetch https://git.openjdk.java.net/panama-foreign 
>>> pull/219/head:pull/219
>>>
>>> PR: https://git.openjdk.java.net/panama-foreign/pull/219
>>>


More information about the panama-dev mailing list